Bugzilla – Bug 5302
start: should not delete manifest criteria
Last modified: 2008-12-16 09:34:20 UTC
When a manifest has been added and the service is then stopped and started again, listing the service no longer shows the added manifest. The service's web page does not show the manifest either. However, the manifest file is still present in the AI_data directory and is still served over HTTP.

-bash-3.2# installadm add -m /var/tmp/AI/manifests/criteria_i86pc.xml -n _install_service_46505
-bash-3.2# installadm list -n _install_service_46505
Manifest
--------
ai_manifest1.xml
-bash-3.2# installadm stop _install_service_46505
Stopping the service _install_service_46505._OSInstall._tcp.local
-bash-3.2# installadm start _install_service_46505
-bash-3.2# installadm start _install_service_46505
-bash-3.2# installadm start _install_service_46505
Registering the service _install_service_46505._OSInstall._tcp.local
-bash-3.2# installadm list -n _install_service_46505
Manifest
--------
-bash-3.2# ls -al /var/ai/46505/AI_data/
total 17
drwxr-xr-x 2 root sys     4 Nov 20 11:01 .
drwxr-xr-x 4 root root    7 Nov 21 14:20 ..
-rw-r--r-- 1 root root 1773 Nov 21 14:18 ai_manifest1.xml
-rwxr-xr-x 1 root sys  2265 Nov 20 11:01 default.xml
-bash-3.2# wget golmaal:46505/manifests/ai_manifest1.xml
--14:21:27-- http://golmaal:46505/manifests/ai_manifest1.xml
           => `ai_manifest1.xml.1'
Resolving golmaal... 172.20.64.98
Connecting to golmaal|172.20.64.98|:46505... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1,773 (1.7K) [application/x-download]

100%[=======================================================================================================>] 1,773 --.--K/s

14:21:27 (128.30 MB/s) - `ai_manifest1.xml.1' saved [1773/1773]

-bash-3.2#
This is due to the AI.db file getting replaced(?) with a blank template (i.e. the manifests table is there, but it has no entries). All manifest criteria are lost, not just the added one!
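One quick way to confirm the "blank template" state is to count the rows in the manifests table directly. The snippet below is only a minimal sketch: it assumes AI.db is a SQLite database, and the helper name count_manifest_rows is hypothetical. The table name "manifests" is the only detail taken from this report.

```python
import sqlite3


def count_manifest_rows(db_path):
    """Return the number of rows in the 'manifests' table of an AI.db file.

    Assumption: AI.db is a SQLite database that contains a table named
    'manifests'. A result of 0 matches the "table is there, but no
    entries" state described in this bug.
    """
    conn = sqlite3.connect(db_path)
    try:
        (count,) = conn.execute("SELECT COUNT(*) FROM manifests").fetchone()
        return count
    finally:
        conn.close()
```

Running it against /var/ai/<port>/AI.db before installadm stop and again after installadm start would show whether the criteria entries survived the restart.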
Created an attachment (id=1000): d-trace AI.db syscall output
A listing of all creat, write, unlink, open, stat syscalls affecting *AI.db*
Created an attachment (id=1001): All commands exec'd out of /usr/lib/installadm
Hrm, I can't understand what's causing this, but I can reproduce it reliably. It happens somewhere after running installadm start on the service. I've written two d-trace scripts: one to show all activity on /var/ai/46501/AI.db (unlink*, creat*, write*, open*, stat*) and the other to show all execs out of /usr/lib/installadm. Neither shows anything unexpected (output of both is attached, captured while running the installadm command block below). The execs of files under /usr/lib/installadm show the usual suspects: publish-manifest, list-manifests, and webserver. However, running those commands out of /usr/lib/installadm by hand, in the same order as the command block below, does not seem to result in AI.db damage (even when killing the webserver).

Example of running commands using installadm(1) resulting in DB damage:

pfexec installadm list -n test1; pfexec installadm add -m /tmp/criteria_manifest_inc_smf.xml -n test1; pfexec installadm stop test1; /usr/lib/installadm/list-manifests /var/ai/46501; sleep 60; pfexec installadm start test1; installadm list -n test1

Example of running commands manually (w/o DB damage):

[363] clayb@transsiberian: pfexec /usr/bin/python2.4 /usr/lib/installadm/publish-manifest -c /tmp/criteria_manifest_inc_smf.xml /var/ai/46501/
Error: Not copying manifest, source and current versions differ -- criteria in place.
[364] clayb@transsiberian: /usr/bin/python2.4 /usr/lib/installadm/list-manifests /var/ai/46501
Manifest
--------
manifestStopTest.xml
[365] clayb@transsiberian: pkill webserver
pkill: Failed to signal pid 23451: Not owner
[366] clayb@transsiberian: pfexec !!
pfexec pkill webserver
[367] clayb@transsiberian: /usr/bin/python2.4 /usr/lib/installadm/webserver -p 46501 /var/ai/46501
[22/Nov/2008:03:53:53] ENGINE Listening for SIGHUP.
[22/Nov/2008:03:53:53] ENGINE Listening for SIGTERM.
[22/Nov/2008:03:53:53] ENGINE Listening for SIGUSR1.
[22/Nov/2008:03:53:53] ENGINE Bus STARTING
[22/Nov/2008:03:53:53] ENGINE Started monitor thread '_TimeoutMonitor'.
[22/Nov/2008:03:53:53] ENGINE Started monitor thread 'Autoreloader'.
[22/Nov/2008:03:53:53] ENGINE Serving on 0.0.0.0:46501
[22/Nov/2008:03:53:53] ENGINE Bus STARTED
^Z
Suspended
[368] clayb@transsiberian: bg
[3] /usr/bin/python2.4 /usr/lib/installadm/webserver -p 46501 /var/ai/46501 &
[369] clayb@transsiberian: /usr/bin/python2.4 /usr/lib/installadm/list-manifests /var/ai/46501
Manifest
--------
manifestStopTest.xml
From trying a d-trace script using the IO provider, looking for anything writing to /var/ai/46501, I see cpio(1) called:

DEVICE  FILE                          RW  EXEC
sd4     /var/ai/46501/zka4yU          W   cpio -pdmu /var/ai/46501
sd4     /var/ai/46501/zka4yU          W   cpio -pdmu /var/ai/46501
sd4     /var/ai/46501/xka4yU          W   cpio -pdmu /var/ai/46501
sd4     /var/ai/46501/AI_data/yka4yU  W   cpio -pdmu /var/ai/46501
sd4     /var/ai/46501/webserver.log   W   fsflush

I also see that cpio(1) is used in /usr/lib/installadm/setup-[image,service], and it looks like cpio is being called out of setup_docroot() in /usr/lib/installadm/setup-service, overwriting /var/ai/<port>/AI.db. We should not be going down the setup_docroot() path every time start_ai_webserver() is called.
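To illustrate the kind of guard that would avoid this, here is a rough sketch of a docroot populate step that never overwrites files already in place. It is not the code from setup-service (which is a separate script that uses cpio) and it is not necessarily how the eventual fix works. The function name populate_docroot and both directory arguments are hypothetical, and the code is written for a current Python 3 rather than the python2.4 seen above.

```python
import os
import shutil


def populate_docroot(template_dir, docroot):
    """Copy template files into docroot, skipping files that already exist.

    Hypothetical helper: an existing AI.db (or any other existing file)
    under docroot is left untouched, so re-running this on every service
    start cannot replace a populated database with the blank template.
    Returns the list of destination paths that were actually copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(template_dir):
        rel = os.path.relpath(dirpath, template_dir)
        target_dir = os.path.normpath(os.path.join(docroot, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            dest = os.path.join(target_dir, name)
            if os.path.exists(dest):
                # Already present from an earlier setup: keep it as is.
                continue
            shutil.copy2(os.path.join(dirpath, name), dest)
            copied.append(dest)
    return copied
```

The design point is the same as the one made above: the first-time setup work should be idempotent, or skipped entirely, when the service directory already exists.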
As all services must be (re)started on reboot, this requires the admin to re-run installadm add for every criteria manifest after the services start, on every reboot.
A fix is proposed (in webrev) at: http://cr.opensolaris.org/~clayb/5302/
Fixed in:
changeset:   385:6b53eda765fc
tag:         tip
user:        Clay Baenziger <ClayB@OpenSolaris.ORG>
date:        Mon Dec 15 20:40:18 2008 -0700
description:
5302 start: should not delete manifest criteria