Bug 5302 - start: should not delete manifest criteria
Status: RESOLVED FIXINSOURCE
Product: installer
Component: installadm
Version: unspecified
Platform: i86pc/i386 OpenSolaris
Importance: P1 critical
Target Milestone: ---
Assigned To: Clay Baenziger
Reported: 2008-11-21 14:35 UTC by Rich Reinhard
Modified: 2008-12-16 09:34 UTC

Attachments
d-trace AI.db syscall output (1.81 KB, text/plain)
2008-11-22 02:51 UTC, Clay Baenziger
All commands exec'd out of /usr/lib/installadm (1.09 KB, text/plain)
2008-11-22 02:56 UTC, Clay Baenziger



Description Rich Reinhard 2008-11-21 14:35:49 UTC
After a manifest has been added and the service has been stopped and started
again, listing the service no longer shows the added manifest. The service's
web page does not show the manifest either. However, the manifest file is
still present in the AI_data directory.


-bash-3.2# installadm add -m /var/tmp/AI/manifests/criteria_i86pc.xml -n _install_service_46505

-bash-3.2# installadm list -n _install_service_46505
Manifest
--------
ai_manifest1.xml

-bash-3.2# installadm stop _install_service_46505
Stopping the service _install_service_46505._OSInstall._tcp.local

-bash-3.2# installadm start _install_service_46505

-bash-3.2# installadm start _install_service_46505

-bash-3.2# installadm start _install_service_46505
Registering the service _install_service_46505._OSInstall._tcp.local
-bash-3.2# installadm list -n _install_service_46505
Manifest
--------

-bash-3.2# ls -al /var/ai/46505/AI_data/
total 17
drwxr-xr-x   2 root     sys            4 Nov 20 11:01 .
drwxr-xr-x   4 root     root           7 Nov 21 14:20 ..
-rw-r--r--   1 root     root        1773 Nov 21 14:18 ai_manifest1.xml
-rwxr-xr-x   1 root     sys         2265 Nov 20 11:01 default.xml

-bash-3.2# wget golmaal:46505/manifests/ai_manifest1.xml
--14:21:27--  http://golmaal:46505/manifests/ai_manifest1.xml
           => `ai_manifest1.xml.1'
Resolving golmaal... 172.20.64.98
Connecting to golmaal|172.20.64.98|:46505... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1,773 (1.7K) [application/x-download]

100%[=======================================================================================================>]
1,773         --.--K/s             

14:21:27 (128.30 MB/s) - `ai_manifest1.xml.1' saved [1773/1773]

-bash-3.2#
Comment 1 Clay Baenziger 2008-11-22 01:20:13 UTC
This is due to the AI.db file getting replaced(?) with a blank template (i.e.
the manifests table is there, but it has no entries). All manifest criteria
are lost, not just the added one!
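
One way to confirm this state is sketched below. It assumes AI.db is an SQLite file (the thread does not say so) and that the table is named "manifests" as described above. count_manifest_rows is a hypothetical helper, not part of installadm.

```python
import sqlite3


def count_manifest_rows(db_path, table="manifests"):
    """Return the number of rows in `table`, or None if the table is missing.

    Assumption: db_path is an SQLite database file.
    """
    conn = sqlite3.connect(db_path)
    try:
        found = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
            (table,),
        ).fetchone()
        if found is None:
            return None
        # The name was matched against sqlite_master above, so it is a real table.
        return conn.execute('SELECT COUNT(*) FROM "%s"' % table).fetchone()[0]
    finally:
        conn.close()
```

A result of 0 with the table present would match the blank-template symptom; None would mean the table itself is gone.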
Comment 2 Clay Baenziger 2008-11-22 02:51:33 UTC
Created an attachment (id=1000)
d-trace AI.db syscall output

A listing of all creat, write, unlink, open, stat syscalls affecting *AI.db*
Comment 3 Clay Baenziger 2008-11-22 02:56:25 UTC
Created an attachment (id=1001)
All commands exec'd out of /usr/lib/installadm
Comment 4 Clay Baenziger 2008-11-22 03:02:21 UTC
Hrm, I can't tell what's causing this, but I can reproduce it reliably.
It happens somewhere after running installadm start on the service.

I've written two DTrace scripts: one to show all activity on
/var/ai/46501/AI.db (unlink*, creat*, write*, open*, stat*) and the other to
show all execs out of /usr/lib/installadm. Neither shows anything unexpected
(output of both is attached, captured while running the installadm command
block below). The execs of files under /usr/lib/installadm show the usual
suspects: publish-manifest, list-manifests, and webserver. However, running
those same commands directly out of /usr/lib/installadm, in the same order as
the command block below, does not seem to damage AI.db (even when killing the
webserver).

Example of running commands using installadm(1) resulting in DB damage:
pfexec installadm list -n test1;
pfexec installadm add -m /tmp/criteria_manifest_inc_smf.xml -n test1;
pfexec installadm stop test1;
/usr/lib/installadm/list-manifests /var/ai/46501;
sleep 60;
pfexec installadm start test1;
installadm list -n test1
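
To narrow down which step in a command block like this one replaces AI.db, a before/after snapshot of the file can be taken around each command. The following is only a sketch: snapshot and describe_change are hypothetical helper names, and comparing inode numbers is just one heuristic for telling "replaced by a new file" apart from "rewritten in place".

```python
import hashlib
import os


def snapshot(path):
    """Return (inode, size, sha1 hex digest) for path, or None if it is missing."""
    try:
        st = os.stat(path)
        with open(path, "rb") as f:
            digest = hashlib.sha1(f.read()).hexdigest()
    except FileNotFoundError:
        return None
    return (st.st_ino, st.st_size, digest)


def describe_change(before, after):
    """Classify what happened to a file between two snapshots."""
    if before == after:
        return "unchanged"
    if before is None:
        return "created"
    if after is None:
        return "removed"
    if before[0] != after[0]:
        return "replaced"  # different inode: a new file was put in place
    return "modified"      # same inode, different size or content
```

Calling snapshot("/var/ai/46501/AI.db") before and after each of the commands above and passing the pair to describe_change would point at the first step that touches the database.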

Example of running commands manually (w/o DB damage):
[363] clayb@transsiberian: pfexec /usr/bin/python2.4 /usr/lib/installadm/publish-manifest -c /tmp/criteria_manifest_inc_smf.xml /var/ai/46501/
Error:  Not copying manifest, source and current versions differ -- criteria in place.
[364] clayb@transsiberian: /usr/bin/python2.4 /usr/lib/installadm/list-manifests /var/ai/46501
Manifest
--------
manifestStopTest.xml
[365] clayb@transsiberian: pkill webserver
pkill: Failed to signal pid 23451: Not owner
[366] clayb@transsiberian: pfexec !!
pfexec pkill webserver
[367] clayb@transsiberian: /usr/bin/python2.4 /usr/lib/installadm/webserver -p 46501 /var/ai/46501
[22/Nov/2008:03:53:53] ENGINE Listening for SIGHUP.
[22/Nov/2008:03:53:53] ENGINE Listening for SIGTERM.
[22/Nov/2008:03:53:53] ENGINE Listening for SIGUSR1.
[22/Nov/2008:03:53:53] ENGINE Bus STARTING
[22/Nov/2008:03:53:53] ENGINE Started monitor thread '_TimeoutMonitor'.
[22/Nov/2008:03:53:53] ENGINE Started monitor thread 'Autoreloader'.
[22/Nov/2008:03:53:53] ENGINE Serving on 0.0.0.0:46501
[22/Nov/2008:03:53:53] ENGINE Bus STARTED
^Z
Suspended
[368] clayb@transsiberian: bg
[3]    /usr/bin/python2.4 /usr/lib/installadm/webserver -p 46501 /var/ai/46501 &
[369] clayb@transsiberian: /usr/bin/python2.4 /usr/lib/installadm/list-manifests /var/ai/46501
Manifest
--------
manifestStopTest.xml
Comment 5 Clay Baenziger 2008-11-22 03:23:35 UTC
Using a DTrace script with the io provider to look for anything writing to
/var/ai/46501, I see cpio(1) being called:
    DEVICE  FILE                          RW  EXEC
    sd4     /var/ai/46501/zka4yU          W   cpio -pdmu /var/ai/46501
    sd4     /var/ai/46501/zka4yU          W   cpio -pdmu /var/ai/46501
    sd4     /var/ai/46501/xka4yU          W   cpio -pdmu /var/ai/46501
    sd4     /var/ai/46501/AI_data/yka4yU  W   cpio -pdmu /var/ai/46501
    sd4     /var/ai/46501/webserver.log   W   fsflush

I also see that cpio(1) is used in /usr/lib/installadm/setup-[image,service],
and it looks like cpio is being called out of setup_docroot() in
/usr/lib/installadm/setup-service, overwriting /var/ai/<port>/AI.db. We should
not be going down the setup_docroot() path every time start_ai_webserver() is
called.
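
The general idea of not clobbering an existing database can be sketched as follows. This is not the actual fix (that is in the webrev and changeset referenced later in this bug, and it may simply avoid calling setup_docroot() on restart). populate_docroot, template_dir, and the preserve list are hypothetical names used only for illustration.

```python
import os
import shutil


def populate_docroot(template_dir, docroot, preserve=("AI.db",)):
    """Copy template files into docroot, leaving existing preserved files alone.

    Returns the list of file names that were skipped.
    """
    skipped = []
    os.makedirs(docroot, exist_ok=True)
    for name in sorted(os.listdir(template_dir)):
        src = os.path.join(template_dir, name)
        dst = os.path.join(docroot, name)
        if not os.path.isfile(src):
            continue  # this sketch only handles top-level regular files
        if name in preserve and os.path.exists(dst):
            skipped.append(name)  # keep the existing database and its criteria
            continue
        shutil.copy2(src, dst)
    return skipped
```

On first setup nothing is skipped and the blank template is installed; on a later start the existing AI.db is left in place.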
Comment 6 Clay Baenziger 2008-11-22 03:43:03 UTC
Because all services must be (re)started on reboot, the admin has to re-run
installadm add for every criteria manifest after each reboot, once the
services have been started.
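
Until a fix is in place, that workaround could be scripted roughly like this. The sketch only builds command lines in the "installadm add -m <manifest> -n <service>" form shown in the description and does not run anything. readd_commands is a hypothetical helper, and it assumes the admin keeps copies of the criteria manifests as *.xml files in one directory.

```python
import glob
import os


def readd_commands(manifest_dir, service_name):
    """Build one `installadm add` argument list per *.xml file in manifest_dir."""
    paths = sorted(glob.glob(os.path.join(manifest_dir, "*.xml")))
    return [["installadm", "add", "-m", p, "-n", service_name] for p in paths]
```

Each returned list could then be reviewed and run by hand, or passed to subprocess with whatever privileges the site requires.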
Comment 7 Clay Baenziger 2008-11-22 15:06:47 UTC
A fix is proposed (in webrev) at: http://cr.opensolaris.org/~clayb/5302/
Comment 8 Clay Baenziger 2008-12-16 09:34:20 UTC
Fixed in:

changeset:   385:6b53eda765fc
tag:         tip
user:        Clay Baenziger <ClayB@OpenSolaris.ORG>
date:        Mon Dec 15 20:40:18 2008 -0700
description:
        5302 start: should not delete manifest criteria