Bug 8313 - update failure "Unable to clone the current boot environment" with zones
Status: CLOSED WONTFIX
Product: installer
Component: beadm
Version: unspecified
Platform: i86pc/amd64 OpenSolaris
Importance: P2 major
Target Milestone: ---
Assigned To: installer watcher
QA Contact: installer watcher
Whiteboard: timeout=6/25/2009

Reported: 2009-04-21 09:58 UTC by Philip Torchinsky
Modified: 2009-08-20 18:26 UTC

Description Philip Torchinsky 2009-04-21 09:58:29 UTC
I was unable to update the system because of a bug in pkg image-update. I had
to uninstall my zones before the update would run, which is a serious problem.

# uname -a
SunOS marseille 5.11 snv_110 i86pc i386 i86pc

# zoneadm list -vc
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              native   shared
   - linux            installed  /rpool/zones/linux             lx       shared
   - master           configured /rpool/zones/master            ipkg     excl  
   - host1            configured /rpool/zones/host1             ipkg     excl  
   - host2            installed  /rpool/zones/host2             ipkg     excl  
   - vrouter          configured /rpool/zones/vrouter           ipkg     excl 

# pkg image-update

pkg: 5/6 catalogs successfully updated:
    1: Transfer from 'sunfreeware' timed out: timed out.
    2: Transfer from 'sunfreeware' timed out: timed out.
    3: Transfer from 'sunfreeware' timed out: timed out.
    4: Transfer from 'sunfreeware' timed out: timed out.

DOWNLOAD                                    PKGS       FILES     XFER (MB)
Completed                                631/631 10445/10445 487.58/487.58 

pkg: Unable to clone the current boot environment.


# pkg authority
PUBLISHER                             TYPE     STATUS   URI
opensolaris-dev          (preferred)  origin   online   http://pkg.opensolaris.org/dev/
blastwave                             origin   online   http://blastwave.network.com:10000/
extra                                 origin   online   https://pkg.sun.com/opensolaris/extra/
opensolaris.org                       origin   online   http://pkg.opensolaris.org/release/
sunfreeware                           origin   online   http://pkg.sunfreeware.com:9000/
webstack                              origin   online   http://pkg.opensolaris.org/webstack/

I then removed the zones (with zoneadm detach or zoneadm uninstall), so the
picture became different:

# zoneadm list -vc
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              native   shared
   - linux            installed  /rpool/zones/linux             lx       shared
   - master           configured /rpool/zones/master            ipkg     excl  
   - host1            configured /rpool/zones/host1             ipkg     excl  
   - host2            configured /rpool/zones/host2             ipkg     excl  
   - vrouter          configured /rpool/zones/vrouter           ipkg     excl  



After uninstalling the ipkg zones, pkg image-update runs smoothly:


# pkg image-update
DOWNLOAD                                    PKGS       FILES     XFER (MB)
Completed                                631/631 10445/10445 487.58/487.58 

PHASE                                        ACTIONS
Removal Phase                              3620/3620 
Install Phase                              4286/4286 
Update Phase                             17384/17384 
PHASE                                          ITEMS
Reading Existing Index                           7/7 
Indexing Packages                            631/631 
Optimizing Index...
PHASE                                          ITEMS
Indexing Packages                            677/677 

A clone of opensolaris-3 exists and has been updated and activated.
On the next boot the Boot Environment opensolaris-4 will be mounted on '/'.
Reboot when ready to switch to this updated BE.
Comment 1 Jerry Jelinek 2009-04-22 14:01:35 UTC
What was the zfs layout when this failure occurred?  That
is, what does 'zfs list' show?
Comment 2 Jim Laurent 2009-05-31 17:04:13 UTC
I received the same error from the Package Manager GUI when trying to update
from 2008.11 to 2009.06.  Here is my ZFS list output and zones configuration
(after performing the uninstall).  Uninstalling the zones allowed the update to
complete.


bash-3.2$ zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
rpool                       15.3G   373M  49.5K  /rpool
rpool/ROOT                  14.0G   373M    18K  legacy
rpool/ROOT/opensolaris      30.4M   373M  6.03G  /
rpool/ROOT/opensolaris-11   11.4G   373M  9.07G  /
rpool/ROOT/opensolaris-12   2.51G   373M  9.52G  /tmp/tmpZPzSox
rpool/dump                   511M   373M   511M  -
rpool/export                 284M   373M    19K  /export
rpool/export/home            284M   373M    21K  /export/home
rpool/export/home/jlaurent   284M   373M   284M  /export/home/jlaurent
rpool/export/home/zones       71K   373M    18K  /export/home/zones
rpool/swap                   524M   373M   524M  -
testpool                     224K  63.3M    18K  /testpool

bash-3.2$ zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              native   shared
   - jim              configured /export/home/zones/jim         ipkg     shared
   - jim2             configured /export/home/zones/jim2        ipkg     shared
   - jim3             configured /export/home/zones/jim3        ipkg     shared
Comment 3 Jim Laurent 2009-05-31 17:22:38 UTC
As an additional comment, it would be nice if the error message was more
complete in describing WHY it was unable to clone the boot environment.  The
error message as written doesn't provide the user with any hints as to what
might be wrong in the system preventing the clone process.

Is there not enough space?
Are file systems unmounted?
Are there too many BEs already?
Did the user type a wrong parameter?
Are there problems with the GRUB settings?
Is there a permissions problem?
Comment 4 Shawn Walker 2009-05-31 19:41:19 UTC
(In reply to comment #3)
> As an additional comment, it would be nice if the error message was more
> complete in describing WHY it was unable to clone the boot environment.  The
> error message as written doesn't provide the user with any hints as to what
> might be wrong in the system preventing the clone process.
> 
> Is there not enough space?
> Are file systems unmounted?
> Are there too many BEs already?
> Did the user type a wrong parameter?
> Are there problems with the GRUB settings?
> Is there a permissions problem?

pkg(5) doesn't provide the error; libbe does.  You might want to file a bug
against libbe for that.

However, setting BE_PRINT_ERR=true before you run the command might give you
more information.
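
A minimal sketch of that suggestion, assuming a POSIX shell: the wrapper
below sets BE_PRINT_ERR=true for a single command only, so the variable does
not leak into the rest of the session. The name with_be_debug is hypothetical
and not part of pkg(5), beadm, or libbe.

```shell
# Hypothetical helper: run one command with libbe's verbose error
# output enabled for that invocation only.
with_be_debug() {
    BE_PRINT_ERR=true "$@"
}

# Usage on an OpenSolaris system (commands taken from this thread):
#   with_be_debug pkg image-update
#   with_be_debug beadm create newbe
```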
Comment 5 Jim Laurent 2009-06-01 03:32:40 UTC
This is a bug against the GUI Update Manager.  Therefore expecting the user to
know what libbe or BE_PRINT_ERR=true are would be unreasonable.
Comment 6 Niklas Edmundsson 2009-06-02 12:52:01 UTC
So, what would the correct workaround be if I have zones running in production
that I want to keep after the upgrade? "Uninstalling the zones" doesn't sound
like a very pleasing solution in this regard...

I'm thinking that something like this should work if detaching the zones is
enough; would anyone care to confirm?

zoneadm detach / pkg install SUNWipkg / pkg image-update / zoneadm attach -u ??
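
The proposed sequence can be written out as a dry run. This is only a sketch:
nobody in this thread confirms that the workaround is safe, and the function
name plan_zone_update is hypothetical. It prints the commands for the zones
given as arguments and executes nothing.

```shell
# Dry-run sketch of the sequence proposed above. It only prints the
# commands; review them before running anything by hand.
# plan_zone_update is a hypothetical name.
plan_zone_update() {
    # Detach every zone first so that BE cloning does not see it.
    for zone in "$@"; do
        echo "zoneadm -z $zone detach"
    done
    echo "pkg install SUNWipkg"
    echo "pkg image-update"
    # Re-attach each zone, asking zoneadm to update it on attach.
    for zone in "$@"; do
        echo "zoneadm -z $zone attach -u"
    done
}

# Usage: plan_zone_update z1 ersa jds
```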
Comment 7 Shawn Walker 2009-06-02 13:18:10 UTC
(In reply to comment #5)
> This is a bug against the GUI Update Manager.  Therefore expecting the user to
> know what libbe or BE_PRINT_ERR=true are would be unreasonable.

I don't expect the user to know that.

However, libbe has to be fixed to provide more verbose messaging.  pkg(5) can't
provide you with information it doesn't have.
Comment 8 edward.pilatowicz 2009-06-04 19:58:23 UTC
ok.  i'm seeing this same issue.  i did an image-update with BE_PRINT_ERR=true
and here's what i saw:

---8<---
root@mcescher$ BE_PRINT_ERR=true pkg image-update
DOWNLOAD                                    PKGS       FILES     XFER
...

be_get_uuid: failed to get uuid property from BE root dataset user properties.
be_copy_zones: failed to snapshot zone BE (export/zones/jds/ROOT/zbe@zbe-1):
dataset already exists
be_copy: failed to process zones
be_copy: destroying partially created boot environment
pkg: Unable to clone the current boot environment.

root@mcescher$ zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              native   shared
   - z1               installed  /export/zones/z1               ipkg     shared
   - ersa             installed  /export/zones/ersa             ipkg     shared
   - jds              installed  /export/zones/jds              ipkg     excl

root@mcescher$ zfs list -t all -r export/zones/z1
NAME                                      USED  AVAIL  REFER  MOUNTPOINT
export/zones/z1                           156M  89.2G    22K  /export/zones/z1
export/zones/z1/ROOT                      156M  89.2G    19K  legacy
export/zones/z1/ROOT/zbe                  151M  89.2G   146M  legacy
export/zones/z1/ROOT/zbe@zbe-1           4.57M      -   146M  -
export/zones/z1/ROOT/zbe-1               4.68M  89.2G   146M  legacy
---8<---


i can reproduce this problem by trying to do a "beadm clone", and in that case
the error message is even worse.  beadm tells me the name i've chosen is in
use, even though it isn't.
---8<---
root@mcescher$ BE_PRINT_ERR=true beadm create opensolaris-200906
be_get_uuid: failed to get uuid property from BE root dataset user properties.
be_copy_zones: failed to snapshot zone BE (export/zones/jds/ROOT/zbe@zbe-1):
dataset already exists
be_copy: failed to process zones
be_copy: destroying partially created boot environment
Unable to create opensolaris-200906.
BE opensolaris-200906 already exists. Please choose a different BE name.
---8<---


unfortunately, if i detach zone z1, then i just fail on the next zone:
---8<---
root@mcescher$ zoneadm -z z1 detach
root@mcescher$ BE_PRINT_ERR=true beadm create opensolaris-200906
be_get_uuid: failed to get uuid property from BE root dataset user properties.
be_copy_zones: failed to snapshot zone BE (export/zones/jds/ROOT/zbe@zbe-1):
dataset already exists
be_copy: failed to process zones
be_copy: destroying partially created boot environment
Unable to create opensolaris-200906.
BE opensolaris-200906 already exists. Please choose a different BE name.
---8<---


if i detach all my zones, then beadm clone succeeds.

it seems that this bug should focus on the actual beadm clone failure
rather than beadm error reporting.  hence, i've gone ahead and filed a
separate bug (bug 9335) on the poor error reporting by beadm.
Comment 9 Danek Duvall 2009-06-04 22:32:51 UTC
FWIW, the "failed to get uuid property" error is bug 7877, and turns out not to
actually be an error.
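
Since that message is noise, one option is to filter it out when reading
BE_PRINT_ERR output. A small sketch, assuming the message text stays exactly
as quoted in this bug; filter_be_noise is a hypothetical name.

```shell
# Drop the be_get_uuid diagnostic (bug 7877, reported above as harmless)
# so that the remaining lines show the real failure. Reads stdin.
# filter_be_noise is a hypothetical name.
filter_be_noise() {
    grep -v '^be_get_uuid: failed to get uuid property'
}

# Usage: BE_PRINT_ERR=true beadm create newbe 2>&1 | filter_be_noise
```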
Comment 10 Pavel Zeldin 2009-06-08 11:46:03 UTC
I get slightly different error messages:
BE_PRINT_ERR=true pkg image-update
DOWNLOAD                                    PKGS       FILES     XFER (MB)
Completed                                736/736 49733/49733 986.82/986.82 

be_mount_root: failed to mount dataset rpool/ROOT/opensolaris-7 at
/tmp/.be..2ayUb: directory is not empty
be_mount: failed to mount BE root file system
be_copy: failed to mount newly created BE
be_unmount: (opensolaris-7) not mounted
be_copy: destroying partially created boot environment
be_get_uuid: failed to get uuid property from BE root dataset user properties.
pkg: Unable to clone the current boot environment.
root@cyber:~# ls -al /tmp/.be..2ayUb
total 8
drwx------ 2 root root 117 2009-06-08 14:24 .
drwxrwxrwt 9 root sys  868 2009-06-08 14:24 ..
root@cyber:~# zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              native   shared
root@cyber:~#  uname -a
SunOS cyber 5.11 snv_101a i86pc i386 i86pc Solaris
Comment 11 Ethan Quach 2009-06-11 12:59:43 UTC
pzeldin, for your case, can you attach the output of 'zfs list' and 'zfs mount'
or 'beadm list -a'

ed, I think you're running into bug 5711 here again.  The auto-naming
algorithm for zone BEs has not been fixed.  I've raised that bug to P2
so that it may get addressed sooner rather than later.
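
To illustrate the kind of collision involved, here is a sketch of a name
picker that skips names already taken. It is not libbe's algorithm, and
next_free_zbe is a hypothetical name. It assumes the naming pattern visible
in comment 8, where cloning zone BE .../ROOT/zbe creates the snapshot
.../ROOT/zbe@zbe-1 and the dataset .../ROOT/zbe-1.

```shell
# next_free_zbe ACTIVE_ZBE
# Reads dataset and snapshot names, one per line, on stdin (for example
# from `zfs list -H -o name -t all -r <zonepath dataset>`) and prints the
# first zbe-N for which neither the clone dataset nor the snapshot on
# the active zone BE already exists.
# Illustration only; this is NOT how libbe picks names.
next_free_zbe() {
    active_zbe=$1                 # e.g. export/zones/z1/ROOT/zbe
    zbe_parent=${active_zbe%/*}   # e.g. export/zones/z1/ROOT
    existing=$(cat)
    n=1
    while printf '%s\n' "$existing" |
        grep -q -x -F -e "$zbe_parent/zbe-$n" -e "$active_zbe@zbe-$n"
    do
        n=$((n + 1))
    done
    echo "zbe-$n"
}
```

Fed the z1 listing from comment 8, where both zbe@zbe-1 and zbe-1 exist, this
prints zbe-2 rather than retrying the name that is already in use.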
Comment 12 Pavel Zeldin 2009-06-11 13:41:58 UTC
Here are the requested logs. (I removed opensolaris-6 BE since the previous
run).
root@cyber:~# export BE_PRINT_ERR=true
root@cyber:~# zfs list
NAME                           USED  AVAIL  REFER  MOUNTPOINT
rpool                          207G  84.7G  62.5K  /rpool
rpool/ROOT                    18.2G  84.7G    19K  /rpool/ROOT
rpool/ROOT/opensolaris-3      76.3M  84.7G  3.23G  legacy
rpool/ROOT/opensolaris-3/opt      0  84.7G  3.60M  /opt
rpool/ROOT/opensolaris-4      37.0M  84.7G  5.20G  legacy
rpool/ROOT/opensolaris-4/opt      0  84.7G  3.60M  /opt
rpool/ROOT/opensolaris-5      18.1G  84.7G  12.5G  legacy
rpool/ROOT/opensolaris-5/opt  1.65G  84.7G  1.65G  /opt
rpool/export                   189G  84.7G  3.31G  /export
rpool/export/epiphan            23K  84.7G    23K  /export/epiphan
rpool/export/home             11.1G  84.7G  11.1G  /export/home
rpool/export/share             174G  84.7G   174G  /export/share
root@cyber:~# zfs mount
rpool/ROOT/opensolaris-5        /
rpool/ROOT/opensolaris-5/opt    /opt
rpool/export                    /export
rpool/export/epiphan            /export/epiphan
rpool/export/home               /export/home
rpool/export/share              /export/share
rpool                           /rpool
root@cyber:~# beadm list -a
BE/Dataset/Snapshot                                          Active Mountpoint Space   Policy Created
-------------------                                          ------ ---------- -----   ------ -------
opensolaris-3
   rpool/ROOT/opensolaris-3                                  -      -          76.31M  static 2008-09-30 14:29
   rpool/ROOT/opensolaris-3/opt                              -      -          0       static 2008-09-30 14:29
opensolaris-4
   rpool/ROOT/opensolaris-4                                  -      -          36.97M  static 2008-09-30 15:46
   rpool/ROOT/opensolaris-4/opt                              -      -          0       static 2008-09-30 15:46
opensolaris-5
   rpool/ROOT/opensolaris-5                                  NR     /          18.06G  static 2008-11-09 21:05
   rpool/ROOT/opensolaris-5/opt                              -      /opt       1.65G   static 2008-11-09 21:05
   rpool/ROOT/opensolaris-5/opt@install                      -      -          222.0K  static 2008-06-10 11:24
   rpool/ROOT/opensolaris-5/opt@static:-:2008-09-30-19:46:07 -      -          0       static 2008-09-30 15:46
   rpool/ROOT/opensolaris-5/opt@static:-:2008-11-10-02:05:41 -      -          0       static 2008-11-09 21:05
   rpool/ROOT/opensolaris-5@install                          -      -          1.63G   static 2008-06-10 11:24
   rpool/ROOT/opensolaris-5@static:-:2008-09-30-19:46:07     -      -          69.62M  static 2008-09-30 15:46
   rpool/ROOT/opensolaris-5@static:-:2008-11-10-02:05:41     -      -          615.70M static 2008-11-09 21:05
root@cyber:~# pkg image-update
DOWNLOAD                                    PKGS       FILES     XFER (MB)
Completed                                736/736 49733/49733 986.82/986.82 

be_mount_root: failed to mount dataset rpool/ROOT/opensolaris-6 at
/tmp/.be.JSa48d: directory is not empty
be_mount: failed to mount BE root file system
be_copy: failed to mount newly created BE
be_unmount: (opensolaris-6) not mounted
be_copy: destroying partially created boot environment
be_get_uuid: failed to get uuid property from BE root dataset user properties.
pkg: Unable to clone the current boot environment.
Comment 13 evan.layton 2009-06-12 08:26:41 UTC
What do you get from the beadm command "BE_PRINT_ERR=true beadm create newbe"?

Also is there any way to find out what is actually in the temporary mountpoint
at the point where the mount is failing? (e.g. /tmp/.be.JSa48d)
Comment 14 Ethan Quach 2009-06-12 09:06:55 UTC
pzeldin, it looks like you could be running into bug 4958.  Do you have
the sharenfs property set to 'on' for rpool, rpool/ROOT, or
rpool/ROOT/opensolaris-5 ?
Comment 15 Pavel Zeldin 2009-06-12 09:37:02 UTC
(In reply to comment #14)
> pzeldin, it looks like you could be running into bug 4958.  Do you have
> the sharenfs property set to 'on' for rpool, rpool/ROOT, or
> rpool/ROOT/opensolaris-5 ?

No.
root@cyber:~# zfs get sharenfs rpool rpool/ROOT rpool/ROOT/opensolaris-5
NAME                      PROPERTY  VALUE                     SOURCE
rpool                     sharenfs  off                       default
rpool/ROOT                sharenfs  off                       default
rpool/ROOT/opensolaris-5  sharenfs  off                       default
Comment 16 Pavel Zeldin 2009-06-12 09:42:34 UTC
(In reply to comment #13)
> What do you get from the beadm command "BE_PRINT_ERR=true beadm create newbe"?

root@cyber:~# BE_PRINT_ERR=true beadm create newbe
be_mount_root: failed to mount dataset rpool/ROOT/newbe at /tmp/.be.0Eay1f:
directory is not empty
be_mount: failed to mount BE root file system
be_copy: failed to mount newly created BE
be_unmount: (newbe) not mounted
be_copy: destroying partially created boot environment
be_get_uuid: failed to get uuid property from BE root dataset user properties.
Unable to create newbe.
Mount failed.



> 
> Also is there any way to find out what is actually in the temporary mountpoint
> at the point where the mount is failing? (e.g. /tmp/.be.JSa48d)

Can you suggest how to do it?
root@cyber:~# ls -al  /tmp/.be.0Eay1f
total 8
drwx------  2 root root  117 2009-06-12 12:35 .
drwxrwxrwt 23 root sys  1835 2009-06-12 12:35 ..
Comment 17 evan.layton 2009-06-17 13:51:29 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > pzeldin, it looks like you could be running into bug 4958.  Do you have
> > the sharenfs property set to 'on' for rpool, rpool/ROOT, or
> > rpool/ROOT/opensolaris-5 ?
> 
> No.
> root@cyber:~# zfs get sharenfs rpool rpool/ROOT rpool/ROOT/opensolaris-5
> NAME                      PROPERTY  VALUE                     SOURCE
> rpool                     sharenfs  off                       default
> rpool/ROOT                sharenfs  off                       default
> rpool/ROOT/opensolaris-5  sharenfs  off                       default

Do any of the other datasets under rpool/ROOT have sharenfs on?
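
One way to answer that in a single pass is to list every dataset whose
sharenfs value is not "off". A sketch, assuming the tab-separated output of
`zfs get -H -o name,value`; shared_datasets is a hypothetical name.

```shell
# Reads lines of "name<TAB>value", the shape printed by
# `zfs get -H -o name,value sharenfs ...`, and prints every dataset
# whose sharenfs value is anything other than "off".
# shared_datasets is a hypothetical name.
shared_datasets() {
    awk -F '\t' '$2 != "off" { print $1 }'
}

# Usage: zfs get -H -o name,value -r sharenfs rpool | shared_datasets
```

Empty output means no dataset in the tree is shared over NFS.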
Comment 18 Pavel Zeldin 2009-06-17 14:02:23 UTC
(In reply to comment #17)
> (In reply to comment #15)
> > (In reply to comment #14)
> > > pzeldin, it looks like you could be running into bug 4958.  Do you have
> > > the sharenfs property set to 'on' for rpool, rpool/ROOT, or
> > > rpool/ROOT/opensolaris-5 ?
> > 
> > No.
> > root@cyber:~# zfs get sharenfs rpool rpool/ROOT rpool/ROOT/opensolaris-5
> > NAME                      PROPERTY  VALUE                     SOURCE
> > rpool                     sharenfs  off                       default
> > rpool/ROOT                sharenfs  off                       default
> > rpool/ROOT/opensolaris-5  sharenfs  off                       default
> 
> Do any of the other datasets under rpool/ROOT have sharenfs on?

no
Comment 19 evan.layton 2009-06-17 14:09:35 UTC
So far I can't see what the problem may be nor have I been able to reproduce
the issue. 

I'm not sure it will point us at anything but can you provide the contents of
vfstab?
Comment 20 Pavel Zeldin 2009-06-17 14:16:03 UTC
(In reply to comment #19)
> So far I can't see what the problem may be nor have I been able to reproduce
> the issue. 
> 
> I'm not sure it will point us at anything but can you provide the contents of
> vfstab?

root@cyber:~# cat /etc/vfstab 
#device         device          mount           FS      fsck    mount   mount
#to mount       to fsck         point           type    pass    at boot options
#
/devices        -       /devices        devfs   -       no      -
/proc   -       /proc   proc    -       no      -
ctfs    -       /system/contract        ctfs    -       no      -
objfs   -       /system/object  objfs   -       no      -
sharefs -       /etc/dfs/sharetab       sharefs -       no      -
fd      -       /dev/fd fd      -       no      -
swap    -       /tmp    tmpfs   -       yes     -
rpool/ROOT/opensolaris-5        -       /       zfs     -       no      -
/dev/dsk/c4d0s1 -       -       swap    -       no      -
Comment 21 evan.layton 2009-06-17 21:13:09 UTC
It appears that we have confused things a bit here with separate issues. The
first issue as described by Philip Torchinsky was that he couldn't do an update
with his zones installed. It appears that these zones were created using one of
the development builds before the November release and therefore didn't have
the proper datasets created which is why the update failed in that case. The
way these zones were created doesn't appear to be the supported method. (see
http://www.opensolaris.org/jive/thread.jspa?threadID=80209&tstart=0)

Philip, can you confirm when these zones were created and what build you were
running at the time?


The issues that Ed ran into were related to bug 8006 and bug 5711, where we are
not handling the naming of the zones datasets as we should be and the error
messages for this case are just plain incorrect.

As for the issue described by Pavel Zeldin it does not involve zones at all
since there are none configured on his system. Also it appears that what he is
seeing is specific to his machine.

At this point the original issue appears to have been caused by an unsupported
zones configuration and the other zones issues seen are duplicates of other
known issues. Based on this we should probably close this bug and open a new
one to track the issue Pavel is seeing more easily. I've created bug 9510 to
track the issue Pavel is seeing.
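
The triage above can be sketched as a small classifier over BE_PRINT_ERR
output. The message fragments and bug numbers are the ones quoted in this
thread; the function name classify_be_failure is hypothetical, and lines it
does not recognise are silently ignored.

```shell
# Reads BE_PRINT_ERR output on stdin and prints, for each recognised
# message, the bug number this thread associates with it.
# classify_be_failure is a hypothetical name.
classify_be_failure() {
    while IFS= read -r line; do
        case $line in
            be_get_uuid:*"failed to get uuid property"*)
                echo "bug 7877 (not actually an error)" ;;
            be_copy_zones:*"failed to snapshot zone BE"*)
                echo "bug 5711 / bug 8006 (zone BE naming)" ;;
            be_mount_root:*"failed to mount dataset"*)
                echo "bug 9510 (mount failure, no zones involved)" ;;
        esac
    done
}

# Usage: BE_PRINT_ERR=true beadm create newbe 2>&1 | classify_be_failure
```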
Comment 22 Moriah Waterland 2009-06-29 13:12:52 UTC
Closing this bug because we have not received a response from the reporter.
The bug was marked incomplete with a timeout that has expired.