Bugzilla – Bug 8313
update failure "Unable to clone the current boot environment" with zones
Last modified: 2009-08-20 18:26:30 UTC
I was unable to update the system because of a bug in image-update: it required me to uninstall my zones before updating, which is very bad.

# uname -a
SunOS marseille 5.11 snv_110 i86pc i386 i86pc
# zoneadm list -vc
  ID NAME      STATUS      PATH                   BRAND    IP
   0 global    running     /                      native   shared
   - linux     installed   /rpool/zones/linux     lx       shared
   - master    configured  /rpool/zones/master    ipkg     excl
   - host1     configured  /rpool/zones/host1     ipkg     excl
   - host2     installed   /rpool/zones/host2     ipkg     excl
   - vrouter   configured  /rpool/zones/vrouter   ipkg     excl
# pkg image-update
pkg: 5/6 catalogs successfully updated:
   1: Transfer from 'sunfreeware' timed out: timed out.
   2: Transfer from 'sunfreeware' timed out: timed out.
   3: Transfer from 'sunfreeware' timed out: timed out.
   4: Transfer from 'sunfreeware' timed out: timed out.
DOWNLOAD          PKGS        FILES       XFER (MB)
Completed      631/631  10445/10445   487.58/487.58
pkg: Unable to clone the current boot environment.
# pkg authority
PUBLISHER                      TYPE     STATUS   URI
opensolaris-dev (preferred)    origin   online   http://pkg.opensolaris.org/dev/
blastwave                      origin   online   http://blastwave.network.com:10000/
extra                          origin   online   https://pkg.sun.com/opensolaris/extra/
opensolaris.org                origin   online   http://pkg.opensolaris.org/release/
sunfreeware                    origin   online   http://pkg.sunfreeware.com:9000/
webstack                       origin   online   http://pkg.opensolaris.org/webstack/

I removed the zones (with zoneadm detach or zoneadm uninstall), after which the zone list looked like this:

# zoneadm list -vc
  ID NAME      STATUS      PATH                   BRAND    IP
   0 global    running     /                      native   shared
   - linux     installed   /rpool/zones/linux     lx       shared
   - master    configured  /rpool/zones/master    ipkg     excl
   - host1     configured  /rpool/zones/host1     ipkg     excl
   - host2     configured  /rpool/zones/host2     ipkg     excl
   - vrouter   configured  /rpool/zones/vrouter   ipkg     excl

After uninstalling the ipkg zones, pkg image-update runs smoothly:

# pkg image-update
DOWNLOAD          PKGS        FILES       XFER (MB)
Completed      631/631  10445/10445   487.58/487.58

PHASE                          ACTIONS
Removal Phase                3620/3620
Install Phase                4286/4286
Update Phase               17384/17384
PHASE                            ITEMS
Reading Existing Index             7/7
Indexing Packages              631/631
Optimizing Index...
PHASE                            ITEMS
Indexing Packages              677/677
A clone of opensolaris-3 exists and has been updated and activated.
On the next boot the Boot Environment opensolaris-4 will be mounted on '/'.
Reboot when ready to switch to this updated BE.
What was the zfs layout when this failure occurred? That is, what does 'zfs list' show?
I received the same error from the Package Manager GUI when trying to update from 2008.11 to 2009.06. Here are my zfs list output and zones configuration, captured after uninstalling the zones (uninstalling them allowed the update to complete):

bash-3.2$ zfs list
NAME                          USED   AVAIL   REFER   MOUNTPOINT
rpool                        15.3G    373M   49.5K   /rpool
rpool/ROOT                   14.0G    373M     18K   legacy
rpool/ROOT/opensolaris       30.4M    373M   6.03G   /
rpool/ROOT/opensolaris-11    11.4G    373M   9.07G   /
rpool/ROOT/opensolaris-12    2.51G    373M   9.52G   /tmp/tmpZPzSox
rpool/dump                    511M    373M    511M   -
rpool/export                  284M    373M     19K   /export
rpool/export/home             284M    373M     21K   /export/home
rpool/export/home/jlaurent    284M    373M    284M   /export/home/jlaurent
rpool/export/home/zones        71K    373M     18K   /export/home/zones
rpool/swap                    524M    373M    524M   -
testpool                      224K   63.3M     18K   /testpool
bash-3.2$ zoneadm list -cv
  ID NAME     STATUS       PATH                       BRAND    IP
   0 global   running      /                          native   shared
   - jim      configured   /export/home/zones/jim     ipkg     shared
   - jim2     configured   /export/home/zones/jim2    ipkg     shared
   - jim3     configured   /export/home/zones/jim3    ipkg     shared
As an additional comment, it would be nice if the error message were more complete in describing WHY it was unable to clone the boot environment. The error message as written doesn't give the user any hints as to what might be wrong in the system that prevents the clone process.

Is there not enough space?
Are file systems unmounted?
Are there too many BEs already?
Did the user type a wrong parameter?
Are there problems with the GRUB settings?
Is there a permissions problem?
(In reply to comment #3)
> As an additional comment, it would be nice if the error message were more
> complete in describing WHY it was unable to clone the boot environment. The
> error message as written doesn't provide the user with any hints as to what
> might be wrong in the system preventing the clone process.
>
> Is there not enough space?
> Are file systems unmounted?
> Are there too many BEs already?
> Did the user type a wrong parameter?
> Are there problems with the GRUB settings?
> Is there a permissions problem?

pkg(5) doesn't provide the error; libbe does. You might want to file a bug against libbe for that. However, setting BE_PRINT_ERR=true before you run the command might give you more information.
This is a bug against the GUI Update Manager. Therefore expecting the user to know what libbe or BE_PRINT_ERR=true are would be unreasonable.
So, what would the correct workaround be if I have zones running in production that I want to keep after the upgrade? "Uninstalling the zones" doesn't sound like a very pleasing solution in this regard. I'm thinking that something like the following should work if detaching the zones is enough; can anyone confirm?

zoneadm -z <zone> detach
pkg install SUNWipkg
pkg image-update
zoneadm -z <zone> attach -u
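The sequence proposed above can be sketched as a dry-run shell function that prints, but does not execute, the commands for every zone on the system. This is a minimal sketch under assumptions, not a confirmed procedure: nobody in this thread confirmed that detaching is sufficient. It assumes that 'zoneadm list -cp' prints colon-separated fields in the order zoneid:zonename:state:zonepath:uuid:brand:ip-type, that only ipkg zones with an installed zone root (installed, ready or running) need handling, and that zones which are up must be halted before they can be detached. The function name print_update_plan is hypothetical.

```shell
#!/bin/sh
# Dry-run sketch (hypothetical helper): print the halt/detach, update and
# attach -u commands for ipkg zones; nothing is executed.
# Assumed stdin: `zoneadm list -cp` output, one zone per line, fields
#   zoneid:zonename:state:zonepath:uuid:brand:ip-type
print_update_plan() {
    awk -F: '
        # Select ipkg zones that have an installed zone root;
        # configured-only zones have nothing to detach.
        $6 == "ipkg" && ($3 == "installed" || $3 == "ready" || $3 == "running") {
            # A zone that is up has to be halted before it can be detached.
            if ($3 != "installed") print "zoneadm -z " $2 " halt"
            print "zoneadm -z " $2 " detach"
            zones[++n] = $2
        }
        END {
            print "pkg install SUNWipkg"
            print "pkg image-update"
            # Re-attach the same zones, in the same order, with -u.
            for (i = 1; i <= n; i++) print "zoneadm -z " zones[i] " attach -u"
        }'
}

# Example (review the output before running anything):
#   zoneadm list -cp | print_update_plan
```

Doing everything in a single awk pass keeps the detach and attach lists identical, so no zone is detached without a matching attach line.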
(In reply to comment #5)
> This is a bug against the GUI Update Manager. Therefore expecting the user to
> know what libbe or BE_PRINT_ERR=true are would be unreasonable.

I don't expect the user to know that. However, libbe has to be fixed to provide more verbose messaging. pkg(5) can't provide you with information it doesn't have.
OK, I'm seeing this same issue. I did an image-update with BE_PRINT_ERR=true, and here's what I saw:
---8<---
root@mcescher$ BE_PRINT_ERR=true pkg image-update
DOWNLOAD PKGS FILES XFER ...
be_get_uuid: failed to get uuid property from BE root dataset user properties.
be_copy_zones: failed to snapshot zone BE (export/zones/jds/ROOT/zbe@zbe-1): dataset already exists
be_copy: failed to process zones
be_copy: destroying partially created boot environment
pkg: Unable to clone the current boot environment.
root@mcescher$ zoneadm list -cv
  ID NAME     STATUS      PATH                  BRAND    IP
   0 global   running     /                     native   shared
   - z1       installed   /export/zones/z1      ipkg     shared
   - ersa     installed   /export/zones/ersa    ipkg     shared
   - jds      installed   /export/zones/jds     ipkg     excl
root@mcescher$ zfs list -t all -r export/zones/z1
NAME                              USED   AVAIL   REFER   MOUNTPOINT
export/zones/z1                   156M   89.2G     22K   /export/zones/z1
export/zones/z1/ROOT              156M   89.2G     19K   legacy
export/zones/z1/ROOT/zbe          151M   89.2G    146M   legacy
export/zones/z1/ROOT/zbe@zbe-1   4.57M       -    146M   -
export/zones/z1/ROOT/zbe-1       4.68M   89.2G    146M   legacy
---8<---
I can reproduce this problem with "beadm create" (which clones the current BE), and in that case the error message is even worse: beadm tells me the name I've chosen is in use, even though it isn't.
---8<---
root@mcescher$ BE_PRINT_ERR=true beadm create opensolaris-200906
be_get_uuid: failed to get uuid property from BE root dataset user properties.
be_copy_zones: failed to snapshot zone BE (export/zones/jds/ROOT/zbe@zbe-1): dataset already exists
be_copy: failed to process zones
be_copy: destroying partially created boot environment
Unable to create opensolaris-200906.
BE opensolaris-200906 already exists. Please choose a different BE name.
---8<---
Unfortunately, if I detach zone z1, I just fail on the next zone:
---8<---
root@mcescher$ zoneadm -z z1 detach
root@mcescher$ BE_PRINT_ERR=true beadm create opensolaris-200906
be_get_uuid: failed to get uuid property from BE root dataset user properties.
be_copy_zones: failed to snapshot zone BE (export/zones/jds/ROOT/zbe@zbe-1): dataset already exists
be_copy: failed to process zones
be_copy: destroying partially created boot environment
Unable to create opensolaris-200906.
BE opensolaris-200906 already exists. Please choose a different BE name.
---8<---
If I detach all my zones, then the BE clone (beadm create) succeeds. It seems that this bug should focus on the actual clone failure rather than on beadm error reporting, so I've gone ahead and filed a separate bug (bug 9335) on the poor error reporting by beadm.
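The zbe@zbe-1 collision in these logs is attributed, later in this thread, to the zone BE auto-naming problem (bug 5711). The shell sketch below illustrates what a collision-free choice of the next zone BE name would have to check, given the names that already exist under a zone's ROOT dataset. It is an illustration under assumptions, not libbe's algorithm: the function name next_zbe_name is hypothetical, and the rule that both the snapshot zbe@zbe-N and the clone zbe-N must be unused is inferred from the zfs list output above.

```shell
#!/bin/sh
# Hypothetical sketch, NOT libbe's actual naming code: choose the first
# "zbe-N" name (N = 1, 2, ...) that is free both as a snapshot
# <root>/zbe@zbe-N and as a dataset <root>/zbe-N.
#   $1     zone ROOT dataset, e.g. export/zones/z1/ROOT
#   stdin  existing names, e.g. from: zfs list -H -o name -t all -r "$1"
next_zbe_name() {
    root=$1
    names=$(cat)
    n=1
    while :; do
        cand="zbe-$n"
        # -x: match whole lines only; -F: treat patterns as fixed strings
        if ! printf '%s\n' "$names" |
            grep -qxF -e "$root/zbe@$cand" -e "$root/$cand"; then
            echo "$cand"
            return 0
        fi
        n=$((n + 1))
    done
}

# Example:
#   zfs list -H -o name -t all -r export/zones/z1/ROOT |
#       next_zbe_name export/zones/z1/ROOT
```

For the export/zones/z1 listing above, where both zbe@zbe-1 and zbe-1 already exist, this sketch picks zbe-2 instead of retrying zbe-1.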
FWIW, the "failed to get uuid property" error is bug 7877, and turns out not to actually be an error.
I get slightly different error messages:

root@cyber:~# BE_PRINT_ERR=true pkg image-update
DOWNLOAD          PKGS        FILES       XFER (MB)
Completed      736/736  49733/49733   986.82/986.82
be_mount_root: failed to mount dataset rpool/ROOT/opensolaris-7 at /tmp/.be..2ayUb: directory is not empty
be_mount: failed to mount BE root file system
be_copy: failed to mount newly created BE
be_unmount: (opensolaris-7) not mounted
be_copy: destroying partially created boot environment
be_get_uuid: failed to get uuid property from BE root dataset user properties.
pkg: Unable to clone the current boot environment.
root@cyber:~# ls -al /tmp/.be..2ayUb
total 8
drwx------   2 root     root         117 2009-06-08 14:24 .
drwxrwxrwt   9 root     sys          868 2009-06-08 14:24 ..
root@cyber:~# zoneadm list -cv
  ID NAME     STATUS    PATH   BRAND    IP
   0 global   running   /      native   shared
root@cyber:~# uname -a
SunOS cyber 5.11 snv_101a i86pc i386 i86pc Solaris
pzeldin, for your case, can you attach the output of 'zfs list' and 'zfs mount' or 'beadm list -a'?

Ed, I think you're running into bug 5711 here again: the auto-naming algorithm for zone BEs has not been fixed. I've raised that bug to P2 so that it may get addressed sooner rather than later.
Here are the requested logs. (I removed the opensolaris-6 BE after the previous run.)

root@cyber:~# export BE_PRINT_ERR=true
root@cyber:~# zfs list
NAME                            USED   AVAIL   REFER   MOUNTPOINT
rpool                           207G   84.7G   62.5K   /rpool
rpool/ROOT                     18.2G   84.7G     19K   /rpool/ROOT
rpool/ROOT/opensolaris-3       76.3M   84.7G   3.23G   legacy
rpool/ROOT/opensolaris-3/opt       0   84.7G   3.60M   /opt
rpool/ROOT/opensolaris-4       37.0M   84.7G   5.20G   legacy
rpool/ROOT/opensolaris-4/opt       0   84.7G   3.60M   /opt
rpool/ROOT/opensolaris-5       18.1G   84.7G   12.5G   legacy
rpool/ROOT/opensolaris-5/opt   1.65G   84.7G   1.65G   /opt
rpool/export                    189G   84.7G   3.31G   /export
rpool/export/epiphan             23K   84.7G     23K   /export/epiphan
rpool/export/home              11.1G   84.7G   11.1G   /export/home
rpool/export/share              174G   84.7G    174G   /export/share
root@cyber:~# zfs mount
rpool/ROOT/opensolaris-5        /
rpool/ROOT/opensolaris-5/opt    /opt
rpool/export                    /export
rpool/export/epiphan            /export/epiphan
rpool/export/home               /export/home
rpool/export/share              /export/share
rpool                           /rpool
root@cyber:~# beadm list -a
BE/Dataset/Snapshot                                          Active Mountpoint Space   Policy Created
-------------------                                          ------ ---------- -----   ------ -------
opensolaris-3
   rpool/ROOT/opensolaris-3                                  -      -          76.31M  static 2008-09-30 14:29
   rpool/ROOT/opensolaris-3/opt                              -      -          0       static 2008-09-30 14:29
opensolaris-4
   rpool/ROOT/opensolaris-4                                  -      -          36.97M  static 2008-09-30 15:46
   rpool/ROOT/opensolaris-4/opt                              -      -          0       static 2008-09-30 15:46
opensolaris-5
   rpool/ROOT/opensolaris-5                                  NR     /          18.06G  static 2008-11-09 21:05
   rpool/ROOT/opensolaris-5/opt                              -      /opt       1.65G   static 2008-11-09 21:05
   rpool/ROOT/opensolaris-5/opt@install                      -      -          222.0K  static 2008-06-10 11:24
   rpool/ROOT/opensolaris-5/opt@static:-:2008-09-30-19:46:07 -      -          0       static 2008-09-30 15:46
   rpool/ROOT/opensolaris-5/opt@static:-:2008-11-10-02:05:41 -      -          0       static 2008-11-09 21:05
   rpool/ROOT/opensolaris-5@install                          -      -          1.63G   static 2008-06-10 11:24
   rpool/ROOT/opensolaris-5@static:-:2008-09-30-19:46:07     -      -          69.62M  static 2008-09-30 15:46
   rpool/ROOT/opensolaris-5@static:-:2008-11-10-02:05:41     -      -          615.70M static 2008-11-09 21:05
root@cyber:~# pkg image-update
DOWNLOAD          PKGS        FILES       XFER (MB)
Completed      736/736  49733/49733   986.82/986.82
be_mount_root: failed to mount dataset rpool/ROOT/opensolaris-6 at /tmp/.be.JSa48d: directory is not empty
be_mount: failed to mount BE root file system
be_copy: failed to mount newly created BE
be_unmount: (opensolaris-6) not mounted
be_copy: destroying partially created boot environment
be_get_uuid: failed to get uuid property from BE root dataset user properties.
pkg: Unable to clone the current boot environment.
What do you get from the beadm command "BE_PRINT_ERR=true beadm create newbe"? Also, is there any way to find out what is actually in the temporary mountpoint (e.g. /tmp/.be.JSa48d) at the point where the mount is failing?
pzeldin, it looks like you could be running into bug 4958. Do you have the sharenfs property set to 'on' for rpool, rpool/ROOT, or rpool/ROOT/opensolaris-5?
(In reply to comment #14)
> pzeldin, it looks like you could be running into bug 4958. Do you have
> the sharenfs property set to 'on' for rpool, rpool/ROOT, or
> rpool/ROOT/opensolaris-5?

No.

root@cyber:~# zfs get sharenfs rpool rpool/ROOT rpool/ROOT/opensolaris-5
NAME                       PROPERTY   VALUE   SOURCE
rpool                      sharenfs   off     default
rpool/ROOT                 sharenfs   off     default
rpool/ROOT/opensolaris-5   sharenfs   off     default
(In reply to comment #13)
> What do you get from the beadm command "BE_PRINT_ERR=true beadm create newbe"?

root@cyber:~# BE_PRINT_ERR=true beadm create newbe
be_mount_root: failed to mount dataset rpool/ROOT/newbe at /tmp/.be.0Eay1f: directory is not empty
be_mount: failed to mount BE root file system
be_copy: failed to mount newly created BE
be_unmount: (newbe) not mounted
be_copy: destroying partially created boot environment
be_get_uuid: failed to get uuid property from BE root dataset user properties.
Unable to create newbe.
Mount failed.

>
> Also is there any way to find out what is actually in the temporary mountpoint
> at the point where the mount is failing? (e.g. /tmp/.be.JSa48d)

Can you suggest how to do it?

root@cyber:~# ls -al /tmp/.be.0Eay1f
total 8
drwx------   2 root     root         117 2009-06-12 12:35 .
drwxrwxrwt  23 root     sys         1835 2009-06-12 12:35 ..
(In reply to comment #15)
> (In reply to comment #14)
> > pzeldin, it looks like you could be running into bug 4958. Do you have
> > the sharenfs property set to 'on' for rpool, rpool/ROOT, or
> > rpool/ROOT/opensolaris-5?
>
> No.
> root@cyber:~# zfs get sharenfs rpool rpool/ROOT rpool/ROOT/opensolaris-5
> NAME                       PROPERTY   VALUE   SOURCE
> rpool                      sharenfs   off     default
> rpool/ROOT                 sharenfs   off     default
> rpool/ROOT/opensolaris-5   sharenfs   off     default

Do any of the other datasets under rpool/ROOT have sharenfs on?
(In reply to comment #17)
> (In reply to comment #15)
> > (In reply to comment #14)
> > > pzeldin, it looks like you could be running into bug 4958. Do you have
> > > the sharenfs property set to 'on' for rpool, rpool/ROOT, or
> > > rpool/ROOT/opensolaris-5?
> >
> > No.
> > root@cyber:~# zfs get sharenfs rpool rpool/ROOT rpool/ROOT/opensolaris-5
> > NAME                       PROPERTY   VALUE   SOURCE
> > rpool                      sharenfs   off     default
> > rpool/ROOT                 sharenfs   off     default
> > rpool/ROOT/opensolaris-5   sharenfs   off     default
>
> Do any of the other datasets under rpool/ROOT have sharenfs on?

No.
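Checking every dataset under rpool/ROOT at once, rather than naming each one, can be done with 'zfs get -r -H -o name,value sharenfs rpool/ROOT', which prints one tab-separated name/value pair per dataset with no header. A small filter over that output then reports anything that is not off. This is a minimal sketch; the function name shared_datasets is hypothetical, and skipping a value of '-' is a defensive assumption in case non-applicable datasets (such as snapshots) appear in the listing.

```shell
#!/bin/sh
# Report datasets whose sharenfs value is anything other than "off"
# ("on" or a list of share options both count as shared).
# stdin: output of  zfs get -r -H -o name,value sharenfs rpool/ROOT
#        (-H: no header line, fields separated by a tab)
# A value of "-" (property not applicable) is skipped as an assumption.
shared_datasets() {
    awk -F'\t' '$2 != "off" && $2 != "-" { print $1 }'
}

# Example:
#   zfs get -r -H -o name,value sharenfs rpool/ROOT | shared_datasets
```

Empty output means no dataset under rpool/ROOT has NFS sharing enabled, which matches the answer given above.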
So far I can't see what the problem may be, nor have I been able to reproduce the issue. I'm not sure it will point us at anything, but can you provide the contents of vfstab?
(In reply to comment #19)
> So far I can't see what the problem may be nor have I been able to reproduce
> the issue.
>
> I'm not sure it will point us at anything but can you provide the contents of
> vfstab?

root@cyber:~# cat /etc/vfstab
#device                   device    mount               FS       fsck   mount     mount
#to mount                 to fsck   point               type     pass   at boot   options
#
/devices                  -         /devices            devfs    -      no        -
/proc                     -         /proc               proc     -      no        -
ctfs                      -         /system/contract    ctfs     -      no        -
objfs                     -         /system/object      objfs    -      no        -
sharefs                   -         /etc/dfs/sharetab   sharefs  -      no        -
fd                        -         /dev/fd             fd       -      no        -
swap                      -         /tmp                tmpfs    -      yes       -
rpool/ROOT/opensolaris-5  -         /                   zfs      -      no        -
/dev/dsk/c4d0s1           -         -                   swap     -      no        -
It appears that we have confused things a bit here by mixing separate issues.

The first issue, as described by Philip Torchinsky, was that he couldn't do an update with his zones installed. It appears that these zones were created using one of the development builds before the November release and therefore didn't have the proper datasets created, which is why the update failed in that case. The way these zones were created doesn't appear to be a supported method (see http://www.opensolaris.org/jive/thread.jspa?threadID=80209&tstart=0). Philip, can you confirm when these zones were created and which build you were running at the time?

The issues that Ed ran into were related to bug 8006 and bug 5711, where we are not naming the zones' datasets as we should, and the error messages for this case are simply incorrect.

The issue described by Pavel Zeldin does not involve zones at all, since there are none configured on his system. It also appears that what he is seeing is specific to his machine.

At this point, the original issue appears to have been caused by an unsupported zones configuration, and the other zones issues seen are duplicates of other known issues. Based on this, we should probably close this bug and open a new one so that the issue Pavel is seeing is easier to track. I've created bug 9510 to track the issue Pavel is seeing.
Closing this bug because we have not received a response from the reporter. The bug was marked incomplete with a timeout that has since expired.