Bugzilla – Bug 1771
Installer can't be restarted if it already created ZFS root pool
Last modified: 2008-11-24 12:49:32 UTC
If a ZFS pool named "rpool" is already present (that is, has been created or imported during the current boot of the live CD, including from a previously failed installation attempt) then the install will fail immediately, the last message in the installation log being: Root pool rpool exists, we can't proceed with the installation. This was originally fixed in bug 770, but this portion of that fix was reverted in bug 1633 due to problems re-importing the temporary pool name. We need to come up with a permanent solution for this issue.
There are two parts to this problem that we would like to address in this bug.
[1] If the user imported "rpool" (for example from another disk which might contain a valid installation of OpenSolaris), we can't proceed with the installation, since the installer can't create the target "rpool". The solution might be that if the installer finds an "rpool" imported by the user, it could simply export it. The problem is that "export" currently makes the ZFS pool unbootable. This is being addressed by the following bug:
6733239 Searching for ZFS volumes during Nevada installer induces panic on reboot of Indiana
http://bugs.opensolaris.org/view_bug.do?bug_id=6733239
[2] When the installer fails for some reason after "rpool" is created, it can't be restarted without user intervention: the user has to destroy "rpool" before the installer can be restarted, since for now the installer can't distinguish between fully and partially installed pools. After discussion with the ZFS team, it turned out that in this case we might take advantage of ZFS user properties. However, they can only be set on a ZFS dataset, not on a pool. This is tracked by the following bug:
6739057 Allow setting user properties for ZSF pool
http://bugs.opensolaris.org/view_bug.do?bug_id=6739057
As a workaround, instead of setting ZFS properties on the pool, we might set them on the root dataset "rpool", for example:
[installer creates ZFS pool "rpool"]
# zfs set org.opensolaris.caiman:install=busy rpool
[installation done]
# zfs set org.opensolaris.caiman:install=ready rpool
Also, if it turns out that we don't need to identify partially installed pools in the installer, we might just export the pool as in [1].
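The workaround above could be wrapped in a small helper, sketched here under stated assumptions. The function name set_install_state and the ZFS_CMD override (useful for dry runs) are hypothetical and not part of the installer; only the property name and the "busy"/"ready" values come from this comment.

```shell
# Hypothetical helper, not the installer's actual code.
# ZFS_CMD is an assumed override so the command can be printed instead of run,
# e.g. ZFS_CMD="echo zfs".
ZFS_CMD=${ZFS_CMD:-zfs}
INSTALL_PROP="org.opensolaris.caiman:install"

set_install_state() {
    # $1 = pool (root dataset) name, $2 = state: "busy" or "ready"
    case "$2" in
        busy|ready) ;;
        *) echo "unknown install state: $2" >&2; return 1 ;;
    esac
    # User properties are set on the root dataset, which has the same name as the pool.
    $ZFS_CMD set "${INSTALL_PROP}=$2" "$1"
}
```

The installer would call `set_install_state rpool busy` right after creating the pool and `set_install_state rpool ready` once the installation has finished.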
After talking to the ZFS team, it turned out that 6733239 doesn't affect OpenSolaris instances which contain the fix for bug 2156 (build 91 and later). Based on this, we could address the problem of the installer refusing to install when "rpool" is present on the system using the following approach. The installer would check whether "rpool" exists on the system. If so, it would:
[1] Release the ZFS volume dedicated to swap
[2] Release the ZFS volume dedicated to dump. This has to be done using the dumpadm(1M) command in the following way (procedure provided by the ZFS team):
# dumpadm -d swap
[3] Export "rpool":
# zpool export -f rpool
[4] Continue with instantiating the target
However, this procedure will not work for OpenSolaris instances which don't have the fix for bug 2156 (builds earlier than 91). We will need to work out how to address this (perhaps an appropriate release note would cover this part).
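Steps [1] to [3] of the procedure above might be scripted roughly as follows. This is only a sketch: the function name release_rpool and the RUN dry-run hook are hypothetical, and the `swap -d` call with the /dev/zvol/dsk/<pool>/swap path is an assumption, since this comment does not say how the swap volume is released. The dumpadm and zpool commands are the ones quoted above.

```shell
# Hypothetical sketch of the release procedure; not the installer's actual code.
# RUN is an assumed hook: set RUN=echo to print the commands instead of running them.
RUN=${RUN:-}

release_rpool() {
    pool=${1:-rpool}
    # [1] Release the swap volume. ASSUMPTION: the command and the device path
    #     are not given in the bug report.
    $RUN swap -d "/dev/zvol/dsk/${pool}/swap" || return 1
    # [2] Release the dump volume (procedure provided by the ZFS team).
    $RUN dumpadm -d swap || return 1
    # [3] Export the pool so the installer can create a new one with the same name.
    $RUN zpool export -f "$pool" || return 1
}
```

Running it with `RUN=echo` first shows which commands would be issued without touching the system.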
Capturing additional information obtained from the ZFS team:
* etc/zfs/zpool.cache is not needed in boot_archive on build >=88, but needs to be there for build <88 (which applies to 2008.05, since it is based on build 86)
* In general, the ZFS team will handle this issue so that the "etc/zfs/zpool.cache is present in boot_archive" part of the problem is taken care of by ZFS itself and is opaque to ZFS consumers.
Based on this, the approach we are going to follow to address bug 1771 will also work for OpenSolaris instances which don't contain the fix for bug 2156. Thanks, Lin!
The appropriate ZFS bug is:
6748436 inconsistant zpool.cache in boot_archive could panic a zfs root filesystem upon boot-up
http://bugs.opensolaris.org/view_bug.do?bug_id=6748436
The fix for this bug will ensure that if an out-of-sync etc/zfs/zpool.cache is present in boot_archive, ZFS will pick up the correct configuration and the system will not panic. However, BEs already present on the system won't be cured by this fix, since it won't be propagated to existing Solaris instances. But the fix will mitigate the problem tracked by bug 3240, when filelist.ramdisk containing etc/zfs/zpool.cache is copied to the new BE during upgrade.
Because of this, the installer will have to complain if it finds an 'rpool' on the system that was already finalized by the installer, as it might contain a valid Solaris instance. ZFS user properties (see comment #1) will be used to determine whether 'rpool' was finalized by the installer or was created by an installer run which later failed or was interrupted. The following approach will be taken:
[1] If there is an 'rpool' with the 'org.opensolaris.caiman:install' property set to 'busy', 'rpool' will be released and the installer will proceed with the installation.
[2] Otherwise, the installer will refuse to install because 'rpool' is present on the system.
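The decision described in [1] and [2] could look roughly like the sketch below. The function name rpool_action, its "proceed"/"release"/"refuse" output, and the ZFS_CMD/ZPOOL_CMD overrides are hypothetical; the changeset referenced in this bug may implement the check differently.

```shell
# Hypothetical sketch of the restart check; not the installer's actual code.
# ZFS_CMD and ZPOOL_CMD are assumed overrides so the logic can be exercised with stubs.
ZFS_CMD=${ZFS_CMD:-zfs}
ZPOOL_CMD=${ZPOOL_CMD:-zpool}

# Prints one of: proceed (no such pool), release (leftover of a failed install),
# refuse (pool may hold a valid Solaris instance).
rpool_action() {
    pool=${1:-rpool}
    # No pool with this name: nothing blocks the installation.
    if ! $ZPOOL_CMD list -H -o name "$pool" >/dev/null 2>&1; then
        echo proceed
        return 0
    fi
    # Read the user property from the root dataset; an unset property is
    # expected to be reported as "-", which falls into the refuse branch.
    state=$($ZFS_CMD get -H -o value org.opensolaris.caiman:install "$pool" 2>/dev/null) || state=""
    if [ "$state" = "busy" ]; then
        echo release
    else
        echo refuse
    fi
}
```

Treating every value other than 'busy' as a reason to refuse keeps the check conservative: a pool imported by the user, which carries no installer property at all, is never released automatically.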
Changing Summary, since this bug will only address part of the problem related to the fact that the installer can't be restarted after failure or interruption. Bug 3783 will continue to track the problem with manually imported 'rpool' causing the installer to fail.
fixed in changeset: 29220561caa53abbc0ec94879279659c2a913542