Bug 1771 - Installer can't be restarted if it already created ZFS root pool
Status: RESOLVED FIXINBUILD
Product: opensolaris
install
: unspecified
: ANY/Generic OpenSolaris
: P1 major (vote)
: ---
Assigned To: Jan Damborsky
: rn3
: 2413
Reported: 2008-05-02 16:47 UTC by Dave Miner
Modified: 2008-11-24 12:49 UTC (History)

Description Dave Miner 2008-05-02 16:47:42 UTC
If a ZFS pool named "rpool" is already present (that is, has been created or
imported during the current boot of the live CD, including from a previously
failed installation attempt) then the install will fail immediately, the last
message in the installation log being:

Root pool rpool exists, we can't proceed with the installation.

This was originally fixed in bug 770, but this portion of that fix was reverted
in bug 1633 due to problems re-importing the temporary pool name.  We need to
come up with a permanent solution for this issue.
Comment 1 Jan Damborsky 2008-08-20 02:30:25 UTC
There are two parts to this problem that we would like to address with this bug.

[1] If the user imported "rpool" (for example from another disk, which might
    contain a valid installation of OpenSolaris), we can't proceed with the
    installation, since the installer can't create the target "rpool".

    The solution might be that if the installer finds an "rpool" imported by
    the user, it could just export it. The problem is that "export" currently
    makes the ZFS pool unbootable. This is being addressed by the following
    bug:

    6733239 Searching for ZFS volumes during Nevada installer induces panic on
            reboot of Indiana

    http://bugs.opensolaris.org/view_bug.do?bug_id=6733239

[2] When the installer fails for some reason after "rpool" is created, it can't
    be restarted without user intervention: the user has to destroy "rpool"
    before the installer can be restarted, since the installer currently can't
    distinguish between fully and partially installed pools.

    After discussion with the ZFS team, it turned out that in this case we
    might take advantage of ZFS user properties. However, they can only be set
    on a ZFS dataset, not on a pool. This is tracked by the following bug:

    6739057 Allow setting user properties for ZFS pool
    http://bugs.opensolaris.org/view_bug.do?bug_id=6739057

    As a workaround, instead of setting ZFS properties on the pool, we might
    set them on the root dataset "rpool", for example:

[installer creates ZFS pool "rpool"]
# zfs set org.opensolaris.caiman:install=busy rpool

[installation done]
# zfs set org.opensolaris.caiman:install=ready rpool

    Also, if it turns out that we don't need to identify partially installed
    pools in the installer, we might just export the pool as in [1].
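
The marker workflow above could be wrapped in two small shell helpers. This is
only a sketch under stated assumptions: the function names and the ZFS_CMD
override are hypothetical and exist just for illustration, while the property
name and the "busy"/"ready" values are the ones from the example above. It
relies on "zfs get -H -o value", which prints just the property value, and
prints "-" when a user property is not set.

```shell
# Property name taken from the example above.
INSTALL_PROP="org.opensolaris.caiman:install"
# Command used to talk to ZFS (hypothetical override, handy for dry runs).
ZFS_CMD="${ZFS_CMD:-zfs}"

# mark_install_state DATASET STATE
# Records the installer state on the pool's root dataset.
mark_install_state() {
    case "$2" in
        busy|ready)
            "$ZFS_CMD" set "$INSTALL_PROP=$2" "$1"
            ;;
        *)
            echo "unknown install state: $2" >&2
            return 2
            ;;
    esac
}

# read_install_state DATASET
# Prints the recorded state, or "-" if the property was never set.
read_install_state() {
    "$ZFS_CMD" get -H -o value "$INSTALL_PROP" "$1"
}
```

The installer would call "mark_install_state rpool busy" right after creating
the pool and "mark_install_state rpool ready" once the installation is done;
"read_install_state rpool" then tells a restarted installer which case it is
looking at.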
Comment 2 Jan Damborsky 2008-09-15 04:59:28 UTC
After talking to the ZFS team, it turned out that 6733239 doesn't affect
OpenSolaris instances which contain the fix for bug 2156 (build 91 and later).

Based on this, we could address the problem of the installer refusing to install
when "rpool" is present on the system using the following approach:

The installer would check whether "rpool" exists on the system. If it does, it
would:

[1] Release the ZFS volume dedicated to swap

[2] Release the ZFS volume dedicated to dump. This has to be done using the
    dumpadm(1M) command in the following way (procedure provided by ZFS team):
# dumpadm -d swap

[3] Export "rpool"
# zpool export -f rpool

[4] Continue with instantiating the target
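
Steps [1] to [3] above might be scripted roughly as follows. This is a sketch,
not the installer's actual code: the function names, the DRY_RUN switch and
the swap volume path /dev/zvol/dsk/rpool/swap are assumptions made for
illustration. The dumpadm and zpool invocations are the ones listed above;
"swap -d" is the usual way to remove a swap device.

```shell
# run CMD [ARGS...]
# Executes the command, or only prints it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# release_rpool [POOL] [SWAP_VOLUME]
# Releases swap and dump, then exports the pool. Stops at the first failure.
release_rpool() {
    pool="${1:-rpool}"
    # Assumed location of the swap volume inside the pool.
    swap_vol="${2:-/dev/zvol/dsk/$pool/swap}"

    # [1] Release the ZFS volume dedicated to swap.
    run swap -d "$swap_vol" || return 1
    # [2] Release the ZFS volume dedicated to dump.
    run dumpadm -d swap || return 1
    # [3] Export the pool.
    run zpool export -f "$pool"
}
```

Running "DRY_RUN=1 release_rpool rpool" prints the three commands in order
without touching the system, which is a convenient way to review them first.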

However, this procedure will not work for OpenSolaris instances which don't
have the fix for bug 2156 (builds earlier than 91). We will need to work out
how to address this (perhaps an appropriate release note could cover this part).
Comment 3 Jan Damborsky 2008-09-16 00:32:41 UTC
Capturing additional information obtained from ZFS team:

* etc/zfs/zpool.cache is not needed in boot_archive on build >=88, but needs
  to be there for build <88 (which applies to 2008.05, since it is based on 86)

* In general, the ZFS team will handle this issue in such a way that the
  "etc/zfs/zpool.cache is present in boot_archive" part of the problem will be
  taken care of by ZFS itself and will be opaque to ZFS consumers. Based on
  this, the approach we are going to follow to address bug 1771 will also work
  for OpenSolaris instances which don't contain the fix for bug 2156.
  Thanks, Lin!
Comment 4 Jan Damborsky 2008-09-25 01:54:00 UTC
The appropriate ZFS bug is:

6748436 inconsistent zpool.cache in boot_archive could panic a zfs root
        filesystem upon boot-up
http://bugs.opensolaris.org/view_bug.do?bug_id=6748436

The fix for this bug will ensure that if an out-of-sync etc/zfs/zpool.cache is
present in boot_archive, ZFS will pick up the correct configuration and the
system will not panic.

However, BEs already present on the system won't be cured by this fix,
since it won't be propagated to existing Solaris instances. But the fix will
mitigate the problem tracked by bug 3240, when filelist.ramdisk containing
etc/zfs/zpool.cache is copied to new BE during upgrade.

Because of this, the installer will have to complain if it finds an 'rpool' on
the system which was already finalized by the installer, as it might contain
a valid Solaris instance.

ZFS user properties (see comment #1) will be used to determine whether 'rpool'
was finalized by the installer or was created by an installer run which later
failed or was interrupted.

The following approach will be taken:
[1] If there is an 'rpool' with the 'org.opensolaris.caiman:install' property
    set to 'busy', 'rpool' will be released and the installer will proceed
    with the installation.

[2] Otherwise, the installer will refuse to install because 'rpool' is
    present on the system.
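
The decision described in [1] and [2] can be written down as a small shell
function. This is an illustrative sketch only: the function name, its
arguments and the "create"/"release"/"refuse" result words are hypothetical
and not taken from the installer. The pool-absent case is included for
completeness, since the installer then simply creates the pool.

```shell
# decide_rpool_action POOL_PRESENT INSTALL_STATE
#   POOL_PRESENT   "yes" if 'rpool' exists on the system, anything else if not
#   INSTALL_STATE  value of org.opensolaris.caiman:install ("-" when unset)
# Prints one of: create, release, refuse.
decide_rpool_action() {
    pool_present="$1"
    install_state="$2"

    if [ "$pool_present" != "yes" ]; then
        # No 'rpool' in the way: the installer can create it.
        echo "create"
    elif [ "$install_state" = "busy" ]; then
        # [1] Left over from a failed or interrupted installation.
        echo "release"
    else
        # [2] Finalized ("ready") or unmarked pool: may hold a valid instance.
        echo "refuse"
    fi
}
```

The two inputs could be gathered with "zpool list rpool" (which fails when the
pool does not exist) and "zfs get -H -o value org.opensolaris.caiman:install
rpool". Treating an unmarked pool the same as a finalized one keeps the
installer from releasing a pool it did not create itself.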
Comment 5 Jan Damborsky 2008-10-08 04:30:05 UTC
Changing Summary, since this bug will only address part of the problem related
to the fact that the installer can't be restarted after failure or
interruption.

Bug 3783 will continue to track the problem with manually imported 'rpool'
causing the installer to fail.
Comment 6 Jan Damborsky 2008-10-16 09:21:56 UTC
fixed in changeset:
29220561caa53abbc0ec94879279659c2a913542