Bugzilla – Bug 4755
ZFS boot does not work with removable media (usb flash memory)
Last modified: 2009-03-19 07:09:35 UTC
GRUB drops to a prompt at boot; findroot, bootfs, kernel$, and module$ do not find the boot media. The errors were 15 and 17, if I remember correctly.

Environment:
Asus RS-120 E5/P4, 2.83GHz quad cpu, 8G memory
4*1T samsung disks (AHCI mode) (raidz pool, configured but not imported)
16G Verbatim store'n'go USB stick (root and boot media)
(I so miss zfsroot, it was nice that only kernel and boot_archive were needed on the flash, root could be in the raidz pool...)

Steps to reproduce/what I've tried so far:

1) boot 2008.05 CD, install as "use whole disk".
-> System boots to GRUB menu and works as expected.

2) run pkg image-update as per snv86 -> >=snv93 manual update_grub instructions (snv101a rc1b in the repository?), reboot
-> System boots to GRUB prompt. Same steps worked with snv98 in VMWare Fusion some weeks ago.

3) boot from 2008.05 media, import the rpool, boot without exporting
-> System boots to GRUB prompt.

4) repeat for phase1 & phase2 files I had handy (snv98 from virtual machine, snv99 from sol-nv-b99-x86-dvd.iso): boot from 2008.05 media, install grub
-> System boots to GRUB prompt.

5) install 2008.05 phase1 and phase2 files from the live media
-> System boots to GRUB menu and allows booting the 2008.05 BE from flash, but trying to boot the updated BE kernel panics (maybe to be expected with such an old GRUB?)

6) wipe the root/boot usb stick, install from osol-0811-101a-rc1b.usb (another stick) (the root/boot USB is listed in the installer as not bootable as the live-usb is taking the bootable slot from bios, changed back after install)
-> System boots to GRUB prompt on first boot

I also tried update-archive at several points in-between the steps but there was no noticeable difference.

Speculation:
- http://opensolaris.org/jive/thread.jspa?threadID=81852&tstart=0 seems to mention a need for devid in the zfs metadata; it was not present in any of the installations above. zfs pool version was 10 for the 2008.05 based install, 13 for the 101a rc1b install. Maybe something to do with differences between "use full disk" and partitioned installs?
- snv88 added findroot support, maybe something broke around that time. There is apparently no working way (maybe entire@?) to image-update to a specific version to test it (nor am I itching to do it many times with the 384kbps 3G network the box is behind for base system install. The update takes forever.)
Most likely this one is a duplicate of bug 4772 (with the exception that the "zpool import -f rpool" workaround won't help in this case).

Do you have a bootable solaris box where you can run "cdrecord -scanbus" when that Verbatim usb flash memory stick (intended to be used as zfs boot/root) is attached? What kind of disk attributes are listed in the cdrecord scanbus output for the Verbatim usb flash memory stick? Is it "Disk" or is it "Removable Disk"?

Most likely you'll see "Removable Disk" with a usb flash memory stick, and the removable attribute has the bad side effect that the Solaris sd(7D) driver won't create a "devid" property for the flash memory stick. Since grub's zfs code requires the "devid" property, you can't boot from such a "removable media" usb flash memory stick any more.
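The chain of cause and effect described in this comment can be modelled in a few lines of Python. This is only an illustrative sketch, not the sd(7D) or GRUB source: all function names are made up for the example. The one hard fact it relies on is that the removable-media (RMB) flag is bit 7 of byte 1 of standard SCSI INQUIRY data.

```python
# Illustrative model only; the function names are hypothetical, not driver code.

RMB_MASK = 0x80  # bit 7 of byte 1 in standard SCSI INQUIRY data


def is_removable(inquiry: bytes) -> bool:
    """Return True if the INQUIRY data has the removable-media bit set."""
    if len(inquiry) < 2:
        raise ValueError("INQUIRY data too short")
    return bool(inquiry[1] & RMB_MASK)


def devid_created(inquiry: bytes) -> bool:
    """Model of the behaviour described above: no devid for removable media."""
    return not is_removable(inquiry)


def grub_can_boot(has_devid: bool) -> bool:
    """Model of the GRUB zfs check described above: a devid is required."""
    return has_devid


if __name__ == "__main__":
    usb_stick = bytes([0x00, 0x80]) + bytes(34)  # direct-access device, RMB set
    sata_disk = bytes([0x00, 0x00]) + bytes(34)  # direct-access device, RMB clear
    for name, inq in (("usb stick", usb_stick), ("sata disk", sata_disk)):
        print(name, "bootable:", grub_can_boot(devid_created(inq)))
```

The point of the sketch is that the boot failure is decided by a single bit reported by the device itself, long before any ZFS code runs.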
I booted with the 2008.05 CD that I had handy. The Verbatim flash does report itself as "Removable Disk" as Jürgen Keil expected in comment #1. If this really is the intended behaviour it raises some questions/changes the bug report.

Sorry for the rambling.

1) Why is the devid restricted to non-removable disk devices? The "Removable" device in this case, for example, will be located inside the server and take minutes to remove whereas the "non-removable" hotswap sata disks will take a whole second to remove. If it is considered a problem that people install opensolaris on removable drives, just make the installer tell them that they can't plug the drive to another port and expect it to work if live usb install is not (yet) supported?

So should the bug be "ZFS boot does not work with removable media in fixed slot"?

2) Why is the devid suddenly needed? ZFS did boot just fine before. I haven't had time to look at the zfs grub code since initial release, though... There must be some other reason than just 'not needing to look up the root device again in kernel after loading kernel and boot_archive with grub', but I didn't come across anything in my googling around the issue before posting the bug.

3) To me, 1 & 2 seem to make it even harder (~impossible) to make live flash opensolaris installs with ZFS boot. Intentional design choice? This isn't an issue to me, I assume there are other bug reports for it.

4) This is another step back in the initially flexible ZFS booting. The initial zfsroot hack allowed the server to be installed with a small flash device with grub, kernel(s), and boot_archive(s). The real root could be mounted from pools of raidz vdevs. Then came zfsboot, which began considering bootdisk==rootdisk and in the process dropped support for anything but ZFS mirrors. As there didn't seem to be interest in putting the zfsroot option back (or equivalent configuration option) the obvious solution was to get a bigger flash device and install the core OS on it, too. Then came post-2008.05 which can't even boot from removable media...

Maybe there should be a bug for "Boot and root device should not be considered the same".

(a rack server nicely fits four 3.5" disks sideways. Sata ports nowadays seem to come in multiples of 4. Raidz over 4 drives gives a nice 75% usable space for bulk data storage and protection against at least random bit rot. Four disk sets are also at a nice pricepoint to add or replace disks with bigger ones (thinking home media server/small office shared drive). This leaves the question where to boot from, leave 4 sata/slots for just one boot disk/mirror? raidz2 would be a reliable solution for 1U servers, but zfsboot can't boot from raidz at all...)

Two workarounds come to mind. One is to manually add the devid field, if I ever figure out how to construct and add it, and hope sd doesn't remove it on its own. The second is to find a "not removable" USB flash disk or a big IDE Flash Module that will work in the DVD PATA slot. I wonder how long it'll take before PATA booting is phased out...
I see the same; trying to install a (mirrored pair of) usb flash drives from within VirtualBox, which i want to be the new root pool for a storage server that's presently burning two sata ports for small useless boot disks.

(In reply to comment #2)
> 1) Why is the devid restricted to non-removable disk devices?

Indeed, it's precisely the removable disks (which may appear at different locations each time they're plugged in) that would most benefit from a persistent locator value.

> 3) To me, 1 & 2 seem to make it even harder (~impossible) to make a live flash
> opensolaris installs with ZFS boot. Intentional design choice? This isn't an
> issue to me, I assume there are other bug reports for it.

Recent releases are even providing a usb boot image (as an alternative to the cd) for install, but this means that the same usb stick I can boot an install image from can't also be used to boot a zfs root.

> 4) This is another step back in the initially flexible ZFS booting.

Agreed; the same storage server above used to have the two extra disks in the storage pool, together with root, but had to be rearranged when the new zfs root stuff made that no longer work, and I needed a separate rpool.
(In reply to comment #2)
> I booted with the 2008.05 CD that I had handy. The Verbatim flash does report
> itself as "Removable Disk" as Jürgen Keil expected in comment #1. If this
> really is the intended behaviour

Apparently Microsoft windows doesn't use "autorun" on USB removable media devices (unless they are CD-ROM drives); autorun only works on USB fixed media storage devices ...

http://www.microsoft.com/whdc/archive/usbfaq.mspx

Most USB flash memory stick vendors return "removable media" in the data returned by a SCSI INQUIRY; this prevents automatic start of programs when such a USB flash memory stick is attached to a windows box (XP SP2 seems to relax this a bit, and at least opens a menu and you can add a customized action to it).

> it raises some questions/changes the bug report.
>
> Sorry for the rambling.
>
> 1) Why is the devid restricted to non-removable disk devices?

This is implemented in the sd(7D) driver, in function sd_set_unit_attributes. If a disk device returns a cleared "removable media" bit in its scsi inquiry data, sd is allowed to create a unique "devid" for it (un->un_f_devid_supported == TRUE); when the "removable media" bit is set, no "devid" gets created:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/io/scsi/targets/sd.c#sd_set_unit_attibutes
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/io/scsi/targets/sd.c#29262

I suspect the reason no "devid" is created for removable media devices is that more than one medium could exist, and all of them would share the same "devid" (assuming the devid is constructed from inquiry data: vendor, product name, device serial number). Since in some cases the fabricated "devid" is stored somewhere in the sunos disk label on the media, this devid could suddenly appear on a different machine, in a different drive, ...
The root of the problem is that the usb flash memory stick vendors return "removable media", although the end user has no way to actually remove or change the media (the flash ram chips).

> The "Removable"
> device in this case, for example, will be located inside the server and take
> minutes to remove whereas the "non-removable" hotswap sata disks will take a
> whole second to remove.

Yes, I'd say a usb flash memory stick has non-removable media, but that's not what the typical usb flash memory stick returns.

> If it is considered a problem that people install
> opensolaris on removable drives, just make the installer tell them that they
> can't plug the drive to another port and expect it to work if live usb install
> is not (yet) supported ?
>
> So should the bug be "ZFS boot does not work with removable media in fixed
> slot"?

That would be a better summary for this bug, yes.

> 2) Why is the devid suddenly needed? ZFS did boot just fine before. I haven't
> had time to look at the zfs grub code since initial release, though...

I think it's the test at line 1087 that is responsible for the failed boot from zfs when there is no "devid":

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/grub/grub-0.97/stage2/fsys_zfs.c#vdev_get_bootpath

    1087  if (strcmp(type, VDEV_TYPE_DISK) == 0) {
    1088          if (vdev_validate(nv) != 0 ||
    1089              (nvlist_lookup_value(nv, ZPOOL_CONFIG_PHYS_PATH,
    1090              bootpath, DATA_TYPE_STRING, NULL) != 0) ||
    1091              (nvlist_lookup_value(nv, ZPOOL_CONFIG_DEVID,
    1092              devid, DATA_TYPE_STRING, NULL) != 0))
    1093                  return (ERR_NO_BOOTPATH);

I.e. we now need both a "physical device path" and a "devid" in the zpool configuration data on disk to be able to boot from zfs. In the past, the physical device path was enough.

I think this is a bug in grub's zfs code; the "devid" should be optional. It's nice to have the "devid" because it allows booting from zfs after you have moved the device to a different physical location.
On devices without "devid" you have to make sure that you don't change the boot device's physical device path. But boot should work using the physical device path found in the zpool's label, the way it used to work before the putbacks for

6704717 ZFS mirrored root doesn't live up to expectations
6710937 Boot failed information should be more friendly

(these putbacks did change grub to require the "devid")

> There
> must be some other reason than just 'not needing to look up the root device
> again in kernel after loading kernel and boot_archive with grub', but I didn't
> come across anything in my googling around the issue before posting the bug.
>
> 3) To me, 1 & 2 seem to make it even harder (~impossible) to make a live flash
> opensolaris installs with ZFS boot. Intentional design choice? This isn't an
> issue to me, I assume there are other bug reports for it.

There are other users that have found the same problem:

http://www.opensolaris.org/jive/thread.jspa?threadID=82172&tstart=30
http://www.opensolaris.org/jive/thread.jspa?messageID=271583

One bug report that is interesting:

6513775 zfs root disk portability

> 4) This is another step back in the initially flexible ZFS booting. The initial
> zfsroot hack allowed the server to be installed with a small flash device with
> grub, kernel(s), and boot_archive(s). The real root could be mounted from pools
> of raidz vdevs. Then came zfsboot, which began considering bootdisk==rootdisk
> and in the process dropped support for anything but ZFS mirrors. As there
> didn't seem to be interest in putting the zfsroot option back (or equivalent
> configuration option) the obvious solution was to get a bigger flash device and
> install the core OS on it, too. Then came post-2008.05 which can't even boot
> from removable media...
>
> Maybe there should be a bug for "Boot and root device should not be considered
> the same".
>
> (a rack server fits nicely four 3.5" disks sideways. Sata ports seem nowadays
> come in multiples of 4. Raidz over 4 drives gives a nice 75% usable space for
> bulk data storage and protection against atleast random bit rot. Four disk sets
> are also at a nice pricepoint to add or replace disks with bigger ones
> (thinking home media server/small office shared drive). This leaves the
> question where to boot from, leave 4 sata/slots for just one boot disk/mirror?
> raidz2 would be reliable solution for 1U servers, but zfsboot can't boot from
> raidz at all...)
>
>
> Two workarounds come to mind, one is to manually add the devid field if I ever
> figure out how to construct and add it. And hope sd doesn't remove it on its
> own.

I'm using such a hack for my SX:CE zfs bootable usb flash memory stick. I extended the "removable" feature in /kernel/drv/scsa2usb.conf a bit. In the original scsa2usb driver you can only set "removable=true" to override a cleared vendor supplied removable media bit in scsi inquiry data. I've extended this to allow "removable=false", so that my flash memory stick is treated as a (hotpluggable) fixed media device; the side effect is that sd(7D) will create a "devid" for it, and the usb stick becomes zfs bootable on whatever usb port on whatever system the stick is connected to. 8-)

> The second is to find "not removable" USB flash disk

I actually have a very old usb 1.x flash memory stick that did return "fixed media" with the scsi inquiry command. But most (all) current flash memory sticks return "removable media".

> or big IDE Flash
> Module that will work in the DVD PATA slot. I wonder how long it'll take before
> PATA booting is phased out...
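To make the difference between the current check and the "devid should be optional" suggestion from this comment concrete, here is a small Python model. It is a sketch under stated assumptions, not a translation of fsys_zfs.c: the vdev label is represented as a plain dict, and both function names are hypothetical.

```python
from typing import Optional, Tuple

ERR_NO_BOOTPATH = "ERR_NO_BOOTPATH"  # stand-in for GRUB's error code


def bootpath_strict(label: dict) -> Tuple[str, str]:
    """Behaviour of the quoted check: phys_path AND devid must both be present."""
    if "phys_path" not in label or "devid" not in label:
        raise LookupError(ERR_NO_BOOTPATH)
    return label["phys_path"], label["devid"]


def bootpath_devid_optional(label: dict) -> Tuple[str, Optional[str]]:
    """Suggested behaviour: phys_path is required, devid is passed on if present."""
    if "phys_path" not in label:
        raise LookupError(ERR_NO_BOOTPATH)
    return label["phys_path"], label.get("devid")


if __name__ == "__main__":
    # A label as written for a "removable media" stick: no devid entry.
    label = {"phys_path": "/pci@0,0/pci1043,81c0@b,1/storage@5/disk@0,0:a"}
    print(bootpath_devid_optional(label))
    try:
        bootpath_strict(label)
    except LookupError as err:
        print("strict check fails:", err)
```

With the strict variant a label without devid can never boot; with the optional variant it boots as long as the device stays at the recorded physical path.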
Created an attachment (id=844) [details] patch to add "removable=true" support to scsa2usb This is the patch for the scsa2usb module I'm currently using to force my "OCZ ATV" usb flash memory stick to be handled as a "fixed media" disk device.
Created an attachment (id=845) [details] compiled (x86) scsa2usb driver modules, with "removable=false" support

And the same as a compiled driver module (compiled for ~ build 103, but I expect it works on older systems, too).

Copy obj32/scsa2usb to kernel/drv/scsa2usb, and obj64/scsa2usb to kernel/drv/amd64/scsa2usb.

Add something like this to kernel/drv/scsa2usb.conf (vid and pid can be found in prtconf -Dv output, and are the 'usb-vendor-id' and 'usb-product-id' for the flash memory stick):

    attribute-override-list =
        # OCZ ATV flash memory stick reports removable media, but for zfs root
        # we need fixed media so that we get devids...
        "vid=0x324 pid=0xbc06 rev=* removable=false";

Note that you have to make these changes both on the root filesystem of the usb flash memory stick and on some other opensolaris or SX:CE system where you "zpool import -f rpool" the usb flash memory stick (so that the zpool label on the flash memory stick is rewritten with "devid").

    % iostat -En
    ...
    c9t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
    Vendor: OCZ Product: ATV Revision: 1100 Serial No:
    Size: 8.02GB <8019509248 bytes>
    Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
    Illegal Request: 70 Predictive Failure Analysis: 0

    % prtconf -Dv /dev/rdsk/c9t0d0p0
    disk, instance #10 (driver name: sd)
        Driver properties:
            ...
        Hardware properties:
            name='devid' type=string items=1
                value='id1,sd@f1870ec4c487120d00009c8e70006' <<<<<<<<<<<<<<<<<<<<
            name='inquiry-revision-id' type=string items=1
                value='1100'
            name='inquiry-product-id' type=string items=1
                value='ATV'
            name='inquiry-vendor-id' type=string items=1
                value='OCZ'
            name='inquiry-device-type' type=int items=1
                value=00000000
            name='usb' type=boolean
            name='compatible' type=string items=1
                value='sd'
            name='lun' type=int items=1
                value=00000000
            name='pm-capable' type=int items=1
                value=00000001
            name='hotpluggable' type=boolean
            name='target' type=int items=1
                value=00000000

    # zpool import -f rpool
    # zdb rpool
        version=10
        name='rpool'
        state=0
        txg=12500
        pool_guid=12983198538186227724
        hostid=740952579
        hostname='tiger2'
        vdev_tree
            type='root'
            id=0
            guid=12983198538186227724
            children[0]
                type='disk'
                id=0
                guid=6645906143762113720
                path='/dev/dsk/c9t0d0s0'
                devid='id1,sd@f1870ec4c487120d00009c8e70006/a' <<<<<<<<<<<<<
                phys_path='/pci@0,0/pci1043,81c0@b,1/storage@5/disk@0,0:a'
                whole_disk=0
                metaslab_array=15
                metaslab_shift=26
                ashift=9
                asize=8006402048
                is_log=0
                DTL=120
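When repeating this procedure it helps to confirm that the rewritten label really carries both properties GRUB looks for. The following Python sketch scans saved zdb output (as text) for them. It is an assumption-laden helper, not part of any Solaris tool: the function names are invented here, and it only makes sense for a single-disk pool like the one shown, because it does not track which child vdev a property belongs to.

```python
import re

REQUIRED = ("phys_path", "devid")      # the two properties discussed in this bug
_PROP = re.compile(r"(\w+)='([^']*)'")  # matches name='value' pairs in zdb output


def string_props(zdb_text: str) -> dict:
    """Collect name='value' string properties; later occurrences overwrite earlier ones."""
    return dict(_PROP.findall(zdb_text))


def missing_boot_props(zdb_text: str) -> list:
    """Return the required properties that do not appear in the zdb output."""
    props = string_props(zdb_text)
    return [name for name in REQUIRED if name not in props]


if __name__ == "__main__":
    sample = """
        type='disk'
        path='/dev/dsk/c9t0d0s0'
        devid='id1,sd@f1870ec4c487120d00009c8e70006/a'
        phys_path='/pci@0,0/pci1043,81c0@b,1/storage@5/disk@0,0:a'
    """
    print("missing:", missing_boot_props(sample))
```

An empty result means the label has both a physical path and a devid; a result containing "devid" matches the failing case reported in this bug.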
(In reply to comment #3)
> I see the same; trying to install a (mirrored pair of) usb flash drives
> from within VirtualBox, which i want to be the new root pool for a storage
> server that's presently burning two sata ports for small useless boot disks.

What is the host system you're using for running VirtualBox?

(I haven't done this, but) I think you have to use the virtualbox "usb passthrough" feature (which isn't available for Solaris hosts at this time) so that the opensolaris kernel actually sees a usb storage device on a usb controller.

Note that the format of the "devid" property depends on the Solaris driver used for the disk device. E.g. with scsa2usb / sd the format is something like this:

    id1,sd@f1870ec4c487120d00009c8e70006

nv_sata / sd uses

    id1,sd@ASAMSUNG_HD400LJ=S0H2J1WL712015

ata / cmdk uses

    id1,cmdk@AWDC_WD1000BB-32CCB0=WD-WMA9P1202169

That is, for zfs boot via "devid" you have to make sure that the device is always connected to the same type of disk controller; otherwise the devids change and the kernel isn't able to "find the disk device by devid".
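The three example strings share a visible shape: a version prefix, a driver hint, and an identifier, optionally followed by a "/minor" suffix as seen in the zpool label. The Python sketch below splits a devid string along those lines to show why the same disk behind a different driver no longer matches. The field names and the splitting rule are assumptions read off the examples in this thread, not a documented devid grammar.

```python
from typing import NamedTuple, Optional


class Devid(NamedTuple):
    version: str          # e.g. "id1"
    driver: str           # e.g. "sd" or "cmdk"
    ident: str            # driver-specific identifier
    minor: Optional[str]  # e.g. "a" when the string ends in "/a"


def parse_devid(text: str) -> Devid:
    """Split 'id1,sd@<ident>[/minor]' into its visible parts."""
    version, comma, rest = text.partition(",")
    driver, at, tail = rest.partition("@")
    if not comma or not at or not tail:
        raise ValueError(f"unexpected devid format: {text!r}")
    ident, slash, minor = tail.partition("/")
    return Devid(version, driver, ident, minor if slash else None)


def same_device(a: str, b: str) -> bool:
    """Compare two devid strings, ignoring the minor suffix."""
    return parse_devid(a)[:3] == parse_devid(b)[:3]


if __name__ == "__main__":
    print(parse_devid("id1,sd@f1870ec4c487120d00009c8e70006/a"))
    print(parse_devid("id1,cmdk@AWDC_WD1000BB-32CCB0=WD-WMA9P1202169"))
```

Because the driver hint and identifier encoding differ between scsa2usb/sd, nv_sata/sd and ata/cmdk, a plain comparison of the stored and the current devid fails as soon as the controller type changes.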
Not an installer bug, moving to more appropriate product/component.
(In reply to comment #7)
> What is the host system you're using for running VirtualBox?

Vista with usb passthrough; had tried also the same hardware booted from cd, with same result. I had earlier failed trying to install under Xen HVM from netbsd making images to dd to the usb stick - those tripped over the changed device path issues in an older release, before this problem.

> Note that the format of the "devid" property depends on the
> Solaris driver used for the disk device.

Understood, though not the issue in this specific example, this could be a problem elsewhere - booting a disk image via xen or native, for example.

Maybe devid is the wrong property, then, especially if it has established uses and implications for "real" removable media devices (the old MO drives come to mind). Perhaps what's needed here is a mediaid property that comes from the media label, whatever the hardware path to the image. I'm not clear why this is needed, but the answer may be in the other change references above, which I have yet to follow.

> That is, for zfs boot via "devid" you have to make sure that the
> device is always connected to the same type of disk controller,
> otherwise the devids are changing and the kernel isn't able to
> "find the disk device by devid".

As before, I think this perhaps should be "find the media by mediaid".
(In reply to comment #4)

Thanks for the detailed explanation; this has highlighted the difference between "removable device" and "removable media" and clarified that sd is doing the right thing according to what it finds.

> On devices without "devid" you have to make sure that you don't change
> boot device's physical device path. But boot should work using the physical
> device path found in the zpool's label; the way it used to work before the
> putbacks for
>
> 6704717 ZFS mirrored root doesn't live up to expectations
> 6710937 Boot failed information should be more friendly
>
> (these putbacks did change grub to require the "devid")

I think that's backwards, I think tying to a locator that belongs to the image/media is a better idea - it's just that devid doesn't quite have the right behaviour and wasn't really the locator the above changes should have used.

There is a counter-point, though - if it depends solely on the contents of the media, we can get in trouble when that media image is copied, replicated, or multi-pathed.

> > Two workarounds come to mind, one is to manually add the devid field if I ever
> > figure out how to construct and add it. And hope sd doesn't remove it on its
> > own.
>
> I'm using such a hack, for my SX:CE zfs bootable usb flash memory stick.

Thanks for posting this, I'll see if I can use this workaround in the meantime!
(In reply to comment #10)
> > I'm using such a hack, for my SX:CE zfs bootable usb flash memory stick.
>
> Thanks for posting this, I'll see if I can use this workaround in the meantime!

This turns out to be just that little more involved than first assumed; of course the system I need to do this on can't have its own "rpool" that it booted from, and I have none such left. I'll need to spin up a VM of SXCE with UFS root or something similar, and that will take more time.

In the meantime, I filed an RFE outlining an idea of how I think the wider problem could be addressed nicely. I didn't get a number back from submitting; I assume it's waiting in an anti-spam queue before getting created for real. The basic idea is to use the usb stick as bootable L2ARC, and force the data needed for booting to be in cache (what we previously used to put on a ufs boot fs). I'll add cross-references when I have a bug id.
(In reply to comment #11)
> This turns out to be just that little more involved than first assumed; of
> course the system I need to do this on can't have it's own "rpool" that it
> booted from, and I have none such left. I'll need to spin up a VM of SXCE with
> UFS root or something similar, and that will take more time.

I ran into some problems with the precompiled scsa2usb drivers: they do not work with snv101a (see bug 5082, which I accidentally filed as I didn't remember I had patched the systems).

I updated the devid with snv101a with zfsboot. I installed 101a in vmware, renamed the rootpool (it's surprisingly easy, see bug 5116), mounted the usb to the vm and was done.

I got the usb to boot to the grub menu on real hardware, but got a kernel panic later in the boot. I'll look into it later to see if it's related.
(In reply to comment #6)
> Note that you have to do these changes both on the
> root filesystem of the usb flash memory stick and
> on some other opensolaris or SX:CE system where you
> "zpool import -f rpool" the usb flash memory stick
> (so that the zpool label on the flash memory stick is
> rewritten with "devid").

I got the devid into the usb (see comment #12), and it loads kernel but panics after that:

---8<---
NOTICE:
***************************************************
* This device is not bootable! *
* It is either offlined or detached or faulted. *
* Please try to boot from a different device. *
***************************************************

NOTICE: spa_import_rootpool: error 19

Cannot mount root on /pci@0,0/pci15ad,790@11/pci15ad,770@3/storage@1/disk@0,0:a fstype zfs

panic[cpu0]/thread=fffffffffbc2a8a0: vfs_mountroot: cannot mount root

fffffffffbc4b560 genunix:vfs_mountroot+350 ()
fffffffffbc4b590 genunix:main+e9 ()
fffffffffbc4b5a0 unix:_locore_start+92 ()

panic: entering debugger (no dump device, continue to reboot)

Welcome to kmdb
Loaded modules: [ scsi_vhci uppc sd unix mpt zfs krtld genunix specfs pcplusmp cpu.generic ]
---8<---

Did you do some other magic besides the scsa2usb updates with your usb stick install? (maybe this is related to the version difference between kernel and the scsa2usb binaries)

I tried updating the zpool.cache in the usb rpool (and updating the boot_archive) with a version that had the pool and its devid mentioned (the one left there before didn't mention devid). The device path is most likely wrong, as the zpool.cache was copied from vmware with the usb stick mounted and has another pool, 'r', mentioned in it. Though, another bug (see bug #5116) I ran into makes me think zfsboot doesn't care much about the zpool.cache when it comes to root mounting.

...if only vmware would allow booting from a real usb device for reference...
(In reply to comment #11) > I'll add cross-references when I have a bug id. http://bugs.opensolaris.org/view_bug.do?bug_id=6771273
We're not going to hold the release for this; it would be great to get this working for next spring though.
(In reply to comment #13)
> I got the devid into the usb (see comment #12), and it loads kernel but panics
> after that:
> ---8<---
> NOTICE:
> ***************************************************
> * This device is not bootable! *
> * It is either offlined or detached or faulted. *
> * Please try to boot from a different device. *
> ***************************************************
> NOTICE: spa_import_rootpool: error 19
> Cannot mount root on /pci@0,0/pci15ad,790@11/pci15ad,770@3/storage@1/disk@0,0:a fstype zfs

Error 19 is ENODEV (No such device).

zfs boot uses ldi_open_by_devid(), and this function should return ENODEV when no device is found in the system matching a given "devid".

The new scsa2usb module is on the usb flash memory stick? Both 32- and 64-bit versions (kernel/drv/scsa2usb and kernel/drv/amd64/scsa2usb)?

And the kernel/drv/scsa2usb.conf file on the flash memory stick was updated with the removable=false setting for the usb flash memory stick?

And finally you also have to make sure the boot_archive on the usb flash memory stick is updated.
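For readers decoding numeric errors such as "error 19" from a panic message, Python's standard errno table offers a quick lookup. This is a convenience sketch: describe_errno is a name made up here, and the table reflects the machine running Python, which is not guaranteed to match the Solaris kernel's numbering in general (the value 19 for ENODEV is shared by the common Unix-like systems).

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and the local message text for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


if __name__ == "__main__":
    print(describe_errno(19))
```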
Some more b.o.o. bugs have been filed for this problem:

Bug ID: 6770866
Synopsis: GRUB/ZFS should require physical path or devid, but not both
http://bugs.opensolaris.org/view_bug.do?bug_id=6770866

Bug ID: 6769487
Synopsis: devid failures on SATA drives and removable drives make OpenSolaris fail to boot
http://bugs.opensolaris.org/view_bug.do?bug_id=6769487
(In reply to comment #16)
> > Cannot mount root on /pci@0,0/pci15ad,790@11/pci15ad,770@3/storage@1/disk@0,0:a fstype zfs
>
> zfs boot uses ldi_open_by_devid(), and this function
> should return ENODEV when no device is found in the system
> matching a given "devid".

I had verified all the points you mentioned. I did some more testing, and finally got the usb stick to boot. The problem was phys_path: the path above was the path in the VMWare instance. The real phys_path in the hardware it's running on is /pci@0,0/pci1043,819e@1d,7/storage@2/disk@0,0:a (the devid stayed the same). I got it working by installing 101a with the scsa2usb patch on a local disk (luckily I had an empty disk handy), renaming the rpool to r and importing the usb stick in its proper hardware slot.

I tried booting the usb stick in another usb slot after I got it working - it failed just as it did with the VMWare phys_path (the real path in the other slot was /pci@0,0/pci1043,819e@1d,7/storage@3/disk@0,0:a). Maybe this is something that has been fixed between 101a and ~103, otherwise your previous claim of "... and the usb stick becomes zfs bootable on whatever usb port on whatever system the stick is connected to." doesn't hold. Just an FYI. I'm happy with your workaround until it's fixed properly. Thank you very much for the patch.

It seems that the boot requires both the devid to be present and the correct phys_path. Worst of both worlds.=) (maybe this could explain some of the mirror boot problems if the phys_path is mirrored/not mirrored, too?)

I hope http://bugs.opensolaris.org/view_bug.do?bug_id=6770866 addresses the basic boot issues, snv104 isn't too far away. (then again, it won't be in time for 2008.11 release? 6 months of no "release" support...)
(In reply to comment #18)
> (In reply to comment #16)
> > > Cannot mount root on /pci@0,0/pci15ad,790@11/pci15ad,770@3/storage@1/disk@0,0:a fstype zfs
> >
> > zfs boot uses ldi_open_by_devid(), and this function
> > should return ENODEV when no device is found in the system
> > matching a given "devid".
>
> I had verified all the points you mentioned. I did some more testing, and got
> the usb stick finally to boot. The problem was phys_path, the path above was
> the path in the VMWare instance.

Correct. The PCI vendor-id "15ad" is a hint that this path was constructed for the vmware hardware.

> The real phys_path in the hardware it's
> running on is /pci@0,0/pci1043,819e@1d,7/storage@2/disk@0,0:a
> (the devid stayed the same).

That (vendor-id 1043) seems to be a physical path for an ASUS mainboard.

That we have devids and that they stayed the same is the expected behavior.

> I got it working by installing 101a with the scsa2usb patch on a
> local disk (luckily I had an empty disk handy), renaming the rpool to r and
> importing the usb stick in its proper hardware slot.

Hmm, I think this import did update the "phys_path" and "path" in the zpool label. Seems as if it is booting using the physical path, and the devid isn't used?

> I tried booting the usb stick in another usb slot after I got it working - it
> failed like with the VMWare-phys_path. (the real path in the another slot was
> /pci@0,0/pci1043,819e@1d,7/storage@3/disk@0,0:a).

Hmm, strange that it still knows about the vmware physical path. Is this physical path still present in the usb stick's /boot/solaris/bootenv.rc file as "bootpath" property, perhaps? On my zfs bootable usb flash memory stick I have no more bootpath property in /boot/solaris/bootenv.rc.

When you run "zdb -l /dev/rdsk/c?t0d0s?" using the disk device for your usb zpool, dumping the zpool's label: does that still report the vmware physical device path?
> Maybe this is something that
> has been fixed between 101a and ~103, otherwise your previous claim of "... and
> the usb stick becomes
> zfs bootable on whatever usb port on whatever system the stick is connected
> to." doesn't hold. Just a FYI.

Hmm, I don't remember that this part of the code did change recently.

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/vdev_disk.c#vdev_disk_open

Btw. did it always panic with the error 19, ENODEV?

I'm not sure when it was changed, but the zfs code did check that the hostid stored in the zpool label matches the one of the system that is booting. That was bad in case you imported the zfs root on some other system, because it changed the hostid in the zpool's label.

I think this test was relaxed in build 100?
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6716241

> I'm happy with your workaround until it's fixed
> properly. Thank you very much for the patch.
>
> It seems that the boot requires both the devid present and the correct
> phys_path. Worst of both worlds.=) (maybe this could explain some of the mirror
> boot problems if the phys_path is mirrored/not mirrored, too?)
>
> I hope http://bugs.opensolaris.org/view_bug.do?bug_id=6770866 addresses the
> basic boot issues, snv104 isn't too far away. (then again, it won't be in time
> for 2008.11 release? 6 months of no "release" support...)
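The vendor-id hint used in this comment (15ad pointing at vmware, 1043 at an ASUS mainboard) can be pulled out of a physical device path mechanically. The Python sketch below does that with a regular expression. The helper names and the two-entry vendor table are assumptions for illustration, and the pattern only recognises nodes of the form pciVVVV,DDDD@... as they appear in the paths quoted in this thread.

```python
import re

# Only the two vendor IDs mentioned in this thread; extend as needed.
KNOWN_VENDORS = {0x15AD: "VMware", 0x1043: "ASUSTeK"}

# Matches path nodes such as "/pci15ad,790@11"; plain "/pci@0,0" does not match.
_NODE = re.compile(r"/pci([0-9a-fA-F]+),([0-9a-fA-F]+)@")


def pci_vendor_ids(phys_path: str) -> list:
    """Return the vendor IDs embedded in pciVVVV,DDDD nodes, in path order."""
    return [int(vendor, 16) for vendor, _device in _NODE.findall(phys_path)]


def vendor_names(phys_path: str) -> list:
    """Map the embedded vendor IDs to names, de-duplicated, unknown IDs as hex."""
    names = []
    for vid in pci_vendor_ids(phys_path):
        name = KNOWN_VENDORS.get(vid, f"0x{vid:04x}")
        if name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    for path in (
        "/pci@0,0/pci15ad,790@11/pci15ad,770@3/storage@1/disk@0,0:a",
        "/pci@0,0/pci1043,819e@1d,7/storage@2/disk@0,0:a",
    ):
        print(vendor_names(path), path)
```

A label whose phys_path names a different vendor than the machine being booted is a quick sign that the pool was last imported on other (or virtual) hardware.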
(In reply to comment #19)
> Hmm, I think this import did update the "phys_path" and "path" in the
> zpool label. Seems as if it is booting using the physical path, and the
> devid isn't used?

It seems to me that the devid is used by grub to load the menu.lst. Then, when the root is being mounted, the phys_path is used and not the devid.

> > I tried booting the usb stick in another usb slot after I got it working - it
> > failed like with the VMWare-phys_path. (the real path in the another slot was
> > /pci@0,0/pci1043,819e@1d,7/storage@3/disk@0,0:a).
>
> Hmm, strange that it still knows about the vmware physical path. Is this
> physical path still present in the usb stick's /boot/solaris/bootenv.rc file
> as "bootpath" property, perhaps? On my zfs bootable usb flash memory stick
> I have no more bootpath property in /boot/solaris/bootenv.rc.

Sorry, maybe I was a little bit unclear. I didn't mean it fails because the vmware phys_path is still remembered after booting successfully from the usb stick. I mean it fails because the ".../storage@2/..." phys_path written to the usb stick is not valid for booting on the same hardware from another usb slot. The physical path listed for that other slot is ".../storage@3/..." (which, of course, was never written to the usb stick label). Moving the stick back from the storage@3 slot to the storage@2 slot allowed the kernel to mount the root again.

> > Maybe this is something that
> > has been fixed between 101a and ~103, otherwise your previous claim of "... and
> > the usb stick becomes
> > zfs bootable on whatever usb port on whatever system the stick is connected
> > to." doesn't hold. Just a FYI.
>
> Hmm, I don't remember that this part of the code did change recently.
>
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/vdev_disk.c#vdev_disk_open

Looking at the code, it would have to be ldi_open_by_devid failing when looking up the root. Didn't grub use a separate implementation of devid lookup?

> Btw. did it always panic with the error 19, ENODEV?

Yes, the panic was always error 19. The path reported was the phys_path stored in the label (and that phys_path wasn't the same as the path to the port into which the usb stick was inserted).

> I'm not sure when it was changed, but the zfs code did check that the
> hostid stored in the zpool label matches the one of the system that is
> booting. That was bad in case you imported the zfs root on some other
> system, because it changed the hostid in the zpool's label.
>
> I think this test was relaxed in build 100?
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6716241

I'm quite sure we can rule out both "hostid" and "disk" affecting the root mounting - I remounted the usb stick after booting from the disk and both were changed to reflect the disk-os instance (the controller number in which the usb stick is located is different on disk boots and usb boots. Also the hostid is not the same). After that I rebooted the usb stick without any problems and the values in the label changed to reflect the usb stick's os instance. (expected behaviour)
(In reply to comment #20)
> Sorry, maybe I was a little bit unclear. I didn't mean it fails because the
> vmware phys_path is still remembered after booting successfully from the usb
> stick. I mean it fails because the ".../storage@2/..." phys_path updated to the
> usb stick is not valid for booting in the same hardware but in another usb
> slot. The physical path listed for that other slot is ".../storage@3/..."
> (which, of course, was never updated to the usb stick label). Moving the stick
> back from the storage@3 slot to the storage@2 slot allowed the kernel to mount
> the root again.

Ahh, ok, now I understand.

When you've booted the OS from the usb flash memory stick, what is the kernel
command line passed to the loaded kernel? You should find it in the kernel
string variable "saved_cmdline". Like this:

# mdb -k
Loading modules: [ unix genunix specfs dtrace cpu.generic uppc pcplusmp scsi_vhci zfs sd fctl ip hook neti sctp arp usba uhci stmf md cpc random crypto nfs fcip nca logindmux ptm nsctl ufs sppp ipc ]
> saved_cmdline/S
saved_cmdline:
saved_cmdline:  /platform/i86pc/kernel//unix -B zfs-bootfs=rpool/53,bootpath=\"/pci@0,0/pci-ide@4,1/ide@0/cmdk@0,0:a\",diskdevid=\"id1,cmdk@AIBM-DTLA-307030=_________YKEYKV12622/a\"

Note that the string after -B, starting with "zfs-bootfs...", is the expanded
version of $ZFS-BOOTFS, from a menu.lst kernel$ grub line like this:

kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS

bootpath and diskdevid are what grub has found in the zpool's label; the -B
option is used to pass that information to the solaris kernel so that it can
be used when mounting the root pool.
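To make the structure of that -B string easier to inspect, here is a minimal Python sketch that splits a saved_cmdline value into its properties. It is only an illustration for reading mdb output on another machine; the function name parse_boot_properties is made up for this example and is not part of grub or Solaris.

```python
import re

def parse_boot_properties(cmdline):
    """Split the comma-separated name=value list that follows -B."""
    # mdb prints embedded quotes as \" so drop the backslashes first.
    cleaned = cmdline.replace('\\"', '"')
    match = re.search(r'-B\s+(\S+)', cleaned)
    if match is None:
        return {}
    props = {}
    # Quoted values may themselves contain commas (e.g. "id1,cmdk@.../a"),
    # so a plain split(",") would cut them apart.
    pattern = r'([\w-]+)=(?:"([^"]*)"|([^,]*))'
    for name, quoted, bare in re.findall(pattern, match.group(1)):
        props[name] = quoted if quoted else bare
    return props

# The command line quoted in the comment above.
SAVED_CMDLINE = (
    r'/platform/i86pc/kernel//unix -B zfs-bootfs=rpool/53,'
    r'bootpath=\"/pci@0,0/pci-ide@4,1/ide@0/cmdk@0,0:a\",'
    r'diskdevid=\"id1,cmdk@AIBM-DTLA-307030=_________YKEYKV12622/a\"'
)

for name, value in parse_boot_properties(SAVED_CMDLINE).items():
    print(name, "=", value)
```

The three names it prints (zfs-bootfs, bootpath, diskdevid) are the ones discussed above; anything else on the command line, such as a trailing -k, is ignored.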
(In reply to comment #21)
> # mdb -k
> > saved_cmdline/S
> saved_cmdline:
> saved_cmdline:  /platform/i86pc/kernel//unix -B
> zfs-bootfs=rpool/53,bootpath=\"/pci@0,0/pci-ide@4,1/ide@0/cmdk@0,0:a\",diskdevid=\"id1,cmdk@AIBM-DTLA-307030=_________YKEYKV12622/a\"

And with the usb flash memory stick, last booted on an ASUS M2N-SLI deluxe
mainboard but now attached to and booted on an ASUS N4L-VM:

# mdb -k
...
> saved_cmdline/S
saved_cmdline:
saved_cmdline:  /platform/i86pc/kernel//unix -B zfs-bootfs=rpool/56,bootpath=\"/pci@0,0/pci1043,8239@2,1/storage@a/disk@0,0:a\",diskdevid=\"id1,sd@f1870ec4c487120d00009c8e70006/a\" -k

# zdb -l /dev/rdsk/c14t0d0s0
--------------------------------------------
LABEL 0
--------------------------------------------
    version=10
    name='rpool'
    state=0
    txg=13690
    pool_guid=12983198538186227724
    hostid=504967997
    hostname='opensolaris'
    top_guid=6645906143762113720
    guid=6645906143762113720
    vdev_tree
        type='disk'
        id=0
        guid=6645906143762113720
        path='/dev/dsk/c14t0d0s0'
        devid='id1,sd@f1870ec4c487120d00009c8e70006/a'
        phys_path='/pci@0,0/pci1043,8219@1d,7/storage@7/disk@0,0:a'
        whole_disk=0
        metaslab_array=15
        metaslab_shift=26
        ashift=9
        asize=8006402048
        is_log=0
        DTL=120
...
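For comparing what the kernel was booted with against what the label currently holds, a small Python sketch along these lines may help. It assumes the name=value layout shown in the zdb -l output above; parse_label_fields and compare_label_to_boot are hypothetical helper names, not existing tools.

```python
import re

def parse_label_fields(zdb_text):
    """Collect name=value pairs from `zdb -l` style text (last occurrence wins)."""
    fields = {}
    for name, quoted, bare in re.findall(r"(\w+)=(?:'([^']*)'|(\S+))", zdb_text):
        fields[name] = quoted if quoted else bare
    return fields

def compare_label_to_boot(label, boot_props):
    """Report whether the label agrees with the properties passed via -B."""
    return {
        "devid_matches": label.get("devid") == boot_props.get("diskdevid"),
        "phys_path_matches": label.get("phys_path") == boot_props.get("bootpath"),
    }

# Values copied from the output in this comment.
LABEL_TEXT = """
version=10 name='rpool' hostid=504967997
path='/dev/dsk/c14t0d0s0'
devid='id1,sd@f1870ec4c487120d00009c8e70006/a'
phys_path='/pci@0,0/pci1043,8219@1d,7/storage@7/disk@0,0:a'
"""
BOOT_PROPS = {
    "bootpath": "/pci@0,0/pci1043,8239@2,1/storage@a/disk@0,0:a",
    "diskdevid": "id1,sd@f1870ec4c487120d00009c8e70006/a",
}

print(compare_label_to_boot(parse_label_fields(LABEL_TEXT), BOOT_PROPS))
```

With the values from this comment, the devid agrees while the bootpath and the label's phys_path differ, which would be consistent with the stick having been moved between machines and the root still being found by devid.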
(In reply to comment #21)
> When you've booted the OS from the usb flash memory stick, what is the kernel
> command line passed to the loaded kernel? You should find it in the kernel
> string variable "saved_cmdline". Like this:

I tried to get the boot to fail, with no luck. That is, now it's working as
advertised. I tried all the USB slots available, and the saved_cmdline was
always what was to be expected: the previous boot's phys_path and the correct
devid. And no panic.

I went through everything I had done after the last comment to this bug (I got
the system booting, so I began to set it up for moving to a server room).
Everything I did was plain application software package installs, except for
removing swap from rpool and fixing a shutdown-related bug with devfsadm (see
bug 5203). I tried re-enabling the swap, but it didn't cause the panic. Maybe
devfsadm fixed something related to the boot? (is /dev or /devices even in
boot_archive? bootadm manpage doesn't seem to list anything in FILES)

I may have the time to go through the install process yet again with another
usb stick during the weekend to verify.
(In reply to comment #18)
> I tried booting the usb stick in another usb slot after I got it working - it
> failed like with the VMWare-phys_path. (the real path in the another slot was
> /pci@0,0/pci1043,819e@1d,7/storage@3/disk@0,0:a). Maybe this is something that
> has been fixed between 101a and ~103, otherwise your previous claim of "... and
> the usb stick becomes
> zfs bootable on whatever usb port on whatever system the stick is connected
> to." doesn't hold. Just a FYI.

Yep, "zfs booting on whatever system the stick is connected to" is apparently
wrong:

I did a fresh install of OS 2008.11 rc1 to a second usb flash memory stick
(+ a grub devid fix, + the scsa2usb workaround), and was able to reproduce
the same "error 19" panics when trying to boot the stick for the first time
on a new machine.

The problem is that the ehci usb2.0 host controller driver doesn't get loaded
at mountroot time, so the kernel has not yet enumerated the devices available
on usb. Since it does not yet know what usb storage devices are available, it
can't open them by "devid".

On my first SX:CE usb flash memory stick this appears to work, but only
because I had already booted that stick on all of my test systems at least
once [*]. The operating system has built a big cache of possible "devid" ->
"physical path" mappings in /etc/devices/devid_cache on that stick. The
attempt to mount the zfs root by "devid" tries all physical paths found in
that /etc/devices/devid_cache file, and there will be at least one entry in
that cache for each of my test systems that forces an attach of the ehci
device driver.

With the second usb stick and a fresh install, the /etc/devices/devid_cache
is more or less empty, so it won't help getting ehci attached when it tries
to boot with that stick on a new machine.
[*] before zfs mount root via "devid" was added to zfs, I used special grub boot menu entries for each of my systems, using a command line option -B bootpath="...physical/device/path/for/usb/storage/for/this/machine...".
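The devid_cache behaviour described in this comment can be pictured with a toy model. The Python sketch below is not kernel code and does not read the real /etc/devices/devid_cache format; the dictionary layout and the function open_root_by_devid are assumptions made purely to illustrate why a well-populated cache succeeds and an empty one ends in error 19.

```python
ENODEV = 19  # the error number reported in the panics

def open_root_by_devid(devid, devid_cache, plugged_in_at):
    """Toy model: return the physical path on success, or ENODEV."""
    for candidate in devid_cache.get(devid, []):
        # In the description above, probing a cached path is what forces the
        # usb host controller driver to attach. Here it is only a comparison.
        if candidate == plugged_in_at:
            return candidate
    return ENODEV

DEVID = "id1,sd@f1870ec4c487120d00009c8e70006/a"
SLOT_A = "/pci@0,0/pci1043,8239@2,1/storage@a/disk@0,0:a"
SLOT_B = "/pci@0,0/pci1043,819e@1d,7/storage@3/disk@0,0:a"

# A stick that has been booted in both places has both paths cached.
well_travelled_cache = {DEVID: [SLOT_A, SLOT_B]}
# A fresh install has (more or less) nothing cached.
fresh_cache = {}

print(open_root_by_devid(DEVID, well_travelled_cache, SLOT_B))
print(open_root_by_devid(DEVID, fresh_cache, SLOT_B))
```

The first call returns the slot path because the cache already lists it; the second returns 19 because there is no cached path to probe, which mirrors the fresh-install case.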
just adding
symptoms for this bug from installer log (so google can find this bug ;) ):
e.g.
<ICT Nov 24 20:36:25> update GRUB boot menu on device /dev/rdsk/c5t0d0s0
<ICT Nov 24 20:36:38> bootadm_update_menu output: bootadm: biosdev command failed for disk: /dev/dsk/c5t0d0s0.
<ICT Nov 24 20:36:38> bootadm_update_menu output: bootadm: is_bootdisk(): cannot determine BIOS disk ID 'hd?' for disk: /dev/dsk/c5t0d0s0
<ICT Nov 24 20:36:38> bootadm_update_menu output: bootadm: get_grubroot(): cannot get (hd?,?,?) for menu. menu not on bootdisk: /dev/rdsk/c5t0d0s0
<ICT Nov 24 20:36:38> current task:remove_bootpath

it kinda indicates that you install on disk which is not having a device id,
then grub will fail finding the root pool because of
http://bugs.opensolaris.org/view_bug.do?bug_id=6770866 once you will try to
boot the usb disk

Is there an always working way to get the device id of usb sticks ?
Do usb sticks have the device id (I mean, you can plug them in after BIOS
initializes ;) ) ?
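For anyone who wants to check their own installer log for the same messages, a rough Python sketch follows. The three search strings are taken from the log lines above; the function name find_bootadm_symptoms is invented for this example, and whether these messages really point to a missing devid is the commenter's reading, not something the script can establish.

```python
import re

# Substrings taken from the bootadm messages quoted in this comment.
BOOTADM_SYMPTOMS = (
    "biosdev command failed for disk",
    "cannot determine BIOS disk ID",
    "cannot get (hd?,?,?) for menu",
)

def find_bootadm_symptoms(log_text):
    """Return (device, symptom) pairs for log lines showing the bootadm failures."""
    hits = []
    for line in log_text.splitlines():
        for symptom in BOOTADM_SYMPTOMS:
            if symptom in line:
                device = re.search(r"/dev/r?dsk/\w+", line)
                hits.append((device.group(0) if device else None, symptom))
    return hits

SAMPLE_LOG = """\
<ICT Nov 24 20:36:25> update GRUB boot menu on device /dev/rdsk/c5t0d0s0
<ICT Nov 24 20:36:38> bootadm_update_menu output: bootadm: biosdev command failed for disk: /dev/dsk/c5t0d0s0.
<ICT Nov 24 20:36:38> bootadm_update_menu output: bootadm: is_bootdisk(): cannot determine BIOS disk ID 'hd?' for disk: /dev/dsk/c5t0d0s0
<ICT Nov 24 20:36:38> bootadm_update_menu output: bootadm: get_grubroot(): cannot get (hd?,?,?) for menu. menu not on bootdisk: /dev/rdsk/c5t0d0s0
<ICT Nov 24 20:36:38> current task:remove_bootpath
"""

for device, symptom in find_bootadm_symptoms(SAMPLE_LOG):
    print(device, "->", symptom)
```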
(In reply to comment #25)
> just adding
> symptoms for this bug from installer log (so google can find this bug ;) ):
> e.g.
> <ICT Nov 24 20:36:25> update GRUB boot menu on device /dev/rdsk/c5t0d0s0
> <ICT Nov 24 20:36:38> bootadm_update_menu output: bootadm: biosdev command
> failed for disk: /dev/dsk/c5t0d0s0.
> <ICT Nov 24 20:36:38> bootadm_update_menu output: bootadm: is_bootdisk():
> cannot determine BIOS disk ID 'hd?' for disk: /dev/dsk/c5t0d0s0
> <ICT Nov 24 20:36:38> bootadm_update_menu output: bootadm: get_grubroot():
> cannot get (hd?,?,?) for menu. menu not on bootdisk: /dev/rdsk/c5t0d0s0
> <ICT Nov 24 20:36:38> current task:remove_bootpath
>
> it kinda indicates that you install on disk which is not having a device id,

I think the "BIOS disk ID" is different from the "devid".

The BIOS disk ID should be the ID you have to pass to a BIOS call to tell the
BIOS from which disk you're trying to read / write. I think it was used in the
past, when menu.lst contained entries like "root (hd0,0,a)"; the "hd0" should
be derived from the bios disk id. Current opensolaris grub is set up to use
the "findroot (pool_rpool,0,a)" command.

> then grub will fail finding the root pool because of
> http://bugs.opensolaris.org/view_bug.do?bug_id=6770866 once you will try to
> boot the usb disk
>
> Is there an always working way to get the device id of usb sticks ?

It seems that in almost all cases, usb flash memory sticks don't get a "devid"
(that is, Solaris' sd(7D) driver doesn't generate it), because they report
that they have "removable media" and sd(7D) only generates the "devid" for
fixed media devices.

> Do usb sticks have the device id (I mean, you can plug them in after BIOS
> initializes ;) ) ?

In the cases I know, the BIOS (and GRUB) can't boot from a usb storage device
if it isn't plugged in during BIOS POST.
Created an attachment (id=1028)
patch for grub to allow booting from a zpool without devid, see CR 6770866
Created an attachment (id=1029)
grub that allows booting from a zpool without devid, see CR 6770866

This is a grub that I modified to make the zpool "devid" optional
for a grub boot from zfs.

To use it, run the opensolaris gui installer and install to a usb flash
memory stick. At the end of the installation, before clicking on the
"reboot" button:

- download this grub_zfs_devid.tar.gz attachment, unpack in /tmp

- run zpool status to find out the unix device name that was used
  for the "rpool" zpool

  E.g.
    pool: rpool
   state: ONLINE
   scrub: none requested
  config:

          NAME        STATE     READ WRITE CKSUM
          rpool       ONLINE       0     0     0
            c4t0d0s0  ONLINE       0     0     0

- install the new grub with installgrub

  installgrub /tmp/stage1 /tmp/stage2 /dev/rdsk/c4t0d0s0

- now click on the "reboot" button
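The device lookup and installgrub steps above can be sketched in Python as follows. This only builds the command line and prints it; it does not run installgrub. The helper names rpool_devices and installgrub_command are made up for this example, and the regular expression assumes device names of the cXtYdZsN form shown in the sample output.

```python
import re

def rpool_devices(zpool_status_text):
    """Return leaf device names (cXtYdZsN style) that `zpool status` lists as ONLINE."""
    pattern = r"^\s*(c\d+(?:t\d+)?d\d+s\d+)\s+ONLINE"
    return re.findall(pattern, zpool_status_text, re.MULTILINE)

def installgrub_command(device, stage_dir="/tmp"):
    """Build the installgrub argument list for one raw device."""
    return ["installgrub", f"{stage_dir}/stage1", f"{stage_dir}/stage2",
            f"/dev/rdsk/{device}"]

# Sample output copied from the steps above.
ZPOOL_STATUS = """\
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c4t0d0s0  ONLINE       0     0     0
"""

for device in rpool_devices(ZPOOL_STATUS):
    # Print rather than execute, so the command can be reviewed first.
    print(" ".join(installgrub_command(device)))
```

For the sample output this prints the same installgrub line given in the steps.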
(In reply to comment #28)
> Created an attachment (id=1029)
> grub that allows booting from a zpool without devid, see CR 6770866
>
> This is a grub that I modified to make the zpool "devid" optional
> for a grub boot from zfs.

If this grub patch is installed without the scsa2usb patch, the usb stick will
only be able to boot from the same phys_path it was installed with, right?

If that is the case, this won't help me, at least. The controller order seems
to be rearranged on my motherboard depending on whether the boot device is
configured as USB, CD or disk in the BIOS. So an install from CD will cause
the phys_path on the USB stick to be invalid for USB boot. The same probably
applies to other AMI BIOS motherboards.
(In reply to comment #29)
> (In reply to comment #28)
> > Created an attachment (id=1029)
> > grub that allows booting from a zpool without devid, see CR 6770866
> >
> > This is a grub that I modified to make the zpool "devid" optional
> > for a grub boot from zfs.
>
> If this grub patch is installed without the scsa2usb patch, the usb stick will
> be only able to boot from the same phys_path it was installed with, right?

Correct.

If you would like to connect the usb flash stick to a different usb port and
boot from it you need the scsa2usb patch, too (so that you get devids for the
flash memory stick and at least in some cases this allows booting from it by
opening the zfs root filesystem using the devid).
(In reply to comment #30)
> (In reply to comment #29)
> > (In reply to comment #28)
> > > Created an attachment (id=1029)
> > > grub that allows booting from a zpool without devid, see CR 6770866
> > >
> > > This is a grub that I modified to make the zpool "devid" optional
> > > for a grub boot from zfs.
> >
> > If this grub patch is installed without the scsa2usb patch, the usb stick will
> > be only able to boot from the same phys_path it was installed with, right?
>
> Correct.
>
> If you would like to connect the usb flash stick to a different usb port and
> boot from it you need the scsa2usb patch, too (so that you get devids for the
> flash memory stick and at least in some cases this allows booting from it by
> opening the zfs root filesystem using the devid).

I think another workaround for this is to run

# zpool import -f rpool ; zpool export rpool

with the stick in the new usb slot, and it will work again. But I would also
like this to work in ALL cases without this tedious fixing of zfs labels and
phys paths (zdb -l /dev/dsk/xxx can show the phys_path; you get xxx from
iostat -En or rmformat).

I am able to start the installed bits from a usb stick after installing the
new stage2 with installgrub and importing rpool from the installer once I
switch usb slots, but this is really tedious.

Anyway, I'd like this to work without this phys path hacking. It seems like a
*regression* for some reason, because 2008.05 worked from a usb stick afaik.
In order to get the attention of the Solaris engineers, I've opened a bug in
the Sun Bugster database, and am closing this one as TRACKEDINBUGSTER.

If the Bugster bug is now in a public category/sub-category, the URL of the
new bug at bugs.opensolaris.org will be:

http://bugs.opensolaris.org/view_bug.do?bug_id=6819531

(This will happen the next time the new Bugster bug information is pushed to
bugs.opensolaris.org).

I've added the bug submitter to the Bugster interest list. If you'd like to be
removed, just let me know (by adding a comment to this bug). If anybody else
would like to be added to the cc: of the Bugster bug, again just let me know
(by adding a comment to this bug).

Note that there is currently no easy way for external-to-Sun people to update
the Bugster bug. That's being worked on. In the meantime, if a Bugster
evaluator asks you for further information and you are external-to-Sun, then
I suggest emailing that person directly.

Thanks.