Bugzilla – Bug 8883
AI seems to be writing a bad partition table to xVM disks
Last modified: 2009-05-17 15:58:33 UTC
Created an attachment (id=1932) [details]
symptomatic zpool create hang

We were seeing that all AI installs of PV 2009.06 111b guests were hanging early on when creating the zpool on the disk assigned to the guest. The symptoms inside the guest looked like the attachment "hung-zpool-create.txt"

To reproduce, I created a 16gb file, using

# mkfile -n 16g /xvmpool/ai_pv.raw

then started a virt-install

# virt-install -n ai_pv -r 1024 -p -f /xvmpool/ai_pv.raw -b nge1 -l http://x.y.z.a:5555/var/tmp/AI/targets/x86_svc --autocf install_service=x86_svc

After the guest had hung, I quit the install, and destroyed the guest:

# virsh destroy ai_pv
# virsh undefine ai_pv

then created a new guest, this time booting to single-user mode:

# virt-install -n ai_pv -r 1024 -p -f /xvmpool/ai_pv.raw -b nge1 -l http://x.y.z.a:5555/var/tmp/AI/targets/x86_svc --autocf install_service=x86_svc --extra-args -s

From there, I examined the partition table on the disk, and verified we couldn't newfs the disk:

root@opensolaris:~# prtvtoc /dev/rdsk/c7t0d0p0
* /dev/rdsk/c7t0d0p0 partition map
*
* Dimensions:
*     512 bytes/sector
*      63 sectors/track
*     255 tracks/cylinder
*   16065 sectors/cylinder
*    2087 cylinders
*    2087 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*           0     16065     16064
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      0    00      16065  33527655  33543719
       2      5    01          0  33554432  33554431
       8      1    01          0     16065     16064
root@opensolaris:~# newfs /dev/dsk/c7t0d0s0
newfs: construct a new file system /dev/rdsk/c7t0d0s0: (y/n)? y
read error on sector 33527654: I/O error

Destroying the guest, then trying again with the single-user mode boot, this time with a new disk:

# virsh destroy ai_pv
# virsh undefine ai_pv
# rm /xvmpool/ai_pv.raw
# mkfile -n 16g /xvmpool/ai_pv.raw
# virt-install -n ai_pv -r 1024 -p -f /xvmpool/ai_pv.raw -b nge1 -l http://x.y.z.a:5555/var/tmp/AI/targets/x86_svc --autocf install_service=x86_svc --extra-args -s

we saw the following disk layout, and verified that we were able to newfs the disk:

root@opensolaris:~# prtvtoc /dev/dsk/c7t0d0p0
* /dev/dsk/c7t0d0p0 partition map
*
* Dimensions:
*     512 bytes/sector
*      63 sectors/track
*     255 tracks/cylinder
*   16065 sectors/cylinder
*    2088 cylinders
*    2088 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      0    00          0  33554432  33554431
       2      5    01          0  33554432  33554431
       8      1    01          0     16065     16064
root@opensolaris:~# newfs /dev/dsk/c7t0d0s0
newfs: construct a new file system /dev/rdsk/c7t0d0s0: (y/n)? y
Warning: 4096 sector(s) in last cylinder unallocated
/dev/rdsk/c7t0d0s0: 33554432 sectors in 5462 cylinders of 48 tracks, 128 sectors
        16384.0MB in 342 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
......
super-block backups for last 10 cylinder groups at:
 32638496, 32736928, 32835360, 32933792, 33032224, 33130656, 33229088, 33327520, 33425952, 33524384

I'll attach the install_log and auto-installer SMF log files, the former has the following, which might be relevant:

<TDDM_E May 13 07:06:45> ddm_drive_get_ctype():Can't get DM_CONTROLLER assoc. w/ DM_DRIVE, err=0
<OM May 13 07:06:55> System reports enough physical memory for installation, swap is optional
<AI May 13 07:06:55> Checking any disks for minimum recommended size of 12646 MB
<AI May 13 07:06:55> Disk c7t0d0 size listed as 16384 MB
<AI May 13 07:06:55> Default disk selected is c7t0d0
<AI May 13 07:06:55> Cannot find the partitions for disk c7t0d0 on the target system
<OM May 13 07:06:55> No disk partitions defined prior to install
<AI May 13 07:06:55> Disk name selected for installation is c7t0d0
Created an attachment (id=1933) [details] The AI install_log for a hanging AI install
Created an attachment (id=1934) [details] The AI application-auto-installer:default.log SMF log
Does virt-install create the partition table and an initial vtoc label on the given disk file? Does not using the -n during the mkfile make a difference? In your second test (with the fresh disk file), if you do a zpool create instead of newfs, what is the result? (Marking incomplete to get this data.) I will also be trying to reproduce this here.
A couple of updates... This bug appears to happen regardless of whether the underlying guest disk is a file or a zvol. This bug appears to have been introduced in 111a. I wasn't able to reproduce this with 111.
Adding this to the blocker list while we continue triaging it.
(In reply to comment #3)
> Does virt-install create the partition table and an initial vtoc label on
> the given disk file?

No, I think we leave that to the installer.

> Does not using the -n during the mkfile make a difference?

Nope, we get the same behaviour with complete files as sparse files.

> In your second test (with the fresh disk file), if you do a zpool
> create instead of newfs, what is the result?

The zpool create succeeds as expected, though I do need a -f flag (which AI uses too, however):

root@opensolaris:~# zpool create rpool c7t0d0s0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c7t0d0s0 overlaps with /dev/dsk/c7t0d0s2
root@opensolaris:~# zpool create -f rpool c7t0d0s0
root@opensolaris:~# zpool status -v
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c7t0d0s0  ONLINE       0     0     0

errors: No known data errors
(In reply to comment #6)
> (In reply to comment #3)
> > Does virt-install create the partition table and an initial vtoc label on
> > the given disk file?
>
> No, I think we leave that to the installer.

Reason why I ask this is because in this test case, you've provided a fresh disk file, and booted -s, so nothing has touched the disk yet, but prtvtoc seems to have returned a valid table (maybe it defaults to some initial table perhaps.)

> > In your second test (with the fresh disk file), if you do a zpool
> > create instead of newfs, what is the result?
>
> The zpool create succeeds as expected, though I do need a -f flag (which AI
> uses too, however)

So far, this appears to be a zpool issue (though maybe in combination with something the installer is doing, since with an untouched partition table the zpool creation succeeds.) We're getting some zfs folks involved to help diagnose this ...
For this given disk size scenario, the installer does seem to be writing a different vtoc label in 111a as compared to 111.

In 111, the fdisk partition is of length 2087 cylinders, and the vtoc label written seems to be correct for that (note slice 2 in the vtoc table):

                                            Cylinders
      Partition   Status    Type          Start   End   Length    %
      =========   ======    ============  =====   ===   ======   ===
          1       Active    Solaris2          1  2087    2087    100

Current partition table (original):
Total disk cylinders available: 2087 + 0 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm     132 - 2086       14.98GB    (1955/0/0) 31407075
  1       swap    wu       1 -  131        1.00GB    (131/0/0)   2104515
  2     backup    wu       0 - 2086       15.99GB    (2087/0/0) 33527655
  3 unassigned    wm       0               0         (0/0/0)           0
  4 unassigned    wm       0               0         (0/0/0)           0
  5 unassigned    wm       0               0         (0/0/0)           0
  6 unassigned    wm       0               0         (0/0/0)           0
  7 unassigned    wm       0               0         (0/0/0)           0
  8       boot    wu       0 -    0        7.84MB    (1/0/0)       16065
  9 unassigned    wm       0               0         (0/0/0)           0

In 111a and 111b, the fdisk partition is still 2087 cylinders long, but the vtoc label written appears to be of length 2088 (note slice 2):

                                            Cylinders
      Partition   Status    Type          Start   End   Length    %
      =========   ======    ============  =====   ===   ======   ===
          1       Active    Solaris2          1  2087    2087    100

Current partition table (original):
Total disk cylinders available: 2087 + 0 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0 unassigned    wm       1 - 2087       15.99GB    (2087/0/0)   33527655
  1 unassigned    wm       0               0         (0/0/0)             0
  2     backup    wu       0 - 2087       16.00GB    (2088/170/2) 33554432
  3 unassigned    wm       0               0         (0/0/0)             0
  4 unassigned    wm       0               0         (0/0/0)             0
  5 unassigned    wm       0               0         (0/0/0)             0
  6 unassigned    wm       0               0         (0/0/0)             0
  7 unassigned    wm       0               0         (0/0/0)             0
  8       boot    wu       0 -    0        7.84MB    (1/0/0)         16065
  9 unassigned    wm       0               0         (0/0/0)             0

I don't know if that's what causes zpool to hang, but George is looking at that right now. Either way, it does appear that something changed in the installer in 111a such that a different size is written in the vtoc label.
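The cylinder and block counts in the two labels can be cross-checked with a little arithmetic. What follows is a minimal Python sketch, not installer code. The geometry and slice values are copied from the prtvtoc and format output in this bug; the helper names are hypothetical, and the sketch assumes that slice start cylinders are relative to the start of the Solaris2 fdisk partition.

```python
# Geometry as reported by prtvtoc for the guest disk in this bug.
SECTORS_PER_TRACK = 63
TRACKS_PER_CYL = 255
SECTORS_PER_CYL = SECTORS_PER_TRACK * TRACKS_PER_CYL


def chs_to_sectors(cyls, tracks, sectors):
    """Convert a (cylinders/tracks/sectors) triple, as printed by format, to sectors."""
    return cyls * SECTORS_PER_CYL + tracks * SECTORS_PER_TRACK + sectors


# The Solaris2 fdisk partition is 2087 cylinders long in both builds.
PARTITION_SECTORS = chs_to_sectors(2087, 0, 0)


def overflow(start_cyl, sector_count, container=PARTITION_SECTORS):
    """Hypothetical helper: sectors by which a slice runs past the partition end.

    Assumption: start_cyl is relative to the start of the fdisk partition.
    """
    end = start_cyl * SECTORS_PER_CYL + sector_count
    return max(0, end - container)


# (start cylinder, block count) per slice, taken from the format output above.
label_111 = {
    "slice0": (132, 31407075),
    "slice1": (1, 2104515),
    "slice2": (0, 33527655),
}
label_111a = {
    "slice0": (1, 33527655),
    "slice2": (0, chs_to_sectors(2088, 170, 2)),
}

overflow_111 = {name: overflow(*s) for name, s in label_111.items()}
overflow_111a = {name: overflow(*s) for name, s in label_111a.items()}

print("111 :", overflow_111)
print("111a:", overflow_111a)
```

Under those assumptions, every slice in the 111 label ends at or before the end of the partition, while in the 111a label slice 0 runs one full cylinder past it and slice 2 covers the whole 16 GB file rather than the 2087-cylinder partition.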
> I don't know if that's what causes zpool to hang, but George is looking
> at that right now. Either way, it does appear that something changed in
> the installer in 111a such that a different size is written in the vtoc label.

Just want to clarify that it might not be something in the installer that changed, but rather something in 111a changed such that a +1 sized vtoc label is written. This obviously doesn't happen on bare metal.

The update from George is that it ...

might be related as we've tried to read from the labels at the end of the disk. This has resulted in an EIO which then gets us to call into the driver to see if the drive was removed:

    if (zio->io_error == EIO) {
            vdev_disk_t *dvd = vd->vdev_tsd;
            int state = DKIO_NONE;

            if (ldi_ioctl(dvd->vd_lh, DKIOCSTATE, (intptr_t)&state,
                FKIOCTL, kcred, NULL) == 0 &&
                state != DKIO_INSERTED) {
                    vd->vdev_remove_wanted = B_TRUE;
                    spa_async_request(zio->io_spa, SPA_ASYNC_REMOVE);
            }
    }

This ioctl() is stuck in the driver:

> 0xd4ee1dc0::findstack -v
stack pointer for thread d4ee1dc0: d4ee1ad8
  d4ee1b18 swtch+0x188()
  d4ee1b28 cv_wait+0x53(d2ec2de8, d2ec2d6c, 0, 0)
  d4ee1b68 cv_wait_sig+0x260(d2ec2de8, d2ec2d6c, 0, 0)
  d4ee1ba8 xdf_dkstate+0x45(d2ec2c00, 0, 4, 80100000)
  d4ee1c58 xdf_ioctl+0x1e8()
  d4ee1c88 cdev_ioctl+0x31(3040000, 40d, d4ee1cec, 80100000, d2e3be78, 0)
  d4ee1cb8 ldi_ioctl+0xa2(d4da2030, 40d, d4ee1cec, 80000000, d2e3be78, 0)
  d4ee1cf8 vdev_disk_io_done+0x43(d4857778, d4857778)
  d4ee1d28 zio_vdev_io_done+0xe0(d4857778, d2e342a8, d4ee1d48, d4604430)
  d4ee1d48 zio_execute+0x81(d4857778, d4604430)
  d4ee1da8 taskq_thread+0x192(d2e34288, 0)
  d4ee1db8 thread_start+8()

It seems like there's a bug in the Xen xdf_dkstate() function. Can the Xen guys take a look?

So can someone from the xvm team take a look at this please.
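To illustrate why reads of "the labels at the end of the disk" would be the ones that fail, here is a rough Python sketch of where ZFS places its four vdev labels inside slice 0 of the oversized 111a label. It is not ZFS code. It assumes the usual on-disk layout of four 256 KB labels (two at the front of the device, two at the end, with the device size first aligned down to a label boundary), that slice 0 starts one cylinder into the fdisk partition, and that any read past the end of the fdisk partition returns EIO (which the newfs read error earlier in this bug suggests). The function and variable names are hypothetical.

```python
SECTOR = 512
LABEL_BYTES = 256 * 1024   # assumed size of one ZFS vdev label
NUM_LABELS = 4             # assumed: two at the front, two at the back


def label_offsets(psize_bytes):
    """Byte offsets of labels L0..L3 inside a vdev of the given size."""
    psize = psize_bytes - psize_bytes % LABEL_BYTES  # align down to a label boundary
    half = NUM_LABELS // 2
    front = [i * LABEL_BYTES for i in range(half)]
    back = [psize - (half - i) * LABEL_BYTES for i in range(half)]
    return front + back


SECTORS_PER_CYL = 16065
partition_sectors = 2087 * SECTORS_PER_CYL   # Solaris2 fdisk partition
slice0_start = 1 * SECTORS_PER_CYL           # slice 0 begins at cylinder 1
slice0_sectors = 33527655                    # size claimed by the 111a label

# Portion of slice 0 that is actually backed by the fdisk partition.
readable_bytes = (partition_sectors - slice0_start) * SECTOR

unreadable = [
    index
    for index, offset in enumerate(label_offsets(slice0_sectors * SECTOR))
    if offset + LABEL_BYTES > readable_bytes
]
print("labels past the end of the partition:", unreadable)
```

With these numbers the two trailing labels land in the last cylinder of slice 0, which lies beyond the fdisk partition, so under the stated assumptions those are the reads that would come back with EIO and lead into the DKIOCSTATE call shown above.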
We should at least take a look at the fix for bug 7758, as it's not obvious to me why we're creating swap in the old case and not in the new. Or was that a result of the workaround provided by the bug 7718 fix that was removed?
(In reply to comment #10)
> We should at least take a look at the fix for bug 7758, as it's not obvious to
> me why we're creating swap in the old case and not in the new. Or was that a
> result of the workaround provided by the bug 7718 fix that was removed?

7718 itself was the workaround, that's why 111 has the swap slice created. This was removed in 111a. 7758 is indeed a totally separate bug, that was fixed in 111a so I'll take a look at that.
Has a Live CD PV install been attempted and if so, what was the result?
Yep, our test folks have done pv gui installs of nv_111a.
By nv_111a, I assume you mean OpenSolaris and not Nevada, right? For completeness, we should also verify the same for OpenSolaris snv_111b.
We looked at some recent pushes in the caiman gate and bug 7758 looked like it modified code which could have caused the slice sizes in the vtoc label to have changed for bld 111a. So we built an AI image with this backed out, and the problem went away. So at this point we are treating this as a regression introduced by 7758, and we're looking at a fix in this area.
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #3)
> > > Does virt-install create the partition table and an initial vtoc label on
> > > the given disk file?
> >
> > No, I think we leave that to the installer.
>
> Reason why I ask this is because in this test case, you've provided
> a fresh disk file, and booted -s, so nothing has touched the disk
> yet, but prtvtoc seems to have returned a valid table (maybe it
> defaults to some initial table perhaps.)

This turns out to be a contributing factor to the bug in the installer. Before the disk is even touched, why does prtvtoc report there to be a label there, especially something for slice 0? This appears to be what confuses the installer. On bare metal, prtvtoc does not report anything when given a blank disk, so this behavior appears to be specific to xvm guests. Is this expected? A bug?

What's happening is that the installer is checking for a vtoc label up front, and it's getting exactly what prtvtoc is showing above in the second test case, where the disk hadn't been touched yet: slice 0 being 2088 cylinders, or 33554432 sectors.

root@opensolaris:~# prtvtoc /dev/dsk/c7t0d0p0
* /dev/dsk/c7t0d0p0 partition map
*
* Dimensions:
*     512 bytes/sector
*      63 sectors/track
*     255 tracks/cylinder
*   16065 sectors/cylinder
*    2088 cylinders
*    2088 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      0    00          0  33554432  33554431
       2      5    01          0  33554432  33554431
       8      1    01          0     16065     16064

Then when it comes time to start installing and writing to the disk, the installer has to create a solaris2 fdisk partition, and it creates that with "fdisk -n -B cXtXdXp0", which creates a solaris2 partition on the whole disk, but avoids cylinder 0. This means the solaris2 partition is only 2087 cylinders long:

                                            Cylinders
      Partition   Status    Type          Start   End   Length    %
      =========   ======    ============  =====   ===   ======   ===
          1       Active    Solaris2          1  2087    2087    100

Next it writes the vtoc label, but since it thinks there was an existing slice 0 (from that upfront query for a vtoc label) it uses that slice sizing information to write the label, and obviously 2088 doesn't fit in the 2087 partition.

This is the part that's the bug in the installer. If it had to create the fdisk partition, it should use better logic on whether it should trust whatever vtoc information it had gathered upfront. But still, it seems like there really shouldn't be any vtoc information returned when the disk is blank.
I think the default label comes from xdf_cmlb_attach()->cmlb_attach() by specifying CMLB_FAKE_LABEL_ONE_PARTITION. http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/xen/io/xdf.c#590 I note it differs in PV vs. HVM domains, but this doesn't strike me as bad: brand new lofi devices also come with a default partition table, for example.
Tim, is this issue only seen with PV guests? Perhaps that's been stated somewhere but I didn't see it called out explicitly. Also, I thought that PV guests with AI had other necessary changes. Is that not the case? From my perspective, what's important to fix right now is any code which prevents the booting of the AI image, the subsequent installation of the guest and anything which prevents the network from working after the fact (in order to download sustaining fixes). Whether the issue is the installer or the virtual disk driver or something else, I'd like to understand if there are remaining *client-side* issues that prevent PV guests from performing these steps.
(In reply to comment #18)
> Tim, is this issue only seen with PV guests? Perhaps that's been stated
> somewhere but I didn't see it called out explicitly.

I've only seen the problem with PV guests. In theory it's possible other platforms presenting similar looking disks to the installer could hit this. It might be good to verify that VirtualBox and VMware guests don't hit this problem. [ HVM guests don't, for example ]

> Also, I thought that PV guests with AI had other necessary changes. Is that
> not the case?

Yes, there are virt-install changes pending that allow users to initiate PV guest AI installs, which haven't been putback yet (part of the xVM 3.3 work). It is possible to start an AI install of a PV guest even from an nv_111 dom0 without these changes, it's just a bit more complex.

However, the contents of the AI image itself are where the problem lies: in the future, with our changes available in, say, an nv_116 dom0, we wouldn't be able to install a PV guest from a stock 2009.06_111b AI image. Another AI image would need to be produced at a later date for PV guest installs to work.

> From my perspective, what's important to fix right now is any code which
> prevents the booting of the AI image, the subsequent installation of the guest
> and anything which prevents the network from working after the fact (in order to
> download sustaining fixes). Whether the issue is the installer or the virtual
> disk driver or something else, I'd like to understand if there are remaining
> *client-side* issues that prevent PV guests from performing these steps.

So yes, I think this fix would qualify for the above: without the fix in the AI image (booted from the AI server, running on the client), we never get to install.
Thanks Tim. Are we aware of any additional changes that live on the image that prevent PV boot/install from taking place then?
No problem, I'm not aware of any yet. It'd be nice to test the fix in a real AI image to be sure.
We have a potential fix in the installer for this. Basically, for cases where the AI installer finds no fdisk partition, it shouldn't try to read a vtoc label off the disk.
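The rule described here can be sketched roughly as follows. This is an illustrative Python outline only, not the actual change in slim_source; the function name, its parameters, and the stand-in for the driver's default label are all hypothetical.

```python
def existing_slices_to_trust(found_fdisk_partition, read_vtoc):
    """Return slice information worth reusing, or None if nothing should be trusted.

    found_fdisk_partition: whether a Solaris2 fdisk partition already exists.
    read_vtoc: callable that reads the vtoc label from the disk (hypothetical).
    """
    if not found_fdisk_partition:
        # The installer will create the fdisk partition itself, so any label
        # the disk driver reports now (such as a default label on a blank
        # xVM disk) does not describe that partition. Do not even read it.
        return None
    return read_vtoc()


# Blank PV guest disk: no fdisk partition, but the driver reports a default
# label whose slice 0 spans the whole 33554432-sector disk.
calls = []


def fake_default_label():
    calls.append("read")
    return {0: 33554432}


blank_disk = existing_slices_to_trust(False, fake_default_label)
labelled_disk = existing_slices_to_trust(True, fake_default_label)
print(blank_disk, labelled_disk, calls)
```

The point of the sketch is the ordering: the fdisk check comes first, so on a blank disk the label is never consulted and slice sizes are derived only from the partition the installer is about to create.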
Fixed in changesets
slim_source: cfb2652634d609b0ea7502d0cb997e47a67015a9
slim_0906: f36b9d6820761ae7c0cf2d782285e9d839b519cc