Bugzilla – Bug 2133
Could not install - "fdisk(1M) -F" fails in some scenarios
Last modified: 2008-10-30 19:42:03 UTC
Sony VAIO laptop with four partitions:

1. FAT32 (diags from Sony)
2. NTFS (Windows Vista)
3. Linux Swap
4. Extended (contains three logical drives, for Ubuntu 7.10)

I created partition 3 specifically for installing OpenSolaris 2008.05. I booted the LiveCD and then started the installer. It correctly recognized the Linux swap partition and allowed me to select it as the target for the installation. See attachment: 1.png.

The installation failed, however. See attachment: 3.png. The installation log file contained this text:

<OM Jun 2 10:25:48> Timezone setting will be TZ=UTC
<OM Jun 2 10:25:48> Set timezone
<OM Jun 2 10:28:34> Timezone setting will be TZ=UTC
<OM Jun 2 10:28:34> Set timezone
<OM Jun 2 10:29:56> disk partition info changed
<OM Jun 2 15:30:38> Timezone setting will be TZ=US/Central
<OM Jun 2 15:30:38> Set timezone
<OM Jun 2 15:31:45> Set user root in password and shadow file
<OM Jun 2 15:31:45> list_ufs_db:: The entry 'gs145266' was not found in the /etc/passwd table
<OM Jun 2 15:31:45> Set user gs145266 in password and shadow file
<OM Jun 2 15:31:45> Renaming table /etc/inet/AAAzHaiGb to /etc/inet/hosts
<OM Jun 2 15:31:45> Disk was changed
<OM Jun 2 15:31:45> Disk contains valid Solaris partition
<OM Jun 2 15:31:45> whole_disk = 0
<OM Jun 2 15:31:45> diskname set = c5d0
<OM Jun 2 15:31:45> Set fdisk attrs
<TIDM_E Jun 2 15:31:45> fdisk: fdisk -n -F failed. Couldn't create fdisk partition table on disk c5d0
<TIMM_E Jun 2 15:31:45> Couldn't create fdisk partition table on disk <c5d0>
<OM Jun 2 15:31:45> Could not create fdisk target
<OM Jun 2 15:31:45> TI process completed unsuccessfully
<OM Jun 2 15:31:45> ti_create_target exited with status = -1
<OM Jun 2 15:31:45> Target instantiation failed exit_val=-1

It looks like something went wrong when invoking fdisk -n -F.

Workaround: Run format -e and then start fdisk. I then used fdisk to change the partition type for partition 3 from "Solaris" to "Solaris2."
After restarting the installer, it recognized the partition as a Solaris partition (see attachment: 2.png) and wrote all files for the installation correctly. FWIW, the install log contents were:

<OM Jun 2 11:16:45> Timezone setting will be TZ=UTC
<OM Jun 2 11:16:45> Set timezone
<OM Jun 2 11:17:03> disk partition info not changed
<OM Jun 2 16:17:26> Timezone setting will be TZ=US/Central
<OM Jun 2 16:17:26> Set timezone
<OM Jun 2 16:18:19> Set user root in password and shadow file
<OM Jun 2 16:18:19> list_ufs_db:: The entry 'gs145266' was not found in the /etc/passwd table
<OM Jun 2 16:18:19> Set user gs145266 in password and shadow file
<OM Jun 2 16:18:19> Renaming table /etc/inet/AAAlAaGKb to /etc/inet/hosts
<OM Jun 2 16:18:19> Disk was changed
<OM Jun 2 16:18:19> Disk contains valid Solaris partition
<OM Jun 2 16:18:19> whole_disk = 0
<OM Jun 2 16:18:19> diskname set = c6d0
<OM Jun 2 16:18:19> Set fdisk attrs
<OM Jun 2 16:18:22> Set zfs root pool device
<OM Jun 2 16:18:22> creating zpool
<OM Jun 2 16:18:25> TI process completed
<OM Jun 2 16:18:25> TI process completed successfully
<OM Jun 2 16:18:25> ti_create_target exited with status = 0
<OM Jun 2 16:18:25> TI procesing completed. Beginning transfer service
<TRANSFERMOD Jun 2 16:18:26> -- Starting transfer process, Mon, 02 Jun 2008 11:18:26 +0000 --
<TRANSFERMOD Jun 2 16:18:26> Building cpio file lists
<TRANSFERMOD Jun 2 16:18:26> Scanning //.
<TRANSFERMOD Jun 2 16:18:28> Scanning //usr
<TRANSFERMOD Jun 2 16:18:55> Scanning //opt
<TRANSFERMOD Jun 2 16:18:55> Scanning //dev
<TRANSFERMOD Jun 2 16:18:55> Scanning /mnt/misc/.
<TRANSFERMOD Jun 2 16:18:56> Scanning /.cdrom/.
<TRANSFERMOD Jun 2 16:18:56> Beginning cpio actions
<TRANSFERMOD Jun 2 16:33:58> Creating zero-length files
<TRANSFERMOD Jun 2 16:33:58> Extracting archive
<TRANSFERMOD Jun 2 16:34:00> Performing file operations
<TRANSFERMOD Jun 2 16:34:00> Fetching and updating keyboard layout
<TRANSFERMOD Jun 2 16:34:01> Detected US-English keyboard layout
<TRANSFERMOD Jun 2 16:34:01> -- Completed transfer process, Mon, 02 Jun 2008 11:34:01 +0000 --
<OM_E Jun 2 16:34:08> Nwam is not enabled
<OM Jun 2 16:34:08> Could not enable nwam
<OM Jun 2 16:34:08> Setting up zfs legacy mount in /etc/vfstab
<OM Jun 2 16:34:09> Setting up swap mount in /etc/vfstab
<OM Jun 2 16:34:09> /bin/sed -e 's/^PATH/export &/' /jack/.profile >/a/export/home/gs145266/.bashrc
<OM Jun 2 16:34:09> setup_hostid() to path32 ->/a/kernel/misc/sysinit<-
<OM Jun 2 16:34:09> setup_hostid() to path64 ->/a/kernel/misc/amd64/sysinit<-
<OM Jun 2 16:34:11> /usr/sbin/zpool set bootfs=rpool/ROOT/opensolaris rpool
<OM Jun 2 16:34:12> Running installgrub to set MBR
<OM Jun 2 16:34:12> /usr/sbin/installgrub /a/boot/grub/stage1 /a/boot/grub/stage2 /dev/rdsk/c6d0s0
<OM Jun 2 16:34:12> /bin/sed -e '/^jack/d' /etc/passwd >/a/etc/passwd
<OM Jun 2 16:34:12> /bin/sed -e '/^jack/d' /etc/shadow >/a/etc/shadow
<OM Jun 2 16:34:12> /bin/sed -e 's/^jack/gs145266/' /etc/user_attr >/a/etc/user_attr
<OM Jun 2 16:34:12> /bin/cp /etc/inet/hosts /a/etc/inet/hosts
<OM Jun 2 16:34:12> Unmounting BE
<OM Jun 2 16:34:12> /usr/sbin/zfs unmount rpool/export/home
<OM Jun 2 16:34:12> /usr/sbin/zfs set mountpoint=/export/home rpool/export/home
<OM Jun 2 16:34:13> /usr/sbin/zfs unmount rpool/export
<OM Jun 2 16:34:13> /usr/sbin/zfs set mountpoint=/export rpool/export
<OM Jun 2 16:34:16> /sbin/mount -F zfs rpool/ROOT/opensolaris /a
<OM Jun 2 16:34:16> Running install-finish script
<OM Jun 2 16:34:16> /sbin/install-finish /a initial_install
Created an attachment (id=315) [details] screen snapshot
Created an attachment (id=316) [details] screen snapshot
Created an attachment (id=317) [details] screen snapshot
As a starting point for the investigation, I created the following partition configuration using Ubuntu (Linux fdisk and mkswap commands):

[1] FAT32
[2] NTFS (active)
[3] Linux swap
[4] Extended partition

Then I launched the OpenSolaris 2008.05 installer and selected the Linux swap partition on the Disk screen for installation (type changed to Solaris, other entries left unmodified). The installation went fine and I was able to boot the newly installed Solaris instance.

So for now it seems that installing on a Linux swap partition works in general, but fdisk(1M) fails in some cases, refusing to update the partition table.

As far as the installer is concerned, it uses the "fdisk -F" form of the Solaris fdisk(1M) command twice during the installation process:

(1) To create or modify the partition configuration according to the user input provided on the Disk screen. If the user makes no changes, this step is skipped, which is why the reporter was able to continue with the installation after he manually changed the partition type from Linux swap to Solaris. I assume that fdisk then failed again when trying to mark the Solaris partition active at the end.

(2) To mark the Solaris2 partition active after the installation finishes.

The "fdisk -F" command invoked by the installer failed for the reporter, but fdisk(1M) worked when run in interactive mode (for example, when the reporter marked the Solaris partition active by hand).

Gregg, to get more information about what is happening, could I please ask you to run the installer in verbose mode with your original partition configuration (with Linux swap)? To turn on debug verbosity, boot the LiveCD and then invoke the installer from a terminal as follows:

$ export LS_DBG_LVL=4
$ pfexec /usr/bin/gui-install

When it fails, please attach the /tmp/install_log file. Thank you
Okay, I can do that. Sounds like I will need to blow away my current install. Or is there an easy way I can back it up before wiping it out? Thanks.
Hi Gregg, if you did only a small amount of customization, I would say that the easiest way would be to reinstall. However, there might be a way to preserve your installation. If you just changed the Solaris partition type to Linux swap (provided that this is enough to trigger the problem) and the installer failed in the same way as you originally reported, the content of the partition would remain untouched. Then you could change the partition type back to Solaris and I guess everything might work. But to be honest, I am not 100% sure.
Hi Jan -

>If you just changed the Solaris partition type to Linux swap

Do you know if there is any way to do this with the fdisk that is included with OpenSolaris? It has an option to change a partition type from Solaris (which is the name it uses for Linux swap type partitions) to Solaris2, but I do not see an option for going the other way....

Thanks, Gregg
It is possible to do it with the fdisk(1M) delivered with OpenSolaris when it is used in non-interactive mode. Actually, since this mode fails when used by the installer, I would be interested to know whether it works when invoked from the command line - by trying this we might obtain additional data for the investigation:

[1] Save the current partition configuration into a file
    $ pfexec fdisk -W /tmp/fd <disk_name>p0

[2] Edit /tmp/fd manually and only change the Solaris (0xbf) type to Linux swap (0x82) for the partition in question

[3] Write the new configuration
    $ pfexec fdisk -F /tmp/fd <disk_name>p0

If fdisk(1M) fails, it would mean we have found the problem and it is not necessary to run the installer, since in that case fdisk(1M) would be the culprit. If this works, then let's give the installer a try :-)
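Step [2] above can also be scripted. The following is only a minimal Python sketch, under the assumption that the file written by "fdisk -W" consists of comment lines starting with "*" followed by one whitespace-separated entry per partition, with the decimal Id as the first field. The helper name change_partition_id is hypothetical, and the first entry in SAMPLE_TABLE is made up for illustration; only the extended-partition entry is taken from this bug.

```python
def change_partition_id(table_text, old_id, new_id):
    """Return table_text with the Id field changed from old_id to new_id.

    Assumption: lines starting with '*' are comments, and every other
    non-blank line is an entry whose first field is the decimal Id.
    Note that every entry carrying old_id is changed, not just one.
    """
    out = []
    for line in table_text.splitlines():
        fields = line.split()
        is_entry = bool(fields) and not fields[0].startswith("*")
        if is_entry and fields[0] == str(old_id):
            fields[0] = str(new_id)
            line = "  " + "  ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical sample: the 191 entry is invented, the 15 entry is the
# extended partition reported in this bug.
SAMPLE_TABLE = """\
* Id  Act  Bhead  Bsect  Bcyl  Ehead  Esect  Ecyl  Rsect  Numsect
  191  128  0  1  1  254  63  1023  16065  160650
  15  0  254  63  1023  254  63  1023  293507612  97214356
"""

# 191 is 0xbf (Solaris2), 130 is 0x82 (Linux swap); the file uses decimal.
edited = change_partition_id(SAMPLE_TABLE, 191, 130)
print(edited)
```

The edited text would then be saved and fed back with "fdisk -F" as in step [3]; the sketch does not touch the disk itself.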
It appears the problem is with fdisk(1M). The command line output from the experiment to change the type of partition 3 from Solaris2 back to Linux Swap looked like this:

jack@opensolaris:~$ pfexec fdisk -W /tmp/fd c6d0p0
jack@opensolaris:~$ pfexec gedit /tmp/fd
jack@opensolaris:~$ pfexec fdisk -F /tmp/fd c6d0p0
fdisk: Partition table exceeds the size of the disk.
fdisk: Error on entry " 15 0 254 63 1023 254 63 1023 293507612 97214356".

And apparently because of that error, fdisk(1M) error-exited and did *not* change the partition type of partition #3 to Linux swap (0x82), which fdisk(1M) refers to as SUNIXOS. Note: it appears fdisk(1M) uses base 10 numerals for this, not hexadecimal - so the value I specified in /tmp/fd was 130. I will attach the /tmp/fd file.

Somehow, my original workaround of running fdisk(1M) interactively (via format -e) avoids this problem, because when invoked interactively, fdisk(1M) does not seem to have any problems changing the type of a partition or marking a partition Active. But when run with -F from the command line, it does not like the entry for my final partition, which is an extended partition, and so it exits without making any changes.

This might also explain http://defect.opensolaris.org/bz/show_bug.cgi?id=2134 ?
Created an attachment (id=319) [details] Output from fdisk, changed 191 to 130, then attempted to feed it back into fdisk
It seems that fdisk(1M) thinks that the last primary partition exceeds the disk capacity. When run with "-F", fdisk checks whether every partition fits on the disk. Looking at how the fdisk(1M) code calculates the available disk size, it uses two approaches:

[a] First, it tries to obtain the size of the disk in sectors using the DKIOCGMEDIAINFO ioctl(2) - see dkio(7I) for more details.

    disk_size_a = dk_minfo.dki_capacity

[b] If [a] fails, the size of the disk in sectors is calculated from the CHS (cylinder/head/sector) geometry:

    disk_size_b = C*H*S

In most cases [b] is smaller than [a] due to rounding. In this particular case, [b] is

    disk_size_b = 24321*255*63 = 390716865

and the 4th primary partition ends at (start sector plus sector count, i.e. the number of sectors the disk must have for the partition to fit)

    last_sector = 293507612 + 97214356 = 390721968

So the 4th primary partition extends past the last available cylinder. However, this shouldn't cause trouble if fdisk(1M) is able to obtain [a] and [a] is greater than or equal to the end of the 4th partition.

I think there are several scenarios in which fdisk(1M) might fail:

[1] The 4th primary partition exceeds the physical disk size
    * this could be verified by comparing its end with the disk size reported by some Linux tool.

[2] The Solaris device driver reports less capacity than is actually available
    * this could be verified by comparing the disk size reported by "iostat -En" with the size reported by some non-Solaris tool.

[3] The DKIOCGMEDIAINFO ioctl command fails for that disk, the disk size calculated from CHS is smaller, and thus the last partition seems to exceed the disk size.

Gregg, could you please check what "iostat -En" and some non-Solaris tool report as the disk size (either in bytes or in sectors)? Thank you
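The arithmetic in the comment above can be reproduced with a short Python sketch. The function names are hypothetical and this is not the fdisk(1M) source; it only models the "does the partition fit" comparison using the numbers quoted in this bug.

```python
def chs_capacity(cylinders, heads, sectors_per_track):
    """Disk size in sectors as derived from CHS geometry (approach [b])."""
    return cylinders * heads * sectors_per_track


def partition_end(rsect, numsect):
    """Start sector plus sector count: sectors the disk needs for a fit."""
    return rsect + numsect


def fits(rsect, numsect, disk_sectors):
    """True if the partition lies entirely within disk_sectors."""
    return partition_end(rsect, numsect) <= disk_sectors


# Geometry and 4th-partition values quoted in this bug.
disk_size_b = chs_capacity(24321, 255, 63)
end_of_p4 = partition_end(293507612, 97214356)
overshoot = end_of_p4 - disk_size_b

print("CHS capacity:", disk_size_b)
print("end of partition 4:", end_of_p4)
print("sectors beyond CHS capacity:", overshoot)
print("fits in CHS capacity:", fits(293507612, 97214356, disk_size_b))
```

With a capacity taken from approach [a] instead, the same fits() call would succeed as long as the reported capacity is at least 390721968 sectors.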
(In reply to comment #9)
> This might also explain http://defect.opensolaris.org/bz/show_bug.cgi?id=2134 ?

Agreed - so far it seems bug 2134 might have the same root cause.
The output from iostat -En is:

gs145266@opensolaris-08.05-gs:~$ iostat -En
c5d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: Hitachi HTS7220 Revision: Serial No: 070719DP0400DTG Size: 200.05GB <200047067136 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c4t0d0 Soft Errors: 0 Hard Errors: 6 Transport Errors: 0
Vendor: MATSHITA Product: DVD-RAM UJ-852S Revision: 1.01 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 6 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

I ran four additional tools, one under Windows Vista and the other three under Ubuntu. I will post one additional comment for each tool.

HTH - Gregg
from Paragon Partition Manager (a third-party tool that runs on Windows Vista):

Basic Hard Disk 0 (Hitachi HTS722020K9SA00)
Type: Basic Hard Disk Drive
Total size: 186.3 GB
Sectors per track: 63
Heads: 255
Cylinders: 24321

And then for that final partition, it reports:

                         Sector No:     Cyl:   Hd:   Sec:
First physical sector:   293,507,612    18270    0     63
Last physical sector:    390,721,967    24321   80     63
from fdisk on Ubuntu 7.10:

Command (m for help): p

Disk /dev/sda: 200.0 GB, 200049647616 bytes
255 heads, 63 sectors/track, 24321 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x02f35ceb

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         889     7137280   27  Unknown
Partition 1 does not end on cylinder boundary.
/dev/sda2             889        8973    64937318+   7  HPFS/NTFS
/dev/sda3   *        8974       18271    74678183+  bf  Solaris
/dev/sda4           18271       24322    48607178    f  W95 Ext'd (LBA)
/dev/sda5           18271       18394      995998+  83  Linux
/dev/sda6           18395       18882     3919828+  82  Linux swap / Solaris
/dev/sda7           18883       24322    43691287+  83  Linux
from sfdisk on Ubuntu 7.10:

gs145266@gs145266-laptop-ubu:~$ sudo sfdisk -ls
/dev/sda: 195360984

Disk /dev/sda: 24321 cylinders, 255 heads, 63 sectors/track
Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot   Start     End   #cyls    #blocks   Id  System
/dev/sda1           0+     888-    889-   7137280   27  Unknown
/dev/sda2         888+    8972    8085-  64937318+   7  HPFS/NTFS
/dev/sda3   *    8973    18270-   9298-  74678183+  bf  Solaris
/dev/sda4       18270+   24321-   6052-  48607178    f  W95 Ext'd (LBA)
/dev/sda5       18270+   18393     124-    995998+  83  Linux
/dev/sda6       18394+   18881     488-   3919828+  82  Linux swap / Solaris
/dev/sda7       18882+   24321-   5440-  43691287+  83  Linux
/dev/sdb: 978944

Disk /dev/sdb: 1018 cylinders, 31 heads, 62 sectors/track
Warning: The partition table looks like it was made for C/H/S=*/65/32 (instead of 1018/31/62).
For this listing I'll assume that geometry.
Units = cylinders of 1064960 bytes, blocks of 1024 bytes, counting from 0

   Device Boot   Start     End   #cyls    #blocks   Id  System
/dev/sdb1   *       0+     941-    942-    978928    6  FAT16
        end: (c,h,s) expected (941,18,32) found (956,64,32)
/dev/sdb2           0       -       0          0    0  Empty
/dev/sdb3           0       -       0          0    0  Empty
/dev/sdb4           0       -       0          0    0  Empty
total: 196339928 blocks

gs145266@gs145266-laptop-ubu:~$ sudo sfdisk -G
Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.
/dev/sda: 24321 cylinders, 255 heads, 63 sectors/track
/dev/sdb: 941 cylinders, 65 heads, 32 sectors/track

gs145266@gs145266-laptop-ubu:~$ sudo sfdisk -g
/dev/sda: 24321 cylinders, 255 heads, 63 sectors/track
/dev/sdb: 1018 cylinders, 31 heads, 62 sectors/track
from GParted on Ubuntu 7.10:

Model: ATA Hitachi HTS72202
Size: 186.31 GiB
Path: /dev/sda
DiskLabelType: msdos
Heads: 255
Sectors/Track: 63
Cylinders: 24321
Total Sectors: 390716865

And for the final partition, it reports:

Filesystem: extended
Size: 46.36 GiB
Flags: lba
Path: /dev/sda4
First Sector: 293507612
Last Sector: 390721967
Total Sectors: 97214356
Gregg, thanks for providing all that data.

Looking at the output of Ubuntu fdisk, it reports 200049647616 bytes for /dev/sda, which equals 390721968 sectors.

However, according to what iostat(1M) provides, Solaris sees 200047067136 bytes, which is 390716928 sectors. This means that fdisk(1M) thinks the disk is smaller and complains about the last partition exceeding the disk size.

For now it seems that, from the installer's point of view, we can't do much to solve this problem. I think a potential solution might be to make fdisk(1M) more tolerant of what other systems created, so that this kind of sanity checking is applied only to modified or newly created partitions and unmodified ones are skipped.

Would you agree that the appropriate approach would be to file a bug against fdisk(1M) in order to address this problem?
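As a cross-check of the byte-to-sector conversion in the comment above, here is a small Python sketch. It assumes 512-byte sectors (the unit shown in the Ubuntu fdisk output and in the ata driver's g_secsiz); the helper name bytes_to_sectors is hypothetical.

```python
SECTOR_SIZE = 512  # bytes per sector, as in the fdisk and ata output


def bytes_to_sectors(size_in_bytes):
    """Convert a byte count to whole sectors, rejecting partial sectors."""
    sectors, remainder = divmod(size_in_bytes, SECTOR_SIZE)
    if remainder:
        raise ValueError("size is not a multiple of the sector size")
    return sectors


linux_sectors = bytes_to_sectors(200049647616)    # Ubuntu fdisk
solaris_sectors = bytes_to_sectors(200047067136)  # Solaris iostat -En
missing = linux_sectors - solaris_sectors

# End of the 4th primary partition: start sector plus sector count.
end_of_p4 = 293507612 + 97214356

print("Linux sees:", linux_sectors, "sectors")
print("Solaris sees:", solaris_sectors, "sectors")
print("difference:", missing, "sectors")
print("partition 4 fits on Linux size:", end_of_p4 <= linux_sectors)
print("partition 4 fits on Solaris size:", end_of_p4 <= solaris_sectors)
```

The partition ends exactly at the size Linux reports, so it only fails the check against the smaller capacity that Solaris reports.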
>Might you agree that the appropriate approach would be to file bug against fdisk(1M) in order to address this problem ?

Yes! :-) Just tell me what you want me to write and what bug system you want me to report it in and I'll be happy to do that.

Thanks for all your help - Gregg
(In reply to comment #19)
> Just tell me what you want me to write and what bug system you want me to
> report it in and I'll be happy to do that.

Please report this issue using the following bug report tool (you will probably need to create an account before you can file it):

http://bugs.opensolaris.org/

Category/Subcategory: utility/fdisk
Release: snv_90
Hardware: x86

Please add the observations about the fdisk(1M) behavior when used with the -F option - it refuses to modify the partition table if the last partition exceeds the disk size Solaris can see, even if other systems see more space. I think that fdisk(1M) should be more tolerant in cases where a partition was previously created by another system and is left unmodified - it seems that interactive mode already works this way, since you were able to make all changes manually. Please feel free to add any information you think might help to clarify the problem.

After you report the bug, could you please close this one and also bug 2134 as "trackedinbugster" and add the text "BugsterCR=<bug_number>" to the Whiteboard? Thank you!

> Thanks for all your help - Gregg

I thank you for the cooperation :-)

Jan
*** Bug 2134 has been marked as a duplicate of this bug. ***
(In reply to comment #18)
> Looking at the output of Ubuntu fdisk, it reports 200049647616 bytes for
> /dev/sda, which equals to 390721968 sectors.
>
> However, according to what iostat(1M) provides, Solaris sees 200047067136
> bytes, which is 390716928 sectors.

Yep, that's because Solaris' ata driver:

- is looking at the default CHS translation values found in the disk's ATA IDENTIFY data and detects

    ai_heads 0x10 ai_sectors 0x3f ai_fixcyls 0x3fff

  (16 heads, 63 sectors / track)

- using the above 16 heads and 63 sectors per track, computes

    390721968/(63*16) == 387621 cylinders

- calls ata_fix_large_disk_geometry(); this doubles the "heads" until the computed cylinders fit into a 16-bit unsigned short. We end up with a geometry of 128 heads, 63 sectors and 48452 cylinders

    390721968/(63*128) == 48452 cylinders

  (Note: this is different from what cmlb is using; cmlb uses 255 heads and 63 sectors / track)

- now, when we compute the number of sectors that we can access using that geometry, we get

    48452 cyl * 63 sectors * 128 heads = 390716928 sectors

- When cmlb tries to find out the disk's capacity, ata returns the product of 48452 cyl * 63 sectors * 128 heads = 390716928 sectors as the capacity, *not* the disk's real capacity.

usr/src/uts/intel/io/dktp/controller/ata/ata_disk.c line 987 clips a few sectors at the end of the drive:

957 static int
958 ata_disk_ioctl(opaque_t ctl_data, int cmd, intptr_t arg, int flag)
959 {
...
980     case DIOCTL_GETGEOM:
981     case DIOCTL_GETPHYGEOM:
982         tgdk.g_cyl = ata_drvp->ad_drvrcyl;
983         tgdk.g_head = ata_drvp->ad_drvrhd;
984         tgdk.g_sec = ata_drvp->ad_drvrsec;
985         tgdk.g_acyl = ata_drvp->ad_acyl;
986         tgdk.g_secsiz = 512;
987         tgdk.g_cap = tgdk.g_cyl * tgdk.g_head * tgdk.g_sec;
988         if (ddi_copyout(&tgdk, (caddr_t)arg, sizeof (tgdk), flag))
989             return (EFAULT);
990         return (0);

Question is: why does line 987 in usr/src/uts/intel/io/dktp/controller/ata/ata_disk.c compute the drive's capacity as (virtual) cylinders * heads * sectors?
Why doesn't it pass the ata_drvp->ad_capacity value? It seems that if we change line 987 to

    tgdk.g_cap = ata_drvp->ad_capacity

the problem disappears.

A kmdb patch that implements such a fix for the OpenSolaris 2008.05 CD: boot the OpenSolaris 2008.05 snv_86 kernel with -kd, and in kmdb use this:

    ::bp ata`_init
    :c
    ata_disk_ioctl+105?w b70f 8683 2 8900 2444 c710
    ata_disk_ioctl+118?w 838b 29c 0
    :c

Using this patch I've been able to work around the "Partition table exceeds the size of the disk" problem; but that was with a 160GB drive, from this thread:
http://www.opensolaris.org/jive/thread.jspa?threadID=62970
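The head-doubling described in this comment can be modeled in a few lines of Python. This is only a sketch of the behavior as described above, not a transcription of ata_fix_large_disk_geometry(); the function name fix_large_disk_geometry and the exact loop condition (cylinders must not exceed 0xFFFF) are assumptions.

```python
MAX_CYLINDERS = 0xFFFF  # assumed limit: largest 16-bit unsigned value


def fix_large_disk_geometry(total_sectors, heads=16, sectors=63):
    """Double heads until the cylinder count fits in 16 bits.

    Returns (cylinders, heads, sectors). Integer division drops any
    partial cylinder, which is where the capacity loss comes from.
    """
    cylinders = total_sectors // (heads * sectors)
    while cylinders > MAX_CYLINDERS:
        heads *= 2
        cylinders = total_sectors // (heads * sectors)
    return cylinders, heads, sectors


real_capacity = 390721968  # sectors, from the Ubuntu fdisk byte count
cyl, hd, sec = fix_large_disk_geometry(real_capacity)
reported_capacity = cyl * hd * sec  # what line 987 hands back as g_cap
clipped = real_capacity - reported_capacity

print("geometry:", cyl, "cyl,", hd, "heads,", sec, "sectors")
print("reported capacity:", reported_capacity)
print("clipped sectors:", clipped)
```

Under these assumptions the model reproduces the 128-head, 48452-cylinder geometry quoted above, and the clipped tail is smaller than one virtual cylinder (128*63 sectors), which is consistent with the suggestion to return ad_capacity instead of the CHS product.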
Created an attachment (id=340) [details] Shell script to create a 160GB hdd image, to reproduce fdisk -F failure under qemu or virtualbox

This script creates a 160GB raw qemu hdd disk image that includes a valid fdisk partition table. The fdisk partition information is from this thread:

http://www.opensolaris.org/jive/thread.jspa?threadID=62970

Use qemu's qemu-img utility to convert it from raw format into virtualbox vmdk format:

    qemu-img convert -f raw -O vmdk hdd160gb.img hdd160gb.vmdk

This HDD image can be used under qemu or virtualbox to reproduce bug 2133.
Created an attachment (id=341) [details] Shell script to create a 200GB hdd image, to reproduce fdisk -F failure under qemu or virtualbox This script creates a 200GB raw qemu HDD image with the partition information from this bug (bug 2133). I'm just testing my kmdb patch with this 200GB HDD image, and OpenSolaris 2008.05 now starts to install just fine.
Note: The "SXCE_snv99 Installer" has difficulty installing SXCE into one partition while preserving (avoiding loss of) WinXP drives in other partitions, and repairing this bug is (at least) part of the solution to bug 4205: http://defect.opensolaris.org/bz/show_bug.cgi?id=4205
*** Bug 9382 has been marked as a duplicate of this bug. ***