Bug 2133 - Could not install - "fdisk(1M) -F" fails in some scenarios
Status: CLOSED TRACKEDINBUGSTER
Product: installer
library
: unspecified
: i86pc/i386 OpenSolaris
: P3 normal (vote)
: in-preview
Assigned To: Jan Damborsky
:
:
: BugsterCR=6711786
:
:
: 4205
Reported: 2008-06-03 11:53 UTC by Gregg Sporar
Modified: 2008-10-30 19:42 UTC (History)

Attachments
screen snapshot (87.27 KB, image/png)
2008-06-03 11:54 UTC, Gregg Sporar
screen snapshot (89.09 KB, image/png)
2008-06-03 11:54 UTC, Gregg Sporar
screen snapshot (43.96 KB, image/png)
2008-06-03 11:55 UTC, Gregg Sporar
Output from fdisk, changed 191 to 130, then attempted to feed it back into fdisk (1.17 KB, text/plain)
2008-06-04 12:46 UTC, Gregg Sporar
Shell script to create a 160GB hdd image, to reproduce fdisk -F failure under qemu or virtualbox (1.17 KB, text/plain)
2008-06-23 06:07 UTC, Jürgen Keil
Shell script to create a 200GB hdd image, to reproduce fdisk -F failure under qemu or virtualbox (1.17 KB, text/plain)
2008-06-23 06:49 UTC, Jürgen Keil



Description Gregg Sporar 2008-06-03 11:53:36 UTC
Sony VAIO laptop with four partitions:

1. FAT32 (diags from Sony)
2. NTFS (Windows Vista)
3. Linux Swap 
4. Extended (contains three logical drives, for Ubuntu 7.10)

I created partition 3 specifically for installing OpenSolaris 2008.05.  I
booted the LiveCD and then started the installer.  It correctly recognized the
Linux swap partition and allowed me to select it as the target for the
installation.  See attachment: 1.png.

The installation failed, however.  See attachment: 3.png.

The installation log file contained this text:

<OM Jun  2 10:25:48> Timezone setting will be TZ=UTC
<OM Jun  2 10:25:48> Set timezone 
<OM Jun  2 10:28:34> Timezone setting will be TZ=UTC
<OM Jun  2 10:28:34> Set timezone 
<OM Jun  2 10:29:56> disk partition info changed
<OM Jun  2 15:30:38> Timezone setting will be TZ=US/Central
<OM Jun  2 15:30:38> Set timezone 
<OM Jun  2 15:31:45> Set user root in password and shadow file
<OM Jun  2 15:31:45> list_ufs_db:: The entry 'gs145266' was not found in the
/etc/passwd table
<OM Jun  2 15:31:45> Set user gs145266 in password and shadow file
<OM Jun  2 15:31:45> Renaming table /etc/inet/AAAzHaiGb to /etc/inet/hosts
<OM Jun  2 15:31:45> Disk was changed
<OM Jun  2 15:31:45> Disk contains valid Solaris partition
<OM Jun  2 15:31:45> whole_disk = 0
<OM Jun  2 15:31:45> diskname set = c5d0
<OM Jun  2 15:31:45> Set fdisk attrs
<TIDM_E Jun  2 15:31:45> fdisk: fdisk -n -F failed. Couldn't create fdisk
partition table on disk c5d0
<TIMM_E Jun  2 15:31:45> Couldn't create fdisk partition table on disk <c5d0>
<OM Jun  2 15:31:45> Could not create fdisk target
<OM Jun  2 15:31:45> TI process completed unsuccessfully 
<OM Jun  2 15:31:45> ti_create_target exited with status = -1
<OM Jun  2 15:31:45> Target instantiation failed exit_val=-1


It looks like something went wrong when invoking 

  fdisk -n -F

Workaround: Run format -e and then start fdisk.  I then used fdisk to change
the partition type for partition 3 from "Solaris" to "Solaris2."  After
restarting the installer, it recognized the partition as a Solaris partition
(see attachment: 2.png) and wrote all files for the installation correctly. 
FWIW, the install log contents were:

<OM Jun  2 11:16:45> Timezone setting will be TZ=UTC
<OM Jun  2 11:16:45> Set timezone 
<OM Jun  2 11:17:03> disk partition info not changed
<OM Jun  2 16:17:26> Timezone setting will be TZ=US/Central
<OM Jun  2 16:17:26> Set timezone 
<OM Jun  2 16:18:19> Set user root in password and shadow file
<OM Jun  2 16:18:19> list_ufs_db:: The entry 'gs145266' was not found in the
/etc/passwd table
<OM Jun  2 16:18:19> Set user gs145266 in password and shadow file
<OM Jun  2 16:18:19> Renaming table /etc/inet/AAAlAaGKb to /etc/inet/hosts
<OM Jun  2 16:18:19> Disk was changed
<OM Jun  2 16:18:19> Disk contains valid Solaris partition
<OM Jun  2 16:18:19> whole_disk = 0
<OM Jun  2 16:18:19> diskname set = c6d0
<OM Jun  2 16:18:19> Set fdisk attrs
<OM Jun  2 16:18:22> Set zfs root pool device
<OM Jun  2 16:18:22> creating zpool
<OM Jun  2 16:18:25> TI process completed 
<OM Jun  2 16:18:25> TI process completed successfully 
<OM Jun  2 16:18:25> ti_create_target exited with status = 0
<OM Jun  2 16:18:25> TI procesing completed. Beginning transfer service 
<TRANSFERMOD Jun  2 16:18:26> -- Starting transfer process, Mon, 02 Jun 2008
11:18:26 +0000 --
<TRANSFERMOD Jun  2 16:18:26> Building cpio file lists
<TRANSFERMOD Jun  2 16:18:26> Scanning //.
<TRANSFERMOD Jun  2 16:18:28> Scanning //usr
<TRANSFERMOD Jun  2 16:18:55> Scanning //opt
<TRANSFERMOD Jun  2 16:18:55> Scanning //dev
<TRANSFERMOD Jun  2 16:18:55> Scanning /mnt/misc/.
<TRANSFERMOD Jun  2 16:18:56> Scanning /.cdrom/.
<TRANSFERMOD Jun  2 16:18:56> Beginning cpio actions
<TRANSFERMOD Jun  2 16:33:58> Creating zero-length files
<TRANSFERMOD Jun  2 16:33:58> Extracting archive
<TRANSFERMOD Jun  2 16:34:00> Performing file operations
<TRANSFERMOD Jun  2 16:34:00> Fetching and updating keyboard layout
<TRANSFERMOD Jun  2 16:34:01> Detected US-English keyboard layout
<TRANSFERMOD Jun  2 16:34:01> -- Completed transfer process, Mon, 02 Jun 2008
11:34:01 +0000 --
<OM_E Jun  2 16:34:08> Nwam is not enabled
<OM Jun  2 16:34:08> Could not enable nwam
<OM Jun  2 16:34:08> Setting up zfs legacy mount in /etc/vfstab
<OM Jun  2 16:34:09> Setting up swap mount in /etc/vfstab
<OM Jun  2 16:34:09> /bin/sed -e 's/^PATH/export &/' /jack/.profile
>/a/export/home/gs145266/.bashrc
<OM Jun  2 16:34:09> setup_hostid() to path32 ->/a/kernel/misc/sysinit<-
<OM Jun  2 16:34:09> setup_hostid() to path64 ->/a/kernel/misc/amd64/sysinit<-
<OM Jun  2 16:34:11> /usr/sbin/zpool set bootfs=rpool/ROOT/opensolaris rpool
<OM Jun  2 16:34:12> Running installgrub to set MBR
<OM Jun  2 16:34:12> /usr/sbin/installgrub /a/boot/grub/stage1
/a/boot/grub/stage2 /dev/rdsk/c6d0s0
<OM Jun  2 16:34:12> /bin/sed -e '/^jack/d' /etc/passwd >/a/etc/passwd
<OM Jun  2 16:34:12> /bin/sed -e '/^jack/d' /etc/shadow >/a/etc/shadow
<OM Jun  2 16:34:12> /bin/sed -e 's/^jack/gs145266/' /etc/user_attr
>/a/etc/user_attr
<OM Jun  2 16:34:12> /bin/cp /etc/inet/hosts /a/etc/inet/hosts
<OM Jun  2 16:34:12> Unmounting BE
<OM Jun  2 16:34:12> /usr/sbin/zfs unmount rpool/export/home
<OM Jun  2 16:34:12> /usr/sbin/zfs set mountpoint=/export/home
rpool/export/home
<OM Jun  2 16:34:13> /usr/sbin/zfs unmount rpool/export
<OM Jun  2 16:34:13> /usr/sbin/zfs set mountpoint=/export rpool/export
<OM Jun  2 16:34:16> /sbin/mount -F zfs rpool/ROOT/opensolaris /a
<OM Jun  2 16:34:16> Running install-finish script
<OM Jun  2 16:34:16> /sbin/install-finish /a initial_install
Comment 1 Gregg Sporar 2008-06-03 11:54:28 UTC
Created an attachment (id=315) [details]
screen snapshot
Comment 2 Gregg Sporar 2008-06-03 11:54:49 UTC
Created an attachment (id=316) [details]
screen snapshot
Comment 3 Gregg Sporar 2008-06-03 11:55:05 UTC
Created an attachment (id=317) [details]
screen snapshot
Comment 4 Jan Damborsky 2008-06-04 02:42:05 UTC
As a starting point for the investigation, I created the following partition
configuration using Ubuntu (Linux fdisk & mkswap commands):

[1] FAT32
[2] NTFS (active)
[3] Linux swap
[4] Extended partition

Then I launched OpenSolaris 2008.05 installer and selected Linux swap partition
on Disk screen for installation (type changed to Solaris, other entries left
unmodified).

Installation went fine and I was able to boot the newly installed Solaris
instance. So for now it seems that installing on a Linux swap partition works
in general, but fdisk(1M) fails in some cases, refusing to update the partition
table.

As far as the installer is concerned, it uses the "fdisk -F" form of the Solaris
fdisk(1M) command twice during the installation process:

(1) For creating/modifying the partition configuration according to the user
input provided on the "Disk screen". If the user makes no changes, this step is
skipped, which is why the reporter was able to continue with the installation
after he manually changed the partition type from Linux swap to Solaris. I
assume that fdisk then failed again when trying to mark the Solaris partition
active at the end.

(2) For marking the Solaris2 partition active after the installation finishes.

The reporter saw the "fdisk -F" command invoked by the installer fail, but
fdisk(1M) worked when run in interactive mode (for example, when the reporter
marked the Solaris partition active by hand).

Gregg, in order to obtain more information about what is happening, could I
please ask you to run the installer in verbose mode with your original
partition configuration (with Linux swap)?

To turn on debug verbosity, after you boot the LiveCD, please invoke
the installer from a terminal in the following way:

$ export LS_DBG_LVL=4
$ pfexec /usr/bin/gui-install

When it fails, please attach the /tmp/install_log file.

Thank you
Comment 5 Gregg Sporar 2008-06-04 07:08:56 UTC
Okay, I can do that.  Sounds like I will need to blow away my current install. 
Or is there an easy way I can back it up before wiping it out?  Thanks.
Comment 6 Jan Damborsky 2008-06-04 07:23:14 UTC
Hi Gregg,

if you did only small customization, I would say that the easiest way
would be to reinstall.

However, there might be a possibility that you could preserve your installation.

If you just changed the Solaris partition type to Linux swap (provided that
this is enough to trigger the problem) and the installer failed in the same
way as you originally reported, the content of the partition would remain
untouched.

Then you could change the partition type back to Solaris and I guess everything
might work. But to be honest, I am not 100% sure.
Comment 7 Gregg Sporar 2008-06-04 09:41:03 UTC
Hi Jan - 

>If you just changed the Solaris partition type to Linux swap

Do you know if there is any way to do this with the fdisk that is included with
OpenSolaris?  It has an option to change a partition type from Solaris (which
is the name it uses for Linux swap type partitions) to Solaris2, but I do not
see an option for going the other way....

Thanks,
Gregg
Comment 8 Jan Damborsky 2008-06-04 10:08:01 UTC
It is possible to do it with fdisk(1M) delivered with OpenSolaris when used in
non-interactive mode.

Actually, since this mode fails when used by installer, I would be interested
if it will work when invoked from command line - by trying this we might obtain
additional data for investigation:

[1] Save current partition configuration into the file
$ pfexec fdisk -W /tmp/fd <disk_name>p0

[2] Edit /tmp/fd manually and only change Solaris (0xbf) type to Linux swap
    (0x82) for partition in question

[3] Save new configuration
$ pfexec fdisk -F /tmp/fd <disk_name>p0

If fdisk(1M) fails, it would mean we have found the problem and it is not
necessary to run the installer, since in that case fdisk(1M) would be the
culprit.

If this works, then let's give the installer a try :-)
Comment 9 Gregg Sporar 2008-06-04 12:44:48 UTC
It appears the problem is with fdisk(1M).

The command line output from the experiment to change the type of partition 3
from Solaris2 back to Linux Swap looked like this:

jack@opensolaris:~$ pfexec fdisk -W /tmp/fd c6d0p0
jack@opensolaris:~$ pfexec gedit /tmp/fd
jack@opensolaris:~$ pfexec fdisk -F /tmp/fd c6d0p0
fdisk: Partition table exceeds the size of the disk.
fdisk: Error on entry "  15    0    254    63     1023    254    63     1023   
293507612 97214356".

And apparently because of that error, fdisk(1M) error-exited and did *not*
change the partition type of partition #3 to Linux swap (0x82), which fdisk(1M)
refers to as SUNIXOS.  Note: it appears fdisk(1M) uses base 10 numerals for
this, not hexadecimal - so the value I specified in /tmp/fd was 130.

I will attach the /tmp/fd file.  

Somehow, my original workaround of running fdisk(1M) interactively (via format
-e) avoids this problem because when invoked interactively, fdisk(1M) does not
seem to have any problems changing the type of a partition or with marking a
partition Active.  But when run with -F from the command line, it does not like
the entry for my final partition, which is an extended partition, and so it
exits without making any changes.
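
As a minimal sketch of how the rejected entry reads, the Python below splits it into its ten decimal fields. It assumes the column order Id, Act, Bhead, Bsect, Bcyl, Ehead, Esect, Ecyl, Rsect, Numsect for the file written by fdisk -W; the names FdiskEntry and parse_entry are hypothetical helpers for illustration, not part of fdisk(1M).

```python
from typing import NamedTuple


class FdiskEntry(NamedTuple):
    """One partition line of an fdisk -W style dump (assumed column order)."""
    part_id: int   # partition type, written in decimal
    act: int       # active flag
    bhead: int
    bsect: int
    bcyl: int
    ehead: int
    esect: int
    ecyl: int
    rsect: int     # relative (starting) sector
    numsect: int   # number of sectors


def parse_entry(line: str) -> FdiskEntry:
    """Parse one whitespace-separated, all-decimal partition line."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    return FdiskEntry(*(int(f) for f in fields))


# The entry quoted in the fdisk error message above.
entry = parse_entry(
    "  15    0    254    63     1023    254    63     1023    293507612 97214356"
)

# The type IDs are decimal in the file: 0xbf is 191 and 0x82 is 130.
SOLARIS2_ID = 0xBF
LINUX_SWAP_ID = 0x82

print(hex(entry.part_id))             # type of the rejected entry
print(entry.rsect + entry.numsect)    # first sector past the partition
print(SOLARIS2_ID, LINUX_SWAP_ID)
```

The rejected line has type 15 (0x0f, an extended partition), which is consistent with fdisk objecting to the untouched fourth entry rather than to the edited third one.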

This might also explain http://defect.opensolaris.org/bz/show_bug.cgi?id=2134 ?
Comment 10 Gregg Sporar 2008-06-04 12:46:11 UTC
Created an attachment (id=319) [details]
Output from fdisk, changed 191 to 130, then attempted to feed it back into
fdisk
Comment 11 Jan Damborsky 2008-06-05 03:46:58 UTC
It seems that fdisk(1M) thinks that the last primary partition exceeds the disk
capacity. When run with "-F", fdisk checks whether every partition fits on the
disk. Looking at how the fdisk(1M) code calculates the available disk size,
it uses two approaches:

[a] First, it tries to obtain size of disk in sectors using DKIOCGMEDIAINFO
    ioctl(2) - see dkio(7I) for more details.

    disk_size_a = dk_minfo.dki_capacity

[b] If [a] fails, size of the disk in sectors is calculated from
    CHS (cylinder/head/sector) geometry:

    disk_size_b = C*H*S

[b] is in most cases smaller than [a] due to rounding.

Number [b] is in this particular case
disk_size_b = 24321*255*63 = 390716865

And the last sector of 4th primary partition is
last_sector = 293507612 + 97214356 = 390721968

So the last sector of the 4th primary partition exceeds the last cylinder
available. However, this shouldn't cause trouble if fdisk(1M) is able to obtain
[a] and [a] is greater than or equal to the last sector of the 4th partition.
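
The arithmetic above can be sketched in Python as follows. This is only an illustration of the comparison described here, not the fdisk(1M) source; the function names are hypothetical, and the numbers are the ones quoted in this comment.

```python
def chs_capacity(cylinders: int, heads: int, sectors_per_track: int) -> int:
    """Disk size in sectors as derived from CHS geometry (approach [b])."""
    return cylinders * heads * sectors_per_track


def partition_end(rsect: int, numsect: int) -> int:
    """First sector past the partition; the last sector itself is end - 1."""
    return rsect + numsect


def fits_on_disk(rsect: int, numsect: int, disk_sectors: int) -> bool:
    """True when the partition lies entirely within disk_sectors."""
    return partition_end(rsect, numsect) <= disk_sectors


disk_size_b = chs_capacity(24321, 255, 63)
end_of_p4 = partition_end(293507612, 97214356)
overshoot = end_of_p4 - disk_size_b

print(disk_size_b, end_of_p4, overshoot)
print(fits_on_disk(293507612, 97214356, disk_size_b))
```

With the CHS-derived size, the 4th partition runs 5103 sectors past the end of the disk, so a check of this kind rejects it.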

I think there might be several scenarios, why fdisk(1M) might fail:

[1] 4th primary partition exceeds physical disk size
* it could be verified by taking a look at the disk size reported by some
  Linux  tool and compare the numbers.

[2] Solaris device driver reports less capacity than actually available
* it could be verified by taking a look at what "iostat -En" and some
  non-Solaris tool report as far as disk size is concerned.

[3] DKIOCGMEDIAINFO ioctl command fails for that disk, disk size calculated
    from CHS is smaller and thus last partition seems to exceed the disk size.

Gregg, could you please take a look, what "iostat -En" and some non-Solaris
tool report as far as disk size (reported either in bytes or in sectors)
is concerned ? 

Thank you
Comment 12 Jan Damborsky 2008-06-05 03:50:03 UTC
(In reply to comment #9)
> 
> This might also explain http://defect.opensolaris.org/bz/show_bug.cgi?id=2134 ?

Agreed - so far it seems bug 2134 might have the same root cause.
Comment 13 Gregg Sporar 2008-06-05 15:00:18 UTC
The output from iostat -En is:

gs145266@opensolaris-08.05-gs:~$ iostat -En
c5d0             Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Model: Hitachi HTS7220 Revision:  Serial No: 070719DP0400DTG Size: 200.05GB
<200047067136 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 
c4t0d0           Soft Errors: 0 Hard Errors: 6 Transport Errors: 0 
Vendor: MATSHITA Product: DVD-RAM UJ-852S  Revision: 1.01 Serial No:  
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 6 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 

I ran four additional tools, one under Windows Vista and the other three under
Ubuntu.  I will post one additional comment for each tool.

HTH - Gregg
Comment 14 Gregg Sporar 2008-06-05 15:01:11 UTC
from Paragon Partition Manager (a third-party tool that runs on Windows Vista):

Basic Hard Disk 0 (Hitachi HTS722020K9SA00)

Type: Basic Hard Disk Drive
Total size: 186.3 GB
Sectors per track: 63
Heads: 255
Cylinders: 24321

And then for that final partition, it reports:

                           Sector No:   Cyl:  Hd:  Sec:
First physical sector:    293,507,612  18270    0    63
Last physical sector:     390,721,967  24321   80    63
Comment 15 Gregg Sporar 2008-06-05 15:01:59 UTC
from fdisk on Ubuntu 7.10:

Command (m for help): p

Disk /dev/sda: 200.0 GB, 200049647616 bytes
255 heads, 63 sectors/track, 24321 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x02f35ceb

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         889     7137280   27  Unknown
Partition 1 does not end on cylinder boundary.
/dev/sda2             889        8973    64937318+   7  HPFS/NTFS
/dev/sda3   *        8974       18271    74678183+  bf  Solaris
/dev/sda4           18271       24322    48607178    f  W95 Ext'd (LBA)
/dev/sda5           18271       18394      995998+  83  Linux
/dev/sda6           18395       18882     3919828+  82  Linux swap / Solaris
/dev/sda7           18883       24322    43691287+  83  Linux
Comment 16 Gregg Sporar 2008-06-05 15:02:49 UTC
from sfdisk on Ubuntu 7.10: 

gs145266@gs145266-laptop-ubu:~$ sudo sfdisk -ls
/dev/sda: 195360984

Disk /dev/sda: 24321 cylinders, 255 heads, 63 sectors/track
Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sda1          0+    888-    889-   7137280   27  Unknown
/dev/sda2        888+   8972    8085-  64937318+   7  HPFS/NTFS
/dev/sda3   *   8973   18270-   9298-  74678183+  bf  Solaris
/dev/sda4      18270+  24321-   6052-  48607178    f  W95 Ext'd (LBA)
/dev/sda5      18270+  18393     124-    995998+  83  Linux
/dev/sda6      18394+  18881     488-   3919828+  82  Linux swap / Solaris
/dev/sda7      18882+  24321-   5440-  43691287+  83  Linux
/dev/sdb:    978944

Disk /dev/sdb: 1018 cylinders, 31 heads, 62 sectors/track
Warning: The partition table looks like it was made
  for C/H/S=*/65/32 (instead of 1018/31/62).
For this listing I'll assume that geometry.
Units = cylinders of 1064960 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdb1   *      0+    941-    942-    978928    6  FAT16
                end: (c,h,s) expected (941,18,32) found (956,64,32)
/dev/sdb2          0       -       0          0    0  Empty
/dev/sdb3          0       -       0          0    0  Empty
/dev/sdb4          0       -       0          0    0  Empty
total: 196339928 blocks


gs145266@gs145266-laptop-ubu:~$ sudo sfdisk -G
Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.
/dev/sda: 24321 cylinders, 255 heads, 63 sectors/track
/dev/sdb: 941 cylinders, 65 heads, 32 sectors/track
gs145266@gs145266-laptop-ubu:~$ sudo sfdisk -g
/dev/sda: 24321 cylinders, 255 heads, 63 sectors/track
/dev/sdb: 1018 cylinders, 31 heads, 62 sectors/track
Comment 17 Gregg Sporar 2008-06-05 15:03:35 UTC
from GParted on Ubuntu 7.10 :

Model: ATA Hitachi HTS72202
Size: 186.31 GiB
Path: /dev/sda
DiskLabelType: msdos
Heads: 255
Sectors/Track: 63
Cylinders: 24321
Total Sectors: 390716865

And for the final partition, it reports:

Filesystem: extended
Size: 46.36 GiB
Flags: lba
Path: /dev/sda4
First Sector: 293507612
Last Sector: 390721967
Total Sectors: 97214356
Comment 18 Jan Damborsky 2008-06-06 07:50:55 UTC
Gregg, thanks for providing all that data.

Looking at the output of Ubuntu fdisk, it reports 200049647616 bytes for
/dev/sda, which equals 390721968 sectors.

However, according to what iostat(1M) provides, Solaris sees 200047067136
bytes, which is 390716928 sectors.

Which means that fdisk(1M) thinks the disk is smaller and complains about last
partition exceeding the disk size.
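
The two sizes can be compared with a short Python sketch. It assumes 512-byte sectors (the sector size implied by the figures above); bytes_to_sectors is a hypothetical helper, and the partition values are those from the fdisk error in comment #9.

```python
SECTOR_SIZE = 512  # bytes per sector, assumed


def bytes_to_sectors(size_bytes: int) -> int:
    """Convert a byte count to whole sectors, rejecting partial sectors."""
    sectors, remainder = divmod(size_bytes, SECTOR_SIZE)
    if remainder:
        raise ValueError(f"{size_bytes} is not a multiple of {SECTOR_SIZE}")
    return sectors


linux_sectors = bytes_to_sectors(200049647616)    # Ubuntu fdisk
solaris_sectors = bytes_to_sectors(200047067136)  # Solaris iostat -En
missing = linux_sectors - solaris_sectors

# End of the 4th primary partition: start sector plus sector count.
end_of_p4 = 293507612 + 97214356

print(linux_sectors, solaris_sectors, missing)
print(end_of_p4 <= linux_sectors, end_of_p4 <= solaris_sectors)
```

The partition ends exactly at the size Linux reports, and 5040 sectors beyond the size Solaris reports.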

It seems for now that, from the installer's point of view, we can't do much to
solve this problem.

I think a potential solution might be to make fdisk(1M) more tolerant of what
other systems created and let it only check modified/created partitions and
skip unmodified ones as far as that kind of sanity checking is concerned.

Would you agree that the appropriate approach would be to file a bug against
fdisk(1M) in order to address this problem?
Comment 19 Gregg Sporar 2008-06-06 08:00:26 UTC
>Might you agree that the appropriate approach would be to file bug against fdisk(1M) in order to address this problem ?

Yes!  :-)

Just tell me what you want me to write and what bug system you want me to
report it in and I'll be happy to do that.

Thanks for all your help - Gregg
Comment 20 Jan Damborsky 2008-06-06 08:29:09 UTC
(In reply to comment #19)
> 
> Just tell me what you want me to write and what bug system you want me to
> report it in and I'll be happy to do that.

Please report this issue using the following bug report tool (you will probably
need to create an account before you can file it):

http://bugs.opensolaris.org/

Category/Subcategory: utility/fdisk
Release: snv_90
Hardware: x86

Please add the observations about the fdisk(1M) behavior when used with the -F
option - it refuses to modify the partition table if the last partition exceeds
the disk size Solaris can see, even if other systems see more space.

I think that fdisk(1M) should be more tolerant in cases when partition
was previously created by other system and is left unmodified - it seems
that interactive mode already works in this way, since you were able to
do all changes manually.
Please feel free to add any information you think might help to better
clarify the problem.

After you report the bug, could you please close this one and also bug 2134
as "trackedinbugster" and add "BugsterCR=<bug_number>" text to the Whiteboard ?

Thank you !

> 
> Thanks for all your help - Gregg

I thank you for the cooperation :-)
Jan
Comment 21 Jan Damborsky 2008-06-07 02:02:51 UTC
*** Bug 2134 has been marked as a duplicate of this bug. ***
Comment 22 Jürgen Keil 2008-06-23 05:52:40 UTC
(In reply to comment #18)

> Looking at the output of Ubuntu fdisk, it reports 200049647616 bytes for
> /dev/sda, which equals to 390721968 sectors.
> 
> However, according to what iostat(1M) provides, Solaris sees 200047067136
> bytes, which is 390716928 sectors.

Yep, that's because Solaris' ata driver:

- is looking at the default CHS translation values found in the disk's ATA
   IDENTIFY data and detects 

    ai_heads 0x10
    ai_sectors 0x3f
    ai_fixcyls 0x3fff

  (16 heads, 63 sectors / track)

- using the above 16 heads and 63 sectors per track, computes 
  390721968/(63*16) == 387621 cylinders

- calls ata_fix_large_disk_geometry(); this doubles the "heads" until the
  computed cylinders fits into a 16-bit unsigned short.  We end up with a 
  geometry of 128 heads, 63 sectors and 48452 cylinders

    390721968/(63*128) == 48452 cylinders

  (Note: this is different from what cmlb is using; cmlb uses 255 heads
   and 63 sectors / track)

- now, when we compute the number of sectors that we can access using that
  geometry, we get 48452 cyl * 63 sectors * 128 heads = 390716928 sectors


- When cmlb tries to find out the disk's capacity, ata returns the
  product of 48452 cyl * 63 sectors * 128 heads = 390716928 sectors as
  the capacity, *not* the disk's real capacity.
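
The steps listed above can be modelled with a short Python sketch. It follows only the description in this comment (double the heads until the cylinder count fits in 16 bits); it is not a transcription of ata_fix_large_disk_geometry(), which may apply further limits, and the Python names are hypothetical.

```python
MAX_CYLINDERS = 0xFFFF  # largest value of a 16-bit unsigned short


def fix_large_disk_geometry(total_sectors: int, heads: int, sectors: int):
    """Double heads until the computed cylinder count fits in 16 bits."""
    cylinders = total_sectors // (heads * sectors)
    while cylinders > MAX_CYLINDERS:
        heads *= 2
        cylinders = total_sectors // (heads * sectors)
    return cylinders, heads, sectors


real_capacity = 390721968  # sectors, the size Linux reports for this disk

# Start from the IDENTIFY defaults: 16 heads, 63 sectors per track.
cyl, heads, sectors = fix_large_disk_geometry(real_capacity, 16, 63)

# Capacity as computed from the adjusted geometry (cyl * heads * sectors).
geometry_capacity = cyl * heads * sectors
clipped = real_capacity - geometry_capacity

print(cyl, heads, sectors)
print(geometry_capacity, clipped)
```

The integer division in the last doubling step drops the partial cylinder, which accounts for the 5040 sectors missing from the capacity that ata returns.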

usr/src/uts/intel/io/dktp/controller/ata/ata_disk.c line 987 clips
a few sectors at the end of the drive:

   957    static int
   958    ata_disk_ioctl(opaque_t ctl_data, int cmd, intptr_t arg, int flag)
   959    {
   ...
   980        case DIOCTL_GETGEOM:
   981        case DIOCTL_GETPHYGEOM:
   982            tgdk.g_cyl = ata_drvp->ad_drvrcyl;
   983            tgdk.g_head = ata_drvp->ad_drvrhd;
   984            tgdk.g_sec = ata_drvp->ad_drvrsec;
   985            tgdk.g_acyl = ata_drvp->ad_acyl;
   986            tgdk.g_secsiz = 512;
   987            tgdk.g_cap = tgdk.g_cyl * tgdk.g_head * tgdk.g_sec;
   988            if (ddi_copyout(&tgdk, (caddr_t)arg, sizeof (tgdk), flag))
   989                return (EFAULT);
   990            return (0);


Question is: why does line 987 in
usr/src/uts/intel/io/dktp/controller/ata/ata_disk.c
compute the drive's capacity as (virtual) cylinders * heads * sectors?
Why doesn't it pass the ata_drvp->ad_capacity value?


It seems that if we change line 987 to

    tgdk.g_cap = ata_drvp->ad_capacity

the problem disappears.


A kmdb patch that implements such a fix for the OpenSolaris 2008.05 CD is:

Boot the OpenSolaris 2008.05 snv_86 kernel with -kd, and in kmdb use this:

  ::bp ata`_init
  :c
  ata_disk_ioctl+105?w b70f 8683 2 8900 2444 c710
  ata_disk_ioctl+118?w 838b 29c 0
  :c


Using this patch I've been able to work around the "Partition table exceeds
the size of the disk" problem, but that was with a 160GB drive,
from this thread  http://www.opensolaris.org/jive/thread.jspa?threadID=62970
Comment 23 Jürgen Keil 2008-06-23 06:07:21 UTC
Created an attachment (id=340) [details]
Shell script to create a 160GB hdd image, to reproduce fdisk -F failure under
qemu or virtualbox


This script creates a 160GB raw qemu hdd disk image, and includes a valid
fdisk partition table.

The fdisk partition information is from this thread:
    http://www.opensolaris.org/jive/thread.jspa?threadID=62970


Use qemu's qemu-img utility to convert it from raw format into virtualbox
vmdk format: qemu-img convert -f raw -O vmdk hdd160gb.img hdd160gb.vmdk


Using this HDD image under qemu or virtualbox can be used to reproduce 
bug 2133.
Comment 24 Jürgen Keil 2008-06-23 06:49:26 UTC
Created an attachment (id=341) [details]
Shell script to create a 200GB hdd image, to reproduce fdisk -F failure under
qemu or virtualbox

This script creates a 200GB raw qemu HDD image with the partition information 
from this bug (bug 2133).
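
For readers without access to the attachment, here is a minimal Python sketch of building an MBR sector that carries the 4th partition entry from this bug. It is not the attached shell script, whose contents are not reproduced in this thread. Only the entry's type, start sector and sector count come from this bug; the other three slots are left empty, the CHS fields use the 254/63/1023 values shown in the fdisk error in comment #9, and the function names are hypothetical.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446   # partition table starts here in the MBR
ENTRY_SIZE = 16
LBA_CHS = (254, 63, 1023)  # head, sector, cylinder used for both CHS fields


def pack_chs(head: int, sector: int, cylinder: int) -> bytes:
    """Encode CHS in the 3-byte MBR layout (cylinder bits 8-9 share byte 2)."""
    return bytes([
        head & 0xFF,
        (sector & 0x3F) | ((cylinder >> 2) & 0xC0),
        cylinder & 0xFF,
    ])


def pack_entry(active: bool, part_type: int, start: int, count: int) -> bytes:
    """Pack one 16-byte partition entry with little-endian LBA fields."""
    chs = pack_chs(*LBA_CHS)
    return struct.pack("<B3sB3sII", 0x80 if active else 0x00,
                       chs, part_type, chs, start, count)


def build_mbr(entries) -> bytes:
    """Build a 512-byte MBR; entries holds up to four tuples or None."""
    if len(entries) > 4:
        raise ValueError("an MBR holds at most four primary entries")
    sector = bytearray(SECTOR_SIZE)
    for slot, entry in enumerate(entries):
        if entry is None:
            continue  # leave the slot zeroed (empty)
        offset = TABLE_OFFSET + slot * ENTRY_SIZE
        sector[offset:offset + ENTRY_SIZE] = pack_entry(*entry)
    sector[510:512] = b"\x55\xaa"  # boot signature
    return bytes(sector)


# Slot 4: extended (LBA) partition, type 0x0f, values from this bug.
mbr = build_mbr([None, None, None, (False, 0x0F, 293507612, 97214356)])
print(len(mbr), mbr[510:].hex())
```

Writing this sector at the start of a file and extending the file to 390721968 * 512 bytes would give a raw image of the size Linux reports for this disk.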

I'm just testing my kmdb patch with this 200GB HDD image, and OpenSolaris
2008.05 now starts to install just fine.
Comment 25 Rob_C 2008-10-24 09:55:16 UTC
Note: The "SXCE_snv99 Installer" has difficulty installing SXCE in one
partition while preserving the WinXP drives in other partitions, and
repairing this bug is (at least) part of the solution to bug 4205.

http://defect.opensolaris.org/bz/show_bug.cgi?id=4205
Comment 26 Jan Damborsky 2009-06-10 23:31:04 UTC
*** Bug 9382 has been marked as a duplicate of this bug. ***