Bugzilla – Bug 1119
LZMA performance needs a boost
Last modified: 2009-03-12 10:46:05 UTC
You need to log in before you can comment on or make changes to this bug.
The time to startup desktop applications on RC0 has nearly doubled compared to DP2. On my Tecra M2 Firefox has increased from 38 sec to 1:03, Thunderbird from 38 sec to 1:04, and the installer has increased from 8 sec to 16. While the installer time is still acceptable, Firefox and Thunderbird at > 1 min are way too long.
Unfortunately, this is basically a result of using LZMA for compressing the usr file system. Need time to work on performance.
Indiana RC1 is horribly slow under VirtualBox compared to Preview 2 which was fast and could install in about 10 or 15 minutes. RC1 is so slow I'm guessing the time will be well over 1 hour. You can see from the VBox GUI that the problem is that the virtual CPU is CPU bound, as the disk activity icons only flicker lightly. By running: pfexec lockstat -kIW -s 20 sleep 20 | less while the install is running shows that we are spending a lot of our time in either Lzma_DecodeReal() - which I expect is the decompression. But then lots and lots of time in the HAT/kmem/vmem constantly allocating and mapping, then deallocating and demapping some sort of large kernel buffer. This is just broken and is the worst possible way to do things in a Virtual environment. At a minimum it should allocate buffers ahead of time and reuse them.
As gzip is the primary image for 2008.05 release, removing from blocker list.
Should we just turn this bug into a general "LZMA performance needs improvement" bug?
Makes sense. Changing summary accordingly.
*** Bug 5338 has been marked as a duplicate of this bug. ***
(In reply to comment #2) > Indiana RC1 is horribly slow under VirtualBox compared to Preview 2 > which was fast and could install in about 10 or 15 minutes. RC1 is so slow > I'm guessing the time will be well over 1 hour. You can see from the VBox > GUI that the problem is that the virtual CPU is CPU bound, as the disk > activity icons only flicker lightly. By running: > > pfexec lockstat -kIW -s 20 sleep 20 | less > > while the install is running shows that we are spending a lot of our time > in either Lzma_DecodeReal() - which I expect is the decompression. I've noticed the same, on 32-bit Pentium-M 1.3GHz, 2GB memory, installing (osol-0811-rc2-global.iso) to an 8 GB USB flash memory stick (on metal, no virtualbox): while the gui installer runs, system uses almost 100% kernel cpu time and interactive performance is *extremely bad* (something like 5 seconds until a key press echoes in a gnome-terminal). I also used a lockstat similar to the one above and found 70 - 80% of the kernel cpu time be to consumed by lzma decode. Total install time was almost 2 hours...
RC2 is only 65 minutes whereas it took ~ 2 hours for RC1 (on my system). Rob Someone (else) _might_ want to close this as "FIXED".
What I'm observing is that compressed lofi frequently gets a lot of 2K or 4K strategy read requests, all inside the same 128 KB compression segment. The same compressed data is decompressed over and over again, which should waste a lot of cpu cycles. I think that caching the contents of the last decompressed segment and reusing the cached decompressed data when we get another read request for that same segment might speed up installing from compressed lofi dramatically.
(In reply to comment #9) > I think that caching the contents of the last decompressed segment and reusing > the cached decompressed data when we get another read request for that same > segment might speed up installing from compressed lofi dramatically. Yep, a simple lofi change to cache and reuse the last decompressed segment speeds up reading from a gzip compressed solaris.zlib image by a factor of 5.5 ! Test case is: get the solaris.zlib from OS 2008.11 RC2, uncompress and compress it in gzip format. I measure the time we need to build a cpio archive from the hsfs mounted gzip'ed solaris.zlib image (this should be similar to what the gui-installer does during installation; omitting the time we need to extract the data to the install target hdd; and using gzip instead of lzma): SX:CE ~ build 104: # mount -F hsfs /files2/test/solaris.zlib /mnt # ptime sh -c "find /mnt | cpio -ocB > /dev/null" 4399150 blocks real 7:48.662871977 user 2.096814604 sys 38.577857142 And the same with a lofi module changed to cache and reuse the last decompressed data segment: # ptime sh -c "find /mnt | cpio -ocB > /dev/null" 4399150 blocks real 1:26.739143753 user 1.862050649 sys 32.694810881
(In reply to comment #9) For the original lzma encoded solaris.zlib image the time to create a cpio archive from this filesystem has improved from 20 min 35 sec to 2 min 49 sec, when caching and reusing the last decompressed data block.
(In reply to comment #0) > The time to startup desktop applications on RC0 has nearly doubled compared to > DP2. On my Tecra M2 Firefox has increased from 38 sec to 1:03, Thunderbird from > 38 sec to 1:04, and the installer has increased from 8 sec to 16. While the > installer time is still acceptable, Firefox and Thunderbird at > 1 min are way > too long. Using the OS 2008.11 RC2 global live-cd under virtualbox, without / with my lofi decompress cache change: startup time for firefox drops from 1:30 to 20 seconds.
Juergen, if you've got a patch, we'd love to experiment with it.
(In reply to comment #8) > RC2 is only 65 minutes whereas it took ~ 2 hours for RC1 (on my system). > Rob > > Someone (else) _might_ want to close this as "FIXED". I just installed RC2 under virtualbox, and installation time was 15 minutes... 8-) So I don't think this bug should be closed... Maybe it's priority should be raised, and a fix should be included for the 2008.11 release, but it's probably too late now (and the risk of breaking something is too high) ?
Created an attachment (id=1084) [details] Fix issue with compressed lofi images >= 4GB
Created an attachment (id=1085) [details] Fix kernel hang when compressed lofi image uses compressed segment size 0
(In reply to comment #15) > Created an attachment (id=1084) [details] [details] > Fix issue with compressed lofi images >= 4GB If you'd like to work on the fixes of these bugs filed as bugster CR 6778233 (seems like you would since you already have a suggested fix), I'd be happy to sponsor them for you.
Created an attachment (id=1086) [details] Cache and reuse decompressed data This is my experimental patch to cache and reuse uncompressed data segments. By patching the new kernel tunable "lofi_max_comp_cache" you can experiment with caching only the last decompressed segment (lofi_max_comp_cache = 1), disable caching decompressed data (lofi_max_comp_cache = 0), or caching two or more decompressed segments. Using lofi_max_comp_cache > 1 appears to be ~ 5-10% faster compared to lofi_max_comp_cache == 1.
Created an attachment (id=1087) [details] current opensolaris lofi module, with my three lofi patches, + lzma support Compiled lofi module, with the patches included in this bug, + lzma support from http://cr.opensolaris.org/~nadkarni/clofi-1/
Created an attachment (id=1088) [details] patch to increase hsfs hash table size; hsfs is slow with the opensolaris solaris.zlib fs hsfs uses a hash table with 32 entries; when used with opensolaris 2008.11 solaris.zlib (~ 100000 files), we have hash chains using 100000 / 32 = 3125 entries. Functions hs_findhash and hs_freenode start to consume a signigicant amount of kernel cpu time. Quick fix is to increase the size of the hash table from 32 entries to something reasonable.
(In reply to comment #7) > I've noticed the same, on 32-bit Pentium-M 1.3GHz, 2GB memory, installing > (osol-0811-rc2-global.iso) to an 8 GB USB flash memory stick (on metal, > no virtualbox): while the gui installer runs, system uses almost 100% kernel > cpu time and interactive performance is *extremely bad* (something like 5 > seconds > until a key press echoes in a gnome-terminal). I also used a lockstat > similar to the one above and found 70 - 80% of the kernel cpu time be to > consumed by lzma decode. I repeated this install with an RC2 image including my suggested fixes for lofi and hsfs, and the system now actually has idle time during installation, apparently I/O waiting for the CD / optical device... > Total install time was almost 2 hours... ... and the total install time has dropped to 30 minutes. On the same box, startup time for firefox in the live-cd environment has improved from 2 minutes to 20 seconds.
I ran some tests with a modified global image, using just the lofi module in attachment 1087 [details], hadn't rebuilt hsfs yet. virtual box (1024 MB memory) on Ultra 24: installation time improved from 38:22 to 8:13 Toshiba Tecra M5 (3072 MB memory, Core 2 duo): installation time improved from 44:36 to 33:04 Amusingly, in the M5 case the boot time went up by a minute or so, because the CD kept spinning down since we weren't keeping it hot. Will have to see if re-optimizing layout helps there, but might be other choices to make. We'll be running some more tests with other systems to get a better profile of the performance effect.
(In reply to comment #22) > I ran some tests with a modified global image, using just the lofi module in > attachment 1087 [details] [details], hadn't rebuilt hsfs yet. > > virtual box (1024 MB memory) on Ultra 24: > installation time improved from 38:22 to 8:13 Was it installed from a media in an optical drive, or did you use an ISO image on HDD? > Toshiba Tecra M5 (3072 MB memory, Core 2 duo): > installation time improved from 44:36 to 33:04 > > Amusingly, in the M5 case the boot time went up by a minute or so, because the > CD kept spinning down since we weren't keeping it hot. Will have to see if > re-optimizing layout helps there, but might be other choices to make. Did the optical drive spin down frequently during installation on the Toshiba Tecra M5, too? Could that be an explanation why the install time did not improve that much, compared to the virtualbox case? I just compared the time it needs to boot the livecd from the two keyboard layout / language prompts into the gnome desktop: 2min 20sec, on an AMD64 X2, 8GB memory. The lofi changes seem to have no influence in this test case? > We'll be running some more tests with other systems to get a better profile of > the performance effect.
Here are some more numbers based on ISO image that Dave created which has the updated lofi driver compared to global iso. System Global Iso dminer-lzma-12-01.iso Dell Latitude D531 34.5 min 11.5 Dell Optiflex 330 23 min 6.05 min V65x (32bit system) 58 min 18 min Compaq 6710b 33.5 min 34 min For Compaq6710b, the optical drive kept starting and stopping constantly for the image with the modified lofi driver. I am a little suspicious of the drive so I will retry that with an external drive.
(In reply to comment #24) > System Global Iso dminer-lzma-12-01.iso ... > Compaq 6710b 33.5 min 34 min > > For Compaq6710b, the optical drive kept starting and stopping constantly for > the image with the modified lofi driver. An idea would be to play with the optical device's IDLE and STANDBY timers. Apparently these can be controlled with SCSI MODE SELECT, mode page 0x1A. There already is a define in the kernel sources (MODEPAGE_POWER_COND, in usr/src/uts/common/sys/scsi/generic/mode.h) but it does not seem to be used anywhere in the Solaris kernel (with the exception of the sata framework, but that code contains a "not implemented" comment. Btw. I do remember that on some older systems, there used to be a BIOS setup page to control ata drive standby / idle timers. Maybe that Compaq 6710b or the Terca M5 has something like that?
Created an attachment (id=1104) [details] powercond utility to dump an optical drive's idle and standby timers To test my theory that Dave's Tecra M5 and Sanjay's Compaq 6710b enter idle or standby mode after a few seconds: what is reported by the powercond utility in this attachment? For a P-ATA SONY "DVD RW AW-G170A" optical drive I found: # ./powercond /dev/rdsk/c1t1d0p0 IDLE timer is enabled, 20.0 seconds STANDBY timer is enabled, 480.0 seconds
(In reply to comment #23) > (In reply to comment #22) > > I ran some tests with a modified global image, using just the lofi module in > > attachment 1087 [details] [details] [details], hadn't rebuilt hsfs yet. > > > > virtual box (1024 MB memory) on Ultra 24: > > installation time improved from 38:22 to 8:13 > > Was it installed from a media in an optical drive, > or did you use an ISO image on HDD? > This was just using an ISO image. > > > Toshiba Tecra M5 (3072 MB memory, Core 2 duo): > > installation time improved from 44:36 to 33:04 > > > > Amusingly, in the M5 case the boot time went up by a minute or so, because the > > CD kept spinning down since we weren't keeping it hot. Will have to see if > > re-optimizing layout helps there, but might be other choices to make. > > Did the optical drive spin down frequently during installation > on the Toshiba Tecra M5, too? Could that be an explanation > why the install time did not improve that much, compared to > the virtualbox case? > I didn't notice the drive spinning down frequently during installation, certainly not the way it was during boot. > > > I just compared the time it needs to boot the livecd from the > two keyboard layout / language prompts into the gnome desktop: > 2min 20sec, on an AMD64 X2, 8GB memory. The lofi changes > seem to have no influence in this test case? They should have some influence, though the construction process for the CD attempts to optimize the sort order of the solaris.zlib HSFS to limit seeking, so the lofi changes may be less helpful here. The powercond results from the M5: # ./powercond /dev/rdsk/c6t0d0p0 IDLE timer is enabled, 5.0 seconds STANDBY timer is enabled, 30.0 seconds Very short idle time, obviously. I don't remember any way to control this in the bios, though I'll recheck that.
(In reply to comment #27) > The powercond results from the M5: > # ./powercond /dev/rdsk/c6t0d0p0 > IDLE timer is enabled, 5.0 seconds > STANDBY timer is enabled, 30.0 seconds > > Very short idle time, obviously. I don't remember any way to control this in > the bios, though I'll recheck that. What happens when you boot the lofi modified livecd with option "-s", login in single user mode, manually configure the network interface, pull the "powercond" into the livecd environment via ftp, and use it with options "-i 20 -s 60" (to change the IDLE / STANDBY times to 20 / 60 seconds), and after that continue with the boot process into the gnome desktop? Would that still spin down the optical device during gnome startup?
(In reply to comment #28) > What happens when you boot the lofi modified livecd with > option "-s", login in single user mode, manually configure > the network interface, pull the "powercond" into the livecd > environment via ftp, and use it with options "-i 20 -s 60" > (to change the IDLE / STANDBY times to 20 / 60 seconds), > and after that continue with the boot process into the > gnome desktop? Would that still spin down the optical > device during gnome startup? Tried this. Still get some spinning up and down. I ran through the install, and did note some spinning up and down, especially later in the transfer process. Install timel was 30:07, so it seems to have helped a bit.
I added an hsfs with the hash table patch and re-ran, though without the powercond tweak, and it actually got 10% slower on the M5, though it's possible my probing was somewhat to blame. What appears to happen is that it cruises along very nicely until about the 65% mark, getting there in less than 10 minutes, and then starts acting pathologically badly, starting and stopping the CD drive. Watching the cpio process with "truss -t open", at that point it's working its way through /usr/share, and really struggling.
(In reply to comment #22) > Amusingly, in the M5 case the boot time went up by a minute or so, because the > CD kept spinning down since we weren't keeping it hot. Will have to see if > re-optimizing layout helps there, but might be other choices to make. On my Toshiba Tecra S1 (1.3 GHz Pentium-M, 1 core, 32-bit, 2GB memory) I see the opposite; time to start the gnome desktop after the keyboard layout and language prompts went down from 3min:30sec to 2min:30sec. That is the only system so far where I observed a big improvement in desktop startup time. What about cpu power management? Is that enabled when booted from the livecd, on your Tecra M5? It isn't supported on my Tecra S1. Maybe the M5 is running more often at reduced cpu frequency now, because the boot process into the gnome desktop consumes less cpu? And when we finally need some cpu cycles to decompress another lzma block from the compressed lofi file, the decompression runs at the slowest cpu speed, and in the end this slows down the boot process?
(In reply to comment #31) > > What about cpu power management? Is that enabled when booted from the livecd, > on your Tecra M5? It isn't supported on my Tecra S1. Maybe the M5 is running > more often at reduced cpu frequency now, because the boot process into the > gnome desktop consumes less cpu? And when we finally need some cpu cycles to > decompress another lzma block from the compressed lofi file, the decompression > runs at the slowest cpu speed, and in the end this slows down the boot process? Yes, the M5 does CPU power management. I tried running it with the CPU threshold turned up to 30s, didn't make any appreciable difference in boot or installation time, pretty similar times and behavior were observed. The boot time issue is interesting since this machine is the exact one that I use for generating the ISO optimization that's applied to the release, so it should work better than anything else out there. The solaris.zlib layout is optimized for 32-bit booting, though, so just for fun I also booted it in 32-bit mode and ran the install, with the default power management settings. Boot time was still unremarkable, though a bit better than in 64-bit. The install, however, was *much* faster, 13:50 (this image now has the hsfs patch applied as well). Very strange. Tests so far with only the lofi changes on 12 systems and 2 virtual machine guests (VMware and VirtualBox) show an average improvement of 50% in both installation time and std. deviation of installation time. Further tests with the hsfs modification are in progress.
comment #30 by dminer jogged my memory about the behavior on the Compaq system. The install seemed to run smoothly until it started hitting about 70%. It stalled there for a while and there was a great deal of churning/head movement on the DVD. The image used was w/o hsfs tuning. I need to investigate what was occurring at that point.
Booting the liveCD(dminer-lzma-12-02.iso) in 32bit mode on compaq 6710b and installing it took 15.5 minutes. A previous run on the same system using the same image booted in 64 bit mode took 34 minutes.
(In reply to comment #22) > Amusingly, in the M5 case the boot time went up by a minute or so, because the > CD kept spinning down since we weren't keeping it hot. Will have to see if > re-optimizing layout helps there, but might be other choices to make. Two ideas: 1. Is it possible that the optical drive in the M5 is getting some sort of media errors with that lzma optimized iso image? I think that error recovery in the Teac DVD-ROM/CD-Writer in my Tecra S1 sometimes spins down the media, moves the laser picup to the lead-in position (?), spins up the media and retries the read attempt. 2. Is it possible that the optical drive in that M5 is using constant linear velocity (CLV) instead of constant angular velocity (CAV), so that the drive has to increase / decrease the rotational speed of the media when reading the inner / outer part of the ISO? (I do notice that the optical drive in my Tecra S1 is performing quite a few seek operations during desktop startup, but doesn't accelerate of slow down rotating the optical media; so apparently it is using CAV) On the M5, what kind of transfer rates do you get in "iostat -xnz 5" for the lzma iso media when you read it with "dd if=/dev/rdsk/c?t?d0p0 of=/dev/null bs=32k" ? Is the transfer rate always identical during that dd run (CLV), or is is slowly increasing (CAV) ?
(In reply to comment #30) > I added an hsfs with the hash table patch and re-ran, though without the > powercond tweak, and it actually got 10% slower on the M5, though it's possible > my probing was somewhat to blame. What appears to happen is that it cruises > along very nicely until about the 65% mark, getting there in less than 10 > minutes, and then starts acting pathologically badly, starting and stopping the > CD drive. Watching the cpio process with "truss -t open", at that point it's > working its way through /usr/share, and really struggling. Somewhere around the 50% - 70% mark I also notice quite a few seeks by the laser pic-up. Are there any attempts to minimize seek operations on the optical drive during the cpio copy to hdd? I don't think so, and it seems that optimizing the file layout on the livecd for desktop boot performance might not help the cpio performance. I made some experiments with sorting the cpio file lists by the "extent location" of each file (transfer_mod.py), and that seems to have reduced the number of seek operations during the cpio transfer from optical media to hdd (but it did not eliminate all seeks). For my Tecra S1 the cpio time was reduced from 19m15s to 15m; for an AMD64 X2 box the cpio time was reduced from 5m40s to 5m9s. So, by optimizing the order of the file lists the cpio time can be reduced by another 10 - 20%.
Created an attachment (id=1120) [details] sort cpio file lists by file extent location, in an attempt to reduce seek ops on the optical drive This patch to the python transfer_mod.py tries to minimize the number of optical drive seek operations during the cpio transfer process, by sorting the cpio file lists by "file location on disk". For a 1.3GHz 32-bit Pentium-M this has reduced installation time by 5min (19m15s -> 15m), or 20% less time during cpio. For a 2.2Ghz 64-bit AMD64 X2 the installation time was reduced by 30sec (5m40s -> 5m09s), or 10% less time used for cpio.
(In reply to comment #35) > 1. Is it possible that the optical drive in the M5 is getting some sort of > media errors with that lzma optimized iso image? I think that error > recovery in the Teac DVD-ROM/CD-Writer in my Tecra S1 sometimes spins down > the media, moves the laser picup to the lead-in position (?), spins up > the media and retries the read attempt. > Thought about that, but it doesn't seem to be the problem, and the specific media I'm using is a new DVD+RW. > On the M5, what kind of transfer rates do you get in "iostat -xnz 5" > for the lzma iso media when you read it with > "dd if=/dev/rdsk/c?t?d0p0 of=/dev/null bs=32k" ? > Is the transfer rate always identical during that dd run (CLV), > or is is slowly increasing (CAV) ? Transfer rates start out about 3276 kr/s, then jump to about 12400 kr/s, so CAV seems likely.
(In reply to comment #38) > > On the M5, what kind of transfer rates do you get in "iostat -xnz 5" > > for the lzma iso media when you read it with > > "dd if=/dev/rdsk/c?t?d0p0 of=/dev/null bs=32k" ? > > Is the transfer rate always identical during that dd run (CLV), > > or is is slowly increasing (CAV) ? > > Transfer rates start out about 3276 kr/s, then jump to about 12400 kr/s, so CAV > seems likely. Hmm, that's quite a big jump from 3276 kr/s -> 12400 kr/s ! Did you observe the 3276 kr/s only for the first 5 second iostat interval, after starting the dd? For a DVD-RW media, I get transfer rates starting at 7455 kr/s and slowly increasing to 9420 kr/s (ignoring the first and last 5 second iostat measurement, because it does not include i/o for the full 5 second interval): r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 129.8 0.6 4153.7 0.0 0.0 0.6 0.0 4.4 1 58 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 233.0 0.0 7455.9 0.0 0.0 1.0 0.0 4.1 1 96 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 237.2 0.0 7590.5 0.0 0.0 1.0 0.0 4.0 1 96 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 241.6 0.0 7731.2 0.0 0.0 1.0 0.0 4.0 1 96 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 246.0 0.0 7872.0 0.0 0.0 1.0 0.0 3.9 1 96 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 250.4 0.0 8012.8 0.0 0.0 1.0 0.0 3.8 1 96 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 254.6 0.0 8147.3 0.0 0.0 1.0 0.0 3.8 1 96 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 259.2 0.0 8294.5 0.0 0.0 1.0 0.0 3.7 1 96 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 263.6 0.0 8435.0 0.0 0.0 1.0 0.0 3.6 1 96 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 267.8 0.0 8569.6 0.0 0.0 1.0 0.0 3.6 1 96 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 272.4 0.0 8716.8 0.0 0.0 1.0 0.0 3.5 1 95 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 276.6 0.0 8851.3 0.0 0.0 1.0 0.0 3.5 1 95 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 281.2 0.0 8998.3 0.0 0.0 1.0 0.0 3.4 1 95 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 285.4 0.0 9132.9 0.0 0.0 1.0 0.0 3.3 1 95 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 289.8 0.0 9273.6 0.0 0.0 1.0 0.0 3.3 1 96 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 294.4 0.0 9420.7 0.0 0.0 1.0 0.0 3.2 1 95 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 24.4 0.0 778.0 0.0 0.0 0.1 0.0 3.2 0 8 c1t1d0 ^C In case that drive in the Tera M5 really reads ~ 3 MB/sec for the inner area of the media, and 12 MB/sec in the outer area, then I suspect the drive did change the rotational speed of the media while dd was reading it.
(In reply to comment #34) > Booting the liveCD(dminer-lzma-12-02.iso) in 32bit mode on compaq 6710b and > installing it took 15.5 minutes. A previous run on the same system using the > same image booted in 64 bit mode took 34 minutes. When that compaq 6710b is booted from an installed opensolaris system (both 32-bit and 64-bit), and you mount the solaris.zlib compressed filesystem image from the optical media with something like mount -F hsfs /media/OpenSolaris/solaris.zlib /mnt and after that read all of the files from that media: find /mnt/ -type f -print | xargs digest -a md5 > /dev/null How much time does that need to read all the files, in 32-bit vs 64-bit mode? Is 32-bit twice as fast as 64-bit?
(In reply to comment #38) > > On the M5, what kind of transfer rates do you get in "iostat -xnz 5" > > for the lzma iso media when you read it with > > "dd if=/dev/rdsk/c?t?d0p0 of=/dev/null bs=32k" ? > > Is the transfer rate always identical during that dd run (CLV), > > or is is slowly increasing (CAV) ? > > Transfer rates start out about 3276 kr/s, then jump to about 12400 kr/s, so CAV > seems likely. I just found a Samsung drive that is showing such a jump in transfer rates when using a DVD+RW media: Vendor: TSSTcorp Product: CD/DVDW SH-S183A Revision: SB00 Serial No: This drive starts to read with transfer rates of 4730.7 kr/s and slowly increases to 5579.9 kr/s, and then jumps to 19340.9 kr/s. Apparently this drive does not return "EOF" after all recorded blocks have been read, and the drive returned zero filled blocks after that, probably without accessing the media at all. # dd if=/dev/rdsk/c15t1d0p0 of=/dev/null bs=32k 143444+0 blocks in 143444+0 blocks out dd did read 4.7 GB of data from this DVD+RW media, although the recorded OS 2008.11 image has a size of only 672 MB. # cdrw -M Device : TSSTcorp CD/DVDW SH-S183A Firmware : Rev. SB00 ( 0804 ) Track No. |Type |Start address ----------+--------+------------- 1 |Data |0 Leadout |Data |2295104 Last session start address: 0 A workaround for the dd test would be to limit the number of 32k reads, using "dd if=/dev/rdsk/c?t?d0p0 of=/dev/null bs=32k count=20500" so that we stop reading after 670 MB, before we access unrecorded DVD+RW sectors. But the box with that Samsung drive and a DVD+RW media still doesn't reproduce the issue with the drive spinning down during live-cd boot into the gnome desktop...
(In reply to comment #27) > (In reply to comment #23) > > (In reply to comment #22) > > > I ran some tests with a modified global image, using just the lofi module in > > > attachment 1087 [details] [details] [details] [details], hadn't rebuilt hsfs yet. > > > > > > virtual box (1024 MB memory) on Ultra 24: > > > installation time improved from 38:22 to 8:13 > > > > Was it installed from a media in an optical drive, > > or did you use an ISO image on HDD? > > > > This was just using an ISO image. > > > ... I find that using VirtualBox with an RC2 .iso image Installer is painfully slow (in it's booting) compared to using the same image, the same way, with VMware. _Perhaps_ the reading (and decoding) of the .iso is grabbing a lot of memory and allowing less to be available for the "un-lzma-ing". Whatever the reason the difference is very noticeable. I think that either burning a disk, using VMware "direct .iso", or doing a bare-metal installation will give a more accurate timing result than using VirtualBox "direct .iso" for timing the installation - until VBox is fixed. Rob
(In reply to comment #42) > I find that using VirtualBox with an RC2 .iso image Installer is painfully slow > (in it's booting) compared to using the same image, the same way, with VMware. What exactly is slow? The initial boot, from the GRUB menu until we get to the language / keyboard layout prompt, esp. when GRUB displays the series of dots at the top of the console screen? This is very fast in VirtualBox, when the host cpu is AMD and AMD-V hardware virtualization is enabled (GRUB loads the kernel in 3 seconds on my AMD box under virtualbox). When AMD-V is disabled, GRUB bootstrap procedure is an order of magnitude slower. On Intel there isn't much difference between VT-X enabled / disabled, it's always quite slow in the initial GRUB boot phase. AFAIK, virtualbox is able to run x86 real mode code directly on an AMD cpu when AMD-V is enabled, but is slowly emulating real mode code in software on Intel cpus.
Created an attachment (id=1162) [details] test results from install group Results from my test images. Substantial performance improvements, no failures other than low-memory machines which were known to have problems. Some outliers that we're still investigating. Summary installation times: 2008.11 global lofi lofi+hsfs Mean 60.63 40.41 38.05 Median 48 32 28 Std. Deviation 40.4 31.14 25.73
(In reply to comment #41) ... > > A workaround for the dd test would be to limit the number of > 32k reads, using "dd if=/dev/rdsk/c?t?d0p0 of=/dev/null bs=32k count=20500" > so that we stop reading after 670 MB, before we access unrecorded > DVD+RW sectors. > Re-ran this dd test on the M5. Ignoring first and last iostat results, it starts at 3507 kr/s and ramps to 4499 kr/s, indicating CAV behavior. I also built an image with a re-optimized solaris.zlib for booting, and it showed no difference in boot performance; the DVD spinning up and down is still observed.
(In reply to comment #45) > I also built an image with a re-optimized solaris.zlib for booting, and it > showed no difference in boot performance; the DVD spinning up and down is still > observed. The DVD spinning up and down happens on your Tecra M5, in 64-bit mode? The solaris.zlib is optimized for a 32-bit boot? Does it spin up and down during a 32-bit boot, too?
(In reply to comment #46) > The DVD spinning up and down happens on your Tecra M5, in 64-bit mode? > The solaris.zlib is optimized for a 32-bit boot? Does it spin up and down > during a 32-bit boot, too? Yes, the DVD behavior is essentially the same whether booted in 32- or 64-bit mode.
(In reply to comment #47) > (In reply to comment #46) > > > The DVD spinning up and down happens on your Tecra M5, in 64-bit mode? > > The solaris.zlib is optimized for a 32-bit boot? Does it spin up and down > > during a 32-bit boot, too? > > Yes, the DVD behavior is essentially the same whether booted in 32- or 64-bit > mode. But install time in 32-bit mode was 14 minutes, vs. 42 minutes in 64-bit mode? Is there less spinning up and down during install in 32-bit mode? Why was the install in 32-bit mode so much faster compared to 64-bit mode?
(In reply to comment #48) > (In reply to comment #47) > > (In reply to comment #46) > > > > > The DVD spinning up and down happens on your Tecra M5, in 64-bit mode? > > > The solaris.zlib is optimized for a 32-bit boot? Does it spin up and down > > > during a 32-bit boot, too? > > > > Yes, the DVD behavior is essentially the same whether booted in 32- or 64-bit > > mode. > > But install time in 32-bit mode was 14 minutes, vs. 42 minutes in 64-bit mode? > Is there less spinning up and down during install in 32-bit mode? > Yes, and yes. > Why was the install in 32-bit mode so much faster compared to 64-bit mode? That is the mystery at this point.
There are additional optimizations possible: 1) Remove unnecessary bcopy of uncompressed segments. 2) Do not cache uncompressed segments. Only cache those that need decompression. 3) In corollary to the above remove an unnecessary kmem_alloc when accessing uncompressed segments. I will try these out shortly.
Raising the priority of this as it's important to address this if we can.
Created an attachment (id=1364) [details] Cache and reuse decompressed data Add a few comments to the code
Created an attachment (id=1372) [details] Fix checks for compressed segment size in lofi module and lofiadm
(In reply to comment #22) > I ran some tests with a modified global image, using just the lofi module in > attachment 1087 [details] [details], hadn't rebuilt hsfs yet. > > virtual box (1024 MB memory) on Ultra 24: > installation time improved from 38:22 to 8:13 > > Toshiba Tecra M5 (3072 MB memory, Core 2 duo): > installation time improved from 44:36 to 33:04 > > Amusingly, in the M5 case the boot time went up by a minute or so, because the > CD kept spinning down since we weren't keeping it hot. Will have to see if > re-optimizing layout helps there, but might be other choices to make. Maybe I've found an optical drive that does something similar: c1t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Vendor: TSSTcorp Product: CD/DVDW SH-S183A Revision: SB00 Serial No: While gui-installer is copying CD contents to HDD the drive seems to change read speed on it's own (between 40x / 32x / 24x / 16x ...?). In 40x mode the drive makes a lot of noise; and it is much more quiet in the other modes. But it doesn't spin down the media completely. During installation I hear that the drive sometimes makes more and sometimes makes less noise. And it happens both with and without lzma decompress optimization in the lofi module. Installation time (for 106a) with the changed lofi module is a bit better (18m30s instead of 22m). Boot time for the LiveCD is identical.
(In reply to comment #37) > Created an attachment (id=1120) [details] [details] > sort cpio file lists by file extent location, in an attempt to reduce seek ops > on the optical drive > > This patch to the python transfer_mod.py tries to minimize the number of > optical drive seek operations during the cpio transfer process, by sorting > the cpio file lists by "file location on disk". > > > For a 1.3GHz 32-bit Pentium-M this has reduced installation time by 5min > (19m15s -> 15m), or 20% less time during cpio. > > For a 2.2Ghz 64-bit AMD64 X2 the installation time was reduced by 30sec > (5m40s -> 5m09s), or 10% less time used for cpio. Any chance that we could get this change added to the installer? It reduces the number of seek operation on the optical device while transferring CD contents to the HDD, and saves a couple of minutes of installation time on systems that have an optcal drive with slow seek times.
(In reply to comment #55) > (In reply to comment #37) > > Created an attachment (id=1120) [details] [details] [details] > > sort cpio file lists by file extent location, in an attempt to reduce seek ops > > on the optical drive > > > > This patch to the python transfer_mod.py tries to minimize the number of > > optical drive seek operations during the cpio transfer process, by sorting > > the cpio file lists by "file location on disk". > > > > > > For a 1.3GHz 32-bit Pentium-M this has reduced installation time by 5min > > (19m15s -> 15m), or 20% less time during cpio. > > > > For a 2.2Ghz 64-bit AMD64 X2 the installation time was reduced by 30sec > > (5m40s -> 5m09s), or 10% less time used for cpio. > > Any chance that we could get this change added to the installer? > > It reduces the number of seek operation on the optical device while > transferring CD contents to the HDD, and saves a couple of minutes > of installation time on systems that have an optcal drive with slow > seek times. I have split this change into a separate bug - 6690 transfer module should sort file list by location so that this defect (1119) can be used to track just the performance enhancements to lofi.
(In reply to comment #18) > Created an attachment (id=1086) [details] [details] > Cache and reuse decompressed data > > This is my experimental patch to cache and reuse uncompressed data segments. > > By patching the new kernel tunable "lofi_max_comp_cache" you can experiment > with caching only the last decompressed segment (lofi_max_comp_cache = 1), > disable caching decompressed data (lofi_max_comp_cache = 0), or caching > two or more decompressed segments. > > Using lofi_max_comp_cache > 1 appears to be ~ 5-10% faster compared to > lofi_max_comp_cache == 1. While adding LZMA support into ON, I've been running some tests and it seems to me that for read requests that straddle multiple compressed segments it would be beneficial to cache the last two decompressed segments (lofi_max_comp_cache = 2) as opposed to just the last one.
(In reply to comment #49) > (In reply to comment #48) > > (In reply to comment #47) > > > (In reply to comment #46) > > > > > > > The DVD spinning up and down happens on your Tecra M5, in 64-bit mode? > > > > The solaris.zlib is optimized for a 32-bit boot? Does it spin up and down > > > > during a 32-bit boot, too? > > > > > > Yes, the DVD behavior is essentially the same whether booted in 32- or 64-bit > > > mode. > > > > But install time in 32-bit mode was 14 minutes, vs. 42 minutes in 64-bit mode? > > Is there less spinning up and down during install in 32-bit mode? > > > > Yes, and yes. > > > Why was the install in 32-bit mode so much faster compared to 64-bit mode? > > That is the mystery at this point. A new theory for the Tecra M5 issue: What drivers are used for the HDD and optical drive with the M5? Is it possible that both devices share one ide channel? So that I/O on the optical drive blocks I/O on the HDD - and vice versa?
(In reply to comment #58) > A new theory for the Tecra M5 issue: > > What drivers are used for the HDD and optical drive with the M5? > Is it possible that both devices share one ide channel? > So that I/O on the optical drive blocks I/O on the HDD - and vice versa? An interesting theory, but not the case here. While the internal drive and the optical drive both appear to be hung off of the 82801 SATA IDE controller, all of the test installs I'm reporting are to external USB hard drives (I have a couple of different ones I use, depending on whether I'm at home or the office).
The lofi caching changes are being tracked 6805505