Bug 1119 - LZMA performance needs a boost
: LZMA performance needs a boost
Status: CLOSED TRACKEDINBUGSTER
Product: livecd
application
: unspecified
: i86pc/i386 OpenSolaris
: P2 major (vote)
: 2009.04
Assigned To: Sanjay Nadkarni
:
:
: BugsterCR=6805505
:
:
: 6372
  Show dependency treegraph
 
Reported: 2008-04-11 10:45 UTC by Frank Ludolph
Modified: 2009-03-12 10:46 UTC (History)
7 users (show)

See Also:


Attachments
Fix issue with compressed lofi images >= 4GB (5.73 KB, patch)
2008-12-01 09:22 UTC, Jürgen Keil
no flags Details | Diff
Fix kernel hang when compressed lofi image uses compressed segment size 0 (921 bytes, patch)
2008-12-01 09:23 UTC, Jürgen Keil
no flags Details | Diff
Cache and reuse decompressed data (12.63 KB, patch)
2008-12-01 09:28 UTC, Jürgen Keil
no flags Details | Diff
current opensolaris lofi module, with my three lofi patches, + lzma support (33.89 KB, application/octet-stream)
2008-12-01 09:33 UTC, Jürgen Keil
no flags Details
patch to increase hsfs hash table size; hsfs is slow with the opensolaris solaris.zlib fs (17.74 KB, patch)
2008-12-01 09:39 UTC, Jürgen Keil
no flags Details | Diff
powercond utility to dump an optical drive's idle and standby timers (7.30 KB, application/octet-stream)
2008-12-04 08:21 UTC, Jürgen Keil
no flags Details
sort cpio file lists by file extent location, in an attempt to reduce seek ops on the optical drive (2.82 KB, patch)
2008-12-08 03:24 UTC, Jürgen Keil
no flags Details | Diff
test results from install group (13.10 KB, application/vnd.oasis.opendocument.spreadsheet)
2008-12-12 09:32 UTC, Dave Miner
no flags Details
Cache and reuse decompressed data (15.88 KB, patch)
2009-02-14 11:44 UTC, Jürgen Keil
no flags Details | Diff
Fix checks for compressed segment size in lofi module and lofiadm (2.65 KB, patch)
2009-02-17 06:45 UTC, Jürgen Keil
no flags Details | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description Frank Ludolph 2008-04-11 10:45:29 UTC
The time to startup desktop applications on RC0 has nearly doubled compared to
DP2. On my Tecra M2 Firefox has increased from 38 sec to 1:03, Thunderbird from
38 sec to 1:04, and the installer has increased from 8 sec to 16. While the
installer time is still acceptable, Firefox and Thunderbird at > 1 min are way
too long.
Comment 1 Dave Miner 2008-04-14 11:33:18 UTC
Unfortunately, this is basically a result of using LZMA for compressing the usr
file system.  Need time to work on performance.
Comment 2 Joe Bonasera 2008-04-18 10:52:26 UTC
Indiana RC1 is horribly slow under VirtualBox compared to Preview 2
which was fast and could install in about 10 or 15 minutes.  RC1 is so slow
I'm guessing the time will be well over 1 hour. You can see from the VBox
GUI that the problem is that the virtual CPU is CPU bound, as the disk
activity icons only flicker lightly. By running:

pfexec lockstat -kIW -s 20 sleep 20 | less 

while the install is running shows that we are spending a lot of our time
in either Lzma_DecodeReal() - which I expect is the decompression.
But then lots and lots of time in the HAT/kmem/vmem constantly allocating and
mapping, then deallocating and demapping some sort of large kernel buffer.
This is just broken and is the worst possible way to do things in a
Virtual environment. At a minimum it should allocate
buffers ahead of time and reuse them.
Comment 3 Dave Miner 2008-04-24 11:41:40 UTC
As gzip is the primary image for 2008.05 release, removing from blocker list.
Comment 4 David Comay 2008-11-09 19:29:26 UTC
Should we just turn this bug into a general "LZMA performance needs
improvement" bug?
Comment 5 Alok Aggarwal 2008-11-10 06:24:46 UTC
Makes sense. Changing summary accordingly.
Comment 6 Alok Aggarwal 2008-11-25 12:47:03 UTC
*** Bug 5338 has been marked as a duplicate of this bug. ***
Comment 7 Jürgen Keil 2008-11-26 01:55:55 UTC
(In reply to comment #2)
> Indiana RC1 is horribly slow under VirtualBox compared to Preview 2
> which was fast and could install in about 10 or 15 minutes.  RC1 is so slow
> I'm guessing the time will be well over 1 hour. You can see from the VBox
> GUI that the problem is that the virtual CPU is CPU bound, as the disk
> activity icons only flicker lightly. By running:
> 
> pfexec lockstat -kIW -s 20 sleep 20 | less 
> 
> while the install is running shows that we are spending a lot of our time
> in either Lzma_DecodeReal() - which I expect is the decompression.

I've noticed the same, on 32-bit Pentium-M 1.3GHz, 2GB memory, installing 
(osol-0811-rc2-global.iso) to an 8 GB USB flash memory stick (on metal,
no virtualbox): while the gui installer runs, system uses almost 100% kernel
cpu time and interactive performance is *extremely bad* (something like 5
seconds
until a key press echoes in a gnome-terminal).  I also used a lockstat
similar to the one above and found 70 - 80% of the kernel cpu time be to
consumed by lzma decode.

Total install time was almost 2 hours...
Comment 8 Rob_C 2008-11-28 11:52:35 UTC
RC2 is only 65 minutes whereas it took ~ 2 hours for RC1 (on my system).
Rob

Someone (else) _might_ want to close this as "FIXED".
Comment 9 Jürgen Keil 2008-12-01 01:53:17 UTC
What I'm observing is that compressed lofi frequently gets a lot of 2K or 4K
strategy read requests, all inside the same 128 KB compression segment.  The
same compressed data is decompressed over and over again, which should waste
a lot of cpu cycles.

I think that caching the contents of the last decompressed segment and reusing
the cached decompressed data when we get another read request for that same
segment might speed up installing from compressed lofi dramatically.
Comment 10 Jürgen Keil 2008-12-01 03:40:17 UTC
(In reply to comment #9)

> I think that caching the contents of the last decompressed segment and reusing
> the cached decompressed data when we get another read request for that same
> segment might speed up installing from compressed lofi dramatically.

Yep, a simple lofi change to cache and reuse the last decompressed segment
speeds up reading from a gzip compressed solaris.zlib image by a factor
of 5.5 !

Test case is:  get the solaris.zlib from OS 2008.11 RC2, uncompress and
compress it in gzip format.

I measure the time we need to build a cpio archive from the hsfs mounted
gzip'ed solaris.zlib image (this should be similar to what the gui-installer
does during installation; omitting the time we need to extract the data to
the install target hdd; and using gzip instead of lzma):

SX:CE ~ build 104:

# mount -F hsfs /files2/test/solaris.zlib /mnt
# ptime sh -c "find /mnt | cpio -ocB > /dev/null"
4399150 blocks

real     7:48.662871977
user        2.096814604
sys        38.577857142


And the same with a lofi module changed to cache and reuse the last
decompressed
data segment:

# ptime sh -c "find /mnt | cpio -ocB > /dev/null"
4399150 blocks

real     1:26.739143753
user        1.862050649
sys        32.694810881
Comment 11 Jürgen Keil 2008-12-01 04:25:52 UTC
(In reply to comment #9)

For the original lzma encoded solaris.zlib image the time to create a cpio 
archive from this filesystem has improved from 20 min 35 sec to 2 min 49 sec,
when caching and reusing the last decompressed data block.
Comment 12 Jürgen Keil 2008-12-01 08:01:24 UTC
(In reply to comment #0)
> The time to startup desktop applications on RC0 has nearly doubled compared to
> DP2. On my Tecra M2 Firefox has increased from 38 sec to 1:03, Thunderbird from
> 38 sec to 1:04, and the installer has increased from 8 sec to 16. While the
> installer time is still acceptable, Firefox and Thunderbird at > 1 min are way
> too long.

Using the OS 2008.11 RC2 global live-cd under virtualbox, without / with my
lofi decompress cache change:  startup time for firefox drops from 1:30 to
20 seconds.
Comment 13 Dave Miner 2008-12-01 08:47:18 UTC
Juergen, if you've got a patch, we'd love to experiment with it.
Comment 14 Jürgen Keil 2008-12-01 09:17:26 UTC
(In reply to comment #8)
> RC2 is only 65 minutes whereas it took ~ 2 hours for RC1 (on my system).
> Rob
> 
> Someone (else) _might_ want to close this as "FIXED".

I just installed RC2 under virtualbox, and installation time was 15 minutes...
8-)

So I don't think this bug should be closed...

Maybe it's priority should be raised, and a fix should be included for the
2008.11
release, but it's probably too late now (and the risk of breaking something
is too high) ?
Comment 15 Jürgen Keil 2008-12-01 09:22:19 UTC
Created an attachment (id=1084) [details]
Fix issue with compressed lofi images >= 4GB
Comment 16 Jürgen Keil 2008-12-01 09:23:36 UTC
Created an attachment (id=1085) [details]
Fix kernel hang when compressed lofi image uses compressed segment size 0
Comment 17 Alok Aggarwal 2008-12-01 09:28:16 UTC
(In reply to comment #15)
> Created an attachment (id=1084) [details] [details]
> Fix issue with compressed lofi images >= 4GB

If you'd like to work on the fixes of these bugs
filed as bugster CR 6778233 (seems like you would
since you already have a suggested fix), I'd be
happy to sponsor them for you.
Comment 18 Jürgen Keil 2008-12-01 09:28:55 UTC
Created an attachment (id=1086) [details]
Cache and reuse decompressed data

This is my experimental patch to cache and reuse uncompressed data segments.

By patching the new kernel tunable "lofi_max_comp_cache" you can experiment
with caching only the last decompressed segment (lofi_max_comp_cache = 1),
disable caching decompressed data (lofi_max_comp_cache = 0), or caching
two or more decompressed segments.

Using lofi_max_comp_cache > 1 appears to be ~ 5-10% faster compared to
lofi_max_comp_cache == 1.
Comment 19 Jürgen Keil 2008-12-01 09:33:33 UTC
Created an attachment (id=1087) [details]
current opensolaris lofi module, with my three lofi patches, + lzma support

Compiled lofi module, with the patches included in this bug,
+ lzma support from http://cr.opensolaris.org/~nadkarni/clofi-1/
Comment 20 Jürgen Keil 2008-12-01 09:39:16 UTC
Created an attachment (id=1088) [details]
patch to increase hsfs hash table size; hsfs is slow with the opensolaris
solaris.zlib fs

hsfs uses a hash table with 32 entries; when used with opensolaris 2008.11
solaris.zlib (~ 100000 files), we have hash chains using 100000 / 32 = 3125
entries.  Functions hs_findhash and hs_freenode start to consume a signigicant
amount of kernel cpu time.

Quick fix is to increase the size of the hash table from 32 entries to
something
reasonable.
Comment 21 Jürgen Keil 2008-12-02 02:06:37 UTC
(In reply to comment #7)

> I've noticed the same, on 32-bit Pentium-M 1.3GHz, 2GB memory, installing 
> (osol-0811-rc2-global.iso) to an 8 GB USB flash memory stick (on metal,
> no virtualbox): while the gui installer runs, system uses almost 100% kernel
> cpu time and interactive performance is *extremely bad* (something like 5
> seconds
> until a key press echoes in a gnome-terminal).  I also used a lockstat
> similar to the one above and found 70 - 80% of the kernel cpu time be to
> consumed by lzma decode.

I repeated this install with an RC2 image including my suggested fixes for lofi
and hsfs, and the system now actually has idle time during installation,
apparently I/O waiting for the CD / optical device...

> Total install time was almost 2 hours...

... and the total install time has dropped to 30 minutes.


On the same box, startup time for firefox in the live-cd environment has
improved from 2 minutes to 20 seconds.
Comment 22 Dave Miner 2008-12-03 11:35:22 UTC
I ran some tests with a modified global image, using just the lofi module in
attachment 1087 [details], hadn't rebuilt hsfs yet.

virtual box (1024 MB memory) on Ultra 24:
installation time improved from 38:22 to 8:13

Toshiba Tecra M5 (3072 MB memory, Core 2 duo):
installation time improved from 44:36 to 33:04

Amusingly, in the M5 case the boot time went up by a minute or so, because the
CD kept spinning down since we weren't keeping it hot.  Will have to see if
re-optimizing layout helps there, but might be other choices to make.

We'll be running some more tests with other systems to get a better profile of
the performance effect.
Comment 23 Jürgen Keil 2008-12-04 06:00:57 UTC
(In reply to comment #22)
> I ran some tests with a modified global image, using just the lofi module in
> attachment 1087 [details] [details], hadn't rebuilt hsfs yet.
> 
> virtual box (1024 MB memory) on Ultra 24:
> installation time improved from 38:22 to 8:13

Was it installed from a media in an optical drive,
or did you use an ISO image on HDD?


> Toshiba Tecra M5 (3072 MB memory, Core 2 duo):
> installation time improved from 44:36 to 33:04
> 
> Amusingly, in the M5 case the boot time went up by a minute or so, because the
> CD kept spinning down since we weren't keeping it hot.  Will have to see if
> re-optimizing layout helps there, but might be other choices to make.

Did the optical drive spin down frequently during installation
on the Toshiba Tecra M5, too?  Could that be an explanation
why the install time did not improve that much, compared to
the virtualbox case?



I just compared the time it needs to boot the livecd from the
two keyboard layout / language prompts into the gnome desktop:
2min 20sec, on an AMD64 X2, 8GB memory.  The lofi changes
seem to have no influence in this test case?


> We'll be running some more tests with other systems to get a better profile of
> the performance effect.
Comment 24 Sanjay Nadkarni 2008-12-04 06:41:33 UTC
Here are some more numbers based on ISO image that Dave created which has the
updated lofi driver compared to global iso.  
System                Global Iso              dminer-lzma-12-01.iso
Dell Latitude D531     34.5 min                11.5
Dell Optiflex  330        23 min               6.05 min
V65x (32bit system)     58 min               18 min
Compaq 6710b             33.5 min            34 min 

For Compaq6710b, the optical drive kept starting and stopping constantly for
the image with the modified lofi driver. I am a little suspicious of the drive
so I will retry that with an external drive.
Comment 25 Jürgen Keil 2008-12-04 07:01:25 UTC
(In reply to comment #24)

> System                Global Iso              dminer-lzma-12-01.iso
...
> Compaq 6710b             33.5 min            34 min 
> 
> For Compaq6710b, the optical drive kept starting and stopping constantly for
> the image with the modified lofi driver.

An idea would be to play with the optical device's IDLE and STANDBY timers.
Apparently these can be controlled with SCSI MODE SELECT, mode page 0x1A.

There already is a define in the kernel sources (MODEPAGE_POWER_COND,
in usr/src/uts/common/sys/scsi/generic/mode.h) but it does not seem to be
used anywhere in the Solaris kernel (with the exception of the sata framework,
but that code contains a "not implemented" comment.

Btw. I do remember that on some older systems, there used to be a BIOS setup
page to control ata drive standby / idle timers.  Maybe that Compaq 6710b or
the Terca M5 has something like that?
Comment 26 Jürgen Keil 2008-12-04 08:21:13 UTC
Created an attachment (id=1104) [details]
powercond utility to dump an optical drive's idle and standby timers

To test my theory that Dave's Tecra M5 and Sanjay's Compaq 6710b 
enter idle or standby mode after a few seconds:  what is reported
by the powercond utility in this attachment?

For a P-ATA SONY "DVD RW AW-G170A" optical drive I found:

# ./powercond /dev/rdsk/c1t1d0p0
IDLE timer is enabled, 20.0 seconds
STANDBY timer is enabled, 480.0 seconds
Comment 27 Dave Miner 2008-12-04 08:37:00 UTC
(In reply to comment #23)
> (In reply to comment #22)
> > I ran some tests with a modified global image, using just the lofi module in
> > attachment 1087 [details] [details] [details], hadn't rebuilt hsfs yet.
> > 
> > virtual box (1024 MB memory) on Ultra 24:
> > installation time improved from 38:22 to 8:13
> 
> Was it installed from a media in an optical drive,
> or did you use an ISO image on HDD?
> 

This was just using an ISO image.

> 
> > Toshiba Tecra M5 (3072 MB memory, Core 2 duo):
> > installation time improved from 44:36 to 33:04
> > 
> > Amusingly, in the M5 case the boot time went up by a minute or so, because the
> > CD kept spinning down since we weren't keeping it hot.  Will have to see if
> > re-optimizing layout helps there, but might be other choices to make.
> 
> Did the optical drive spin down frequently during installation
> on the Toshiba Tecra M5, too?  Could that be an explanation
> why the install time did not improve that much, compared to
> the virtualbox case?
> 

I didn't notice the drive spinning down frequently during installation,
certainly not the way it was during boot.

> 
> 
> I just compared the time it needs to boot the livecd from the
> two keyboard layout / language prompts into the gnome desktop:
> 2min 20sec, on an AMD64 X2, 8GB memory.  The lofi changes
> seem to have no influence in this test case?

They should have some influence, though the construction process for the CD
attempts to optimize the sort order of the solaris.zlib HSFS to limit seeking,
so the lofi changes may be less helpful here.

The powercond results from the M5:
# ./powercond /dev/rdsk/c6t0d0p0
IDLE timer is enabled, 5.0 seconds
STANDBY timer is enabled, 30.0 seconds

Very short idle time, obviously.  I don't remember any way to control this in
the bios, though I'll recheck that.
Comment 28 Jürgen Keil 2008-12-04 09:42:33 UTC
(In reply to comment #27)

> The powercond results from the M5:
> # ./powercond /dev/rdsk/c6t0d0p0
> IDLE timer is enabled, 5.0 seconds
> STANDBY timer is enabled, 30.0 seconds
> 
> Very short idle time, obviously.  I don't remember any way to control this in
> the bios, though I'll recheck that.

What happens when you boot the lofi modified livecd with
option "-s", login in single user mode, manually configure
the network interface, pull the "powercond" into the livecd
environment via ftp, and use it with options "-i 20 -s 60"
(to change the IDLE / STANDBY times to 20 / 60 seconds),
and after that continue with the boot process into the
gnome desktop?   Would that still spin down the optical
device during gnome startup?
Comment 29 Dave Miner 2008-12-04 11:35:57 UTC
(In reply to comment #28)

> What happens when you boot the lofi modified livecd with
> option "-s", login in single user mode, manually configure
> the network interface, pull the "powercond" into the livecd
> environment via ftp, and use it with options "-i 20 -s 60"
> (to change the IDLE / STANDBY times to 20 / 60 seconds),
> and after that continue with the boot process into the
> gnome desktop?   Would that still spin down the optical
> device during gnome startup?

Tried this.  Still get some spinning up and down.  I ran through the install,
and did note some spinning up and down, especially later in the transfer
process.  Install timel was 30:07, so it seems to have helped a bit.
Comment 30 Dave Miner 2008-12-04 15:10:04 UTC
I added an hsfs with the hash table patch and re-ran, though without the
powercond tweak, and it actually got 10% slower on the M5, though it's possible
my probing was somewhat to blame.  What appears to happen is that it cruises
along very nicely until about the 65% mark, getting there in less than 10
minutes, and then starts acting pathologically badly, starting and stopping the
CD drive.  Watching the cpio process with "truss -t open", at that point it's
working its way through /usr/share, and really struggling.
Comment 31 Jürgen Keil 2008-12-05 03:18:26 UTC
(In reply to comment #22)

> Amusingly, in the M5 case the boot time went up by a minute or so, because the
> CD kept spinning down since we weren't keeping it hot.  Will have to see if
> re-optimizing layout helps there, but might be other choices to make.

On my Toshiba Tecra S1 (1.3 GHz Pentium-M, 1 core, 32-bit, 2GB memory)
I see the opposite; time to start the gnome desktop after the keyboard layout
and language prompts went down from 3min:30sec to 2min:30sec.  That is the
only system so far where I observed a big improvement in desktop startup time.


What about cpu power management?  Is that enabled when booted from the livecd,
on your Tecra M5?  It isn't supported on my Tecra S1.  Maybe the M5 is running
more often at reduced cpu frequency now, because the boot process into the
gnome desktop consumes less cpu?  And when we finally need some cpu cycles to
decompress another lzma block from the compressed lofi file, the decompression
runs at the slowest cpu speed, and in the end this slows down the boot process?
Comment 32 Dave Miner 2008-12-05 09:51:03 UTC
(In reply to comment #31)
> 
> What about cpu power management?  Is that enabled when booted from the livecd,
> on your Tecra M5?  It isn't supported on my Tecra S1.  Maybe the M5 is running
> more often at reduced cpu frequency now, because the boot process into the
> gnome desktop consumes less cpu?  And when we finally need some cpu cycles to
> decompress another lzma block from the compressed lofi file, the decompression
> runs at the slowest cpu speed, and in the end this slows down the boot process?

Yes, the M5 does CPU power management.  I tried running it with the CPU
threshold turned up to 30s, didn't make any appreciable difference in boot or
installation time, pretty similar times and behavior were observed.  The boot
time issue is interesting since this machine is the exact one that I use for
generating the ISO optimization that's applied to the release, so it should
work better than anything else out there.  The solaris.zlib layout is optimized
for 32-bit booting, though, so just for fun I also booted it in 32-bit mode and
ran the install, with the default power management settings.  Boot time was
still unremarkable, though a bit better than in 64-bit.  The install, however,
was *much* faster, 13:50 (this image now has the hsfs patch applied as well). 
Very strange.

Tests so far with only the lofi changes on 12 systems and 2 virtual machine
guests (VMware and VirtualBox) show an average improvement of 50% in both
installation time and std. deviation of installation time.  Further tests with
the hsfs modification are in progress.
Comment 33 Sanjay Nadkarni 2008-12-05 10:39:01 UTC
comment #30 by dminer jogged my memory about the behavior on the Compaq system.
 The install seemed to run smoothly until it started hitting about 70%.  It
stalled there for a while and there was a great deal of churning/head movement
on the DVD. The image used was w/o hsfs tuning.

I need to investigate what was occurring at that point.
Comment 34 Sanjay Nadkarni 2008-12-06 14:16:35 UTC
Booting the liveCD(dminer-lzma-12-02.iso) in 32bit mode on compaq 6710b and
installing it took 15.5 minutes.  A previous run on the same system using the
same image booted in 64 bit mode took 34 minutes.
Comment 35 Jürgen Keil 2008-12-08 02:58:43 UTC
(In reply to comment #22)

> Amusingly, in the M5 case the boot time went up by a minute or so, because the
> CD kept spinning down since we weren't keeping it hot.  Will have to see if
> re-optimizing layout helps there, but might be other choices to make.

Two ideas:

1.  Is it possible that the optical drive in the M5 is getting some sort of
    media errors with that lzma optimized iso image?  I think that error
    recovery in the Teac DVD-ROM/CD-Writer in my Tecra S1 sometimes spins down
    the media, moves the laser picup to the lead-in position (?), spins up
    the media and retries the read attempt.


2.  Is it possible that the optical drive in that M5 is using 
    constant linear velocity (CLV) instead of constant angular velocity (CAV),
    so that the drive has to increase / decrease the rotational speed of the
    media when reading the inner / outer part of the ISO?

    (I do notice that the optical drive in my Tecra S1 is performing quite
    a few seek operations during desktop startup, but doesn't accelerate of
    slow down rotating the optical media; so apparently it is using CAV)


On the M5, what kind of transfer rates do you get in "iostat -xnz 5"
for the lzma iso media when you read it with
"dd if=/dev/rdsk/c?t?d0p0 of=/dev/null bs=32k" ?
Is the transfer rate always identical during that dd run (CLV),
or is is slowly increasing (CAV) ?
Comment 36 Jürgen Keil 2008-12-08 03:14:36 UTC
(In reply to comment #30)
> I added an hsfs with the hash table patch and re-ran, though without the
> powercond tweak, and it actually got 10% slower on the M5, though it's possible
> my probing was somewhat to blame.  What appears to happen is that it cruises
> along very nicely until about the 65% mark, getting there in less than 10
> minutes, and then starts acting pathologically badly, starting and stopping the
> CD drive.  Watching the cpio process with "truss -t open", at that point it's
> working its way through /usr/share, and really struggling.


Somewhere around the 50% - 70% mark I also notice quite a few seeks by the
laser
pic-up.

Are there any attempts to minimize seek operations on the optical drive during
the cpio copy to hdd?  I don't think so, and it seems that optimizing the file
layout on the livecd for desktop boot performance might not help the cpio
performance.


I made some experiments with sorting the cpio file lists by the "extent
location"
of each file (transfer_mod.py), and that seems to have reduced the number of
seek operations during the cpio transfer from optical media to hdd (but it did
not eliminate all seeks).  For my Tecra S1 the cpio time was reduced from
19m15s
to 15m;  for an AMD64 X2 box the cpio time was reduced from 5m40s to 5m9s.
So, by optimizing the order of the file lists the cpio time can be reduced by
another 10 - 20%.
Comment 37 Jürgen Keil 2008-12-08 03:24:20 UTC
Created an attachment (id=1120) [details]
sort cpio file lists by file extent location, in an attempt to reduce seek ops
on the optical drive

This patch to the python transfer_mod.py tries to minimize the number of 
optical drive seek operations during the cpio transfer process, by sorting
the cpio file lists by "file location on disk".


For a 1.3GHz 32-bit Pentium-M this has reduced installation time by 5min
(19m15s -> 15m), or 20% less time during cpio.

For a 2.2Ghz 64-bit AMD64 X2 the installation time was reduced by 30sec
(5m40s -> 5m09s), or 10% less time used for cpio.
Comment 38 Dave Miner 2008-12-08 13:55:59 UTC
(In reply to comment #35)

> 1.  Is it possible that the optical drive in the M5 is getting some sort of
>     media errors with that lzma optimized iso image?  I think that error
>     recovery in the Teac DVD-ROM/CD-Writer in my Tecra S1 sometimes spins down
>     the media, moves the laser picup to the lead-in position (?), spins up
>     the media and retries the read attempt.
> 

Thought about that, but it doesn't seem to be the problem, and the specific
media I'm using is a new DVD+RW.


> On the M5, what kind of transfer rates do you get in "iostat -xnz 5"
> for the lzma iso media when you read it with
> "dd if=/dev/rdsk/c?t?d0p0 of=/dev/null bs=32k" ?
> Is the transfer rate always identical during that dd run (CLV),
> or is is slowly increasing (CAV) ?

Transfer rates start out about 3276 kr/s, then jump to about 12400 kr/s, so CAV
seems likely.
Comment 39 Jürgen Keil 2008-12-09 04:05:24 UTC
(In reply to comment #38)

> > On the M5, what kind of transfer rates do you get in "iostat -xnz 5"
> > for the lzma iso media when you read it with
> > "dd if=/dev/rdsk/c?t?d0p0 of=/dev/null bs=32k" ?
> > Is the transfer rate always identical during that dd run (CLV),
> > or is is slowly increasing (CAV) ?
> 
> Transfer rates start out about 3276 kr/s, then jump to about 12400 kr/s, so CAV
> seems likely.

Hmm, that's quite a big jump from 3276 kr/s -> 12400 kr/s !
Did you observe the 3276 kr/s only for the first 5 second iostat
interval, after starting the dd?


For a DVD-RW media, I get transfer rates starting at 7455 kr/s and
slowly increasing to 9420 kr/s (ignoring the first and last 5 second
iostat measurement, because it does not include i/o for the full
5 second interval):

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  129.8    0.6 4153.7    0.0  0.0  0.6    0.0    4.4   1  58 c1t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  233.0    0.0 7455.9    0.0  0.0  1.0    0.0    4.1   1  96 c1t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  237.2    0.0 7590.5    0.0  0.0  1.0    0.0    4.0   1  96 c1t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  241.6    0.0 7731.2    0.0  0.0  1.0    0.0    4.0   1  96 c1t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  246.0    0.0 7872.0    0.0  0.0  1.0    0.0    3.9   1  96 c1t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  250.4    0.0 8012.8    0.0  0.0  1.0    0.0    3.8   1  96 c1t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  254.6    0.0 8147.3    0.0  0.0  1.0    0.0    3.8   1  96 c1t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  259.2    0.0 8294.5    0.0  0.0  1.0    0.0    3.7   1  96 c1t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  263.6    0.0 8435.0    0.0  0.0  1.0    0.0    3.6   1  96 c1t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  267.8    0.0 8569.6    0.0  0.0  1.0    0.0    3.6   1  96 c1t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  272.4    0.0 8716.8    0.0  0.0  1.0    0.0    3.5   1  95 c1t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  276.6    0.0 8851.3    0.0  0.0  1.0    0.0    3.5   1  95 c1t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  281.2    0.0 8998.3    0.0  0.0  1.0    0.0    3.4   1  95 c1t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  285.4    0.0 9132.9    0.0  0.0  1.0    0.0    3.3   1  95 c1t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  289.8    0.0 9273.6    0.0  0.0  1.0    0.0    3.3   1  96 c1t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  294.4    0.0 9420.7    0.0  0.0  1.0    0.0    3.2   1  95 c1t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   24.4    0.0  778.0    0.0  0.0  0.1    0.0    3.2   0   8 c1t1d0
^C


In case that drive in the Tera M5 really reads ~ 3 MB/sec for the inner
area of the media, and 12 MB/sec in the outer area, then I suspect the
drive did change the rotational speed of the media while dd was reading it.
Comment 40 Jürgen Keil 2008-12-09 05:41:37 UTC
(In reply to comment #34)
> Booting the liveCD(dminer-lzma-12-02.iso) in 32bit mode on compaq 6710b and
> installing it took 15.5 minutes.  A previous run on the same system using the
> same image booted in 64 bit mode took 34 minutes.

When that compaq 6710b is booted from an installed opensolaris system
(both 32-bit and 64-bit), and you mount the solaris.zlib compressed filesystem
image from the optical media with something like

    mount -F hsfs /media/OpenSolaris/solaris.zlib /mnt

and after that read all of the files from that media:

    find /mnt/ -type f -print | xargs digest -a md5 > /dev/null


How much time does that need to read all the files, in 32-bit vs 64-bit mode?
Is 32-bit twice as fast as 64-bit?
Comment 41 Jürgen Keil 2008-12-09 08:27:10 UTC
(In reply to comment #38)

> > On the M5, what kind of transfer rates do you get in "iostat -xnz 5"
> > for the lzma iso media when you read it with
> > "dd if=/dev/rdsk/c?t?d0p0 of=/dev/null bs=32k" ?
> > Is the transfer rate always identical during that dd run (CLV),
> > or is is slowly increasing (CAV) ?
> 
> Transfer rates start out about 3276 kr/s, then jump to about 12400 kr/s, so CAV
> seems likely.

I just found a Samsung drive that is showing such a jump in 
transfer rates when using a DVD+RW media:

  Vendor: TSSTcorp Product: CD/DVDW SH-S183A Revision: SB00 Serial No:  


This drive starts to read with transfer rates of 4730.7 kr/s and slowly
increases to 5579.9 kr/s, and then jumps to 19340.9 kr/s.  Apparently
this drive does not return "EOF" after all recorded blocks have been read,
and the drive returned zero filled blocks after that, probably without
accessing the media at all.

# dd if=/dev/rdsk/c15t1d0p0 of=/dev/null bs=32k
143444+0 blocks in
143444+0 blocks out

dd did read 4.7 GB of data from this DVD+RW media, although the recorded
OS 2008.11 image has a size of only 672 MB.


# cdrw -M

Device : TSSTcorp CD/DVDW SH-S183A
Firmware : Rev. SB00 (  0804      )

Track No. |Type    |Start address
----------+--------+-------------
 1        |Data    |0
Leadout   |Data    |2295104

Last session start address: 0



A workaround for the dd test would be to limit the number of
32k reads, using "dd if=/dev/rdsk/c?t?d0p0 of=/dev/null bs=32k count=20500"
so that we stop reading after 670 MB, before we access unrecorded
DVD+RW sectors.



But the box with that Samsung drive and a DVD+RW media
still doesn't reproduce the issue with the drive spinning
down during live-cd boot into the gnome desktop...
Comment 42 Rob_C 2008-12-09 11:42:39 UTC
(In reply to comment #27)
> (In reply to comment #23)
> > (In reply to comment #22)
> > > I ran some tests with a modified global image, using just the lofi module in
> > > attachment 1087 [details] [details] [details] [details], hadn't rebuilt hsfs yet.
> > > 
> > > virtual box (1024 MB memory) on Ultra 24:
> > > installation time improved from 38:22 to 8:13
> > 
> > Was it installed from a media in an optical drive,
> > or did you use an ISO image on HDD?
> > 
> 
> This was just using an ISO image.
> 
> > ...

I find that using VirtualBox with an RC2 .iso image Installer is painfully slow
(in it's booting) compared to using the same image, the same way, with VMware.

_Perhaps_ the reading (and decoding) of the .iso is grabbing a lot of memory
and allowing less to be available for the "un-lzma-ing". Whatever the reason
the difference is very noticeable.

I think that either burning a disk, using VMware "direct .iso", or doing a
bare-metal installation will give a more accurate timing result than using 
VirtualBox "direct .iso" for timing the installation - until VBox is fixed.

Rob
Comment 43 Jürgen Keil 2008-12-10 01:55:17 UTC
(In reply to comment #42)

> I find that using VirtualBox with an RC2 .iso image Installer is painfully slow
> (in it's booting) compared to using the same image, the same way, with VMware.

What exactly is slow? The initial boot, from the GRUB menu until we get to the
language / keyboard layout prompt, esp. when GRUB displays the series of dots
at the top of the console screen?

This is very fast in VirtualBox, when the host cpu is AMD and AMD-V hardware
virtualization is enabled (GRUB loads the kernel in 3 seconds on my AMD box
under virtualbox).  When AMD-V is disabled, GRUB bootstrap procedure is an 
order of magnitude slower.

On Intel there isn't much difference between VT-X enabled / disabled, it's
always quite slow in the initial GRUB boot phase.

AFAIK, virtualbox is able to run x86 real mode code directly on an AMD cpu
when AMD-V is enabled, but is slowly emulating real mode code in software
on Intel cpus.
Comment 44 Dave Miner 2008-12-12 09:32:46 UTC
Created an attachment (id=1162) [details]
test results from install group

Results from my test images.  Substantial performance improvements, no failures
other than low-memory machines which were known to have problems.  Some
outliers that we're still investigating.

Summary installation times:
        2008.11 global  lofi    lofi+hsfs
Mean            60.63    40.41    38.05
Median            48    32    28
Std. Deviation    40.4    31.14    25.73
Comment 45 Dave Miner 2008-12-17 13:23:07 UTC
(In reply to comment #41)
...
> 
> A workaround for the dd test would be to limit the number of
> 32k reads, using "dd if=/dev/rdsk/c?t?d0p0 of=/dev/null bs=32k count=20500"
> so that we stop reading after 670 MB, before we access unrecorded
> DVD+RW sectors.
> 

Re-ran this dd test on the M5.  Ignoring first and last iostat results, it
starts at 3507 kr/s and ramps to 4499 kr/s, indicating CAV behavior.

I also built an image with a re-optimized solaris.zlib for booting, and it
showed no difference in boot performance; the DVD spinning up and down is still
observed.
Comment 46 Jürgen Keil 2008-12-18 05:04:17 UTC
(In reply to comment #45)

> I also built an image with a re-optimized solaris.zlib for booting, and it
> showed no difference in boot performance; the DVD spinning up and down is still
> observed.

The DVD spinning up and down happens on your Tecra M5, in 64-bit mode?
The solaris.zlib is optimized for a 32-bit boot?  Does it spin up and down
during a 32-bit boot, too?
Comment 47 Dave Miner 2008-12-18 07:10:40 UTC
(In reply to comment #46)

> The DVD spinning up and down happens on your Tecra M5, in 64-bit mode?
> The solaris.zlib is optimized for a 32-bit boot?  Does it spin up and down
> during a 32-bit boot, too?

Yes, the DVD behavior is essentially the same whether booted in 32- or 64-bit
mode.
Comment 48 Jürgen Keil 2008-12-18 07:27:29 UTC
(In reply to comment #47)
> (In reply to comment #46)
> 
> > The DVD spinning up and down happens on your Tecra M5, in 64-bit mode?
> > The solaris.zlib is optimized for a 32-bit boot?  Does it spin up and down
> > during a 32-bit boot, too?
> 
> Yes, the DVD behavior is essentially the same whether booted in 32- or 64-bit
> mode.

But install time in 32-bit mode was 14 minutes, vs. 42 minutes in 64-bit mode?
Is there less spinning up and down during install in 32-bit mode?

Why was the install in 32-bit mode so much faster compared to 64-bit mode?
Comment 49 Dave Miner 2008-12-18 07:41:03 UTC
(In reply to comment #48)
> (In reply to comment #47)
> > (In reply to comment #46)
> > 
> > > The DVD spinning up and down happens on your Tecra M5, in 64-bit mode?
> > > The solaris.zlib is optimized for a 32-bit boot?  Does it spin up and down
> > > during a 32-bit boot, too?
> > 
> > Yes, the DVD behavior is essentially the same whether booted in 32- or 64-bit
> > mode.
> 
> But install time in 32-bit mode was 14 minutes, vs. 42 minutes in 64-bit mode?
> Is there less spinning up and down during install in 32-bit mode?
> 

Yes, and yes.

> Why was the install in 32-bit mode so much faster compared to 64-bit mode?

That is the mystery at this point.
Comment 50 Moinak Ghosh 2009-01-08 11:10:27 UTC
There are additional optimizations possible:
1) Remove unnecessary bcopy of uncompressed segments.
2) Do not cache uncompressed segments. Only cache those that need
decompression.
3) In corollary to the above remove an unnecessary kmem_alloc when accessing 
   uncompressed segments.

I will try these out shortly.
Comment 51 David Comay 2009-02-11 09:00:12 UTC
Raising the priority of this as it's important to address this if we can.
Comment 52 Jürgen Keil 2009-02-14 11:44:42 UTC
Created an attachment (id=1364) [details]
Cache and reuse decompressed data

Add a few comments to the code
Comment 53 Jürgen Keil 2009-02-17 06:45:51 UTC
Created an attachment (id=1372) [details]
Fix checks for compressed segment size in lofi module and lofiadm
Comment 54 Jürgen Keil 2009-02-17 09:48:32 UTC
(In reply to comment #22)
> I ran some tests with a modified global image, using just the lofi module in
> attachment 1087 [details] [details], hadn't rebuilt hsfs yet.
> 
> virtual box (1024 MB memory) on Ultra 24:
> installation time improved from 38:22 to 8:13
> 
> Toshiba Tecra M5 (3072 MB memory, Core 2 duo):
> installation time improved from 44:36 to 33:04
> 
> Amusingly, in the M5 case the boot time went up by a minute or so, because the
> CD kept spinning down since we weren't keeping it hot.  Will have to see if
> re-optimizing layout helps there, but might be other choices to make.

Maybe I've found an optical drive that does something similar:

c1t1d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: TSSTcorp Product: CD/DVDW SH-S183A Revision: SB00 Serial No:  

While gui-installer is copying CD contents to HDD
the drive seems to change read speed on it's own
(between 40x / 32x / 24x / 16x ...?). In 40x mode
the drive makes a lot of noise; and it is much more
quiet in the other modes.  But it doesn't spin down
the media completely. During installation I hear
that the drive sometimes makes more and sometimes
makes less noise.

And it happens both with and without lzma decompress
optimization in the lofi module.

Installation time (for 106a) with the changed lofi module
is a bit better (18m30s instead of 22m).

Boot time for the LiveCD is identical.
Comment 55 Jürgen Keil 2009-02-17 09:58:23 UTC
(In reply to comment #37)
> Created an attachment (id=1120) [details] [details]
> sort cpio file lists by file extent location, in an attempt to reduce seek ops
> on the optical drive
> 
> This patch to the python transfer_mod.py tries to minimize the number of 
> optical drive seek operations during the cpio transfer process, by sorting
> the cpio file lists by "file location on disk".
> 
> 
> For a 1.3GHz 32-bit Pentium-M this has reduced installation time by 5min
> (19m15s -> 15m), or 20% less time during cpio.
> 
> For a 2.2Ghz 64-bit AMD64 X2 the installation time was reduced by 30sec
> (5m40s -> 5m09s), or 10% less time used for cpio.

Any chance that we could get this change added to the installer?

It reduces the number of seek operation on the optical device while
transferring CD contents to the HDD, and saves a couple of minutes
of installation time on systems that have an optcal drive with slow
seek times.
Comment 56 Alok Aggarwal 2009-02-17 10:12:55 UTC
(In reply to comment #55)
> (In reply to comment #37)
> > Created an attachment (id=1120) [details] [details] [details]
> > sort cpio file lists by file extent location, in an attempt to reduce seek ops
> > on the optical drive
> > 
> > This patch to the python transfer_mod.py tries to minimize the number of 
> > optical drive seek operations during the cpio transfer process, by sorting
> > the cpio file lists by "file location on disk".
> > 
> > 
> > For a 1.3GHz 32-bit Pentium-M this has reduced installation time by 5min
> > (19m15s -> 15m), or 20% less time during cpio.
> > 
> > For a 2.2Ghz 64-bit AMD64 X2 the installation time was reduced by 30sec
> > (5m40s -> 5m09s), or 10% less time used for cpio.
> 
> Any chance that we could get this change added to the installer?
> 
> It reduces the number of seek operation on the optical device while
> transferring CD contents to the HDD, and saves a couple of minutes
> of installation time on systems that have an optcal drive with slow
> seek times.

I have split this change into a separate bug -

6690 transfer module should sort file list by location

so that this defect (1119) can be used to track just the
performance enhancements to lofi.
Comment 57 Alok Aggarwal 2009-02-19 09:56:47 UTC
(In reply to comment #18)
> Created an attachment (id=1086) [details] [details]
> Cache and reuse decompressed data
> 
> This is my experimental patch to cache and reuse uncompressed data segments.
> 
> By patching the new kernel tunable "lofi_max_comp_cache" you can experiment
> with caching only the last decompressed segment (lofi_max_comp_cache = 1),
> disable caching decompressed data (lofi_max_comp_cache = 0), or caching
> two or more decompressed segments.
> 
> Using lofi_max_comp_cache > 1 appears to be ~ 5-10% faster compared to
> lofi_max_comp_cache == 1.

While adding LZMA support into ON, I've been running some tests and it seems
to me that for read requests that straddle multiple compressed segments it
would be beneficial to cache the last two decompressed segments
(lofi_max_comp_cache = 2) as opposed to just the last one.
Comment 58 Jürgen Keil 2009-02-20 03:06:48 UTC
(In reply to comment #49)
> (In reply to comment #48)
> > (In reply to comment #47)
> > > (In reply to comment #46)
> > > 
> > > > The DVD spinning up and down happens on your Tecra M5, in 64-bit mode?
> > > > The solaris.zlib is optimized for a 32-bit boot?  Does it spin up and down
> > > > during a 32-bit boot, too?
> > > 
> > > Yes, the DVD behavior is essentially the same whether booted in 32- or 64-bit
> > > mode.
> > 
> > But install time in 32-bit mode was 14 minutes, vs. 42 minutes in 64-bit mode?
> > Is there less spinning up and down during install in 32-bit mode?
> > 
> 
> Yes, and yes.
> 
> > Why was the install in 32-bit mode so much faster compared to 64-bit mode?
> 
> That is the mystery at this point.

A new theory for the Tecra M5 issue:

What drivers are used for the HDD and optical drive with the M5?
Is it possible that both devices share one ide channel?
So that I/O on the optical drive blocks I/O on the HDD - and vice versa?
Comment 59 Dave Miner 2009-02-20 10:48:11 UTC
(In reply to comment #58)

> A new theory for the Tecra M5 issue:
> 
> What drivers are used for the HDD and optical drive with the M5?
> Is it possible that both devices share one ide channel?
> So that I/O on the optical drive blocks I/O on the HDD - and vice versa?

An interesting theory, but not the case here.  While the internal drive and the
optical drive both appear to be hung off of the 82801 SATA IDE controller, all
of the test installs I'm reporting are to external USB hard drives (I have a
couple of different ones I use, depending on whether I'm at home or the
office).
Comment 60 Sanjay Nadkarni 2009-03-12 10:46:05 UTC
The lofi caching changes are being tracked 6805505