Hi Mike,
On Tue, 25 Oct 2011, Mike Simons wrote:
> My guess of the easiest way to "fix" this is to change the behavior
> of the splitter. Instead of sending arbitrary chunks of input to the
> worker threads for them to split, the splitter itself reads and scans
> for the block markers at the same time, and passes single blocks into
> a queue for the worker threads.
This was the initial logic, but it made the splitter CPU-bound. Above
some number of worker threads, the splitter could not emit blocks for
decompression as fast as the combined throughput of the workers would
have allowed, in effect starving them. See the changelog entry for
0.05.
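
To illustrate the cost, here is a rough sketch of what that single
splitter has to do (my own simplification, not the actual lbzip2 code):
since bzip2 block headers are not byte-aligned, the scanner has to test
the 48-bit block magic 0x314159265359 at every bit offset of the input
before it can hand a delimited block to a worker queue.

#include <stdint.h>
#include <stddef.h>

#define BLOCK_MAGIC UINT64_C(0x314159265359)
#define MAGIC_MASK  ((UINT64_C(1) << 48) - 1)

/* Report the bit offset of every block-header magic in buf; the caller
 * (the splitter) would enqueue the block preceding each hit for a
 * worker thread.  The shift-and-compare runs once per input *bit*,
 * which is what makes a single splitter CPU-bound. */
void scan_blocks(const unsigned char *buf, size_t len,
                 void (*emit)(size_t bit_offset))
{
    uint64_t window = 0;

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            size_t consumed = i * 8 + (size_t)(8 - bit); /* bits read */

            window = ((window << 1) | ((buf[i] >> bit) & 1u)) & MAGIC_MASK;
            if (consumed >= 48 && window == BLOCK_MAGIC)
                emit(consumed - 48); /* bit offset where the magic starts */
        }
    }
}

The inner loop runs eight times per input byte, so a single thread like
this can become the throughput ceiling once there are enough workers.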
> I think this approach would mean that blocks are always decompressed
> in the correct order, and it would be very easy to control the size
> of the worker-to-muxer queue. A downside is that the splitter would
> be both I/O-bound and CPU-bound while scanning for blocks ... which
> could be "fixed" by having two threads: a "reader" and a separate
> "splitter" (aka "scanner").
The scanner would still become a bottleneck.
I tested pbzip2's (IIRC) memmem()-based splitter in April 2010 on the
SPARC described in [1]. Even that method seemed to starve the workers
once there were at least 50 of them, and possibly even fewer.
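
For reference, a memmem()-style scan would look roughly like this (my
own illustration of the byte-level approach, not pbzip2's actual code);
note that memmem() is a GNU extension and that this only finds block
headers that happen to start on a byte boundary:

#define _GNU_SOURCE           /* memmem() is a GNU extension (glibc) */
#include <string.h>
#include <stddef.h>

/* The six bytes of the bzip2 block-header magic, "1AY&SY". */
static const unsigned char block_magic[6] =
    { 0x31, 0x41, 0x59, 0x26, 0x53, 0x59 };

/* Return a pointer to the next byte-aligned block header in buf, or
 * NULL if none is found. */
static const unsigned char *
next_block(const unsigned char *buf, size_t len)
{
    return memmem(buf, len, block_magic, sizeof block_magic);
}

Even with a fast library search, a single scanner still has to touch
every input byte, so it caps the total throughput once enough workers
are available.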
> I do fall back on pbzip2 when lbzip2 fails.
> I should re-compare them.
Re-comparing is always good, but you should also stay tuned for
lbzip2-2.0! :)
Laszlo
[1] http://lacos.hu/lbzip2-scaling/scaling.html#Hardware