Hi Mike,

On Tue, 25 Oct 2011, Mike Simons wrote:

> My guess of the easiest way to "fix" this is to change the behavior
> of the splitter.  Instead of sending random blocks of input to the
> worker threads for them to split, the splitter actually reads and
> scans for the block markers at the same time, and passes single
> blocks into a queue for the worker threads.

This was the initial logic, but it made the splitter CPU-bound. Beyond some number of worker threads, the splitter could not emit blocks for decompression as fast as the workers could have consumed them, in effect starving the workers. See the changelog for 0.05.
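For concreteness, here's a rough sketch of that single-splitter design. All names below are hypothetical stand-ins, not lbzip2's actual code; the point to notice is that read_and_scan_next_block() runs on one thread only, so its scan rate caps the throughput of the whole pipeline no matter how many workers drain the queue:

#include <pthread.h>
#include <stdlib.h>

/* Hypothetical single-producer, multi-consumer FIFO of compressed
 * blocks; the FIFO hand-off keeps blocks in stream order. */
struct block { void *data; size_t len; struct block *next; };

static struct block *q_head, *q_tail;
static int q_eof;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_nonempty = PTHREAD_COND_INITIALIZER;

static void enqueue(struct block *b)
{
    pthread_mutex_lock(&q_lock);
    b->next = NULL;
    if (q_tail)
        q_tail->next = b;
    else
        q_head = b;
    q_tail = b;
    pthread_cond_signal(&q_nonempty);
    pthread_mutex_unlock(&q_lock);
}

static struct block *dequeue(void)      /* NULL means end of input */
{
    pthread_mutex_lock(&q_lock);
    while (!q_head && !q_eof)
        pthread_cond_wait(&q_nonempty, &q_lock);
    struct block *b = q_head;
    if (b) {
        q_head = b->next;
        if (!q_head)
            q_tail = NULL;
    }
    pthread_mutex_unlock(&q_lock);
    return b;
}

/* Stubs standing in for the real read/scan/decompress logic. */
static struct block *read_and_scan_next_block(void) { return NULL; }
static void decompress_block(struct block *b) { (void)b; }

static void *splitter_thread(void *arg)
{
    struct block *b;
    (void)arg;
    /* The read() and the block-marker scan both happen here, on one
     * thread; this is what makes the splitter I/O- and CPU-bound. */
    while ((b = read_and_scan_next_block()) != NULL)
        enqueue(b);
    pthread_mutex_lock(&q_lock);
    q_eof = 1;
    pthread_cond_broadcast(&q_nonempty);
    pthread_mutex_unlock(&q_lock);
    return NULL;
}

static void *worker_thread(void *arg)
{
    struct block *b;
    (void)arg;
    while ((b = dequeue()) != NULL) {
        decompress_block(b);
        free(b->data);
        free(b);
    }
    return NULL;
}

int main(void)
{
    enum { NWORKERS = 4 };
    pthread_t split, work[NWORKERS];
    pthread_create(&split, NULL, splitter_thread, NULL);
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&work[i], NULL, worker_thread, NULL);
    pthread_join(split, NULL);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(work[i], NULL);
    return 0;
}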

> I think this approach would mean that blocks are always decompressed
> in the correct order, and it's very easy to control the size of the
> worker-to-muxer queue.  A downside is that the splitter would be both
> I/O- and CPU-bound while scanning for blocks ... which could be
> "fixed" by having two threads, a "reader" and a separate "splitter"
> (aka "scanner").

The scanner would still become a bottleneck.

I tested pbzip2's (IIRC) memmem()-based splitter in April 2010 on the SPARC machine described in [1]. Even that method seemed to starve the workers once there were at least 50 of them, and possibly with even fewer.
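To illustrate what that scan amounts to: the bzip2 block header magic is the 48-bit value 0x314159265359, and a byte-aligned pass over a buffer can be as simple as the sketch below. This is my reconstruction, not pbzip2's actual code; note also that blocks in a .bz2 stream are bit-aligned, so an exhaustive scanner must try the pattern at all 8 bit offsets, not only the byte-aligned one shown here.

#define _GNU_SOURCE             /* memmem() is a GNU extension */
#include <string.h>

/* The 48-bit bzip2 block header magic, 0x314159265359. */
static const unsigned char block_magic[6] =
    { 0x31, 0x41, 0x59, 0x26, 0x53, 0x59 };

/* Return the offset of the next byte-aligned block header in
 * buf[0..len), or (size_t)-1 if none is found. */
static size_t next_block_offset(const unsigned char *buf, size_t len)
{
    const unsigned char *p = memmem(buf, len, block_magic,
                                    sizeof block_magic);
    return p ? (size_t)(p - buf) : (size_t)-1;
}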

> I do fall back on pbzip2 when lbzip2 fails.

> I should re-compare them.

Re-comparing is always good, but you should also stay tuned for lbzip2-2.0! :)

Laszlo

[1] http://lacos.hu/lbzip2-scaling/scaling.html#Hardware
