Hi Mike,
On Tue, 25 Oct 2011, Mike Simons wrote:
> My guess of the easiest way to "fix" this is to change the behavior
> of the splitter. Instead of sending arbitrary chunks of input to the
> worker threads for them to split, the splitter itself reads and scans
> for the block markers at the same time, and passes single blocks into
> a queue for the worker threads.
This was the initial logic, but it made the splitter CPU-bound. Above
some number of worker threads, the splitter could not emit blocks for
decompression as fast as the combined throughput of the workers would
have allowed, in effect starving them. See the changelog entry for
0.05.
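
To illustrate the cost, here is a rough sketch of what that single
splitter has to do (my own simplification, not the actual lbzip2 code):
since bzip2 block headers are not byte-aligned, the scanner has to test
the 48-bit block magic 0x314159265359 at every bit offset of the input
before it can hand a delimited block to a worker queue.

#include <stdint.h>
#include <stddef.h>

#define BLOCK_MAGIC UINT64_C(0x314159265359)
#define MAGIC_MASK  ((UINT64_C(1) << 48) - 1)

/* Report the bit offset of every block-header magic in buf; the caller
 * (the splitter) would enqueue the block preceding each hit for a
 * worker thread.  The shift-and-compare runs once per input *bit*,
 * which is what makes a single splitter CPU-bound. */
void scan_blocks(const unsigned char *buf, size_t len,
                 void (*emit)(size_t bit_offset))
{
    uint64_t window = 0;

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            size_t consumed = i * 8 + (size_t)(8 - bit); /* bits read */

            window = ((window << 1) | ((buf[i] >> bit) & 1u)) & MAGIC_MASK;
            if (consumed >= 48 && window == BLOCK_MAGIC)
                emit(consumed - 48); /* bit offset where the magic starts */
        }
    }
}

The inner loop runs eight times per input byte, so a single thread like
this can become the throughput ceiling once there are enough workers.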
> I think this approach would mean that blocks are always decompressed
> in the correct order, and it would be very easy to control the size
> of the worker-to-muxer queue. A downside is that the splitter would
> be both I/O-bound and CPU-bound while scanning for blocks ... which
> could be "fixed" by having two threads: a "reader" and a separate
> "splitter" (aka "scanner").
The scanner would still become a bottleneck.
I tested pbzip2's (IIRC) memmem()-based splitter in April 2010 on the
SPARC described in [1]. Even that method seemed to starve the workers
once there were at least 50 of them, and possibly even fewer.
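
For reference, a memmem()-style scan would look roughly like this (my
own illustration of the byte-level approach, not pbzip2's actual code);
note that memmem() is a GNU extension and that this only finds block
headers that happen to start on a byte boundary:

#define _GNU_SOURCE           /* memmem() is a GNU extension (glibc) */
#include <string.h>
#include <stddef.h>

/* The six bytes of the bzip2 block-header magic, "1AY&SY". */
static const unsigned char block_magic[6] =
    { 0x31, 0x41, 0x59, 0x26, 0x53, 0x59 };

/* Return a pointer to the next byte-aligned block header in buf, or
 * NULL if none is found. */
static const unsigned char *
next_block(const unsigned char *buf, size_t len)
{
    return memmem(buf, len, block_magic, sizeof block_magic);
}

Even with a fast library search, a single scanner still has to touch
every input byte, so it caps the total throughput once enough workers
are available.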
> I do fall back on pbzip2 when lbzip2 fails.
> I should re-compare them.
Re-comparing is always good, but you should also stay tuned for
lbzip2-2.0! :)
Laszlo
[1] http://lacos.hu/lbzip2-scaling/scaling.html#Hardware