Interesting questions, Tibor - and some of them have quite complex answers. I think this will end up as some kind of blog post when we approach release.
First the obvious: scatter the CPU-intensive compression work (the zip Deflater) across all available threads. Initially I let each thread write its own complete (consistent) zip file, but I pretty soon discovered that writing the "correct" zip file format was too costly and quite suboptimal. Zip is a pretty bastardly format that cannot be written in a single pass. So "ScatterZipOutputStream" was born, which keeps all the metadata in memory but writes compressed output in a raw format to a "target".

Inside commons-compress this target is always a temp file. Inside plexus-archiver, OffloadingOutputStream is used as the target of a commons-compress scatter stream. This writes to some pretty huge memory buffers, but when a certain threshold is reached it offloads to a temp file while retaining what was initially written to memory. I'll be putting this offloading stream class in commons-io fairly soon. The interesting thing is that the difference between these two is smaller than one might think, at least on my Mac. Perhaps SSD-based I/O and effective OS-level caching are enough. I have yet to perform structured measurements on this; it's somewhere on that todo list that seems to grow/get lost all the time :)

At some point I was happy with the "scatter" bit; all the cores were busy compressing. Then my problem was that the "gather" phase was far too slow. As an example, my 6-core machine could produce a 400 MB zip file in around 5 seconds: ~2 seconds with all 12 CPU threads at 100%, then 3 seconds with a single thread just writing the consolidated output stream. Clearly this is not what a good nerd wants :=)

The biggest "gain" I managed to find in this last phase was to modify the algorithm in ZipArchiveOutputStream to be able to write the correct archive in a single pass (this is possible with ScatterZipOutputStream as the source, since all sizes and checksums are known by then).

Thereafter I tried quite a large number of different strategies, about half of which I committed to commons-compress. One strategy was to avoid lots of small calls to RandomAccessFile#write; building larger byte arrays in memory and writing them in one go was significantly faster.

I have this favourite strategy for pinching 5% improvements: simply run with a profiler and look for object allocations. Wherever there are excessive allocations, there are inefficiencies. The most interesting one, as usual, is https://github.com/apache/commons-compress/commit/fee28f061d91351d93edf13156d142ac00fd0764 where I replaced a LinkedHashMap with a simple array (one that would normally only contain 1-3 elements), which performs much better in terms of locality, for an overall performance improvement of 10% on the "bread and butter" use case - and that's overall performance (!).

I would really like to have moved zip header creation in its entirety into the multithreaded part. Conceptually it should be doable, but commons-compress has compatibility constraints that make this hard; it's been a fairly hard fight already.

To make some of this concrete, I've appended a few code sketches below.
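First, the scatter/gather split roughly as it looks from client code. This is a sketch against the ParallelScatterZipCreator entry point as it stands right now; treat the names and signatures as approximate, since the API may still change before release:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.InputStream;
    import java.util.zip.ZipEntry;

    import org.apache.commons.compress.archivers.zip.ParallelScatterZipCreator;
    import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
    import org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream;
    import org.apache.commons.compress.parallel.InputStreamSupplier;

    public class ScatterGatherExample {
        public static void main(String[] args) throws Exception {
            // Scatter phase: each addArchiveEntry hands the deflate work to a
            // thread pool; workers write raw compressed output to per-thread
            // scatter streams (backed by temp files inside commons-compress).
            ParallelScatterZipCreator scatter = new ParallelScatterZipCreator();

            final File source = new File("some/input/file.txt");
            ZipArchiveEntry entry = new ZipArchiveEntry("file.txt");
            entry.setMethod(ZipEntry.DEFLATED);
            scatter.addArchiveEntry(entry, new InputStreamSupplier() {
                public InputStream get() {
                    try {
                        return new FileInputStream(source);
                    } catch (FileNotFoundException e) {
                        throw new RuntimeException(e);
                    }
                }
            });

            // Gather phase: a single thread consolidates the scatter streams
            // into the final, correctly ordered archive.
            ZipArchiveOutputStream out = new ZipArchiveOutputStream(new File("out.zip"));
            scatter.writeTo(out);
            out.close();
        }
    }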
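The offloading idea itself fits in a few lines. This is not the actual OffloadingOutputStream, just a minimal sketch of the shape: buffer in memory up to a threshold, spill everything after that point to a temp file while keeping the head in memory, and replay both parts in order afterwards:

    import java.io.*;

    // Minimal sketch of the offloading idea - not the real OffloadingOutputStream.
    class OffloadingSketch extends OutputStream {
        private final int threshold;
        private final ByteArrayOutputStream memory = new ByteArrayOutputStream();
        private OutputStream spill;   // created lazily once the threshold is crossed
        private File tempFile;

        OffloadingSketch(int threshold) {
            this.threshold = threshold;
        }

        @Override
        public void write(int b) throws IOException {
            current().write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            current().write(b, off, len);   // a single write may overshoot the threshold slightly
        }

        private OutputStream current() throws IOException {
            if (spill == null && memory.size() >= threshold) {
                tempFile = File.createTempFile("offload", ".tmp");
                spill = new BufferedOutputStream(new FileOutputStream(tempFile));
            }
            return spill != null ? spill : memory;
        }

        // Replay everything that was written: the in-memory head first,
        // then whatever was spilled to disk.
        InputStream toInputStream() throws IOException {
            if (spill != null) {
                spill.close();
            }
            InputStream head = new ByteArrayInputStream(memory.toByteArray());
            return spill == null ? head
                    : new SequenceInputStream(head, new FileInputStream(tempFile));
        }
    }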
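The single-pass trick works because, once the scatter phase has run, nothing about an entry is unknown any more. This hypothetical helper (not actual commons-compress code) shows what that buys you:

    import java.util.zip.ZipEntry;

    import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;

    public class PreSizedEntries {
        // Hypothetical helper: after the scatter phase the deflated bytes already
        // exist, so size, compressed size and CRC are all known up front. That
        // lets ZipArchiveOutputStream emit a complete local file header on the
        // first pass - no seeking back, no data descriptor records.
        static ZipArchiveEntry preSizedEntry(String name, long size,
                                             long compressedSize, long crc) {
            ZipArchiveEntry entry = new ZipArchiveEntry(name);
            entry.setMethod(ZipEntry.DEFLATED);
            entry.setSize(size);
            entry.setCompressedSize(compressedSize);
            entry.setCrc(crc);
            return entry;
        }
    }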
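The "fewer, larger writes" strategy is equally unspectacular in code. A made-up sketch of the idea (buffer size is arbitrary):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Made-up sketch: coalesce many tiny writes into one larger
    // RandomAccessFile#write call.
    class BatchedWriter {
        private final RandomAccessFile raf;
        private final byte[] buf = new byte[64 * 1024];
        private int used;

        BatchedWriter(RandomAccessFile raf) {
            this.raf = raf;
        }

        void write(byte[] src, int off, int len) throws IOException {
            if (len > buf.length - used) {
                flush();                      // make room first
            }
            if (len >= buf.length) {
                raf.write(src, off, len);     // too big to batch; write through
            } else {
                System.arraycopy(src, off, buf, used, len);
                used += len;
            }
        }

        void flush() throws IOException {
            if (used > 0) {
                raf.write(buf, 0, used);      // one big call instead of many tiny ones
                used = 0;
            }
        }
    }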
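And the LinkedHashMap-to-array change boils down to this observation: for a structure that almost always holds 1-3 entries, a linear scan over a plain array beats hashing - no per-entry node allocations and much better cache locality. The names below are made up for illustration, not the actual commons-compress fields:

    import java.util.Arrays;

    // Made-up illustration of the commit linked above: a "map" that almost
    // always holds 1-3 entries, stored as parallel arrays and scanned linearly.
    final class TinyMap {
        private int[] keys = new int[0];
        private Object[] values = new Object[0];

        Object get(int key) {
            for (int i = 0; i < keys.length; i++) {   // linear scan; trivially cheap for n <= 3
                if (keys[i] == key) {
                    return values[i];
                }
            }
            return null;
        }

        void put(int key, Object value) {
            for (int i = 0; i < keys.length; i++) {
                if (keys[i] == key) {                 // overwrite an existing key
                    values[i] = value;
                    return;
                }
            }
            keys = Arrays.copyOf(keys, keys.length + 1);
            values = Arrays.copyOf(values, values.length + 1);
            keys[keys.length - 1] = key;
            values[values.length - 1] = value;
        }
    }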
Kristian

2015-01-10 18:33 GMT+01:00 Tibor Digana <tibordig...@apache.org>:
> Great job Kristian!
>
> Where was the hotspot where you gained the performance? Was it just the
> Java code where you add ZipEntries to the stream, or parallel writes to
> the file, or is this improvement specific to the hard drive? Does it
> apply better to a normal hard drive or to an SSD?