[ https://issues.apache.org/jira/browse/HADOOP-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HADOOP-10047:
-----------------------------
Attachment: decompress-benchmark.tgz
A multi-threaded decompression benchmark comparing the Deflater byte[] API vs. ZlibDirect.
I built my hadoop-trunk branch with version 3.0.0-COMPRESS:
$ mvn package -Dhadoop.version=3.0.0-COMPRESS
$ LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/ java -jar target/compress-benchmark-1.0-SNAPSHOT.jar -n 200 -s 4096 -p 4
This spawns 4 threads in an executor, each decompressing 200 tasks of 4 MB raw data, and prints the sum of the System.currentTimeMillis() deltas spent across all threads.
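For reference, a minimal sketch of the shape of such a harness, using the plain java.util.zip byte[] path (the "before" column); class and variable names here are illustrative, not the attached benchmark source:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DecompressBench {
  public static void main(String[] args) throws Exception {
    final int tasks = 200;               // -n 200
    final int threads = 4;               // -p 4
    final int rawSize = 4 * 1024 * 1024; // 4 MB of raw data per task

    // Compress one 4 MB buffer up front so every task decompresses
    // identical input; all-zero data keeps the example self-contained.
    byte[] raw = new byte[rawSize];
    byte[] compressed = new byte[rawSize];
    Deflater deflater = new Deflater();
    deflater.setInput(raw);
    deflater.finish();
    final int clen = deflater.deflate(compressed);
    deflater.end();
    final byte[] cdata = compressed;

    final AtomicLong totalMillis = new AtomicLong();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.submit(() -> {
        byte[] out = new byte[rawSize];
        long start = System.currentTimeMillis();
        try {
          for (int i = 0; i < tasks; i++) {
            Inflater inflater = new Inflater();
            inflater.setInput(cdata, 0, clen);
            while (!inflater.finished()) {
              // byte[] path: every call copies out of native zlib buffers
              inflater.inflate(out);
            }
            inflater.end();
          }
        } catch (DataFormatException e) {
          throw new RuntimeException(e);
        }
        // Accumulate per-thread wall-clock time, as described above.
        totalMillis.addAndGet(System.currentTimeMillis() - start);
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    System.out.println("sum of thread millis: " + totalMillis.get());
  }
}
{code}

The "after" column would presumably come from swapping the inner loop for the direct-buffer decompressor, inflating straight into a ByteBuffer.allocateDirect() destination instead of a byte[].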
With both methods we get the following nearly linear trend:
|| 4 MB x n || before (ms) || after (ms) ||
|n=1|181|8|
|n=10|1816|60|
|n=100|18183|480|
> Add a directbuffer Decompressor API to hadoop
> ---------------------------------------------
>
> Key: HADOOP-10047
> URL: https://issues.apache.org/jira/browse/HADOOP-10047
> Project: Hadoop Common
> Issue Type: New Feature
> Components: io
> Affects Versions: 2.3.0
> Reporter: Gopal V
> Assignee: Gopal V
> Labels: compression
> Fix For: 3.0.0
>
> Attachments: DirectCompressor.html, DirectDecompressor.html,
> HADOOP-10047-WIP.patch, HADOOP-10047-final.patch,
> HADOOP-10047-redo-WIP.patch, HADOOP-10047-trunk.patch,
> HADOOP-10047-with-tests.patch, decompress-benchmark.tgz
>
>
> With the Zero-Copy reads in HDFS (HDFS-5260), it becomes important to perform
> all I/O operations without copying data into byte[] buffers or other buffers
> that wrap them.
> This is a proposal for adding a DirectDecompressor interface to the
> io.compress package, to indicate codecs which can surface the direct buffer
> layer upwards.
> The implementation should work with direct heap/mmap buffers and cannot
> assume .array() availability.
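For illustration, a minimal sketch of what a contract along these lines could look like; the name and signature below are an assumption for this example, not the committed API:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;

/**
 * Sketch: both src and dst may be direct or mmap'ed buffers, so an
 * implementation must go through the ByteBuffer get/put API (or hand
 * the buffer address to native code) and must never call .array(),
 * which direct buffers do not provide.
 */
public interface DirectDecompressor {
  /** Decompress src fully into dst, advancing both buffers' positions. */
  void decompress(ByteBuffer src, ByteBuffer dst) throws IOException;
}
{code}

A zlib codec could then implement this alongside the existing Decompressor interface, inflating between native buffer addresses with no intermediate byte[] copy.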
--
This message was sent by Atlassian JIRA
(v6.1#6144)