[ https://issues.apache.org/jira/browse/HADOOP-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HADOOP-10047:
-----------------------------
Attachment: decompress-benchmark.tgz
A multi-threaded decompression benchmark comparing the Deflater byte[] API vs. ZlibDirect.
I built my hadoop-trunk branch with version 3.0.0-COMPRESS:
$ mvn package -Dhadoop.version=3.0.0-COMPRESS
$ LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/ java -jar target/compress-benchmark-1.0-SNAPSHOT.jar -n 200 -s 4096 -p 4
This spawns 4 threads in an executor, each decompressing 200 tasks of 4 MB raw data, and prints the sum of the System.currentTimeMillis() deltas spent across all threads.
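For reference, a minimal sketch of the shape of such a harness, using the plain java.util.zip byte[] path (the "before" column); class and variable names here are illustrative, not the attached benchmark source:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DecompressBench {
  public static void main(String[] args) throws Exception {
    final int tasks = 200;               // -n 200
    final int threads = 4;               // -p 4
    final int rawSize = 4 * 1024 * 1024; // 4 MB of raw data per task

    // Compress one 4 MB buffer up front so every task decompresses
    // identical input; all-zero data keeps the example self-contained.
    byte[] raw = new byte[rawSize];
    byte[] compressed = new byte[rawSize];
    Deflater deflater = new Deflater();
    deflater.setInput(raw);
    deflater.finish();
    final int clen = deflater.deflate(compressed);
    deflater.end();
    final byte[] cdata = compressed;

    final AtomicLong totalMillis = new AtomicLong();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.submit(() -> {
        byte[] out = new byte[rawSize];
        long start = System.currentTimeMillis();
        try {
          for (int i = 0; i < tasks; i++) {
            Inflater inflater = new Inflater();
            inflater.setInput(cdata, 0, clen);
            while (!inflater.finished()) {
              // byte[] path: every call copies out of native zlib buffers
              inflater.inflate(out);
            }
            inflater.end();
          }
        } catch (DataFormatException e) {
          throw new RuntimeException(e);
        }
        // Accumulate per-thread wall-clock time, as described above.
        totalMillis.addAndGet(System.currentTimeMillis() - start);
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    System.out.println("sum of thread millis: " + totalMillis.get());
  }
}
{code}

The "after" column would presumably come from swapping the inner loop for the direct-buffer decompressor, inflating straight into a ByteBuffer.allocateDirect() destination instead of a byte[].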
With both methods we get the following nearly linear trend:
|| 4 MB x n || before (ms) || after (ms) ||
|n=1|181|8|
|n=10|1816|60|
|n=100|18183|480|
> Add a directbuffer Decompressor API to hadoop
> ---------------------------------------------
>
> Key: HADOOP-10047
> URL: https://issues.apache.org/jira/browse/HADOOP-10047
> Project: Hadoop Common
> Issue Type: New Feature
> Components: io
> Affects Versions: 2.3.0
> Reporter: Gopal V
> Assignee: Gopal V
> Labels: compression
> Fix For: 3.0.0
>
> Attachments: DirectCompressor.html, DirectDecompressor.html,
> HADOOP-10047-WIP.patch, HADOOP-10047-final.patch,
> HADOOP-10047-redo-WIP.patch, HADOOP-10047-trunk.patch,
> HADOOP-10047-with-tests.patch, decompress-benchmark.tgz
>
>
> With the Zero-Copy reads in HDFS (HDFS-5260), it becomes important to perform
> all I/O operations without copying data into byte[] buffers or other buffers
> that wrap them.
> This is a proposal for adding a DirectDecompressor interface to the
> io.compress package, to indicate codecs which can surface the direct buffer
> layer upwards.
> The implementation should work with direct heap/mmap buffers and cannot
> assume .array() availability.
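For illustration, a minimal sketch of what a contract along these lines could look like; the name and signature below are an assumption for this example, not the committed API:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;

/**
 * Sketch: both src and dst may be direct or mmap'ed buffers, so an
 * implementation must go through the ByteBuffer get/put API (or hand
 * the buffer address to native code) and must never call .array(),
 * which direct buffers do not provide.
 */
public interface DirectDecompressor {
  /** Decompress src fully into dst, advancing both buffers' positions. */
  void decompress(ByteBuffer src, ByteBuffer dst) throws IOException;
}
{code}

A zlib codec could then implement this alongside the existing Decompressor interface, inflating between native buffer addresses with no intermediate byte[] copy.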
--
This message was sent by Atlassian JIRA
(v6.1#6144)