[
https://issues.apache.org/jira/browse/HADOOP-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810553#comment-13810553
]
Colin Patrick McCabe commented on HADOOP-10047:
-----------------------------------------------
OK. So your concern is that if the {{DirectDecompressor}} object held on to a
reference to the {{src}} buffer, that reference could dangle after the user
closes or unmaps the buffer. That's reasonable. In that case, I think the
following should be wrapped in a try... finally block so that we don't hold on
to a reference to {{src}}:
{code}
+ Buffer originalCompressed = compressedDirectBuf;
+ Buffer originalUncompressed = uncompressedDirectBuf;
+ compressedDirectBuf = src;
+ [....]
+ compressedDirectBuf = originalCompressed;
+ uncompressedDirectBuf = originalUncompressed;
{code}
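Roughly what I have in mind, as a sketch only (the field names are taken from the snippet above; {{decompressDirect()}} is a placeholder for whatever the native call ends up being, not the actual patch method):
{code}
// Sketch of the guarded swap: restore the codec-owned buffers in a finally
// block so no reference to the caller's src/dst outlives this call.
public int decompress(ByteBuffer src, ByteBuffer dst) throws IOException {
  Buffer originalCompressed = compressedDirectBuf;
  Buffer originalUncompressed = uncompressedDirectBuf;
  try {
    compressedDirectBuf = src;
    uncompressedDirectBuf = dst;
    return decompressDirect();   // native call operates on the swapped buffers
  } finally {
    compressedDirectBuf = originalCompressed;
    uncompressedDirectBuf = originalUncompressed;
  }
}
{code}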
It also makes sense that you may need to pass another {{src}} buffer to
continue decompressing a multi-block file. After all, each
{{MappedByteBuffer}} covers at most one block.
bq. And as for using different src buffers, it would've been nice if it worked
on all algorithms because mapped direct buffers cannot be consolidated. So if
the compressed stream goes over the block boundaries, it makes sense to call
decompress more times with new src buffers -
I guess the question is what happens if someone has something compressed via
snappy that spans multiple blocks / {{MappedByteBuffer}} instances. Will they
be able to decompress that using this code?
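For concreteness, the caller-side pattern I'm picturing is something like the sketch below; the {{decompress(src, dst)}} signature and the helper names are assumptions for illustration, not the final API.
{code}
// Hypothetical caller: decompress a stream that spans several HDFS blocks by
// feeding each block's MappedByteBuffer into the same decompressor instance.
void decompressAcrossBlocks(DirectDecompressor decompressor,
                            List<ByteBuffer> mappedBlockBuffers,
                            ByteBuffer dst) throws IOException {
  for (ByteBuffer src : mappedBlockBuffers) {  // one MappedByteBuffer per block
    while (src.hasRemaining()) {
      dst.clear();
      decompressor.decompress(src, dst);       // consumes from src, fills dst
      dst.flip();
      // ... hand the decompressed bytes in dst downstream ...
    }
  }
}
{code}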
> Add a directbuffer Decompressor API to hadoop
> ---------------------------------------------
>
> Key: HADOOP-10047
> URL: https://issues.apache.org/jira/browse/HADOOP-10047
> Project: Hadoop Common
> Issue Type: New Feature
> Components: io
> Affects Versions: 2.3.0
> Reporter: Gopal V
> Assignee: Gopal V
> Labels: compression
> Fix For: 2.3.0
>
> Attachments: DirectCompressor.html, DirectDecompressor.html,
> HADOOP-10047-WIP.patch, HADOOP-10047-final.patch,
> HADOOP-10047-redo-WIP.patch, HADOOP-10047-with-tests.patch
>
>
> With the Zero-Copy reads in HDFS (HDFS-5260), it becomes important to perform
> all I/O operations without copying data into byte[] buffers or other buffers
> that wrap them.
> This is a proposal for adding a DirectDecompressor interface to the
> io.compress package, to mark codecs that want to surface the direct-buffer
> layer upwards.
> The implementation should work with direct heap/mmap buffers and cannot
> assume .array() availability.
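> A minimal illustration of that constraint (nothing Hadoop-specific, just
> java.nio behaviour): direct and mmap'd buffers report {{hasArray() == false}},
> so any code path that calls {{.array()}} unconditionally will throw.
> {code}
> import java.nio.ByteBuffer;
>
> public class ArrayAvailability {
>   public static void main(String[] args) {
>     ByteBuffer heap = ByteBuffer.allocate(4096);          // hasArray() == true
>     ByteBuffer direct = ByteBuffer.allocateDirect(4096);  // hasArray() == false
>     System.out.println(heap.hasArray() + " " + direct.hasArray());
>     // For direct/mmap buffers the bytes must be read through the Buffer API
>     // (or handed to native code); direct.array() would throw
>     // UnsupportedOperationException.
>     byte[] copy = new byte[direct.remaining()];
>     direct.duplicate().get(copy);
>   }
> }
> {code}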