[
https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746346#comment-15746346
]
Jason Lowe commented on HADOOP-13578:
-------------------------------------
The thing I'm worried about is this: when we call ZSTD_compressStream we pass
descriptors for both the input buffer and the output buffer, but when we call
ZSTD_endStream we pass only the descriptor for the output buffer. So I don't
see how ZSTD_endStream could finish consuming any input that
ZSTD_compressStream didn't get to, since it has no access to that input buffer
descriptor.
Looking at the zstd code, you'll see that when ZSTD_endStream calls
ZSTD_compressStream internally, it does so with srcSize == 0, meaning there is
no more source to consume. So if the JNI code's last call to
ZSTD_compressStream did not fully consume the input buffer's data (i.e.,
input.pos was not advanced to the end of the data), calling ZSTD_endStream
will simply flush whatever input did make it in and then end the frame. That
matches what the documentation for ZSTD_endStream says. So I still think we
must not call ZSTD_endStream unless input.pos is at the end of the input
buffer after we call ZSTD_compressStream; otherwise we risk losing the last
chunk of data whenever the zstd library, for whatever reason, cannot fully
consume the input buffer when we try to finish.
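To make the hazard concrete, here is a small self-contained C model. It does
not link against zstd: toy_in/toy_out only mimic the src/dst, size and pos
fields of ZSTD_inBuffer/ZSTD_outBuffer, and toy_compress_stream and
toy_end_stream are hypothetical stand-ins that copy bytes instead of
compressing. As described above, the "compress" step may consume only part of
the input (here at most 4 bytes per call), and the "end" step takes no input
buffer, so it can only flush what was already consumed. compress_all keeps
calling the compress step until in.pos == in.size before ending, while
compress_then_end_early ends after a single call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL toy model, not the zstd API. The buffer structs mirror the
 * fields of ZSTD_inBuffer / ZSTD_outBuffer; "compression" is a plain copy. */
typedef struct { const void *src; size_t size; size_t pos; } toy_in;
typedef struct { void *dst; size_t size; size_t pos; } toy_out;

#define TOY_STAGE 4              /* max input bytes consumed per compress call */
#define TOY_ERROR ((size_t)-1)   /* no progress possible: output buffer full */

typedef struct {
    unsigned char stage[TOY_STAGE];  /* input consumed but not yet emitted */
    size_t staged;                   /* bytes held in stage */
    size_t flushed;                  /* bytes of stage already written out */
} toy_stream;

/* Copy as many staged bytes as fit into the output buffer. */
static void toy_flush(toy_stream *s, toy_out *out) {
    size_t pending = s->staged - s->flushed;
    size_t room = out->size - out->pos;
    size_t n = pending < room ? pending : room;
    memcpy((unsigned char *)out->dst + out->pos, s->stage + s->flushed, n);
    out->pos += n;
    s->flushed += n;
    if (s->flushed == s->staged) s->staged = s->flushed = 0;
}

/* Stand-in for ZSTD_compressStream: takes both buffers and may return
 * with part of the input unconsumed (in->pos < in->size). */
static void toy_compress_stream(toy_stream *s, toy_out *out, toy_in *in) {
    toy_flush(s, out);
    if (s->staged == 0) {
        size_t avail = in->size - in->pos;
        size_t n = avail < TOY_STAGE ? avail : TOY_STAGE;
        memcpy(s->stage, (const unsigned char *)in->src + in->pos, n);
        in->pos += n;
        s->staged = n;
    }
    toy_flush(s, out);
}

/* Stand-in for ZSTD_endStream: no input buffer, so it can only flush what
 * was already consumed. Returns bytes still waiting to be flushed. */
static size_t toy_end_stream(toy_stream *s, toy_out *out) {
    toy_flush(s, out);
    return s->staged - s->flushed;
}

/* Safe pattern: feed input until it is fully consumed, and only then end. */
static size_t compress_all(const unsigned char *src, size_t len,
                           unsigned char *dst, size_t cap) {
    toy_stream s = {{0}, 0, 0};
    toy_in in = {src, len, 0};
    toy_out out = {dst, cap, 0};
    while (in.pos < in.size) {
        size_t in_before = in.pos, out_before = out.pos;
        toy_compress_stream(&s, &out, &in);
        if (in.pos == in_before && out.pos == out_before)
            return TOY_ERROR;  /* a real caller would drain dst and retry */
    }
    assert(in.pos == in.size);  /* only now is ending the frame safe */
    for (;;) {
        size_t out_before = out.pos;
        if (toy_end_stream(&s, &out) == 0) break;
        if (out.pos == out_before) return TOY_ERROR;
    }
    return out.pos;
}

/* Unsafe pattern: end after a single compress call, whatever in.pos is. */
static size_t compress_then_end_early(const unsigned char *src, size_t len,
                                      unsigned char *dst, size_t cap) {
    toy_stream s = {{0}, 0, 0};
    toy_in in = {src, len, 0};
    toy_out out = {dst, cap, 0};
    toy_compress_stream(&s, &out, &in);  /* may stop with in.pos < in.size */
    toy_end_stream(&s, &out);            /* the unread tail of src is lost */
    return out.pos;
}
```

With the 11-byte input "hello world" and a 64-byte destination, compress_all
writes all 11 bytes, whereas compress_then_end_early writes only the first 4
and silently drops the rest. A real JNI implementation would also have to hand
a full output buffer back to the caller and resume, rather than bailing out
with TOY_ERROR as this sketch does.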
> Add Codec for ZStandard Compression
> -----------------------------------
>
> Key: HADOOP-13578
> URL: https://issues.apache.org/jira/browse/HADOOP-13578
> Project: Hadoop Common
> Issue Type: New Feature
> Reporter: churro morales
> Assignee: churro morales
> Attachments: HADOOP-13578.patch, HADOOP-13578.v1.patch,
> HADOOP-13578.v2.patch, HADOOP-13578.v3.patch, HADOOP-13578.v4.patch,
> HADOOP-13578.v5.patch, HADOOP-13578.v6.patch
>
>
> ZStandard: https://github.com/facebook/zstd has been used in production for 6
> months by Facebook now. v1.0 was recently released. Create a codec for this
> library.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)