[ https://issues.apache.org/jira/browse/IMPALA-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18068789#comment-18068789 ]
ASF subversion and git services commented on IMPALA-14700:
----------------------------------------------------------
Commit adcff60d2d4ff5ec2556bf90a088804407247ed8 in impala's branch
refs/heads/master from Balazs Hevele
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=adcff60d2 ]
IMPALA-14700: Add read support for Parquet LZ4_RAW compression
This change requires a Parquet version newer than the current one
(1.12.3), because it depends on the LZ4_RAW Thrift enum value.
For that reason, this patch raises APACHE_PARQUET_VERSION to 1.15.2 and
uses it instead of CDP_PARQUET_VERSION until CDP_PARQUET_VERSION
reaches a high enough version.
Parquet deprecated its LZ4 compression codec and added a new one,
LZ4_RAW. This patch adds read support for LZ4_RAW using Lz4Compressor
(corresponding to THdfsCompression::LZ4).
The write path is unchanged and continues to use LZ4_BLOCKED.
Testing:
- Added a small test file that uses lz4_raw compression, taken from the
  parquet-testing repository.
- Added a test case to test_scanners.py to check that the file can be read.
Change-Id: I22ee4e5bf9abec37be941c1dca8019a563343d34
Reviewed-on: http://gerrit.cloudera.org:8080/24059
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Add support for Parquet's LZ4_RAW compression
> ---------------------------------------------
>
> Key: IMPALA-14700
> URL: https://issues.apache.org/jira/browse/IMPALA-14700
> Project: IMPALA
> Issue Type: Task
> Components: Backend
> Affects Versions: Impala 5.0.0
> Reporter: Joe McDonnell
> Assignee: Balazs Hevele
> Priority: Major
> Fix For: Impala 5.0.0
>
>
> Parquet's current LZ4 compression uses a framing mechanism from Hadoop.
> Parquet decided to deprecate this and instead introduced the LZ4_RAW
> compression without the Hadoop framing. See
> https://issues.apache.org/jira/browse/PARQUET-1996 /
> https://issues.apache.org/jira/browse/PARQUET-2032
> We should add support for reading / writing LZ4_RAW. This should be fairly
> simple, as LZ4_RAW just uses the block compression directly. It should
> correspond to Lz4Compressor rather than Lz4BlockCompressor.
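
The framing difference described in the issue can be sketched roughly as
follows. This is an illustration only, not Impala's implementation. It
assumes the Hadoop block framing is a 4-byte big-endian uncompressed
length followed by one or more blocks, each prefixed with a 4-byte
big-endian compressed length; that layout is not stated in this message.
The names read_hadoop_framed, read_raw and decompress_block are
hypothetical, and decompress_block stands in for any raw LZ4 block
decompressor.

```python
import struct


def read_hadoop_framed(buf, decompress_block):
    """Decode a buffer that wraps raw blocks in Hadoop-style framing.

    Assumed layout (an assumption of this sketch):
      <4-byte BE uncompressed size>
        <4-byte BE compressed size> <compressed block>  (repeated)
    decompress_block(data, max_size) is a hypothetical callable that
    decompresses one raw block and returns bytes.
    """
    out = bytearray()
    pos = 0
    while pos < len(buf):
        # Outer header: total uncompressed bytes in this frame.
        (total,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        produced = 0
        while produced < total:
            # Inner header: size of the next compressed block.
            (compressed_len,) = struct.unpack_from(">I", buf, pos)
            pos += 4
            chunk = decompress_block(buf[pos:pos + compressed_len],
                                     total - produced)
            pos += compressed_len
            if not chunk:
                raise ValueError("block produced no data; input is corrupt")
            out += chunk
            produced += len(chunk)
    return bytes(out)


def read_raw(buf, uncompressed_size, decompress_block):
    """LZ4_RAW-style read: the buffer is a single raw block, no headers."""
    return decompress_block(buf, uncompressed_size)
```

With a real codec, decompress_block would wrap a raw LZ4 block
decompressor. The point of the sketch is only that the raw path hands the
page bytes straight to the block decompressor, while the framed path has
to peel off the length headers first.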