[ 
https://issues.apache.org/jira/browse/IMPALA-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18068789#comment-18068789
 ] 

ASF subversion and git services commented on IMPALA-14700:
----------------------------------------------------------

Commit adcff60d2d4ff5ec2556bf90a088804407247ed8 in impala's branch 
refs/heads/master from Balazs Hevele
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=adcff60d2 ]

IMPALA-14700: Add read support for Parquet LZ4_RAW compression

This change requires a Parquet version newer than the current
one (1.12.3), because it depends on the LZ4_RAW Thrift enum value.
For that reason, this patch raises APACHE_PARQUET_VERSION to 1.15.2
and uses it instead of CDP_PARQUET_VERSION until
CDP_PARQUET_VERSION reaches a high enough version.

Parquet deprecated its LZ4 compression codec and added a new one, LZ4_RAW.
This patch adds read support for LZ4_RAW, using Lz4Compressor
(corresponding to THdfsCompression::LZ4).
The write path is unchanged and continues to use LZ4_BLOCKED.

Testing:
- Added a small test file using lz4_raw compression, from the
  parquet-testing repository.
- Added a test case to test_scanners.py to check we can read the file.

Change-Id: I22ee4e5bf9abec37be941c1dca8019a563343d34
Reviewed-on: http://gerrit.cloudera.org:8080/24059
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Add support for Parquet's LZ4_RAW compression
> ---------------------------------------------
>
>                 Key: IMPALA-14700
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14700
>             Project: IMPALA
>          Issue Type: Task
>          Components: Backend
>    Affects Versions: Impala 5.0.0
>            Reporter: Joe McDonnell
>            Assignee: Balazs Hevele
>            Priority: Major
>             Fix For: Impala 5.0.0
>
>
> Parquet's current LZ4 compression uses a framing mechanism from Hadoop. 
> Parquet decided to deprecate this and instead introduced the LZ4_RAW 
> compression without the Hadoop framing. See 
> https://issues.apache.org/jira/browse/PARQUET-1996 / 
> https://issues.apache.org/jira/browse/PARQUET-2032
> We should add support for reading / writing LZ4_RAW. This should be fairly 
> simple, as LZ4_RAW just uses the block compression directly. It should 
> correspond to Lz4Compressor rather than Lz4BlockCompressor.
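
As a rough illustration of the framing difference described in the issue, the
sketch below wraps and unwraps an opaque compressed block with a Hadoop-style
header: a 4-byte big-endian uncompressed length followed by a 4-byte big-endian
compressed length. It assumes exactly one compressed chunk per frame, and the
function names are hypothetical, not Impala or Parquet APIs. With LZ4_RAW the
payload would be just the raw block, with no such header to strip.

```python
import struct
from typing import Tuple

# Assumed header layout: two big-endian unsigned 32-bit integers,
# (uncompressed length, compressed length), one chunk per frame.
_HEADER = struct.Struct(">II")


def frame_hadoop_lz4(raw_block: bytes, uncompressed_len: int) -> bytes:
    """Prefix an already-compressed LZ4 block with a Hadoop-style header."""
    return _HEADER.pack(uncompressed_len, len(raw_block)) + raw_block


def unframe_hadoop_lz4(framed: bytes) -> Tuple[int, bytes]:
    """Strip the header and return (uncompressed_len, raw_block).

    The returned raw_block is what an LZ4_RAW payload would hold directly.
    """
    if len(framed) < _HEADER.size:
        raise ValueError("buffer too short for framing header")
    uncompressed_len, compressed_len = _HEADER.unpack_from(framed, 0)
    end = _HEADER.size + compressed_len
    if end > len(framed):
        raise ValueError("compressed length exceeds buffer size")
    return uncompressed_len, framed[_HEADER.size:end]
```

The block itself is treated as opaque bytes here; actual decompression would
be handed to an LZ4 block decoder, which this sketch deliberately leaves out.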



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]