awishnick opened a new issue #6894:
URL: https://github.com/apache/incubator-pinot/issues/6894


   I tried to run a SegmentCreation job to ingest some Parquet files written by 
Trino. The job failed with confusing error messages that made it look like the 
files were corrupted. It turns out the cause is that the files were compressed 
with ZSTD (the codec Trino suggests). The desired behavior would be to detect 
and support ZSTD compression, or at least to fail with an error that says to 
use a supported compression algorithm.
   
   Examples of the error:
   ```
   java.io.IOException: Could not read footer: java.io.IOException: Could not read footer for file DeprecatedRawLocalFileStatus{path=file:/tmp/pinot-7dd1e9e9-b1bd-416c-ab4b-1e66a887d7ca/input/20210507_201917_26196_rv8nu_14b41b59-66e3-4f97-9df0-56c76d859102; isDirectory=false; length=3804; replication=1; blocksize=33554432; modification_time=1620419283015; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false}
           at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:248) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]

   ...

   Caused by: java.io.IOException: can not read class org.apache.parquet.format.FileMetaData: Required field 'codec' was not present! Struct: ColumnMetaData(type:INT32, encodings:[BIT_PACKED, PLAIN_DICTIONARY, RLE], path_in_schema:[date], codec:null, num_values:2, total_uncompressed_size:54, total_compressed_size:72, data_page_offset:4, statistics:Statistics(max:7F 62 34 01, min:7F 62 34 01, null_count:0), encoding_stats:[PageEncodingStats(page_type:DICTIONARY_PAGE, encoding:PLAIN_DICTIONARY, count:1), PageEncodingStats(page_type:DATA_PAGE, encoding:PLAIN_DICTIONARY, count:1)])
   ```
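
   To illustrate the "error out" behavior requested above, here is a minimal 
sketch of a codec check that fails with an explicit message. It is not Pinot 
code: the class `ParquetCodecCheck`, its methods, and the `SUPPORTED` list are 
hypothetical names chosen for illustration, and which codecs the job can 
actually read is an assumption. The numeric ids follow the `CompressionCodec` 
enum in the Parquet format's Thrift definition. How the id would be obtained 
from the footer is left out.

   ```java
   import java.io.IOException;
   import java.util.List;
   import java.util.Map;

   // Hypothetical helper, not part of Pinot or parquet-mr.
   final class ParquetCodecCheck {
       // Codec ids as defined by the CompressionCodec enum in parquet.thrift.
       private static final Map<Integer, String> CODEC_NAMES = Map.of(
           0, "UNCOMPRESSED", 1, "SNAPPY", 2, "GZIP", 3, "LZO",
           4, "BROTLI", 5, "LZ4", 6, "ZSTD");

       // Assumption: the codecs the ingestion job is able to decompress.
       private static final List<String> SUPPORTED =
           List.of("UNCOMPRESSED", "SNAPPY", "GZIP");

       // Translate a codec id into a readable name, keeping unknown ids visible.
       static String codecName(int codecId) {
           return CODEC_NAMES.getOrDefault(codecId, "UNKNOWN(" + codecId + ")");
       }

       // Throw a descriptive error instead of a generic footer failure.
       static void requireSupported(String path, int codecId) throws IOException {
           String name = codecName(codecId);
           if (!SUPPORTED.contains(name)) {
               throw new IOException("Parquet file " + path
                   + " uses compression codec " + name
                   + ", which is not supported. Supported codecs: " + SUPPORTED);
           }
       }
   }
   ```

   With a check like this, a ZSTD file (codec id 6) would be rejected with a 
message that names the codec, rather than looking like a corrupted footer.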




