kkrugler opened a new issue #6311:
URL: https://github.com/apache/incubator-pinot/issues/6311


   Currently, generating a large segment fails while tarring the output, just after the “converting segment” phase completes:
   
   ```
   Converting segment: /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0 to v3 format
   v3 segment location for segment: crawldata_OFFLINE_2018-10-13_2020-10-11_0 is /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0/v3
   Deleting files in v1 segment directory: /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0
   Computed crc = 1033854200, based on files [/tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0/v3/columns.psf, /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0/v3/index_map, /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0/v3/metadata.properties]
   Driver, record read time : 236809
   Driver, stats collector time : 0
   Driver, indexing time : 122449
   Tarring segment from: /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0 to: /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0.tar.gz
   Failed to generate Pinot segment for file - s3://adbeat-pinot-files/compressed/3.gz
   java.lang.RuntimeException: entry size ‘8991809155’ is too big ( > 8589934591 ).
        at org.apache.commons.compress.archivers.tar.TarArchiveOutputStream.failForBigNumber(TarArchiveOutputStream.java:636) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
   ```
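
For context on the `8589934591` in the exception: the classic tar header stores an entry's size as 11 octal digits, so the largest representable size is 8^11 - 1 bytes. The following is a small illustrative sketch of that arithmetic, not Pinot or commons-compress code; the class and method names are hypothetical.

```java
// Illustrative only: shows where the 8589934591-byte limit in the log comes from.
class TarSizeLimit {
    // The classic (ustar) header size field holds 11 octal digits.
    static final int SIZE_FIELD_OCTAL_DIGITS = 11;

    // Each octal digit carries 3 bits, so the maximum is 2^(3 * 11) - 1.
    static long maxClassicEntrySize() {
        return (1L << (3 * SIZE_FIELD_OCTAL_DIGITS)) - 1;
    }

    // True if an entry of this size fits in the classic header's size field.
    static boolean fitsClassicHeader(long entrySizeBytes) {
        return entrySizeBytes >= 0 && entrySizeBytes <= maxClassicEntrySize();
    }

    public static void main(String[] args) {
        System.out.println(maxClassicEntrySize());          // prints 8589934591
        System.out.println(fitsClassicHeader(8991809155L)); // prints false (the size from the log)
    }
}
```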
   
   As per https://commons.apache.org/proper/commons-compress/tar.html, Pinot should set the `TarArchiveOutputStream` bigNumberMode to `BIGNUMBER_POSIX` so that individual tar entries are not limited to 8 GiB.
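
A minimal sketch of what the suggested setting could look like, assuming Apache Commons Compress is on the classpath. The class `PosixTarSketch` and the method `tarGzFile` are hypothetical names for illustration; this is not Pinot's actual tarring code, which may be structured differently.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

// Hypothetical helper: writes a single file into a .tar.gz archive.
class PosixTarSketch {
    static void tarGzFile(File input, File outputTarGz) throws IOException {
        try (OutputStream fileOut = Files.newOutputStream(outputTarGz.toPath());
             BufferedOutputStream buffered = new BufferedOutputStream(fileOut);
             GzipCompressorOutputStream gzip = new GzipCompressorOutputStream(buffered);
             TarArchiveOutputStream tar = new TarArchiveOutputStream(gzip)) {
            // The default mode (BIGNUMBER_ERROR) throws for sizes that do not fit
            // the classic header; BIGNUMBER_POSIX writes PAX extended headers instead.
            tar.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX);

            // The entry takes its size from the file on disk.
            TarArchiveEntry entry = new TarArchiveEntry(input, input.getName());
            tar.putArchiveEntry(entry);
            Files.copy(input.toPath(), tar);
            tar.closeArchiveEntry();
            tar.finish();
        }
    }
}
```

Archives written this way carry PAX headers for oversized entries, so whatever untars the segment needs to understand PAX as well.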

