kkrugler opened a new issue #6311: URL: https://github.com/apache/incubator-pinot/issues/6311
Currently a big segment fails during the “converting segment” phase:

```
Converting segment: /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0 to v3 format
v3 segment location for segment: crawldata_OFFLINE_2018-10-13_2020-10-11_0 is /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0/v3
Deleting files in v1 segment directory: /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0
Computed crc = 1033854200, based on files [/tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0/v3/columns.psf, /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0/v3/index_map, /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0/v3/metadata.properties]
Driver, record read time : 236809
Driver, stats collector time : 0
Driver, indexing time : 122449
Tarring segment from: /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0 to: /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0.tar.gz
Failed to generate Pinot segment for file - s3://adbeat-pinot-files/compressed/3.gz
java.lang.RuntimeException: entry size '8991809155' is too big ( > 8589934591 ).
	at org.apache.commons.compress.archivers.tar.TarArchiveOutputStream.failForBigNumber(TarArchiveOutputStream.java:636) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
```

As per https://commons.apache.org/proper/commons-compress/tar.html, Pinot should be using `BIGNUMBER_POSIX` for the `bigNumberMode` so that it doesn't have an 8GB limit.
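For context on where the `8589934591` in the exception comes from: the classic tar header stores the entry size as 11 octal digits, so the largest representable size is 8^11 - 1 bytes. The sketch below reproduces that arithmetic with the JDK only. `TarSizeLimit` and its methods are hypothetical names made up for illustration, not part of Pinot or commons-compress, and the commented-out call assumes a `TarArchiveOutputStream` variable named `tarOut` at whatever point Pinot writes the segment archive.

```java
// Illustrative only: TarSizeLimit is a hypothetical helper, not Pinot or commons-compress code.
final class TarSizeLimit {
    // The classic tar header stores the entry size as 11 octal digits.
    static final int SIZE_FIELD_OCTAL_DIGITS = 11;

    // Largest value that fits in the given number of octal digits: 8^digits - 1.
    static long maxOctalValue(int digits) {
        return (1L << (3 * digits)) - 1;
    }

    // True when the size does not fit in the classic size field, so the
    // archive writer needs a big-number mode (e.g. POSIX/PAX headers).
    static boolean needsBigNumberSupport(long entrySize) {
        return entrySize > maxOctalValue(SIZE_FIELD_OCTAL_DIGITS);
    }

    public static void main(String[] args) {
        long entrySize = 8991809155L; // entry size reported in the exception above
        System.out.println("limit = " + maxOctalValue(SIZE_FIELD_OCTAL_DIGITS));
        System.out.println("needs big-number support = " + needsBigNumberSupport(entrySize));

        // With commons-compress, the change proposed in this issue would be roughly
        // (assumption: tarOut is the stream used to tar the segment):
        //   tarOut.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX);
    }
}
```

The failing entry is 8991809155 bytes, which is above that limit, so the default big-number mode rejects it.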