siddharthteotia edited a comment on pull request #7931: URL: https://github.com/apache/pinot/pull/7931#issuecomment-998997989
Discussed offline with @richardstartin:

- The change here is limited to the V4 format, and I am OK with the reasoning behind it. With V4, memory-mapped buffers are largely no longer used for compression. This PR therefore introduces a direct buffer to improve performance in the vast majority of cases.
- Regarding V3 vs. V4: the two formats are similar. The main difference is the 4GB limitation imposed by V4. V3 has no such limitation because the chunk offsets in the file header are tracked using `long`. V3 also derives `numDocsPerChunk` instead of using a fixed 1000 rows per chunk, and it packs a single row into a chunk if that row is larger than 1MB. In V3 we never considered using memory mapping for the compression buffer, even though the chunks can be very large (based on the production use case where V3 is deployed).
- Regarding changing the default from V2: I didn't make V3 the default initially because I wanted to test it in production first. It has now been in use for over a year without any issues. We discussed bridging the functionality gap between V3 and V4 in a separate change. If that works out, we can then remove the V3 code while ensuring the V4+ reader can still read V3 data, since V3 and V4 had common goals.
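The chunk-sizing and offset-width points above can be sketched in a few lines of Java. This is a minimal, hypothetical illustration and not Pinot's actual writer code. It assumes a 1MB target chunk size, a 4-byte per-row offset in each chunk header, and 32-bit chunk offsets treated as unsigned. That last assumption is one way to arrive at a 4GB ceiling. The class name `ChunkLayoutSketch` and its methods are illustrative only.

```java
// Hypothetical sketch of the two layout rules discussed above; not Pinot's API.
final class ChunkLayoutSketch {

    // Assumed target maximum size of an uncompressed chunk: 1 MB.
    static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;

    // Assumed size of the per-row offset stored in each chunk header.
    static final int ROW_OFFSET_SIZE = Integer.BYTES;

    // Largest value an unsigned 32-bit offset can hold: 2^32 - 1 (just under 4 GB).
    static final long MAX_UNSIGNED_INT_OFFSET = (1L << 32) - 1;

    private ChunkLayoutSketch() {}

    /**
     * Derives rows per chunk from the longest entry so a chunk stays near the
     * 1 MB target. A row larger than the target ends up alone in its chunk.
     */
    static int numDocsPerChunk(int lengthOfLongestEntry) {
        // Use long arithmetic so very long entries cannot overflow the sum.
        long bytesPerRow = (long) lengthOfLongestEntry + ROW_OFFSET_SIZE;
        return (int) Math.max(TARGET_MAX_CHUNK_SIZE / bytesPerRow, 1L);
    }

    /** True if a chunk starting at this file position is addressable with a 32-bit offset. */
    static boolean fitsIn32BitOffset(long chunkStartOffset) {
        return chunkStartOffset >= 0 && chunkStartOffset <= MAX_UNSIGNED_INT_OFFSET;
    }

    /** Packs an offset into an int slot, reinterpreting the bits as unsigned. */
    static int toUnsignedIntOffset(long chunkStartOffset) {
        if (!fitsIn32BitOffset(chunkStartOffset)) {
            throw new IllegalArgumentException("offset exceeds 4GB: " + chunkStartOffset);
        }
        return (int) chunkStartOffset;
    }

    public static void main(String[] args) {
        System.out.println(numDocsPerChunk(100));                  // 10082
        System.out.println(numDocsPerChunk(2 * 1024 * 1024));      // 1 (row > 1 MB)
        System.out.println(fitsIn32BitOffset(5L * 1024 * 1024 * 1024)); // false
        // 3 GB still fits when the 32 bits are read back as unsigned.
        System.out.println(Integer.toUnsignedLong(toUnsignedIntOffset(3L << 30))); // 3221225472
    }
}
```

With a `long` offset, as in V3, `fitsIn32BitOffset` is never a constraint. With a 32-bit slot, any chunk that starts past the 4GB mark cannot be addressed. That is the trade-off the comment describes.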