siddharthteotia edited a comment on pull request #7931:
URL: https://github.com/apache/pinot/pull/7931#issuecomment-998997989


   Discussed offline with @richardstartin
   
   - The change here is limited to the V4 format, and I am OK with the 
reasoning behind it. With V4, the use of a memory-mapped buffer for compression 
is pretty much eliminated, so this PR introduces a direct buffer to improve 
performance in the vast majority of cases. 
   
   - Regarding V3 vs. V4: they are similar, but the main difference is the 4GB 
limitation imposed in V4. V3 has no such limitation because the chunk offset in 
the file header is tracked using a `long`. V3 also derives `numDocsPerChunk` 
(as opposed to using a fixed 1000 rows per chunk), or packs a single row into 
the chunk if the row is larger than 1MB. In V3 we never considered using memory 
mapping for the compressed buffer, even though our chunks are potentially huge 
(based on the production use case where V3 is being used). 
   
   - Regarding changing the default from V2: I didn't make V3 the default 
initially because I wanted to test it in production first. It has been used for 
over a year now without any issues. We discussed that we can bridge the 
functionality gap between V3 and V4 in a separate change and then, if possible, 
remove the V3 code (ensuring the V4+ reader continues to read it), since V3 and 
V4 had common goals. 
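
To make the V3 vs. V4 points above concrete, here is a rough sketch, not the actual Pinot writer code. It assumes the 4GB limit comes from storing chunk offsets in an unsigned 32-bit field, and that `numDocsPerChunk` is derived by dividing a target chunk size by the longest row. The class name, method names, and the 1MB target constant are hypothetical and chosen only for illustration.

```java
public class ChunkOffsetSketch {
    // Hypothetical target chunk size. The comment above only states that a
    // row larger than 1MB gets a chunk to itself; 1MB is assumed here.
    static final int TARGET_CHUNK_BYTES = 1024 * 1024;

    // Largest offset an unsigned 32-bit field can hold: 2^32 - 1 bytes,
    // i.e. just under 4GB. Assumed to be the source of the V4 limit.
    static final long MAX_UINT32_OFFSET = 0xFFFFFFFFL;

    // Derive docs per chunk from the longest row instead of using a fixed
    // count. A row at or above the target size ends up alone in its chunk.
    static int deriveNumDocsPerChunk(int maxRowLengthBytes) {
        if (maxRowLengthBytes <= 0) {
            throw new IllegalArgumentException("row length must be positive");
        }
        return Math.max(1, TARGET_CHUNK_BYTES / maxRowLengthBytes);
    }

    // True if the offset is addressable with a 32-bit unsigned field.
    // A long offset (as in V3) has no such ceiling in practice.
    static boolean fitsIn32BitOffset(long offset) {
        return offset >= 0 && offset <= MAX_UINT32_OFFSET;
    }

    public static void main(String[] args) {
        System.out.println(deriveNumDocsPerChunk(100));
        System.out.println(deriveNumDocsPerChunk(2 * 1024 * 1024));
        System.out.println(fitsIn32BitOffset(5L * 1024 * 1024 * 1024));
    }
}
```

With 100-byte rows the sketch packs many rows per chunk, a 2MB row is packed on its own, and a 5GB offset no longer fits in 32 bits, which is the case where tracking offsets with a `long` matters.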


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


