itschrispeck commented on PR #12945:
URL: https://github.com/apache/pinot/pull/12945#issuecomment-2065911916

   > I don't think we need a lower bound for the chunk size. We can probably 
simply do `min(maxLength * DEFAULT_NUM_DOCS_PER_CHUNK, TARGET_MAX_CHUNK_SIZE)`.
   
   I had the same thought, but we ran into two issues that blocked segment 
build: `maxLength` can be 0, and `maxLength * DEFAULT_NUM_DOCS_PER_CHUNK` can 
overflow `int` for large `maxLength`. Setting a minimum size seemed like a good 
way to catch both cases. 
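
A minimal Java sketch of the two failure modes and of a clamped alternative. The constant values and the class and method names here are assumptions for illustration, not the actual code in this PR, and widening to `long` is a choice made in this sketch to avoid the overflow outright:

```java
public final class ChunkSizeSketch {
    // Assumed values, for illustration only.
    static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
    static final int TARGET_MIN_CHUNK_SIZE = 4 * 1024;
    static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;

    // The upper-bound-only formula from the review suggestion.
    // Returns 0 when maxLength is 0, and the int product can wrap
    // around to a negative value when maxLength is large.
    static int naiveChunkSize(int maxLength) {
        return Math.min(maxLength * DEFAULT_NUM_DOCS_PER_CHUNK, TARGET_MAX_CHUNK_SIZE);
    }

    // Clamped variant: the product is computed as a long so it cannot
    // overflow, then capped at the upper bound and raised to the minimum.
    static int clampedChunkSize(int maxLength, int targetMaxChunkSize) {
        long desired = (long) maxLength * DEFAULT_NUM_DOCS_PER_CHUNK;
        long capped = Math.min(desired, (long) targetMaxChunkSize);
        return (int) Math.max(capped, (long) TARGET_MIN_CHUNK_SIZE);
    }
}
```

In this sketch the minimum wins if a caller passes an upper bound smaller than `TARGET_MIN_CHUNK_SIZE`, so the result is never zero or negative.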
   
   > I feel it can also be useful to allow user to specify the chunk size
   
   Makes a lot of sense. I added a config `targetMaxChunkSize` which sets the 
upper bound. The chunk size can still be dynamically reduced if `maxLength` is 
small, since I couldn't think of a strong case for a user increasing chunk size 
when values are always short. I will document this behavior in the docs. 
Reducing the max chunk size is very useful both for avoiding on-the-fly 
allocations and huge chunks with V4, and for reducing direct buffer usage.
   
   This config can also apply to V2/V3 format with `deriveNumDocsPerChunk`.
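
For the V2/V3 case, one way a target max chunk size could translate into a docs-per-chunk count is sketched below. This is an assumption about the general shape of the derivation, not the actual Pinot implementation: the class name, method name, and per-row overhead constant are all hypothetical.

```java
public final class NumDocsPerChunkSketch {
    // Hypothetical per-row overhead in bytes (for example, one int offset per row).
    static final int ASSUMED_ROW_OFFSET_BYTES = Integer.BYTES;

    // Derives how many documents fit in a chunk of at most targetMaxChunkSize
    // bytes, given the longest value length. Always returns at least 1 so that
    // a single oversized value still gets its own chunk.
    static int numDocsPerChunk(int maxLength, int targetMaxChunkSize) {
        // long arithmetic so that maxLength near Integer.MAX_VALUE cannot overflow
        long bytesPerEntry = (long) maxLength + ASSUMED_ROW_OFFSET_BYTES;
        long docs = targetMaxChunkSize / bytesPerEntry;
        return (int) Math.max(docs, 1L);
    }
}
```

With this shape, lowering `targetMaxChunkSize` lowers the number of docs per chunk proportionally, which is how a single config could bound chunk size across formats.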
   
   

