itschrispeck commented on PR #12945: URL: https://github.com/apache/pinot/pull/12945#issuecomment-2065911916
> I don't think we need a lower bound for the chunk size. We can probably simply do `min(maxLength * DEFAULT_NUM_DOCS_PER_CHUNK, TARGET_MAX_CHUNK_SIZE)`.

I had the same thought, but we ran into two issues that blocked segment build: `maxLength` can be 0, and the product can overflow `int` for large `maxLength`. Setting a minimum size seemed like a good way to catch both cases.

> I feel it can also be useful to allow user to specify the chunk size

Makes a lot of sense. I added a config `targetMaxChunkSize` which sets the upper bound. The chunk size can still be dynamically reduced if `maxLength` is small, since I couldn't think of a strong case for a user increasing chunk size when values are always short. I will document the behavior in the docs.

Reducing the max chunk size is very useful both for avoiding on-the-fly allocations and huge chunks with V4, and for reducing direct buffer usage. This config can also apply to the V2/V3 format with `deriveNumDocsPerChunk`.
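The bounded computation discussed in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: the class name `ChunkSizeSketch`, the method `chunkSize`, and the values of all three constants are hypothetical placeholders. It shows why widening to `long` before multiplying avoids the overflow, and how a lower bound also covers `maxLength == 0`.

```java
// Hypothetical sketch; names and constant values are assumptions, not Pinot's.
final class ChunkSizeSketch {
  // Placeholder values chosen for illustration only.
  static final int DOCS_PER_CHUNK = 1000;
  static final int MIN_CHUNK_SIZE = 4 * 1024;
  static final int MAX_CHUNK_SIZE = 1024 * 1024;

  private ChunkSizeSketch() {}

  // Returns a chunk size in bytes, clamped to [MIN_CHUNK_SIZE, targetMaxChunkSize].
  // If targetMaxChunkSize is below MIN_CHUNK_SIZE, the lower bound wins.
  static int chunkSize(int maxLength, int targetMaxChunkSize) {
    // Widen to long before multiplying so a large maxLength cannot wrap around.
    long unbounded = (long) maxLength * DOCS_PER_CHUNK;
    // Apply the upper bound first, then the lower bound.
    // The lower bound is what keeps the result positive when maxLength is 0.
    long bounded = Math.max(Math.min(unbounded, targetMaxChunkSize), MIN_CHUNK_SIZE);
    // Safe cast: bounded never exceeds max(targetMaxChunkSize, MIN_CHUNK_SIZE).
    return (int) bounded;
  }

  public static void main(String[] args) {
    System.out.println(chunkSize(0, MAX_CHUNK_SIZE));                 // zero-length values
    System.out.println(chunkSize(100, MAX_CHUNK_SIZE));               // short values
    System.out.println(chunkSize(Integer.MAX_VALUE, MAX_CHUNK_SIZE)); // would overflow in int
  }
}
```

With these placeholder constants, short values yield a chunk smaller than the configured maximum, which matches the "dynamically reduced" behavior described above.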