richardstartin commented on pull request #7595: URL: https://github.com/apache/pinot/pull/7595#issuecomment-947836855
I had to force derivation of `numDocs` for variable length data because there's no good solution to the buffer size problem given the following constraints:

* There is a fixed number of documents per chunk
* We don't want to OOM if there is a very large row in a segment, and applying an arbitrary multiplier amplifies this risk
* Compression is applied at the chunk level, not intrachunk
* The compression libraries all require a single buffer

When there is a very large row (> 1MB) we end up with 1 doc per chunk in the segment. The only good solution is to evolve the forward index format to allow variable numbers of docs per chunk for variable length data, but we can do that later if this becomes a problem.
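To make the trade-off concrete, here is a minimal sketch of how a docs-per-chunk value could be derived from the longest row. It is not the code in this PR: the class name `ChunkSizing`, the method names, the 1 MiB target, and the 4-byte per-row offset are all assumptions chosen for illustration.

```java
// Hypothetical sketch, not the PR's implementation.
final class ChunkSizing {
    // Assumed target for the uncompressed chunk buffer (1 MiB).
    static final int TARGET_MAX_CHUNK_BYTES = 1 << 20;
    // Assumed per-row header overhead (one int offset per document).
    static final int ROW_OFFSET_BYTES = Integer.BYTES;

    private ChunkSizing() {
    }

    // Derives a fixed number of docs per chunk so that a chunk holding only
    // maximum-length rows stays within the target buffer size. The result is
    // clamped to 1, so a row larger than the target gets a chunk to itself.
    static int deriveDocsPerChunk(int maxRowLengthBytes, int targetChunkBytes) {
        if (maxRowLengthBytes <= 0 || targetChunkBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long bytesPerDoc = (long) maxRowLengthBytes + ROW_OFFSET_BYTES;
        return (int) Math.max(1L, targetChunkBytes / bytesPerDoc);
    }

    // Size of the single buffer handed to the compressor for one chunk.
    // Uses long arithmetic so the product cannot overflow an int.
    static long chunkBufferBytes(int docsPerChunk, int maxRowLengthBytes) {
        return (long) docsPerChunk * ((long) maxRowLengthBytes + ROW_OFFSET_BYTES);
    }
}
```

Under these assumptions the buffer is sized by the longest row alone, with no arbitrary multiplier. The cost is the one described above: a single very large row drives the derived value down to 1 doc per chunk for the whole segment.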