richardstartin commented on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-947836855


   I had to force derivation of `numDocs` for variable-length data because 
there's no good solution to the buffer size problem given the following 
constraints:
   
   * There is a fixed number of documents per chunk
   * We don't want to OOM if there is a very large row in a segment, and 
applying an arbitrary multiplier amplifies this risk
   * Compression is applied at the chunk level, not within a chunk
   * The compression libraries all require a single buffer
   
   When there is a very large row (> 1MB), we end up with 1 doc per chunk in the 
segment. The only good solution is to evolve the forward index format to allow 
a variable number of docs per chunk for variable-length data, but we can do that 
later if this becomes a problem.
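
   To illustrate the trade-off, here is a rough sketch of how `numDocs` per chunk could be derived from the longest row. It is not the code in this PR: the class `ChunkSizing`, its methods, and the 1MB target chunk size (inferred from the "> 1MB" remark above) are all assumptions made for illustration.

```java
// Hypothetical sketch; names and the 1MB target are assumptions, not Pinot's API.
final class ChunkSizing {

    // Assumed target size of the single uncompressed buffer handed to the compressor.
    static final int TARGET_MAX_CHUNK_SIZE = 1 << 20; // 1MB

    private ChunkSizing() {
    }

    // Derive a fixed number of docs per chunk from the longest row in the segment.
    // Every chunk must be able to hold numDocs rows of the maximum length, so the
    // count shrinks as the longest row grows, bottoming out at 1 doc per chunk.
    static int deriveNumDocsPerChunk(int maxRowLengthInBytes) {
        if (maxRowLengthInBytes < 0) {
            throw new IllegalArgumentException("maxRowLengthInBytes must be >= 0");
        }
        // Guard against division by zero when every row is empty.
        int bytesPerDoc = Math.max(1, maxRowLengthInBytes);
        return Math.max(1, TARGET_MAX_CHUNK_SIZE / bytesPerDoc);
    }

    // Worst-case size of the single buffer needed to hold one uncompressed chunk.
    static long chunkBufferSize(int maxRowLengthInBytes) {
        return (long) deriveNumDocsPerChunk(maxRowLengthInBytes) * Math.max(1, maxRowLengthInBytes);
    }
}
```

   Under these assumptions, a segment whose longest row is 100 bytes gets 10485 docs per chunk, while one 2MB row forces 1 doc per chunk for the whole segment, and the buffer is never larger than the target or the longest row, whichever is bigger.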


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


