lnbest0707-uber opened a new pull request, #15120:
URL: https://github.com/apache/pinot/pull/15120

   `enhancement` `ingestion`
   Inspired by https://github.com/apache/pinot/pull/14479
   In addition to the MVForwardIndex check added in that PR, this PR adds 
similar checks for SVForwardIndex and Dictionary. It only adds the interface and UTs.
   The actual check policy is out of scope for this PR and remains an open 
question to discuss. The final criteria need to consider:
   
   - A ForwardIndex is subject to a 4GB limit once converted into an immutable 
segment.
   - A Dictionary has its own cardinality limit.
   
   However, it is tricky to get this right because:
   
   - The 4GB limit applies to the **compressed** size, and the compression 
ratio cannot be predicted accurately during ingestion into the mutable segment.
   - With `optimizeDictionary` enabled, a Dictionary-encoded column could also 
end up as an immutable forward index. It is then even harder to guess:
         - Whether the Dictionary would be converted to a ForwardIndex
         - If converted, what the final compressed size would be
   
   Some proposed policies:
   
   - Make the uncompressed size limit configurable, and rely on that value to 
set the mutable segment's threshold.
   - Use the last segment's compression ratio as a reference to predict the 
current segment's.
   - For the dictionary, use the same policy as the forward index (possibly 
with a larger configured value).
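
The three proposed policies could be sketched roughly as below. This is only an illustration, not the code in this PR: the class and method names are hypothetical, the 4GB limit is assumed to mean 4 * 1024^3 bytes, the compression ratio is assumed to be defined as compressed size divided by uncompressed size, and the safety factor and dictionary multiplier are made-up config knobs.

```java
/** Hypothetical sketch of the proposed threshold policies; not part of Apache Pinot. */
final class MutableSegmentThresholdSketch {
    // Assumption: the immutable forward index limit is 4 GiB of compressed bytes.
    static final long IMMUTABLE_FORWARD_INDEX_LIMIT_BYTES = 4L * 1024 * 1024 * 1024;

    private MutableSegmentThresholdSketch() {
    }

    /** Policy 1: compare the uncompressed size against a configured limit. */
    static boolean exceedsConfiguredLimit(long uncompressedBytes, long configuredLimitBytes) {
        return uncompressedBytes >= configuredLimitBytes;
    }

    /**
     * Policy 2: derive an uncompressed-size threshold from the last segment's
     * compression ratio (compressed / uncompressed). safetyFactor (for example 0.8)
     * leaves headroom because the current segment may compress worse.
     */
    static long thresholdFromLastRatio(long lastUncompressedBytes, long lastCompressedBytes,
            double safetyFactor) {
        if (lastUncompressedBytes <= 0 || lastCompressedBytes <= 0) {
            // No usable history: conservatively assume the data does not compress at all.
            return (long) (IMMUTABLE_FORWARD_INDEX_LIMIT_BYTES * safetyFactor);
        }
        double ratio = (double) lastCompressedBytes / lastUncompressedBytes;
        // A double-to-long cast saturates at Long.MAX_VALUE, so tiny ratios cannot overflow.
        return (long) (IMMUTABLE_FORWARD_INDEX_LIMIT_BYTES * safetyFactor / ratio);
    }

    /** Policy 3: reuse the forward index threshold for the dictionary, scaled by a multiplier. */
    static long dictionaryThreshold(long forwardIndexThresholdBytes, double multiplier) {
        return (long) (forwardIndexThresholdBytes * multiplier);
    }

    public static void main(String[] args) {
        // Example: the last segment compressed 1000 bytes down to 250 bytes.
        long threshold = thresholdFromLastRatio(1000L, 250L, 0.8);
        System.out.println("uncompressed threshold (bytes): " + threshold);
        System.out.println("dictionary threshold (bytes): " + dictionaryThreshold(threshold, 2.0));
    }
}
```

The ratio-based variant still inherits the uncertainty described above: it cannot tell whether a dictionary-encoded column will be converted to a forward index, so the safety factor only reduces the risk of crossing the limit.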


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

