davecromberge opened a new pull request, #12164:
URL: https://github.com/apache/pinot/pull/12164

   This PR relates to https://github.com/apache/pinot/issues/12111
   
   Some aggregation functions have different Calcite signatures but can 
leverage the same underlying pre-computed aggregate.  This is commonly true of 
sketch functions, where the underlying aggregate is the sketch itself, 
typically stored as an array of bytes.
   
   To use these functions at query time together with a StarTree, the aggregate 
would need to be duplicated, once per function.  This is inefficient in 
practice: data volume increases and segments typically hold fewer rows, which 
often hurts query performance for certain query patterns.  
   
   To address this problem, the association can be encoded as a mapping from a 
query-time aggregate to the underlying index value aggregate.  This can be 
done implicitly, or explicitly by allowing the user to specify the function in 
the `AggregateSpec` within the table configuration.
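
   As a rough illustration of the implicit variant, the mapping could be a 
simple lookup from query-time function name to stored function name.  This is 
a minimal sketch and not the code in this PR: the class 
`StoredAggregateMapping`, the method `storedFunctionFor`, and the choice of 
`DISTINCTCOUNTRAWHLL` and `DISTINCTCOUNTHLL` as entries are assumptions made 
for the example.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: resolve a query-time aggregate to the value aggregate stored in the index. */
final class StoredAggregateMapping {
  // Illustrative entries only (an assumption): two query-time functions sharing one stored sketch.
  private static final Map<String, String> QUERY_TO_STORED = Map.of(
      "DISTINCTCOUNTRAWHLL", "DISTINCTCOUNTHLL",
      "DISTINCTCOUNTHLL", "DISTINCTCOUNTHLL");

  private StoredAggregateMapping() {
  }

  /** Returns the stored aggregate for a query function; unmapped functions resolve to themselves. */
  static String storedFunctionFor(String queryFunction) {
    String key = queryFunction.toUpperCase(Locale.ROOT);
    return QUERY_TO_STORED.getOrDefault(key, key);
  }
}
```

   With a lookup like this, both query functions resolve to the same stored 
aggregate, so only one copy of the sketch would need to be kept in the index.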
   
   However, some properties of the system require careful consideration for 
these changes.  In particular, it's not clear whether a segment's metadata 
should reflect the query aggregates that are supported or the value aggregates 
that are actually stored.  Some affected use cases are:
   - StarTree index rebuild when metadata is compared to aggregation spec 
configuration
   - StarTree index fit and whether it covers a query
   
   From an initial attempt and investigation, it appears correct for the 
metadata to reflect what the segment actually contains.  
   Finally, if the relationship between a query aggregate and its value 
aggregate changes over time, this might result in undesirable behaviour if a 
segment was not actually constructed with the new mapped value in mind.
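
   To make the two affected use cases concrete, here is a minimal sketch that 
assumes the metadata lists the stored value aggregates and that query 
aggregates are resolved through the mapping before any comparison.  The class 
`StarTreeFitSketch`, its methods, the `FUNCTION(column)` key format, and the 
mapping entry are all hypothetical and are not Pinot APIs.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of fit and rebuild checks when metadata lists stored value aggregates. */
final class StarTreeFitSketch {
  // Illustrative mapping (an assumption) from query-time function to stored value aggregate.
  private static final Map<String, String> QUERY_TO_STORED =
      Map.of("DISTINCTCOUNTRAWHLL", "DISTINCTCOUNTHLL");

  private StarTreeFitSketch() {
  }

  /** Builds a "FUNCTION(column)" key after resolving the function to its stored aggregate. */
  static String storedKey(String function, String column) {
    String upper = function.toUpperCase(Locale.ROOT);
    return QUERY_TO_STORED.getOrDefault(upper, upper) + "(" + column + ")";
  }

  /** True if the stored aggregates listed in the metadata can answer the query aggregate. */
  static boolean covers(Set<String> storedInMetadata, String queryFunction, String column) {
    return storedInMetadata.contains(storedKey(queryFunction, column));
  }

  /** True if the configured aggregates (already resolved via storedKey) differ from the metadata. */
  static boolean needsRebuild(Set<String> storedInMetadata, Set<String> resolvedConfigKeys) {
    return !storedInMetadata.equals(resolvedConfigKeys);
  }
}
```

   Because both checks compare resolved keys, switching the configuration 
between two functions that share a stored aggregate would not trigger a 
rebuild in this sketch.  It does not handle the case where the mapping itself 
changes after a segment was built.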
   
   I'd be grateful for any input on this work and how best to proceed.
   
   `release-notes`:
   - New configuration options
   - StarTree efficiency optimization
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

