davecromberge opened a new pull request, #12164: URL: https://github.com/apache/pinot/pull/12164
This PR relates to https://github.com/apache/pinot/issues/12111

Some aggregation functions have different Calcite signatures but can leverage the same underlying pre-computed aggregate. This is common for sketch functions, where the underlying aggregate is the sketch itself, typically stored as an array of bytes. To use these functions together with a StarTree at query time, the aggregate currently has to be duplicated. This is inefficient in practice. Data volume increases and segments typically have fewer rows, which often hurts performance for certain query patterns.

To address this, the association can be encoded as a mapping from a query-time aggregate to the underlying value aggregate stored in the index. The mapping can be implicit. It can also be explicit, by allowing the user to specify the function in the `AggregateSpec` within the table configuration.

However, some properties of the system need careful consideration for these changes. In particular, it is not clear whether a segment's metadata should reflect the query aggregates that are supported or the value aggregates that are actually stored. Affected use cases include:

- StarTree index rebuild, when the metadata is compared to the aggregation spec configuration
- StarTree index fit, i.e. whether the index covers a query

Based on an initial attempt and investigation, the metadata should probably reflect what the segment actually contains.

Finally, the relationship between a query aggregate and its value aggregate might change over time. That could cause undesirable behaviour for segments that were not built with the new mapping in mind.

I'd be grateful for any input on this work and how best to proceed.

`release-notes`:
- New configuration options
- StarTree efficiency optimization
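The mapping and the two affected checks (fit and rebuild) can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the PR's implementation or Pinot's API. The names `AggFn`, `AggregatePair`, `QUERY_TO_VALUE`, `covers` and `needsRebuild` are all hypothetical. The two pairings (raw HLL to HLL, raw theta sketch to theta sketch) are only illustrative examples of functions that could share one stored sketch. The sketch follows the direction suggested in the description: segment metadata records the value aggregates actually stored, and query aggregates are translated before any comparison.

```java
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StarTreeAggregateMapping {

  // Hypothetical query-time aggregation functions (not Pinot's AggregationFunctionType).
  enum AggFn { DISTINCTCOUNTHLL, DISTINCTCOUNTRAWHLL, DISTINCTCOUNTTHETASKETCH, DISTINCTCOUNTRAWTHETASKETCH, SUM }

  // Hypothetical (function, column) pair, as listed in a star-tree aggregation config.
  record AggregatePair(AggFn function, String column) {}

  // Query aggregate -> value aggregate actually stored in the star-tree.
  // Functions missing from the map are stored as themselves (identity mapping).
  private static final Map<AggFn, AggFn> QUERY_TO_VALUE = new EnumMap<>(AggFn.class);

  static {
    QUERY_TO_VALUE.put(AggFn.DISTINCTCOUNTRAWHLL, AggFn.DISTINCTCOUNTHLL);
    QUERY_TO_VALUE.put(AggFn.DISTINCTCOUNTRAWTHETASKETCH, AggFn.DISTINCTCOUNTTHETASKETCH);
  }

  static AggregatePair toValueAggregate(AggregatePair query) {
    AggFn stored = QUERY_TO_VALUE.getOrDefault(query.function(), query.function());
    return new AggregatePair(stored, query.column());
  }

  static Set<AggregatePair> toValueAggregates(Set<AggregatePair> pairs) {
    Set<AggregatePair> result = new HashSet<>();
    for (AggregatePair p : pairs) {
      result.add(toValueAggregate(p));
    }
    return result;
  }

  // Fit check: metadata lists stored value aggregates, so each query
  // aggregate is translated before the lookup.
  static boolean covers(Set<AggregatePair> storedInMetadata, Set<AggregatePair> queryAggregates) {
    return storedInMetadata.containsAll(toValueAggregates(queryAggregates));
  }

  // Rebuild check: compare the configured spec, after mapping, with the metadata.
  // Config entries that map to the same value aggregate collapse into one.
  static boolean needsRebuild(Set<AggregatePair> configured, Set<AggregatePair> storedInMetadata) {
    return !toValueAggregates(configured).equals(storedInMetadata);
  }

  public static void main(String[] args) {
    Set<AggregatePair> stored = Set.of(
        new AggregatePair(AggFn.DISTINCTCOUNTHLL, "userId"),
        new AggregatePair(AggFn.SUM, "clicks"));

    // Raw-HLL query served by the stored HLL: prints true
    System.out.println(covers(stored, Set.of(new AggregatePair(AggFn.DISTINCTCOUNTRAWHLL, "userId"))));
    // No theta sketch stored for userId: prints false
    System.out.println(covers(stored, Set.of(new AggregatePair(AggFn.DISTINCTCOUNTTHETASKETCH, "userId"))));

    // Configuring both HLL variants maps to the same stored set: prints false
    Set<AggregatePair> configured = Set.of(
        new AggregatePair(AggFn.DISTINCTCOUNTHLL, "userId"),
        new AggregatePair(AggFn.DISTINCTCOUNTRAWHLL, "userId"),
        new AggregatePair(AggFn.SUM, "clicks"));
    System.out.println(needsRebuild(configured, stored));
  }
}
```

In this sketch, the mapping is consulted only when queries and configs are checked, so it is not recorded in the segment. If the mapping changed after segments were built, the fit check could accept a segment whose stored aggregate was produced under the old mapping. This is the concern in the description about the relationship changing over time, and the sketch does not address it.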