davecromberge opened a new issue, #12111: URL: https://github.com/apache/pinot/issues/12111
Problem
---------

Some aggregation functions are closely related and can produce their results from the same underlying metric. In the StarTree index, however, the metric data is replicated for each function-column pair. For example:

```
{
  "dimensionsSplitOrder": [
    "team"
  ],
  "functionColumnPairs": [
    "DISTINCT_COUNT_CPC_SKETCH__players_cpc",
    "DISTINCT_COUNT_RAW_CPC_SKETCH__players_cpc"
  ]
}
```

If the `players_cpc` metric were an array of bytes, the same data would be replicated in the StarTree once for each aggregation function above. This increases storage as well as the resources used to construct and merge segments.

Proposal
---------

I would like to propose using the value aggregator name to remove redundant metric computation and to de-duplicate these pairs so that the metric is stored only once. This would affect both the StarTree construction logic and the query evaluation logic.

It would be nice to have feedback on this idea before I explore it further.

/cc @Jackie-Jiang @snleee
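The de-duplication idea can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not Pinot code. It assumes a hypothetical lookup table that maps each function name to the function whose intermediate value is physically stored (here `DISTINCT_COUNT_RAW_CPC_SKETCH` resolves to `DISTINCT_COUNT_CPC_SKETCH`). The class `StarTreePairDedup` and its methods `toStoredPair` and `dedupe` are invented for this example. The sketch groups the configured `FUNCTION__column` pairs by the pair that would be stored, so each distinct metric is built once and every requested function can be served from it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StarTreePairDedup {
    // HYPOTHETICAL mapping from a query-time function name to the function whose
    // intermediate value is physically stored. Illustrative only, not from Pinot.
    private static final Map<String, String> STORED_FUNCTION =
        Map.of("DISTINCT_COUNT_RAW_CPC_SKETCH", "DISTINCT_COUNT_CPC_SKETCH");

    /** Resolves "FUNCTION__column" to the pair whose metric would actually be stored. */
    static String toStoredPair(String pair) {
        int sep = pair.indexOf("__");
        if (sep < 0) {
            throw new IllegalArgumentException("Expected FUNCTION__column, got: " + pair);
        }
        String function = pair.substring(0, sep);
        String column = pair.substring(sep + 2);
        // Functions without an entry in the table are stored under their own name.
        return STORED_FUNCTION.getOrDefault(function, function) + "__" + column;
    }

    /** Groups configured pairs by stored pair; each key would be built only once. */
    static Map<String, List<String>> dedupe(List<String> configuredPairs) {
        Map<String, List<String>> byStoredPair = new LinkedHashMap<>();
        for (String pair : configuredPairs) {
            byStoredPair.computeIfAbsent(toStoredPair(pair), k -> new ArrayList<>()).add(pair);
        }
        return byStoredPair;
    }

    public static void main(String[] args) {
        List<String> pairs = List.of(
            "DISTINCT_COUNT_CPC_SKETCH__players_cpc",
            "DISTINCT_COUNT_RAW_CPC_SKETCH__players_cpc");
        // Both configured pairs collapse into a single stored metric.
        System.out.println(dedupe(pairs));
        // prints {DISTINCT_COUNT_CPC_SKETCH__players_cpc=[DISTINCT_COUNT_CPC_SKETCH__players_cpc, DISTINCT_COUNT_RAW_CPC_SKETCH__players_cpc]}
    }
}
```

Keeping the list of original pairs per stored key is one possible design: construction iterates only over the keys, while query evaluation could use the same mapping to find the stored metric for any requested function.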