yashmayya commented on code in PR #13835: URL: https://github.com/apache/pinot/pull/13835#discussion_r1743453260
########## pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/startree/AggregationFunctionColumn.java: ########## @@ -127,9 +149,14 @@ public boolean equals(Object obj) { if (this == obj) { return true; } - if (obj instanceof AggregationFunctionColumnPair) { - AggregationFunctionColumnPair anotherPair = (AggregationFunctionColumnPair) obj; - return _functionType == anotherPair._functionType && _column.equals(anotherPair._column); + if (obj instanceof AggregationFunctionColumn) { + AggregationFunctionColumn other = (AggregationFunctionColumn) obj; + // TODO: Revisit this since it means that for aggregation functions where a certain config parameter need not be + // checked to determine whether a query can be served by a star-tree index, we won't rebuild a star-tree index + // if the parameter value is changed in the index configuration. + return _functionType == other._functionType && _column.equals(other._column) + && AggregationFunctionType.compareFunctionParametersForStarTree(_functionType, _functionParameters, + other._functionParameters) == 0; Review Comment: Hm actually I just realized that there's an assumption probably baked into many places that one star-tree can't have multiple pre-aggregations for the same aggregation function / column pair (see [here](https://github.com/apache/pinot/blob/d411f34b869aaecad803a1dde659a712c8e8c18a/pinot-core/src/main/java/org/apache/pinot/core/startree/plan/StarTreeProjectPlanNode.java#L61-L91) for instance where we're using the `functionName__columnName` to retrieve data from the star-tree). I think it might be cleaner overall to make this assumption explicit now that we're introducing configurable function parameters - essentially add a table config validation that prevents a single star-tree index configuration (`StarTreeIndexConfig`) from having multiple configurations for the same function / column in `StarTreeAggregationConfig`. So, if a user wants to have a pre-aggregation on `DISTINCTCOUNTHLL(col, 16)` as well as `DISTINCTCOUNTHLL(col, 8)` for some reason, these should be configured in sepa rate star-trees. This way we can safely continue to use the existing logic for retrieving data from a star-tree index once it has already been matched for a query. I agree with your comment on generalizing the logic of matching aggregations in `StarTreeUtils.createStarTreeBasedProjectOperator()` though. I like your idea of modifying the `AggregationFunction` interface to allow each aggregation to decide whether a particular star-tree index can be used. Although I'm thinking that given the above observations on not having multiple entities for the same function / column pair in a single star-tree, we can extract the function parameters out of this `AggregationFunctionColumn` class (reverting it to what it was earlier). We can instead maintain the function parameters alongside the existing info in something like a `Map<AggregationFunctionColumnPair, Pair<Map<String, Object>, AggregationSpec>` (which also seems easier to reason about) in `StarTreeV2BuilderConfig`, `StarTreeV2Metadata` etc. WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org