yashmayya commented on code in PR #13835: URL: https://github.com/apache/pinot/pull/13835#discussion_r1746502018
########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/aggregator/DistinctCountCPCSketchValueAggregator.java: ########## @@ -34,11 +35,11 @@ public class DistinctCountCPCSketchValueAggregator implements ValueAggregator<Ob private final int _lgK; public DistinctCountCPCSketchValueAggregator(List<ExpressionContext> arguments) { Review Comment: Sketches with different `lgK` are mergeable - while the accuracy will differ, I was initially a bit hesitant to add this check because the `DistinctCountCPCSketchAggregationFunction` takes in the number of nominal entries which is `2 ^ lgK` whereas `DistinctCountCPCSketchValueAggregator` takes in `lgK` directly. So I was wondering if it'll be confusing for users to determine which queries with `DistinctCountCPCSketchAggregationFunction` can use the star-tree index. However, thinking about it some more, it doesn't really make sense to allow configuring `lgK` on a star-tree index for `DistinctCountCPCSketchAggregationFunction` but not add the corresponding query matching logic. I've made this change and I'll plan to add this caveat to the documentation. ########## pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunction.java: ########## @@ -132,6 +133,32 @@ default FinalResult mergeFinalResult(FinalResult finalResult1, FinalResult final throw new UnsupportedOperationException("Cannot merge final results for function: " + getType()); } + /** + * Returns whether a star-tree index with the specified properties can be used for this aggregation function. + */ + default boolean canUseStarTree(AggregationFunctionColumnPair functionColumnPair, Review Comment: That's true, my original intention was to push more of that matching logic into `AggregationFunction` itself, but I think that might require more unrelated changes. I can remove this for now. ########## pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/PercentileTDigestAggregationFunction.java: ########## @@ -241,6 +243,24 @@ public Double extractFinalResult(TDigest intermediateResult) { return intermediateResult.quantile(_percentile / 100.0); } + @Override + public boolean canUseStarTree(AggregationFunctionColumnPair functionColumnPair, + Map<String, Object> functionParameters) { + if (!super.canUseStarTree(functionColumnPair, functionParameters)) { + return false; + } + + // Check if compression factor matches + if (functionParameters.containsKey(Constants.PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY)) { Review Comment: They are mergeable but the percentile accuracy will differ from what the user requested at query time, so I thought it'd be safer to only use the star-tree index when the compression factor value matches. ########## pinot-core/src/main/java/org/apache/pinot/core/startree/StarTreeUtils.java: ########## @@ -363,8 +372,14 @@ public static BaseProjectOperator<?> createStarTreeBasedProjectOperator(IndexSeg ExpressionContext[] groupByExpressions = queryContext.getGroupByExpressions() != null ? queryContext.getGroupByExpressions() .toArray(new ExpressionContext[0]) : null; + + List<Pair<AggregationFunction, AggregationFunctionColumnPair>> aggregations = Review Comment: Hm, the aggregation function length should be in the order of 10s at most so I thought it wouldn't really make a difference right? I'm fine either way though since the loop version is also pretty minimal, so I've made the change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org