yashmayya commented on code in PR #14211: URL: https://github.com/apache/pinot/pull/14211#discussion_r1797322234
########## pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java: ########## @@ -458,6 +458,15 @@ public static class QueryOptionKey { // executed in an Unbounded FCFS fashion. However, secondary workloads are executed in a constrainted FCFS // fashion with limited compute. public static final String IS_SECONDARY_WORKLOAD = "isSecondaryWorkload"; + + // For group by queries with only filtered aggregations (and no non-filtered aggregations), the v1 query engine + // does not compute all groups by default - instead, it will only compute the groups from the filtered result + // set (i.e., union of the main query filter and all the individual aggregation filters). This is good for + // performance, since indexes can be used, but the result won't match standard SQL behavior (where all groups + // are expected to be returned). If this option is set to true, the v1 query engine will compute all groups for + // group by queries with only filtered aggregations. This could require a full scan if the main query does not + // have a filter and performance could be much worse, but the result will be SQL compliant. + public static final String FILTERED_AGGREGATIONS_COMPUTE_ALL_GROUPS = "filteredAggregationsComputeAllGroups"; Review Comment: > Since the overhead seems fine (no extra aggregate needed), I'd suggest doing the standard SQL behavior by default, and add an option to skip the all groups computation I think it makes sense to have standard SQL behavior by default and have an option to use the more performant method of only computing required groups. But won't the overhead potentially be quite high for a query like `SELECT SUM(X) FILTER (WHERE Y = 1) FROM mytable` and there's an inverted index on the filter column `Y`? If we want to compute all groups, it'll involve a scan over all docs whereas with the previous default behavior, no scan would be required and we'd only need to do the aggregation over the docs matching the filter computed using the inverted index. There won't be any additional _aggregation_ being done but the scan would still be expensive right? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org