yashmayya opened a new pull request, #14211: URL: https://github.com/apache/pinot/pull/14211
- Fixes https://github.com/apache/pinot/issues/12647 and https://github.com/apache/pinot/issues/13982. - Both the above issues have a good description of the problem but the TLDR is that for group by queries with only filtered aggregations, we currently only compute the groups based on the rows that match the query's main filter and the rows that match at least one of the aggregation filters. - This is good for performance, since in cases where the group by query doesn't have a main filter and there are only filtered aggregations (i.e., no non-filtered aggregations), indexes can be used and a full scan is avoided. However, this leads to results that are not SQL compliant, since all groups (computed over the results of the main query filter) are expected to be returned even for queries with only filtered aggregations. - For the single-stage query engine, this actually appears to be a regression from https://github.com/apache/pinot/pull/10356, prior to which all groups should have been returned by default. - For the multi-stage query engine's group by executor, this appears to always have been an issue. - This patch takes a different approach for both the query engines - in the single-stage engine, the default behavior remains the same, but the standard SQL behavior of returning all groups can be enabled using a new query option. In the multi-stage engine, however, all groups will be computed by default. The reasoning is that Pinot has historically chosen performance over standard SQL behavior for the v1 engine in such cases where the behavioral difference might not be too significant. For the multi-stage query engine, however, we have been moving towards standard SQL behavior regardless of potential performance impact. - For the v1 engine's executor, the fix is to simply put an additional `AggregationInfo` with an empty `AggregationFunction` array and the main query filter. This ensures that the `GroupByExecutor` will compute all the groups (from the result of applying the main query filter) but no additional unnecessary aggregation will be done. - For the v2 engine's executor, the fix is similar. Before this patch, all the group keys are only computed if there is at least one non-filtered aggregation. For group by queries with only filtered aggregations, groups were computed over the rows matching the aggregation filters. Now, all the groups will be computed even for group by queries with only filtered aggregations. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org