[PR] Compute all groups for group by queries with only filtered aggregations [pinot]

via GitHub Fri, 11 Oct 2024 05:53:45 -0700


yashmayya opened a new pull request, #14211:
URL: https://github.com/apache/pinot/pull/14211


   - Fixes https://github.com/apache/pinot/issues/12647 and 
https://github.com/apache/pinot/issues/13982.
   - Both the above issues have a good description of the problem but the TLDR 
is that for group by queries with only filtered aggregations, we currently only 
compute the groups based on the rows that match the query's main filter and the 
rows that match at least one of the aggregation filters.
   - This is good for performance, since in cases where the group by query 
doesn't have a main filter and there are only filtered aggregations (i.e., no 
non-filtered aggregations), indexes can be used and a full scan is avoided. 
However, this leads to results that are not SQL compliant, since all groups 
(computed over the results of the main query filter) are expected to be 
returned even for queries with only filtered aggregations.
   - For the single-stage query engine, this actually appears to be a 
regression from https://github.com/apache/pinot/pull/10356, prior to which all 
groups should have been returned by default.
   - For the multi-stage query engine's group by executor, this appears to 
always have been an issue.
   - This patch takes a different approach for both the query engines - in the 
single-stage engine, the default behavior remains the same, but the standard 
SQL behavior of returning all groups can be enabled using a new query option. 
In the multi-stage engine, however, all groups will be computed by default. The 
reasoning is that Pinot has historically chosen performance over standard SQL 
behavior for the v1 engine in such cases where the behavioral difference might 
not be too significant. For the multi-stage query engine, however, we have been 
moving towards standard SQL behavior regardless of potential performance impact.
   - For the v1 engine's executor, the fix is to simply put an additional 
`AggregationInfo` with an empty `AggregationFunction` array and the main query 
filter. This ensures that the `GroupByExecutor` will compute all the groups 
(from the result of applying the main query filter) but no additional 
unnecessary aggregation will be done.
   - For the v2 engine's executor, the fix is similar. Before this patch, all 
the group keys are only computed if there is at least one non-filtered 
aggregation. For group by queries with only filtered aggregations, groups were 
computed over the rows matching the aggregation filters. Now, all the groups 
will be computed even for group by queries with only filtered aggregations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[PR] Compute all groups for group by queries with only filtered aggregations [pinot]

Reply via email to