bziobrowski commented on issue #12647: URL: https://github.com/apache/pinot/issues/12647#issuecomment-2304144849
Thanks @gortiz . From what I see the issue is not fixed in latest master . You can see it if you run and compare ```sql select AirTime, count(*) filter ( where AirTime < 0 ) from AirlineStats group by AirTime ``` with ```sql select AirTime, sum( case when AirTime < 0 then 1 else 0 end) from airlineStats group by AirTime ``` in query console (assuming instance was started with `./bin/pinot-admin.sh QuickStart -type batch`). The second query contains all AirTime values, not only those matching filter predicate. It's not necessary to include more than one aggregate function to see the effect. In fact, if filter-less aggregate functions are present, they hide the bug. That's because all group by key combinations produced by filter-less aggregate get merged with some produced by filtered aggregate. By keyed aggregation I meant, as you guessed, one with group by key[s]. The way I see it, the problem is that aggregation filters get pushed down even when aggregation is keyed (see AggregationFunctionUtils.buildFilteredAggregationInfos()), leading to filtering of both aggregation keys and values. It only seems valid when aggregation doesn't have any key. You are right about the first approach related to AGGREGATE_CASE_TO_FILTER being a partial solution since issue affects both engines and that rule only applies to V2. My initial idea was to limit AGGREGATE_CASE_TO_FILTER to keyless aggregates and add reversed rule for keyed aggregates with filter, e.g. ``` select AirTime, count(*) filter ( where AirTime < 0 ) --> select AirTime, count( case when AirTime < 0 end ) ``` but that transformation is problematic to apply correctly for all aggregate functions (multiple parameters, quirks around null handling, etc.) . Back to the core issue - I think the most promising approach is `option two` : `for keyed aggregations, separate aggregation filters from other filters in AggregationFunctionUtils.buildFilteredAggregationInfos() and push it to e.g. DefaultGroupByExecutor to execute it after computing keys but before aggregate values.` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org