Re: [I] Aggregation filter is being pushed down in a way it breaks SQL semantics [pinot]

via GitHub Thu, 22 Aug 2024 02:01:28 -0700


bziobrowski commented on issue #12647:
URL: https://github.com/apache/pinot/issues/12647#issuecomment-2304144849


   Thanks @gortiz . From what I see the issue is not fixed in latest master . 
You can see it if you run and compare 
   
   ```sql
   select AirTime, count(*) filter ( where AirTime < 0 ) from AirlineStats 
group by AirTime
   ```
   with
   ```sql
   select AirTime, sum( case when AirTime < 0 then 1 else 0 end)  
   from airlineStats 
   group by AirTime
   ```
   
   in query console (assuming instance was started with `./bin/pinot-admin.sh 
QuickStart -type batch`). 
   The second query contains all AirTime values, not only those matching filter 
predicate. 
   It's not necessary to include more than one aggregate function to see the 
effect. In fact, if filter-less aggregate functions are present, they hide the 
bug. That's because all group by key combinations produced by  filter-less 
aggregate get merged with some  produced by filtered aggregate.
   By keyed aggregation I meant, as you guessed, one with group by key[s]. 
   
   The way I see it, the problem is that aggregation filters get pushed down 
even when aggregation is keyed (see 
AggregationFunctionUtils.buildFilteredAggregationInfos()), leading to filtering 
of both aggregation keys and values. 
   It only seems valid when aggregation doesn't have any key. 
   
   You are right about the first approach related to AGGREGATE_CASE_TO_FILTER 
being a partial solution since issue affects both engines and that rule only 
applies to V2. My initial idea was to limit AGGREGATE_CASE_TO_FILTER to keyless 
aggregates and add reversed rule for keyed aggregates with filter, e.g. 
   
   ```
   select AirTime, count(*) filter ( where AirTime < 0 )
   -->
   select AirTime, count( case when AirTime < 0 end )
   ```
   but that transformation is problematic to apply correctly for all aggregate 
functions (multiple parameters, quirks around null handling, etc.) .
   
   Back to the core issue - I think the most promising approach is `option two` 
:
   `for keyed aggregations, separate aggregation filters from other filters in 
AggregationFunctionUtils.buildFilteredAggregationInfos() and push it to e.g. 
DefaultGroupByExecutor to execute it after computing keys but before aggregate 
values.`
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Re: [I] Aggregation filter is being pushed down in a way it breaks SQL semantics [pinot]

Reply via email to