[I] Support for Dictionary Based Group-By [pinot]

via GitHub Sat, 07 Oct 2023 22:44:04 -0700


ankitsultana opened a new issue, #11759:
URL: https://github.com/apache/pinot/issues/11759


   In a recent discussion with @itschrispeck on a related issue, we wondered 
whether it would make sense to support executing Group-By queries by bypassing 
the DocIdSetOperator and instead using the dictionary and inverted index.
   
   This could help optimize many use cases, particularly those where a user 
applies a transform function to the group-by key of a count query, e.g.:
   
   ```
   select
     case
       when AirTime < 100 then 'ok'
       else 'not-ok'
     end as airtime_category,
     count(*)
   from airlineStats
   where AirlineID = 19805
   group by airtime_category
   limit 10
   ```
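To make the idea concrete, here is a minimal Python sketch of a dictionary-based COUNT(*) group-by for the query above. It is not the PoC implementation. It assumes a toy in-memory segment with the following layout:

- The AirTime dictionary is a list indexed by dictionary id.
- The inverted index maps each dictionary id to a set of doc ids. Plain sets stand in for the bitmaps a real index would use.
- The `AirlineID = 19805` filter has already been resolved to a set of doc ids.

`dictionary_group_by_count`, `airtime_category`, and the sample data are hypothetical. They do not correspond to Pinot classes or to anything in the PoC PR.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Set


def dictionary_group_by_count(
    dictionary: List[object],
    inverted_index: Dict[int, Set[int]],
    filter_doc_ids: Set[int],
    transform: Callable[[object], Hashable],
) -> Dict[Hashable, int]:
    """Hypothetical helper: COUNT(*) ... GROUP BY transform(column).

    dictionary[i] is the column value for dictionary id i, and
    inverted_index[i] is the set of doc ids holding that value.
    No per-document column values are read.
    """
    groups: Counter = Counter()
    for dict_id, value in enumerate(dictionary):
        # Docs that pass the filter AND hold this dictionary value.
        matched = len(inverted_index.get(dict_id, set()) & filter_doc_ids)
        if matched:
            # The transform runs once per distinct value, not once per doc.
            groups[transform(value)] += matched
    return dict(groups)


# Toy AirTime column: 4 distinct values spread over 8 docs (hypothetical data).
air_time_dictionary = [45, 90, 120, 300]
air_time_inverted_index = {0: {0, 3}, 1: {1, 5, 6}, 2: {2}, 3: {4, 7}}
# Doc ids matching "AirlineID = 19805", e.g. from that column's inverted index.
matching_docs = {0, 1, 2, 4, 5}


def airtime_category(air_time: int) -> str:
    # Mirrors: case when AirTime < 100 then 'ok' else 'not-ok' end
    return "ok" if air_time < 100 else "not-ok"


result = dictionary_group_by_count(
    air_time_dictionary, air_time_inverted_index, matching_docs, airtime_category
)
print(result)  # {'ok': 3, 'not-ok': 2}
```

The CASE expression is evaluated once per distinct AirTime value rather than once per matching document. Each group's count comes from the size of a posting-list intersection. This only applies directly to aggregations that can be derived from those cardinalities, such as COUNT(*). Because the loop visits every dictionary entry, it would presumably pay off mainly for low- to moderate-cardinality columns.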
   
   We could also optimize json_extract_scalar queries (assuming the user has a 
JSON index on the column):
   ```
   explain plan for select
     count(*),
     json_extract_scalar(group_json, '$.group_city', 'STRING')
   from meetupRsvpJson
   group by json_extract_scalar(group_json, '$.group_city', 'STRING')
   limit 1000
   ```
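A similar sketch for the JSON case follows. It assumes a hypothetical representation of the JSON index: a mapping from a (JSON path, scalar value) pair to the set of doc ids that contain that value at that path. Pinot's actual JSON index layout may differ. Each group's count is the size of one posting list, optionally intersected with a filter, so no per-document JSON parsing is needed. The sketch also assumes each document holds at most one scalar value at the path (no arrays). Documents lacking the path are simply not counted; matching the exact json_extract_scalar semantics for missing values is out of scope. `json_path_group_by_count` and the sample data are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical JSON index layout: (json_path, scalar_value) -> doc ids.
JsonIndex = Dict[Tuple[str, str], Set[int]]


def json_path_group_by_count(
    json_index: JsonIndex,
    path: str,
    filter_doc_ids: Optional[Set[int]] = None,
) -> Dict[str, int]:
    """Hypothetical helper: COUNT(*) GROUP BY json_extract_scalar(col, path, 'STRING').

    Assumes each doc has at most one scalar value at `path`, so the posting
    lists for different values of the same path are disjoint.
    """
    counts: Dict[str, int] = {}
    for (indexed_path, value), doc_ids in json_index.items():
        if indexed_path != path:
            continue  # entry belongs to a different JSON path
        matched = doc_ids if filter_doc_ids is None else doc_ids & filter_doc_ids
        if matched:
            counts[value] = len(matched)
    return counts


# Toy index over 6 meetupRsvpJson docs (hypothetical data);
# doc 4 has no $.group_city, so it is not counted.
index: JsonIndex = {
    ("$.group_city", "San Francisco"): {0, 2, 5},
    ("$.group_city", "London"): {1, 3},
    ("$.group_country", "us"): {0, 2, 5},
    ("$.group_country", "gb"): {1, 3},
}
city_counts = json_path_group_by_count(index, "$.group_city")
print(city_counts)  # {'San Francisco': 3, 'London': 2}
```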
   
   I'm creating this issue to test the waters and see how other folks feel 
about this. If there's support, we can follow up with a design doc with more 
details. Here's a PoC PR I am using to estimate the performance gains: 
https://github.com/apache/pinot/pull/11758
   
   

