ankitsultana opened a new issue, #11759:
URL: https://github.com/apache/pinot/issues/11759
In a recent discussion with @itschrispeck on a related issue, we wondered
whether it would make sense to support executing Group-By by bypassing
DocIdSetOperator and instead using a Dictionary + Inverted Index.
It could help optimize many use cases, particularly those where a user wants
to run a transform function in a group-by/count query, e.g.:
```
select
  case
    when AirTime < 100 then 'ok'
    else 'not-ok'
  end as airtime_category,
  count(*)
from airlineStats
where AirlineID = 19805
group by airtime_category
limit 10
```
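To make the idea concrete, here is a minimal Python sketch of how a query like the one above could be answered from a dictionary and an inverted index without scanning documents. It is a toy model, not Pinot code: the names `dictionary`, `inverted_index`, `filter_docs`, and `group_by_count` are hypothetical, plain Python sets stand in for bitmaps, and the filter is assumed to have already been resolved to a set of matching doc ids.

```python
from collections import Counter


def group_by_count(dictionary, inverted_index, filter_docs, transform):
    """Count matching docs per transformed value, one dictionary entry at a time.

    dictionary:     list of distinct column values; the list index is the dict id
    inverted_index: list of sets; inverted_index[dict_id] = doc ids with that value
    filter_docs:    set of doc ids that satisfy the WHERE clause (assumed given)
    transform:      function applied once per distinct value, not once per doc
    """
    counts = Counter()
    for dict_id, value in enumerate(dictionary):
        # Intersect the posting list with the filter result and count the overlap.
        matching = len(inverted_index[dict_id] & filter_docs)
        if matching:
            counts[transform(value)] += matching
    return dict(counts)


# Toy segment: AirTime values for doc ids 0..5 (made-up data).
air_time = [50, 120, 50, 99, 120, 300]
dictionary = sorted(set(air_time))
inverted_index = [
    {doc_id for doc_id, v in enumerate(air_time) if v == value}
    for value in dictionary
]

# Pretend the filter AirlineID = 19805 matched these doc ids.
filter_docs = {0, 1, 2, 3, 5}

result = group_by_count(
    dictionary,
    inverted_index,
    filter_docs,
    lambda v: "ok" if v < 100 else "not-ok",
)
print(result)
```

In this sketch the transform runs once per distinct value, so the work scales with the column's cardinality and the cost of the set intersections rather than with the number of matching documents.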
We can also optimize json_extract_scalar queries (assuming the user has a
JSON index on the column):
```
explain plan for
select
  count(*),
  json_extract_scalar(group_json, '$.group_city', 'STRING')
from meetupRsvpJson
group by json_extract_scalar(group_json, '$.group_city', 'STRING')
limit 1000
```
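A similar toy sketch for the JSON case, again in Python and again not a description of Pinot's actual JSON index layout. It assumes the index can be modelled as a map from (path, value) pairs to sets of doc ids, that each document has at most one value per path, and that there is no filter. The `json_index` contents and the `group_by_json_path` helper are hypothetical.

```python
def group_by_json_path(json_index, path):
    """Return {value: doc count} for one JSON path using only posting lists."""
    return {
        value: len(doc_ids)
        for (indexed_path, value), doc_ids in json_index.items()
        if indexed_path == path
    }


# Hypothetical index contents for three documents (doc ids 0, 1, 2).
json_index = {
    ("$.group_city", "London"): {0, 2},
    ("$.group_city", "Paris"): {1},
    ("$.group_name", "hikers"): {0, 1, 2},
}

city_counts = group_by_json_path(json_index, "$.group_city")
print(city_counts)
```

Under these assumptions the group-by never evaluates json_extract_scalar per document; it only reads the sizes of the posting lists for the requested path.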
Creating this issue to test the waters and see how other folks feel about
this. If there's support, we can follow up with a design doc with more details.
Here's a PoC PR I am using to estimate the perf gains:
https://github.com/apache/pinot/pull/11758