[GitHub] [pinot] walterddr opened a new issue, #11689: [multistage] distinct or group-by aggregate doesn't pushdown limit

via GitHub Tue, 26 Sep 2023 15:38:36 -0700


walterddr opened a new issue, #11689:
URL: https://github.com/apache/pinot/issues/11689


   Currently:
   
   - A query similar to
   ```
   SELECT distinct a, b FROM tbl LIMIT 10
   ```
   will compute the entire distinct set of `(a, b)` values on the leaf
   stage, reshuffle by hash key and dedup in the intermediate stage, and
   only then keep 10 records at the very last stage.
   
   - A similar but much more subtle optimization applies to group-by with
   an order-by on the group key and a limit:
   ```
   SELECT a, SUM(b) FROM tbl GROUP BY a ORDER BY a DESC LIMIT 10
   ```
   
   A good proposal is to push the sorted limit down all the way to the leaf
   stage, so that each leaf keeps only the limited rows before sending data out.
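
To make the proposal concrete, here is a minimal sketch in Python of what a pushed-down limit could look like for both queries. It is not Pinot code: the function names (`leaf_distinct`, `leaf_group_by_sum`, `merge_distinct`, `merge_group_by_sum`) and the in-memory row lists are hypothetical, and each "leaf" is just a list of `(a, b)` tuples.

```python
from collections import defaultdict


def leaf_distinct(rows, limit):
    """Hypothetical leaf operator: collect distinct rows, stop at `limit`."""
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
            if len(out) == limit:
                break  # no need to scan further for SELECT DISTINCT ... LIMIT
    return out


def merge_distinct(partials, limit):
    """Hypothetical final stage: dedup across leaves, keep `limit` rows."""
    merged = [row for partial in partials for row in partial]
    return leaf_distinct(merged, limit)


def leaf_group_by_sum(rows, limit):
    """Hypothetical leaf operator: partial SUM(b) per key a, trimmed to the
    `limit` largest keys (ORDER BY a DESC) before data is sent out."""
    sums = defaultdict(int)
    for a, b in rows:
        sums[a] += b
    top_keys = sorted(sums, reverse=True)[:limit]
    return [(a, sums[a]) for a in top_keys]


def merge_group_by_sum(partials, limit):
    """Hypothetical final stage: merge partial sums, apply the limit again."""
    merged = defaultdict(int)
    for partial in partials:
        for a, s in partial:
            merged[a] += s
    top_keys = sorted(merged, reverse=True)[:limit]
    return [(a, merged[a]) for a in top_keys]


# Two made-up leaves holding (a, b) rows.
leaf1 = [(1, 10), (2, 20), (3, 30), (5, 1)]
leaf2 = [(3, 5), (4, 40), (5, 2), (1, 7)]

pushed_down = merge_group_by_sum(
    [leaf_group_by_sum(leaf1, 2), leaf_group_by_sum(leaf2, 2)], 2
)
# Baseline without pushdown: every leaf ships all of its groups.
baseline = merge_group_by_sum(
    [leaf_group_by_sum(leaf1, len(leaf1)), leaf_group_by_sum(leaf2, len(leaf2))], 2
)
```

In this sketch the trimmed group-by is safe because the order-by is on the group key: any key in the global top N is also in the top N of every leaf that contains it, so none of its partial sums are dropped. The same trimming would not be safe for an order-by on the aggregate itself, such as `ORDER BY SUM(b)`.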


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


