Jackie-Jiang opened a new issue, #11706:
URL: https://github.com/apache/pinot/issues/11706

   When a group-by query orders only by group-key columns (no order-by on an 
aggregate column), we don't need to keep more groups than the LIMIT, because a 
group's order-by value is fixed once the group is created. We can maintain a 
heap (PriorityQueue) of LIMIT values. On the group-key generation side, we 
should also keep only the relevant keys.
   
   One common query is:
   `SELECT COUNT(*) FROM myTable GROUP BY timeCol ORDER BY timeCol DESC LIMIT 10`
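
   For a query of this shape, the heap idea can be sketched as below. This is 
only an illustration, not Pinot code: `TopKeyGroupBy` and `countTopKeys` are 
hypothetical names, the key column is assumed to be a `long`, and the 
aggregation is hard-coded to `COUNT(*)` with descending order. A min-heap holds 
the retained keys, so its head is the first key to evict when a larger key 
arrives. A key that is dropped or evicted can never re-enter, because the 
smallest retained key only grows, so the counts of the retained keys stay exact.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper (not a Pinot class) for:
//   SELECT COUNT(*) ... GROUP BY key ORDER BY key DESC LIMIT limit
final class TopKeyGroupBy {
  static Map<Long, Long> countTopKeys(long[] keyColumnValues, int limit) {
    Map<Long, Long> counts = new HashMap<>();
    if (limit <= 0) {
      return counts;
    }
    // Min-heap of retained keys: the head is the smallest key currently kept.
    PriorityQueue<Long> retained = new PriorityQueue<>();
    for (long key : keyColumnValues) {
      Long current = counts.get(key);
      if (current != null) {
        // Existing group: just update the aggregate.
        counts.put(key, current + 1);
      } else if (retained.size() < limit) {
        // Still below LIMIT groups: always accept a new key.
        retained.add(key);
        counts.put(key, 1L);
      } else if (key > retained.peek()) {
        // New key outranks the smallest retained key: evict that group.
        long evicted = retained.poll();
        counts.remove(evicted);
        retained.add(key);
        counts.put(key, 1L);
      }
      // Otherwise the key ranks below every retained key and is skipped.
    }
    return counts;
  }
}
```

   With this approach at most `limit` groups are held at any time, instead of 
a trim size that is independent of the ordering.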
   
   Problems to solve:
   1. Group-by query with order-by on the key column:
   Currently we keep `Math.max(5000, LIMIT * 5)` groups, which is unnecessary 
since only the top `LIMIT` groups are relevant.
   
   2. Group-by query without order-by:
   Currently we keep an arbitrary set of `LIMIT` groups per server, and there 
is no guarantee that the same groups are picked across different servers, which 
can lead to wrong results when there are more than `LIMIT` groups.
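
   The second problem can be reproduced with a small sketch. It is an 
illustration only: `PartialGroupMerge`, `keepFirst` and `merge` are 
hypothetical names, and taking the first `limit` entries in iteration order 
stands in for whatever arbitrary pick a server makes. If two servers hold the 
same three groups but keep different subsets of two, the merged aggregate for a 
group kept by only one server is missing the other server's contribution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration (not Pinot code) of merging per-server partial results.
final class PartialGroupMerge {
  // Keeps the first `limit` groups in iteration order; this stands in for an
  // arbitrary, server-dependent choice of groups.
  static Map<String, Long> keepFirst(Map<String, Long> groups, int limit) {
    Map<String, Long> kept = new LinkedHashMap<>();
    for (Map.Entry<String, Long> entry : groups.entrySet()) {
      if (kept.size() == limit) {
        break;
      }
      kept.put(entry.getKey(), entry.getValue());
    }
    return kept;
  }

  // Sums the aggregates of groups that appear in either partial result.
  static Map<String, Long> merge(Map<String, Long> left, Map<String, Long> right) {
    Map<String, Long> merged = new LinkedHashMap<>(left);
    right.forEach((key, value) -> merged.merge(key, value, Long::sum));
    return merged;
  }
}
```

   For example, if one server keeps `g1` and `g2` while the other keeps `g3` 
and `g2`, only `g2` is merged from both servers; `g1` and `g3` are each 
reported with a partial value.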
   
   Solution:
   1. To ensure the ordering is deterministic (we need this guarantee to ensure 
the groups returned from all servers are the same), we should implicitly append 
all non-ordering group keys to the order-by. There is one exception: when we 
want to keep all groups on the server, we don't need this since all groups will 
be returned anyway.
   2. Optimize the execution when all the ordering keys are group keys.
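
   The implicit tie-breaking in the first step can be sketched as a comparator 
that compares the explicit order-by keys first and then every remaining group 
key. This is a sketch under stated assumptions, not the Pinot implementation: 
`DeterministicGroupOrder` is a hypothetical name, a group is represented as a 
`String[]` of its group-by values, and every key is compared in ascending order 
for simplicity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch (not Pinot code): order groups by the explicit order-by
// keys, then by all remaining group keys so that ties are broken identically
// on every server.
final class DeterministicGroupOrder {
  // orderByIndexes are positions, within the group-key array, of the keys
  // that appear in ORDER BY. All comparisons are ascending in this sketch.
  static Comparator<String[]> comparator(int numGroupKeys, int[] orderByIndexes) {
    List<Integer> positions = new ArrayList<>();
    for (int index : orderByIndexes) {
      positions.add(index);
    }
    // Implicitly append the group keys that are not part of ORDER BY.
    for (int index = 0; index < numGroupKeys; index++) {
      if (!positions.contains(index)) {
        positions.add(index);
      }
    }
    return (left, right) -> {
      for (int position : positions) {
        int result = left[position].compareTo(right[position]);
        if (result != 0) {
          return result;
        }
      }
      return 0;
    };
  }
}
```

   Because two distinct groups always differ in at least one group key, this 
comparator defines a total order over groups, so every server that sees the 
same groups selects the same top `LIMIT` of them.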
   

