siddharthteotia opened a new issue #8039:
URL: https://github.com/apache/pinot/issues/8039


   Currently, GROUP BY queries without ORDER BY can return very inaccurate 
results: each server keeps at most N groups (N being the LIMIT), chosen 
arbitrarily, and, unlike the ORDER BY path, no resizing or trimming takes place.
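
   To make the failure mode above concrete, here is a toy model in Python. It is 
not Pinot code: `server_group_by_sum` and `broker_merge` are hypothetical helpers, 
and "keep the first N groups seen" is an assumed stand-in for the arbitrary 
selection described above.

```python
def server_group_by_sum(rows, limit):
    """Toy server: once `limit` groups exist, rows for any new group are dropped."""
    groups = {}
    for key, value in rows:
        if key in groups:
            groups[key] += value
        elif len(groups) < limit:
            groups[key] = value
        # else: the group never gets a slot, so its rows are lost on this server
    return groups


def broker_merge(partials, limit):
    """Toy broker: merge per-server partial sums, then return the first `limit` groups."""
    merged = {}
    for partial in partials:
        for key, value in partial.items():
            merged[key] = merged.get(key, 0) + value
    return dict(list(merged.items())[:limit])


# True totals across both servers: x=2, y=2, z=200.
server_a = [("x", 1), ("y", 1), ("z", 100)]
server_b = [("z", 100), ("x", 1), ("y", 1)]

partials = [server_group_by_sum(server_a, 2), server_group_by_sum(server_b, 2)]
result = broker_merge(partials, 2)
print(result)  # {'x': 2, 'y': 1}
```

   With LIMIT 2, group `y` comes back as 1 instead of 2 because the second server 
never kept it, and the largest group `z` is missing from the result entirely.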
   
   An easier way to handle this would be to add an implicit ORDER BY on the 
GROUP BY and/or aggregation columns when the query has no ORDER BY. This would 
let us reuse the current ORDER BY code path, which is more accurate, and would 
provide the same level of accuracy and determinism as the current GROUP BY with 
ORDER BY.
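
   A minimal sketch of the implicit ORDER BY idea, in Python rather than Pinot's 
own code. The function name, the tuple representation of an ORDER BY item, and 
the ASC/DESC directions are all assumptions for illustration; the proposal above 
leaves open which columns and directions to use.

```python
def with_implicit_order_by(group_by, aggregations, order_by,
                           include_aggregations=False):
    """Return the ORDER BY list to execute with (hypothetical helper).

    An explicit ORDER BY is returned unchanged. Otherwise one is derived
    from the GROUP BY columns and, optionally, the aggregation expressions.
    """
    if order_by:
        return list(order_by)
    # Assumed direction: group-by columns ascending.
    implicit = [(column, "ASC") for column in group_by]
    if include_aggregations:
        # Assumed direction: aggregations descending.
        implicit += [(agg, "DESC") for agg in aggregations]
    return implicit


# SELECT country, SUM(clicks) FROM t GROUP BY country LIMIT 10
print(with_implicit_order_by(["country"], ["SUM(clicks)"], []))
# [('country', 'ASC')]
```

   A query that already has an ORDER BY passes through untouched, so only the 
queries that currently take the less accurate path would change behavior.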
   
   If we want to improve accuracy without ordering the results, some changes 
in TableResizer might be needed. We would continue to accumulate records 
(up to `trimThreshold`, as in the ORDER BY code), but the resizer would not sort 
when trimming to `trimSize`. It could simply evict `trimThreshold - trimSize` 
records without regard to order. This would improve accuracy, but the result 
would still not be deterministic. 
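
   The unsorted trim described above could look roughly like the following 
Python sketch. `UnsortedTrimTable` is a hypothetical class, not the actual 
TableResizer; triggering the trim when the record count reaches `trim_threshold` 
and evicting the oldest-inserted keys are both assumptions, since any eviction 
choice would satisfy "without regard to order".

```python
class UnsortedTrimTable:
    """Toy group-by table that trims to `trim_size` without sorting."""

    def __init__(self, trim_size, trim_threshold):
        if not 0 < trim_size < trim_threshold:
            raise ValueError("require 0 < trim_size < trim_threshold")
        self.trim_size = trim_size
        self.trim_threshold = trim_threshold
        self.records = {}

    def upsert(self, key, value):
        # Merge into an existing group, or start a new one.
        self.records[key] = self.records.get(key, 0) + value
        if len(self.records) >= self.trim_threshold:
            self._trim()

    def _trim(self):
        # Evict (current size - trim_size) records with no sort step.
        # Dropping the oldest-inserted keys is an arbitrary choice.
        excess = len(self.records) - self.trim_size
        for key in list(self.records)[:excess]:
            del self.records[key]


table = UnsortedTrimTable(trim_size=2, trim_threshold=4)
for key in ["a", "b", "c", "d"]:
    table.upsert(key, 1)
print(table.records)  # {'c': 1, 'd': 1}
```

   Skipping the sort makes each trim cheaper, but which groups survive depends on 
arrival order, which is why the result is not deterministic.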


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


