siddharthteotia opened a new issue #8039: URL: https://github.com/apache/pinot/issues/8039
Currently, GROUP BY queries without ORDER BY can return very inaccurate results. Each server keeps at most N groups (N comes from LIMIT N), those groups are effectively randomly selected, and, unlike the ORDER BY path, no resizing/trimming takes place.

An easier way to handle this would be to add an implicit ORDER BY on the GROUP BY and/or aggregation columns when the query has no ORDER BY. This would let us reuse the current ORDER BY code path, which is more accurate, and would provide the same level of accuracy and determinism as GROUP BY with ORDER BY does today.

If we want to improve accuracy without ordering the results, some changes in TableResizer might be needed. We would continue to accumulate more records (up to `trimThreshold`, as in the ORDER BY code), but the resizer would not sort when trimming down to `trimSize`. It could simply evict `trimThreshold - trimSize` records without regard to order. While this would improve accuracy, the result would not be deterministic.
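A minimal sketch of the unordered-trim idea, written in Java since Pinot is a Java project. `UnorderedTrimTable`, `upsert`, `snapshot`, and the SUM-of-longs aggregate are hypothetical names chosen for illustration; this is not Pinot's actual TableResizer or group-by table API. It only shows the proposed behavior: keep accumulating groups until the table reaches `trimThreshold`, then drop `trimThreshold - trimSize` arbitrary groups with no sort.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Hypothetical sketch (not Pinot's TableResizer API): a per-server group-by table
 * that accumulates groups up to trimThreshold and then evicts arbitrary groups
 * down to trimSize without sorting them.
 */
public class UnorderedTrimTable {
    private final int trimSize;
    private final int trimThreshold;
    // Group key -> partial aggregate (a SUM of longs, for illustration only)
    private final Map<String, Long> groups = new HashMap<>();

    public UnorderedTrimTable(int trimSize, int trimThreshold) {
        if (trimSize <= 0 || trimThreshold <= trimSize) {
            throw new IllegalArgumentException("require 0 < trimSize < trimThreshold");
        }
        this.trimSize = trimSize;
        this.trimThreshold = trimThreshold;
    }

    /** Adds a value into its group, creating the group if needed; trims on reaching the threshold. */
    public void upsert(String groupKey, long value) {
        groups.merge(groupKey, value, Long::sum);
        if (groups.size() >= trimThreshold) {
            trim();
        }
    }

    /**
     * Evicts (trimThreshold - trimSize) groups in HashMap iteration order, which is
     * unspecified. No sorting, so the work is proportional to the number evicted.
     */
    private void trim() {
        int toEvict = groups.size() - trimSize;
        Iterator<Map.Entry<String, Long>> it = groups.entrySet().iterator();
        while (toEvict > 0 && it.hasNext()) {
            it.next();
            it.remove();
            toEvict--;
        }
    }

    public int size() {
        return groups.size();
    }

    /** Copy of the current partial aggregates. */
    public Map<String, Long> snapshot() {
        return new HashMap<>(groups);
    }

    public static void main(String[] args) {
        UnorderedTrimTable table = new UnorderedTrimTable(10, 20);
        for (int i = 0; i < 1000; i++) {
            table.upsert("group-" + (i % 50), 1L);
        }
        System.out.println("groups kept: " + table.size());
    }
}
```

Because eviction ignores aggregate values, a group evicted early can reappear later with a restarted partial aggregate, and which groups survive depends on the order records arrive in. That is the trade-off the proposal names: more groups are considered than with a plain LIMIT N cutoff, but the result is neither exact nor deterministic.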