gortiz commented on code in PR #15095:
URL: https://github.com/apache/pinot/pull/15095#discussion_r1963038947


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/AggregateOperator.java:
##########
@@ -85,8 +85,10 @@ public class AggregateOperator extends MultiStageOperator {
 
   // trimming - related members
   private final int _groupTrimSize;
+  // Comparator is used in priority queue, and the order is reversed so that peek() can be used to compare with each
+  // output row

Review Comment:
   nit: Use javadoc-style comments. Alternatively, use the Markdown documentation comments (`///`) introduced in [Java 23](https://openjdk.org/jeps/467). We probably won't get javadoc rendering in the IDE unless we configure it to use a newer JDK, but eventually we will migrate to a Java version that supports it ;)
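For reference, a minimal sketch of the bounded-heap pattern that the diff comment describes, assuming the operator wants to keep the `limit` rows that sort first under `comparator`. `TopKRows`, `add` and `result` are hypothetical names for illustration, not Pinot's API. The field comment uses the JEP 467 Markdown style (`///`), which older JDKs treat as an ordinary line comment; the methods use classic `/** */` javadoc.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopKRows {
  /// Rows are ranked by `_comparator`. The heap uses the *reversed* order, so
  /// `peek()` returns the worst row currently kept and each new row only has
  /// to be compared against that one row.
  private final Comparator<Object[]> _comparator;
  private final PriorityQueue<Object[]> _heap;
  private final int _limit;

  public TopKRows(Comparator<Object[]> comparator, int limit) {
    // Assumes limit > 0, as in the diff, which only trims when node.getLimit() > 0.
    if (limit <= 0) {
      throw new IllegalArgumentException("limit must be positive");
    }
    _comparator = comparator;
    _limit = limit;
    _heap = new PriorityQueue<>(limit, comparator.reversed());
  }

  /** Offers a row, keeping only the {@code limit} best rows seen so far. */
  public void add(Object[] row) {
    if (_heap.size() < _limit) {
      _heap.offer(row);
    } else if (_comparator.compare(row, _heap.peek()) < 0) {
      // The new row sorts before the worst kept row, so it replaces it.
      _heap.poll();
      _heap.offer(row);
    }
  }

  /** Returns the kept rows in ascending comparator order. */
  public List<Object[]> result() {
    List<Object[]> rows = new ArrayList<>(_heap);
    rows.sort(_comparator);
    return rows;
  }

  public static void main(String[] args) {
    // Keep the 2 rows with the smallest value in column 1.
    Comparator<Object[]> byValue = Comparator.comparingLong(r -> (Long) r[1]);
    TopKRows topK = new TopKRows(byValue, 2);
    topK.add(new Object[] {"a", 5L});
    topK.add(new Object[] {"b", 1L});
    topK.add(new Object[] {"c", 3L});
    topK.add(new Object[] {"d", 9L});
    for (Object[] row : topK.result()) {
      System.out.println(row[0] + "=" + row[1]);
    }
  }
}
```

Running `main` prints `b=1` and then `c=3`. Once the heap is full, each incoming row costs a single comparison against `peek()`, and only rows that beat the current worst kept row pay for a `poll()`/`offer()` pair.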



##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/AggregateOperator.java:
##########
@@ -108,33 +110,28 @@ public AggregateOperator(OpChainExecutionContext context, MultiStageOperator inp
 
     List<Integer> groupKeys = node.getGroupKeys();
 
-    //process order trimming hint
-    int groupTrimSize = getGroupTrimSize(node.getNodeHint(), context.getOpChainMetadata());
-
-    if (groupTrimSize > -1) {
-      // limit is set to 0 if not pushed
-      int nodeLimit = node.getLimit() > 0 ? node.getLimit() : Integer.MAX_VALUE;
-      int limit = GroupByUtils.getTableCapacity(nodeLimit, groupTrimSize);
-      _groupTrimSize = limit;
-      if (limit == Integer.MAX_VALUE) {
-        // disable sorting because actual result can't realistically be bigger the limit
-        _priorityQueue = null;
+    int groupTrimSize = Integer.MAX_VALUE;
+    Comparator<Object[]> comparator = null;
+    int limit = node.getLimit();
+    if (limit > 0) {
+      List<RelFieldCollation> collations = node.getCollations();
+      if (collations.isEmpty()) {
+        groupTrimSize = limit;

Review Comment:
   This behavior is not formally correct, right? If the stage has a parallelism higher than 1, each worker may keep a different set of `limit` keys. If there is a reduce phase later (which I think is always the case when a limit is applied), the worker executing that reduce will receive only some of the partial aggregates for any group that other workers dropped, so it will not compute the correct values.
   
   Assuming what I said is correct, I think we need the ability to disable this optimization with a config and/or hint.
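To make the concern concrete, here is a small simulation, not Pinot code: each worker keeps its first `limit` groups in arrival order (standing in for "any `limit` groups" when there are no collations), and a single reduce sums the partial `COUNT(*)` values. `TrimWithoutOrderDemo`, `run` and the `trimOnWorkers` flag are hypothetical; the flag only stands in for the config/hint suggested in the comment.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TrimWithoutOrderDemo {

  /** Keeps the first {@code limit} groups in insertion order, with no ordering criterion. */
  static Map<String, Long> trim(Map<String, Long> groups, int limit) {
    Map<String, Long> kept = new LinkedHashMap<>();
    for (Map.Entry<String, Long> e : groups.entrySet()) {
      if (kept.size() == limit) {
        break;
      }
      kept.put(e.getKey(), e.getValue());
    }
    return kept;
  }

  /**
   * Simulates COUNT(*) ... GROUP BY key LIMIT {@code limit} over several workers.
   * {@code trimOnWorkers} is a hypothetical switch for worker-side trimming.
   */
  public static Map<String, Long> run(
      List<Map<String, Long>> workerPartials, int limit, boolean trimOnWorkers) {
    Map<String, Long> reduced = new LinkedHashMap<>();
    for (Map<String, Long> partial : workerPartials) {
      Map<String, Long> sent = trimOnWorkers ? trim(partial, limit) : partial;
      // The reduce phase sums the partial counts it receives for each group.
      sent.forEach((key, count) -> reduced.merge(key, count, Long::sum));
    }
    // The final LIMIT may return any `limit` groups, but their counts must be complete.
    return trim(reduced, limit);
  }

  public static void main(String[] args) {
    // Each worker has seen one row for "a" and one for "b", in a different order.
    Map<String, Long> worker1 = new LinkedHashMap<>();
    worker1.put("a", 1L);
    worker1.put("b", 1L);
    Map<String, Long> worker2 = new LinkedHashMap<>();
    worker2.put("b", 1L);
    worker2.put("a", 1L);
    List<Map<String, Long>> partials = List.of(worker1, worker2);

    System.out.println("trimmed:   " + run(partials, 1, true));
    System.out.println("untrimmed: " + run(partials, 1, false));
  }
}
```

With worker-side trimming the query returns `{a=1}` even though group `a` has two rows; without it the result is `{a=2}`. Returning an arbitrary subset of groups is acceptable for `LIMIT` without `ORDER BY`, but each returned group's aggregate still has to include every worker's partial result.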



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


