[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5451: Refactor DistinctTable to use PriorityQueue based algorithm

GitBox Fri, 29 May 2020 12:04:14 -0700


siddharthteotia commented on a change in pull request #5451:
URL: https://github.com/apache/incubator-pinot/pull/5451#discussion_r432678859




##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctAggregationFunction.java
##########
@@ -123,21 +120,20 @@ public void aggregate(int length, AggregationResultHolder aggregationResultHolde
         columnDataTypes[i] = ColumnDataType.fromDataTypeSV(blockValSetMap.get(_inputExpressions.get(i)).getValueType());
       }
       DataSchema dataSchema = new DataSchema(_columns, columnDataTypes);
-      distinctTable = new DistinctTable(dataSchema, _orderBy, _capacity);
+      distinctTable = new DistinctTable(dataSchema, _orderBy, _limit);
       aggregationResultHolder.setValue(distinctTable);
+    } else if (distinctTable.shouldNotAddMore()) {
+      return;
     }
 
-    // TODO: Follow up PR will make few changes to start using DictionaryBasedAggregationOperator
-    // for DISTINCT queries without filter.
+    // TODO: Follow up PR will make few changes to start using DictionaryBasedAggregationOperator for DISTINCT queries
+    //       without filter.
     RowBasedBlockValueFetcher blockValueFetcher = new RowBasedBlockValueFetcher(blockValSets);
 
-    // TODO: Do early termination in the operator itself which should
-    // not call aggregate function at all if the limit has reached
-    // that will require the interface change since this function
-    // has to communicate back that required number of records have
-    // been collected
     for (int i = 0; i < length; i++) {
-      distinctTable.upsert(new Record(blockValueFetcher.getRow(i)));
+      if (!distinctTable.add(new Record(blockValueFetcher.getRow(i)))) {

Review comment:
I think this for loop should be written separately for the ORDER BY and non-ORDER BY cases.

For ORDER BY, there is no early termination, so the if check can be avoided since the return value will always be true.
For non-ORDER BY, check the return value after adding each record to see whether the limit has been reached, and terminate early within the loop.
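
A minimal sketch of what the split could look like. Everything here is hypothetical and simplified for illustration: `SketchDistinctTable`, `addWithOrderBy`, `addWithoutOrderBy` and `hasOrderBy` are stand-ins and not the actual DistinctTable API in this PR, records are plain ints rather than `Record` objects, the ORDER BY is assumed to be ascending on a single column, and the limit is assumed to be positive.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

public class DistinctLoopSketch {

  /** Hypothetical stand-in for DistinctTable; not Pinot's actual class. */
  public static class SketchDistinctTable {
    private final int limit;
    private final boolean hasOrderBy;
    private final Set<Integer> values = new HashSet<>();
    // Max-heap: the largest retained value sits on top so it can be evicted
    // when a new value pushes the table past the limit (ascending ORDER BY).
    private final PriorityQueue<Integer> heap = new PriorityQueue<>(Comparator.reverseOrder());

    public SketchDistinctTable(int limit, boolean hasOrderBy) {
      this.limit = limit;
      this.hasOrderBy = hasOrderBy;
    }

    public boolean hasOrderBy() {
      return hasOrderBy;
    }

    /** Non-ORDER BY path: returns false once the limit has been reached. */
    public boolean addWithoutOrderBy(int value) {
      values.add(value);
      return values.size() < limit;
    }

    /** ORDER BY path: every row must be seen, so there is nothing to report back. */
    public void addWithOrderBy(int value) {
      if (!values.add(value)) {
        return; // duplicate of a value that is already retained
      }
      heap.offer(value);
      if (heap.size() > limit) {
        values.remove(heap.poll()); // evict the current largest value
      }
    }

    public Set<Integer> getValues() {
      return values;
    }
  }

  /** Runs the two loops separately and returns the number of rows examined. */
  public static int aggregate(SketchDistinctTable table, int[] rows) {
    if (table.hasOrderBy()) {
      // No early termination is possible, so no return-value check is needed.
      for (int row : rows) {
        table.addWithOrderBy(row);
      }
      return rows.length;
    }
    int examined = 0;
    for (int row : rows) {
      examined++;
      if (!table.addWithoutOrderBy(row)) {
        break; // limit reached: terminate early within the loop
      }
    }
    return examined;
  }
}
```

With this shape, the ORDER BY loop carries no per-row branch on the return value, and the non-ORDER BY loop stops as soon as enough distinct records have been collected.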



