gortiz commented on code in PR #17355:
URL: https://github.com/apache/pinot/pull/17355#discussion_r2613559810


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/query/NonScanBasedAggregationOperator.java:
##########
@@ -248,36 +249,42 @@ private static Set getDistinctValueSet(Dictionary dictionary) {
       case INT:
         IntOpenHashSet intSet = new IntOpenHashSet(dictionarySize);
         for (int dictId = 0; dictId < dictionarySize; dictId++) {
+          QueryThreadContext.checkTerminationAndSampleUsagePeriodically(dictId, EXPLAIN_NAME);
           intSet.add(dictionary.getIntValue(dictId));
         }
         return intSet;
       case LONG:
         LongOpenHashSet longSet = new LongOpenHashSet(dictionarySize);
         for (int dictId = 0; dictId < dictionarySize; dictId++) {
+          QueryThreadContext.checkTerminationAndSampleUsagePeriodically(dictId, EXPLAIN_NAME);
           longSet.add(dictionary.getLongValue(dictId));
         }
         return longSet;

Review Comment:
   I'm cool with this as a short-term solution, but we should probably change all these usages to a double loop like:
   
   ```java
   LongOpenHashSet longSet = new LongOpenHashSet(dictionarySize);
   final int BATCH_SIZE = 1000;
   for (int batchStart = 0; batchStart < dictionarySize; batchStart += BATCH_SIZE) {
     QueryThreadContext.checkTerminationAndSampleUsagePeriodically(batchStart, EXPLAIN_NAME);
     int batchEnd = Math.min(batchStart + BATCH_SIZE, dictionarySize);
     for (int dictId = batchStart; dictId < batchEnd; dictId++) {
       longSet.add(dictionary.getLongValue(dictId));
     }
   }
   ```
   
   This code is more JIT-friendly because the inner loop contains no termination check and can be easily optimized, which is especially important in cases like this, where the loop body is quite simple.
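
   For anyone who wants to try the pattern outside Pinot, here is a minimal, self-contained sketch of the same double loop. It is only an illustration under assumptions: `BatchedLoopSketch`, `collectDistinct`, and the `Runnable terminationCheck` are hypothetical names, with the `Runnable` standing in for the `QueryThreadContext` call, and a plain `HashSet<Long>` standing in for `LongOpenHashSet`.

   ```java
   import java.util.HashSet;
   import java.util.Set;

   // Hypothetical helper, not part of Pinot: shows the batched (double) loop shape only.
   final class BatchedLoopSketch {
     // Same batch size as in the suggestion above; the right value is a tuning assumption.
     private static final int BATCH_SIZE = 1000;

     private BatchedLoopSketch() {
     }

     // Collects the distinct values, running terminationCheck once per batch
     // instead of once per element.
     static Set<Long> collectDistinct(long[] values, Runnable terminationCheck) {
       Set<Long> distinct = new HashSet<>();
       for (int batchStart = 0; batchStart < values.length; batchStart += BATCH_SIZE) {
         // Stand-in for the termination / resource-usage check (assumption).
         terminationCheck.run();
         int batchEnd = Math.min(batchStart + BATCH_SIZE, values.length);
         // The inner loop has a simple body and fixed bounds, with no check inside it.
         for (int i = batchStart; i < batchEnd; i++) {
           distinct.add(values[i]);
         }
       }
       return distinct;
     }
   }
   ```

   With this shape the check runs once per batch (for example, 3 times for 2,500 values), so the trade-off is between how quickly a cancelled query is noticed and how much work the inner loop does between checks.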



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

