Copilot commented on code in PR #17126:
URL: https://github.com/apache/pinot/pull/17126#discussion_r2488995702


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/generator/BaseTaskGenerator.java:
##########
@@ -217,4 +220,52 @@ public Map<String, String> getBaseTaskConfigs(TableConfig 
tableConfig, List<Stri
         MinionConstants.SEGMENT_NAME_SEPARATOR));
     return baseConfigs;
   }
+
+  /**
+   * Selects random items from a sorted list using reservoir sampling for 
efficiency.
+   * This method is useful for segment selection with randomization to avoid 
contention.
+   *
+   * @param sortedItems List of items paired with their priority values (e.g., 
invalid record count)
+   * @param maxItems Maximum number of items to select
+   * @param randomizationFactor Factor to expand candidate pool (e.g., 2.0 = 
select from top 2x items)
+   * @param <T> Type of items to select
+   * @return List of randomly selected items
+   */
+  public static <T> List<T> selectRandomItems(List<Pair<T, Long>> sortedItems,
+      int maxItems, double randomizationFactor) {
+    if (randomizationFactor <= 1.0 || maxItems <= 0 || sortedItems.isEmpty()) {
+      // No randomization, return top items
+      return 
sortedItems.stream().limit(maxItems).map(Pair::getKey).collect(Collectors.toList());
+    }
+
+    // Calculate expanded candidate pool size
+    int candidatePoolSize = Math.min((int) Math.ceil(maxItems * 
randomizationFactor), sortedItems.size());
+
+    // Get top candidates based on the expanded pool size
+    List<Pair<T, Long>> candidates = sortedItems.subList(0, candidatePoolSize);
+
+    // Use reservoir sampling to efficiently select random items
+    List<Pair<T, Long>> selectedCandidates = new ArrayList<>(maxItems);
+    Random random = new Random();

Review Comment:
   Using `new Random()` without a seed can lead to predictable pseudo-random 
sequences in some environments. Consider using `SecureRandom` for better 
randomness or accepting a seed parameter for testability.



##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/upsertcompaction/UpsertCompactionTaskGenerator.java:
##########
@@ -269,8 +269,17 @@ public static SegmentSelectionResult 
processValidDocIdsMetadata(Map<String, Stri
       return 1;
     });
 
-    return new SegmentSelectionResult(
-        
segmentsForCompaction.stream().map(Map.Entry::getKey).collect(Collectors.toList()),
 segmentsForDeletion);
+    // Apply randomization if configured
+    List<SegmentZKMetadata> finalSegmentsForCompaction;
+    double randomizationFactor = Double.parseDouble(
+        
taskConfigs.getOrDefault(UpsertCompactionTask.SEGMENT_SELECTION_RANDOMIZATION_FACTOR,
+            
String.valueOf(UpsertCompactionTask.DEFAULT_SEGMENT_SELECTION_RANDOMIZATION_FACTOR)));
+
+    // Apply segment selection with randomization using BaseTaskGenerator 
utility
+    finalSegmentsForCompaction = 
BaseTaskGenerator.selectRandomItems(segmentsForCompaction, maxTasks,
+        randomizationFactor);

Review Comment:
   The method `selectRandomItems` expects `List<Pair<T, Long>>` but 
`segmentsForCompaction` is a `List<Map.Entry<SegmentZKMetadata, Long>>`. The 
types are incompatible and this will cause a compilation error.



##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/upsertcompaction/UpsertCompactionTaskGenerator.java:
##########
@@ -269,8 +269,17 @@ public static SegmentSelectionResult 
processValidDocIdsMetadata(Map<String, Stri
       return 1;
     });
 
-    return new SegmentSelectionResult(
-        
segmentsForCompaction.stream().map(Map.Entry::getKey).collect(Collectors.toList()),
 segmentsForDeletion);
+    // Apply randomization if configured
+    List<SegmentZKMetadata> finalSegmentsForCompaction;
+    double randomizationFactor = Double.parseDouble(
+        
taskConfigs.getOrDefault(UpsertCompactionTask.SEGMENT_SELECTION_RANDOMIZATION_FACTOR,
+            
String.valueOf(UpsertCompactionTask.DEFAULT_SEGMENT_SELECTION_RANDOMIZATION_FACTOR)));
+
+    // Apply segment selection with randomization using BaseTaskGenerator 
utility
+    finalSegmentsForCompaction = 
BaseTaskGenerator.selectRandomItems(segmentsForCompaction, maxTasks,
+        randomizationFactor);

Review Comment:
   `SegmentSelectionResult` constructor expects `List<SegmentZKMetadata>` for 
the first parameter, but `finalSegmentsForCompaction` will be 
`List<Map.Entry<SegmentZKMetadata, Long>>` from the `selectRandomItems` method, 
causing a type mismatch.
   ```suggestion
       List<Pair<SegmentZKMetadata, Long>> selectedPairs;
       double randomizationFactor = Double.parseDouble(
           
taskConfigs.getOrDefault(UpsertCompactionTask.SEGMENT_SELECTION_RANDOMIZATION_FACTOR,
               
String.valueOf(UpsertCompactionTask.DEFAULT_SEGMENT_SELECTION_RANDOMIZATION_FACTOR)));
   
       // Apply segment selection with randomization using BaseTaskGenerator 
utility
       selectedPairs = 
BaseTaskGenerator.selectRandomItems(segmentsForCompaction, maxTasks,
           randomizationFactor);
       List<SegmentZKMetadata> finalSegmentsForCompaction = 
selectedPairs.stream()
           .map(Pair::getLeft)
           .collect(Collectors.toList());
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to