tarun11Mavani commented on code in PR #15863:
URL: https://github.com/apache/pinot/pull/15863#discussion_r2109498518


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/upsertcompactmerge/UpsertCompactMergeTaskGenerator.java:
##########
@@ -214,15 +216,24 @@ public List<PinotTaskConfig> 
generateTasks(List<TableConfig> tableConfigs) {
           continue;
         }
         // TODO see if multiple groups of same partition can be added
-        Map<String, String> configs = new 
HashMap<>(getBaseTaskConfigs(tableConfig,
-            groups.get(0).stream().map(x -> 
x.getSegmentZKMetadata().getSegmentName()).collect(Collectors.toList())));
+
+        List<String> segmentNames =
+            groups.get(0).stream().map(x -> 
x.getSegmentZKMetadata().getSegmentName()).collect(Collectors.toList());
+        //get max creation time for the segments across all servers. This will 
be used as the creation time of the
+        // merge segment
+        Long maxCreationTimeMillis =
+            getMaxCreationTimeMillis(tableNameWithType, segmentNames, 
serverToSegments, serverToEndpoints,

Review Comment:
   @rohityadav1993 as we discussed offline — if we skip selecting a group where 
the newest segment is an UploadedSegment, we might miss valid compaction 
opportunities.
   
   For example, consider S1 and S2 compacted into C1, S3 and S4 into C2, and S5 
and S6 into C3. If we then receive updates to C1, C2, and C3, we should ideally 
compact them into a new segment (say, C4).
   However, if we skip a group just because its newest segment is an 
UploadedSegment, we might not compact at all, even when it’s necessary.
   
   In the example above, we would end up creating C4 that replaces docs from C1 
and C2 but not C3 — even though C3 and C4 will eventually get merged with other 
segments, skipping C3 initially could lead to inefficient behavior.
   
   Given this, I believe we should stick to the current compaction logic. If a 
segment ends up being selected again in a later run, that’s acceptable — the 
subsequent runs will handle it.
   I have reverted the change related to this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to