snleee commented on a change in pull request #8242:
URL: https://github.com/apache/pinot/pull/8242#discussion_r816382003



##########
File path: 
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java
##########
@@ -2963,6 +2961,25 @@ public String startReplaceSegments(String 
tableNameWithType, List<String> segmen
     return segmentLineageEntryId;
   }
 
+  // TODO: Add more conflict checks over segmentsTo later. For example, for 
APPEND table,
+  //       if the new segments from 2 batch jobs are overlapping, we reject 
one of job.
+  private static boolean isConflicted(List<String> segmentsFrom, LineageEntry 
lineageEntry, TableConfig tableConfig) {
+    if (!segmentsFrom.isEmpty()) {
+      // It's conflicted if there is any overlap between segmentsFrom.
+      return !Collections.disjoint(segmentsFrom, 
lineageEntry.getSegmentsFrom());
+    }
+    // For REFRESH table, it's conflicted if both segmentsFrom are empty.
+    if (isRefreshTable(tableConfig)) {

Review comment:
       @Jackie-Jiang 
   
   My reasoning was based on the following assumption:
   
   ```
   For REFRESH table, each offline ingestion flow would push the full data 
snapshot.
   ```
   
   If this is the case, the correct behavior should be blocking if 2 flows with 
`segmentsFrom=empty` are running at the same time because this can result in 
the REFRESH table having 2 snapshots of data. (Even if the input files are the 
same, we now put `timestamp` to the segment name so all data from 2 flows will 
be uploaded)
   
   If we can make the assumption that we have 1 flow per REFRESH table, the 
above case can only happen if some user runs 2 push flows at the same time for 
an empty table, which will not likely happen. Then, I see your point that it's 
cleaner if we don't handle special logic for different push types. I'm fine 
with removing special handling for REFRESH.
   
   On the other hand, if you have REFRESH tables with multiple flows set up, I 
think that this model will be hard to support the consistent push feature 
because you will have the hard time differentiating the segments from one flow 
to another as @jackjlli mentioned above. IMO, we should not support such model 
(having 2+ flows pushing to a single REFRESH table).
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to