snleee commented on a change in pull request #8242: URL: https://github.com/apache/pinot/pull/8242#discussion_r816382003
########## File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java ########## @@ -2963,6 +2961,25 @@ public String startReplaceSegments(String tableNameWithType, List<String> segmen return segmentLineageEntryId; } + // TODO: Add more conflict checks over segmentsTo later. For example, for APPEND table, + // if the new segments from 2 batch jobs are overlapping, we reject one of job. + private static boolean isConflicted(List<String> segmentsFrom, LineageEntry lineageEntry, TableConfig tableConfig) { + if (!segmentsFrom.isEmpty()) { + // It's conflicted if there is any overlap between segmentsFrom. + return !Collections.disjoint(segmentsFrom, lineageEntry.getSegmentsFrom()); + } + // For REFRESH table, it's conflicted if both segmentsFrom are empty. + if (isRefreshTable(tableConfig)) { Review comment: @Jackie-Jiang My reasoning was based on the following assumption: ``` For REFRESH table, each offline ingestion flow would push the full data snapshot. ``` If this is the case, the correct behavior should be blocking if 2 flows with `segmentsFrom=empty` are running at the same time because this can result in the REFRESH table having 2 snapshots of data. (Even if the input files are the same, we now put `timestamp` to the segment name so 2 flows will be uploaded) If we can make the assumption that we have 1 flow per REFRESH table, this can only happen if some user runs 2 push flow at the same time for an empty table, which will not likely happen. Then, I see your point that it's cleaner if we don't handle special logic for different push types. I'm fine with removing special handling for REFRESH. On the other hand, if you have REFRESH tables with multiple flows set up, I think that this model will be hard to support the consistent push feature because you will have the hard time differentiating the segments from one flow to another as @jackjlli mentioned above. IMO, we should not support such model (having 2+ flows pushing to a single REFRESH table). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org