J-HowHuang commented on code in PR #16341:
URL: https://github.com/apache/pinot/pull/16341#discussion_r2237250422


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/TableRebalancer.java:
##########
@@ -1813,9 +1878,99 @@ public int fetch(String segmentName) {
     }
   }
 
+  @VisibleForTesting
+  interface DataLossRiskAssessor {
+    boolean hasDataLossRisk(String segmentName);
+
+    String generateDataLossRiskMessage(String segmentName);
+  }
+
+  /**
+   * To be used for non-peer download enabled tables or peer-download enabled 
tables rebalanced with
+   * minAvailableReplicas > 0
+   */
+  @VisibleForTesting
+  static class NoOpRiskAssessor implements DataLossRiskAssessor {
+    NoOpRiskAssessor() {
+    }
+
+    @Override
+    public boolean hasDataLossRisk(String segmentName) {
+      return false;
+    }
+
+    @Override
+    public String generateDataLossRiskMessage(String segmentName) {
+      throw new UnsupportedOperationException("No data loss risk message 
should be generated for NoOpRiskAssessor");
+    }
+  }
+
+  /**
+   * To be used for peer-download enabled tables with downtime=true or 
minAvailableReplicas=0
+   */
+  @VisibleForTesting
+  static class PeerDownloadTableDataLossRiskAssessor implements 
DataLossRiskAssessor {
+    private final String _tableNameWithType;
+    private final TableConfig _tableConfig;
+    private final HelixManager _helixManager;
+    private final PinotLLCRealtimeSegmentManager 
_pinotLLCRealtimeSegmentManager;
+    private final boolean _isPauselessEnabled;
+
+    @VisibleForTesting
+    PeerDownloadTableDataLossRiskAssessor(String tableNameWithType, 
TableConfig tableConfig,
+        int minAvailableReplicas, HelixManager helixManager,
+        PinotLLCRealtimeSegmentManager pinotLLCRealtimeSegmentManager) {
+      // Should only be created for peer-download enabled tables with 
minAvailableReplicas = 0
+      
Preconditions.checkState(tableConfig.getValidationConfig().getPeerSegmentDownloadScheme()
 != null
+          && minAvailableReplicas == 0);
+      _tableNameWithType = tableNameWithType;
+      _tableConfig = tableConfig;
+      _helixManager = helixManager;
+      _pinotLLCRealtimeSegmentManager = pinotLLCRealtimeSegmentManager;
+      _isPauselessEnabled = 
PauselessConsumptionUtils.isPauselessEnabled(tableConfig);
+    }
+
+    @Override
+    public boolean hasDataLossRisk(String segmentName) {
+      SegmentZKMetadata segmentZKMetadata = ZKMetadataProvider
+          .getSegmentZKMetadata(_helixManager.getHelixPropertyStore(), 
_tableNameWithType, segmentName);
+      if (segmentZKMetadata == null) {
+        return false;
+      }
+
+      // If the segment state is COMPLETED and the peer download URL is empty, 
there is a data loss risk
+      if (segmentZKMetadata.getStatus().isCompleted()) {
+        return 
CommonConstants.Segment.METADATA_URI_FOR_PEER_DOWNLOAD.equals(segmentZKMetadata.getDownloadUrl());
+      }
+
+      // If the segment is not yet completed, then the following scenarios are 
possible:
+      // - Non-upsert / non-dedup table:
+      //     - data loss scenarios are not possible. Either the segment will 
restart consumption or the
+      //       RealtimeSegmentValidationManager will kick in to fix up the 
segment if pauseless is enabled
+      // - Upsert / dedup table:
+      //     - For non-pauseless tables, it is safe to move the segment 
without data loss concerns
+      //     - For pauseless tables, if the segment is still in CONSUMING 
state, moving it is safe, but if it is in
+      //       COMMITTING state then there is a risk of data loss on segment 
build failures as well since the
+      //       RealtimeSegmentValidationManager does not automatically try to 
fix up these segments. To be safe it is
+      //       best to return that there is a risk of data loss for pauseless 
enabled tables for segments in COMMITTING
+      //       state

Review Comment:
   My point was that if we see a segment is COMMITTING , it does not impose a 
data loss risk as directly as an empty download url. A segment with COMMITTING 
is only a problem if
   1. The build fails in the future, and this is a partial-upsert/dedup table
   2. The build succeeds in the future but upload fails, leaving an empty 
download url
   
   I'm not sure if we want to kill all rebalance with COMMITTING segments as 
most of the time the committing is gonna be alright. That's why I suggested to 
ignore checking COMMITTING it in this PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to