J-HowHuang commented on code in PR #16341: URL: https://github.com/apache/pinot/pull/16341#discussion_r2216647912
########## pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/DefaultRebalancePreChecker.java: ########## @@ -352,35 +351,49 @@ private RebalancePreCheckerResult checkRebalanceConfig(RebalanceConfig rebalance List<String> segmentsToMove = SegmentAssignmentUtils.getSegmentsToMove(currentAssignment, targetAssignment); int numReplicas = Integer.MAX_VALUE; - if (rebalanceConfig.isDowntime() || PauselessConsumptionUtils.isPauselessEnabled(tableConfig)) { + String peerSegmentDownloadScheme = tableConfig.getValidationConfig().getPeerSegmentDownloadScheme(); + if (rebalanceConfig.isDowntime() || peerSegmentDownloadScheme != null) { for (String segment : segmentsToMove) { numReplicas = Math.min(targetAssignment.get(segment).size(), numReplicas); } } + // For non-peer download enabled tables, warn if downtime is enabled but numReplicas > 1. Should only use + // downtime=true for such tables if downtime is indeed acceptable whereas for numReplicas = 1, rebalance cannot + // be done without downtime if (rebalanceConfig.isDowntime()) { if (!segmentsToMove.isEmpty() && numReplicas > 1) { pass = false; warnings.add("Number of replicas (" + numReplicas + ") is greater than 1, downtime is not recommended."); } } - // It was revealed a risk of data loss for pauseless tables during rebalance, when downtime=true or - // minAvailableReplicas=0 -- If a segment is being moved and has not yet uploaded to deep store, premature - // deletion could cause irrecoverable data loss. This pre-check added as a workaround to warn the potential risk. - // TODO: Get to the root cause of the issue and revisit this pre-check. - if (PauselessConsumptionUtils.isPauselessEnabled(tableConfig)) { + // Peer download enabled tables may have data loss during rebalance, when downtime=true or minAvailableReplicas=0. + // The scenario plays out as follows: + // 1. If the newly built consuming segment is cannot be uploaded to deep store, it may set up the download URI + // as an empty string: "" + // 2. When this happens, other servers expect to download the segment from a peer server that built the segment or + // has a copy of the segment + // 3. With downtime rebalance (or if minAvailableReplicas=0), the IS may be updated for all the servers of a given + // segment + // 4. The above may lead to dropping the existing segments from the existing servers without waiting for the newly + // added servers to download the segment from the peer. In this case since a deep store copy does not exist, + // there is no way to recover this segment without manually re-building it + // Thus, to avoid the above data loss scenario, it is not recommended to run downtime rebalance for peer download + // enabled tables. This pre-check is added to warn of the potential risk. + if (peerSegmentDownloadScheme != null) { int minAvailableReplica = rebalanceConfig.getMinAvailableReplicas(); if (minAvailableReplica < 0) { minAvailableReplica = numReplicas + minAvailableReplica; } if (numReplicas == 1) { Review Comment: No. For example if the original replica was 2, and the new replica is now 1, it can never be rebalanced without downtime (i.e. have to either set `downtime=true` or `minAvailableReplicas=0`). But not vice versa. If the original was 1 and changed to 2: ``` Current assignment server_1: ONLINE Target assignment server_2: ONLINE server_3: ONLINE ``` ``` Step 1 server_1: ONLINE server_2: ONLINE Step 2 server_2: ONLINE server_3: ONLINE ``` There's always one instance unmoved and thus available. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org