somandal commented on code in PR #16341: URL: https://github.com/apache/pinot/pull/16341#discussion_r2210908874
########## pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/TableRebalancer.java: ########## @@ -1813,9 +1865,63 @@ public int fetch(String segmentName) { } } + @VisibleForTesting + @FunctionalInterface + interface DataLossRiskAssessor { + boolean hasDataLossRisk(String segmentName); + } + + private static class DataLossRiskAssessorImpl implements DataLossRiskAssessor { + private final String _tableNameWithType; + private final HelixManager _helixManager; + private final boolean _isPeerDownloadEnabled; + private final boolean _isUpsertOrDedupTable; + private final boolean _isPauselessEnabled; + + private DataLossRiskAssessorImpl(String tableNameWithType, TableConfig tableConfig, HelixManager helixManager) { + _tableNameWithType = tableNameWithType; + _helixManager = helixManager; + _isPeerDownloadEnabled = !StringUtils.isEmpty(tableConfig.getValidationConfig().getPeerSegmentDownloadScheme()); + _isUpsertOrDedupTable = tableConfig.isUpsertEnabled() || tableConfig.isDedupEnabled(); + _isPauselessEnabled = PauselessConsumptionUtils.isPauselessEnabled(tableConfig); + } + + @Override + public boolean hasDataLossRisk(String segmentName) { + // If peer-download is disabled, no data loss risk exists + if (!_isPeerDownloadEnabled) { + return false; + } + + SegmentZKMetadata segmentZKMetadata = ZKMetadataProvider + .getSegmentZKMetadata(_helixManager.getHelixPropertyStore(), _tableNameWithType, segmentName); + if (segmentZKMetadata == null) { + return false; + } + + // If the segment state is COMPLETED and the peer download URL is empty, there is a data loss risk + if (!segmentZKMetadata.getStatus().isCompleted()) { + return CommonConstants.Segment.METADATA_URI_FOR_PEER_DOWNLOAD.equals(segmentZKMetadata.getDownloadUrl()); + } + + // If the segment is not yet completed, then the following scenarios are possible: + // - Non-upsert / non-dedup table: data loss scenarios are not possible. Either the segment will restart + // consumption or the RealtimeSegmentValidationManager will kick in to fix up the segment if pauseless is + // enabled + // - Upsert / dedup table: For non-pauseless tables, it is safe to move the segment without data loss concerns. + // For pauseless tables, if the segment is still in CONSUMING state, moving it is safe, but if it is in + // COMMITTING state then there is a risk of data loss on segment build failures as well since the + // RealtimeSegmentValidationManager does not automatically try to fix up these segments. To be safe it is best + // to return that there is a risk of data loss in case of race conditions for pauseless enabled tables + // (rebalance updates IS at the same time as segment commit protocol starts and moves it to COMMITTING) + return _isUpsertOrDedupTable && _isPauselessEnabled; Review Comment: @noob-se7en should this check only be done for segments in COMMITTING state? right now it does it for all, but I guess even if we ask users to forceCommit and pause ingestion, we can have some segments that are consuming, right? and it's safe to move consuming segments I had thought that by the time IS is updated it is possible the segment has moved to COMMITING so that's why hadn't checked for that here, but maybe that'll cause more issues where it always lands up failing rebalance for downtime? edit: updated this to also check for COMMITTING state so we allow moving CONSUMING segments -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org