klsince commented on code in PR #11740:
URL: https://github.com/apache/pinot/pull/11740#discussion_r1348093609
##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/ZkBasedTableRebalanceObserver.java:
##########
@@ -126,30 +143,74 @@ public int getNumUpdatesToZk() {
     return _numUpdatesToZk;
   }
+  @VisibleForTesting
+  TableRebalanceRetryConfig getTableRebalanceJobRetryConfig() {
+    return _tableRebalanceJobRetryConfig;
+  }
+  private void trackStatsInZk() {
+    Map<String, String> jobMetadata =
+        createJobMetadata(_tableNameWithType, _rebalanceJobId, _tableRebalanceProgressStats,
+            _tableRebalanceJobRetryConfig);
+    _pinotHelixResourceManager.addControllerJobToZK(_rebalanceJobId, jobMetadata,
+        ZKMetadataProvider.constructPropertyStorePathForControllerJob(ControllerJobType.TABLE_REBALANCE),
+        prevJobMetadata -> {
+          // Abort the job when we're sure it has failed, otherwise continue to update the status.
+          if (prevJobMetadata == null) {
+            return true;
+          }
+          String prevStatsInStr = prevJobMetadata.get(RebalanceJobConstants.JOB_METADATA_KEY_REBALANCE_PROGRESS_STATS);
+          TableRebalanceProgressStats prevStats;
+          try {
+            prevStats = JsonUtils.stringToObject(prevStatsInStr, TableRebalanceProgressStats.class);
+          } catch (JsonProcessingException e) {
+            throw new RuntimeException(e);
+          }

Review Comment:
   Oh, good catch. This should be `return true` rather than a thrown exception, so that the current rebalance continues. As commented above, we abort only when we actually read a FAILED status; in every other case, including previous stats that cannot be parsed, we stay conservative and let the job continue.
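The behavior agreed on in this comment can be sketched in isolation. This is a minimal illustration, not the code from the PR: the class `RebalanceUpdaterSketch`, the key `STATS_KEY`, the `status=` string format, and the helper `parseStatus` are all hypothetical stand-ins for `RebalanceJobConstants`, `TableRebalanceProgressStats`, and `JsonUtils.stringToObject`, so that the example runs with the JDK alone.

```java
import java.util.HashMap;
import java.util.Map;

class RebalanceUpdaterSketch {
  // Hypothetical key; the real code uses a constant from RebalanceJobConstants.
  static final String STATS_KEY = "REBALANCE_PROGRESS_STATS";

  // Hypothetical parser standing in for JSON deserialization of the stats.
  // It accepts strings of the form "status=<VALUE>" and throws on anything else.
  static String parseStatus(String stats) throws Exception {
    if (stats == null || !stats.startsWith("status=")) {
      throw new Exception("unparseable stats: " + stats);
    }
    return stats.substring("status=".length());
  }

  // Returns true when the current job should keep updating its status,
  // false only when the previous metadata clearly reports FAILED.
  static boolean shouldContinue(Map<String, String> prevJobMetadata) {
    if (prevJobMetadata == null) {
      return true; // nothing recorded yet, so continue
    }
    String status;
    try {
      status = parseStatus(prevJobMetadata.get(STATS_KEY));
    } catch (Exception e) {
      // Be conservative: a parse failure is not proof of failure, so continue
      // instead of throwing.
      return true;
    }
    return !"FAILED".equals(status);
  }

  public static void main(String[] args) {
    Map<String, String> failed = new HashMap<>();
    failed.put(STATS_KEY, "status=FAILED");
    System.out.println("continue after FAILED? " + shouldContinue(failed));
  }
}
```

The design choice is that only a positively observed FAILED status aborts the job; missing or unreadable metadata is treated as "unknown" and does not stop a rebalance that may still be healthy.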