yashmayya opened a new pull request, #16140: URL: https://github.com/apache/pinot/pull/16140
- Currently, if a table rebalance results in instance reassignment but no segment rebalance, we end up writing incorrect rebalance progress stats to ZK. For instance (notice `startTimeMs`, which is 0, and `timeToFinishInSeconds`, which is roughly the submission time expressed as epoch seconds):

```
{
  "id": "/CONTROLLER_JOBS/TABLE_REBALANCE",
  "simpleFields": {},
  "mapFields": {
    "7d45b962-c001-4eec-a54e-c0ed3a791d31": {
      "jobId": "7d45b962-c001-4eec-a54e-c0ed3a791d31",
      "submissionTimeMs": "1750238019928",
      "jobType": "TABLE_REBALANCE",
      "REBALANCE_PROGRESS_STATS": "{\"status\":\"DONE\",\"startTimeMs\":0,\"timeToFinishInSeconds\":1750238019,\"completionStatusMsg\":\"Instance reassigned but table is already balanced\",\"rebalanceProgressStatsOverall\":{\"totalSegmentsToBeAdded\":0,\"totalSegmentsToBeDeleted\":0,\"totalRemainingSegmentsToBeAdded\":0,\"totalRemainingSegmentsToBeDeleted\":0,\"totalRemainingSegmentsToConverge\":0,\"totalCarryOverSegmentsToBeAdded\":0,\"totalCarryOverSegmentsToBeDeleted\":0,\"totalUniqueNewUntrackedSegmentsDuringRebalance\":0,\"percentageRemainingSegmentsToBeAdded\":0.0,\"percentageRemainingSegmentsToBeDeleted\":0.0,\"estimatedTimeToCompleteAddsInSeconds\":0.0,\"estimatedTimeToCompleteDeletesInSeconds\":0.0,\"averageSegmentSizeInBytes\":0,\"totalEstimatedDataToBeMovedInBytes\":0,\"startTimeMs\":0},\"rebalanceProgressStatsCurrentStep\":{\"totalSegmentsToBeAdded\":0,\"totalSegmentsToBeDeleted\":0,\"totalRemainingSegmentsToBeAdded\":0,\"totalRemainingSegmentsToBeDeleted\":0,\"totalRemainingSegmentsToConverge\":0,\"totalCarryOverSegmentsToBeAdded\":0,\"totalCarryOverSegmentsToBeDeleted\":0,\"totalUniqueNewUntrackedSegmentsDuringRebalance\":0,\"percentageRemainingSegmentsToBeAdded\":0.0,\"percentageRemainingSegmentsToBeDeleted\":0.0,\"estimatedTimeToCompleteAddsInSeconds\":0.0,\"estimatedTimeToCompleteDeletesInSeconds\":0.0,\"averageSegmentSizeInBytes\":0,\"totalEstimatedDataToBeMovedInBytes\":0,\"startTimeMs\":0},\"initialToTargetStateConvergence\":{\"_segmentsMissing\":0,\"_segmentsToRebalance\":0,\"_percentSegmentsToRebalance\":0.0,\"_replicasToRebalance\":0},\"currentToTargetConvergence\":{\"_segmentsMissing\":0,\"_segmentsToRebalance\":0,\"_percentSegmentsToRebalance\":0.0,\"_replicasToRebalance\":0},\"externalViewToIdealStateConvergence\":{\"_segmentsMissing\":0,\"_segmentsToRebalance\":0,\"_percentSegmentsToRebalance\":0.0,\"_replicasToRebalance\":0}}",
      "REBALANCE_CONTEXT": "{\"attemptId\":1,\"jobId\":\"7d45b962-c001-4eec-a54e-c0ed3a791d31\",\"config\":{\"maxAttempts\":3,\"bestEfforts\":false,\"downtime\":false,\"bootstrap\":false,\"dryRun\":false,\"preChecks\":false,\"lowDiskMode\":false,\"includeConsuming\":true,\"updateTargetTier\":false,\"batchSizePerServer\":-1,\"reassignInstances\":true,\"externalViewStabilizationTimeoutInMs\":3600000,\"minimizeDataMovement\":\"ENABLE\",\"externalViewCheckIntervalInMs\":1000,\"minAvailableReplicas\":-1,\"heartbeatIntervalInMs\":300000,\"heartbeatTimeoutInMs\":3600000,\"retryInitialDelayInMs\":300000},\"originalJobId\":\"7d45b962-c001-4eec-a54e-c0ed3a791d31\",\"allowRetries\":true}",
      "tableName": "upsertMeetupRsvp_REALTIME"
    }
  },
  "listFields": {}
}
```

- The reason is that we're calling `TableRebalanceObserver::onSuccess` without ever calling `TableRebalanceObserver::onTrigger` with the `START_TRIGGER`.
- If instances are reassigned and there's no actual segment rebalance being done, there's no reason to persist stats in ZK, and the result can simply be returned to the user directly.
- The other cases where we're calling some `TableRebalanceObserver` method before the start trigger are:
  - Segment assignment and instance assignment are both unchanged. In this case, the dry run rebalance before the actual rebalance will be a no-op and we won't run the actual rebalance itself at all (see [here](https://github.com/apache/pinot/blob/a91d6af17c651f139a4fdcc0e090de3c91eb8b8a/pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java#L709-L749)). So we won't store any stats in ZK for this case.
  - Downtime rebalance: we don't use ZK-based progress tracking for these rebalances (see [here](https://github.com/apache/pinot/blob/a91d6af17c651f139a4fdcc0e090de3c91eb8b8a/pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java#L705-L709)).
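
To illustrate why a success callback without a preceding start trigger yields the values seen in the ZK record, here is a minimal sketch. `RebalanceStatsSketch`, `ProgressStats`, `onStart`, and `shouldPersist` are hypothetical names invented for this example and are not Pinot's actual classes or methods; it also assumes the duration is derived as "now minus `startTimeMs`", and uses `submissionTimeMs` from the record as a stand-in for the completion time.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified stand-in for a rebalance observer; NOT Pinot's real class. */
final class RebalanceStatsSketch {

  /** Holds only the two fields highlighted in the ZK record. */
  static final class ProgressStats {
    public long startTimeMs;            // stays 0 until a start trigger sets it
    public long timeToFinishInSeconds;
  }

  private final ProgressStats stats = new ProgressStats();
  private boolean started;

  /** Analogue of the start trigger: records when the rebalance began. */
  public void onStart(long nowMs) {
    stats.startTimeMs = nowMs;
    started = true;
  }

  /** Analogue of the success callback: derives the duration from the recorded start time. */
  public ProgressStats onSuccess(long nowMs) {
    stats.timeToFinishInSeconds = TimeUnit.MILLISECONDS.toSeconds(nowMs - stats.startTimeMs);
    return stats;
  }

  /** Assumed guard: stats are only worth persisting if a start was ever recorded. */
  public boolean shouldPersist() {
    return started;
  }

  public static void main(String[] args) {
    long nowMs = 1750238019928L; // submissionTimeMs from the ZK record, used as "now"

    // Success without a start trigger: the duration degenerates to epoch seconds.
    RebalanceStatsSketch noStart = new RebalanceStatsSketch();
    ProgressStats bad = noStart.onSuccess(nowMs);
    System.out.println(bad.startTimeMs + " " + bad.timeToFinishInSeconds); // 0 1750238019
    System.out.println(noStart.shouldPersist());                           // false

    // Start trigger 5 seconds before success: the duration is meaningful.
    RebalanceStatsSketch withStart = new RebalanceStatsSketch();
    withStart.onStart(nowMs - 5_000L);
    ProgressStats good = withStart.onSuccess(nowMs);
    System.out.println(good.timeToFinishInSeconds);                        // 5
    System.out.println(withStart.shouldPersist());                         // true
  }
}
```

Under these assumptions, the first case reproduces `startTimeMs` of 0 and `timeToFinishInSeconds` of 1750238019, the same values as in the record. The `shouldPersist` guard is one possible way to express "don't write stats when no rebalance was started"; the actual change may be structured differently.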