yashmayya opened a new pull request, #16140:
URL: https://github.com/apache/pinot/pull/16140

   - Currently, if a table rebalance results in instance reassignment but no 
segment rebalance, we end up writing some incorrect rebalance progress stats to 
ZK. For instance (notice that `startTimeMs` is `0` and `timeToFinishInSeconds` 
is an epoch timestamp in seconds rather than a duration):
   ```
   {
     "id": "/CONTROLLER_JOBS/TABLE_REBALANCE",
     "simpleFields": {},
     "mapFields": {
       "7d45b962-c001-4eec-a54e-c0ed3a791d31": {
         "jobId": "7d45b962-c001-4eec-a54e-c0ed3a791d31",
         "submissionTimeMs": "1750238019928",
         "jobType": "TABLE_REBALANCE",
         "REBALANCE_PROGRESS_STATS": 
"{\"status\":\"DONE\",\"startTimeMs\":0,\"timeToFinishInSeconds\":1750238019,\"completionStatusMsg\":\"Instance reassigned but table is already balanced\",\"rebalanceProgressStatsOverall\":{\"totalSegmentsToBeAdded\":0,\"totalSegmentsToBeDeleted\":0,\"totalRemainingSegmentsToBeAdded\":0,\"totalRemainingSegmentsToBeDeleted\":0,\"totalRemainingSegmentsToConverge\":0,\"totalCarryOverSegmentsToBeAdded\":0,\"totalCarryOverSegmentsToBeDeleted\":0,\"totalUniqueNewUntrackedSegmentsDuringRebalance\":0,\"percentageRemainingSegmentsToBeAdded\":0.0,\"percentageRemainingSegmentsToBeDeleted\":0.0,\"estimatedTimeToCompleteAddsInSeconds\":0.0,\"estimatedTimeToCompleteDeletesInSeconds\":0.0,\"averageSegmentSizeInBytes\":0,\"totalEstimatedDataToBeMovedInBytes\":0,\"startTimeMs\":0},\"rebalanceProgressStatsCurrentStep\":{\"totalSegmentsToBeAdded\":0,\"totalSegmentsToBeDeleted\":0,\"totalRemainingSegmentsToBeAdded\":0,\"totalRemainingSegmentsToBeDeleted\":0,\"totalRemainingSegmentsToConverge\":0,\"totalCarryOverSegmentsToBeAdded\":0,\"totalCarryOverSegmentsToBeDeleted\":0,\"totalUniqueNewUntrackedSegmentsDuringRebalance\":0,\"percentageRemainingSegmentsToBeAdded\":0.0,\"percentageRemainingSegmentsToBeDeleted\":0.0,\"estimatedTimeToCompleteAddsInSeconds\":0.0,\"estimatedTimeToCompleteDeletesInSeconds\":0.0,\"averageSegmentSizeInBytes\":0,\"totalEstimatedDataToBeMovedInBytes\":0,\"startTimeMs\":0},\"initialToTargetStateConvergence\":{\"_segmentsMissing\":0,\"_segmentsToRebalance\":0,\"_percentSegmentsToRebalance\":0.0,\"_replicasToRebalance\":0},\"currentToTargetConvergence\":{\"_segmentsMissing\":0,\"_segmentsToRebalance\":0,\"_percentSegmentsToRebalance\":0.0,\"_replicasToRebalance\":0},\"externalViewToIdealStateConvergence\":{\"_segmentsMissing\":0,\"_segmentsToRebalance\":0,\"_percentSegmentsToRebalance\":0.0,\"_replicasToRebalance\":0}}",
         "REBALANCE_CONTEXT": 
"{\"attemptId\":1,\"jobId\":\"7d45b962-c001-4eec-a54e-c0ed3a791d31\",\"config\":{\"maxAttempts\":3,\"bestEfforts\":false,\"downtime\":false,\"bootstrap\":false,\"dryRun\":false,\"preChecks\":false,\"lowDiskMode\":false,\"includeConsuming\":true,\"updateTargetTier\":false,\"batchSizePerServer\":-1,\"reassignInstances\":true,\"externalViewStabilizationTimeoutInMs\":3600000,\"minimizeDataMovement\":\"ENABLE\",\"externalViewCheckIntervalInMs\":1000,\"minAvailableReplicas\":-1,\"heartbeatIntervalInMs\":300000,\"heartbeatTimeoutInMs\":3600000,\"retryInitialDelayInMs\":300000},\"originalJobId\":\"7d45b962-c001-4eec-a54e-c0ed3a791d31\",\"allowRetries\":true}",
         "tableName": "upsertMeetupRsvp_REALTIME"
       }
     },
     "listFields": {}
   }
   ```
   - The reason is that we're calling `TableRebalanceObserver::onSuccess` 
without ever calling `TableRebalanceObserver::onTrigger` with the 
`START_TRIGGER`, so the start time is never recorded before the completion 
stats are computed.
   - If instances are reassigned and there's no actual segment rebalance being 
done, there's no reason to persist stats in ZK, and the result can simply be 
returned to the user directly.
   - The other cases where we're calling some `TableRebalanceObserver` method 
before the start trigger are:
     - Segment assignment and instance assignment are both unchanged. In this 
case, the dry run rebalance before the actual rebalance will be a no-op and we 
won't run the actual rebalance itself at all (see 
[here](https://github.com/apache/pinot/blob/a91d6af17c651f139a4fdcc0e090de3c91eb8b8a/pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java#L709-L749)).
 So we won't store any stats in ZK for this case.
     - Downtime rebalance - we don't use ZK-based progress tracking for these 
rebalances (see 
[here](https://github.com/apache/pinot/blob/a91d6af17c651f139a4fdcc0e090de3c91eb8b8a/pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java#L705-L709)).
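
The symptom in the JSON above can be reproduced with a minimal sketch. This is not Pinot's actual implementation: `RebalanceStatsSketch`, `ProgressStats`, `Observer`, and `onStartTrigger` are hypothetical, simplified stand-ins, and the sketch assumes the elapsed time is derived as "completion time minus `startTimeMs`". The completion time reuses `submissionTimeMs` from the example purely for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-in for rebalance progress tracking.
// It is NOT Pinot's TableRebalanceObserver; it only illustrates the failure mode.
final class RebalanceStatsSketch {

  /** Simplified stats holder; field names mirror the JSON in the description. */
  static final class ProgressStats {
    long startTimeMs;            // stays 0 until the start trigger fires
    long timeToFinishInSeconds;  // meant to be a duration
    String status = "IN_PROGRESS";
  }

  /** Simplified observer with only the two callbacks relevant here. */
  static final class Observer {
    final ProgressStats stats = new ProgressStats();

    // Assumed behavior: the start trigger records the start time.
    void onStartTrigger(long nowMs) {
      stats.startTimeMs = nowMs;
    }

    // Assumed behavior: success computes elapsed time from the recorded start.
    void onSuccess(long nowMs) {
      stats.timeToFinishInSeconds =
          TimeUnit.MILLISECONDS.toSeconds(nowMs - stats.startTimeMs);
      stats.status = "DONE";
    }
  }

  public static void main(String[] args) {
    long nowMs = 1750238019928L; // submissionTimeMs from the example

    // Success without a start trigger: the "duration" is really epoch seconds.
    Observer withoutStart = new Observer();
    withoutStart.onSuccess(nowMs);
    System.out.println("startTimeMs=" + withoutStart.stats.startTimeMs
        + " timeToFinishInSeconds=" + withoutStart.stats.timeToFinishInSeconds);

    // Start trigger fired 5 seconds earlier: the duration is sensible.
    Observer withStart = new Observer();
    withStart.onStartTrigger(nowMs - 5_000L);
    withStart.onSuccess(nowMs);
    System.out.println("startTimeMs=" + withStart.stats.startTimeMs
        + " timeToFinishInSeconds=" + withStart.stats.timeToFinishInSeconds);
  }
}
```

Under these assumptions, the first observer ends up with `startTimeMs` of 0 and a `timeToFinishInSeconds` of 1750238019, matching the values persisted in ZK, while the second reports 5 seconds.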

