somandal opened a new pull request, #15427:
URL: https://github.com/apache/pinot/pull/15427

   ## Description
   This PR enhances the Table Rebalance progress stats.
   
   Today the progress stats can be confusing due to the various sections (there 
are 3), and it is hard to track progress because most of the time only the 
EV-IS convergence step is updated and this is only calculated for a single step 
and not for the full rebalance. To understand the progress stats, the code that 
does the stats updates must be well understood which is not a good assumption 
and experience for end users.
   
   Another concern with the existing stats is that the percentage progress 
calculations are based off the total segments in the target, rather than based 
on the actual segments that need to be added / deleted. This makes it harder to 
assess the progress of the rebalance since not all segments may even need to 
change.
   
   This PR adds the following to the existing progress stats:
   
   ```
     "rebalanceProgressStatsOverall" : {
       "totalSegmentsToBeAdded" : 30,
       "totalSegmentsToBeDeleted" : 30,
       "totalRemainingSegmentsToBeAdded" : 14,
       "totalRemainingSegmentsToBeDeleted" : 11,
       "totalCarryOverSegmentsToBeAdded" : 0,
       "totalCarryOverSegmentsToBeDeleted" : 0,
       "totalRemainingSegmentsToConverge": 0,
       "totalUniqueNewUntrackedSegmentsDuringRebalance" : 10,
       "percentageTotalSegmentsAddsRemaining" : 46.666666666666664,
       "percentageTotalSegmentDeletesRemaining" : 36.666666666666664,
       "estimatedTimeToCompleteAddsInSeconds" : 71.768375,
       "estimatedTimeToCompleteDeletesInSeconds" : 47.48584210526316,
       "averageSegmentSizeInBytes" : 35918,
       "totalEstimatedDataToBeMovedInBytes" : 1077540,
       "startTimeMs" : 1742319763143
     },
     "rebalanceProgressStatsCurrentStep" : {
       "totalSegmentsToBeAdded" : 10,
       "totalSegmentsToBeDeleted" : 10,
       "totalRemainingSegmentsToBeAdded" : 4,
       "totalRemainingSegmentsToBeDeleted" : 1,
       "totalCarryOverSegmentsToBeAdded" : 0,
       "totalCarryOverSegmentsToBeDeleted" : 0,
       "totalRemainingSegmentsToConverge": 0,
       "totalUniqueNewUntrackedSegmentsDuringRebalance" : 0,
       "percentageTotalSegmentsAddsRemaining" : 40.0,
       "percentageTotalSegmentDeletesRemaining" : 10.0,
       "estimatedTimeToCompleteAddsInSeconds" : 16.923333333333332,
       "estimatedTimeToCompleteDeletesInSeconds" : 2.8205555555555555,
       "averageSegmentSizeInBytes" : 35918,
       "totalEstimatedDataToBeMovedInBytes" : 359180,
       "startTimeMs" : 1742319819779
     },
   ```
   
   The overall stats (`rebalanceProgressStatsOverall`) above are updated at 
each step (`rebalanceProgressStatsCurrentStep`) as well to capture the progress 
during IS-EV convergence. This is to ensure that users only need to monitor 
`rebalanceProgressStatsOverall` to find out how much work has already been 
done. The `rebalanceProgressStatsCurrentStep` can be used for debugging but 
does not need to be monitored as such.
   
   `_totalUniqueNewUntrackedSegmentsDuringRebalance`: this field is added to 
track new segments that show up in the IdealState but which aren't being 
tracked as part of the rebalance. This is just informational and can be used to 
identify how many additional segments got added during the rebalance run.
   
   The time estimates are based on how much time has passed and how many 
segments were already processed and how many are remaining. This may not be a 
very accurate measure and is not based on the actual data moving since 
different set ups may have different throughput characteristics.
   
   This PR also renames some of the existing stats to make them clearer to 
understand.
   
   ## Interesting Scenario
   While experimenting with QuickStart and REALTIME tables, I ran into a 
situation where rebalance would complete successfully, but the stats I added 
would always show some segment deletions as not yet processed.
   
   Digging some more, found the following:
   
https://github.com/apache/pinot/blob/604cd165586716ae04db22856371e50147e30459/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/TableRebalancer.java#L1074
   
   The above only checks for extra assigned instances for given segments if 
`lowDiskMode` is enabled. So in my scenario, there were some extra instances 
assigned when I did the stats calculations, but the above check was never 
performed since `lowDiskMode=false`. The EV-IS convergence was marked as true 
in this case and the rebalance was completed. 
   
   When I updated the code to comment out `lowDiskMode &&` from the above if, I 
saw that eventually the EV does get updated to remove the deleted segments, and 
the EV-IS do match up exactly, but that needs some additional convergence 
loops. In this case the stats were correctly updated to reflect the deletions.
   
   **As a fix for the above:** updated the code to remove `lowDiskMode &&` so 
that this convergence loop always runs to ensure deletions are converged for 
all scenarios and not just for `lowDiskMode`.
   
   ## Testing:
   All testing done with `minAvailableReplicas = -1`
   
   ### Test 1 (no delay in segment adds)
   - Ran HybridQuickStart with 6 servers
   - Tagged 3 servers with `NewDefaultTenant_OFFLINE` and removed 
`DefaultTenant_OFFLINE` from these servers
   - Ran rebalance of two types:
       - Updated replication of OFFLINE table from 1 -> 3, this got triggered 
as single step rebalance to add all the new replicas
       - Updated server tenant of OFFLINE table to `NewDefaultTenant` 
(triggered as a 3 step rebalance to move 1 replica at a time)
   
   ### Test 2 (delay in segment adds/deletes)
   - Added a random delay between 10 - 30 seconds in the 
`onBecomeOnlineFromOffline()`, `onBecomeOnlineFromConsuming()`, 
`onBecomeConsumingFromOffline()` state transition callbacks
   - Added a random delay between 0 - 30 seconds in the ` 
onBecomeDroppedFrom*()` state transition callbacks
   - Ran HybridQuickStart with 6 servers
   - Tagged 3 servers with `NewDefaultTenant_OFFLINE` and removed 
`DefaultTenant_OFFLINE` from these servers
   - Ran rebalance of two types (both were triggered as multi-step rebalances):
       - Updated replication of OFFLINE table from 1 -> 3, monitored the steps 
(due to random delay in state transition, could see some segments getting added 
sooner and others take longer)
       - Updated server tenant of OFFLINE table to `NewDefaultTenant` and 
similarly monitored the steps
   
   ### Additional testing (with delays):
   - REALTIME table with and without `includeConsuming` (similar steps as for 
OFFLINE above)
   - Scenario where some segments get added to the IdealState in the middle of 
rebalance (e.g. forceCommit for REALTIME table) to tag some servers as 
`NewDefaultTenant_REALTIME` and remove `DefaultTenant_REALTIME` tag
   - Test to force some segments into `ERROR` state, and use `bestEfforts=true` 
to validate that convergence stats are correctly captured. Also verified 
convergence stats with `forceCommit` since the existing `CONSUMING` segments 
change to `ONLINE` and the rebalance correctly waits (and stats are correctly 
updated) until the state convergence also completes.
   - Set some segments to `OFFLINE` state in ideal state. Verified that while 
they are in `OFFLINE` state in ideal state, we skip progress tracking for 
those. Eventually if they get updated to ONLINE state, we track them again and 
wait for convergence.
   - Updated the code to add fields for carry-over segments that didn't 
converge in the last step. This will mostly occur if `bestEffort=true`, and we 
are unable to complete convergence within the timeout. The other scenario was 
with `lowDiskMode` and extra deletions from EV, but we changed the code here to 
always wait for additional deletions in the EV. The percentage calculations 
include these carry-over segments.
   
   cc @raghavyadav01 @Jackie-Jiang @klsince @J-HowHuang 
   Something went wrong with the branch of my original PR: 
https://github.com/apache/pinot/pull/15266
   This PR is the exact same. I'll close the other one


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to