J-HowHuang opened a new pull request, #16886: URL: https://github.com/apache/pinot/pull/16886
## Description
A tenant rebalance job could not be cancelled through the API because `TenantRebalancer`, unlike `TableRebalancer`, neither synced with nor checked ZK. Since https://github.com/apache/pinot/pull/16455, `TenantRebalancer` has relied on `ZkBasedTenantRebalanceObserver` to write the rebalance context to ZK, but it never reads back any updates made to that context in ZK. This introduces two problems:
1. When `TenantRebalanceChecker` determines that a job is stuck and aborts it, it clears the queues in the context on ZK and spawns a new tenant rebalance job. If the job was in fact not stuck and is still being run by one of the controllers, that controller is unaware of the abort and moves on to the next table in its queue. It then writes its rebalance context to ZK, overwriting the aborted job context, so the abort has no effect.
2. When adding new features that modify the tenant rebalance queues, such as tenant rebalance job cancellation, the controller has no way to learn about updates made to the context on ZK and sticks with its local context instead. Any update to a tenant rebalance job therefore never reaches the controller.

## Change
* The tenant rebalancer now depends on `ZkBasedTenantRebalanceObserver` to poll tables from the queue and to update the status when a job is done. The tenant rebalance job metadata on ZK is the single source of truth from which the controller reads the context.
* Add a `DELETE /tenants/rebalance/{jobId}` API to cancel a tenant rebalance job.
* Change the tenant rebalance progress status of each table:
  * `UNPROCESSED -> IN_QUEUE`
  * `PROCESSING -> REBALANCING`
  * `PROCESSED -> DONE`
  * `(new) CANCELLED // cancelled by user`
  * `ABORTED // cancelled by TenantRebalanceChecker`
  * `NOT_SCHEDULED // tables IN_QUEUE will be marked as NOT_SCHEDULED once the rebalance job is cancelled/aborted`
* Consolidate the duplicate code that marks a table rebalance job as aborted/cancelled into `TableRebalanceManager.cancelRebalance`.

## Testing
### Basic usage verified via quickstart:
Status before cancellation:
```
{
  "timeElapsedSinceStartInSeconds": 61,
  "tenantRebalanceProgressStats": {
    "startTimeMs": 1758737223513,
    "totalTables": 10,
    "completionStatusMsg": null,
    "timeToFinishInSeconds": 0,
    "tableStatusMap": {
      "airlineStats_OFFLINE": "DONE",
      "testUnnest_OFFLINE": "DONE",
      "baseballStats_OFFLINE": "REBALANCING",
      "dimBaseballTeams_OFFLINE": "IN_QUEUE",
      "fineFoodReviews_OFFLINE": "IN_QUEUE",
      "clickstreamFunnel_OFFLINE": "IN_QUEUE",
      "starbucksStores_OFFLINE": "IN_QUEUE",
      "githubEvents_OFFLINE": "IN_QUEUE",
      "githubComplexTypeEvents_OFFLINE": "IN_QUEUE",
      "billing_OFFLINE": "IN_QUEUE"
    },
    "remainingTables": 8,
    "tableRebalanceJobIdMap": {
      "airlineStats_OFFLINE": "99212151-701b-40ab-a58e-8a6b2ea40097",
      "testUnnest_OFFLINE": "53df78b6-ae78-4aec-b021-cd7ccaadd916",
      "baseballStats_OFFLINE": "d17e2d0b-c2a6-479f-8e9b-c824d60a97ab"
    }
  }
}
```
After cancellation:
```
{
  "timeElapsedSinceStartInSeconds": 74,
  "tenantRebalanceProgressStats": {
    "startTimeMs": 1758737223513,
    "totalTables": 10,
    "completionStatusMsg": "Tenant rebalance job has been cancelled.",
    "timeToFinishInSeconds": 74,
    "tableStatusMap": {
      "airlineStats_OFFLINE": "DONE",
      "testUnnest_OFFLINE": "DONE",
      "baseballStats_OFFLINE": "CANCELLED",
      "dimBaseballTeams_OFFLINE": "NOT_SCHEDULED",
      "fineFoodReviews_OFFLINE": "NOT_SCHEDULED",
      "clickstreamFunnel_OFFLINE": "NOT_SCHEDULED",
      "starbucksStores_OFFLINE": "NOT_SCHEDULED",
      "githubEvents_OFFLINE": "NOT_SCHEDULED",
      "githubComplexTypeEvents_OFFLINE": "NOT_SCHEDULED",
      "billing_OFFLINE": "NOT_SCHEDULED"
    },
    "remainingTables": 0,
    "tableRebalanceJobIdMap": {
      "airlineStats_OFFLINE": "99212151-701b-40ab-a58e-8a6b2ea40097",
      "testUnnest_OFFLINE": "53df78b6-ae78-4aec-b021-cd7ccaadd916",
      "baseballStats_OFFLINE": "d17e2d0b-c2a6-479f-8e9b-c824d60a97ab"
    }
  }
}
```

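For reviewers, the per-table status change seen in the quickstart output can be summarised with a small standalone sketch. It is not the code in this PR: the class `TenantRebalanceStopSketch` and the method `stop` are hypothetical names, and mapping an in-flight table to `ABORTED` when the stop comes from `TenantRebalanceChecker` is an assumption drawn from the status descriptions in this PR rather than from a verified run.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; these are not the classes touched by this PR.
class TenantRebalanceStopSketch {
  // Status names as listed in the PR description.
  enum TableStatus { IN_QUEUE, REBALANCING, DONE, CANCELLED, ABORTED, NOT_SCHEDULED }

  // Returns the per-table statuses after a tenant rebalance job is stopped.
  // cancelledByUser = true models the DELETE API; false models an abort by the checker
  // (assumed to mark the in-flight table as ABORTED).
  static Map<String, TableStatus> stop(Map<String, TableStatus> current, boolean cancelledByUser) {
    Map<String, TableStatus> updated = new LinkedHashMap<>();
    for (Map.Entry<String, TableStatus> entry : current.entrySet()) {
      TableStatus next;
      switch (entry.getValue()) {
        case REBALANCING:
          // The table being rebalanced when the job is stopped.
          next = cancelledByUser ? TableStatus.CANCELLED : TableStatus.ABORTED;
          break;
        case IN_QUEUE:
          // Tables that never started are not scheduled any more.
          next = TableStatus.NOT_SCHEDULED;
          break;
        default:
          // DONE and other terminal states are left untouched.
          next = entry.getValue();
      }
      updated.put(entry.getKey(), next);
    }
    return updated;
  }

  public static void main(String[] args) {
    Map<String, TableStatus> before = new LinkedHashMap<>();
    before.put("airlineStats_OFFLINE", TableStatus.DONE);
    before.put("baseballStats_OFFLINE", TableStatus.REBALANCING);
    before.put("billing_OFFLINE", TableStatus.IN_QUEUE);
    System.out.println(stop(before, true));
  }
}
```

With `cancelledByUser = true`, the three sample tables end up as `DONE`, `CANCELLED`, and `NOT_SCHEDULED`, matching the quickstart snapshot taken after cancellation.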