merlimat opened a new pull request, #25989: URL: https://github.com/apache/pulsar/pull/25989
### Motivation PIP-454 (`pip/pip-454.md`) documents that operators recover from a failed metadata store migration by re-running `start`, and that the system stays consistent because the coordinator waits for every participant before cutting over. Three edge cases in the migration framework break those guarantees: 1. **Retry after FAILED does not work.** `MigrationCoordinator.setInitialMigrationPhase()` wrote the migration flag with `expectedVersion = -1` (create-only). After a failed migration the flag node still exists with `phase=FAILED`, so a retried `start` failed with `BadVersion` in the background thread (the REST call still returned 204) and the phase silently stayed `FAILED`. Worse, the failed claim was handled by the catch block that marks the migration `FAILED`, so calling `start` while another migration was running overwrote the in-progress flag with `FAILED`, breaking the running migration. 2. **Preparation timeout did not abort the migration.** `waitForPreparation()` timed out after 60s but only logged an error and fell through to the COPYING phase. A participant that never acknowledged (e.g. a component running an older version) would have its ephemeral nodes missing from the target store after cutover. 3. **Components starting mid-migration ended up broken.** A `DualMetadataStore` created while a migration was in PREPARATION/COPYING registered a participant node that never acknowledged (stalling the coordinator at `waitForPreparation`), and never initialized the target store: when the COMPLETED notification later arrived, `handleMigrationComplete()` dereferenced the null `targetStore` (NPE), leaving the component broken until restart. ### Modifications - `MigrationCoordinator.setInitialMigrationPhase()` now reads the existing flag first: if absent, it is created with the original create-only put; a leftover `NOT_STARTED`/`FAILED` flag is replaced with a CAS on its current version (so concurrent starts still race safely); `PREPARATION`/`COPYING`/`COMPLETED` are rejected. The claim is performed before the try/catch that marks the migration `FAILED`, so a rejected start no longer overwrites the state of a migration that is already in progress. - `MetadataMigrationBase.startMigration` enforces the 409 response that was already declared in its swagger annotations, by checking the current phase synchronously before launching the migration in the background. The check reads the flag from the source store, which is its authoritative location (after a completed migration the phase-routed read would return the stale copy from the target store). - `MigrationCoordinator.waitForPreparation()` throws when participants have not acknowledged within the timeout, so the migration transitions to `FAILED` instead of proceeding to COPYING. The participant list is re-checked before failing, and the timeout is constructor-configurable for tests (default unchanged at 60s). - `DualMetadataStore` created while a migration is already in PREPARATION/COPYING runs the preparation handler at startup: it initializes the target store and acknowledges its participant node, exactly as if it had received the PREPARATION notification. `handleMigrationComplete()` initializes the target store if it has not been initialized yet (idempotent), instead of throwing NPE. - New tests: `testRetryAfterFailedMigration`, `testStartRejectedWhileMigrationInProgress` and `testPreparationTimeoutFailsMigration` in `MigrationCoordinatorTest`, and `testStoreCreatedDuringMigration` in `DualMetadataStoreTest`. Each test was verified to fail against the previous behavior with the exact symptom described above. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
