merlimat opened a new pull request, #25989:
URL: https://github.com/apache/pulsar/pull/25989

   ### Motivation
   
   PIP-454 (`pip/pip-454.md`) documents that operators recover from a failed 
metadata store migration by re-running `start`, and that the system stays 
consistent because the coordinator waits for every participant before cutting 
over. Three edge cases in the migration framework break those guarantees:
   
   1. **Retry after FAILED does not work.** 
`MigrationCoordinator.setInitialMigrationPhase()` wrote the migration flag with 
`expectedVersion = -1` (create-only). After a failed migration the flag node 
still exists with `phase=FAILED`, so a retried `start` failed with `BadVersion` 
in the background thread (the REST call still returned 204) and the phase 
silently stayed `FAILED`. Worse, the failed claim was handled by the catch 
block that marks the migration `FAILED`, so calling `start` while another 
migration was running overwrote the in-progress flag with `FAILED`, breaking 
the running migration.
   
   2. **Preparation timeout did not abort the migration.** 
`waitForPreparation()` timed out after 60s but only logged an error and fell 
through to the COPYING phase. A participant that never acknowledged (e.g. a 
component running an older version) would have its ephemeral nodes missing from 
the target store after cutover.
   
   3. **Components starting mid-migration ended up broken.** A 
`DualMetadataStore` created while a migration was in PREPARATION/COPYING 
registered a participant node that never acknowledged (stalling the coordinator 
at `waitForPreparation`), and never initialized the target store: when the 
COMPLETED notification later arrived, `handleMigrationComplete()` dereferenced 
the null `targetStore` (NPE), leaving the component broken until restart.
   
   ### Modifications
   
   - `MigrationCoordinator.setInitialMigrationPhase()` now reads the existing 
flag first: if absent, it is created with the original create-only put; a 
leftover `NOT_STARTED`/`FAILED` flag is replaced with a CAS on its current 
version (so concurrent starts still race safely); 
`PREPARATION`/`COPYING`/`COMPLETED` are rejected. The claim is performed before 
the try/catch that marks the migration `FAILED`, so a rejected start no longer 
overwrites the state of a migration that is already in progress.
   - `MetadataMigrationBase.startMigration` enforces the 409 response that was 
already declared in its swagger annotations, by checking the current phase 
synchronously before launching the migration in the background. The check reads 
the flag from the source store, which is its authoritative location (after a 
completed migration the phase-routed read would return the stale copy from the 
target store).
   - `MigrationCoordinator.waitForPreparation()` throws when participants have 
not acknowledged within the timeout, so the migration transitions to `FAILED` 
instead of proceeding to COPYING. The participant list is re-checked before 
failing, and the timeout is constructor-configurable for tests (default 
unchanged at 60s).
   - `DualMetadataStore` created while a migration is already in 
PREPARATION/COPYING runs the preparation handler at startup: it initializes the 
target store and acknowledges its participant node, exactly as if it had 
received the PREPARATION notification. `handleMigrationComplete()` initializes 
the target store if it has not been initialized yet (idempotent), instead of 
throwing NPE.
   - New tests: `testRetryAfterFailedMigration`, 
`testStartRejectedWhileMigrationInProgress` and 
`testPreparationTimeoutFailsMigration` in `MigrationCoordinatorTest`, and 
`testStoreCreatedDuringMigration` in `DualMetadataStoreTest`. Each test was 
verified to fail against the previous behavior with the exact symptom described 
above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to