lhotari opened a new pull request, #25389: URL: https://github.com/apache/pulsar/pull/25389
### Motivation

The `OneWayReplicatorUsingGlobalZKTest.cleanup` method is flaky, failing ~5 times per week in CI. The failure occurs because:

1. The test enables `transactionCoordinatorEnabled`, which creates a `__transaction_buffer_snapshot` system topic
2. During cleanup, reducing replication clusters triggers async topic cleanup
3. The `__transaction_buffer_snapshot` topic enters a transitional state (compaction `CancellationException`), causing its force-delete to return HTTP 422 repeatedly
4. The 30-second Awaitility timeout in `cleanupPulsarResources()` expires and throws
5. **Critically**, the exception prevents the rest of `cleanup()` from running: brokers, ZK, and BK are never shut down, leaking resources

See [flaky test report](https://github.com/lhotari/pulsar-flakes/blob/master/2026-03-16-to-2026-03-23/org.apache.pulsar.broker.service.OneWayReplicatorUsingGlobalZKTest.cleanup.md) for CI failure details.

### Modifications

- Wrap `cleanupPulsarResources()` in try-catch in `OneWayReplicatorTestBase.cleanup()` so that broker, ZK, and BK shutdown always proceeds even if namespace deletion fails
- Add missing cleanup for `sourceClusterAlwaysSchemaCompatibleNamespace`, which was created in `setup()` but never deleted in `cleanupPulsarResources()`
- Use the `Awaitility.await().ignoreExceptions()` retry pattern for the `admin2` namespace deletions in the non-global ZK path, matching the pattern already used for `admin1` deletions

### Verifying this change

This change is already covered by existing tests:

- Ran `OneWayReplicatorUsingGlobalZKTest` 4 times locally: all passed (21 tests, 0 failures each)
- Ran `OneWayReplicatorTest` 1 time locally: passed (35 tests, 0 failures)

### Does this pull request potentially affect one of the following parts:

- [x] Dependencies (add or upgrade a dependency): no
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment

### Documentation

- [x] `doc-not-needed`

### Matching PR in forked repository

PR in forked repository: https://github.com/lhotari/pulsar/pull/new/lh-fix-onewayreplicatortestbase-close
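
### Illustrative sketch of the cleanup pattern

The two ideas under Modifications (retrying a deletion while ignoring transient failures, and guarding the resource cleanup so that shutdown always runs) can be sketched roughly as below. This is not the code from the PR: it uses only the JDK, `retryIgnoringExceptions` is a hypothetical stand-in for the `Awaitility.await().ignoreExceptions()` pattern, and the class, method, and event names are assumptions made for illustration.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names come from the Pulsar code base.
final class CleanupSketch {

    /** A cleanup step that is allowed to throw. */
    interface ThrowingRunnable {
        void run() throws Exception;
    }

    /**
     * JDK-only stand-in for an Awaitility-style retry: re-runs the action,
     * ignoring exceptions, until it succeeds or the timeout elapses.
     */
    static void retryIgnoringExceptions(Duration timeout, Duration pollInterval,
                                        ThrowingRunnable action) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            Exception last;
            try {
                action.run();
                return; // success, stop retrying
            } catch (Exception e) {
                last = e; // transient failure (for example an HTTP 422), retry
            }
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException("condition not met within " + timeout, last);
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while retrying", ie);
            }
        }
    }

    /**
     * Guards the resource cleanup so that the shutdown steps always run,
     * even when namespace deletion fails or times out.
     */
    static void cleanup(ThrowingRunnable cleanupResources, List<String> events) {
        try {
            cleanupResources.run();
        } catch (Exception e) {
            // Record and continue; a failed deletion must not leak brokers, BK, or ZK.
            events.add("resource cleanup failed: " + e.getMessage());
        }
        events.add("brokers stopped");
        events.add("bookkeeper stopped");
        events.add("zookeeper stopped");
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        AtomicInteger attempts = new AtomicInteger();

        // Deletion that fails twice before succeeding: the retry absorbs the failures.
        cleanup(() -> retryIgnoringExceptions(Duration.ofSeconds(2), Duration.ofMillis(10), () -> {
            if (attempts.incrementAndGet() < 3) {
                throw new IllegalStateException("HTTP 422");
            }
        }), events);

        // Deletion that never succeeds: the shutdown steps still run.
        cleanup(() -> {
            throw new IllegalStateException("namespace deletion timed out");
        }, events);

        events.forEach(System.out::println);
    }
}
```

In the second call the failure is recorded rather than rethrown, which is the behavior the try-catch is meant to guarantee. Whether the real test base logs the exception or handles it some other way is not stated in this description.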
