lhotari opened a new pull request, #25389:
URL: https://github.com/apache/pulsar/pull/25389

   ### Motivation
   
   The `OneWayReplicatorUsingGlobalZKTest.cleanup` method is flaky, failing ~5 
times per week in CI. The failure occurs because:
   
   1. The test enables `transactionCoordinatorEnabled`, which creates a 
`__transaction_buffer_snapshot` system topic
   2. During cleanup, reducing replication clusters triggers async topic cleanup
   3. The `__transaction_buffer_snapshot` topic enters a transitional state 
(compaction `CancellationException`), causing its force-delete to return HTTP 
422 repeatedly
   4. The 30-second Awaitility timeout in `cleanupPulsarResources()` expires 
and Awaitility throws a `ConditionTimeoutException`
   5. **Critically**, the exception prevents the rest of `cleanup()` from 
running, so brokers, ZK, and BK are never shut down and their resources leak
   
   See [flaky test 
report](https://github.com/lhotari/pulsar-flakes/blob/master/2026-03-16-to-2026-03-23/org.apache.pulsar.broker.service.OneWayReplicatorUsingGlobalZKTest.cleanup.md)
 for CI failure details.
   
   ### Modifications
   
   - Wrap `cleanupPulsarResources()` in try-catch in 
`OneWayReplicatorTestBase.cleanup()` so that broker, ZK, and BK shutdown always 
proceeds even if namespace deletion fails
   - Add missing cleanup for `sourceClusterAlwaysSchemaCompatibleNamespace`, 
which is created in `setup()` but was never deleted in `cleanupPulsarResources()`
   - Use `Awaitility.await().ignoreExceptions()` retry pattern for the `admin2` 
namespace deletions in the non-global ZK path, matching the pattern already 
used for `admin1` deletions
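
The first and third modifications can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `CleanupSketch`, `ThrowingAction`, `retryIgnoringExceptions`, and the step names are hypothetical, and the retry helper is a plain-Java stand-in for the `Awaitility.await().ignoreExceptions()` pattern so that the example runs without test dependencies.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CleanupSketch {

    /** Hypothetical action that may throw, e.g. a namespace deletion call. */
    interface ThrowingAction {
        void run() throws Exception;
    }

    /**
     * Plain-Java stand-in for the Awaitility retry pattern: keep retrying the
     * action, ignoring exceptions, until it succeeds or the timeout elapses.
     */
    static void retryIgnoringExceptions(ThrowingAction action, Duration timeout,
                                        Duration pollInterval) throws Exception {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                action.run();
                return; // the action succeeded, stop retrying
            } catch (Exception e) {
                if (System.nanoTime() >= deadline) {
                    throw new IllegalStateException("Timed out waiting for action", e);
                }
                Thread.sleep(pollInterval.toMillis());
            }
        }
    }

    /**
     * Runs resource cleanup inside try-catch so that the shutdown steps always
     * run. Returns the steps that were executed, in order.
     */
    static List<String> cleanup(ThrowingAction cleanupResources) {
        List<String> steps = new ArrayList<>();
        try {
            cleanupResources.run();
            steps.add("resources cleaned");
        } catch (Exception e) {
            // Record the failure and continue instead of aborting the shutdown.
            steps.add("resource cleanup failed: " + e.getMessage());
        }
        steps.add("brokers stopped");
        steps.add("ZK stopped");
        steps.add("BK stopped");
        return steps;
    }

    public static void main(String[] args) throws Exception {
        // Simulated deletion that fails twice before succeeding.
        AtomicInteger attempts = new AtomicInteger();
        retryIgnoringExceptions(() -> {
            if (attempts.incrementAndGet() < 3) {
                throw new IllegalStateException("HTTP 422");
            }
        }, Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println("deletion attempts: " + attempts.get());

        // Resource cleanup fails, but the shutdown steps still run.
        System.out.println(cleanup(() -> {
            throw new IllegalStateException("namespace deletion timed out");
        }));
    }
}
```

The point of the sketch is ordering: the failure of the resource cleanup is recorded, while the broker, ZK, and BK shutdown steps run unconditionally afterwards.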
   
   ### Verifying this change
   
   This change is already covered by existing tests, which were run locally:
   - Ran `OneWayReplicatorUsingGlobalZKTest` 4 times: all runs passed (21 
tests, 0 failures each)
   - Ran `OneWayReplicatorTest` once: passed (35 tests, 0 failures)
   
   ### Does this pull request potentially affect one of the following parts:
   
   - [ ] Dependencies (add or upgrade a dependency): no
   - [ ] The public API
   - [ ] The schema
   - [ ] The default values of configurations
   - [ ] The threading model
   - [ ] The binary protocol
   - [ ] The REST endpoints
   - [ ] The admin CLI options
   - [ ] The metrics
   - [ ] Anything that affects deployment
   
   ### Documentation
   
   - [x] `doc-not-needed`
   
   ### Matching PR in forked repository
   
   PR in forked repository: 
https://github.com/lhotari/pulsar/pull/new/lh-fix-onewayreplicatortestbase-close

