void-ptr974 opened a new pull request, #25953:
URL: https://github.com/apache/pulsar/pull/25953
### Motivation

Message deduplication has a `Recovering` status to prevent overlapping state transitions, but the enable path started cursor replay without first moving the state to `Recovering`.

A common, high-frequency case is topic load with existing topic-level policies. During topic load, two flows can trigger a dedup status check before the first recovery replay completes:

| Time | Flow A: existing topic-level policy during load | Flow B: normal topic load chain | Dedup status / effect |
|------|--------------------------------------------------|----------------------------------|-----------------------|
| T1 | `initTopicPolicy()` loads existing topic policies | | `Initialized` or `Disabled` |
| T2 | `onUpdate()` applies topic policies and calls `checkDeduplicationStatus()` | | First dedup recovery replay starts |
| T3 | The first replay is asynchronous and still running | | Status is still `Initialized` or `Disabled` because enable did not set `Recovering` |
| T4 | | Topic load continues and `BrokerService` explicitly calls `checkDeduplicationStatus()` | |
| T5 | | The second check also enters the enable path | A second replay starts for the same dedup cursor |
| T6 | First replay is rebuilding producer sequence state | Second replay is also rebuilding producer sequence state | Overlapping replays can advance shared replay/cursor state and leave recovered dedup state incomplete or inconsistent |

Dedup recovery rebuilds producer sequence information from the dedup cursor. If overlapping replays leave the recovered sequence state incomplete or inconsistent, the broker may fail to recognize already-published messages as duplicates and accept them again after recovery.

There was also a retry gap after an enable failure. A transient failure moved deduplication to `Failed`, but later checks did not retry enabling even when the topic policy still required deduplication.

### Modifications

- Move deduplication to `Recovering` before starting cursor replay.
- Allow `Failed` status to retry enabling when deduplication should be enabled.
- Allow `Failed` status to proceed with disabling when deduplication should be disabled.
- Keep enable failures visible by leaving the status as `Failed`.
- Add tests for concurrent recovery and retry after failed enable.

### Verifications

```bash
./gradlew :pulsar-broker:test --tests org.apache.pulsar.broker.BrokerMessageDeduplicationTest
```
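The status transitions described in the Motivation and Modifications sections can be sketched as a small Java model. This is a simplified, hypothetical illustration, not Pulsar's `MessageDeduplication` code: the class `DedupStatusSketch`, its `checkStatus(boolean)` method, the `replaysStarted` counter and the injected replay `Supplier` are assumptions made for the example. It shows the enable path claiming `Recovering` before the replay starts, `Failed` being allowed to re-enter both the enable and disable paths, and an enable failure being left visible as `Failed`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical, simplified model of the dedup status machine (not Pulsar's API).
public class DedupStatusSketch {
    enum Status { Initialized, Enabled, Recovering, Disabled, Failed }

    final AtomicReference<Status> status = new AtomicReference<>(Status.Initialized);
    final AtomicInteger replaysStarted = new AtomicInteger();
    // Assumed hook: returns a future that completes when cursor replay finishes.
    private final Supplier<CompletableFuture<Void>> replay;

    DedupStatusSketch(Supplier<CompletableFuture<Void>> replay) {
        this.replay = replay;
    }

    CompletableFuture<Void> checkStatus(boolean shouldBeEnabled) {
        Status current = status.get();
        if (shouldBeEnabled) {
            // Initialized, Disabled and Failed may (re)enter the enable path.
            boolean canEnable = current == Status.Initialized
                    || current == Status.Disabled
                    || current == Status.Failed;
            // Claim Recovering *before* starting replay: a racing check either sees
            // Recovering or loses the CAS, and returns without a second replay.
            if (!canEnable || !status.compareAndSet(current, Status.Recovering)) {
                return CompletableFuture.completedFuture(null);
            }
            replaysStarted.incrementAndGet();
            return replay.get().handle((ignored, ex) -> {
                // A failed replay stays visible as Failed so a later check can retry.
                status.set(ex == null ? Status.Enabled : Status.Failed);
                return null;
            });
        }
        // Disable path: Enabled and Failed may proceed to Disabled.
        if (current == Status.Enabled || current == Status.Failed) {
            status.compareAndSet(current, Status.Disabled);
        }
        return CompletableFuture.completedFuture(null);
    }

    public static void main(String[] args) {
        // Flow A and Flow B both check before the first replay completes.
        CompletableFuture<Void> pending = new CompletableFuture<>();
        DedupStatusSketch dedup = new DedupStatusSketch(() -> pending);
        dedup.checkStatus(true); // Flow A: status -> Recovering, replay starts
        dedup.checkStatus(true); // Flow B: sees Recovering, no second replay
        pending.complete(null);
        // prints "Enabled, replays started: 1"
        System.out.println(dedup.status.get() + ", replays started: " + dedup.replaysStarted.get());

        // A transient enable failure followed by a retry.
        AtomicInteger attempts = new AtomicInteger();
        DedupStatusSketch flaky = new DedupStatusSketch(() -> attempts.getAndIncrement() == 0
                ? CompletableFuture.<Void>failedFuture(new RuntimeException("transient"))
                : CompletableFuture.<Void>completedFuture(null));
        flaky.checkStatus(true).join(); // first enable fails -> Failed
        flaky.checkStatus(true).join(); // Failed may retry -> Enabled
        System.out.println(flaky.status.get()); // prints "Enabled"
    }
}
```

In this sketch the transition to `Recovering` uses `compareAndSet` rather than a plain `set`, so when two flows race through the enable path only one of them wins the transition and starts a replay; the other returns without touching the cursor.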
