void-ptr974 opened a new issue, #25954: URL: https://github.com/apache/pulsar/issues/25954
### Issue Description

Message deduplication recovery can be started more than once for the same topic before the first recovery replay completes.

`MessageDeduplication` defines a `Recovering` status to prevent overlapping transitions, but the enable path starts dedup cursor replay without first changing the status to `Recovering`. Because of that, another `checkStatus()` call can still observe `Initialized` or `Disabled` and start a second replay for the same dedup cursor.

One practical trigger path is topic load with existing topic-level policies:

| Time | Flow A: existing topic-level policy during load | Flow B: normal topic load chain | Dedup status / effect |
|------|--------------------------------------------------|----------------------------------|-----------------------|
| T1 | `initTopicPolicy()` loads existing topic policies | | dedup status is `Initialized` or `Disabled` |
| T2 | `onUpdate()` applies topic policies and calls `checkDeduplicationStatus()` | | first dedup recovery replay starts |
| T3 | first replay is still running asynchronously | | status is still `Initialized` or `Disabled` |
| T4 | | topic load continues and `BrokerService` explicitly calls `checkDeduplicationStatus()` | |
| T5 | | second check also enters the enable path | second replay starts for the same dedup cursor |
| T6 | first replay rebuilds producer sequence state | second replay also rebuilds producer sequence state | overlapping replay can advance shared replay/cursor state |

Dedup recovery rebuilds producer sequence information from the dedup cursor. If overlapping replay leaves the recovered sequence state incomplete or inconsistent, the broker may fail to identify already-published messages as duplicates. The impact is that duplicate messages can be accepted after topic load or policy refresh even though message deduplication is enabled.

There is also a retry gap after recovery failure.
If enabling deduplication fails transiently, the status moves to `Failed`, but later checks do not retry enabling even when the topic policy still requires deduplication.

### Error messages

There may be no error message. This is a race in the dedup recovery state machine.

### Reproducing the issue

A deterministic unit-level reproduction can be built by delaying the first dedup recovery replay and invoking `checkStatus()` again before the first replay completes:

1. Keep `MessageDeduplication` in `Initialized` or `Disabled`.
2. Call `checkStatus()` once and delay async cursor open or replay completion.
3. Call `checkStatus()` again before the first recovery finishes.
4. Observe that the second call enters the enable path again and starts another replay.

A production-like trigger path is:

1. Enable message deduplication.
2. Use a topic with existing topic-level policies.
3. Load or reload the topic.
4. During topic load, `initTopicPolicy()` can invoke `onUpdate()`, which applies topic policies and calls `checkDeduplicationStatus()`.
5. The normal topic load chain later calls `checkDeduplicationStatus()` again.
6. If the first recovery replay is still in progress, both checks can start overlapping dedup replay.

### Expected behavior

- Deduplication should enter `Recovering` before starting recovery replay.
- Concurrent status checks should not start another recovery while replay is in progress.
- If enable fails transiently, a later check should retry enabling when deduplication is still required by policy.

### Actual behavior

- Enable starts replay while status remains `Initialized` or `Disabled`.
- Another `checkStatus()` can start a second replay before the first completes.
- `Failed` status does not retry enable even when deduplication should still be enabled.

### Additional information

A fix has been proposed in #25953.

--
This is an automated message from the Apache Git Service.
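### Illustrative sketch of the expected behavior

The expected behavior listed in the issue above can be illustrated with a small, self-contained model of the status state machine. This is only a sketch under stated assumptions, not the Pulsar implementation and not necessarily what #25953 does: `DedupStatusSketch`, its `Status` enum, the `replay` supplier, and the `replaysStarted` counter are all hypothetical names introduced for illustration, and the disable path is omitted. The idea shown is to claim the `Recovering` state with a compare-and-set before starting replay, and to treat `Failed` as a state from which enabling may be retried.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical, simplified model of the dedup status state machine (not Pulsar's real class). */
class DedupStatusSketch {
    enum Status { INITIALIZED, DISABLED, RECOVERING, ENABLED, FAILED }

    private final AtomicReference<Status> status = new AtomicReference<>(Status.INITIALIZED);
    private final AtomicInteger replaysStarted = new AtomicInteger();
    // Stand-in for "open the dedup cursor and replay it"; assumed to be asynchronous.
    private final Supplier<CompletableFuture<Void>> replay;

    DedupStatusSketch(Supplier<CompletableFuture<Void>> replay) {
        this.replay = replay;
    }

    Status status() {
        return status.get();
    }

    int replaysStarted() {
        return replaysStarted.get();
    }

    /** Starts recovery at most once at a time; FAILED is retried while dedup is still required. */
    CompletableFuture<Void> checkStatus(boolean shouldBeEnabled) {
        Status current = status.get();
        if (!shouldBeEnabled || current == Status.ENABLED || current == Status.RECOVERING) {
            // Nothing to start: not required, already enabled, or a replay is in flight.
            return CompletableFuture.completedFuture(null);
        }
        // current is INITIALIZED, DISABLED, or FAILED. Claim RECOVERING before replaying,
        // so that a concurrent caller loses the compare-and-set and backs off.
        if (!status.compareAndSet(current, Status.RECOVERING)) {
            return CompletableFuture.completedFuture(null);
        }
        replaysStarted.incrementAndGet();
        CompletableFuture<Void> future;
        try {
            future = replay.get();
        } catch (RuntimeException e) {
            // A synchronous failure must not leave the status stuck in RECOVERING.
            future = CompletableFuture.failedFuture(e);
        }
        // Leave RECOVERING only once the replay has finished, successfully or not.
        return future.whenComplete((ignored, error) ->
                status.set(error == null ? Status.ENABLED : Status.FAILED));
    }
}
```

In this model, calling `checkStatus(true)` twice while the first replay future is still incomplete starts only one replay, and a later `checkStatus(true)` after a failed replay starts a new one. A real fix would also have to cover the disable path and whatever threading guarantees the broker already provides, which this sketch does not attempt to model.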
