diegovallone opened a new issue, #2760: URL: https://github.com/apache/pekko/issues/2760
When using Reliable Delivery with a standard `ProducerController` backed by a durable queue (e.g., `EventSourcedProducerQueue`), restarting the producer causes an immediate crash if all previous messages were confirmed before the restart.

The `ProducerController` successfully reloads its state but crashes with the following error before a `ConsumerController` can even establish demand:

`java.lang.IllegalStateException: Unexpected Msg when no demand, requested true, requestedSeqNr 1, currentSeqNr X`

(where `X` is the actual restored sequence number).

Note: The `WorkPullingProducerController` handles state initialization differently and is unaffected by this issue.

**Steps to Reproduce:**

I have verified this behavior using an isolated test with Pekko's in-memory journal.

1. Start a `ProducerController` with a durable queue and a `ConsumerController`.
2. Send a message, and allow the consumer to receive and confirm it (clearing the unconfirmed buffer).
3. Wait for the `ProducerController` to receive the next `RequestNext`, ensuring the confirmed state is fully written to the durable queue.
4. Stop and restart the `ProducerController` to simulate a restart, crash, or deployment.
5. The framework emits a `RequestNext` with `seqNr = 1` instead of the restored sequence number. When the producer supplies the next message, it fails the internal demand check and crashes with an `IllegalStateException`.

(See the attached **ProducerControllerBugTest.scala** snippet below for the fully reproducible test case. Note that the test is inverted: it succeeds if it can reproduce the bug).
[ProducerControllerBugTest.txt](https://github.com/user-attachments/files/26105216/ProducerControllerBugTest.txt)

**Root Cause:**

In **ProducerControllerImpl.scala**, the state recovery logic ignores the loaded sequence number when initializing the demand window and requesting the next message from the local producer:

- In `createState`, `requestedSeqNr` is hardcoded to `1L` instead of adopting `loadedState.currentSeqNr`.
- In `becomeActive`, if `state.unconfirmed.isEmpty` is `true`, it hardcodes `1L` and `0L` into the `RequestNext` message and the flight recorder, rather than using `state.currentSeqNr` and `state.confirmedSeqNr`.

(See the attached **bugfix.txt**, which contains a `git diff` of the fix that apparently solves this issue)

[bugfix.txt](https://github.com/user-attachments/files/26105226/bugfix.txt)
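
To illustrate the root cause described above, here is a rough, self-contained sketch of the demand check. It is a simplified model and not the Pekko source: `LoadedState`, `ModelState`, `DemandModel`, `recoverHardcoded`, `recoverRestored`, and `acceptMsg` are hypothetical names, and the check `requested && currentSeqNr <= requestedSeqNr` is an assumption inferred from the error message. The model also collapses `createState` and `becomeActive` into one step by setting `requested = true` directly.

```scala
// Simplified model of the demand check; NOT the Pekko implementation.
// All names below are hypothetical and exist only for illustration.

// What the durable queue is assumed to hand back after a restart.
final case class LoadedState(currentSeqNr: Long, highestConfirmedSeqNr: Long)

// The subset of controller state that matters for the demand check.
final case class ModelState(
    requested: Boolean,
    currentSeqNr: Long,
    confirmedSeqNr: Long,
    requestedSeqNr: Long)

object DemandModel {

  // Behaviour described in the issue: requestedSeqNr is fixed at 1 after recovery.
  def recoverHardcoded(loaded: LoadedState): ModelState =
    ModelState(
      requested = true,
      currentSeqNr = loaded.currentSeqNr,
      confirmedSeqNr = loaded.highestConfirmedSeqNr,
      requestedSeqNr = 1L)

  // Direction of the proposed fix: adopt the restored currentSeqNr as the demand.
  def recoverRestored(loaded: LoadedState): ModelState =
    ModelState(
      requested = true,
      currentSeqNr = loaded.currentSeqNr,
      confirmedSeqNr = loaded.highestConfirmedSeqNr,
      requestedSeqNr = loaded.currentSeqNr)

  // Assumed shape of the demand check: a message is accepted only while
  // currentSeqNr is still within the requested window.
  // Returns the sequence number assigned to the message, or the error text.
  def acceptMsg(s: ModelState): Either[String, Long] =
    if (s.requested && s.currentSeqNr <= s.requestedSeqNr) Right(s.currentSeqNr)
    else
      Left(
        s"Unexpected Msg when no demand, requested ${s.requested}, " +
        s"requestedSeqNr ${s.requestedSeqNr}, currentSeqNr ${s.currentSeqNr}")
}
```

In this model, a fresh start (`LoadedState(1L, 0L)`) passes the check with either variant, which would explain why the problem only shows up after a restart. With a restored `currentSeqNr` of 2, `recoverHardcoded` fails the check with the same message text as the reported exception, while `recoverRestored` accepts the message as sequence number 2.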
