NguyenDinhThien-future-aavn opened a new issue, #4002: URL: https://github.com/apache/incubator-kie-kogito-runtimes/issues/4002
### Describe the bug

I am having an issue where SonataFlow cannot handle callback events for a workflow instance when multiple pods/instances of SonataFlow are running.

For context, I am using the latest snapshot version of SonataFlow, built manually from the GitHub repositories. I deploy my SonataFlow app on a Kubernetes cluster and scale it horizontally to multiple pods.

<img width="565" height="287" alt="Image" src="https://github.com/user-attachments/assets/8a03d853-6df0-46cb-8333-ad49fdc135b9" />

My workflow consists of multiple "callback" states. My SonataFlow app orchestrates other services by sending them messages through a messaging system: each callback state sends a message to another service and then waits for a callback event. Once that service finishes its work, it sends a request to the event corresponding to the callback state so that the workflow can proceed.

Sometimes, after the message has been published, the workflow has not yet finished processing the original request that started it, and during that window the other service already sends the callback event.

With only one pod/instance of SonataFlow running, when SonataFlow receives the callback event, it waits until the previous request finishes and then processes the callback event. The workflow then proceeds through the next states until it completes.

When the SonataFlow deployment is scaled horizontally to multiple pods, however, the callback event may reach a pod other than the one that initialized the first state and published the message. In that case, the pod that receives the callback event cannot continue with the next states of the workflow, yet the callback event request still responds with HTTP 202.
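To make the reported interleaving concrete, here is a toy Python model of it. It is a hypothetical sketch, not Kogito or SonataFlow code: `SharedStore`, `Pod`, `begin_start_request`, `finish_start_request`, `handle_callback` and `simulate` are invented names. It assumes that (a) a workflow instance's new state only becomes visible to other pods once the request that started it finishes, and (b) requests for the same instance are serialized only within a single pod. The `strict` flag models the expected behavior (HTTP 500 when the event matches no waiting instance).

```python
import threading


class SharedStore:
    """Committed workflow-instance states, visible to every pod (e.g. a shared DB)."""

    def __init__(self):
        self.committed = {}


class Pod:
    """Hypothetical model of one SonataFlow replica; not Kogito's real design."""

    def __init__(self, store):
        self.store = store
        # Assumption: serializes requests for an instance, but only within this pod.
        self.lock = threading.Lock()
        # States written by requests that have not finished (committed) yet.
        self.pending = {}

    def begin_start_request(self, instance_id):
        # The start request enters the callback state and publishes its message,
        # but has not finished yet, so other pods cannot see the new state.
        self.lock.acquire()
        self.pending[instance_id] = "WAITING_CALLBACK"

    def finish_start_request(self, instance_id):
        # Commit the state to the shared store and let queued requests run.
        self.store.committed[instance_id] = self.pending.pop(instance_id)
        self.lock.release()

    def handle_callback(self, instance_id, strict=False):
        # Blocks only if THIS pod is still running the start request.
        with self.lock:
            if self.store.committed.get(instance_id) == "WAITING_CALLBACK":
                self.store.committed[instance_id] = "NEXT_STATE"
                return 202
            # The event matched no waiting instance and is dropped.
            # Reported behavior: 202 anyway. Expected behavior: 500.
            return 500 if strict else 202


def simulate(same_pod, strict=False):
    """Return (callback HTTP status, committed state after both requests)."""
    store = SharedStore()
    pod_a = Pod(store)
    pod_b = pod_a if same_pod else Pod(store)
    pod_a.begin_start_request("wf-1")
    result = []
    # The callback event arrives while the start request is still in flight.
    t = threading.Thread(
        target=lambda: result.append(pod_b.handle_callback("wf-1", strict))
    )
    t.start()
    if not same_pod:
        # Force the bad interleaving: the other pod answers before pod A commits.
        t.join()
    pod_a.finish_start_request("wf-1")
    t.join()
    return result[0], store.committed["wf-1"]
```

Under these assumptions, `simulate(same_pod=True)` returns `(202, "NEXT_STATE")`, matching the single-pod behavior. `simulate(same_pod=False)` returns `(202, "WAITING_CALLBACK")`: the event is acknowledged but the instance stays stuck in the callback state. `simulate(same_pod=False, strict=True)` returns `(500, "WAITING_CALLBACK")`, which, assuming the messaging system redelivers on a non-2xx response, would allow the event to be retried after the first pod has committed.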
Because of this 202 response, the messaging system treats the callback event message as successfully processed and acknowledges it.

<img width="1119" height="686" alt="Image" src="https://github.com/user-attachments/assets/8a546cef-f061-4fb3-942d-772dd1448cbc" />

### Expected behavior

In my view, the callback event request should fail with HTTP 500, because the event could not be processed.

### Actual behavior

The callback event request sent to the second pod still responds with HTTP 202, meaning the event was consumed, but the workflow does not transition to the next state.

### How to Reproduce?

I have prepared an example zip file containing the workflows and a README with reproduction steps: [callback-workflow.zip](https://github.com/user-attachments/files/21503434/callback-workflow.zip)

### Output of `uname -a` or `ver`

_No response_

### Output of `java -version`

17

### GraalVM version (if different from Java)

_No response_

### Kogito version or git rev (or at least Quarkus version if you are using Kogito via Quarkus platform BOM)

_No response_

### Build tool (ie. output of `mvnw --version` or `gradlew --version`)

_No response_

### Additional information

_No response_
