crazychengmm opened a new issue, #17829: URL: https://github.com/apache/dolphinscheduler/issues/17829
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

Description

Version: DolphinScheduler 3.2.0

Scenario: I started a workflow via the API. The tasks within the workflow completed successfully, but the workflow instance did not recognize that they had completed. This led to a series of abnormal behaviors.

Abnormal Behaviors:

- Infinite Task Creation: The workflow kept creating and submitting new task instances even though the previous ones had succeeded.
- Retry Count Stagnation: The `retry_times` of the task remained at 1, even though the workflow's max retry was configured to 2. The workflow did not appear to be retrying a failed task; it appeared to be re-triggering new tasks from scratch.
- Data Inconsistency: In the Web UI, the workflow instance showed an `end_time` in the past, but its state remained `RUNNING`.
- Bizarre Behavior after Pausing: When I manually clicked "Pause" on the workflow, the system continued to generate new task instances in a `PAUSED` state every few seconds.

Logs: I checked both the Master and Worker logs, but found no explicit exceptions or error stacks during this period.

Recovery: The issue was resolved immediately after restarting the Master cluster. The "ghost" tasks stopped being created, and the workflow state synchronized.

Possible Root Cause (Suspected): It seems that the `WorkflowExecuteRunnable` or the state machine in the Master node entered an inconsistent state or loop in which it failed to update the workflow status while incorrectly believing it needed to schedule more tasks, potentially due to event loss or a race condition in the internal event queue.

Steps to Reproduce:

1. Start a workflow instance via the API in version 3.2.0.
2. Observe whether the task finishes but the workflow fails to transition to `SUCCESS`.
3. Check whether new task instances are generated repeatedly.
4. Try to pause the workflow and observe whether paused tasks are still being created.

Expected Behavior: The workflow should transition to `SUCCESS` once all tasks are finished, and no further task instances should be created.

Actual Behavior: The workflow remains `RUNNING` (despite having an `end_time`), keeps creating new tasks indefinitely, and even creates paused tasks after the workflow is paused.

Environment:

- OS: CentOS 7
- DolphinScheduler Version: 3.2.0
- Storage: PG
- Deployment: Cluster (3 masters, 6 workers)

### What you expected to happen

Fix this issue.

### How to reproduce

I don't know how it happened.

### Anything else

<img width="3826" height="1506" alt="Image" src="https://github.com/user-attachments/assets/262b25b0-ed8b-4861-935f-1c9a8f2b145a" />

taskId 2330628

### Version

3.2.x

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
