reele commented on issue #16976:
URL: 
https://github.com/apache/dolphinscheduler/issues/16976#issuecomment-2608725571

   @ruanwenjun 
   I tested a task configured with a 5-minute failure retry: I ran the 
workflow, the task failed and started waiting for its retry, and then I stopped 
the workflow. The workflow only stopped after the 5 minutes had elapsed.
   
   ```log
   ...
   
   [WI-0][TI-0] - 2025-01-22 14:55:19.537 INFO  
[MasterRpcServer-methodInvoker-5] o.a.d.s.m.e.WorkflowEventBus:[41] - Publish 
event: TaskRunningLifecycleEvent{task=<Task-with-retry>, runtimeContext=null}
   [WI-3954361][TI-0] - 2025-01-22 14:55:19.641 INFO  
[ds-workflow-eventbus-worker-11] 
o.a.d.s.m.e.t.l.h.AbstractTaskLifecycleEventHandler:[47] - Fired task 
<Task-with-retry> TaskRunningLifecycleEvent{task=<Task-with-retry>, 
runtimeContext=null} with state RUNNING_EXECUTION
   
   [WI-0][TI-0] - 2025-01-22 14:55:20.400 INFO  
[MasterRpcServer-methodInvoker-12] o.a.d.s.m.e.WorkflowEventBus:[41] - Publish 
event: TaskFailedLifecycleEvent{task=<Task-with-retry>, endTime=Wed Jan 22 
14:55:20 GMT+08:00 2025}
   [WI-3954361][TI-0] - 2025-01-22 14:55:20.445 INFO  
[ds-workflow-eventbus-worker-10] o.a.d.s.m.e.WorkflowEventBus:[41] - Publish 
event: TaskRetryLifecycleEvent{task=<Task-with-retry>, delayTime=300096/ms}
   [WI-3954361][TI-0] - 2025-01-22 14:55:20.447 INFO  
[ds-workflow-eventbus-worker-10] 
o.a.d.s.m.e.t.l.h.AbstractTaskLifecycleEventHandler:[47] - Fired task 
<Task-with-retry> TaskFailedLifecycleEvent{task=<Task-with-retry>, endTime=Wed 
Jan 22 14:55:20 GMT+08:00 2025} with state RUNNING_EXECUTION
   
   [WI-0][TI-0] - 2025-01-22 14:55:34.205 INFO  
[MasterRpcServer-methodInvoker-27] o.a.d.s.m.e.WorkflowEventBus:[41] - Publish 
event: 
WorkflowStopLifecycleEvent{workflow=<Workflow-with-retry-task>-20250122145518737}
   
   
   #### WorkflowStopLifecycleEvent was blocked here for 5 minutes ####
   
   
   [WI-3954361][TI-0] - 2025-01-22 15:00:20.577 INFO  
[ds-workflow-eventbus-worker-20] o.a.d.s.m.e.WorkflowEventBus:[41] - Publish 
event: TaskStartLifecycleEvent{task=<Task-with-retry>}
   [WI-3954361][TI-0] - 2025-01-22 15:00:20.578 INFO  
[ds-workflow-eventbus-worker-20] 
o.a.d.s.m.e.t.l.h.AbstractTaskLifecycleEventHandler:[47] - Fired task 
<Task-with-retry> TaskRetryLifecycleEvent{task=<Task-with-retry>, 
delayTime=300096/ms} with state FAILURE
   [WI-3954361][TI-0] - 2025-01-22 15:00:20.579 INFO  
[ds-workflow-eventbus-worker-20] 
o.a.d.s.m.e.w.l.h.AbstractWorkflowLifecycleEventHandler:[47] - Begin fire 
workflow <Workflow-with-retry-task>-20250122145518737 
LifecycleEvent[WorkflowStopLifecycleEvent{workflow=<Workflow-with-retry-task>-20250122145518737}]
 with state: RUNNING_EXECUTION
   [WI-3954361][TI-0] - 2025-01-22 15:00:20.582 INFO  
[ds-workflow-eventbus-worker-20] 
o.a.d.s.m.e.w.s.AbstractWorkflowStateAction:[150] - Success set 
WorkflowExecuteRunnable: <Workflow-with-retry-task>-20250122145518737 state 
from: RUNNING_EXECUTION to READY_STOP
   
   ...
   ```
   
   I think I have found the main cause, here:
   
   
https://github.com/apache/dolphinscheduler/blob/352b47bd8576a47f83285ecfffec589de462fac0/dolphinscheduler-eventbus/src/main/java/org/apache/dolphinscheduler/eventbus/AbstractDelayEvent.java#L62-L64
   
   `AbstractDelayEvent` compares itself to other events by `createTimeInNano`, 
so the `DelayQueue` orders events by `createTimeInNano`. The retry event was put 
in the queue first, so it stays at the head, and the `DelayQueue` does not 
release the later stop event until the retry event's delay has expired.
   
   If I change the compared value from `createTimeInNano` to `createTimeInNano + 
delayTime`, the `WorkflowStopLifecycleEvent` is no longer blocked.
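
   To illustrate the ordering problem in isolation, here is a minimal sketch 
using only `java.util.concurrent.DelayQueue`. The `Event` class, its 
`orderByExpiry` flag and the `firstAvailable` helper are hypothetical stand-ins 
written for this example; they are not the actual `AbstractDelayEvent` code, and 
the real class may handle time units differently.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

class DelayOrderingSketch {

    // Hypothetical stand-in for AbstractDelayEvent, not the real class.
    static class Event implements Delayed {
        final String name;
        final long createTimeInNano;
        final long delayNanos;
        final boolean orderByExpiry; // false: compare by create time, true: by expiry time

        Event(String name, long createTimeInNano, long delayMillis, boolean orderByExpiry) {
            this.name = name;
            this.createTimeInNano = createTimeInNano;
            this.delayNanos = TimeUnit.MILLISECONDS.toNanos(delayMillis);
            this.orderByExpiry = orderByExpiry;
        }

        long expireTimeInNano() {
            return createTimeInNano + delayNanos;
        }

        @Override
        public long getDelay(TimeUnit unit) {
            // Remaining time until this event may leave the queue.
            return unit.convert(expireTimeInNano() - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            Event o = (Event) other;
            long diff = orderByExpiry
                    ? expireTimeInNano() - o.expireTimeInNano()
                    : createTimeInNano - o.createTimeInNano;
            return Long.signum(diff);
        }
    }

    // Queues a 5-minute retry event, then an immediate stop event,
    // and returns the name of the event poll() hands out (or null).
    static String firstAvailable(boolean orderByExpiry) {
        long now = System.nanoTime();
        DelayQueue<Event> queue = new DelayQueue<>();
        queue.add(new Event("retry", now - 1_000_000L, 300_000L, orderByExpiry));
        queue.add(new Event("stop", now, 0L, orderByExpiry));
        // poll() only inspects the head; if the head has not expired it returns null.
        Event head = queue.poll();
        return head == null ? null : head.name;
    }

    public static void main(String[] args) {
        System.out.println("order by createTime: " + firstAvailable(false)); // null
        System.out.println("order by expiry:     " + firstAvailable(true));  // stop
    }
}
```

   When ordered by create time, the unexpired retry event sits at the head and 
hides the stop event, even though the stop event is already due. When ordered by 
expiry time, the stop event is at the head and is returned immediately.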
   
   
   

