jstastny-cz opened a new issue, #2237: URL: https://github.com/apache/incubator-kie-issues/issues/2237
There is an apparent concurrency issue in which scheduled timers are deleted before they can be executed.

In the following BPMN process, a boundary timer event with a PT30S timeout on the User Task triggers a Script Task that throws an exception. In that case, the job created for the boundary timer should be retried according to the relevant jobs-service configuration:

```
kogito.jobs-service.maxNumberOfRetries=5
kogito.jobs-service.retryMillis=1000
```

To reproduce the issue in a timely manner, it may also help to set:

```
kogito.jobs-service.schedulerChunkInMinutes=5
```

<img width="1537" height="915" alt="Image" src="https://github.com/user-attachments/assets/dd0e4ba3-4b12-44f9-a5ba-5885d0dbe084" />

With the setup described above, retry execution is blocked after the first retry attempt.

1. Initial attempt:
```
1. 13:26:03 DEBUG [or.ki.ko.ap.jo.im.VertxJobScheduler] (vert.x-eventloop-thread-1) Executing timeout with timer Id 1 and jobId ee4ad0e6-0147-4fe1-8eab-a4f1acef1395
2. 13:26:03 TRACE [or.ki.ko.ap.jo.im.VertxJobScheduler] (Jobs-1) Timeout task 1 with jobId ee4ad0e6-0147-4fe1-8eab-a4f1acef1395 newTimeoutTask (exception thrown from task and following)
3. 13:26:03 TRACE [or.ki.ko.ap.jo.im.VertxJobScheduler] (Jobs-1) doRetryIfAny JobDetails ... retries=1
4. 13:26:03 TRACE [or.ki.ko.ap.jo.im.VertxJobScheduler] (Jobs-1) Timeout 1 with jobId ee4ad0e6-0147-4fe1-8eab-a4f1acef1395 will be updated and scheduled
5. 13:26:03 TRACE [or.ki.ko.ap.jo.im.VertxJobScheduler] (Jobs-1) removeTimerInfo TimerInfo[jobId=ee4ad0e6-0147-4fe1-8eab-a4f1acef1395, timerId=1, timeout=Wed Feb 04 13:26:03 CET 2026]
6. 13:26:03 TRACE [or.ki.ko.ap.jo.im.VertxJobScheduler] (Jobs-1) addTimerInfo JobDetails ... retries=1
7. only now or.ki.ko.ap.jo.in.ErrorHandlingJobTimeoutInterceptor kicks in to report the failure.
```
2. First retry attempt:
```
1. 13:26:03 DEBUG [or.ki.ko.ap.jo.im.VertxJobScheduler] (vert.x-eventloop-thread-1) Executing timeout with timer Id 2 and jobId ee4ad0e6-0147-4fe1-8eab-a4f1acef1395
2. 13:26:03 TRACE [or.ki.ko.ap.jo.im.VertxJobScheduler] (Jobs-2) Timeout task 2 with jobId ee4ad0e6-0147-4fe1-8eab-a4f1acef1395 newTimeoutTask (exception thrown from task and following)
3. 13:26:03 TRACE [or.ki.ko.ap.jo.im.VertxJobScheduler] (Jobs-2) doRetryIfAny JobDetails ... retries=1 (original)
   13:26:03 TRACE [or.ki.ko.ap.jo.im.VertxJobScheduler] (Jobs-2) doRetryIfAny JobDetails ... retries=2 (rescheduled)
4. 13:26:03 TRACE [or.ki.ko.ap.jo.im.VertxJobScheduler] (Jobs-2) Timeout 2 with jobId ee4ad0e6-0147-4fe1-8eab-a4f1acef1395 will be updated and scheduled
5. 13:26:03 TRACE [or.ki.ko.ap.jo.im.VertxJobScheduler] (Jobs-2) removeTimerInfo TimerInfo[jobId=ee4ad0e6-0147-4fe1-8eab-a4f1acef1395, timerId=2, timeout=Wed Feb 04 13:26:03 CET 2026]
6. 13:26:03 TRACE [or.ki.ko.ap.jo.im.VertxJobScheduler] (Jobs-2) addTimerInfo JobDetails ... retries=2
7. 13:26:03 TRACE [or.ki.ko.ap.jo.im.VertxJobScheduler] (Jobs-2) removeTimerInfo TimerInfo[jobId=ee4ad0e6-0147-4fe1-8eab-a4f1acef1395, timerId=3, timeout=Wed Feb 04 13:26:03 CET 2026]
```
* Note the last TRACE entry: it immediately removes the timer it has just created (timerId=3).

Investigation led to the conclusion that the unintended removeTimerInfo call belongs to the "previous" state of the JobDetails, i.e. the last execution attempt, and should NOT cancel the newly scheduled timer.

The most straightforward fix is to extend the TimerInfo record with the retry attempt ordinal, so that timers belonging to different retry attempts can be distinguished.
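The proposed fix can be illustrated with a minimal sketch. It assumes a store that keeps one active timer per jobId, which the `removeTimerInfo TimerInfo[jobId=..., timerId=3, ...]` entry suggests. `TimerRegistrySketch`, the `retries` record component, and the method signatures are hypothetical and are not the actual VertxJobScheduler code. The point shown is that a removal carrying an older retry ordinal becomes a no-op instead of cancelling the freshly scheduled timer.

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TimerRegistrySketch {

    // Hypothetical TimerInfo: jobId, timerId and timeout as printed in the TRACE logs,
    // extended with the retry attempt ordinal proposed in the issue.
    public record TimerInfo(String jobId, long timerId, Instant timeout, int retries) {}

    // Hypothetical store holding the currently scheduled timer per jobId.
    private final Map<String, TimerInfo> timers = new ConcurrentHashMap<>();

    public void addTimerInfo(TimerInfo info) {
        timers.put(info.jobId(), info);
    }

    // Removes the timer for jobId only if it was created for the given retry attempt.
    // A late call from an earlier attempt sees a different ordinal and does nothing.
    public boolean removeTimerInfo(String jobId, int retries) {
        TimerInfo current = timers.get(jobId);
        if (current == null || current.retries() != retries) {
            return false;
        }
        // Conditional remove: succeeds only if the mapping is still exactly `current`.
        return timers.remove(jobId, current);
    }

    public boolean isScheduled(String jobId) {
        return timers.containsKey(jobId);
    }

    public static void main(String[] args) {
        TimerRegistrySketch registry = new TimerRegistrySketch();
        String jobId = "ee4ad0e6-0147-4fe1-8eab-a4f1acef1395";
        Instant now = Instant.now();

        // Timer 2 serves the first retry (retries=1); it fires, fails and is cleaned up.
        registry.addTimerInfo(new TimerInfo(jobId, 2, now, 1));
        registry.removeTimerInfo(jobId, 1);

        // The rescheduled timer 3 belongs to retries=2.
        registry.addTimerInfo(new TimerInfo(jobId, 3, now.plusMillis(1000), 2));

        // Stale cleanup still carrying retries=1 must not cancel timer 3.
        System.out.println("stale removal applied: " + registry.removeTimerInfo(jobId, 1)); // prints "stale removal applied: false"
        System.out.println("timer still scheduled: " + registry.isScheduled(jobId));        // prints "timer still scheduled: true"
    }
}
```

The ordinal check is paired with `ConcurrentHashMap.remove(key, value)`. Because of this, a timer that another thread replaced between the `get` and the `remove` is also left untouched.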
