morningman opened a new issue #2964: [Bug][RoutineLoad] Routine Load encounter 
"label already used" exception
URL: https://github.com/apache/incubator-doris/issues/2964
 
 
   **Describe Bug**
   
   In some scenarios, a Routine Load job encounters a "label already used" 
error. This error causes the job to be paused, but the `resume` command can 
resume the job.
   
   **Why**
   The problem occurs when the FE schedules a routine load task and the call 
to `beginTxn()` succeeds but the call to `submitTask()` fails. After the 
submission fails, the task is put back into the queue. The next time it is 
scheduled, the task calls `beginTxn()` again with the same label, which 
reports the error "label already used".
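
   The sequence above can be reproduced with a toy model. This is only a 
sketch: `TxnManager`, `schedule_once`, and `reproduce` are hypothetical names 
made up for illustration, not the actual FE classes, and the snake_case 
functions merely stand in for `beginTxn()` and `submitTask()`.

```python
class LabelAlreadyUsedError(Exception):
    """Raised when a transaction label is registered a second time."""


class TxnManager:
    """Toy stand-in for the FE transaction manager (hypothetical)."""

    def __init__(self):
        self.used_labels = set()

    def begin_txn(self, label):
        # Stands in for beginTxn(): a label may only be registered once.
        if label in self.used_labels:
            raise LabelAlreadyUsedError(f"label already used: {label}")
        self.used_labels.add(label)


def schedule_once(task_label, txn_manager, submit_task, queue):
    """One scheduling pass with the buggy behavior: requeue on submit failure."""
    txn_manager.begin_txn(task_label)
    if not submit_task(task_label):
        # The task goes back into the queue although its label is already taken.
        queue.append(task_label)


def reproduce():
    txn_manager = TxnManager()
    queue = []

    def always_reject(label):
        # Models the BE answering TOO_MANY_TASKS to submitTask().
        return False

    schedule_once("task-1", txn_manager, always_reject, queue)
    try:
        # Second pass: the requeued task calls begin_txn() with the same label.
        schedule_once(queue.pop(0), txn_manager, always_reject, queue)
    except LabelAlreadyUsedError as err:
        return str(err)
    return "no error"
```

   Calling `reproduce()` returns the "label already used" message on the 
second scheduling pass, which is the point at which the real job gets paused.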
   
   `submitTask()` failed because the BE returned the error `TOO_MANY_TASKS`. In 
the routine load scenario this error should not occur, because the FE 
controls how many tasks each BE executes concurrently. It occurs anyway 
because we do not have good control over the actual execution time of each 
task. A task may hit an rpc timeout while executing in the BE, and that 
timeout is fixed at 10 minutes. As a result, a task that should take 10 
seconds may take 10 minutes and occupy a thread for that whole time.
   
   **How to fix**
   To fix the above problems, we need to modify two places:
   
   1. After submitting a task fails, the task is no longer put back in the 
queue. Instead, the FE "pretends" that the submission succeeded, and the task 
is later discarded when it times out. This guarantees that `beginTxn()` is not 
called again for the task, and that the job is not paused because a task 
submission failed.
   
   2. The rpc timeout of each task executed in the BE is set to the 
`query_timeout` of that task, to limit how long a task can occupy resources. 
Although a job may still run longer than expected after this change, it 
should significantly alleviate the problem.
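
   The two changes can be sketched together as follows. This is a minimal 
illustration under assumptions, not the actual patch: `RoutineLoadTask`, 
`schedule_once_fixed`, `is_timed_out`, and the `rpc_timeout_s` parameter are 
hypothetical names, and the timestamps are passed in explicitly to keep the 
example deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoutineLoadTask:
    label: str
    query_timeout_s: int                # per-task timeout, reused as rpc timeout
    begun_at_s: Optional[float] = None  # set when begin_txn() is called


def schedule_once_fixed(task, begin_txn, submit_task, queue, now_s):
    """One scheduling pass with both changes applied (hypothetical sketch)."""
    begin_txn(task.label)
    task.begun_at_s = now_s
    # Change 2: bound the BE-side rpc by the task's own query_timeout
    # instead of a fixed 10 minutes.
    accepted = submit_task(task.label, rpc_timeout_s=task.query_timeout_s)
    # Change 1: even if the submission was rejected, the task is NOT appended
    # to `queue`, so begin_txn() is never called twice for the same label.
    return accepted


def is_timed_out(task, now_s):
    """A rejected task is simply discarded once its own timeout has passed."""
    if task.begun_at_s is None:
        return False
    return now_s - task.begun_at_s >= task.query_timeout_s
```

   With this shape, a rejected submission leaves the queue untouched, and the 
task is cleaned up through the same timeout path as a task that ran too long.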
