github-actions[bot] commented on code in PR #64423:
URL: https://github.com/apache/doris/pull/64423#discussion_r3427739488
##########
fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingMultiTblTask.java:
##########
@@ -168,13 +168,36 @@ private void sendWriteRequest() throws JobException {
log.warn("cdc_client RPC timeout api=/api/writeRecords taskId={} jobId={} backend={}:{} timeout_sec={}",
        taskId, getJobId(), backend.getHost(), backend.getBrpcPort(),
        Config.streaming_cdc_heavy_rpc_timeout_sec);
+ // the request may have been dispatched and still running remotely
+ noRetry = true;
Review Comment:
This timeout path now makes the FE task fail immediately (`noRetry`), but the
new comment explicitly says the `/api/writeRecords` request may still be
running remotely. If that remote async task later reaches `commitOffset`, FE
still accepts it, because `StreamingInsertJob.commitOffset()` only checks
`isCanceled` and the task id; it never checks that the task is still RUNNING
(i.e. not FAILED). A concrete sequence:

1. brpc times out after cdc_client has already accepted the request.
2. This block calls `onFail()`; the task is marked FAILED and the job is paused.
3. cdc_client finishes the write and commits with the same task id.
4. FE updates offset/statistics, and `successCallback()` clears the failure and
   queues the next task, even though this task had already failed.

Please fence late commits from failed/`noRetry` tasks, for example by rejecting
commits unless the current task is still RUNNING, or by marking the
timeout-failed task invalid/canceled before it can commit.
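One way to implement the first option is to make the task status itself the fence, so the timeout path and a late commit resolve their race with a single compare-and-set. This is a minimal sketch under stated assumptions: `TaskStatus`, `StreamingTaskSketch` and `StreamingJobSketch` are hypothetical stand-ins, not the actual `StreamingMultiTblTask` / `StreamingInsertJob` classes, and the real status enum, locking and offset types in Doris may differ.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical status enum; the real FE task status type may differ.
enum TaskStatus { RUNNING, FAILED, CANCELED, SUCCESS }

// Hypothetical stand-in for the FE-side streaming task.
final class StreamingTaskSketch {
    final long taskId;
    // Atomic so the timeout path and a late commit race on one CAS.
    private final AtomicReference<TaskStatus> status =
            new AtomicReference<>(TaskStatus.RUNNING);

    StreamingTaskSketch(long taskId) { this.taskId = taskId; }

    TaskStatus status() { return status.get(); }

    // Timeout path: RUNNING -> FAILED. Returns false if a commit already won,
    // in which case the caller should not fail the task.
    boolean failOnTimeout() {
        return status.compareAndSet(TaskStatus.RUNNING, TaskStatus.FAILED);
    }

    // Commit path: only a task that is still RUNNING may become SUCCESS.
    boolean tryMarkCommitted() {
        return status.compareAndSet(TaskStatus.RUNNING, TaskStatus.SUCCESS);
    }
}

// Hypothetical stand-in for the job-level commitOffset() entry point.
final class StreamingJobSketch {
    private final StreamingTaskSketch runningTask;
    private long committedOffset;
    private boolean canceled;

    StreamingJobSketch(StreamingTaskSketch task) { this.runningTask = task; }

    long committedOffset() { return committedOffset; }

    synchronized void cancel() { canceled = true; }

    // Keeps the existing checks (canceled flag, task id) and adds the missing
    // one: the task must still be RUNNING when the commit arrives.
    synchronized boolean commitOffset(long taskId, long newOffset) {
        if (canceled || runningTask.taskId != taskId) {
            return false;
        }
        if (!runningTask.tryMarkCommitted()) {
            // Late commit from a task that already failed on timeout: reject it
            // so offsets/statistics are not advanced and no next task is queued.
            return false;
        }
        committedOffset = newOffset;
        return true;
    }
}
```

Whichever side runs its CAS first wins: if the timeout fires first, the late commit is rejected and the offset stays put; if the commit lands first, `failOnTimeout()` returns false and the timeout handler can skip `onFail()`. The second option (marking the task canceled on timeout) fits the same shape by having the timeout path CAS to `CANCELED` instead of `FAILED`.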
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.