[ https://issues.apache.org/jira/browse/HADOOP-15961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748045#comment-16748045 ]
Steve Loughran commented on HADOOP-15961:
-----------------------------------------
BTW, looking at this patch, I think the progress call could go in the inner
loop:
{code}
...
UploadPartResult partResult = writeOperations.uploadPart(part);
offset += uploadPartSize;
parts.add(partResult.getPartETag());
progress.progress(); // HERE: report progress after each part
}
{code}
That way, it will be invoked for every 32 or 64 MB part uploaded. If the task
created 4 GB of data, then without a per-part progress call you could still
hit a timeout just from the time taken to upload it; a progress event per
part eliminates this problem.
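As a minimal, self-contained sketch of that wiring, assuming a Progressable is passed down from the committer (the class, method, and variable names here are illustrative, not the actual patch):
{code}
import org.apache.hadoop.util.Progressable;

import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only: uploadPart() below stands in for the real
 * S3A write-operations call.
 */
public class PartUploadSketch {

  /**
   * Upload {@code length} bytes in {@code uploadPartSize} chunks,
   * invoking progress() once per part so the task never goes silent
   * for longer than a single part upload.
   */
  static List<String> uploadParts(long length, long uploadPartSize,
      Progressable progress) {
    List<String> etags = new ArrayList<>();
    long offset = 0;
    int partNumber = 1;
    while (offset < length) {
      long size = Math.min(uploadPartSize, length - offset);
      etags.add(uploadPart(partNumber++, offset, size));
      offset += size;
      progress.progress();  // HERE: one callback per uploaded part
    }
    return etags;
  }

  /** Stand-in for the real part upload; returns a fake etag. */
  private static String uploadPart(int partNumber, long offset, long size) {
    return "etag-" + partNumber;
  }

  public static void main(String[] args) {
    // 160 MB file, 32 MB parts => 5 parts, 5 progress() calls.
    List<String> etags = uploadParts(160L << 20, 32L << 20,
        () -> System.out.println("progress()"));
    System.out.println(etags);
  }
}
{code}
With 32 MB parts, a 4 GB file would yield 128 callbacks, roughly one per part-upload interval, which is enough to keep the task from being declared dead mid-upload.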
> S3A committers: make sure there's regular progress() calls
> ----------------------------------------------------------
>
> Key: HADOOP-15961
> URL: https://issues.apache.org/jira/browse/HADOOP-15961
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Steve Loughran
> Assignee: lqjacklee
> Priority: Minor
> Attachments: HADOOP-15961-001.patch, HADOOP-15961-002.patch
>
>
> MAPREDUCE-7164 highlights how more context.progress() callbacks are needed
> inside job/task commit, even just for HDFS.
> The S3A committers should be reviewed similarly.
> At a glance:
> StagingCommitter.commitTaskInternal() is at risk if a task writes enough
> data to the local FS that uploading it takes longer than the timeout.
> It should call progress() after every single file it commits, or better:
> modify {{uploadFileToPendingCommit}} to take a Progressable for progress
> callbacks after every part upload (a sketch of such a signature follows
> below).
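One way that signature change could look, with a Progressable threaded through (a sketch only; the interface name and javadoc are assumptions, though SinglePendingCommit and Progressable are real Hadoop types):
{code}
import java.io.File;
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit;
import org.apache.hadoop.util.Progressable;

/** Hypothetical sketch of the proposed API change, not the committed patch. */
interface PendingCommitUploads {

  /**
   * Upload localFile to destPath as a multipart upload, invoking
   * progress.progress() after every part so that long uploads keep
   * the task alive.
   */
  SinglePendingCommit uploadFileToPendingCommit(
      File localFile,        // staged task output on the local FS
      Path destPath,         // final destination in S3
      String partition,      // destination partition, if any
      long uploadPartSize,   // part size, e.g. 32 or 64 MB
      Progressable progress) // called once per uploaded part
      throws IOException;
}
{code}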