[
https://issues.apache.org/jira/browse/HADOOP-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165701#comment-16165701
]
ASF GitHub Bot commented on HADOOP-13600:
-----------------------------------------
Github user sahilTakiar commented on the issue:
https://github.com/apache/hadoop/pull/157
Updates:
* Moved the parallel rename logic into a dedicated class called
`ParallelDirectoryRenamer` (a rough sketch of the approach is below)
* A few other bug fixes; the core logic remains the same
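For anyone skimming the thread, this is the general shape of the idea, only as a
rough sketch rather than the code in the patch; `renameAll`, `srcKeys`, and the
`copyFile` callback are stand-ins for the real listing and S3 copy calls:

```java
// Sketch of the parallel-rename idea: launch one copy task per file and
// wait for all of them, so total time approaches the longest single copy.
// The BiConsumer stands in for the real S3A copy operation.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

public class ParallelRenameSketch {

  /**
   * Copy every key under srcPrefix to dstPrefix using the supplied pool.
   * Failures surface once all copy tasks have been submitted.
   */
  public static void renameAll(List<String> srcKeys,
                               String srcPrefix,
                               String dstPrefix,
                               ExecutorService copyPool,
                               BiConsumer<String, String> copyFile)
      throws InterruptedException, ExecutionException {
    List<Future<?>> copies = new ArrayList<>();
    for (String srcKey : srcKeys) {
      String dstKey = dstPrefix + srcKey.substring(srcPrefix.length());
      // Each copy runs on the pool; submission itself is cheap.
      copies.add(copyPool.submit(() -> copyFile.accept(srcKey, dstKey)));
    }
    // Wait for every copy; get() rethrows any copy failure.
    for (Future<?> copy : copies) {
      copy.get();
    }
  }
}
```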
@steveloughran your last comment on HADOOP-13786 suggested you might move the
retry logic out into a separate patch. Are you planning to do that? If not, do
you think this patch needs to wait for all of the work in HADOOP-13786 to be
completed?
If there are concerns with the retry behavior, we could also set the default
size of the copy thread pool to 1; that way the feature is essentially off
by default.
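On the default-to-1 point, a minimal sketch of how the pool size could gate the
behavior; note that the property name `fs.s3a.rename.copy.threads` is made up
for illustration and is not an existing s3a configuration key:

```java
// Hypothetical sketch: size the copy thread pool from configuration,
// falling back to 1 so parallel copies are effectively disabled by default.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.conf.Configuration;

public class CopyPoolFactory {
  // NOTE: this key is illustrative only; it is not an existing s3a property.
  static final String COPY_THREADS_KEY = "fs.s3a.rename.copy.threads";
  static final int COPY_THREADS_DEFAULT = 1;

  static ExecutorService createCopyPool(Configuration conf) {
    int threads = Math.max(1, conf.getInt(COPY_THREADS_KEY, COPY_THREADS_DEFAULT));
    return Executors.newFixedThreadPool(threads);
  }
}
```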
Also, what do you mean by "isn't going to be resilient to large copies where
you are much more likely to hit parallel IO"? What parallel IO are you
referring to?
> S3a rename() to copy files in a directory in parallel
> -----------------------------------------------------
>
> Key: HADOOP-13600
> URL: https://issues.apache.org/jira/browse/HADOOP-13600
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.7.3
> Reporter: Steve Loughran
> Assignee: Sahil Takiar
> Attachments: HADOOP-13600.001.patch
>
>
> Currently a directory rename copies files one by one, making the operation
> O(files * data). If the copy operations were launched in parallel, the
> duration of the rename may be reducible to the duration of the longest copy.
> For a directory with many files, the saving will be significant.