[ https://issues.apache.org/jira/browse/HADOOP-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160033#comment-16160033 ]
ASF GitHub Bot commented on HADOOP-13600:
-----------------------------------------
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/hadoop/pull/157#discussion_r137931556
--- Diff: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java ---
@@ -896,45 +941,108 @@ private boolean innerRename(Path source, Path dest)
        keysToDelete.add(new DeleteObjectsRequest.KeyVersion(dstKey));
      }
-      Path parentPath = keyToPath(srcKey);
-      RemoteIterator<LocatedFileStatus> iterator = listFilesAndEmptyDirectories(
-          parentPath, true);
-      while (iterator.hasNext()) {
-        LocatedFileStatus status = iterator.next();
-        long length = status.getLen();
-        String key = pathToKey(status.getPath());
-        if (status.isDirectory() && !key.endsWith("/")) {
-          key += "/";
-        }
-        keysToDelete
-            .add(new DeleteObjectsRequest.KeyVersion(key));
-        String newDstKey =
-            dstKey + key.substring(srcKey.length());
-        copyFile(key, newDstKey, length);
-
-        if (hasMetadataStore()) {
-          // with a metadata store, the object entries need to be updated,
-          // including, potentially, the ancestors
-          Path childSrc = keyToQualifiedPath(key);
-          Path childDst = keyToQualifiedPath(newDstKey);
-          if (objectRepresentsDirectory(key, length)) {
-            S3Guard.addMoveDir(metadataStore, srcPaths, dstMetas, childSrc,
-                childDst, username);
+      // A blocking queue that tracks all objects that need to be deleted
+      BlockingQueue<Optional<DeleteObjectsRequest.KeyVersion>> deleteQueue =
+          new ArrayBlockingQueue<>(MAX_ENTRIES_TO_DELETE * 1.5);
+
+      // Used to track if the delete thread was gracefully shutdown
+      boolean deleteFutureComplete = false;
+      FutureTask<Void> deleteFuture = null;
+
+      try {
+        // Launch a thread that will read from the deleteQueue and batch
+        // delete any files that have already been copied
+        deleteFuture = new FutureTask<>(new Callable<Void>() {
--- End diff ---
If we target java 8 only, this can be a proper lambda-expression :)
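For illustration only, here is a minimal, self-contained sketch (not the PR's actual code; the String keys, queue size, and class name are placeholders) of the same Callable written as a Java 8 lambda, with Optional.empty() used as the poison pill that ends the delete loop:

    import java.util.Optional;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.FutureTask;

    public class LambdaDeleteSketch {
      public static void main(String[] args) throws Exception {
        BlockingQueue<Optional<String>> deleteQueue = new ArrayBlockingQueue<>(16);

        // Callable<Void> as a lambda instead of an anonymous inner class.
        FutureTask<Void> deleteFuture = new FutureTask<>(() -> {
          while (true) {
            Optional<String> key = deleteQueue.take();
            if (!key.isPresent()) {
              return null;                      // graceful shutdown signal
            }
            System.out.println("would delete: " + key.get());
          }
        });

        new Thread(deleteFuture).start();
        deleteQueue.put(Optional.of("dir/file-1"));
        deleteQueue.put(Optional.of("dir/file-2"));
        deleteQueue.put(Optional.empty());      // poison pill
        deleteFuture.get();                     // wait for the delete thread
      }
    }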
> S3a rename() to copy files in a directory in parallel
> -----------------------------------------------------
>
> Key: HADOOP-13600
> URL: https://issues.apache.org/jira/browse/HADOOP-13600
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.7.3
> Reporter: Steve Loughran
> Assignee: Sahil Takiar
>
> Currently a directory rename does a one-by-one copy, making the request
> O(files * data). If the copy operations were launched in parallel, the
> duration of the copy may be reducible to the duration of the longest copy.
> For a directory with many files, this will be significant.
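As a rough sketch of the parallel-copy idea described above (illustrative only, not the S3AFileSystem implementation; the key names, pool size, and class name are made up), the per-file copies could be submitted to an executor and then awaited, so the wall-clock time approaches that of the slowest single copy rather than the sum of all copies:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelCopySketch {
      public static void main(String[] args) throws Exception {
        List<String> srcKeys = Arrays.asList("dir/a", "dir/b", "dir/c");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<?>> copies = new ArrayList<>();

        // One copy task per object; the copies run concurrently.
        for (String key : srcKeys) {
          copies.add(pool.submit(() -> System.out.println("copy " + key)));
        }
        for (Future<?> copy : copies) {
          copy.get();   // wait for completion and surface any copy failure
        }
        pool.shutdown();
      }
    }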