[
https://issues.apache.org/jira/browse/HADOOP-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16164647#comment-16164647
]
ASF GitHub Bot commented on HADOOP-13600:
-----------------------------------------
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/hadoop/pull/157#discussion_r138618577
--- Diff: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java ---
@@ -891,50 +902,123 @@ private boolean innerRename(Path source, Path dest)
       }
       List<DeleteObjectsRequest.KeyVersion> keysToDelete = new ArrayList<>();
+      List<DeleteObjectsRequest.KeyVersion> dirKeysToDelete = new ArrayList<>();
       if (dstStatus != null && dstStatus.isEmptyDirectory() == Tristate.TRUE) {
         // delete unnecessary fake directory.
         keysToDelete.add(new DeleteObjectsRequest.KeyVersion(dstKey));
       }
-      Path parentPath = keyToPath(srcKey);
-      RemoteIterator<LocatedFileStatus> iterator = listFilesAndEmptyDirectories(
-          parentPath, true);
-      while (iterator.hasNext()) {
-        LocatedFileStatus status = iterator.next();
-        long length = status.getLen();
-        String key = pathToKey(status.getPath());
-        if (status.isDirectory() && !key.endsWith("/")) {
-          key += "/";
-        }
-        keysToDelete
-            .add(new DeleteObjectsRequest.KeyVersion(key));
-        String newDstKey =
-            dstKey + key.substring(srcKey.length());
-        copyFile(key, newDstKey, length);
-
-        if (hasMetadataStore()) {
-          // with a metadata store, the object entries need to be updated,
-          // including, potentially, the ancestors
-          Path childSrc = keyToQualifiedPath(key);
-          Path childDst = keyToQualifiedPath(newDstKey);
-          if (objectRepresentsDirectory(key, length)) {
-            S3Guard.addMoveDir(metadataStore, srcPaths, dstMetas, childSrc,
-                childDst, username);
+      // A blocking queue that tracks all objects that need to be deleted
+      BlockingQueue<Optional<DeleteObjectsRequest.KeyVersion>> deleteQueue = new ArrayBlockingQueue<>(
+          (int) Math.round(MAX_ENTRIES_TO_DELETE * 1.5));
+
+      // Used to track if the delete thread was gracefully shutdown
+      boolean deleteFutureComplete = false;
+      FutureTask<Void> deleteFuture = null;
+
+      try {
+        // Launch a thread that will read from the deleteQueue and batch delete any files that have already been copied
+        deleteFuture = new FutureTask<>(() -> {
+          while (true) {
+            while (keysToDelete.size() < MAX_ENTRIES_TO_DELETE) {
+              Optional<DeleteObjectsRequest.KeyVersion> key = deleteQueue.take();
+
+              // The thread runs until it is given an EOF message (an Optional#empty())
+              if (key.isPresent()) {
--- End diff --
I'm making the leap to Java 8 with the committer and retry logic. That
doesn't mean we should jump to using Optional just because we can. Put
differently: if you are only going to use it as a fancy null, it's overkill.
Its key benefit is that you can use map & forEach, but there we are crippled
by Java's checked exceptions stopping us from throwing IOEs.
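
To make the alternative concrete, here is a minimal, self-contained sketch
(not the patch itself) of the same producer/consumer delete queue with a
shared sentinel KeyVersion acting as the end-of-stream marker instead of
Optional. The nested KeyVersion stand-in, END_OF_KEYS, deleteBatch() and the
thread name are invented for the sketch, and MAX_ENTRIES_TO_DELETE is
assumed to be the batch-delete limit:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.FutureTask;

public class SentinelDeleteQueueSketch {

  /** Minimal stand-in for DeleteObjectsRequest.KeyVersion. */
  static final class KeyVersion {
    final String key;
    KeyVersion(String key) { this.key = key; }
  }

  /** Poison pill: a shared sentinel instance marks end-of-stream, so the
   *  queue holds plain KeyVersion objects rather than Optionals. */
  private static final KeyVersion END_OF_KEYS = new KeyVersion("");

  /** Assumed batch-delete limit for the sketch. */
  private static final int MAX_ENTRIES_TO_DELETE = 1000;

  public static void main(String[] args) throws Exception {
    BlockingQueue<KeyVersion> deleteQueue =
        new ArrayBlockingQueue<>((int) Math.round(MAX_ENTRIES_TO_DELETE * 1.5));

    // Consumer: drain the queue, issuing a batched delete every
    // MAX_ENTRIES_TO_DELETE keys and a final flush on the sentinel.
    FutureTask<Void> deleteFuture = new FutureTask<>(() -> {
      List<KeyVersion> batch = new ArrayList<>(MAX_ENTRIES_TO_DELETE);
      while (true) {
        KeyVersion key = deleteQueue.take();
        if (key == END_OF_KEYS) {       // identity check against the sentinel
          deleteBatch(batch);
          return null;
        }
        batch.add(key);
        if (batch.size() >= MAX_ENTRIES_TO_DELETE) {
          deleteBatch(batch);
        }
      }
    });
    new Thread(deleteFuture, "rename-delete").start();

    // Producer: in the real rename this would enqueue each key as soon as
    // its copy has completed.
    for (int i = 0; i < 2500; i++) {
      deleteQueue.put(new KeyVersion("src/file-" + i));
    }
    deleteQueue.put(END_OF_KEYS);       // signal EOF to the delete thread

    deleteFuture.get();                 // surfaces any failure from the consumer
  }

  /** Placeholder for the real batched delete call. */
  private static void deleteBatch(List<KeyVersion> batch) {
    if (batch.isEmpty()) {
      return;
    }
    System.out.println("deleting " + batch.size() + " keys");
    batch.clear();
  }
}

In the real code the queue could hold DeleteObjectsRequest.KeyVersion
directly, with one shared instance reserved as the end marker, and the
Callable body stays free to throw an IOException that surfaces through
FutureTask#get().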
> S3a rename() to copy files in a directory in parallel
> -----------------------------------------------------
>
> Key: HADOOP-13600
> URL: https://issues.apache.org/jira/browse/HADOOP-13600
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.7.3
> Reporter: Steve Loughran
> Assignee: Sahil Takiar
> Attachments: HADOOP-13600.001.patch
>
>
> Currently a directory rename does a one-by-one copy, making the request
> O(files * data). If the copy operations were launched in parallel, the
> duration of the copy may be reducible to the duration of the longest copy.
> For a directory with many files, this will be significant.
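
As a rough illustration of the idea above (and not the actual S3AFileSystem
code), the per-object copies could be handed to a bounded thread pool, with
the rename blocking until every copy future completes, so the wall-clock time
tends toward that of the slowest single copy. The copyFile() stub, the pool
size of 4 and the key names are all invented for this sketch:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRenameSketch {

  /** Hypothetical stand-in for the per-object, server-side COPY. */
  static void copyFile(String srcKey, String dstKey) throws Exception {
    Thread.sleep(100);   // simulate the copy taking some time
    System.out.println("copied " + srcKey + " -> " + dstKey);
  }

  public static void main(String[] args) throws Exception {
    List<String> srcKeys = Arrays.asList("dir/a", "dir/b", "dir/c", "dir/d");
    String srcPrefix = "dir/";
    String dstPrefix = "newdir/";

    // Bounded pool: copies run concurrently instead of one-by-one, so the
    // total duration approaches that of the largest single object.
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<Future<?>> copies = new ArrayList<>();
    for (String srcKey : srcKeys) {
      String dstKey = dstPrefix + srcKey.substring(srcPrefix.length());
      copies.add(pool.submit(() -> {
        copyFile(srcKey, dstKey);
        return null;
      }));
    }

    try {
      // Wait for every copy; rethrow the first failure so the rename fails.
      for (Future<?> copy : copies) {
        copy.get();
      }
    } catch (ExecutionException e) {
      throw new RuntimeException("rename copy failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}

Blocking on each Future keeps the sequential loop's fail-fast behaviour; the
actual patch additionally has to interleave the batched deletes and the
S3Guard metadata updates visible in the diff above.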