[ https://issues.apache.org/jira/browse/HADOOP-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160033#comment-16160033 ]

ASF GitHub Bot commented on HADOOP-13600:
-----------------------------------------

Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/hadoop/pull/157#discussion_r137931556
  
    --- Diff: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java ---
    @@ -896,45 +941,108 @@ private boolean innerRename(Path source, Path dest)
             keysToDelete.add(new DeleteObjectsRequest.KeyVersion(dstKey));
           }
     
    -      Path parentPath = keyToPath(srcKey);
    -      RemoteIterator<LocatedFileStatus> iterator = listFilesAndEmptyDirectories(
    -          parentPath, true);
    -      while (iterator.hasNext()) {
    -        LocatedFileStatus status = iterator.next();
    -        long length = status.getLen();
    -        String key = pathToKey(status.getPath());
    -        if (status.isDirectory() && !key.endsWith("/")) {
    -          key += "/";
    -        }
    -        keysToDelete
    -            .add(new DeleteObjectsRequest.KeyVersion(key));
    -        String newDstKey =
    -            dstKey + key.substring(srcKey.length());
    -        copyFile(key, newDstKey, length);
    -
    -        if (hasMetadataStore()) {
    -          // with a metadata store, the object entries need to be updated,
    -          // including, potentially, the ancestors
    -          Path childSrc = keyToQualifiedPath(key);
    -          Path childDst = keyToQualifiedPath(newDstKey);
    -          if (objectRepresentsDirectory(key, length)) {
    -            S3Guard.addMoveDir(metadataStore, srcPaths, dstMetas, childSrc,
    -                childDst, username);
    +      // A blocking queue that tracks all objects that need to be deleted
    +      BlockingQueue<Optional<DeleteObjectsRequest.KeyVersion>> deleteQueue = new ArrayBlockingQueue<>(
    +              MAX_ENTRIES_TO_DELETE * 1.5);
    +
    +      // Used to track if the delete thread was gracefully shutdown
    +      boolean deleteFutureComplete = false;
    +      FutureTask<Void> deleteFuture = null;
    +
    +      try {
    +        // Launch a thread that will read from the deleteQueue and batch delete any files that have already been copied
    +        deleteFuture = new FutureTask<>(new Callable<Void>() {
    --- End diff --
    
    If we target Java 8 only, this can be a proper lambda expression :)
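
For illustration only, here is a minimal, self-contained sketch of what that lambda rewrite could look like. The names (`drain`, the `Optional<String>` queue, the empty-`Optional` poison pill) are stand-ins for the PR's fields, not the actual `S3AFileSystem` code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.FutureTask;

public class LambdaSketch {
  // Drains the queue on a background thread until an empty Optional
  // (a poison pill) arrives; returns the keys it would have deleted.
  static List<String> drain(BlockingQueue<Optional<String>> deleteQueue)
      throws Exception {
    List<String> deleted = new ArrayList<>();
    // Callable<Void> written as a lambda instead of an anonymous inner class
    FutureTask<Void> deleteFuture = new FutureTask<>(() -> {
      Optional<String> item;
      while ((item = deleteQueue.take()).isPresent()) {
        deleted.add(item.get());
      }
      return null; // Callable<Void> still needs an explicit return
    });
    new Thread(deleteFuture).start();
    deleteFuture.get(); // rethrows any exception from the drain thread
    return deleted;
  }

  public static void main(String[] args) throws Exception {
    BlockingQueue<Optional<String>> q = new ArrayBlockingQueue<>(4);
    q.put(Optional.of("key1"));
    q.put(Optional.of("key2"));
    q.put(Optional.empty()); // poison pill ends the loop
    System.out.println(drain(q)); // prints [key1, key2]
  }
}
```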


> S3a rename() to copy files in a directory in parallel
> -----------------------------------------------------
>
>                 Key: HADOOP-13600
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13600
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Steve Loughran
>            Assignee: Sahil Takiar
>
> Currently a directory rename does a one-by-one copy, making the request 
> O(files * data). If the copy operations were launched in parallel, the 
> duration of the copy may be reducible to the duration of the longest copy. 
> For a directory with many files, this will be significant.
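
The idea in the issue description can be sketched as follows. This is a toy illustration of submitting per-object copies to a thread pool and awaiting them all, not the `S3AFileSystem` implementation; `copyFile` here is a hypothetical stand-in for the per-object S3 COPY call:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCopySketch {
  // Hypothetical stand-in for the per-object S3 COPY operation:
  // maps "src/<name>" to "<dstPrefix><name>".
  static String copyFile(String srcKey, String dstPrefix) {
    return dstPrefix + srcKey.substring(srcKey.indexOf('/') + 1);
  }

  // Submits one copy task per key and waits for all of them, so total
  // wall time tracks the slowest single copy rather than the sum.
  static List<String> parallelCopy(List<String> srcKeys, String dstPrefix)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String key : srcKeys) {
        futures.add(pool.submit(() -> copyFile(key, dstPrefix)));
      }
      List<String> copied = new ArrayList<>();
      for (Future<String> f : futures) {
        copied.add(f.get()); // propagates any copy failure
      }
      return copied;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(parallelCopy(Arrays.asList("src/a", "src/b"), "dst/"));
    // prints [dst/a, dst/b]
  }
}
```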



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
