[ 
https://issues.apache.org/jira/browse/HADOOP-13208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355681#comment-15355681
 ] 

Aaron Fabbri commented on HADOOP-13208:
---------------------------------------

Looks like there is a merge conflict in S3AFileSystem.java:

{code}
+<<<<<<< 3907f2a30d773828e85cb3c7ea367be749b65920
    * Check that a Path belongs to this FileSystem.
    * Unlike the superclass, this version does not look at authority,
    * only hostnames.
    * @param path to check
    * @throws IllegalArgumentException if there is an FS mismatch
+=======
+   * Convert a key to a fully qualified path.
+   * @param key input key
+   * @return the fully qualified path including URI schema and bucket name.
+   */
+  private Path keyToQualifiedPath(String key) {
+    return keyToPath(key).makeQualified(uri, workingDir);
+  }
+
+  /**
+   * Opens an FSDataInputStream at the indicated Path.
+   * @param f the file name to open
+   * @param bufferSize the size of the buffer to be used.
+>>>>>>> HADOOP-13208: build remote iterator direct from object listings; use in the nonrecursive listLocatedStatus call
{code}

> S3A listFiles(recursive=true) to do a bulk listObjects instead of walking the 
> pseudo-tree of directories
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13208
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13208
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>         Attachments: HADOOP-13208-branch-2-001.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A major cost in split calculation against object stores turns out to be listing 
> the directory tree itself. That's because, against S3, it takes S3A two HEADs 
> and two lists to list the contents of any directory path (2 HEADs + 1 list for 
> getFileStatus(), then a second list to query the contents).
> Listing a directory could be improved slightly by combining the final two 
> listings. However, a listing of a directory tree will still be 
> O(directories). In contrast, a recursive {{listFiles()}} operation should be 
> implementable by a bulk listing of all descendant paths: one List operation 
> per thousand descendants. 
> As the result of this call is an iterator, the ongoing listing can be 
> implemented within the iterator itself.
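
The idea in the description (one flat, paged listing driven from inside the iterator) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the patch: FakeStore, listPage and ListingIterator are hypothetical names, the store is an in-memory sorted key set rather than S3, and end-of-listing is detected by a short page instead of the truncation flag a real list response carries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeSet;

/** Sketch: lazily iterate every key under a prefix, one "list" request per page. */
public class BulkListingSketch {

  /** Hypothetical stand-in for an object store with a flat, sorted key space. */
  static class FakeStore {
    private final TreeSet<String> keys = new TreeSet<>();
    int listCalls = 0; // counts simulated list requests

    void put(String key) {
      keys.add(key);
    }

    /** Up to pageSize keys under prefix that sort after marker (null = from the start). */
    List<String> listPage(String prefix, String marker, int pageSize) {
      listCalls++;
      Iterable<String> tail =
          marker == null ? keys.tailSet(prefix, true) : keys.tailSet(marker, false);
      List<String> page = new ArrayList<>();
      for (String k : tail) {
        if (!k.startsWith(prefix)) {
          break; // keys are sorted, so we are past the prefix range
        }
        page.add(k);
        if (page.size() == pageSize) {
          break;
        }
      }
      return page;
    }
  }

  /** Iterator that fetches the next page only when the current one is drained. */
  static class ListingIterator implements Iterator<String> {
    private final FakeStore store;
    private final String prefix;
    private final int pageSize;
    private Iterator<String> current = Collections.emptyIterator();
    private String marker = null;
    private boolean exhausted = false;

    ListingIterator(FakeStore store, String prefix, int pageSize) {
      this.store = store;
      this.prefix = prefix;
      this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
      while (!current.hasNext() && !exhausted) {
        List<String> page = store.listPage(prefix, marker, pageSize);
        if (page.size() < pageSize) {
          exhausted = true; // assumption: a short page means no more results
        }
        if (!page.isEmpty()) {
          marker = page.get(page.size() - 1);
        }
        current = page.iterator();
      }
      return current.hasNext();
    }

    @Override
    public String next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      return current.next();
    }
  }

  public static void main(String[] args) {
    FakeStore store = new FakeStore();
    for (int i = 0; i < 2500; i++) {
      store.put(String.format("data/dir%02d/part-%05d", i % 7, i));
    }
    int files = 0;
    Iterator<String> it = new ListingIterator(store, "data/", 1000);
    while (it.hasNext()) {
      it.next();
      files++;
    }
    // prints "2500 files in 3 list calls"
    System.out.println(files + " files in " + store.listCalls + " list calls");
  }
}
```

In this sketch the number of list requests depends on the number of descendants and the page size, not on how many pseudo-directories the keys are spread across.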


