[ 
https://issues.apache.org/jira/browse/HADOOP-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320067#comment-15320067
 ] 

Chris Nauroth commented on HADOOP-13241:
----------------------------------------

Steve, thank you for doing this.  This is very helpful information.

{code}
this can make `seek()` slow on large files. It also does not
handle "/" in secret key. The reason there has been no attempt to fix this is
that every upgrade of the Jets3t library, while
{code}

According to comments on HADOOP-3733, S3A has this problem too, so it might not 
be accurate to characterize it as an S3N-specific problem or a Jets3t problem.

{code}
* `amazon-core-java-SDK` jar.
{code}

I wasn't sure what this line meant.  I see the aws-java-sdk-core dependency is 
called out as a separate line item, so is this something else?

{code}
The files in an object store are not visible until the write has been completed.
when partitioned upload is in progress, they may be visible. Otherwise,
in-progress writes are simply saved to a local file and only copied up
{code}

Is the 'w' in "when" meant to be capitalized?

I thought that with multi-part upload, the object only becomes visible after 
the upload completes.  Am I mistaken?

{code}
S3 renaming is a very expensive `O(data)` operation which may fail partway 
through
{code}

Perhaps mention two specific use cases that are often flagged for poor rename 
performance: the MapReduce FileOutputCommitter and DistCp's rename after copy.
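
To illustrate why those cases hurt, here is a toy Python sketch of a rename 
emulated as copy-then-delete over an in-memory store.  It is not the S3A 
implementation; `ToyObjectStore`, `rename_prefix` and the `fail_after` knob are 
hypothetical names made up for this example.  It only shows that the bytes 
copied grow with the data size, and that a failure partway through leaves 
objects under both the source and destination prefixes.

```python
# Toy in-memory model of an object store; not the S3A implementation.
class ToyObjectStore:
    def __init__(self):
        self.objects = {}      # key -> bytes
        self.bytes_copied = 0  # running total of data moved by copy()

    def put(self, key, data):
        self.objects[key] = data

    def copy(self, src, dst):
        # A copy duplicates the whole object, so its cost grows with its size.
        data = self.objects[src]
        self.objects[dst] = data
        self.bytes_copied += len(data)

    def delete(self, key):
        del self.objects[key]


def rename_prefix(store, src_prefix, dst_prefix, fail_after=None):
    """Emulate a directory rename as copy-then-delete, one object at a time.

    fail_after is a hypothetical knob: raise once that many objects have moved.
    """
    moved = 0
    # sorted() takes a snapshot of the keys, so the dict can be mutated below.
    for key in sorted(k for k in store.objects if k.startswith(src_prefix)):
        if fail_after is not None and moved == fail_after:
            raise IOError("simulated failure partway through rename")
        store.copy(key, dst_prefix + key[len(src_prefix):])
        store.delete(key)
        moved += 1
    return moved


store = ToyObjectStore()
store.put("tmp/part-0000", b"a" * 1000)
store.put("tmp/part-0001", b"b" * 2000)
try:
    rename_prefix(store, "tmp/", "out/", fail_after=1)
except IOError:
    pass  # the store now holds a mix of renamed and not-yet-renamed objects
print(sorted(store.objects), store.bytes_copied)
```

A committer or DistCp run that relies on rename being atomic would see that 
mixed state after such a failure.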

I think it would be worthwhile for the classpath section to mention the shell 
profile that automatically adds hadoop-aws and its dependencies to the default 
Hadoop classpath.  The easiest way to get S3A on the classpath is to symlink to 
that stock shell profile.  However, this would require different patches for 
trunk vs. branch-2.  If you prefer to keep the current patch applicable to both 
trunk and branch-2, then I'd be happy to pick up this part in a separate 
trunk-only patch.

> document s3a better
> -------------------
>
>                 Key: HADOOP-13241
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13241
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: documentation, fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>         Attachments: HADOOP-13241-branch-2-001.patch
>
>
> s3a can be documented better, things like classpath, troubleshooting, etc.
> sit down and do it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
