[
https://issues.apache.org/jira/browse/HADOOP-15076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323239#comment-16323239
]
Abraham Fine edited comment on HADOOP-15076 at 1/11/18 11:49 PM:
-----------------------------------------------------------------
I'm new to this codebase, so I may be able to point out a few parts of the
documentation that could be confusing to new users.
h3. performance.md
* Would it be possible to change the introduction setting from two sequential
lists to a table? That may make it easier to compare S3 and HDFS.
* {{list files a lot. This includes the setup of all queries agains data:}}
typo: "agains" should be "against"
* {{The MapReduce `FileOutputCommitter`. This also used by Apache Spark.}} I'm
not sure what this sentence is trying to express (it also seems to be missing
a word: "This is also used by...")
* {{Your problem may appear to be performance, but really it is that the commit
protocol is both slow and unreliable}} Isn't the commit protocol being slow
part of "performance"? Can this be rephrased?
* {{This is leads to maximum read throughput}} "This will lead to..."?
* Perhaps describe the {{random}} policy before {{normal}} as one needs to
understand {{random}} before understanding {{normal}}.
* {{may consume large amounts of resources if each query is working with a
different set of s3 buckets}} Why wouldn't a large amount of resources be
consumed if working with the same set of s3 buckets?
* {{When uploading data, it is uploaded in blocks set by the option}} Consider
changing to "Data is uploaded in blocks set by the option..."
* Extra newline on 451
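For what it's worth, the two options under discussion would look something like
this in {{core-site.xml}}. This is only a sketch: the property names are the
ones performance.md appears to refer to, and the values are illustrative, not
recommendations.

```xml
<!-- Sketch only: assumes these are the option names the doc refers to -->
<property>
  <name>fs.s3a.experimental.input.fadvise</name>
  <!-- "random" optimizes for seek-heavy reads (e.g. columnar formats);
       "normal" starts sequential and switches to random on the first
       backward seek, which is why "random" needs explaining first -->
  <value>random</value>
</property>
<property>
  <name>fs.s3a.multipart.size</name>
  <!-- size of each block uploaded in a multipart upload -->
  <value>64M</value>
</property>
```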
h3. troubleshooting_s3a.md
* {{Whatever problem you have, changing the AWS SDK version will not fix
things, only change the stack traces you see.}} Again, I'm new here, so I'm
not sure about the history of this issue, but this section seems a little
heavy-handed to me. Does Amazon never release "bug fix" versions of their
client that are API compatible? How can we make this statement with such
certainty?
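If the docs do keep the "rebuild rather than swap the JAR" guidance, it might
help to show the workflow the issue description asks for. A rough sketch,
assuming the {{aws-java-sdk.version}} property in {{hadoop-project/pom.xml}}
and the standard module layout; not an authoritative recipe:

```shell
# Sketch only: update the SDK version property in hadoop-project/pom.xml
# (assumed to be aws-java-sdk.version), then rebuild hadoop-aws plus the
# modules it depends on:
mvn -pl hadoop-tools/hadoop-aws -am clean install -DskipTests

# Then rerun the S3A test suite against a real bucket (test credentials
# are assumed to be configured separately) before trusting the new SDK:
mvn -pl hadoop-tools/hadoop-aws verify
```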
> Enhance s3a troubleshooting docs, add perf section
> --------------------------------------------------
>
> Key: HADOOP-15076
> URL: https://issues.apache.org/jira/browse/HADOOP-15076
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: documentation, fs/s3
> Affects Versions: 2.8.2
> Reporter: Steve Loughran
> Assignee: Abraham Fine
> Attachments: HADOOP-15076-001.patch, HADOOP-15076-002.patch,
> HADOOP-15076-003.patch, HADOOP-15076-004.patch
>
>
> A recurrent theme in s3a-related JIRAs, support calls etc is "tried upgrading
> the AWS SDK JAR and then I got the error ...". We know here "don't do that",
> but it's not something immediately obvious to lots of downstream users who
> want to be able to drop in the new JAR to fix things/add new features.
> We need to spell this out quite clearly: "you cannot safely expect to do
> this. If you want to upgrade the SDK, you will need to rebuild the whole of
> hadoop-aws with the maven POM updated to the latest version, ideally
> rerunning all the tests to make sure something hasn't broken."
> Maybe near the top of the index.md file, along with "never share your AWS
> credentials with anyone"
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)