[
https://issues.apache.org/jira/browse/HADOOP-15076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323239#comment-16323239
]
Abraham Fine edited comment on HADOOP-15076 at 1/11/18 11:49 PM:
-----------------------------------------------------------------
I'm new to this codebase, so I may be able to point out a few parts of the
documentation that could be confusing to new users.
h3. performance.md
* Would it be possible to change the introduction setting from two sequential
lists to a table? That may make it easier to compare S3 and HDFS.
* {{list files a lot. This includes the setup of all queries agains data:}}
typo: "agains" should be "against"
* {{The MapReduce `FileOutputCommitter`. This also used by Apache Spark.}} I'm
not sure what this sentence is trying to express (it also seems to be missing
a word: "This is also used by...")
* {{Your problem may appear to be performance, but really it is that the commit
protocol is both slow and unreliable}} Isn't the commit protocol being slow
part of "performance"? Can this be rephrased?
* {{This is leads to maximum read throughput}} "This will lead to..."?
* Perhaps describe the {{random}} policy before {{normal}} as one needs to
understand {{random}} before understanding {{normal}}.
* {{may consume large amounts of resources if each query is working with a
different set of s3 buckets}} Why wouldn't a large amount of resources be
consumed if working with the same set of s3 buckets?
* {{When uploading data, it is uploaded in blocks set by the option}} Consider
changing to "Data is uploaded in blocks set by the option..."
* Extra newline on 451
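For what it's worth, the two options under discussion would look something like
this in {{core-site.xml}}. This is only a sketch: the property names are the
ones performance.md appears to refer to, and the values are illustrative, not
recommendations.

```xml
<!-- Sketch only: assumes these are the option names the doc refers to -->
<property>
  <name>fs.s3a.experimental.input.fadvise</name>
  <!-- "random" optimizes for seek-heavy reads (e.g. columnar formats);
       "normal" starts sequential and switches to random on the first
       backward seek, which is why "random" needs explaining first -->
  <value>random</value>
</property>
<property>
  <name>fs.s3a.multipart.size</name>
  <!-- size of each block uploaded in a multipart upload -->
  <value>64M</value>
</property>
```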
h3. troubleshooting_s3a.md
* {{Whatever problem you have, changing the AWS SDK version will not fix
things, only change the stack traces you see.}} Again, I'm new here, so I'm
not sure about the history of this issue, but this section seems a little
heavy-handed to me. Does Amazon never release "bug fix" versions of their
client that are API compatible? How can we make this statement with such
certainty?
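If the docs do keep the "rebuild rather than swap the JAR" guidance, it might
help to show the workflow the issue description asks for. A rough sketch,
assuming the {{aws-java-sdk.version}} property in {{hadoop-project/pom.xml}}
and the standard module layout; not an authoritative recipe:

```shell
# Sketch only: update the SDK version property in hadoop-project/pom.xml
# (assumed to be aws-java-sdk.version), then rebuild hadoop-aws plus the
# modules it depends on:
mvn -pl hadoop-tools/hadoop-aws -am clean install -DskipTests

# Then rerun the S3A test suite against a real bucket (test credentials
# are assumed to be configured separately) before trusting the new SDK:
mvn -pl hadoop-tools/hadoop-aws verify
```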
> Enhance s3a troubleshooting docs, add perf section
> --------------------------------------------------
>
> Key: HADOOP-15076
> URL: https://issues.apache.org/jira/browse/HADOOP-15076
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: documentation, fs/s3
> Affects Versions: 2.8.2
> Reporter: Steve Loughran
> Assignee: Abraham Fine
> Attachments: HADOOP-15076-001.patch, HADOOP-15076-002.patch,
> HADOOP-15076-003.patch, HADOOP-15076-004.patch
>
>
> A recurrent theme in s3a-related JIRAs, support calls etc is "tried upgrading
> the AWS SDK JAR and then I got the error ...". We know here "don't do that",
> but it's not something immediately obvious to lots of downstream users who
> want to be able to drop in the new JAR to fix things/add new features.
> We need to spell this out quite clearly: "you cannot safely expect to do
> this. If you want to upgrade the SDK, you will need to rebuild the whole of
> hadoop-aws with the maven POM updated to the latest version, ideally
> rerunning all the tests to make sure something hasn't broken."
> Maybe near the top of the index.md file, along with "never share your AWS
> credentials with anyone"
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)