[
https://issues.apache.org/jira/browse/HADOOP-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388426#comment-15388426
]
Chris Nauroth commented on HADOOP-11487:
----------------------------------------
bq. Does listStatus fall outside the above consistency?
Yes, it does. {{FileSystem#listStatus}} maps to an operation listing the keys
in an S3 bucket. For that listing operation, the consistency model you quoted
does not apply. Instead, it follows an eventual consistency model. There may
be propagation delays between creating a key and that key becoming visible in
listings.
There are more details on this behavior in the AWS S3 consistency model doc:
http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel
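A toy sketch of that list-after-write lag (all class and method names here are illustrative, not part of Hadoop or the AWS SDK): a newly created key is immediately readable by a direct GET, but only shows up in listings after a propagation delay, so a robust client has to poll rather than fail on the first miss.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Toy model of list-after-write eventual consistency: put() is
 * immediately visible to exists() (the per-key GET view), but list()
 * only reflects it after a simulated propagation delay.
 */
public class ListingLagDemo {
    private final Set<String> keys = new HashSet<>();    // read-after-write view
    private final Set<String> listing = new HashSet<>(); // eventually consistent view
    private int listCallsUntilVisible;

    ListingLagDemo(int lag) {
        this.listCallsUntilVisible = lag;
    }

    void put(String key) {
        keys.add(key);
    }

    boolean exists(String key) {
        return keys.contains(key); // per-key GET: consistent in this toy model
    }

    Set<String> list() {
        // Listings lag behind: new keys only propagate after a few calls.
        if (listCallsUntilVisible > 0) {
            listCallsUntilVisible--;
        } else {
            listing.addAll(keys);
        }
        return new HashSet<>(listing);
    }

    public static void main(String[] args) {
        ListingLagDemo bucket = new ListingLagDemo(2);
        bucket.put("file.gz");
        System.out.println("exists: " + bucket.exists("file.gz")); // true immediately
        int polls = 0;
        while (!bucket.list().contains("file.gz")) {
            polls++; // a robust client would back off here instead of failing
        }
        System.out.println("visible in listing after " + polls + " extra polls");
    }
}
```

This is only a simulation of the behavior described above; the real propagation delay in S3 is nondeterministic, which is why bounded retries with backoff (rather than a fixed wait) are the usual mitigation.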
> FileNotFound on distcp to s3n/s3a due to creation inconsistency
> ----------------------------------------------------------------
>
> Key: HADOOP-11487
> URL: https://issues.apache.org/jira/browse/HADOOP-11487
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs, fs/s3
> Affects Versions: 2.7.2
> Reporter: Paulo Motta
>
> I'm trying to copy a large number of files from HDFS to S3 via distcp, and I'm
> getting the following exception:
> {code:java}
> 2015-01-16 20:53:18,187 ERROR [main] org.apache.hadoop.tools.mapred.CopyMapper: Failure in copying hdfs://10.165.35.216/hdfsFolder/file.gz to s3n://s3-bucket/file.gz
> java.io.FileNotFoundException: No such file or directory 's3n://s3-bucket/file.gz'
>         at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:445)
>         at org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:187)
>         at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:233)
>         at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> 2015-01-16 20:53:18,276 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.FileNotFoundException: No such file or directory 's3n://s3-bucket/file.gz'
>         at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:445)
>         at org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:187)
>         at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:233)
>         at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> {code}
> However, when I try hadoop fs -ls s3n://s3-bucket/file.gz, the file is there,
> so the job failure is probably due to Amazon's S3 eventual consistency.
> In my opinion, to fix this problem NativeS3FileSystem.getFileStatus should
> honor the fs.s3.maxRetries property and retry the lookup, in order to avoid
> failures like this.
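The bounded-retry approach the reporter proposes could be sketched roughly as follows. This is a minimal, self-contained illustration: the helper name {{withRetries}} and its wiring are hypothetical, and the real {{NativeS3FileSystem}} code paths differ; the point is only that a {{FileNotFoundException}} immediately after a write may be transient and is worth retrying up to a configured maximum (such as {{fs.s3.maxRetries}}).

```java
import java.io.FileNotFoundException;
import java.util.concurrent.Callable;

/**
 * Minimal sketch of retrying a metadata lookup that may transiently fail
 * under list-after-write eventual consistency. Names are illustrative,
 * not part of Hadoop's actual API.
 */
public class EventualConsistencyRetry {

    /**
     * Invoke op up to maxRetries + 1 times, sleeping delayMillis between
     * attempts; rethrow the last FileNotFoundException if every attempt fails.
     */
    public static <T> T withRetries(Callable<T> op, int maxRetries, long delayMillis)
            throws Exception {
        FileNotFoundException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (FileNotFoundException e) {
                last = e; // the key may simply not be visible yet
                Thread.sleep(delayMillis);
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulate a key that only becomes visible on the third lookup.
        final int[] calls = {0};
        String status = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new FileNotFoundException("'s3n://s3-bucket/file.gz' not visible yet");
            }
            return "found after " + calls[0] + " attempts";
        }, 5, 10L);
        System.out.println(status); // found after 3 attempts
    }
}
```

A production version would use exponential backoff rather than a fixed sleep, and would give up quickly for paths that were never expected to exist.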
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)