[ 
https://issues.apache.org/jira/browse/HADOOP-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332668#comment-15332668
 ] 

Chris Nauroth commented on HADOOP-13278:
----------------------------------------

bq. This is my first ticket/pull request against Hadoop, so let me know if I'm 
not following some convention properly 

[~apetresc], thank you very much for participating in Apache Hadoop!  Please 
see our [HowToContribute|https://wiki.apache.org/hadoop/HowToContribute] wiki 
page for more details on how the contribution process works.  If you're 
interested in working on S3A, then please also pay particular attention to the 
section on [submitting patches against object 
stores|https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure].
  That section discusses our requirements for integration testing of patches 
against the back-end services (S3, Azure Storage, etc.).

I understand the motivation for the proposed change, but I have to vote -1, 
because it would violate the semantics required of a Hadoop-compatible file 
system.  The patch would allow a directory to be created as a descendant of a 
file, which works against expectations of applications in the Hadoop ecosystem. 
 More concretely, the patch causes a test failure in 
{{TestS3AContractMkdir#testMkdirOverParentFile}}, which tests for exactly this 
condition.  (See below.)
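
To illustrate the semantics at stake: a Hadoop-compatible file system must refuse to create a directory underneath an existing file.  This is a minimal stand-alone sketch (not Hadoop code; it uses plain {{java.io.File}} on a local file system, which happens to enforce the same rule) showing the behavior the contract test asserts:

{code}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class MkdirOverFileDemo {
    public static void main(String[] args) throws IOException {
        // Create a scratch directory and a plain file inside it.
        File base = Files.createTempDirectory("hcfs-demo").toFile();
        File parentFile = new File(base, "parent");
        if (!parentFile.createNewFile()) {
            throw new IOException("could not create " + parentFile);
        }
        // Attempting to create a directory tree underneath that file
        // must fail; mkdirs() returns false rather than succeeding.
        File child = new File(parentFile, "child/dir");
        boolean created = child.mkdirs();
        System.out.println("mkdirs over a file returned: " + created);
    }
}
{code}

{{testMkdirOverParentFile}} asserts exactly this: {{mkdirs}} over a path whose ancestor is a file must not return true.  The proposed patch removes the ancestor check, so S3A would report success in this situation.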

However, you might be interested to know that there is a lot of other work in 
progress on hardening and optimizing S3A.  This is tracked in issues 
HADOOP-11694 and HADOOP-13204, and their sub-tasks.

{code}
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractMkdir
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 11.502 sec <<< 
FAILURE! - in org.apache.hadoop.fs.contract.s3a.TestS3AContractMkdir
testMkdirOverParentFile(org.apache.hadoop.fs.contract.s3a.TestS3AContractMkdir) 
 Time elapsed: 1.924 sec  <<< FAILURE!
java.lang.AssertionError: mkdirs did not fail over a file but returned true; ls 
s3a://cnauroth-test-aws-s3a/test/testMkdirOverParentFile[00] 
S3AFileStatus{path=s3a://cnauroth-test-aws-s3a/test/testMkdirOverParentFile; 
isDirectory=false; length=1024; replication=1; blocksize=33554432; 
modification_time=1466026655000; access_time=0; owner=; group=; 
permission=rw-rw-rw-; isSymlink=false} isEmptyDirectory=false

        at org.junit.Assert.fail(Assert.java:88)
        at 
org.apache.hadoop.fs.contract.AbstractContractMkdirTest.testMkdirOverParentFile(AbstractContractMkdirTest.java:95)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}


> S3AFileSystem mkdirs does not need to validate parent path components
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-13278
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13278
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools
>            Reporter: Adrian Petrescu
>            Priority: Minor
>
> According to S3 semantics, there is no conflict if a bucket contains a key 
> named {{a/b}} and also a directory named {{a/b/c}}. "Directories" in S3 are, 
> after all, nothing but prefixes.
> However, the {{mkdirs}} call in {{S3AFileSystem}} does go out of its way to 
> traverse every parent path component for the directory it's trying to create, 
> making sure there's no file with that name. This is suboptimal for three main 
> reasons:
>  * Wasted API calls, since the client is getting metadata for each path 
> component 
>  * This can cause *major* problems with buckets whose permissions are being 
> managed by IAM, where access may not be granted to the root bucket, but only 
> to some prefix. When you call {{mkdirs}}, even on a prefix that you have 
> access to, the traversal up the path will cause you to eventually hit the 
> root bucket, which will fail with a 403 - even though the directory creation 
> call would have succeeded.
>  * Some people might actually have a file that matches some other file's 
> prefix... I can't see why they would want to do that, but it's not against 
> S3's rules.
> I've opened a pull request with a simple patch that just removes this portion 
> of the check. I have tested it with my team's instance of Spark + Luigi, and 
> can confirm it works, and resolves the aforementioned permissions issue for a 
> bucket on which we only had prefix access.
> This is my first ticket/pull request against Hadoop, so let me know if I'm 
> not following some convention properly :)
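
For readers unfamiliar with the flat-keyspace point made in the description above, here is a hypothetical sketch (an in-memory {{TreeMap}} standing in for a bucket; not the S3 API) of how a key {{a/b}} and objects under the prefix {{a/b/c/}} coexist without conflict at the storage layer:

{code}
import java.util.TreeMap;

public class FlatKeyspaceDemo {
    public static void main(String[] args) {
        // Hypothetical stand-in for an S3 bucket: keys are plain
        // strings, and "directories" exist only as shared prefixes.
        TreeMap<String, byte[]> bucket = new TreeMap<>();
        bucket.put("a/b", new byte[1024]);          // an object named a/b
        bucket.put("a/b/c/data.txt", new byte[16]); // an object under prefix a/b/c/
        // Both keys coexist; S3 itself imposes no file-vs-directory
        // conflict.  The conflict exists only in the HCFS contract
        // that S3A layers on top.
        System.out.println("keys under a/: "
                + bucket.subMap("a/", "a0").keySet());
    }
}
{code}

The HCFS contract, not S3, is what forbids this layout, which is why the -1 above stands even though the patch is valid from a pure S3 perspective.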



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
