[jira] [Commented] (HADOOP-13278) S3AFileSystem mkdirs does not need to validate parent path components

Chris Nauroth (JIRA) Thu, 16 Jun 2016 09:39:18 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334120#comment-15334120 ]


Chris Nauroth commented on HADOOP-13278:
----------------------------------------

[~apetresc], I admit I hadn't considered interaction with IAM policies before, 
but I definitely see how this could be useful, and it's interesting to think 
about it.  Unfortunately, I don't see a viable way to satisfy the full range of 
possible authorization requirements that users have come to expect from a file 
system.

For the specific case that we started talking about here (walking up the 
ancestry to verify that there are no pre-existing files), it might work if that 
policy were changed slightly, so that the user is granted full access to 
/a/b/c/* and also read-only access to /*.  I expect read access 
would be sufficient for the ancestry-checking logic.  Of course, if you also 
want to block read access to /, then this policy wouldn't satisfy the 
requirement; it would only block write access on /.
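
To make the ancestry check concrete, here is a minimal sketch in Python against 
an in-memory stand-in for a bucket.  It is not the S3AFileSystem code: the names 
ancestors, check_ancestors, read_prefix_only and read_everywhere are hypothetical, 
and the IAM policy is reduced to a predicate on paths.  It only illustrates why 
the check needs read access on every ancestor, and nothing more than read access.

```python
from typing import Callable, List, Set


def ancestors(path: str) -> List[str]:
    """Return the ancestors of an absolute path, nearest first, ending at "/"."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        return []  # the root has no ancestors
    result = ["/" + "/".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]
    result.append("/")
    return result


def check_ancestors(path: str, files: Set[str],
                    can_read: Callable[[str], bool]) -> List[str]:
    """Walk up from `path`, probing each ancestor with a read-only lookup.

    Raises PermissionError if the (assumed) policy denies the read, and
    FileExistsError if an ancestor is a file. Returns the paths probed.
    """
    probed = []
    for parent in ancestors(path):
        if not can_read(parent):
            raise PermissionError(f"403 reading {parent}")
        probed.append(parent)
        if parent in files:
            raise FileExistsError(f"{parent} is a file")
    return probed


# Hypothetical policies, modelled as predicates on paths.
def read_prefix_only(path: str) -> bool:
    """Access granted only under /a/b/c (the policy from the issue report)."""
    return path == "/a/b/c" or path.startswith("/a/b/c/")


def read_everywhere(path: str) -> bool:
    """Read-only access on the whole bucket (the adjusted policy)."""
    return True
```

Under these assumptions, check_ancestors("/a/b/c/d", set(), read_everywhere) 
probes every ancestor up to the root without writing anything, while the same 
call with read_prefix_only fails as soon as the walk leaves /a/b/c.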

Another consideration is handling of what we call a "fake directory", which is 
a pure metadata object used to indicate the presence of an empty directory.  
For example, consider an administrator allocating a bucket, bootstrapping the 
initial /a/b/c directory structure by running mkdir, and then applying the 
policy I described above.  At this point, S3A has persisted /a/b/c to the 
bucket as a fake directory.  After the first file put, say 
/a/b/c/d, S3A no longer needs that pure metadata object to indicate the 
presence of the directory.  Instead, the directory exists implicitly via the 
existence of the file /a/b/c/d.  At that point, S3A would clean up the fake 
directory by deleting /a/b/c.  That implies the user would need to be granted 
delete access to /a/b/c itself, not just /a/b/c/*.  Now if we further consider 
the user deleting /a/b/c/d after that, then S3A needs to recreate the fake 
directory at /a/b/c, so the user is going to need put access on /a/b/c.
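
The sequence above can be sketched with a small in-memory model that logs which 
operation touches which key.  This is an illustration under stated assumptions, 
not S3A's implementation: the class FakeDirBucket and the helper parent_marker 
are hypothetical, the marker is modelled as a key with a trailing slash, and 
only the immediate parent directory is handled.

```python
from typing import List, Set, Tuple


def parent_marker(key: str) -> str:
    """Return the marker key of the parent directory, or "" for the root."""
    head, sep, _ = key.rpartition("/")
    return head + "/" if sep else ""


class FakeDirBucket:
    """In-memory bucket; keys ending in "/" model fake directory markers."""

    def __init__(self) -> None:
        self.keys: Set[str] = set()
        self.log: List[Tuple[str, str]] = []  # (operation, key) in call order

    def _put(self, key: str) -> None:
        self.keys.add(key)
        self.log.append(("PUT", key))

    def _delete(self, key: str) -> None:
        self.keys.discard(key)
        self.log.append(("DELETE", key))

    def mkdir(self, path: str) -> None:
        # An empty directory is persisted as a marker object.
        self._put(path.strip("/") + "/")

    def create_file(self, path: str) -> None:
        key = path.strip("/")
        self._put(key)
        # The file now implies the directory, so the marker is cleaned up.
        marker = parent_marker(key)
        if marker in self.keys:
            self._delete(marker)

    def delete_file(self, path: str) -> None:
        key = path.strip("/")
        self._delete(key)
        # If the directory is now empty, the marker has to be recreated.
        marker = parent_marker(key)
        if marker and not any(k.startswith(marker) for k in self.keys):
            self._put(marker)
```

Running mkdir("/a/b/c"), create_file("/a/b/c/d") and delete_file("/a/b/c/d") 
in that order leaves a log containing both a DELETE and a PUT on the marker for 
/a/b/c itself, which is why a policy scoped only to /a/b/c/* is not enough.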

bq. Is this correct? If so, I'm not sure a separate issue is needed; the use 
case would simply be unsupported and I'll have to move my S3A filesystem to a 
bucket that grants Hadoop/Spark root access.

Definitely the typical usage is to dedicate the whole bucket to persistence of 
a single S3A file system, with the understanding of the authorization 
limitations that come with that.  Anyone who has credentials to access the 
bucket effectively has full access to that whole file system.  This is a known 
limitation, and it's common to other object store file systems like WASB too.  
I'm not aware of anyone trying to use IAM policies to restrict access to a 
sub-tree.  Certainly it's not something we actively test within the project 
right now, so in that sense, it's unsupported and you'd be treading new ground.

> S3AFileSystem mkdirs does not need to validate parent path components
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-13278
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13278
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3, tools
>            Reporter: Adrian Petrescu
>            Priority: Minor
>
> According to S3 semantics, there is no conflict if a bucket contains a key 
> named {{a/b}} and also a directory named {{a/b/c}}. "Directories" in S3 are, 
> after all, nothing but prefixes.
> However, the {{mkdirs}} call in {{S3AFileSystem}} does go out of its way to 
> traverse every parent path component for the directory it's trying to create, 
> making sure there's no file with that name. This is suboptimal for three main 
> reasons:
>  * Wasted API calls, since the client is getting metadata for each path 
> component 
>  * This can cause *major* problems with buckets whose permissions are being 
> managed by IAM, where access may not be granted to the root bucket, but only 
> to some prefix. When you call {{mkdirs}}, even on a prefix that you have 
> access to, the traversal up the path will cause you to eventually hit the 
> root bucket, which will fail with a 403 - even though the directory creation 
> call would have succeeded.
>  * Some people might actually have a file that matches some other file's 
> prefix... I can't see why they would want to do that, but it's not against 
> S3's rules.
> I've opened a pull request with a simple patch that just removes this portion 
> of the check. I have tested it with my team's instance of Spark + Luigi, and 
> can confirm it works, and resolves the aforementioned permissions issue for a 
> bucket on which we only had prefix access.
> This is my first ticket/pull request against Hadoop, so let me know if I'm 
> not following some convention properly :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

