[
https://issues.apache.org/jira/browse/HADOOP-11321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230778#comment-14230778
]
Chris Nauroth commented on HADOOP-11321:
----------------------------------------
Unfortunately, I've discovered a snag in the pure native code approach: the
process umask. The Hadoop file system APIs have established a contract that
permissions on newly created objects are governed not by the process umask, but
rather by the {{fs.permissions.umask-mode}} configuration property. On the
local file system, this is implemented by separate syscalls for
{{creat}}/{{mkdir}} followed by {{chmod}}. This guarantees that if the caller
asks for 644, then it gets 644, even if the process umask is 077.
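
To make the two-step behavior concrete, here is a minimal Python sketch of the same syscall sequence. It is not Hadoop code: the helper names {{create_then_chmod}} and {{demo}} are hypothetical, and it assumes a POSIX local file system.

```python
import os
import stat
import tempfile


def create_then_chmod(path, mode):
    """Create a file, then chmod it so the final mode ignores the process umask."""
    # First syscall: the kernel applies the process umask to `mode` here.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_EXCL, mode)
    os.close(fd)
    created = stat.S_IMODE(os.stat(path).st_mode)
    # Second syscall: chmod is not subject to the umask.
    os.chmod(path, mode)
    final = stat.S_IMODE(os.stat(path).st_mode)
    return created, final


def demo():
    """Return (mode right after create, mode after chmod) under umask 077."""
    old_umask = os.umask(0o077)  # restrictive process umask
    try:
        with tempfile.TemporaryDirectory() as tmp:
            return create_then_chmod(os.path.join(tmp, "f"), 0o644)
    finally:
        os.umask(old_umask)  # always restore the caller's umask
```

Under these assumptions the file is 600 immediately after creation and 644 only after the {{chmod}}, which is the window the rest of this comment is concerned with.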

Actually, it's more confusing than that, because we're not consistent about it.
It looks like file creations always apply {{fs.permissions.umask-mode}}. For
directories, {{FileContext}} applies it, but {{FileSystem}} doesn't. This
means, for example, that "hadoop fs -mkdir" on the local file system is in fact
governed by the process umask.
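
For reference, "applying" a umask is just bit masking. The sketch below shows the arithmetic in Python; {{apply_umask}} is a hypothetical helper rather than a Hadoop API, and 022 and 077 are illustrative octal values only.

```python
def apply_umask(requested, umask):
    """Clear every permission bit in `requested` that is set in `umask`."""
    return requested & ~umask & 0o7777


def parse_octal(text):
    """Parse an octal permission string such as "022" (symbolic forms are not handled)."""
    return int(text, 8)
```

With a configured value of 022, a request for 666 yields 644. With a process umask of 077, the same request yields 600, which is why the two sources of truth can disagree.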

Another interesting thing I found after further experimentation is that the
problem does not reproduce for an SMB share mounted on Linux. A {{chmod}} call
"succeeds" without error, but simply does not change the permissions. The
error handling seems to be specific to the OS client. This may in fact turn
out to be a Windows-only bug, contrary to my prior statement.

I'm not aware of any Unix file/directory creation syscalls that let you bypass
the umask. That would mean achieving atomic create-and-set-permissions would
require a native {{umask(0)}} call. I'm very reluctant to do that, because we
can't predict how this might compromise existing applications, especially for
applications that use a mix of Hadoop and their own file creation calls. I
suppose another possibility is to fork another process to do its own
{{umask(0)}}, but then we'd have a lot of process creation overhead.
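
Both options can be sketched in Python to show where the cost lies. This is illustration only, not a proposed patch: the names {{zero_umask}}, {{create_exact}}, and {{create_exact_in_child}} are hypothetical, and the fork variant assumes a POSIX system.

```python
import os
from contextlib import contextmanager


@contextmanager
def zero_umask():
    """Temporarily clear the process umask. The umask is process-wide, not per-thread."""
    old = os.umask(0)
    try:
        yield
    finally:
        os.umask(old)


def create_exact(path, mode):
    """Create `path` with exactly `mode` in a single open() call."""
    with zero_umask():
        # Any other thread that creates a file in this window also sees umask 0.
        fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_EXCL, mode)
    os.close(fd)


def create_exact_in_child(path, mode):
    """Same idea, but only a forked child changes its umask (POSIX only)."""
    pid = os.fork()
    if pid == 0:
        # Child process: the umask change is invisible to the parent.
        code = 1
        try:
            os.umask(0)
            os.close(os.open(path, os.O_CREAT | os.O_WRONLY | os.O_EXCL, mode))
            code = 0
        finally:
            os._exit(code)  # never fall through into the parent's code path
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

The first variant is atomic for the file being created but briefly changes the umask for every thread in the process. The second isolates the change at the price of one fork per creation.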

Considering all of that, I'm currently pursuing a Windows-only native code
implementation, with Linux continuing to run the existing code path. I believe
this can work, because Windows does not have a process umask or anything
equivalent that would interfere with the intention of
{{fs.permissions.umask-mode}}. Unfortunately, creations on Linux would still
be subject to the race condition between {{creat}}/{{mkdir}} and {{chmod}} that
we have in today's code, but at least the situation wouldn't get any worse.
> copyToLocal cannot save a file to an SMB share unless the user has Full
> Control permissions.
> --------------------------------------------------------------------------------------------
>
> Key: HADOOP-11321
> URL: https://issues.apache.org/jira/browse/HADOOP-11321
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.6.0
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: HADOOP-11321.1.patch, HADOOP-11321.2.patch,
> winutils.tmp.patch
>
>
> In Hadoop 2, it is impossible to use {{copyToLocal}} to copy a file from HDFS
> to a destination on an SMB share. This is because in Hadoop 2, the
> {{copyToLocal}} maps to 2 underlying {{RawLocalFileSystem}} operations:
> {{create}} and {{setPermission}}. On an SMB share, the user may be
> authorized for the {{create}} but denied for the {{setPermission}}. Windows
> denies the {{WRITE_DAC}} right required by {{setPermission}} unless the user
> has Full Control permissions. Granting Full Control isn't feasible for most
> deployments, because it's insecure. This is a regression from Hadoop 1,
> where {{copyToLocal}} only did a {{create}} and didn't do a separate
> {{setPermission}}.