[ 
https://issues.apache.org/jira/browse/HADOOP-14104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HADOOP-14104:
------------------------------------
    Attachment: HADOOP-14104-trunk-v2.patch

This patch addresses most of the previous comments.
What changed in this patch compared to the previous one?
1. During job submission, a new secret is added to the credentials' secret map:
DFS-KMS-<namenodeUri> --> keyProviderUri
This mapping captures the namenode's key provider URI at the time of job 
submission.
Every task that is scheduled to run will contact this key provider URI to 
decrypt EDEKs.
2. The key provider URI will be resolved in the following order:
- the credentials secret map,
- the namenode, via server defaults,
- the local conf.
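
The resolution order above can be sketched roughly as follows. This is a plain-Java illustration, not the patch code: a {{Map}} stands in for the UGI's {{Credentials}} secret map, and the class name, method names, and the {{DFS-KMS-}} alias prefix are assumptions taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the key provider URI resolution order:
 * 1. credentials secret map (captured at job submission),
 * 2. namenode server defaults,
 * 3. local conf.
 */
public class KeyProviderUriResolver {

  static final String ALIAS_PREFIX = "DFS-KMS-";

  /** Returns null when no key provider is configured anywhere. */
  public static String resolve(Map<String, String> credentials,
                               String namenodeUri,
                               String serverDefaultsUri,
                               String localConfUri) {
    String fromCreds = credentials.get(ALIAS_PREFIX + namenodeUri);
    if (fromCreds != null) {
      return fromCreds;            // captured at job submission time
    }
    if (serverDefaultsUri != null) {
      return serverDefaultsUri;    // one getServerDefaults RPC
    }
    return localConfUri;           // may be null: encryption disabled
  }

  public static void main(String[] args) {
    Map<String, String> creds = new HashMap<>();
    creds.put(ALIAS_PREFIX + "hdfs://nn1", "kms://http@kms1:9600/kms");
    // The credentials entry wins over both fallbacks.
    System.out.println(resolve(creds, "hdfs://nn1",
        "kms://http@kms2:9600/kms", "kms://http@kms3:9600/kms"));
  }
}
```

The point of this ordering is that a running task normally stops at step 1 and never issues the {{getServerDefaults}} RPC itself.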

Previous concerns:
1. From [~andrew.wang]
bq. I like that getServerDefaults is lock-free, but I'm still worried about the 
overhead. 
Namenode#getServerDefaults will be queried only once, at the time of job 
submission.

2. From [~yzhangal]
{quote}
Currently getServerDefaults() contact NN every hour, to find if there is any 
update of keyprovider. If keyprovider changed within the hour,
client code may get into exception, wonder if we have mechanism to handle the 
exception and update the keyprovider and try again?
{quote}
This is a very good question which I hadn't considered while writing the 
previous patch. Thanks!
We glue the namenode URI to the key provider URI at the time of job submission 
and persist the mapping in the UGI's credentials object.
The task will find it in the credentials object and no longer needs to contact 
the namenode.
If the key provider is updated (hardware replacement or being put into 
maintenance mode), we plan to keep the old provider in decommission mode for 7 
days.
So all the tokens that were given out while it was still active will remain 
valid for 7 days, and the new key provider will issue tokens for newly 
submitted jobs.
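
The "glue" step at job submission might look roughly like this. Again a plain-Java sketch under stated assumptions: a {{Map}} stands in for the {{Credentials}} secret map, and the class name, method name, and alias format are illustrative, not the actual patch code.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: capture the namenode's key provider URI at job submission. */
public class KeyProviderGlue {

  static final String ALIAS_PREFIX = "DFS-KMS-";

  /**
   * Called once per namenode at job submission time. Records the
   * provider URI under DFS-KMS-<namenodeUri> so tasks read it from the
   * credentials instead of calling getServerDefaults themselves.
   * Returns true if a new mapping was recorded.
   */
  public static boolean captureProviderUri(Map<String, String> credentials,
                                           String namenodeUri,
                                           String providerUri) {
    String alias = ALIAS_PREFIX + namenodeUri;
    if (providerUri == null || credentials.containsKey(alias)) {
      return false;  // nothing to record, or already captured
    }
    credentials.put(alias, providerUri);
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> creds = new HashMap<>();
    captureProviderUri(creds, "hdfs://nn1", "kms://http@kms1:9600/kms");
    System.out.println(creds);
  }
}
```

Because the mapping is frozen at submission time, a provider change mid-job does not break running tasks; they keep talking to the provider that issued their tokens for the 7-day decommission window.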

I tried to incorporate all the previous comments in the current patch (v2), 
but let me know if I missed any.

I need one suggestion.

{code:title=DFSClient.java|borderStyle=solid}

  public boolean isHDFSEncryptionEnabled() {
    try {
      return DFSUtilClient.isHDFSEncryptionEnabled(getKeyProviderUri());
    } catch (IOException ioe) {
      // getKeyProviderUri() calls ClientProtocol#getServerDefaults, which
      // can throw a StandbyException; swallow it and report "disabled".
      return false;
    }
  }
{code}
{{getKeyProviderUri}} calls NamenodeRpcServer#getServerDefaults, which can 
throw a StandbyException, in which case I am returning false.
I don't know what the right thing to do is.
{{DFSClient.isHDFSEncryptionEnabled()}} is called by 
{{DistributedFileSystem.getTrashRoot(Path path)}}, which doesn't throw any 
IOException, so I need to make some decision when an exception is encountered.
Your help is much appreciated.
Please review.

> Client should always ask namenode for kms provider path.
> --------------------------------------------------------
>
>                 Key: HADOOP-14104
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14104
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: kms
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>         Attachments: HADOOP-14104-trunk.patch, HADOOP-14104-trunk-v1.patch, 
> HADOOP-14104-trunk-v2.patch
>
>
> According to the current implementation, the kms provider comes from the 
> client conf, so there can be only one KMS.
> In a multi-cluster environment, if a client is reading encrypted data from 
> multiple clusters, it will only get a KMS token for the local cluster.
> Not sure whether the target version is correct or not.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
