[
https://issues.apache.org/jira/browse/HADOOP-13263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348581#comment-15348581
]
Wei-Chiu Chuang commented on HADOOP-13263:
------------------------------------------
Thanks [~sodonnell], this is a good idea, and thanks [~arpiagariu] for the
initial reviews.
I have a few quick comments:
[~sodonnell]
What's the purpose of {{getBackgroundRefreshSuccess()}},
{{getBackgroundRefreshException()}}, {{getBackgroundRefreshQueued()}}, and
{{getBackgroundRefreshRunning()}} in the Group class?
If they are used by tests only, they should not be {{public}} (most likely
package-private), and they should be annotated with {{@VisibleForTesting}}.
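To illustrate what I mean, here is a rough sketch. Only the getter name comes from the patch; the class {{GroupsSketch}}, the counter field, and the {{recordBackgroundRefreshSuccess()}} hook are made up for the example, and the annotation is declared locally only so the snippet compiles without Guava (real code would import {{com.google.common.annotations.VisibleForTesting}}).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.concurrent.atomic.AtomicLong;

// Local stand-in so the sketch compiles without Guava on the classpath.
// Real Hadoop code would import com.google.common.annotations.VisibleForTesting.
@Retention(RetentionPolicy.SOURCE)
@Target({ElementType.METHOD, ElementType.FIELD})
@interface VisibleForTesting {}

// Hypothetical class; only the getter name is taken from the patch.
class GroupsSketch {
  private final AtomicLong backgroundRefreshSuccess = new AtomicLong();

  // Hypothetical hook that a background reload would call on success.
  void recordBackgroundRefreshSuccess() {
    backgroundRefreshSuccess.incrementAndGet();
  }

  // No access modifier: reachable from tests in the same package only.
  @VisibleForTesting
  long getBackgroundRefreshSuccess() {
    return backgroundRefreshSuccess.get();
  }
}
```

A test class in the same package can still read the counter, while callers outside the package no longer see it as part of the public API.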
[~arpiagariu]
bq. We should add the settings to hdfs-default.xml at a minimum. I don't think
we have any site documentation for setting up group mapping.
The new properties should go into core-default.xml. And there's a
GroupsMapping.md under hadoop-common-project/hadoop-common/src/site/markdown.
It would be really nice if we could get this groups mapping resolution feature
described in this doc.
I also wonder if the new properties should be defined in
{{CommonConfigurationKeys}} instead, because {{CommonConfigurationKeysPublic}}
has a javadoc that says:
{code}
/**
 * This class contains constants for configuration keys used
 * in the common code.
 *
 * It includes all publicly documented configuration keys. In general
 * this class should not be used directly (use CommonConfigurationKeys
 * instead)
 *
 */
{code}
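For illustration only, the two new keys might be declared along these lines. The property names and default values are the ones given in the issue description; the constant names, the holder class {{GroupCacheKeysSketch}}, and the helper that reads from {{java.util.Properties}} are assumptions of mine and not taken from the patch.

```java
import java.util.Properties;

// Hypothetical stand-in for a keys class; constant names are assumed.
final class GroupCacheKeysSketch {
  static final String HADOOP_SECURITY_GROUPS_CACHE_BACKGROUND_RELOAD =
      "hadoop.security.groups.cache.background.reload";
  static final boolean HADOOP_SECURITY_GROUPS_CACHE_BACKGROUND_RELOAD_DEFAULT =
      false;
  static final String HADOOP_SECURITY_GROUPS_CACHE_BACKGROUND_RELOAD_THREADS =
      "hadoop.security.groups.cache.background.reload.threads";
  static final int HADOOP_SECURITY_GROUPS_CACHE_BACKGROUND_RELOAD_THREADS_DEFAULT =
      1;

  private GroupCacheKeysSketch() {}

  // Reads the flag from plain Properties; Hadoop code would go through
  // its own Configuration object instead.
  static boolean isBackgroundReloadEnabled(Properties props) {
    String raw = props.getProperty(
        HADOOP_SECURITY_GROUPS_CACHE_BACKGROUND_RELOAD);
    return raw == null
        ? HADOOP_SECURITY_GROUPS_CACHE_BACKGROUND_RELOAD_DEFAULT
        : Boolean.parseBoolean(raw.trim());
  }
}
```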
> Reload cached groups in background after expiry
> -----------------------------------------------
>
> Key: HADOOP-13263
> URL: https://issues.apache.org/jira/browse/HADOOP-13263
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Attachments: HADOOP-13263.001.patch, HADOOP-13263.002.patch,
> HADOOP-13263.003.patch, HADOOP-13263.004.patch, HADOOP-13263.005.patch,
> HADOOP-13263.006.patch
>
>
> In HADOOP-11238 the Guava cache was introduced to allow refreshes on the
> Namenode group cache to run in the background, avoiding many slow group
> lookups. Even with this change, I have seen quite a few clusters with issues
> due to slow group lookups. The problem is most prevalent in HA clusters,
> where a slow group lookup on the hdfs user can fail to return for over 45
> seconds causing the Failover Controller to kill it.
> The way the current Guava cache implementation works is approximately:
> 1) On initial load, the first thread to request groups for a given user
> blocks until it returns. Any subsequent threads requesting that user block
> until that first thread populates the cache.
> 2) When the key expires, the first thread to hit the cache after expiry
> blocks. While it is blocked, other threads will return the old value.
> I feel it is this blocking thread that still gives the Namenode issues on
> slow group lookups. If the call from the FC is the one that blocks and
> lookups are slow, it can cause the NN to be killed.
> Guava has the ability to refresh expired keys completely in the background,
> where the first thread that hits an expired key schedules a background cache
> reload, but still returns the old value. Then the cache is eventually
> updated. This patch introduces this background reload feature. There are two
> new parameters:
> 1) hadoop.security.groups.cache.background.reload - default false to keep the
> current behaviour. Set to true to enable a small thread pool and background
> refresh for expired keys
> 2) hadoop.security.groups.cache.background.reload.threads - only relevant if
> the above is set to true. Controls how many threads are in the background
> refresh pool. Default is 1, which is likely to be enough.
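
The background reload behaviour described above can be sketched with plain java.util.concurrent. This is neither the Guava API nor the code in the patch, just a minimal stand-in under my own assumptions: the class {{StaleWhileReloadCache}}, its constructor parameters, and the loader function are all hypothetical. (In Guava, as far as I understand it, the equivalent is {{refreshAfterWrite}} combined with a {{CacheLoader.reload}} that does its work asynchronously.)

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

// Hypothetical illustration only; not the Guava API and not the patch code.
class StaleWhileReloadCache<K, V> {
  private static final class Entry<V> {
    final V value;
    final long loadedAtNanos;
    final AtomicBoolean reloadQueued = new AtomicBoolean(false);

    Entry(V value, long loadedAtNanos) {
      this.value = value;
      this.loadedAtNanos = loadedAtNanos;
    }
  }

  private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
  private final Function<K, V> loader;
  private final long expiryNanos;
  private final ExecutorService reloadPool;

  StaleWhileReloadCache(Function<K, V> loader, long expiryMillis,
      int reloadThreads) {
    this.loader = loader;
    this.expiryNanos = TimeUnit.MILLISECONDS.toNanos(expiryMillis);
    this.reloadPool = Executors.newFixedThreadPool(reloadThreads);
  }

  V get(K key) {
    // Initial load: callers block until the first value is available.
    Entry<V> e = map.computeIfAbsent(key,
        k -> new Entry<>(loader.apply(k), System.nanoTime()));
    boolean expired = System.nanoTime() - e.loadedAtNanos >= expiryNanos;
    // Only the first caller to see the expired entry queues a reload.
    if (expired && e.reloadQueued.compareAndSet(false, true)) {
      reloadPool.submit(() -> {
        try {
          map.put(key, new Entry<>(loader.apply(key), System.nanoTime()));
        } catch (RuntimeException ex) {
          // Keep the old value and let a later caller retry the reload.
          e.reloadQueued.set(false);
        }
      });
    }
    // Possibly stale, but the caller never waits for a reload.
    return e.value;
  }

  void shutdown() {
    reloadPool.shutdown();
  }
}
```

The point of the design is that a slow lookup after expiry only ever ties up a pool thread; every caller, including the one that notices the expiry, gets the old value back immediately.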