[jira] [Commented] (HADOOP-13263) Reload cached groups in background after expiry

Arpit Agarwal (JIRA) Tue, 14 Jun 2016 15:56:38 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-13263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330817#comment-15330817
 ]


Arpit Agarwal commented on HADOOP-13263:
----------------------------------------

The patch looks good, thanks for contributing it.

A few comments:
# We should set keepAliveTime for thread pool threads so we don't always pay 
the fixed cost. e.g. [via this 
constructor|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html#ThreadPoolExecutor(int,%20int,%20long,%20java.util.concurrent.TimeUnit,%20java.util.concurrent.BlockingQueue)].
 I'd suggest 
[newCachedThreadPool|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executors.html#newCachedThreadPool()]
 but it doesn't appear to support maximum number of threads.
# Can you avoid using the static block in {{GroupCacheLoader}}? The thread pool 
can be instantiated at the first async reload.
# Perhaps the default should be 3-5 threads instead of 1. With keepAliveTime 
these threads won't be using up memory all the time.
# I don't think there can be a generic way for {{reload}} to timeout the 
{{Future}} since we don't require the {{GroupMappingServiceProvider}} 
implementations to support one so we can remove this comment.
{code}
     // call? If so, do we need a way of timing out the future
     // so all the threads don't get blocked?
{code}

Couple of counters Groups can expose now or later in another. These could be 
NameNode metrics later.
# The number of pending cache reloads blocked on executors i.e. counter 
incremented just before {{executorService#submit}} and decremented when the 
call is invoked.
# Number of load operations currently in progress.

> Reload cached groups in background after expiry
> -----------------------------------------------
>
>                 Key: HADOOP-13263
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13263
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>         Attachments: HADOOP-13263.001.patch
>
>
> In HADOOP-11238 the Guava cache was introduced to allow refreshes on the 
> Namenode group cache to run in the background, avoiding many slow group 
> lookups. Even with this change, I have seen quite a few clusters with issues 
> due to slow group lookups. The problem is most prevalent in HA clusters, 
> where a slow group lookup on the hdfs user can fail to return for over 45 
> seconds causing the Failover Controller to kill it.
> The way the current Guava cache implementation works is approximately:
> 1) On initial load, the first thread to request groups for a given user 
> blocks until it returns. Any subsequent threads requesting that user block 
> until that first thread populates the cache.
> 2) When the key expires, the first thread to hit the cache after expiry 
> blocks. While it is blocked, other threads will return the old value.
> I feel it is this blocking thread that still gives the Namenode issues on 
> slow group lookups. If the call from the FC is the one that blocks and 
> lookups are slow, if can cause the NN to be killed.
> Guava has the ability to refresh expired keys completely in the background, 
> where the first thread that hits an expired key schedules a background cache 
> reload, but still returns the old value. Then the cache is eventually 
> updated. This patch introduces this background reload feature. There are two 
> new parameters:
> 1) hadoop.security.groups.cache.background.reload - default false to keep the 
> current behaviour. Set to true to enable a small thread pool and background 
> refresh for expired keys
> 2) hadoop.security.groups.cache.background.reload.threads - only relevant if 
> the above is set to true. Controls how many threads are in the background 
> refresh pool. Default is 1, which is likely to be enough.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (HADOOP-13263) Reload cached groups in background after expiry

Reply via email to