[ 
https://issues.apache.org/jira/browse/HADOOP-11361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366173#comment-15366173
 ] 

Tsuyoshi Ozawa commented on HADOOP-11361:
-----------------------------------------

Thank you for the explanation and the reviews. I took a deeper look. I think we 
should check the overall semantics of MetricsSourceAdapter instead of applying a 
workaround. 

First, I suspect that {{getMetrics}} has a semantic bug. The following 
condition checks whether {{infoCache}} should be updated. 
{code}
      if (lastRecs == null && jmxCacheTS == 0) {
        all = true; // Get all the metrics to populate the sink caches
      }
{code}

{{infoCache}} should be updated in the following cases:
1. After {{updateAttrCache}} is called. This is expressed as 
{{lastRecs == null}}.
2. Before initialization is done, that is, before {{updateJmxCache}} is 
called. This is expressed as {{jmxCacheTS == 0}}. 

I think these conditions should be combined with {{OR}}, not {{AND}}, so it 
can be fixed as follows:

{code}
      if (lastRecs == null || jmxCacheTS == 0) {
        all = true; // Get all the metrics to populate the sink caches
      }
{code}

What do you think?
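
To make the difference concrete, here is a minimal sketch that compares the two conditions. The class and method names are hypothetical and this is not the Hadoop source; it only assumes that {{lastRecs}} is an object reference and {{jmxCacheTS}} is a long timestamp.

```java
// Hypothetical stand-in for the check in getMetrics; not the Hadoop source.
final class InfoCacheCheck {
    // Current condition: both signals must hold at the same time.
    static boolean needsAllWithAnd(Object lastRecs, long jmxCacheTS) {
        return lastRecs == null && jmxCacheTS == 0;
    }

    // Proposed condition: either signal alone is enough.
    static boolean needsAllWithOr(Object lastRecs, long jmxCacheTS) {
        return lastRecs == null || jmxCacheTS == 0;
    }

    public static void main(String[] args) {
        Object recs = new Object();
        // Case 1: lastRecs is null, but the JMX cache already has a timestamp.
        System.out.println("case 1: AND=" + needsAllWithAnd(null, 42L)
            + " OR=" + needsAllWithOr(null, 42L));
        // Case 2: lastRecs is set, but the JMX cache is not initialized yet.
        System.out.println("case 2: AND=" + needsAllWithAnd(recs, 0L)
            + " OR=" + needsAllWithOr(recs, 0L));
    }
}
```

With {{AND}}, each case taken alone does not set {{all}}; with {{OR}}, both do.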

Next, the NPE-related problem:

{quote}
Race condition is there between two threads calling updateJmxCache() at same 
time.
{quote}

You're right. The v3 patch fixed the race condition, but it introduced a 
deadlock between JMXJsonServlet and the ResourceManager's MetricsSystem, as 
Jason mentioned on HADOOP-12594:

{quote}
The timer thread has the MetricsSystemImpl lock and is trying to grab the 
MetricsSourceAdapter lock. In the meantime the JMX thread has the 
MetricsSourceAdapter lock and is trying to grab the MetricsSystemImpl lock. The 
locking order isn't consistent so we deadlocked.
{quote}
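
As a general illustration of the lock-ordering point in the quote, here is a minimal sketch in which both paths take two locks in the same order, so no cycle can form. This is not the patch under review and not Hadoop code; the class, fields, and methods are hypothetical stand-ins for the two monitors involved.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of consistent lock ordering; not Hadoop code.
final class LockOrderSketch {
    private final Object systemLock = new Object();  // stands in for the MetricsSystemImpl lock
    private final Object adapterLock = new Object(); // stands in for the MetricsSourceAdapter lock
    private int snapshots;
    private int reads;

    // Timer path: system lock first, then adapter lock.
    void timerTick() {
        synchronized (systemLock) {
            synchronized (adapterLock) {
                snapshots++;
            }
        }
    }

    // JMX path: takes the locks in the same order as the timer path.
    void jmxRead() {
        synchronized (systemLock) {
            synchronized (adapterLock) {
                reads++;
            }
        }
    }

    // Runs both paths concurrently; returns true if both finish within the timeout
    // and every iteration was counted.
    static boolean runBoth(int iterations) {
        LockOrderSketch s = new LockOrderSketch();
        CountDownLatch done = new CountDownLatch(2);
        Thread timer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.timerTick();
            done.countDown();
        });
        Thread jmx = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.jmxRead();
            done.countDown();
        });
        timer.setDaemon(true);
        jmx.setDaemon(true);
        timer.start();
        jmx.start();
        try {
            boolean finished = done.await(10, TimeUnit.SECONDS);
            return finished && s.snapshots == iterations && s.reads == iterations;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

If {{jmxRead}} took {{adapterLock}} first instead, the two threads could each hold one lock while waiting for the other, which is the situation described in the quote.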

Brahma's solution is a bit tricky, so please give me some time to check it.

> Fix a race condition in MetricsSourceAdapter.updateJmxCache
> -----------------------------------------------------------
>
>                 Key: HADOOP-11361
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11361
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.4.1, 2.5.1, 2.6.0
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>         Attachments: HADOOP-111361-003.patch, HADOOP-11361-002.patch, 
> HADOOP-11361-004.patch, HADOOP-11361.patch, HDFS-7487.patch
>
>
> {noformat}
> Caused by: java.lang.NullPointerException
>       at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateAttrCache(MetricsSourceAdapter.java:247)
>       at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:177)
>       at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getAttribute(MetricsSourceAdapter.java:102)
>       at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
> {noformat}



