[
https://issues.apache.org/jira/browse/HADOOP-11361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366173#comment-15366173
]
Tsuyoshi Ozawa commented on HADOOP-11361:
-----------------------------------------
Thank you for the explanation and reviews. I took a deeper look. I think we
should check the overall semantics of MetricsSourceAdapter instead of applying a
workaround.
First, I suspect that {{getMetrics}} has a semantic bug. The following
condition checks whether {{infoCache}} should be updated.
{code}
if (lastRecs == null && jmxCacheTS == 0) {
  all = true; // Get all the metrics to populate the sink caches
}
{code}
{{infoCache}} should be updated in the following cases:
1. After {{updateAttrCache}} is called. This is expressed as {{lastRecs == null}}.
2. Before initialization is done, that is, before {{updateJmxCache}} is called.
This is expressed as {{jmxCacheTS == 0}}.
I think these conditions should be connected with {{OR}}, not {{AND}}, so it can
be fixed as follows:
{code}
if (lastRecs == null || jmxCacheTS == 0) {
  all = true; // Get all the metrics to populate the sink caches
}
{code}
What do you think?
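To make the difference concrete, here is a minimal standalone sketch of the two conditions. It is not the Hadoop source: the class {{InfoCacheCheck}} and its method names are hypothetical, and {{lastRecs}} is modelled as a plain {{Object}}. It only shows that the {{AND}} form misses each of the two cases when they occur separately.

```java
// Hypothetical stand-in for the check in getMetrics; not the actual Hadoop code.
final class InfoCacheCheck {

    // Current condition: fetch all metrics only when BOTH hold.
    static boolean fetchAllWithAnd(Object lastRecs, long jmxCacheTS) {
        return lastRecs == null && jmxCacheTS == 0;
    }

    // Proposed condition: fetch all metrics when EITHER holds.
    static boolean fetchAllWithOr(Object lastRecs, long jmxCacheTS) {
        return lastRecs == null || jmxCacheTS == 0;
    }

    public static void main(String[] args) {
        Object recs = new Object(); // placeholder for non-null lastRecs

        // Case 1 alone: lastRecs is null, but the JMX cache was already initialized.
        System.out.println(fetchAllWithAnd(null, 100L)); // false
        System.out.println(fetchAllWithOr(null, 100L));  // true

        // Case 2 alone: not yet initialized, but lastRecs is non-null.
        System.out.println(fetchAllWithAnd(recs, 0L));   // false
        System.out.println(fetchAllWithOr(recs, 0L));    // true
    }
}
```

Both forms agree when both conditions hold or when neither does; they differ only in the two mixed cases printed above.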
Next, the NPE-related problem:
{quote}
Race condition is there between two threads calling updateJmxCache() at same
time.
{quote}
You're right. The v3 patch fixed the race condition, but it introduced a deadlock
between JMXJsonServlet and ResourceManager's MetricsSystem, as Jason mentioned on
HADOOP-12594:
{quote}
The timer thread has the MetricsSystemImpl lock and is trying to grab the
MetricsSourceAdapter lock. In the meantime the JMX thread has the
MetricsSourceAdapter lock and is trying to grab the MetricsSystemImpl lock. The
locking order isn't consistent so we deadlocked.
{quote}
Brahma's solution is a bit tricky, so please give me some time to verify it.
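For reference, the following is a generic sketch of what a consistent locking order looks like. It is not taken from any of the attached patches and is not Brahma's approach: the class {{LockOrderSketch}}, its two locks, and the method names are all hypothetical stand-ins for the MetricsSystemImpl and MetricsSourceAdapter locks. Both paths take the "system" lock first and the "adapter" lock second, so neither thread can hold one lock while waiting for the other in the opposite order.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of consistent lock ordering; not Hadoop code.
final class LockOrderSketch {
    // Stand-ins for the MetricsSystemImpl and MetricsSourceAdapter locks.
    private final ReentrantLock systemLock = new ReentrantLock();
    private final ReentrantLock adapterLock = new ReentrantLock();
    private int updates = 0;

    // Path taken by the (hypothetical) timer thread.
    void timerPath() {
        lockInOrderAndUpdate();
    }

    // Path taken by the (hypothetical) JMX thread.
    void jmxPath() {
        lockInOrderAndUpdate();
    }

    // Every caller acquires systemLock first, then adapterLock.
    private void lockInOrderAndUpdate() {
        systemLock.lock();
        try {
            adapterLock.lock();
            try {
                updates++;
            } finally {
                adapterLock.unlock();
            }
        } finally {
            systemLock.unlock();
        }
    }

    // Runs both paths concurrently and returns the total number of updates.
    static int runBoth(int iterations) {
        LockOrderSketch sketch = new LockOrderSketch();
        Thread timer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) sketch.timerPath();
        });
        Thread jmx = new Thread(() -> {
            for (int i = 0; i < iterations; i++) sketch.jmxPath();
        });
        timer.start();
        jmx.start();
        try {
            timer.join();
            jmx.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // Safe to read without locking: join() establishes happens-before.
        return sketch.updates;
    }
}
```

If one path took adapterLock before systemLock instead, the two threads could each hold one lock and wait for the other, which is the situation Jason describes.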
> Fix a race condition in MetricsSourceAdapter.updateJmxCache
> -----------------------------------------------------------
>
> Key: HADOOP-11361
> URL: https://issues.apache.org/jira/browse/HADOOP-11361
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.4.1, 2.5.1, 2.6.0
> Reporter: Brahma Reddy Battula
> Assignee: Brahma Reddy Battula
> Attachments: HADOOP-111361-003.patch, HADOOP-11361-002.patch,
> HADOOP-11361-004.patch, HADOOP-11361.patch, HDFS-7487.patch
>
>
> {noformat}
> Caused by: java.lang.NullPointerException
> at
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateAttrCache(MetricsSourceAdapter.java:247)
> at
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:177)
> at
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getAttribute(MetricsSourceAdapter.java:102)
> at
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
> {noformat}