[
https://issues.apache.org/jira/browse/HADOOP-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361880#comment-15361880
]
Yongjun Zhang commented on HADOOP-13339:
----------------------------------------
Hi [~ozawa],
Thanks for pointing me to HADOOP-11361.
I was about to update here:
Per the trace stack I'm seeing:
{code}
Caused by: java.lang.NullPointerException
at
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateAttrCache(MetricsSourceAdapter.java:260)
at
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:184)
at
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:155)
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBeanInfo(DefaultMBeanServerInterceptor.java:1378)
{code}
The relevant code is
{code}
private void updateJmxCache() {
boolean getAllMetrics = false;
synchronized(this) {
if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
// temporarilly advance the expiry while updating the cache
jmxCacheTS = Time.now() + jmxCacheTTL;
// lastRecs might have been set to an object already by another thread.
// Track the fact that lastRecs has been reset once to make sure refresh
// is correctly triggered.
if (lastRecsCleared) {
getAllMetrics = true;
lastRecsCleared = false;
}
}
else {
return;
}
}
if (getAllMetrics) {
MetricsCollectorImpl builder = new MetricsCollectorImpl();
getMetrics(builder, true); <== This is where lastRecs is created/assigned
non-NULL
}
synchronized(this) {
updateAttrCache(); <=== This is where the NPE happened
if (getAllMetrics) {
updateInfoCache();
}
jmxCacheTS = Time.now();
lastRecs = null; // in case regular interval update is not running <==
This is where lastRecs assigned NULL
lastRecsCleared = true;
}
}
{code}
If one thread enters the above method with {{Time.now() - jmxCacheTS >=
jmxCacheTTL}} being false, then {{getAllMetrics}} will continue to be false,
then when {{updateAttrCache()}} is called, it will hit NULL {{lastRecs}}
Even a single thread like this would hit the issue. So it doesn't seem a pure
synchronization issue. And the code prior to HADOOP-12482 appear to have the
same issue.
Brief look at HADOOP-12482, the logic seems fine, except the hole pointed out
here. While we need to examine the synchronization a bit further, have a
checking in MetricsSourceAdapter#updateAttrCache (so not to access it when
lastRecs is NULL) seems reasonable.
What do you think?
Thanks.
> MetricsSourceAdapter#updateAttrCache may throw NPE due to NULL lastRecs
> -----------------------------------------------------------------------
>
> Key: HADOOP-13339
> URL: https://issues.apache.org/jira/browse/HADOOP-13339
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
>
> The for loop below may find lastRecs NULL
> {code}
> private int updateAttrCache() {
> LOG.debug("Updating attr cache...");
> int recNo = 0;
> int numMetrics = 0;
> for (MetricsRecordImpl record : lastRecs) {
> for (MetricsTag t : record.tags()) {
> setAttrCacheTag(t, recNo);
> ++numMetrics;
> }
> for (AbstractMetric m : record.metrics()) {
> setAttrCacheMetric(m, recNo);
> ++numMetrics;
> }
> ++recNo;
> }
> LOG.debug("Done. # tags & metrics="+ numMetrics);
> return numMetrics;
> }
> {code}
> and throws NPE.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]