[
https://issues.apache.org/jira/browse/HADOOP-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15452379#comment-15452379
]
Jason Lowe commented on HADOOP-11223:
-------------------------------------
I recently ran across this same issue with a user complaining about
unnecessarily slow localization. It takes us at least 3 seconds to localize
anything because that's the minimum startup time of the localizer program.
Some of this slowness is from loading almost 4000 classes (!!) during
localization and needing to look up every known FileSystem before any FileSystem
can be instantiated, but the largest problem is redundant processing in
Configuration.
The problem with copying a cached configuration object is the
addDefaultResource behavior. Calling it invalidates every configuration object
and causes each of them to reload the XML files _separately_. So if we have a
cached object that never gets used directly but only gets copied, we'll always
copy a conf that hasn't loaded the XML, and each copy will then load the XML
the first time a property is accessed on it.
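To make the copy problem concrete, here is a minimal toy model. ToyConf, its
registry, and parseCount are hypothetical stand-ins, not Hadoop's Configuration
class; the only behavior assumed from the real class is the one described
above: properties load lazily, a copy only inherits properties that are
already loaded, and addDefaultResource invalidates every live instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of the lazy-load and invalidation behavior. */
class ToyConf {
    private static final List<ToyConf> REGISTRY = new ArrayList<>();
    private static final List<String> DEFAULT_RESOURCES = new ArrayList<>();
    /** Counts simulated XML parses so the redundant work is visible. */
    static int parseCount = 0;

    /** null means "not loaded yet" or "invalidated". */
    private Map<String, String> props;

    ToyConf() {
        synchronized (ToyConf.class) {
            REGISTRY.add(this);
        }
    }

    /** Copy: inherits loaded properties, or nothing if other is unloaded. */
    ToyConf(ToyConf other) {
        this();
        synchronized (ToyConf.class) {
            if (other.props != null) {
                props = new HashMap<>(other.props);
            }
        }
    }

    /** Adds a resource and invalidates every instance, wanted or not. */
    static synchronized void addDefaultResource(String name) {
        DEFAULT_RESOURCES.add(name);
        for (ToyConf conf : REGISTRY) {
            conf.props = null;
        }
    }

    /** First access after creation or invalidation triggers a full load. */
    String get(String key) {
        synchronized (ToyConf.class) {
            if (props == null) {
                props = loadAll();
            }
            return props.get(key);
        }
    }

    /** Stand-in for parsing every default XML resource. */
    private static Map<String, String> loadAll() {
        parseCount++;
        Map<String, String> loaded = new HashMap<>();
        for (String resource : DEFAULT_RESOURCES) {
            loaded.put("loaded." + resource, "true");
        }
        return loaded;
    }
}
```

In this model, copying a never-accessed cached ToyConf three times and reading
one property from each copy costs three full loads. Reading a property from
the cached object first brings that down to one load, but only until the next
addDefaultResource call invalidates everything again.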
Given that addDefaultResource is updating every instantiated conf object
whether people like it or not, I don't see why we can't reuse a conf object for
the straightforward cases like static code blocks, record readers, codec
factories, etc. We can wrap the singleton instance with a subclass that
throws UnsupportedOperationException on set methods. As for
addDefaultResource, IMHO we can block it or not: even if we block it on the
singleton, calling it on any other conf instance will still update all the
others, the singleton included. And that's the way it used to work, so arguably
we're preserving the old behavior by allowing addDefaultResource to update the
shared singleton instance.
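A rough sketch of the read-only wrapper idea follows. MiniConf is a
hypothetical stand-in for Configuration reduced to get/set, and ReadOnlyConf
and loadDefaults are illustrative names; only getDefault() comes from the
issue description. The hard-coded property stands in for the XML parsing the
real class would do.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for Configuration, reduced to get and set. */
class MiniConf {
    private final Map<String, String> props;

    MiniConf(Map<String, String> initial) {
        this.props = new HashMap<>(initial);
    }

    String get(String key) {
        return props.get(key);
    }

    void set(String key, String value) {
        props.put(key, value);
    }
}

/** Shared singleton whose set methods reject all modifications. */
final class ReadOnlyConf extends MiniConf {
    private static final ReadOnlyConf INSTANCE =
        new ReadOnlyConf(loadDefaults());

    private ReadOnlyConf(Map<String, String> defaults) {
        super(defaults);
    }

    /** One shared instance for static blocks, codec factories, and so on. */
    static ReadOnlyConf getDefault() {
        return INSTANCE;
    }

    @Override
    void set(String key, String value) {
        throw new UnsupportedOperationException(
            "shared default configuration is read-only");
    }

    /** Assumption: stands in for loading the default XML resources once. */
    private static Map<String, String> loadDefaults() {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("io.file.buffer.size", "4096");
        return defaults;
    }
}
```

Callers that only read would use ReadOnlyConf.getDefault().get(...) instead of
building their own object; anything that needs to modify settings would still
create a private, writable instance.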
The only downside I see once we lock down the set methods is that we could
theoretically break some use case where the XML files were updated on disk
and the code really needed to see the fresh copy. However, I don't think that
will be the case for the instances within the Hadoop core code that will be
updated to use this new singleton instance. I'm not sure if there are other
scenarios that could break where one plays games with the classloader to switch
which XML files will be seen. I'm assuming that if classloading games are
being played then we're going to get multiple Configuration classes loaded, each
with its own singleton instance.
Bonus points if we can eliminate the requirement that each conf object has to
separately load all resources after addDefaultResource. It'd be nice if one
conf object did the load and we poked the result into the others, the same way
we poke the others when addDefaultResource is called. It would also be nice if
we only loaded the newly added resource rather than reloading all resources.
However, that's really orthogonal to the shared, read-only instance idea and
more appropriate for a separate JIRA.
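For that follow-up idea, one possible shape (purely a sketch, not a proposed
patch) is a static cache of parsed resources that every instance reads from,
so that each resource is parsed once and adding a resource parses only the
new one. SharedDefaults, snapshot, and parseCount are hypothetical names, and
parse() stands in for real XML parsing.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical shared cache: each default resource is parsed only once. */
final class SharedDefaults {
    /** Parsed properties per resource, kept in registration order. */
    private static final Map<String, Map<String, String>> PARSED =
        new LinkedHashMap<>();
    /** Counts simulated parses so the saving is visible. */
    static int parseCount = 0;

    private SharedDefaults() {
    }

    /** Parses only the newly added resource; known resources are skipped. */
    static synchronized void addDefaultResource(String name) {
        if (!PARSED.containsKey(name)) {
            PARSED.put(name, parse(name));
        }
    }

    /** Merged view for one conf instance; later resources override earlier. */
    static synchronized Map<String, String> snapshot() {
        Map<String, String> merged = new HashMap<>();
        for (Map<String, String> resourceProps : PARSED.values()) {
            merged.putAll(resourceProps);
        }
        return merged;
    }

    /** Assumption: stands in for parsing one XML resource. */
    private static Map<String, String> parse(String name) {
        parseCount++;
        Map<String, String> props = new HashMap<>();
        props.put("loaded." + name, "true");
        return props;
    }
}
```

With this arrangement an invalidated instance would rebuild its properties
from snapshot() with no parsing at all, which covers both wishes above. It
deliberately ignores the on-disk refresh and classloader concerns raised
earlier.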
> Offer a read-only conf alternative to new Configuration()
> ---------------------------------------------------------
>
> Key: HADOOP-11223
> URL: https://issues.apache.org/jira/browse/HADOOP-11223
> Project: Hadoop Common
> Issue Type: Bug
> Components: conf
> Reporter: Gopal V
> Assignee: Varun Saxena
> Labels: Performance
> Attachments: HADOOP-11223.001.patch
>
>
> new Configuration() is called from several static blocks across Hadoop.
> This is incredibly inefficient, since each one of those involves primarily
> XML parsing at a point where the JIT won't be triggered & interpreter mode is
> essentially forced on the JVM.
> The alternate solution would be to offer a {{Configuration::getDefault()}}
> alternative which disallows any modifications.
> At the very least, such a method would need to be called from
> # org.apache.hadoop.io.nativeio.NativeIO::<clinit>()
> # org.apache.hadoop.security.SecurityUtil::<clinit>()
> # org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider::<clinit>