[ 
https://issues.apache.org/jira/browse/HADOOP-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15452379#comment-15452379
 ] 

Jason Lowe commented on HADOOP-11223:
-------------------------------------

I recently ran across this same issue with a user complaining about 
unnecessarily slow localization.  It takes us at least 3 seconds to localize 
anything because that's the minimum startup time of the localizer program.  
Some of this slowness comes from loading almost 4000 classes (!!) during 
localization and from needing to look up every known FileSystem before any 
FileSystem can be instantiated, but the largest problem is redundant 
processing in Configuration.

The problem with copying a cached configuration object is the 
addDefaultResource behavior.  Calling it invalidates every configuration 
object and causes each of them to reload the XML files _separately_.  So if we 
have a cached object that is never used directly and only serves as the source 
for copies, we'll always copy a conf that hasn't loaded the XML, and each copy 
will then load the XML the first time a property is accessed on it.
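
To make the copy problem concrete, here is a toy model of the behavior 
described above.  It is only a sketch: ToyConf, its registry, and parseCount 
are hypothetical stand-ins and not the real Configuration class, the "XML 
parse" is simulated by a counter, and thread safety is ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a lazily loaded configuration; NOT the Hadoop class.
class ToyConf {
    // Every instance ever created, so addDefaultResource can invalidate them all.
    // (Strong references are fine for a demo; a real registry would use weak ones.)
    private static final List<ToyConf> REGISTRY = new ArrayList<>();
    private static final List<String> DEFAULT_RESOURCES = new ArrayList<>();
    static int parseCount = 0;            // number of simulated XML loads

    private Map<String, String> props;    // null means "not loaded yet"

    ToyConf() {
        REGISTRY.add(this);
    }

    // Copy constructor: copies whatever is loaded, which may be nothing at all.
    ToyConf(ToyConf other) {
        this();
        this.props = (other.props == null) ? null : new HashMap<>(other.props);
    }

    // Adding a default resource throws away the loaded state of EVERY instance.
    static void addDefaultResource(String name) {
        DEFAULT_RESOURCES.add(name);
        for (ToyConf c : REGISTRY) {
            c.props = null;
        }
    }

    // First access after creation or invalidation triggers a full reload.
    String get(String key) {
        if (props == null) {
            props = loadAll();
        }
        return props.get(key);
    }

    private static Map<String, String> loadAll() {
        parseCount++;                     // stands in for parsing all XML files
        Map<String, String> m = new HashMap<>();
        for (String r : DEFAULT_RESOURCES) {
            m.put(r + ".loaded", "true");
        }
        return m;
    }
}

class ReloadDemo {
    // Copies a cached conf that is never read directly and counts the loads.
    static int copiesFromUnloadedCache(int n) {
        ToyConf cached = new ToyConf();
        int before = ToyConf.parseCount;
        for (int i = 0; i < n; i++) {
            new ToyConf(cached).get("core-default.xml.loaded");
        }
        return ToyConf.parseCount - before;
    }

    public static void main(String[] args) {
        ToyConf.addDefaultResource("core-default.xml");
        System.out.println(copiesFromUnloadedCache(3)); // prints 3
    }
}
```

In this model each of the three copies pays for its own load because the 
cached object it was copied from never loaded anything.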

Given that addDefaultResource updates every instantiated conf object 
whether people like it or not, I don't see why we can't reuse a conf object for 
the straightforward cases like static code blocks, record readers, codec 
factories, etc.  We can wrap the singleton instance with a subclass that 
throws UnsupportedOperationException on set methods.  As for 
addDefaultResource, IMHO we can block it or not: even if we block it on 
the singleton, a call made through any other conf will still update every 
instance, the singleton included.  And that's the way it used to work, so 
arguably we're preserving the old behavior by allowing addDefaultResource to 
update the shared singleton instance.
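
A minimal sketch of the read-only wrapper idea, assuming a simplified base 
class.  SimpleConf and ReadOnlyConf are hypothetical names used for 
illustration only; the getDefault() name follows the issue description, and 
the single preloaded property merely stands in for the loaded XML defaults.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified mutable configuration; a stand-in, NOT the Hadoop class.
class SimpleConf {
    private final Map<String, String> props = new HashMap<>();

    String get(String key) {
        return props.get(key);
    }

    void set(String key, String value) {
        props.put(key, value);
    }

    void unset(String key) {
        props.remove(key);
    }
}

// Shared singleton whose mutators are locked down.
final class ReadOnlyConf extends SimpleConf {
    private static final ReadOnlyConf INSTANCE = new ReadOnlyConf();

    private ReadOnlyConf() {
        // Populate through the superclass so the overrides below are bypassed.
        // This value is a placeholder for whatever the defaults would load.
        super.set("io.file.buffer.size", "4096");
    }

    static ReadOnlyConf getDefault() {
        return INSTANCE;
    }

    @Override
    void set(String key, String value) {
        throw new UnsupportedOperationException("shared conf is read-only");
    }

    @Override
    void unset(String key) {
        throw new UnsupportedOperationException("shared conf is read-only");
    }
}

class ReadOnlyDemo {
    public static void main(String[] args) {
        SimpleConf conf = ReadOnlyConf.getDefault();
        System.out.println(conf.get("io.file.buffer.size"));
        try {
            conf.set("io.file.buffer.size", "8192");
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Callers that only read, such as static initializers, could share the one 
instance, while anything that needs to mutate would still create its own 
conf.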

The only downside I see once we lock down the set methods is that we 
could theoretically break some use-case where the XML files were updated on 
disk and the code really needed to see the fresh copy.  However, I don't think 
that will be the case for the instances within the Hadoop core code that will 
be updated to use this new singleton instance.  I'm not sure if there are other 
scenarios that could break where one plays games with the classloader to switch 
which XML files will be seen.  I'm assuming that if classloading games are 
being played then we're going to get multiple Configuration classes loaded, 
each with its own singleton instance.

Bonus points if we can eliminate the requirement that each conf object has to 
separately load all resources after addDefaultResource.  It'd be nice if one 
conf object did the load and we poked the result into the others, much as we 
poke the others when addDefaultResource is called.  It would also be nice if we 
only loaded the newly added resource rather than reloading all resources.  
However, that's really orthogonal to the shared, read-only instance idea and 
more appropriate for a separate JIRA.
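
One way the "load once, load only what is new" idea could look, as a rough 
sketch under assumptions: SharedDefaults and its members are hypothetical, 
parsing is simulated by a counter, and property overriding rules between 
resources are reduced to a plain putAll.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shared cache of parsed default resources.
class SharedDefaults {
    private static final List<String> RESOURCES = new ArrayList<>();
    private static final Map<String, String> MERGED = new LinkedHashMap<>();
    private static int parsedUpTo = 0;   // index of the first unparsed resource
    static int parseCount = 0;           // number of simulated per-file parses

    static synchronized void addDefaultResource(String name) {
        if (!RESOURCES.contains(name)) {
            RESOURCES.add(name);         // nothing already parsed is discarded
        }
    }

    // Any conf can ask for the defaults; only new resources get parsed.
    static synchronized Map<String, String> snapshot() {
        while (parsedUpTo < RESOURCES.size()) {
            MERGED.putAll(parse(RESOURCES.get(parsedUpTo)));
            parsedUpTo++;
        }
        return new LinkedHashMap<>(MERGED);  // private copy for the caller
    }

    private static Map<String, String> parse(String resource) {
        parseCount++;                    // stands in for parsing one XML file
        Map<String, String> m = new LinkedHashMap<>();
        m.put(resource + ".loaded", "true");
        return m;
    }
}

class SharedDefaultsDemo {
    public static void main(String[] args) {
        SharedDefaults.addDefaultResource("core-default.xml");
        SharedDefaults.snapshot();
        SharedDefaults.snapshot();       // served from the cache, no new parse
        SharedDefaults.addDefaultResource("yarn-default.xml");
        SharedDefaults.snapshot();       // parses only the newly added file
        System.out.println(SharedDefaults.parseCount); // prints 2
    }
}
```

Here three lookups across two resources cost two parses in total, instead of 
one full reload per conf object per invalidation.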


> Offer a read-only conf alternative to new Configuration()
> ---------------------------------------------------------
>
>                 Key: HADOOP-11223
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11223
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: conf
>            Reporter: Gopal V
>            Assignee: Varun Saxena
>              Labels: Performance
>         Attachments: HADOOP-11223.001.patch
>
>
> new Configuration() is called from several static blocks across Hadoop.
> This is incredibly inefficient, since each one of those involves primarily 
> XML parsing at a point where the JIT won't be triggered & interpreter mode is 
> essentially forced on the JVM.
> The alternate solution would be to offer a {{Configuration::getDefault()}} 
> alternative which disallows any modifications.
> At the very least, such a method would need to be called from 
> # org.apache.hadoop.io.nativeio.NativeIO::<clinit>()
> # org.apache.hadoop.security.SecurityUtil::<clinit>()
> # org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider::<clinit>()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
