Hi,
On 6/8/07, Enzo Michelangeli <[EMAIL PROTECTED]> wrote:
> I have here a partial explanation of what's going wrong and causes multiple
> loading of plugin classes, but its core root still mystifies me.
>
> I traced back the multiple classloading of Plugin classes to the following
> chain of events in the org.apache.nutch.plugin package:
>
> 1. The method parsePlugin() of PluginManifestParser is unduly called more
> than once for each plugin (every time from a different instance of
> PluginManifestParser), instead of being called only once per plugin.
> 2. This causes the creation of multiple instances of PluginDescriptor
> objects, each of which instantiates its own copy of PluginClassLoader
> 3. With multiple copies of classloader, there are also multiple loadings of
> a same plugin class.
>
> The method parsePlugin() is only called (indirectly) in the constructor of
> PluginManifestParser, so this means that there are multiple instances of
> PluginManifestParser. But this object is only instantiated by the
> constructor of PluginRepository , and the latter is only invoked in its
> static get() method:
>
> public static synchronized PluginRepository get(Configuration conf) {
> PluginRepository result = CACHE.get(conf);
> if (result == null) {
> result = new PluginRepository(conf);
> CACHE.put(conf, result);
> }
> return result;
> }
>
> First of all, I see one problem: CACHE is defined as WeakHashMap, which is
> not guaranteed to hold data when the garbage collector kicks into action --
> whereas we definitely want to avoid multiple instances of PluginClassLoader
> even if memory is tight. So, I think that we should change CACHE's class
> type into, say, a Hashtable (which has also synchronized methods, just in
> case). However, it turns out that this is not the immediate cause of the
> problem, because I made the change to Hashtable but the problem continued to
> occur.
>
> What actually happens is that _sometimes_ CACHE.get() returns null for no
> apparent reason! Logging the content of CACHE shows that, in these cases,
> keys and values, as well as the parameter passed to CACHE.get(), are
> perfectly valid, identical to the ones in the few previous calls. In other
> words, the (key, value) pair exists and the key is identical to the
> parameter passed to CACHE.get(), and nevertheless CACHE.get() occasionally
> returns null.
>
> After that strange event happens, the log shows that the hashcode of the
> CACHE object becomes different. This is another thing that puzzles me,
> because CACHE is a static member of PluginRepository and it should only be
> initialized at class load time; and the object referenced by it CACHE should
> not be garbage-collected. Furthermore, my logs show that the class
> PluginRepository is only loaded once, because its hashCode remains unchanged
> throughout the run.
>
> Anyway, here is how I instrumented the get() factory method of
> PluginRepository :
>
> /**
> * @return a cached instance of the plugin repository
> */
> public static synchronized PluginRepository get(Configuration conf) {
> PluginRepository result = CACHE.get(conf);
> /* --- start debug code */
> LOG.info("In thread "+Thread.currentThread()+
> " a static method of the class "+
> (new CurrentClassGetter()).getCurrentClass()+
> " called CACHE.get("+conf+
> "), where CACHE is "+CACHE+
> " and CACHE.hashCode() = "+CACHE.hashCode()+
> " - got result = "+result);
> /* end debug code --- */
> if (result == null) {
> result = new PluginRepository(conf);
> CACHE.put(conf, result);
> }
> return result;
> }
>
> /* --- start debug code */
> // detect if the current class is loaded more than
> // once by showing its hashCode
> public static class CurrentClassGetter extends SecurityManager {
> public String getCurrentClass() {
> Class cl = super.getClassContext()[1];
> return cl.toString() + "@" + cl.hashCode();
> }
> }
> /* end debug code --- */
>
>
> ...and here is what is logged by that code (my comments to each previous
> line are marked with "==>"):
>
> 2007-06-08 10:14:25,223 INFO plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> [EMAIL PROTECTED] called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {} and
> CACHE.hashCode() = 0 - got result = null
>
> ==> Getting null from CACHE.get here is right, because it was the first
> time -- I would have expected a hashcode different from zero, but perhaps
> the compiler perform lazy initialization of static members.
>
> 2007-06-08 10:14:30,100 INFO plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> [EMAIL PROTECTED] called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED] and
> CACHE.hashCode() = 11660238 - got result =
> [EMAIL PROTECTED]
>
> ==> OK so far...
>
> 2007-06-08 10:14:30,200 INFO plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> [EMAIL PROTECTED] called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED] and
> CACHE.hashCode() = 11660238 - got result =
> [EMAIL PROTECTED]
>
> ==> OK so far...
>
> 2007-06-08 10:14:31,121 INFO plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> [EMAIL PROTECTED] called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED] and
> CACHE.hashCode() = 11660238 - got result =
> [EMAIL PROTECTED]
>
> ==> OK so far...
>
> 2007-06-08 10:14:31,121 INFO plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> [EMAIL PROTECTED] called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED] and
> CACHE.hashCode() = 11660238 - got result =
> [EMAIL PROTECTED]
>
> ==> OK so far...
>
> 2007-06-08 10:14:33,084 INFO plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> [EMAIL PROTECTED] called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED] and
> CACHE.hashCode() = 11660238 - got result = null
>
> ==> AHA! Why are we this time getting null from CACHE.get(conf), if both
> content of CACHE and value of conf are exactly the same as before?? And note
> that the class PluginRepository has the same hashCode etc.
>
> 2007-06-08 10:14:33,384 INFO plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> [EMAIL PROTECTED] called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED],
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED] and
> CACHE.hashCode() = 25606144 - got result =
> [EMAIL PROTECTED]
>
> ==> And here we can see that the hashcode of CACHE has already changed
> (weird for a member allocated by the initialization of a static member of a
> class that is loaded just once... (and it is, because the hashCode of the
> class PluginRepository is the same as before). Of course, the hashcode of
> the PluginRepository _object_ has now changed because the cache miss has
> triggered a new call to its constructor, which in turn has cused the
> instantiation of new copies of PluginManifestParser, of PluginDescriptor, of
> PluginClassLoader and finally of the plugin classes -- which was what we
> didn't like in first place. This pattern occurs a few times (see the rest of
> the log below):
>
> 2007-06-08 10:14:33,384 INFO plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> [EMAIL PROTECTED] called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED],
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED] and
> CACHE.hashCode() = 25606144 - got result =
> [EMAIL PROTECTED]
>
> 2007-06-08 10:14:33,455 INFO plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> [EMAIL PROTECTED] called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED],
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED] and
> CACHE.hashCode() = 25606144 - got result =
> [EMAIL PROTECTED]
>
> 2007-06-08 10:14:33,455 INFO plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> [EMAIL PROTECTED] called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED],
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED] and
> CACHE.hashCode() = 25606144 - got result =
> [EMAIL PROTECTED]
>
> 2007-06-08 10:14:33,765 INFO plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> [EMAIL PROTECTED] called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-
> ADMIN/mapred/local/localRunner/job_23zna7.xml , mapred-default.xmlfinal:
> hadoop-site.xml), where CACHE is {Configuration: defaults:
> hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED],
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED] and
> CACHE.hashCode() = 25606144 - got result = null
>
> 2007-06-08 00:03:48,201 INFO plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] called CACHE.get(Configuration: defaults:
> hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_dvpvjj.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_dvpvjj.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED],
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_dvpvjj.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED] and
> CACHE.hashCode() = 37253945 - got result = null
>
> ==> The record above shows another occurrence of a null value returned, for
> no apparent reason, by PluginRepository.get().
>
> 2007-06-08 10:14:33,985 INFO plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> [EMAIL PROTECTED] called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED],
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED],
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED] and
> CACHE.hashCode() = 40727333 - got result =
> [EMAIL PROTECTED]
>
> 2007-06-08 10:14:33,985 INFO plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> [EMAIL PROTECTED] called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED],
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED],
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED] and
> CACHE.hashCode() = 40727333 - got result =
> [EMAIL PROTECTED]
>
> 2007-06-08 10:14:34,055 INFO plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> [EMAIL PROTECTED] called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED],
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED],
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED] and
> CACHE.hashCode() = 40727333 - got result =
> [EMAIL PROTECTED]
>
> 2007-06-08 10:14:34,055 INFO plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> [EMAIL PROTECTED] called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED],
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED],
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED] and
> CACHE.hashCode() = 40727333 - got result =
> [EMAIL PROTECTED]
>
> 2007-06-08 10:14:37,881 INFO plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> [EMAIL PROTECTED] called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED],
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED],
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> [EMAIL PROTECTED] and
> CACHE.hashCode() = 40727333 - got result = null
>
> ==> And the one above is the last occurrence.
>
> Any idea?
This will certainly help a lot. If it is not too much trouble, can you
add debug outputs for hashCodes of conf objects (both for the one in
the cache and for the parameter, because it seems Configuration object
is created more than once so their hashCode may be different, which in
turn causes the change in CACHE's hashCode(*)) and a stack trace? A
stack trace of depth 2-3 will probably suffice, I am just wondering
what is calling PluginRepository.get(conf).
(*) CACHE's hashCode is calculated like this:
int h = 0;
Iterator<Entry<K,V>> i = entrySet().iterator();
while (i.hasNext())
h += i.next().hashCode();
return h;
So if configuration's hashCode changes, CACHE's hashCode also changes.
Thanks for the detailed analysis!
>
> Enzo
>
--
Doğacan Güney
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general