While investigating a bug, I found that Solr keeps many Tokenizer objects.
The system in question is an experimental 80-core Solr 1.4.1 setup running on Tomcat. It was continuously fed parallel indexing requests until it eventually died with an OutOfMemoryError. A heap dump taken by the JVM shows there were 14477 Tokenizer objects, or about 180 Tokenizer objects per core, at the time it died. Each core's schema.xml has only 5 fields that use this Tokenizer, so I'd expect at most 5 Tokenizers per indexing thread. Tomcat in its default configuration runs at most 200 threads, so at most 1000 Tokenizer objects should be needed.
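To make the expectation-versus-observation gap explicit, here is a back-of-envelope sketch of the arithmetic (all figures are from our setup as described above; the class and variable names are just for illustration):

    // Back-of-envelope check: expected Tokenizer count vs. heap dump.
    public class TokenizerArithmetic {
        public static void main(String[] args) {
            int fieldsUsingTokenizer = 5; // fields per core using this Tokenizer (schema.xml)
            int tomcatMaxThreads = 200;   // Tomcat's default maxThreads

            // Each indexing thread writes to one core at a time, so it
            // should need at most one Tokenizer per tokenized field:
            int expectedCeiling = fieldsUsingTokenizer * tomcatMaxThreads;
            System.out.println("expected at most:  " + expectedCeiling);        // 1000

            // What the heap dump actually showed when the JVM died:
            int observedTotal = 14477;
            int cores = 80;
            System.out.println("observed total:    " + observedTotal);          // 14477
            System.out.println("observed per core: " + observedTotal / cores);  // ~180
        }
    }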
My colleague ran a similar experiment on a 10-core Solr 3.6 system and observed fewer Tokenizer objects there, but still 48 Tokenizers per core.
Why does Solr keep this many Tokenizer objects?

Kuro