While investigating a bug, I found that Solr keeps many Tokenizer objects.
The system in question is an experimental 80-core Solr 1.4.1 setup running on Tomcat. It was continuously fed parallel indexing requests until it eventually died with an OutOfMemoryError. A heap dump taken by the JVM shows there were 14477 Tokenizer objects, or about 180 Tokenizer objects per core, at the time it died. Each core's schema.xml has only 5 fields that use this Tokenizer, so I'd expect at most 5 Tokenizers per indexing thread. Tomcat in its default configuration runs at most 200 threads, so at most 1000 Tokenizer objects should be needed.
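To make the expectation-versus-observation gap explicit, here is a back-of-envelope sketch of the arithmetic (all figures are from our setup as described above; the class and variable names are just for illustration):

    // Back-of-envelope check: expected Tokenizer count vs. heap dump.
    public class TokenizerArithmetic {
        public static void main(String[] args) {
            int fieldsUsingTokenizer = 5; // fields per core using this Tokenizer (schema.xml)
            int tomcatMaxThreads = 200;   // Tomcat's default maxThreads

            // Each indexing thread writes to one core at a time, so it
            // should need at most one Tokenizer per tokenized field:
            int expectedCeiling = fieldsUsingTokenizer * tomcatMaxThreads;
            System.out.println("expected at most:  " + expectedCeiling);        // 1000

            // What the heap dump actually showed when the JVM died:
            int observedTotal = 14477;
            int cores = 80;
            System.out.println("observed total:    " + observedTotal);          // 14477
            System.out.println("observed per core: " + observedTotal / cores);  // ~180
        }
    }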
My colleague ran a similar experiment on a 10-core Solr 3.6 system and observed fewer Tokenizer objects there, but still 48 Tokenizers per core.
Why does Solr keep this many Tokenizer objects?

Kuro