Hi Brian,

I was recently testing indexing performance on a high-CPU box and ran into the same issue. I tried different indexing methods (XML, CSVRequestHandler, and SolrJ + BinaryRequestWriter with multiple threads). The last method is indeed the fastest. I believe the multi-threaded approach gives better performance when you have complex text analysis. My analysis was very simple (WhitespaceTokenizer only), and the performance boost from adding threads was not very impressive (though still there). I guess that with simple text analysis, overall performance comes down to synchronization issues.
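For reference, the client-side threading pattern I mean looks roughly like the sketch below. It is only an illustration using the plain JDK, not my actual test code: the class name ParallelFeeder and the 'sink' callback are hypothetical, with 'sink' standing in for whatever sends one document to the indexer (for example a call on a shared SolrJ client, which would need to be thread-safe).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper: splits the documents into contiguous batches and
// feeds each batch to the indexer from its own worker thread.
final class ParallelFeeder {

    // 'sink' is an assumed stand-in for the call that sends one document
    // to the server; it is invoked concurrently, so it must be thread-safe.
    // 'threads' must be at least 1.
    static int feed(List<String> docs, int threads, Consumer<String> sink) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            // Ceiling division, so that at most 'threads' batches are created.
            int batch = Math.max(1, (docs.size() + threads - 1) / threads);
            for (int start = 0; start < docs.size(); start += batch) {
                List<String> slice = docs.subList(start, Math.min(start + batch, docs.size()));
                results.add(pool.submit(() -> {
                    for (String doc : slice) {
                        sink.accept(doc);
                    }
                    return slice.size();
                }));
            }
            // Wait for every batch and count the documents handed to the sink.
            int sent = 0;
            for (Future<Integer> f : results) {
                sent += f.get();
            }
            return sent;
        } finally {
            pool.shutdown();
        }
    }
}
```

With a setup like this, the number of client threads is the only knob on the sending side, so once adding threads stops helping, the bottleneck is presumably on the server.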
I tried to profile the application during the indexing phase for CPU times and monitors, and it seems that most of the blocking is in the following methods:

- DocumentsWriter.doBalanceRAM
- DocumentsWriter.getThreadState
- SolrIndexWriter.ensureOpen

I don't know the guts of Solr/Lucene in enough detail to draw any conclusions. Are there any configuration techniques to improve indexing performance in a multi-threaded scenario?

Alex

On Mon, Apr 26, 2010 at 6:52 PM, Wawok, Brian <brian.wa...@cmegroup.com> wrote:
> Hi,
>
> I was wondering about how the multi-threading of the indexer works? I am
> using SolrJ to stream documents to a server. As I add more threads on the
> client side, I slowly see both speed and CPU usage go up on the indexer side.
> Once I hit about 4 threads, my indexer is at 100% cpu usage (of 1 CPU on a
> 4-way box), and will not do any more work. It is pretty fast, doing something
> like 75k lines of text per second.. but I would really like to use all 4 CPUs
> on the indexer. Is the just a limitation of Solr, or is this a limitation of
> using SolrJ and document streaming?
>
>
> Thanks,
>
>
> Brian
>