Hi Ken,

This is all very dependent on your documents, your indexing setup, and your hardware. Just as an extreme data point, I'll describe our experience.

We run 5 clients on each of 6 machines to send documents to Solr using the standard HTTP XML update process. Our documents contain about 10 fields, but one field contains the OCR for the full text of a book. The documents are about 700KB in size. Each client sends Solr documents to one of 10 Solr shards on a round-robin basis. We are running 5 shards on each of two dedicated indexing machines, each with 144GB of memory and 2 x Quad Core Intel Xeon E5540 2.53GHz processors (Nehalem).

What we generally see is that once the index gets large enough for significant merging, our producers can send documents to Solr faster than it can index them. We suspect that our bottleneck is simply disk I/O for index merging on the Solr build machines.

We are currently experimenting with changing the ramBufferSizeMB setting and various merge policies/merge factors to see if we can speed up the Solr end of the indexing process. Since we optimize our index down to two segments, we are also planning to experiment with using the "nomerge" merge policy. I hope to have some results to report on our blog sometime in the next month or so.

Tom Burton-West
www.hathitrust.org/blogs

-----Original Message-----
From: kenf_nc [mailto:ken.fos...@realestate.com]
Sent: Sunday, July 18, 2010 8:18 AM
To: solr-user@lucene.apache.org
Subject: Re: indexing best practices

Has no one done performance analysis? Or does anyone have a link to anywhere it's been done? Basically, I'm after the fastest way to get documents into Solr. There are so many options available; which is the fastest?

1) file import (xml, csv) vs DIH vs POSTing
2) number of concurrent clients: 1 vs 10 vs 100 ... is there a point of diminishing returns?

I have 16 million small docs (8 to 10 fields, no large text fields) that get updated monthly and 2.5 million largish docs (20 to 30 fields, a couple of HTML text fields) that get updated monthly. It currently takes about 20 hours to do a full import. I would like to cut that down as much as possible.

Thanks,
Ken

--
View this message in context: http://lucene.472066.n3.nabble.com/indexing-best-practices-tp973274p976313.html
Sent from the Solr - User mailing list archive at Nabble.com.
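
For readers who want to picture the client side of the setup Tom describes (several producers, each POSTing XML documents to one of 10 shards in rotation), here is a minimal Python sketch. It is only an illustration under stated assumptions, not the HathiTrust code: the host names, port, shard paths, and the `id`/`ocr` field names are all made up, and the helper names (`build_add_xml`, `assign_round_robin`, `post_document`) are hypothetical. The demo at the bottom prints the assignments instead of sending anything.

```python
import itertools
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical shard update URLs: 5 shards on each of two indexing machines.
SHARD_URLS = [
    f"http://build{host}.example.org:8983/solr/shard{n}/update"
    for host in (1, 2)
    for n in range(1, 6)
]


def build_add_xml(fields):
    """Serialize one document as a Solr XML <add> message (bytes)."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="utf-8")


def assign_round_robin(docs, shard_urls):
    """Pair each document with the next shard URL in rotation."""
    return zip(itertools.cycle(shard_urls), docs)


def post_document(url, payload, timeout=60):
    """POST one XML payload to a shard's update handler; returns the HTTP status."""
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "text/xml; charset=utf-8"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Made-up documents; real ones would carry the full OCR text of a book.
    docs = [{"id": str(i), "ocr": "full text of book %d" % i} for i in range(12)]
    for url, fields in assign_round_robin(docs, SHARD_URLS):
        payload = build_add_xml(fields)
        # Replace the print with post_document(url, payload) to actually send.
        print(url, len(payload))
```

Running several copies of a loop like this in parallel is one way to approximate the "number of concurrent clients" knob Ken asks about. As Tom notes, though, once merging dominates, adding more producers may not help, because the Solr side becomes the bottleneck.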