"Nomerge" has struck me as somewhat uncontrollable. There is also a "balanced" merge policy in the trunk, courtesy of LinkedIn.
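For what it's worth, the round-robin dispatch Tom describes in the quoted message below could be sketched roughly like this in Python. This is only an illustration, not his actual client code: the shard URLs, host names, and the function names (`to_add_xml`, `assign_round_robin`) are made up for the example, and the only Solr-specific piece assumed is the standard XML `<add><doc>...</doc></add>` update message.

```python
from itertools import cycle
from xml.sax.saxutils import escape, quoteattr

# Hypothetical update URLs for 10 shards, 5 on each of two indexing machines.
# Host names, port, and paths are placeholders, not taken from the thread.
SHARD_URLS = [
    f"http://indexer{1 + i // 5}:8983/solr/shard{i}/update" for i in range(10)
]


def to_add_xml(doc):
    """Render one document (dict of field name -> value) as a Solr XML <add> message."""
    fields = "".join(
        f"<field name={quoteattr(name)}>{escape(str(value))}</field>"
        for name, value in doc.items()
    )
    return f"<add><doc>{fields}</doc></add>"


def assign_round_robin(docs, shard_urls=SHARD_URLS):
    """Yield (shard_url, xml_body) pairs, cycling through the shards in order."""
    for shard_url, doc in zip(cycle(shard_urls), docs):
        yield shard_url, to_add_xml(doc)
```

Each client would then POST `xml_body` to `shard_url` with a `text/xml` content type. The sketch leaves the HTTP call out on purpose, since that is where the producers outrun the indexer once merging kicks in.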

On Mon, Jul 19, 2010 at 12:43 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
> Hi Ken,
>
> This is all very dependent on your documents, your indexing setup and your
> hardware. Just as an extreme data point, I'll describe our experience.
>
> We run 5 clients on each of 6 machines to send documents to Solr using the
> standard http xml process. Our documents contain about 10 fields, but one
> field contains OCR for the full text of a book. The documents are about
> 700KB in size.
>
> Each client sends solr documents to one of 10 solr shards on a round-robin
> basis. We are running 5 shards on each of two dedicated indexing machines
> each with 144GB of memory and 2 x Quad Core Intel Xeon E5540 2.53GHz
> processors (Nehalem). What we generally see is that once the index gets
> large enough for significant merging, our producers can send documents to
> solr faster than it can index them.
>
> We suspect that our bottleneck is simply disk I/O for index merging on the
> Solr build machines. We are currently experimenting with changing the
> maxRAMBufferSize settings and various merge policies/merge factors to see if
> we can speed up the Solr end of the indexing process. Since we optimize our
> index down to two segments, we are also planning to experiment with using the
> "nomerge" merge policy. I hope to have some results to report on our blog
> sometime in the next month or so.
>
> Tom Burton-West
> www.hathitrust.org/blogs
>
> -----Original Message-----
> From: kenf_nc [mailto:ken.fos...@realestate.com]
> Sent: Sunday, July 18, 2010 8:18 AM
> To: solr-user@lucene.apache.org
> Subject: Re: indexing best practices
>
>
> No one has done performance analysis? Or has a link to anywhere where it's
> been done?
>
> basically fastest way to get documents into Solr. So many options available,
> what's the fastest:
> 1) file import (xml, csv) vs DIH vs POSTing
> 2) number of concurrent clients 1 vs 10 vs 100 ...is there a diminishing
> returns number?
>
> I have 16 million small (8 to 10 fields, no large text fields) docs that get
> updated monthly and 2.5 million largish (20 to 30 fields, a couple html text
> fields) that get updated monthly. It currently takes about 20 hours to do a
> full import. I would like to cut that down as much as possible.
> Thanks,
> Ken
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/indexing-best-practices-tp973274p976313.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

--
Lance Norskog
goks...@gmail.com