"Nomerge" has struck me as somewhat uncontrollable. There is also a
"balanced" merge policy in the trunk, courtesy of LinkedIn.

On Mon, Jul 19, 2010 at 12:43 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
> Hi Ken,
>
> This is all very dependent on your documents, your indexing setup and your 
> hardware. Just as an extreme data point, I'll describe our experience.
>
> We run 5 clients on each of 6 machines to send documents to Solr using the 
> standard HTTP XML process.  Our documents contain about 10 fields, but one 
> field contains OCR for the full text of a book.  The documents are about 
> 700KB in size.
>
> Each client sends Solr documents to one of 10 Solr shards on a round-robin 
> basis.  We are running 5 shards on each of two dedicated indexing machines, 
> each with 144GB of memory and 2 x Quad Core Intel Xeon E5540 2.53GHz 
> processors (Nehalem).  What we generally see is that once the index gets 
> large enough for significant merging, our producers can send documents to 
> Solr faster than it can index them.
>
> We suspect that our bottleneck is simply disk I/O for index merging on the 
> Solr build machines.  We are currently experimenting with changing the 
> ramBufferSizeMB setting and various merge policies/merge factors to see if 
> we can speed up the Solr end of the indexing process.   Since we optimize our 
> index down to two segments, we are also planning to experiment with using the 
> "nomerge" merge policy. I hope to have some results to report on our blog 
> sometime in the next month or so.
>
> Tom Burton-West
> www.hathitrust.org/blogs
>
> -----Original Message-----
> From: kenf_nc [mailto:ken.fos...@realestate.com]
> Sent: Sunday, July 18, 2010 8:18 AM
> To: solr-user@lucene.apache.org
> Subject: Re: indexing best practices
>
>
> No one has done performance analysis? Or has a link to anywhere where it's
> been done?
>
> Basically, I want the fastest way to get documents into Solr. With so many
> options available, which is fastest:
> 1) file import (XML, CSV) vs. DIH vs. POSTing
> 2) number of concurrent clients: 1 vs. 10 vs. 100 ... is there a point of
> diminishing returns?
>
> I have 16 million small (8 to 10 fields, no large text fields) docs that get
> updated monthly and 2.5 million largish (20 to 30 fields, a couple html text
> fields) that get updated monthly. It currently takes about 20 hours to do a
> full import. I would like to cut that down as much as possible.
> Thanks,
> Ken
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/indexing-best-practices-tp973274p976313.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
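
For anyone who wants to try the round-robin client setup Tom describes above,
here is a rough Python sketch. It is not Tom's actual code: the shard URLs,
host names and field names are made up for illustration, and it only builds
the XML update messages and pairs them with shards; it does not send anything.

```python
from itertools import cycle
from xml.sax.saxutils import escape, quoteattr

# Hypothetical update URLs: 5 shards on each of 2 indexing machines.
# Host names, port and core names are assumptions, not from the thread.
SHARD_URLS = [
    "http://indexer%d:8983/solr/shard%d/update" % (host, shard)
    for host in (1, 2)
    for shard in range(1, 6)
]


def to_add_xml(doc):
    """Render a dict of field name -> value as a Solr <add><doc> message."""
    fields = "".join(
        "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
        for name, value in doc.items()
    )
    return "<add><doc>%s</doc></add>" % fields


def assign_round_robin(docs, shard_urls):
    """Yield (shard_url, xml_body) pairs, cycling through the shards in order."""
    shards = cycle(shard_urls)
    for doc in docs:
        yield next(shards), to_add_xml(doc)
```

Each (url, body) pair would then be POSTed to its shard as text/xml by
whatever HTTP client you prefer; that part is left out so the sketch runs
without a Solr server.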



-- 
Lance Norskog
goks...@gmail.com
