Thank you once again Betsy!

Vincent Vu Nguyen
Division of Science Quality and Translation
Office of the Associate Director for Science
Centers for Disease Control and Prevention (CDC)
404-498-6154
Century Bldg 2400
Atlanta, GA 30329 


-----Original Message-----
From: Burton-West, Tom [mailto:tburt...@umich.edu] 
Sent: Tuesday, October 05, 2010 2:41 PM
To: solr-user@lucene.apache.org
Subject: Experience with large merge factors

Hi all,

At some point we will need to rebuild an index that totals about 3
terabytes in size (split over 12 shards). At our current indexing
speed we estimate that this will take about 4 weeks, and we would
like to reduce that time. Our main bottleneck appears to be disk I/O
during index merging.

Each index is somewhere between 250 and 350 GB. We are currently
using a mergeFactor of 10 and a ramBufferSizeMB of 32, which means we
get merges at roughly every 320 MB, 3.2 GB, and 32 GB. We are
indexing offline and will run an optimize at the end, so what we
would like to do is reduce the number of intermediate merges. We
thought about just using a no-merge merge policy and then optimizing
at the end, but we suspect we would run out of file handles and that
merging 10,000 segments during an optimize might not be efficient.
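
For reference, these settings live in the <indexDefaults> section of
solrconfig.xml; the snippet below is just an illustrative sketch of
our current configuration, not a copy of our exact file:

    <indexDefaults>
      <!-- segments are flushed at roughly 32 MB, so with a
           mergeFactor of 10 merges cascade at about 320 MB,
           3.2 GB, and 32 GB -->
      <ramBufferSizeMB>32</ramBufferSizeMB>
      <mergeFactor>10</mergeFactor>
    </indexDefaults>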

We would like to find an optimal mergeFactor somewhere between 0
(the no-merge policy) and 1,000. (We are also planning to raise
ramBufferSizeMB significantly.)
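
The change we are contemplating would look something like the sketch
below; the numbers are placeholders to illustrate the idea, not
values we have tested:

    <indexDefaults>
      <!-- placeholder values: a much larger RAM buffer so segments
           are flushed less often, and a much larger mergeFactor so
           far fewer intermediate merges happen before the final
           optimize -->
      <ramBufferSizeMB>1024</ramBufferSizeMB>
      <mergeFactor>100</mergeFactor>
    </indexDefaults>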

What experience do others have using a large mergeFactor?

Tom


