Hi all,

At some point we will need to rebuild an index that totals about 3 terabytes 
(split across 12 shards).  At our current indexing speed we estimate that 
this will take about 4 weeks, and we would like to reduce that time.  Our 
main bottleneck appears to be disk I/O during index merging.

Each index is somewhere between 250GB and 350GB.  We are currently using a 
mergeFactor of 10 and a ramBufferSizeMB of 32, which means segments are 
flushed at roughly 32MB and merges then produce segments of roughly 320MB, 
3.2GB, and 32GB.  We are doing this offline and will run an optimize at the 
end, so what we would really like is to reduce the number of intermediate 
merges.  We thought about simply using a NoMergePolicy and optimizing at the 
end, but suspect we would run out of file handles, and that merging roughly 
10,000 segments in a single optimize might not be efficient.
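
For reference, here is a minimal sketch of the Lucene-level equivalent of 
these settings (assuming a Lucene 3.x-style IndexWriterConfig API; the 256MB 
buffer, the mergeFactor of 40, and the class name are illustrative 
placeholders, not values we have tested):

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class RebuildWriterFactory {

    public static IndexWriter open(File indexDir) throws IOException {
        IndexWriterConfig cfg = new IndexWriterConfig(
                Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36));

        // Bigger RAM buffer -> bigger flushed segments -> fewer segments
        // on disk overall (placeholder value).
        cfg.setRAMBufferSizeMB(256.0);

        // Higher mergeFactor -> fewer intermediate merge levels
        // (placeholder value).
        LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
        mp.setMergeFactor(40);
        cfg.setMergePolicy(mp);

        return new IndexWriter(FSDirectory.open(indexDir), cfg);
    }
}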

We would like to find an optimal mergeFactor somewhere between the two 
extremes of no merging at all (NoMergePolicy) and something like 1,000.  
(We are also planning to raise ramBufferSizeMB significantly.)
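
To get a feel for the trade-off, here is the back-of-the-envelope arithmetic 
we have been using (crudely assuming every merge level rewrites the whole 
shard once; the 300GB shard size, 256MB flush size, and candidate 
mergeFactors are illustrative):

public class MergeCost {
    public static void main(String[] args) {
        double indexGB = 300.0;                 // one shard (illustrative)
        double flushMB = 256.0;                 // candidate ramBufferSizeMB
        int[] factors = {10, 40, 100, 1000};

        // Number of segments initially flushed to disk for one shard.
        double segments = indexGB * 1024.0 / flushMB;

        for (int m : factors) {
            // Each merge level rewrites (roughly) the whole shard once, so
            // levels ~ log base mergeFactor of the flushed segment count.
            int levels = (int) Math.ceil(Math.log(segments) / Math.log(m));
            System.out.printf(
                "mergeFactor=%4d  ~%.0f flushed segments  ~%d levels  ~%.0fGB rewritten%n",
                m, segments, levels, indexGB * levels);
        }
    }
}

By this crude estimate, going from a mergeFactor of 10 to 40 roughly halves 
the bytes rewritten before the final optimize, with diminishing returns 
beyond that; most of the remaining win would come from the larger RAM buffer 
reducing the number of flushed segments in the first place.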

What experience do others have using a large mergeFactor?

Tom


