On 6/20/2011 12:31 PM, Michael McCandless wrote:
Actually, TieredMP has two different params (different from the
previous default LogMP):
* segmentsPerTier controls how many segments you can tolerate in the
index (bigger number means more segments)
* maxMergeAtOnce says how many segments can be merged at a time for
"normal" (not optimize) merging
For back-compat, mergeFactor maps to both of these, but it's better to
set them directly, e.g.:
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">20</int>
  </mergePolicy>
(and then remove your mergeFactor setting under indexDefaults)
You should always have maxMergeAtOnce <= segmentsPerTier, else too much
merging will happen.
If you set segmentsPerTier to 35 then the index can easily exceed 70
segments, so your optimize will again need more than one merge. Note
that if you make the maxMergeAtOnce/Explicit too large then 1) you
risk running out of file handles (if you don't use compound file), and
2) merge performance likely gets worse as the OS is forced to splinter
its IO cache across more files (I suspect) and so more seeking will
happen.
Thanks much for the information!
I've set my server up so that the user running the index has a soft
limit of 4096 files and a hard limit of 6144 files, and
/proc/sys/fs/file-max is 48409, so I should be OK on file handles. The
index is almost twice as big as available memory, so I'm not really
worried about the I/O cache. I've sized my mergeFactor and
ramBufferSizeMB so that the individual merges during indexing happen
entirely from the I/O cache, which is the point where I really care
about it. There's nothing I can do about the optimize without spending
a LOT of money.
I will remove mergeFactor, set maxMergeAtOnce and segmentsPerTier to 35,
and maxMergeAtOnceExplicit to 70. If I ever run into a situation where
it gets beyond 70 segments at any one time, I've probably got bigger
problems than the number of passes my optimize takes, so I'll think
about it then. :) Does that sound reasonable?
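For reference, here is a sketch of the mergePolicy block I have in mind.
It assumes that maxMergeAtOnceExplicit can be set with the same <int>
syntax as the two parameters in your example, and that the block goes in
the same place in solrconfig.xml as your example would. Treat it as a
sketch under those assumptions, not a tested config.
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- normal (non-optimize) merges: up to 35 segments at a time -->
    <int name="maxMergeAtOnce">35</int>
    <!-- same value, so maxMergeAtOnce <= segmentsPerTier still holds -->
    <int name="segmentsPerTier">35</int>
    <!-- assumption: settable like the two parameters above; intended to
         let an optimize merge up to 70 segments in a single pass -->
    <int name="maxMergeAtOnceExplicit">70</int>
  </mergePolicy>
The mergeFactor setting under indexDefaults would be removed at the same
time, as you suggested.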
Shawn