On Sun, Jun 19, 2011 at 12:35 PM, Shawn Heisey <s...@elyograg.org> wrote:
> On 6/19/2011 7:32 AM, Michael McCandless wrote:
>>
>> With LogXMergePolicy (the default before 3.2), optimize respects
>> mergeFactor, so it's doing 2 steps because you have 37 segments but 35
>> mergeFactor.
>>
>> With TieredMergePolicy (default on 3.2 and after), there is now a
>> separate merge factor used for optimize (maxMergeAtOnceExplicit)... so
>> you could eg set this factor higher and more often get a single merge
>> for the optimize.
>
> This makes sense.  The default for maxMergeAtOnceExplicit is 30 according to
> LUCENE-854, so it merges the first 30 segments, then it goes back and merges
> the new one plus the other 7 that remain.  To counteract this behavior, I've
> put this in my solrconfig.xml, to test next week.
>
> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
> <int name="maxMergeAtOnceExplicit">70</int>
> </mergePolicy>
>
> I figure that twice the mergeFactor (35) will likely cover every possible
> outcome.  Is that a correct thought?

Actually, TieredMP has two different params (different from the
previous default LogMP):

  * segmentsPerTier controls how many segments you can tolerate in the
index (bigger number means more segments)

  * maxMergeAtOnce says how many segments can be merged at a time for
"normal" (not optimize) merging

For back-compat, mergeFactor maps to both of these, but it's better to
set them directly, e.g.:

    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">20</int>
    </mergePolicy>

(and then remove your mergeFactor setting under indexDefaults)

You should always have maxMergeAtOnce <= segmentsPerTier; otherwise
too much merging will happen.

If you set segmentsPerTier to 35 then the index can easily exceed 70
segments, so your optimize will again need more than one merge.  Note
that if you make maxMergeAtOnce or maxMergeAtOnceExplicit too large
then 1) you risk running out of file handles (if you don't use
compound file), and 2) merge performance likely gets worse as the OS
is forced to splinter its IO cache across more files (I suspect) and
so more seeking will happen.
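
As a rough illustration of the arithmetic, here is a small Python
sketch.  It is a simplified model, not TieredMergePolicy's actual
selection logic: it assumes each optimize pass merges at most
maxMergeAtOnceExplicit segments into one new segment and repeats until
a single segment remains.  The function name optimize_merge_passes is
hypothetical.

```python
def optimize_merge_passes(segment_count, max_merge_at_once_explicit):
    """Count merge passes needed to reach one segment (simplified model).

    Assumption: every pass merges up to max_merge_at_once_explicit
    segments into a single new segment. Real merge selection in Lucene
    is more involved; this only mirrors the arithmetic in the thread.
    """
    if segment_count < 1:
        raise ValueError("segment_count must be at least 1")
    if max_merge_at_once_explicit < 2:
        raise ValueError("max_merge_at_once_explicit must be at least 2")

    passes = 0
    while segment_count > 1:
        # Merge as many segments as the limit allows; they become one.
        merged = min(segment_count, max_merge_at_once_explicit)
        segment_count = segment_count - merged + 1
        passes += 1
    return passes


if __name__ == "__main__":
    for segments, limit in [(37, 30), (37, 70), (71, 70)]:
        print(segments, limit, optimize_merge_passes(segments, limit))
```

Under this model, 37 segments with a limit of 30 take two passes,
while a limit of 70 takes one; an index that has grown to 71 segments
would need two passes again.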

Mike McCandless

http://blog.mikemccandless.com
