Hi,

I can confirm similar behaviour, but for Solr 4.3.1. We use the default
values for the merge-related settings. Even though mergeFactor=10 by
default, there are 13 segments in one core and 30 segments in another. I
am not sure this proves a bug in the merging, because the segment count
depends on how TieredMergePolicy budgets segments per tier. Relevant
discussion from the past:
http://lucene.472066.n3.nabble.com/TieredMergePolicy-reclaimDeletesWeight-td4071487.html
Apart from the other policy parameters, you could play with
reclaimDeletesWeight if you'd like to influence how eagerly segments with
deletes in them get merged. See
http://stackoverflow.com/questions/18361300/informations-about-tieredmergepolicy
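
For illustration, a sketch of how that could look in solrconfig.xml (the
3.0 is a made-up value for the example; if I remember correctly the
default is 2.0, and the property should be settable like the other policy
parameters, but please verify against your version):

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
  <double name="reclaimDeletesWeight">3.0</double>
</mergePolicy>

A higher weight makes the policy favour merges that reclaim deleted docs.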


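On the "allowedSegmentCount=10 vs count=9" line in your infoStream: the
loop you quoted from TieredMergePolicy.java grants segsPerTier segments
per size tier, with each tier maxMergeAtOnce times larger than the one
below it, plus a rounded-up partial top tier. A standalone sketch of that
loop with made-up sizes (1 MB smallest segment, 40 MB total index) and
the segsPerTier=3 / maxMergeAtOnce=3 from your config:

public class AllowedSegCountDemo {
  public static void main(String[] args) {
    final int segsPerTier = 3;
    final int maxMergeAtOnce = 3;
    long levelSize = 1L << 20;   // minSegmentBytes: assume 1 MB
    long bytesLeft = 40L << 20;  // totIndexBytes: assume 40 MB
    double allowedSegCount = 0;
    while (true) {
      final double segCountLevel = bytesLeft / (double) levelSize;
      if (segCountLevel < segsPerTier) {
        // partial top tier: round up whatever is left
        allowedSegCount += Math.ceil(segCountLevel);
        break;
      }
      allowedSegCount += segsPerTier;        // a full tier of segments
      bytesLeft -= segsPerTier * levelSize;  // consume this tier's bytes
      levelSize *= maxMergeAtOnce;           // next tier is 3x larger
    }
    System.out.println(allowedSegCount);     // prints 10.0
  }
}

Three full tiers (at 1 MB, 3 MB and 9 MB) plus one partial top tier give
a budget of 10, so around 10 segments with segmentsPerTier=3 is expected
behaviour, not a merging bug; merges only kick in once the eligible count
exceeds that budget.
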
Regarding your attachment: I believe it was stripped by the mailing list
software; could you share it via a file-sharing service instead?
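
Also, to Erick's point below about counting files vs actual segments: a
quick way to check the live segment count outside Solr (a sketch for
Lucene 4.x; the index path is hypothetical):

import java.io.File;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.store.FSDirectory;

public class SegmentCount {
  public static void main(String[] args) throws Exception {
    try (DirectoryReader reader = DirectoryReader.open(
        FSDirectory.open(new File("/path/to/core/data/index")))) {
      // each leaf reader corresponds to one segment in the commit
      System.out.println("segments: " + reader.leaves().size());
    }
  }
}

This avoids miscounting files that a finished merge has made obsolete but
that an old open searcher still pins on disk.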

On Sat, Mar 14, 2015 at 7:36 AM, Summer Shire <shiresum...@gmail.com> wrote:

> Hi All,
>
> Did anyone get a chance to look at my config and the InfoStream File ?
>
> I am very curious to see what you think
>
> thanks,
> Summer
>
> > On Mar 6, 2015, at 5:20 PM, Summer Shire <shiresum...@gmail.com> wrote:
> >
> > Hi All,
> >
> > Here’s an update on where I am with this.
> > I enabled infoStream logging and quickly figured out that I needed to
> > get rid of maxBufferedDocs. So Erick, you were absolutely right on that.
> > I increased my ramBufferSize to 100MB
> > and reduced maxMergeAtOnce to 3 and segmentsPerTier to 3 as well.
> > My config looks like this:
> >
> > <indexConfig>
> >    <useCompoundFile>false</useCompoundFile>
> >    <ramBufferSizeMB>100</ramBufferSizeMB>
> >
> >
> <!--<maxMergeSizeForForcedMerge>9223372036854775807</maxMergeSizeForForcedMerge>-->
> >    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
> >      <int name="maxMergeAtOnce">3</int>
> >      <int name="segmentsPerTier">3</int>
> >    </mergePolicy>
> >    <mergeScheduler
> class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
> >    <infoStream file="/tmp/INFOSTREAM.txt">true</infoStream>
> >  </indexConfig>
> >
> > I am attaching a sample infostream log file.
> > In the infoStream logs, though, you can see how segments keep being
> > added, and it shows (just an example):
> > allowedSegmentCount=10 vs count=9 (eligible count=9) tooBigCount=0
> >
> > I looked at TieredMergePolicy.java to see how allowedSegmentCount is
> > calculated:
> > // Compute max allowed segs in the index
> >    long levelSize = minSegmentBytes;
> >    long bytesLeft = totIndexBytes;
> >    double allowedSegCount = 0;
> >    while(true) {
> >      final double segCountLevel = bytesLeft / (double) levelSize;
> >      if (segCountLevel < segsPerTier) {
> >        allowedSegCount += Math.ceil(segCountLevel);
> >        break;
> >      }
> >      allowedSegCount += segsPerTier;
> >      bytesLeft -= segsPerTier * levelSize;
> >      levelSize *= maxMergeAtOnce;
> >    }
> >    int allowedSegCountInt = (int) allowedSegCount;
> > and minSegmentBytes is calculated as follows:
> > // Compute total index bytes & print details about the index
> >    long totIndexBytes = 0;
> >    long minSegmentBytes = Long.MAX_VALUE;
> >    for(SegmentInfoPerCommit info : infosSorted) {
> >      final long segBytes = size(info);
> >      if (verbose()) {
> >        String extra = merging.contains(info) ? " [merging]" : "";
> >        if (segBytes >= maxMergedSegmentBytes/2.0) {
> >          extra += " [skip: too large]";
> >        } else if (segBytes < floorSegmentBytes) {
> >          extra += " [floored]";
> >        }
> >        message("  seg=" + writer.get().segString(info) + " size=" +
> String.format(Locale.ROOT, "%.3f", segBytes/1024/1024.) + " MB" + extra);
> >      }
> >
> >      minSegmentBytes = Math.min(segBytes, minSegmentBytes);
> >      // Accum total byte size
> >      totIndexBytes += segBytes;
> >    }
> >
> >
> > Any input is welcome.
> >
> > <myinfoLog.rtf>
> >
> >
> > thanks,
> > Summer
> >
> >
> >> On Mar 5, 2015, at 8:11 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
> >>
> >> I would, BTW, either just get rid of <maxBufferedDocs> altogether or
> >> make it much higher, e.g. 100000. I don't think this is really your
> >> problem, but you're creating a lot of segments here.
> >>
> >> But I'm kind of at a loss as to what would be different about your
> setup.
> >> Is there _any_ chance that you have some secondary process looking at
> >> your index that's maintaining open searchers? Any custom code that's
> >> perhaps failing to close searchers? Is this a Unix or Windows system?
> >>
> >> And just to be really clear, you're _only_ seeing more segments being
> >> added, right? If you're only counting files in the index directory, it's
> >> _possible_ that merging is happening, you're just seeing new files take
> >> the place of old ones.
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Mar 4, 2015 at 7:12 PM, Shawn Heisey <apa...@elyograg.org>
> wrote:
> >>> On 3/4/2015 4:12 PM, Erick Erickson wrote:
> >>>> I _think_, but don't know for sure, that the merging stuff doesn't get
> >>>> triggered until you commit; it doesn't "just happen".
> >>>>
> >>>> Shot in the dark...
> >>>
> >>> I believe that new segments are created when the indexing buffer
> >>> (ramBufferSizeMB) fills up, even without commits.  I'm pretty sure that
> >>> anytime a new segment is created, the merge policy is checked to see
> >>> whether a merge is needed.
> >>>
> >>> Thanks,
> >>> Shawn
> >>>
> >
>
>


-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
