Make sure you fully digest Mike McCandless' blog post on segment merge
before trying to outguess his code:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Generally, I don't think you would want to merge just two segments.
You should normally merge a bunch at a time, typically 10. IOW, take all the
segments on a tier and merge them into one segment at the next tier.
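
To make the tier idea concrete, here is a toy model in Python. It is only a
sketch of the "fill a tier, then merge it up" pattern described above, not
Lucene's actual TieredMergePolicy selection logic. The function names and the
fixed count of 10 segments per tier are assumptions for illustration.

```python
def add_segment(tiers, segments_per_tier=10, level=0):
    """Add one segment at `level`; when the tier fills up, merge it upward."""
    tiers[level] = tiers.get(level, 0) + 1
    if tiers[level] == segments_per_tier:
        # All segments on this tier are merged into one segment on the next tier.
        tiers[level] = 0
        add_segment(tiers, segments_per_tier, level + 1)


def simulate(flushes, segments_per_tier=10):
    """Return {tier level: segment count} after `flushes` newly flushed segments."""
    tiers = {}
    for _ in range(flushes):
        add_segment(tiers, segments_per_tier)
    return {level: count for level, count in sorted(tiers.items()) if count}


if __name__ == "__main__":
    print(simulate(125))
```

In this model the tiers behave like the digits of a base-10 counter, which is
why merging a bunch at a time keeps the total segment count small.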

There is no documented practical upper limit for how big to make a single
segment, but very large segments are not likely to be optimized well in
Lucene, hence the default max merge size of 5GB. If you want to get a lot
above that, you're in uncharted territory. Besides, if you start pushing
your index well above the amount of available system memory, your query
performance will suffer. I'd watch for the latter before pushing on the
former.
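
On the 15GB versus 20GB question quoted below, the size arithmetic can be
sketched as follows. This is a simplified check that only compares the
combined size against the cap. Lucene's real merge policy applies further
eligibility rules, so passing this check does not guarantee that a merge will
happen. The function name is hypothetical.

```python
def fits_under_cap(segment_sizes_gb, max_merged_gb):
    """True if merging these segments would stay within the size cap.

    Simplified assumption: only the summed size is compared to the cap.
    """
    return len(segment_sizes_gb) >= 2 and sum(segment_sizes_gb) <= max_merged_gb


if __name__ == "__main__":
    for cap in (15, 20):
        print(cap, fits_under_cap([10, 10], cap))
```

Two 10GB segments do not fit under a 15GB cap, while under a 20GB cap they
only just fit, with no headroom.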


-- Jack Krupansky

On Sun, Jan 31, 2016 at 10:43 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
wrote:

> Thanks for your reply Shawn and Jack.
>
> I wanted to increase the segment size to 15GB, so that there will be fewer
> segments to search during the query, which should potentially improve
> the query speed.
>
> What if I set the segment size to 20GB? Will all the existing 10GB segments
> be merged to 20GB, since merging two 10GB segments will now result in a 20GB
> segment?
>
> Regards,
> Edwin
>
>
> On 31 January 2016 at 12:16, Jack Krupansky <jack.krupan...@gmail.com>
> wrote:
>
> > From the Lucene MergePolicy Javadoc:
> >
> > "Whenever the segments in an index have been altered by IndexWriter
> > <https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/index/IndexWriter.html>,
> > either the addition of a newly flushed segment, addition of many segments
> > from addIndexes* calls, or a previous merge that may now need to cascade,
> > IndexWriter invokes findMerges(org.apache.lucene.index.MergeTrigger,
> > org.apache.lucene.index.SegmentInfos, org.apache.lucene.index.IndexWriter)
> > <https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/index/MergePolicy.html#findMerges(org.apache.lucene.index.MergeTrigger, org.apache.lucene.index.SegmentInfos, org.apache.lucene.index.IndexWriter)>
> > to give the MergePolicy a chance to pick merges that are now required. This
> > method returns a MergePolicy.MergeSpecification
> > <https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/index/MergePolicy.MergeSpecification.html>
> > instance describing the set of merges that should be done, or null if no
> > merges are necessary. When IndexWriter.forceMerge is called, it calls
> > findForcedMerges(SegmentInfos,int,Map, IndexWriter)
> > <https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/index/MergePolicy.html#findForcedMerges(org.apache.lucene.index.SegmentInfos, int, java.util.Map, org.apache.lucene.index.IndexWriter)>
> > and the MergePolicy should then return the necessary merges."
> >
> > See:
> >
> >
> https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/index/MergePolicy.html
> >
> > IOW, when the next commit occurs that closes and flushes the currently
> > open segment.
> >
> > Nothing will happen to any existing 10GB segments, now or ever in the
> > future since merging two 10GB segments would not be possible with a limit
> > of only 15GB.
> >
> > Maybe you could clue us in as to what effect you are trying to achieve. I
> > mean, why should any app care whether segments are 10GB or 15GB?
> >
> >
> > -- Jack Krupansky
> >
> > On Sat, Jan 30, 2016 at 6:28 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> >
> > > On 1/30/2016 7:31 AM, Zheng Lin Edwin Yeo wrote:
> > > > I would like to find out, when I increase the maxMergedSegmentMB from
> > > > 10240 (10GB) to 15360 (15GB), will all the 10GB segments that were
> > > > created previously be automatically merged to 15GB?
> > >
> > > Not necessarily.  It will make those 10GB+ segments eligible for
> > > further merging, whereas they would have been ineligible before the
> > > change.
> > >
> > > This might mean that one or more of those large segments will be merged
> > > soon after the change and restart/reload, but I do not know when it
> > > might happen.  It would probably wait until at least one new segment
> > > was created, at which time the merge policy would be consulted.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>
