I'm not 100% sure that playing with maxSegments will work.

What will work is to re-index everything. You can re-index into the
existing collection; there's no need to start with a new collection.
Eventually you'll replace enough docs in the over-sized segments that
they'll fall under the 2.5 GB live-document limit (half of TMP's default
5 GB max segment size) and be merged away. Not elegant, but it'd work.
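
A minimal SolrJ sketch of that approach (the URL, collection name, and
field names are placeholders for your setup); re-adding a document with
the same uniqueKey overwrites the old version, which turns the old copy
into a deleted doc in its segment:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    // re-index into the existing collection -- no new collection needed
    try (SolrClient client = new HttpSolrClient.Builder(
            "http://localhost:8983/solr/yourcollection").build()) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "existing-doc-id");       // same uniqueKey replaces the old doc
        doc.addField("title", "re-fetched content");
        client.add(doc);
        client.commit();  // old versions become deleted docs, eligible for merging
    }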

Best,
Erick

On Fri, Jan 5, 2018 at 6:46 AM, Markus Jelsma <markus.jel...@openindex.io>
wrote:

> It could be that when this index was first reconstructed, it was optimized
> down to one segment before being packed and shipped.
>
> How about optimizing it again with maxSegments set to ten? It should
> recover then, right?
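>
> Something like this SolrJ call is what I mean (collection name assumed;
> the plain-URL form would be update?optimize=true&maxSegments=10):
>
>     import org.apache.solr.client.solrj.SolrClient;
>     import org.apache.solr.client.solrj.impl.HttpSolrClient;
>
>     try (SolrClient client = new HttpSolrClient.Builder(
>             "http://localhost:8983/solr").build()) {
>         // waitFlush=true, waitSearcher=true, maxSegments=10
>         client.optimize("yourcollection", true, true, 10);
>     }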
>
> -----Original message-----
> > From:Shawn Heisey <apa...@elyograg.org>
> > Sent: Friday 5th January 2018 14:34
> > To: solr-user@lucene.apache.org
> > Subject: Re: Very high number of deleted docs, part 2
> >
> > On 1/5/2018 5:33 AM, Markus Jelsma wrote:
> > > Another collection, now on 7.1, also shows this problem and has
> > > default TMP settings. This time the size is different: each shard of
> > > this collection is over 40 GB, and each shard has about 50 % deleted
> > > documents. Each shard's largest segment is just under 20 GB with
> > > about 75 % deleted documents. After that are a few five/six GB
> > > segments with just under 50 % deleted documents.
> > >
> > > What do I need to change to make Lucene believe that at least that
> > > twenty GB, three-month-old segment should be merged away? And what
> > > would the predicted indexing performance penalty be?
> >
> > Quick answer: Erick's statements in the previous thread can be
> > summarized as this:  On large indexes that do a lot of deletes or
> > updates, once you do an optimize, you have to continue to do optimizes
> > regularly, or you're going to have this problem.
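> >
> > Just a sketch of what "regularly" could look like (a cron job hitting
> > the update handler works just as well; URL and collection name are
> > placeholders):
> >
> >     import java.util.concurrent.Executors;
> >     import java.util.concurrent.ScheduledExecutorService;
> >     import java.util.concurrent.TimeUnit;
> >     import org.apache.solr.client.solrj.SolrClient;
> >     import org.apache.solr.client.solrj.impl.HttpSolrClient;
> >
> >     ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
> >     SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build();
> >     // full optimize once a day, ideally during off-peak hours
> >     ses.scheduleAtFixedRate(() -> {
> >         try {
> >             client.optimize("yourcollection");
> >         } catch (Exception e) {
> >             e.printStackTrace();  // log and retry in real code
> >         }
> >     }, 1, 24, TimeUnit.HOURS);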
> >
> > TL;DR:
> >
> > I think Erick covered most of this (possibly all of it) in the previous
> > thread.
> >
> > If you've got a 20GB segment and TMP's settings are default, then that
> > means at some point in the past, you've done an optimize.  The default
> > TMP settings have a maximum segment size of 5GB, so if you never
> > optimize, then there will never be a segment larger than 5GB, and the
> > deleted document percentage would be less likely to get out of control.
> > The optimize operation ignores the maximum segment size and reduces the
> > index to a single large segment with zero deleted docs.
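> >
> > At the Lucene level those defaults are settings on TieredMergePolicy
> > (a sketch; in Solr you would change the equivalents in solrconfig.xml
> > rather than in code):
> >
> >     import org.apache.lucene.analysis.standard.StandardAnalyzer;
> >     import org.apache.lucene.index.IndexWriterConfig;
> >     import org.apache.lucene.index.TieredMergePolicy;
> >
> >     TieredMergePolicy tmp = new TieredMergePolicy();
> >     tmp.setMaxMergedSegmentMB(5 * 1024);  // the 5GB ceiling, ignored by optimize
> >     tmp.setSegmentsPerTier(10);           // default tier width
> >     IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
> >     iwc.setMergePolicy(tmp);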
> >
> > TMP's behavior with really big segments is apparently completely as the
> > author intended, but this specific problem wasn't ever addressed.
> >
> > If you do an optimize once and then don't ever do it again, any very
> > large segments are going to be vulnerable to this problem, and the only
> > way (currently) to fix it is to do another optimize.
> >
> > See this issue for a more in-depth discussion and an attempt to figure
> > out how to avoid it:
> >
> > https://issues.apache.org/jira/browse/LUCENE-7976
> >
> > Thanks,
> > Shawn
> >
> >
>
