Well, optimize with maxSegments and commit with expungeDeletes did not
do the job in testing. But tell me more about the 2.5G live documents
limit; I have no idea what it is.
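
For reference, the calls tested were along these lines (host, port and
collection name are placeholders):

  curl 'http://localhost:8983/solr/mycollection/update?optimize=true&maxSegments=10'
  curl 'http://localhost:8983/solr/mycollection/update?commit=true&expungeDeletes=true'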

Thanks,
Markus 
 
-----Original message-----
> From:Erick Erickson <erickerick...@gmail.com>
> Sent: Friday 5th January 2018 17:56
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Very high number of deleted docs, part 2
> 
> I'm not 100% sure that playing with maxSegments will work.
> 
> What will work is to re-index everything. You can re-index into the
> existing collection; there is no need to start with a new one.
> Eventually you'll replace enough docs in the over-sized segments that
> they'll fall under the 2.5G live documents limit and be merged away.
> Not elegant, but it'd work.
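> 
> A minimal sketch of that, assuming the source documents are available
> as JSON and the collection is named 'mycollection' (both placeholders):
> 
>   # Re-adding docs with existing uniqueKey values replaces them in place;
>   # the old versions become deleted docs in their current segments.
>   curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
>        -H 'Content-Type: application/json' \
>        --data-binary @docs.json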
> 
> Best,
> Erick
> 
> On Fri, Jan 5, 2018 at 6:46 AM, Markus Jelsma <markus.jel...@openindex.io>
> wrote:
> 
> > It could be that when this index was first reconstructed, it was
> > optimized to one segment before being packed and shipped.
> >
> > How about optimizing it again, with maxSegments set to ten? It
> > should recover then, right?
> >
> > -----Original message-----
> > > From:Shawn Heisey <apa...@elyograg.org>
> > > Sent: Friday 5th January 2018 14:34
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Very high number of deleted docs, part 2
> > >
> > > On 1/5/2018 5:33 AM, Markus Jelsma wrote:
> > > > Another collection, now on 7.1, also shows this problem and has
> > > > default TMP settings. This time the size is different: each shard of
> > > > this collection is over 40 GB, and each shard has about 50 % deleted
> > > > documents. Each shard's largest segment is just under 20 GB with
> > > > about 75 % deleted documents. After that come a few five/six GB
> > > > segments with just under 50 % deleted documents.
> > > >
> > > > What do I need to change to make Lucene believe that at least that
> > > > twenty GB, three-month-old segment should be merged away? And what
> > > > would the predicted indexing performance penalty be?
> > >
> > > Quick answer: Erick's statements in the previous thread can be
> > > summarized as follows: on large indexes that see a lot of deletes or
> > > updates, once you do an optimize, you have to keep doing optimizes
> > > regularly, or you're going to have this problem.
> > >
> > > TL;DR:
> > >
> > > I think Erick covered most of this (possibly all of it) in the previous
> > > thread.
> > >
> > > If you've got a 20GB segment and TMP's settings are default, then that
> > > means at some point in the past, you've done an optimize.  The default
> > > TMP settings have a maximum segment size of 5GB, so if you never
> > > optimize, then there will never be a segment larger than 5GB, and the
> > > deleted document percentage would be less likely to get out of control.
> > > The optimize operation ignores the maximum segment size and reduces the
> > > index to a single large segment with zero deleted docs.
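> > >
> > > For reference, the default corresponds to something like this in
> > > solrconfig.xml (values shown approximate the defaults, not a
> > > recommendation):
> > >
> > >   <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
> > >     <double name="maxMergedSegmentMB">5120.0</double> <!-- 5 GB -->
> > >   </mergePolicyFactory>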
> > >
> > > TMP's behavior with really big segments is apparently exactly what
> > > the author intended, but this specific problem was never addressed.
> > >
> > > If you do an optimize once and then don't ever do it again, any very
> > > large segments are going to be vulnerable to this problem, and the only
> > > way (currently) to fix it is to do another optimize.
> > >
> > > See this issue for a more in-depth discussion and an attempt to figure
> > > out how to avoid it:
> > >
> > > https://issues.apache.org/jira/browse/LUCENE-7976
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
> 
