No, that collection never receives a forceMerge or expungeDeletes. Almost all 
(99.999%) documents are overwritten every 90 minutes.

A single shard has 16k docs (97k total) but is only 300 MB in size. Maybe that 
is part of the problem.
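A quick check of the size rule Erick describes in the quoted reply, written as a simplified Java model. Assumptions: only the "live size must be under half of the maximum merged segment size" cap is modeled, using the 5 GB default from the thread; the class and method names are made up for illustration and are not Lucene API. Because the whole shard is 300 MB, no single segment in it can come near the 2.5 GB cutoff, so under this model the size cap alone should not be what keeps these segments from merging.

```java
// Simplified, hypothetical model of the TieredMergePolicy size cap described
// in the thread: a segment is only considered for a natural merge when its
// live (non-deleted) size is below half of the maximum merged segment size.
// All sizes are in MB. Not Lucene API; names are illustrative only.
class TmpSizeCap {

    // Default maximum merged segment size mentioned in the thread (5 GB).
    static final double DEFAULT_MAX_MERGED_SEGMENT_MB = 5 * 1024.0;

    // True when the segment's live size is under the half-of-max cutoff.
    static boolean eligibleForNaturalMerge(double liveSizeMB, double maxMergedSegmentMB) {
        return liveSizeMB < maxMergedSegmentMB / 2.0;
    }

    // Amount of deleted data (MB) at which the live size reaches the cutoff;
    // the segment becomes eligible only once deletes exceed this amount.
    static double deletesNeededMB(double segmentMB, double maxMergedSegmentMB) {
        return Math.max(0.0, segmentMB - maxMergedSegmentMB / 2.0);
    }

    public static void main(String[] args) {
        double cap = DEFAULT_MAX_MERGED_SEGMENT_MB;

        // Erick's example: a 100 GB segment left behind by an optimize.
        double optimized = 100 * 1024.0;
        System.out.println("100 GB segment eligible: " + eligibleForNaturalMerge(optimized, cap));
        System.out.println("deletes needed: " + deletesNeededMB(optimized, cap) / 1024.0 + " GB");

        // The shard above is 300 MB in total, so every segment in it is <= 300 MB.
        System.out.println("300 MB shard eligible: " + eligibleForNaturalMerge(300.0, cap));
    }
}
```

Running it should print that the 100 GB segment is not eligible, that about 97.5 GB of deletes are needed, and that the 300 MB shard is well inside the cap.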

I can simply flip a switch to forceMerge after the periodic update cycle, but I 
would prefer Lucene to do it for me.

Thanks,
Markus
 
-----Original message-----
> From:Erick Erickson <erickerick...@gmail.com>
> Sent: Wednesday 4th October 2017 14:56
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Very high number of deleted docs
> 
> Did you _ever_ do a forceMerge/optimize or expungeDeletes?
> 
> Here's the problem: TieredMergePolicy (TMP) has a maximum segment size
> it will allow, 5G by default. No segment is even considered for
> merging unless it has < 2.5G (or half of whatever the maximum is set to)
> of non-deleted docs, the logic being that to merge two similar-size
> segments, each has to be less than half the max size.
> 
> However, optimize/forceMerge and expungeDeletes do not have a limit on
> the segment size. So say you optimize at some point and have a 100G
> segment. It won't get merged until you have 97.5G worth of deleted
> docs.
> 
> More here:
> https://issues.apache.org/jira/browse/LUCENE-7976
> 
> Erick
> 
> On Wed, Oct 4, 2017 at 5:47 AM, Markus Jelsma
> <markus.jel...@openindex.io> wrote:
> > Do you mean a periodic forceMerge? That is usually considered a bad habit 
> > on this list (I agree). It is just that I am actually very surprised this 
> > can happen at all with default settings. This factory, unfortunately, does 
> > not seem to support settings configured in solrconfig.
> >
> > Thanks,
> > Markus
> >
> > -----Original message-----
> >> From:Amrit Sarkar <sarkaramr...@gmail.com>
> >> Sent: Wednesday 4th October 2017 14:42
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Very high number of deleted docs
> >>
> >> Hi Markus,
> >>
> >> Emir already mentioned tuning *reclaimDeletesWeight*, which affects the
> >> merge priority of segments. Optimise the index from time to time, preferably
> >> on a schedule (weekly / fortnightly / ...) during a low-traffic period, so you
> >> never end up in such an odd position as 80% deleted docs in the total index.
> >>
> >> Amrit Sarkar
> >> Search Engineer
> >> Lucidworks, Inc.
> >> 415-589-9269
> >> www.lucidworks.com
> >> Twitter http://twitter.com/lucidworks
> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >>
> >> On Wed, Oct 4, 2017 at 6:02 PM, Emir Arnautović <
> >> emir.arnauto...@sematext.com> wrote:
> >>
> >> > Hi Markus,
> >> > You can set reclaimDeletesWeight in the merge settings to a value higher
> >> > than the default (I think it is 2) to favor segments with deleted docs when
> >> > merging.
> >> >
> >> > HTH,
> >> > Emir
> >> > --
> >> > Monitoring - Log Management - Alerting - Anomaly Detection
> >> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >> >
> >> >
> >> >
> >> > > On 4 Oct 2017, at 13:31, Markus Jelsma <markus.jel...@openindex.io>
> >> > wrote:
> >> > >
> >> > > Hello,
> >> > >
> >> > > Using 6.6.0, I just spotted one of our collections having a core in
> >> > > which over 80% of the total number of documents were deleted documents.
> >> > >
> >> > > It has <mergePolicyFactory
> >> > > class="org.apache.solr.index.TieredMergePolicyFactory"/>
> >> > > configured with no non-default settings.
> >> > >
> >> > > Is this supposed to happen? How can I prevent numbers like these?
> >> > >
> >> > > Thanks,
> >> > > Markus
> >> >
> >> >
> >>
> 
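On the reclaimDeletesWeight suggestion in the thread: as far as I can tell from the Lucene 6.x TieredMergePolicy source, each candidate merge's score (lower is better) is multiplied by (bytes remaining after deletes / bytes on disk) raised to the power reclaimDeletesWeight. The simplified Java model below isolates just that one factor; the other scoring factors (skew, merge size) are deliberately left out, and the class name and the 20%/80% example merges are hypothetical. Raising the weight from the default 2 to 3 widens the advantage of a mostly-deleted merge over a mostly-live one from 16x to 64x.

```java
// Simplified, hypothetical model of ONE factor of TieredMergePolicy's merge
// score (Lucene 6.x): score *= (liveBytes / totalBytes) ^ reclaimDeletesWeight.
// Lower scores are preferred, so a higher weight favors merges that reclaim
// more deleted docs. Skew and merge-size factors are intentionally omitted.
class ReclaimDeletesFactor {

    // Default reclaimDeletesWeight in Lucene 6.x, matching Emir's recollection.
    static final double DEFAULT_RECLAIM_DELETES_WEIGHT = 2.0;

    // Multiplier applied to a candidate merge's score; smaller means more attractive.
    static double deletesFactor(double liveBytes, double totalBytes, double reclaimDeletesWeight) {
        double nonDelRatio = liveBytes / totalBytes;
        return Math.pow(nonDelRatio, reclaimDeletesWeight);
    }

    public static void main(String[] args) {
        for (double weight : new double[] {DEFAULT_RECLAIM_DELETES_WEIGHT, 3.0}) {
            // Two hypothetical candidate merges with equal on-disk size:
            // one with 20% live data, one with 80% live data.
            double mostlyDeleted = deletesFactor(20, 100, weight);
            double mostlyLive = deletesFactor(80, 100, weight);
            System.out.printf("weight=%.1f mostlyDeleted=%.3f mostlyLive=%.3f advantage=%.1fx%n",
                    weight, mostlyDeleted, mostlyLive, mostlyLive / mostlyDeleted);
        }
    }
}
```

Because the ratio is below 1, any weight above 1 shrinks the score of delete-heavy merges faster than that of mostly-live ones, which is why a higher value pushes TMP toward reclaiming deletes.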
