Rapid updates aren't the cause of a large percentage of deleted documents. See the JIRA I referenced for the probable cause: https://issues.apache.org/jira/browse/LUCENE-7976

If my suspicion is correct you'll see one or more of your segments occupy way more than 5G.

Assuming my suspicion is correct, you have to either periodically optimize/forceMerge or expungeDeletes regularly. At that point, though, you might as well optimize/forceMerge. expungeDeletes would only save you re-writing segments with < 20% deleted docs (at least I think that's the cutoff).

Or reindex from scratch and never, never, never forceMerge/optimize or expungeDeletes.

Best,
Erick

On Wed, Oct 4, 2017 at 6:03 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> Hi Markus,
> It is passed but not explicitly - it uses reflection to pass arguments - take
> a look at parent factory class.
>
> When it comes to force merging - you have extreme case - 80% is deleted (my
> guess frequent updates) and extreme cases require some extreme measures - it
> can be either periodic force merge or full reindexing + aliases.
>
> HTH,
> Emir
>
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 4 Oct 2017, at 14:47, Markus Jelsma <markus.jel...@openindex.io> wrote:
>>
>> Do you mean a periodic forceMerge? That is usually considered a bad habit on
>> this list (i agree). It is just that i am actually very surprised this can
>> happen at all with default settings. This factory, unfortunately does not
>> seem to support settings configured in solrconfig.
>>
>> Thanks,
>> Markus
>>
>> -----Original message-----
>>> From:Amrit Sarkar <sarkaramr...@gmail.com>
>>> Sent: Wednesday 4th October 2017 14:42
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Very high number of deleted docs
>>>
>>> Hi Markus,
>>>
>>> Emir already mentioned tuning *reclaimDeletesWeight* which affects segments
>>> about to merge priority. Optimising index time by time, preferably
>>> scheduling weekly / fortnight / ..., at low traffic period to never be in
>>> such odd position of 80% deleted docs in total index.
>>>
>>> Amrit Sarkar
>>> Search Engineer
>>> Lucidworks, Inc.
>>> 415-589-9269
>>> www.lucidworks.com
>>> Twitter http://twitter.com/lucidworks
>>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>>
>>> On Wed, Oct 4, 2017 at 6:02 PM, Emir Arnautović <
>>> emir.arnauto...@sematext.com> wrote:
>>>
>>>> Hi Markus,
>>>> You can set reclaimDeletesWeight in merge settings to some higher value
>>>> than default (I think it is 2) to favor segments with deleted docs when
>>>> merging.
>>>>
>>>> HTH,
>>>> Emir
>>>> --
>>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>>>
>>>>
>>>>
>>>>> On 4 Oct 2017, at 13:31, Markus Jelsma <markus.jel...@openindex.io> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> Using a 6.6.0, i just spotted one of our collections having a core of
>>>>> which over 80 % of the total number of documents were deleted documents.
>>>>>
>>>>> It has <mergePolicyFactory
>>>>> class="org.apache.solr.index.TieredMergePolicyFactory"/>
>>>>> configured with no non-default settings.
>>>>>
>>>>> Is this supposed to happen? How can i prevent these kind of numbers?
>>>>>
>>>>> Thanks,
>>>>> Markus
>>>>
>>>>
>>>
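
For anyone wanting to check the two numbers discussed in this thread (the share of deleted documents, and whether any segment has grown past 5G), here is a minimal Python sketch. It assumes you have already collected per-segment figures by whatever means you prefer; the `Segment` type, its field names, and the sample values are hypothetical and not part of any Solr API. The only arithmetic it relies on is that deleted docs = maxDoc - numDocs.

```python
from dataclasses import dataclass
from typing import List

GIB = 1024 ** 3
# The "5G" figure mentioned in the thread, expressed in bytes.
MAX_MERGED_SEGMENT_BYTES = 5 * GIB


@dataclass
class Segment:
    """Hypothetical container for per-segment figures (not a Solr type)."""
    name: str
    size_bytes: int
    max_doc: int   # live + deleted documents
    num_docs: int  # live documents only


def deleted_pct(max_doc: int, num_docs: int) -> float:
    """Percentage of documents that are deleted, given maxDoc and numDocs."""
    if max_doc == 0:
        return 0.0
    return 100.0 * (max_doc - num_docs) / max_doc


def index_deleted_pct(segments: List[Segment]) -> float:
    """Deleted percentage across all segments taken together."""
    total_max = sum(s.max_doc for s in segments)
    total_live = sum(s.num_docs for s in segments)
    return deleted_pct(total_max, total_live)


def oversized(segments: List[Segment],
              limit: int = MAX_MERGED_SEGMENT_BYTES) -> List[str]:
    """Names of segments whose on-disk size exceeds the limit."""
    return [s.name for s in segments if s.size_bytes > limit]


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [
        Segment("_a", 12 * GIB, 1000, 100),
        Segment("_b", 1 * GIB, 100, 100),
    ]
    print("deleted %:", round(index_deleted_pct(sample), 1))
    print("segments over 5G:", oversized(sample))
```

If `oversized` returns anything and those segments also carry most of the deletes, that would be consistent with the cause described in LUCENE-7976; it does not prove it.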