rapid updates aren't the cause of a large percentage of deleted
documents. See the JIRA I referenced for the probable cause:
https://issues.apache.org/jira/browse/LUCENE-7976
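Roughly, LUCENE-7976 is about how TieredMergePolicy in Lucene 6.x picks segments for natural (background) merges. A segment's size is pro-rated by its live documents, and a segment whose pro-rated size is at least half of maxMergedSegmentMB (5GB by default) is treated as too large and skipped. The sketch below is a simplified model of that rule, assuming this reading of the 6.x behaviour. It is not Lucene's code, and the function names are made up for illustration.

```python
# Simplified model of the "too large" check in Lucene 6.x TieredMergePolicy.
# Illustration only (an assumption about the behaviour), not Lucene's code.

GB = 1024 ** 3
MAX_MERGED_SEGMENT_BYTES = 5 * GB  # default maxMergedSegmentMB is 5GB
TOO_LARGE_THRESHOLD = MAX_MERGED_SEGMENT_BYTES / 2


def live_size(size_bytes: int, deleted_ratio: float) -> float:
    """Segment size pro-rated by the fraction of documents that are still live."""
    return size_bytes * (1.0 - deleted_ratio)


def eligible_for_natural_merge(size_bytes: int, deleted_ratio: float) -> bool:
    """Natural merging skips a segment while its live size is >= half the max."""
    return live_size(size_bytes, deleted_ratio) < TOO_LARGE_THRESHOLD


def min_deleted_ratio_to_merge(size_bytes: int) -> float:
    """Deleted ratio a segment must exceed before natural merging considers it."""
    if size_bytes < TOO_LARGE_THRESHOLD:
        return 0.0
    return 1.0 - TOO_LARGE_THRESHOLD / size_bytes


if __name__ == "__main__":
    for size_gb in (5, 20, 100):
        ratio = min_deleted_ratio_to_merge(size_gb * GB)
        print(f"{size_gb}GB segment: needs more than {ratio:.1%} deleted docs")
```

Running it prints thresholds of 50.0%, 87.5% and 97.5% for 5GB, 20GB and 100GB segments. Under this model, a segment left far above 5GB (for example by an earlier optimize) keeps accumulating deletes for a long time before natural merging touches it again.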

If my suspicion is correct, you'll see one or more of your segments
occupying far more than 5GB (TieredMergePolicy's default maximum merged
segment size). In that case you have to either optimize/forceMerge or
expungeDeletes periodically. At that point, though, you might as well
optimize/forceMerge: expungeDeletes would only save you from rewriting
segments with 10% or fewer deleted docs (the cutoff is TieredMergePolicy's
forceMergeDeletesPctAllowed, which defaults to 10%).
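For reference, both operations can be sent to a core's /update handler as request parameters: optimize=true with maxSegments, or commit=true with expungeDeletes=true. The sketch below only builds the request URLs, assuming the stock /update handler. The host and core name are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; point this at your own Solr core.
CORE_URL = "http://localhost:8983/solr/mycore"


def force_merge_url(core_url: str, max_segments: int = 1) -> str:
    """optimize/forceMerge: rewrite the index into at most max_segments segments."""
    query = urlencode({"optimize": "true", "maxSegments": str(max_segments)})
    return f"{core_url}/update?{query}"


def expunge_deletes_url(core_url: str) -> str:
    """Commit that also merges away segments whose deleted-doc share exceeds the cutoff."""
    query = urlencode({"commit": "true", "expungeDeletes": "true"})
    return f"{core_url}/update?{query}"


if __name__ == "__main__":
    print(force_merge_url(CORE_URL))
    print(expunge_deletes_url(CORE_URL))
```

The URLs can then be fetched with curl or urllib.request from a scheduled job during a low-traffic window. Both operations rewrite segments and can take a long time on a large index.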

Or reindex from scratch and never, never, never forceMerge/optimize or
expungeDeletes.

Best,
Erick

On Wed, Oct 4, 2017 at 6:03 AM, Emir Arnautović
<emir.arnauto...@sematext.com> wrote:
> Hi Markus,
> It is passed, just not explicitly: the factory uses reflection to pass the
> arguments. Take a look at the parent factory class.
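The reflection point in Emir's reply can be pictured like this. The factory does not declare each merge setting itself. It takes the name/value pairs from solrconfig.xml and calls the matching setter on the merge policy. The Python sketch below illustrates that pattern, assuming a name like reclaimDeletesWeight maps to a setter named setReclaimDeletesWeight. HypotheticalMergePolicy and invoke_setters are made-up names, not Solr classes, and the camelCase setters deliberately mirror the Java ones.

```python
class HypotheticalMergePolicy:
    """Stand-in for a merge policy exposing Java-style setters (hypothetical)."""

    def __init__(self) -> None:
        self.reclaim_deletes_weight = 2.0
        self.max_merged_segment_mb = 5 * 1024.0

    def setReclaimDeletesWeight(self, value: float) -> None:
        self.reclaim_deletes_weight = value

    def setMaxMergedSegmentMB(self, value: float) -> None:
        self.max_merged_segment_mb = value


def invoke_setters(target: object, args: dict) -> None:
    """For each argument 'fooBar', look up target.setFooBar by name and call it."""
    for name, value in args.items():
        setter_name = "set" + name[0].upper() + name[1:]
        setter = getattr(target, setter_name, None)
        if setter is None:
            # This sketch fails loudly on unknown names; that is its own choice.
            raise AttributeError(f"no setter {setter_name} for argument {name!r}")
        setter(value)


if __name__ == "__main__":
    policy = HypotheticalMergePolicy()
    invoke_setters(policy, {"reclaimDeletesWeight": 3.0})
    print(policy.reclaim_deletes_weight)
```

Because the lookup happens at runtime, the factory class itself never has to mention reclaimDeletesWeight, which is why the setting does not show up when reading the factory's own source.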
>
> When it comes to force merging: you have an extreme case, with 80% of the
> documents deleted (my guess is frequent updates), and extreme cases require
> extreme measures, either a periodic force merge or a full reindex plus aliases.
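The reindex-plus-aliases route mentioned above could look like this, assuming SolrCloud. Reindex into a fresh collection, point the alias that clients query at it with the Collections API CREATEALIAS action (re-issuing CREATEALIAS with an existing alias name repoints it), then delete the old collection. The sketch only builds the request URLs. The base URL, alias and collection names are hypothetical, and the reindexing itself is left out.

```python
from urllib.parse import urlencode

# Hypothetical placeholder for the Solr base URL.
SOLR_URL = "http://localhost:8983/solr"


def collections_api_url(solr_url: str, **params: str) -> str:
    """Build a Collections API request URL from keyword parameters."""
    return f"{solr_url}/admin/collections?{urlencode(params)}"


def alias_swap_plan(solr_url: str, alias: str, old: str, new: str) -> list[str]:
    """Requests to run once `new` has been fully reindexed."""
    return [
        # Point the alias that clients query at the freshly built collection.
        collections_api_url(solr_url, action="CREATEALIAS", name=alias, collections=new),
        # Drop the old collection, along with all of its deleted documents.
        collections_api_url(solr_url, action="DELETE", name=old),
    ]


if __name__ == "__main__":
    for url in alias_swap_plan(SOLR_URL, "products", "products_v1", "products_v2"):
        print(url)
```

Since clients only ever query the alias, the swap is invisible to them, and the new collection starts with no deleted documents at all.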
>
> HTH,
> Emir
>
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 4 Oct 2017, at 14:47, Markus Jelsma <markus.jel...@openindex.io> wrote:
>>
>> Do you mean a periodic forceMerge? That is usually considered a bad habit on
>> this list (I agree). It is just that I am very surprised this can happen at
>> all with default settings. Unfortunately, this factory does not seem to
>> support settings configured in solrconfig.
>>
>> Thanks,
>> Markus
>>
>> -----Original message-----
>>> From: Amrit Sarkar <sarkaramr...@gmail.com>
>>> Sent: Wednesday 4th October 2017 14:42
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Very high number of deleted docs
>>>
>>> Hi Markus,
>>>
>>> Emir already mentioned tuning *reclaimDeletesWeight*, which affects which
>>> segments get priority when merging. Also consider optimising the index from
>>> time to time, preferably on a schedule (weekly / fortnightly / ...) during a
>>> low-traffic period, so you never end up in such an odd position as 80% of
>>> the total index being deleted docs.
>>>
>>> Amrit Sarkar
>>> Search Engineer
>>> Lucidworks, Inc.
>>> 415-589-9269
>>> www.lucidworks.com
>>> Twitter http://twitter.com/lucidworks
>>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>>
>>> On Wed, Oct 4, 2017 at 6:02 PM, Emir Arnautović <
>>> emir.arnauto...@sematext.com> wrote:
>>>
>>>> Hi Markus,
>>>> You can set reclaimDeletesWeight in the merge policy settings to a value
>>>> higher than the default (I think it is 2) to favor segments with deleted
>>>> docs when merging.
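Why a higher reclaimDeletesWeight helps: in the Lucene 6.x source, TieredMergePolicy scores each candidate merge as skew * (bytes after merge)^0.05 * (bytes after / bytes before)^reclaimDeletesWeight, and the lowest score wins. The sketch below reproduces only that formula, as a simplified model. The skew values and sizes are invented for illustration.

```python
# Simplified TieredMergePolicy merge score (Lucene 6.x); lower is better.
# Sizes are in MB here; using MB instead of bytes rescales every score by the
# same constant factor, so comparisons between candidates are unchanged.


def merge_score(skew: float, mb_before: float, mb_after: float,
                reclaim_deletes_weight: float = 2.0) -> float:
    non_del_ratio = mb_after / mb_before  # fraction of the bytes that are live docs
    return skew * mb_after ** 0.05 * non_del_ratio ** reclaim_deletes_weight


def preferred(weight: float) -> str:
    """Pick between two made-up candidate merges for a given reclaimDeletesWeight."""
    # Lopsided merge (high skew) that reclaims 30% deleted docs.
    deletes_heavy = merge_score(0.5, 1000.0, 700.0, weight)
    # Well-balanced merge (roughly 10 equal segments) with no deletes.
    balanced = merge_score(0.1, 1000.0, 1000.0, weight)
    return "deletes_heavy" if deletes_heavy < balanced else "balanced"


if __name__ == "__main__":
    for weight in (2.0, 3.0, 5.0):
        print(weight, preferred(weight))
```

With these invented numbers, the balanced merge wins at the default of 2.0 and at 3.0, while the merge that reclaims deletes wins at 5.0. Raising the weight tilts the choice toward merges that drop deleted documents, at the cost of accepting more lopsided merges.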
>>>>
>>>> HTH,
>>>> Emir
>>>>
>>>>> On 4 Oct 2017, at 13:31, Markus Jelsma <markus.jel...@openindex.io> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> Using Solr 6.6.0, I just spotted one of our collections with a core in
>>>>> which over 80% of the total number of documents were deleted documents.
>>>>>
>>>>> It has <mergePolicyFactory
>>>>> class="org.apache.solr.index.TieredMergePolicyFactory"/> configured,
>>>>> with only default settings.
>>>>>
>>>>> Is this supposed to happen? How can I prevent numbers like these?
>>>>>
>>>>> Thanks,
>>>>> Markus
>>>>
>>>>
>>>
>
