Did you _ever_ do a forceMerge/optimize or expungeDeletes?

Here's the problem: TieredMergePolicy (TMP) has a maximum segment size
it will allow, 5G by default. No segment is even considered for
merging unless it has < 2.5G (or half of whatever the max is set to)
of non-deleted docs, the logic being that to merge two similar-size
segments, each has to be less than half the max size.

However, optimize/forceMerge and expungeDeletes do not have a limit on
the segment size. So say you optimize at some point and have a 100G
segment. It won't get merged until you have 97.5G worth of deleted
docs.
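
To make the arithmetic concrete, here is a rough sketch of the rule as I
described it above. It is not the actual Lucene code, the function names
are made up for illustration, and it assumes sizes are simply prorated by
live vs. deleted data:

```python
# Simplified model of the eligibility rule described above; NOT Lucene code.
# Function names are hypothetical and sizes are in GB.

def live_size_threshold_gb(max_merged_segment_gb=5.0):
    """Live (non-deleted) size a segment must fall below to be a merge candidate."""
    return max_merged_segment_gb / 2.0


def deleted_gb_needed(segment_gb, max_merged_segment_gb=5.0):
    """Deleted data that must accumulate before live data reaches the threshold."""
    return max(0.0, segment_gb - live_size_threshold_gb(max_merged_segment_gb))


if __name__ == "__main__":
    # A 100G segment left behind by an optimize, with the default 5G max
    print(deleted_gb_needed(100.0))
    # A 2G segment is already under the threshold
    print(deleted_gb_needed(2.0))
```

Raising the max segment size moves the threshold up with it, but an
oversized segment left behind by an optimize still has to shed nearly
all of its docs before TMP will look at it again.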

More here:
https://issues.apache.org/jira/browse/LUCENE-7976

Erick

On Wed, Oct 4, 2017 at 5:47 AM, Markus Jelsma
<markus.jel...@openindex.io> wrote:
> Do you mean a periodic forceMerge? That is usually considered a bad habit on 
> this list (i agree). It is just that i am actually very surprised this can 
> happen at all with default settings. This factory, unfortunately does not 
> seem to support settings configured in solrconfig.
>
> Thanks,
> Markus
>
> -----Original message-----
>> From:Amrit Sarkar <sarkaramr...@gmail.com>
>> Sent: Wednesday 4th October 2017 14:42
>> To: solr-user@lucene.apache.org
>> Subject: Re: Very high number of deleted docs
>>
>> Hi Markus,
>>
>> Emir already mentioned tuning reclaimDeletesWeight, which affects the
>> priority of segments that are candidates for merging. You could also
>> optimise the index from time to time, preferably scheduled weekly /
>> fortnightly / ..., during a low-traffic period, so that you are never in
>> such an odd position of 80% deleted docs in the total index.
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>
>> On Wed, Oct 4, 2017 at 6:02 PM, Emir Arnautović <
>> emir.arnauto...@sematext.com> wrote:
>>
>> > Hi Markus,
>> > You can set reclaimDeletesWeight in merge settings to some higher value
>> > than default (I think it is 2) to favor segments with deleted docs when
>> > merging.
>> >
>> > HTH,
>> > Emir
>> > --
>> > Monitoring - Log Management - Alerting - Anomaly Detection
>> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> >
>> >
>> >
>> > > On 4 Oct 2017, at 13:31, Markus Jelsma <markus.jel...@openindex.io>
>> > wrote:
>> > >
>> > > Hello,
>> > >
>> > > Using a 6.6.0, i just spotted one of our collections having a core of
>> > which over 80 % of the total number of documents were deleted documents.
>> > >
>> > > It has <mergePolicyFactory 
>> > > class="org.apache.solr.index.TieredMergePolicyFactory"/>
>> > configured with no non-default settings.
>> > >
>> > > Is this supposed to happen? How can i prevent these kind of numbers?
>> > >
>> > > Thanks,
>> > > Markus
>> >
>> >
>>
