Hmmm, OK, I stand corrected.

This is odd, though. I suspect a quirk in the merging algorithm when
you have a small index.

Ahh, wait. What happens if you modify the segments per tier parameter
of TMP? The default is 10, and perhaps because this is such a small
index you don't have very many like-sized segments to merge after your
periodic run. Setting segs per tier to a much lower number (like 2)
might kick in the background merging. It'll cause more I/O during
indexing, of course.
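
For reference, a minimal sketch of what that could look like in
solrconfig.xml. It assumes the factory accepts child settings the way the
Solr Reference Guide shows for TieredMergePolicyFactory, and the value 2 is
only the lower number suggested above, not a tested recommendation:

```xml
<indexConfig>
  <!-- Sketch only: lower segmentsPerTier from its default of 10 so that a
       small index has enough like-sized segments to trigger background
       merges. All other merge settings are left at their defaults. -->
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="segmentsPerTier">2</int>
  </mergePolicyFactory>
</indexConfig>
```

Watch merge activity and indexing I/O after changing it, and raise the
value again if merging becomes too aggressive.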

Best,
Erick

On Wed, Oct 4, 2017 at 7:09 AM, Markus Jelsma
<markus.jel...@openindex.io> wrote:
> No, that collection never receives a forceMerge nor expungeDeletes. Almost 
> all (99.999%) documents are overwritten every 90 minutes.
>
> A single shard has 16k docs (97k total) but is only 300 MB large. Maybe 
> that's a problem there.
>
> I can simply turn a switch to forceMerge after the periodic update cycle, but 
> I preferred Lucene to do it for me.
>
> Thanks,
> Markus
>
> -----Original message-----
>> From:Erick Erickson <erickerick...@gmail.com>
>> Sent: Wednesday 4th October 2017 14:56
>> To: solr-user <solr-user@lucene.apache.org>
>> Subject: Re: Very high number of deleted docs
>>
>> Did you _ever_ do a forceMerge/optimize or expungeDeletes?
>>
>> Here's the problem: TieredMergePolicy (TMP) has a maximum segment size
>> it will allow, 5G by default. No segment is even considered for
>> merging unless it has < 2.5G (or half of whatever the maximum is set to)
>> non-deleted docs, the logic being that to merge similar size segments,
>> each has to be less than half the max size.
>>
>> However, optimize/forceMerge and expungeDeletes do not have a limit on
>> the segment size. So say you optimize at some point and have a 100G
>> segment. It won't get merged until you have 97.5G worth of deleted
>> docs.
>>
>> More here:
>> https://issues.apache.org/jira/browse/LUCENE-7976
>>
>> Erick
>>
>> On Wed, Oct 4, 2017 at 5:47 AM, Markus Jelsma
>> <markus.jel...@openindex.io> wrote:
>> > Do you mean a periodic forceMerge? That is usually considered a bad habit 
>> > on this list (I agree). It is just that I am actually very surprised this 
>> > can happen at all with default settings. This factory, unfortunately, does 
>> > not seem to support settings configured in solrconfig.
>> >
>> > Thanks,
>> > Markus
>> >
>> > -----Original message-----
>> >> From:Amrit Sarkar <sarkaramr...@gmail.com>
>> >> Sent: Wednesday 4th October 2017 14:42
>> >> To: solr-user@lucene.apache.org
>> >> Subject: Re: Very high number of deleted docs
>> >>
>> >> Hi Markus,
>> >>
>> >> Emir already mentioned tuning reclaimDeletesWeight, which affects the
>> >> priority of segments about to be merged. Consider optimising the index
>> >> from time to time, preferably scheduled weekly / fortnightly / ..., during
>> >> a low-traffic period, so that you are never in such an odd position of
>> >> 80% deleted docs in the total index.
>> >>
>> >> Amrit Sarkar
>> >> Search Engineer
>> >> Lucidworks, Inc.
>> >> 415-589-9269
>> >> www.lucidworks.com
>> >> Twitter http://twitter.com/lucidworks
>> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> >>
>> >> On Wed, Oct 4, 2017 at 6:02 PM, Emir Arnautović <
>> >> emir.arnauto...@sematext.com> wrote:
>> >>
>> >> > Hi Markus,
>> >> > You can set reclaimDeletesWeight in merge settings to some higher value
>> >> > than default (I think it is 2) to favor segments with deleted docs when
>> >> > merging.
>> >> >
>> >> > HTH,
>> >> > Emir
>> >> > --
>> >> > Monitoring - Log Management - Alerting - Anomaly Detection
>> >> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> >> >
>> >> >
>> >> >
>> >> > > On 4 Oct 2017, at 13:31, Markus Jelsma <markus.jel...@openindex.io>
>> >> > wrote:
>> >> > >
>> >> > > Hello,
>> >> > >
>> >> > > Using 6.6.0, I just spotted one of our collections having a core in
>> >> > which over 80% of the total number of documents were deleted documents.
>> >> > >
>> >> > > It has <mergePolicyFactory 
>> >> > > class="org.apache.solr.index.TieredMergePolicyFactory"/>
>> >> > configured with no non-default settings.
>> >> > >
>> >> > > Is this supposed to happen? How can I prevent this kind of number?
>> >> > >
>> >> > > Thanks,
>> >> > > Markus
>> >> >
>> >> >
>> >>
>>
