Hmmm, OK, I stand corrected. This is odd, though. I suspect a quirk in the merging algorithm when you have a small index.
Ahh, wait. What happens if you modify the segments per tier parameter of TMP? The default is 10, and perhaps because this is such a small index you don't have very many like-sized segments to merge after your periodic run. Setting segs per tier to a much lower number (like 2) might kick in the background merging. It'll cause more I/O during indexing, of course.

Best,
Erick

On Wed, Oct 4, 2017 at 7:09 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> No, that collection never receives a forceMerge nor expungeDeletes. Almost
> all (99.999%) documents are overwritten every 90 minutes.
>
> A single shard has 16k docs (97k total) but is only 300 MB large. Maybe
> that's a problem there.
>
> I can simply turn a switch to forceMerge after the periodic update cycle, but
> i preferred Lucene to do it for me.
>
> Thanks,
> Markus
>
> -----Original message-----
>> From:Erick Erickson <erickerick...@gmail.com>
>> Sent: Wednesday 4th October 2017 14:56
>> To: solr-user <solr-user@lucene.apache.org>
>> Subject: Re: Very high number of deleted docs
>>
>> Did you _ever_ do a forceMerge/optimize or expungeDeletes?
>>
>> Here's the problem: TieredMergePolicy (TMP) has a maximum segment size
>> it will allow, 5G by default. No segment is even considered for
>> merging unless it has < 2.5G (or half of whatever the max is)
>> non-deleted docs, the logic being that to merge similar size segments,
>> each has to be less than half the max size.
>>
>> However, optimize/forceMerge and expungeDeletes do not have a limit on
>> the segment size. So say you optimize at some point and have a 100G
>> segment. It won't get merged until you have 97.5G worth of deleted
>> docs.
>>
>> More here:
>> https://issues.apache.org/jira/browse/LUCENE-7976
>>
>> Erick
>>
>> On Wed, Oct 4, 2017 at 5:47 AM, Markus Jelsma
>> <markus.jel...@openindex.io> wrote:
>> > Do you mean a periodic forceMerge? That is usually considered a bad habit
>> > on this list (i agree).
>> > It is just that i am actually very surprised this
>> > can happen at all with default settings. This factory, unfortunately, does
>> > not seem to support settings configured in solrconfig.
>> >
>> > Thanks,
>> > Markus
>> >
>> > -----Original message-----
>> >> From:Amrit Sarkar <sarkaramr...@gmail.com>
>> >> Sent: Wednesday 4th October 2017 14:42
>> >> To: solr-user@lucene.apache.org
>> >> Subject: Re: Very high number of deleted docs
>> >>
>> >> Hi Markus,
>> >>
>> >> Emir already mentioned tuning reclaimDeletesWeight, which affects the
>> >> priority of segments about to merge. Optimising the index from time to
>> >> time, preferably scheduled weekly / fortnightly / ... during a low-traffic
>> >> period, helps to never end up in such an odd position of 80% deleted
>> >> docs in the total index.
>> >>
>> >> Amrit Sarkar
>> >> Search Engineer
>> >> Lucidworks, Inc.
>> >> 415-589-9269
>> >> www.lucidworks.com
>> >> Twitter http://twitter.com/lucidworks
>> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> >>
>> >> On Wed, Oct 4, 2017 at 6:02 PM, Emir Arnautović <
>> >> emir.arnauto...@sematext.com> wrote:
>> >>
>> >> > Hi Markus,
>> >> > You can set reclaimDeletesWeight in merge settings to some higher value
>> >> > than default (I think it is 2) to favor segments with deleted docs when
>> >> > merging.
>> >> >
>> >> > HTH,
>> >> > Emir
>> >> > --
>> >> > Monitoring - Log Management - Alerting - Anomaly Detection
>> >> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> >> >
>> >> >
>> >> >
>> >> > > On 4 Oct 2017, at 13:31, Markus Jelsma <markus.jel...@openindex.io>
>> >> > > wrote:
>> >> > >
>> >> > > Hello,
>> >> > >
>> >> > > Using 6.6.0, i just spotted one of our collections having a core of
>> >> > > which over 80 % of the total number of documents were deleted documents.
>> >> > >
>> >> > > It has <mergePolicyFactory
>> >> > > class="org.apache.solr.index.TieredMergePolicyFactory"/>
>> >> > > configured with no non-default settings.
>> >> > >
>> >> > > Is this supposed to happen? How can i prevent these kinds of numbers?
>> >> > >
>> >> > > Thanks,
>> >> > > Markus
>> >> >
>> >> >
>> >>
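
As a footnote to the size rule quoted above (a 5G default maximum segment size, with a segment only considered for merging once its non-deleted data is under half of that), here is a rough Python sketch of the arithmetic. It is not Lucene's code: the function names and the GB-based inputs are made up for illustration, and it models only the size cutoff as described in this thread, nothing else TMP does.

```python
# Illustrative sketch only; models the size cutoff as described in the thread,
# not TieredMergePolicy's actual implementation.

MAX_SEGMENT_GB = 5.0  # default maximum segment size mentioned in the thread


def deleted_gb_threshold(segment_gb, max_segment_gb=MAX_SEGMENT_GB):
    """GB of deleted docs a segment must exceed before its live data
    drops below half of the maximum segment size."""
    half_max = max_segment_gb / 2.0
    return max(0.0, segment_gb - half_max)


def is_merge_candidate(segment_gb, deleted_gb, max_segment_gb=MAX_SEGMENT_GB):
    """True if the segment's non-deleted data is under half the max size."""
    live_gb = segment_gb - deleted_gb
    return live_gb < max_segment_gb / 2.0


if __name__ == "__main__":
    # The 100G force-merged segment from the thread.
    print(deleted_gb_threshold(100.0))
    print(is_merge_candidate(100.0, 98.0))
    # A 300 MB shard with roughly 80% deleted docs.
    print(is_merge_candidate(0.3, 0.24))
```

By this rule a 100G segment needs more than 97.5G of deletes before it is considered, while a 300 MB shard is already far below the cutoff, so the size cap on its own would not account for the small-index case discussed at the top of the thread.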