Hello - with TieredMergePolicy and the default reclaimDeletesWeight of 2.0, plus frequent updates, it is not uncommon to see a deleted-document ratio of 25%. If you want deletes to be reclaimed more aggressively, e.g. with a weight of 4.0, you will see very frequent merging of large segments, which kills performance if you are on spinning disks.
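The ratio in question is deleted documents divided by maxDoc, where maxDoc counts every document slot including deleted ones and numDocs counts only live documents. A minimal sketch of that arithmetic, assuming you read numDocs and maxDoc off the core overview in the admin UI; the function name and the sample figures are hypothetical, chosen only for illustration:

```python
def deleted_ratio(num_docs: int, max_doc: int) -> float:
    """Return the fraction of the index occupied by deleted documents.

    max_doc counts every document slot, including deleted ones;
    num_docs counts only live documents.
    """
    if max_doc <= 0:
        # An empty index carries no deleted documents.
        return 0.0
    if not 0 <= num_docs <= max_doc:
        raise ValueError("num_docs must be between 0 and max_doc")
    return (max_doc - num_docs) / max_doc


# Hypothetical figures for illustration, not taken from a real index.
ratio = deleted_ratio(num_docs=750_000, max_doc=1_000_000)
print(f"{ratio:.0%} of documents are deleted")
```

Tracking this number over time shows whether the ratio levels off, which is what you would expect once merging keeps pace with updates, or keeps climbing.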
Markus

-----Original message-----
> From:Erick Erickson <erickerick...@gmail.com>
> Sent: Wednesday 30th March 2016 2:50
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Deleted documents and expungeDeletes
>
> bq: where I see that the number of deleted documents just
> keeps on growing and growing, but they never seem to be deleted
>
> This shouldn't be happening. The default TieredMergePolicy weights
> segments to be merged (which happens automatically) heavily as per
> the percentage of deleted docs. Here's a great visualization:
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> It may be that when you say "growing and growing", the number of
> deleted docs hasn't reached the threshold where they get merged away.
>
> Please specify "growing and growing". Only when it gets to 15% or more
> of the total would I start to worry. And then only if it kept growing
> after that.
>
> To your questions:
> 1> This is automatic. It'll "just happen", but you will probably always carry
> some deleted docs around in your index.
>
> 2> You always need at least as much free space as your index occupies on disk.
> In the worst case of normal merging, _all_ the segments will be merged
> and they're copied first. Once that's successful, the originals are
> deleted.
>
> 3> Not really. Normally there should be no need.
>
> 4> True, but usually the effect is so minuscule that nobody notices.
> People spend endless time obsessing about this, and unless and until
> you can show that your _users_ notice, I'd ignore it.
>
> Best,
> Erick
>
> On Tue, Mar 29, 2016 at 8:16 AM, Jostein Elvaker Haande <jehaa...@gmail.com> wrote:
> > Hello everyone,
> >
> > I apologise beforehand if this is a question that has been visited
> > numerous times on this list, but after hours spent on Google and
> > talking to SOLR savvy people on #solr @ Freenode I'm still a bit at a
> > loss about SOLR and deleted documents.
> >
> > I have quite a few indexes in both production and development
> > environments, where I see that the number of deleted documents just
> > keeps on growing and growing, but they never seem to be deleted. From
> > my understanding, this can be controlled in the merge policy set for
> > the current core, but I've not been able to find any specifics on the
> > topic.
> >
> > The general consensus on most search hits I've found is to perform an
> > optimize of the core, however this is an expensive operation, both in
> > terms of CPU cycles and disk I/O, and it also requires you to have
> > anywhere from 2 times to 3 times the size of the index available on
> > disk to be guaranteed to complete fully. Given these criteria, it's
> > often not a viable option in certain environments, both due to it
> > being a resource hog and because you often just don't have the
> > available disk space needed to perform the optimize.
> >
> > After having spoken with a couple of people on IRC (thanks tokee and
> > elyograg), I was made aware of an optional parameter for <commit>
> > called 'expungeDeletes' that can explicitly make sure that deleted
> > documents are deleted from the index, i.e.:
> >
> > curl http://localhost:8983/solr/coreName/update -H "Content-Type: text/xml" --data-binary '<commit expungeDeletes="true"/>'
> >
> > Now my questions are as follows:
> >
> > 1) How can I make sure that this is dealt with in my merge policy, if
> > at all possible?
> > 2) I've tried to find some disk space guidelines for 'expungeDeletes',
> > however I've not been able to find any. What are the general
> > guidelines here? Does it require as much space as an optimize, or is
> > it less "aggressive" compared to an optimize?
> > 3) Is 'expungeDeletes' the recommended method to make sure your
> > deleted documents are actually removed from the index, or should you
> > deal with this in your merge policy?
> > 4) I have also heard from talks on #SOLR that deleted documents have an
> > impact on the relevancy of performed searches. Is this correct, or
> > just misinformation?
> >
> > If you require any additional information, like snippets from my
> > configuration (solrconfig.xml), I'm more than happy to provide this.
> >
> > Again, if this is an issue that's being revisited for the Nth time, I
> > apologize, I'm just trying to get my head around this with my somewhat
> > limited SOLR knowledge.
> >
> > --
> > Yours sincerely Jostein Elvaker Haande
> > "A free society is a society where it is safe to be unpopular"
> > - Adlai Stevenson
> >
> > http://tolecnal.net -- tolecnal at tolecnal dot net
>