bq: where I see that the number of deleted documents just keeps on growing and growing, but they never seem to be deleted
This shouldn't be happening. The default TieredMergePolicy weights
segments for merging (which happens automatically) heavily by their
percentage of deleted docs. Here's a great visualization:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

It may be that when you say "growing and growing", the number of
deleted docs simply hasn't reached the threshold where they get merged
away. Please quantify "growing and growing". I wouldn't start to worry
until deleted docs reach 15% or more of the total, and then only if the
number kept growing after that.

To your questions:

1> This is automatic. It'll "just happen", but you will probably always
carry some deleted docs around in your index.

2> You always need at least as much free space as your index occupies
on disk. In the worst case of normal merging, _all_ the segments will
be merged, and they're copied first. Only once that succeeds are the
originals deleted.

3> Not really. Normally there should be no need.

4> True, but usually the effect is so minuscule that nobody notices.
People spend endless time obsessing about this, and unless and until
you can show that your _users_ notice, I'd ignore it.

Best,
Erick

On Tue, Mar 29, 2016 at 8:16 AM, Jostein Elvaker Haande <jehaa...@gmail.com> wrote:
> Hello everyone,
>
> I apologise beforehand if this is a question that has been visited
> numerous times on this list, but after hours spent on Google and
> talking to SOLR savvy people on #solr @ Freenode I'm still a bit at a
> loss about SOLR and deleted documents.
>
> I have quite a few indexes in both production and development
> environments, where I see that the number of deleted documents just
> keeps on growing and growing, but they never seem to be deleted. From
> my understanding, this can be controlled in the merge policy set for
> the current core, but I've not been able to find any specifics on the
> topic.
>
> The general consensus on most search hits I've found is to perform an
> optimize of the core, however this is an expensive operation, both in
> terms of CPU cycles and disk I/O, and it also requires you to have
> anywhere from 2 to 3 times the size of the index available on disk to
> be guaranteed to complete fully. Given these criteria, it's often not
> a viable option in certain environments, both due to it being a
> resource hog and because you often just don't have the available disk
> space needed to perform the optimize.
>
> After having spoken with a couple of people on IRC (thanks tokee and
> elyograg), I was made aware of an optional parameter for <commit>
> called 'expungeDeletes' that can explicitly make sure that deleted
> documents are deleted from the index, i.e.:
>
> curl http://localhost:8983/solr/coreName/update \
>   -H "Content-Type: text/xml" \
>   --data-binary '<commit expungeDeletes="true"/>'
>
> Now my questions are as follows:
>
> 1) How can I make sure that this is dealt with in my merge policy, if
> at all possible?
> 2) I've tried to find some disk space guidelines for 'expungeDeletes',
> however I've not been able to find any. What are the general
> guidelines here? Does it require as much space as an optimize, or is
> it less "aggressive" compared to an optimize?
> 3) Is 'expungeDeletes' the recommended method to make sure your
> deleted documents are actually removed from the index, or should you
> deal with this in your merge policy?
> 4) I have also heard from talks on #SOLR that deleted documents have
> an impact on the relevancy of performed searches. Is this correct, or
> just misinformation?
>
> If you require any additional information, like snippets from my
> configuration (solrconfig.xml), I'm more than happy to provide this.
>
> Again, if this is an issue that's being revisited for the Nth time, I
> apologize, I'm just trying to get my head around this with my somewhat
> limited SOLR knowledge.
>
> --
> Yours sincerely Jostein Elvaker Haande
> "A free society is a society where it is safe to be unpopular"
> - Adlai Stevenson
>
> http://tolecnal.net -- tolecnal at tolecnal dot net
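
For anyone wanting to put numbers on the two rules of thumb in the reply
(start worrying at roughly 15% deleted docs, and keep at least one index's
worth of free disk), here is a minimal Python sketch. It assumes you
already have numDocs and maxDoc for the core (for example from the admin
UI's core overview), with maxDoc counting deleted docs as well. All
function names are hypothetical helpers for illustration and are not part
of Solr; the 0.15 default simply mirrors the figure quoted in the reply.

```python
import os
import shutil


def deleted_ratio(num_docs: int, max_doc: int) -> float:
    """Fraction of the index made up of deleted docs.

    Assumes max_doc counts live plus deleted docs and num_docs counts
    live docs only, so deleted = max_doc - num_docs.
    """
    if max_doc <= 0:
        return 0.0  # empty index: nothing deleted
    return (max_doc - num_docs) / max_doc


def worth_worrying(num_docs: int, max_doc: int, threshold: float = 0.15) -> bool:
    """True once deleted docs reach the threshold (15% per the reply)."""
    return deleted_ratio(num_docs, max_doc) >= threshold


def has_merge_headroom(index_bytes: int, free_bytes: int) -> bool:
    """Worst-case merge rewrites every segment, so free >= index size."""
    return free_bytes >= index_bytes


def dir_size_bytes(path: str) -> int:
    """Total size of all regular files under path (the index directory)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def check_index_dir(index_dir: str) -> bool:
    """Compare the index's on-disk size with free space on its volume."""
    free = shutil.disk_usage(index_dir).free
    return has_merge_headroom(dir_size_bytes(index_dir), free)
```

A single reading only tells you where you are today. Per the reply, the
thing to watch is whether the ratio keeps climbing after it passes the
threshold, so it makes sense to sample it over several days before
concluding that merging is not reclaiming deletes.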