Hello - with TieredMergePolicy and the default reclaimDeletesWeight of 2.0, and
frequent updates, it is not uncommon to see a deleted-document ratio of 25%. If
you want deletes to be reclaimed more often, e.g. with a weight of 4.0, you will
see very frequent merging of large segments, which kills performance if you are
on spinning disks.
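
For reference, a minimal sketch of how such a ratio can be worked out from a
core's maxDoc and numDocs counters. The helper name deleted_pct and the sample
figures are made up for illustration; it assumes maxDoc counts live plus
deleted documents and numDocs counts live ones only, and it uses integer
arithmetic, so the result is rounded down.

```shell
#!/bin/sh
# deleted_pct MAXDOC NUMDOCS
# Prints the percentage of documents in the index that are marked deleted.
# Hypothetical helper for illustration, not part of Solr.
deleted_pct() {
    max_doc=$1
    num_docs=$2
    # An empty index has no deleted documents; also avoids dividing by zero.
    if [ "$max_doc" -le 0 ]; then
        echo 0
        return 0
    fi
    echo $(( (max_doc - num_docs) * 100 / max_doc ))
}

# Made-up sample: 1,000,000 documents in the index, 750,000 of them live.
deleted_pct 1000000 750000   # prints 25
```

With 850,000 live documents out of the same 1,000,000 the helper prints 15,
which is the level Erick mentions below as the point where he would start
watching the trend.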

Markus

 
 
-----Original message-----
> From:Erick Erickson <erickerick...@gmail.com>
> Sent: Wednesday 30th March 2016 2:50
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Deleted documents and expungeDeletes
> 
> bq: where I see that the number of deleted documents just
> keeps on growing and growing, but they never seem to be deleted
> 
> This shouldn't be happening.  The default TieredMergePolicy weights
> segments to be merged (which happens automatically) heavily as per
> the percentage of deleted docs. Here's a great visualization:
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
> 
> It may be that when you say "growing and growing", that the number of
> deleted docs hasn't reached the threshold where they get merged away.
> 
> Please quantify "growing and growing". I wouldn't start to worry until
> deleted docs reach 15% or more of the total, and then only if the number
> kept growing after that.
> 
> To your questions:
> 1> This is automatic. It'll "just happen", but you will probably always carry
> some deleted docs around in your index.
> 
> 2> You always need at least as much free space as your index occupies on disk.
> In the worst case of normal merging, _all_ the segments will be merged and
> they're copied first. Once that's successful, then the original is deleted.
> 
> 3> Not really. Normally there should be no need.
> 
> 4> True, but usually the effect is so minuscule that nobody notices. People
> spend endless time obsessing about this and unless and until you can show
> that your _users_ notice, I'd ignore it.
> 
> Best,
> Erick
> 
> On Tue, Mar 29, 2016 at 8:16 AM, Jostein Elvaker Haande
> <jehaa...@gmail.com> wrote:
> > Hello everyone,
> >
> > I apologise beforehand if this is a question that has been visited
> > numerous times on this list, but after hours spent on Google and
> > talking to SOLR savvy people on #solr @ Freenode I'm still a bit at a
> > loss about SOLR and deleted documents.
> >
> > I have quite a few indexes in both production and development
> > environments, where I see that the number of deleted documents just
> > keeps on growing and growing, but they never seem to be deleted. From
> > my understanding, this can be controlled in the merge policy set for
> > the current core, but I've not been able to find any specifics on the
> > topic.
> >
> > The general consensus on most search hits I've found is to perform an
> > optimize of the core, however this is an expensive operation,
> > both in terms of CPU cycles as well as disk I/O, and also requires you
> > to have anywhere from 2 times to 3 times the size of the index
> > available on disk to be guaranteed to complete fully. Given these
> > criteria, it's often not something that is a viable option in certain
> > environments, both because it is a resource hog and because you often just
> > don't have the needed available disk space to perform the optimize.
> >
> > After having spoken with a couple of people on IRC (thanks tokee and
> > elyograg), I was made aware of an optional parameter for <commit>
> > called 'expungeDeletes' that can explicitly make sure that deleted
> > documents are deleted from the index, e.g.:
> >
> > curl http://localhost:8983/solr/coreName/update \
> >   -H "Content-Type: text/xml" --data-binary '<commit expungeDeletes="true"/>'
> >
> > Now my questions are as follows:
> >
> > 1) How can I make sure that this is dealt with in my merge policy, if
> > at all possible?
> > 2) I've tried to find some disk space guidelines for 'expungeDeletes',
> > however I've not been able to find any. What are the general
> > guidelines here? Does it require as much space as an optimize, or is
> > it less "aggressive" compared to an optimize?
> > 3) Is 'expungeDeletes' the recommended method to make sure your
> > deleted documents are actually removed from the index, or should you
> > deal with this in your merge policy?
> > 4) I have also heard from talks on #SOLR that deleted documents have an
> > impact on the relevancy of performed searches. Is this correct, or
> > just misinformation?
> >
> > If you require any additional information, like snippets from my
> > configuration (solrconfig.xml), I'm more than happy to provide this.
> >
> > Again, if this is an issue that's being revisited for the Nth time, I
> > apologize, I'm just trying to get my head around this with my somewhat
> > limited SOLR knowledge.
> >
> > --
> > Yours sincerely Jostein Elvaker Haande
> > "A free society is a society where it is safe to be unpopular"
> > - Adlai Stevenson
> >
> > http://tolecnal.net -- tolecnal at tolecnal dot net
> 
