Hello everyone,

I apologise beforehand if this is a question that has been visited
numerous times on this list, but after hours spent on Google and
talking to SOLR savvy people on #solr @ Freenode I'm still a bit at a
loss about SOLR and deleted documents.

I have quite a few indexes in both production and development
environments where the number of deleted documents just keeps
growing, and they never seem to be purged from the index. From my
understanding, this can be controlled in the merge policy set for the
core, but I've not been able to find any specifics on the topic.

The general consensus in most search hits I've found is to perform an
optimize of the core. However, this is an expensive operation in terms
of both CPU cycles and disk I/O, and it also requires anywhere from 2
to 3 times the size of the index to be available on disk to be
guaranteed to complete. Given this, it's often not a viable option in
certain environments, both because it's a resource hog and because you
often just don't have the free disk space needed to perform the
optimize.
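
To make the disk space concern concrete, here is a rough sketch of a
pre-flight check. The index path is a hypothetical placeholder, and the
default factor of 3 is simply the upper end of the 2 to 3 times range
mentioned above, not a figure from the Solr documentation.

```shell
#!/bin/sh
# Succeeds (exit 0) when free_kb >= factor * index_kb.
has_headroom() {
    index_kb=$1
    free_kb=$2
    factor=${3:-3}   # assumed worst case of 3x the index size
    [ "$free_kb" -ge $((index_kb * factor)) ]
}

# Hypothetical index location; adjust to your own core's data directory.
INDEX_DIR=${INDEX_DIR:-/var/solr/data/coreName/data/index}

if [ -d "$INDEX_DIR" ]; then
    # Size of the index directory in KB.
    index_kb=$(du -sk "$INDEX_DIR" | awk '{print $1}')
    # Available space in KB on the filesystem holding the index.
    free_kb=$(df -Pk "$INDEX_DIR" | awk 'NR==2 {print $4}')
    if has_headroom "$index_kb" "$free_kb"; then
        echo "enough free space for an optimize"
    else
        echo "not enough free space for an optimize"
    fi
fi
```

With a 100 GB index, for example, this would want 300 GB free before
reporting that an optimize is safe to attempt.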

After speaking with a couple of people on IRC (thanks tokee and
elyograg), I was made aware of an optional parameter for <commit>
called 'expungeDeletes' that can explicitly make sure that deleted
documents are removed from the index, e.g.:

curl http://localhost:8983/solr/coreName/update \
     -H "Content-Type: text/xml" \
     --data-binary '<commit expungeDeletes="true"/>'
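
For running the same request against several cores, a small wrapper
along these lines might be handy. This is only a sketch: the function
names and the DRY_RUN switch are my own invention, and the base URL is
assumed to match the one in the command above.

```shell
#!/bin/sh
# Build the update URL for a core. The base URL defaults to the
# host and port used in the command above (an assumption).
solr_update_url() {
    core=$1
    base=${2:-http://localhost:8983/solr}
    printf '%s/%s/update\n' "$base" "$core"
}

# Send a commit with expungeDeletes="true" to the given core.
# Set DRY_RUN=1 to print the curl command instead of running it.
expunge_deletes() {
    url=$(solr_update_url "$@")
    body='<commit expungeDeletes="true"/>'
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf "curl %s -H 'Content-Type: text/xml' --data-binary '%s'\n" "$url" "$body"
    else
        curl --fail --silent --show-error "$url" \
            -H 'Content-Type: text/xml' --data-binary "$body"
    fi
}
```

Usage would be something like `DRY_RUN=1; expunge_deletes coreName` to
inspect the command first, then `DRY_RUN=0; expunge_deletes coreName`
to actually send it.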

Now my questions are as follows:

1) How can I make sure that this is dealt with in my merge policy, if
at all possible?
2) I've tried to find some disk space guidelines for 'expungeDeletes',
however I've not been able to find any. What are the general
guidelines here? Does it require as much space as an optimize, or is
it less "aggressive" compared to an optimize?
3) Is 'expungeDeletes' the recommended method to make sure your
deleted documents are actually removed from the index, or should you
deal with this in your merge policy?
4) I have also heard in discussions on #solr that deleted documents
have an impact on the relevancy of searches. Is this correct, or just
misinformation?

If you require any additional information, like snippets from my
configuration (solrconfig.xml), I'm more than happy to provide this.

Again, if this is an issue that's being revisited for the Nth time, I
apologise. I'm just trying to get my head around this with my somewhat
limited SOLR knowledge.

-- 
Yours sincerely Jostein Elvaker Haande
"A free society is a society where it is safe to be unpopular"
- Adlai Stevenson

http://tolecnal.net -- tolecnal at tolecnal dot net
