On 3/19/2015 12:24 AM, vicky desai wrote:
> I fail to understand why these deleted docs are not removed from the index
> on merging. Is there good documentation that explains exactly how merging
> is done?
>
> What can I do to solve this problem other than optimization?

Deleted docs *are* removed by automatic merging -- but only from the
specific segments that are merged, and only docs deleted before the
merge starts.  Deleted docs residing in other index segments are unaffected.
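To make that concrete, here is a toy model in Python. It is not Lucene code. The Segment class, the merge() helper and the segment names are hypothetical, and segments are reduced to simple document counts. A merge copies only the live docs of the segments it selects. Docs deleted while the merge is running stay behind as deletes in the new segment. Every other segment keeps its deleted docs.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Toy stand-in for an index segment: document counts only.
    name: str
    live: int
    deleted: int


def merge(segments, to_merge, new_name, deleted_during=0):
    """Merge the named segments into one new segment (toy model).

    Only live docs are copied into the new segment, so deletes that
    existed in the merged segments when the merge started are dropped.
    deleted_during models docs deleted while the merge runs: they are
    still present in the new segment, just marked deleted.
    Segments that are not merged keep all of their deleted docs.
    """
    chosen = [s for s in segments if s.name in to_merge]
    rest = [s for s in segments if s.name not in to_merge]
    copied = sum(s.live for s in chosen)
    merged = Segment(new_name, copied - deleted_during, deleted_during)
    return rest + [merged]


# Hypothetical index: three segments, each with some deleted docs.
index = [Segment("_a", 900, 100), Segment("_b", 450, 50), Segment("_c", 90, 10)]
after = merge(index, {"_b", "_c"}, "_d")
for s in after:
    print(s.name, s.live, s.deleted)
```

This prints "_a 900 100" and "_d 540 0". The 60 deletes in _b and _c are gone. The 100 in _a remain because _a was not part of the merge.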

If you are replacing/updating/deleting documents in your index on a
regular basis, then there will always be deleted documents in the index,
unless you optimize.  As long as you don't do it frequently, there is
nothing wrong with optimizing your index, you just need to be aware of
the cost -- optimizing causes a large amount of I/O, which can affect
Solr performance while the optimize is happening and for a short time
afterwards.
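As a rough illustration of the I/O tradeoff, the sketch below treats "live docs copied" as a proxy for I/O. The segment sizes are hypothetical, chosen to be close to the numbers in this thread, and the merge_cost() helper is made up for illustration. The model compares a small merge with an optimize, which merges every segment.

```python
def merge_cost(segments, chosen):
    """Toy cost model for a merge.

    segments: list of (live, deleted) doc counts, one per segment.
    chosen: set of segment indexes that take part in the merge.
    Returns (live docs copied, deleted docs left in the index).
    """
    copied = sum(segments[i][0] for i in chosen)
    remaining = sum(d for i, (_, d) in enumerate(segments) if i not in chosen)
    return copied, remaining


# Hypothetical index: one large segment and two smaller ones.
index = [(2_000_000, 400_000), (300_000, 80_000), (50_000, 20_000)]
print(merge_cost(index, {1, 2}))          # small merge of the two small segments
print(merge_cost(index, set(range(3))))   # optimize: merge everything
```

In this model the small merge copies 350,000 docs and leaves 400,000 deletes in place. The optimize removes every delete, but it has to rewrite all 2,350,000 live docs to do it.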

What actual problem are you trying to solve by getting rid of your
deleted documents?  With 2-3 million total docs and about half a million
deleted docs, as long as you have enough memory in the system for
effective disk caching, I don't think performance will be a major
factor.  If you find that the deleted docs really do cause much lower
performance, you probably need more RAM in the server.

http://wiki.apache.org/solr/SolrPerformanceProblems

The only other thing that deleted documents might do to your search
results is affect the order of documents returned when you do not
explicitly sort them and rely on relevancy ranking.  Until a merge
removes them, the terms in deleted documents are still counted in the
term statistics (such as document frequency) used by the similarity
calculation.
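A small numeric sketch of that effect follows. It assumes a classic TF-IDF style idf of 1 + ln(numDocs / (docFreq + 1)), which is similar to what older Lucene versions used. The exact formula depends on the Similarity you have configured. The term statistics are hypothetical. The scenario is a term that is common in the deleted docs. Because those docs are still counted, the term looks more common than it really is, and it gets a lower weight.

```python
import math


def classic_idf(doc_freq, num_docs):
    # Assumed TF-IDF style idf: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for a single term.
live_docs, deleted_docs = 2_000_000, 500_000
live_df, deleted_df = 10_000, 40_000  # term appears often in deleted docs

# What the idf would be if deleted docs were already merged away.
idf_live_only = classic_idf(live_df, live_docs)
# What the idf is while deleted docs are still counted in the statistics.
idf_with_deletes = classic_idf(live_df + deleted_df, live_docs + deleted_docs)

print(round(idf_live_only, 3), round(idf_with_deletes, 3))
```

With these made-up numbers, the term's weight drops noticeably while the deletes are still counted. That kind of shift can reorder documents that are ranked by relevancy alone.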

The most accessible information we have on how merging happens is the
visualization blog post that Erick already shared with you.  The third
video shows how the default merge policy works in recent Solr versions,
with a mergeFactor of 10 ... if you count the number of segments, you
will see that there are quite a lot more than 10 segments in the index
at all times.
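If you want to see why the segment count stays well above the mergeFactor without watching the video, here is a much simpler simulation. It models a log-style merge policy. It is NOT the policy shown in the third video, and simulate_log_merges() is a hypothetical helper. Each flush adds one small segment. Whenever a size level holds merge_factor segments, those segments are merged into one segment at the next level up.

```python
def simulate_log_merges(num_flushes, merge_factor=10):
    """Toy log-style merge policy.

    levels[i] is the number of segments currently at size level i.
    Returns the total segment count after each flush (and its merges).
    """
    levels = []
    counts = []
    for _ in range(num_flushes):
        if not levels:
            levels.append(0)
        levels[0] += 1  # a flush creates one new level-0 segment
        level = 0
        # Cascade: a full level merges into one segment at the next level.
        while levels[level] == merge_factor:
            levels[level] = 0
            if level + 1 == len(levels):
                levels.append(0)
            levels[level + 1] += 1
            level += 1
        counts.append(sum(levels))
    return counts


counts = simulate_log_merges(999)
print(max(counts))
```

With merge_factor=10, the index has up to 9 segments at each level. After 999 flushes, this toy index holds 27 segments, even though no single level ever reaches 10.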

http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

In each of the bars in the graph, deleted documents are shown in dark
gray.  You'll notice that the amount of dark gray continually changes
while the video plays ... and the index never reaches a state with
minimal deleted documents.

Thanks,
Shawn
