On 3/19/2015 12:24 AM, vicky desai wrote:
> I fail to understand why this deleted docs are not removed from index on
> merging. Is there a good documentation which explains how exactly is merging
> done?
>
> What can I do to solve this problem other than optimization?

Deleted docs *are* removed by automatic merging -- but only from the specific segments that are merged, and only docs deleted before the merge starts. Deleted docs residing in other index segments are unaffected.

If you are replacing/updating/deleting documents in your index on a regular basis, then there will always be deleted documents in the index, unless you optimize. As long as you don't do it frequently, there is nothing wrong with optimizing your index, you just need to be aware of the cost -- optimizing causes a large amount of I/O, which can affect Solr performance while the optimize is happening and for a short time afterwards.

What actual problem are you trying to solve by getting rid of your deleted documents? With 2-3 million total docs and about half a million deleted docs, as long as you have enough memory in the system for effective disk caching, I don't think performance will be a major factor. If you are finding that it does cause much lower performance, you probably need more RAM in the server.

http://wiki.apache.org/solr/SolrPerformanceProblems

The only other thing that deleted documents might do to your search results is affect the order of documents returned when you do not explicitly sort them and rely on relevancy ranking, because the terms in the deleted documents will affect the similarity calculation.

The most accessible information we have on how merging happens is the visualization blog post that Erick already shared with you. The third video shows how the default merge policy works in recent Solr versions, with a mergeFactor of 10 ... if you count the number of segments, you will see that there are quite a lot more than 10 segments in the index at all times.

http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Each of the bars in the graph shows deleted documents with a dark gray color, and you'll notice that it continually changes while the video plays ... and the index never reaches a state with minimal deleted documents.

Thanks,
Shawn
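
P.S. In case a concrete picture helps, here is a toy Python model of why a merge only clears deletes from the segments it touches. This is not Lucene or Solr code: the Segment class, the merge() function, and the document counts are all made up for illustration, and a real merge policy chooses which segments to merge by its own rules.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    live: int     # documents still visible to searches
    deleted: int  # documents marked deleted but still stored in the segment


def merge(segments, picked):
    """Merge the segments at the indexes in `picked` into one new segment.

    Only live documents are copied into the new segment, so deletes in the
    picked segments disappear. Segments that were not picked are untouched.
    """
    merged = Segment(live=sum(segments[i].live for i in picked), deleted=0)
    untouched = [s for i, s in enumerate(segments) if i not in picked]
    return untouched + [merged]


def deleted_total(segments):
    """Count deleted documents still present across the whole index."""
    return sum(s.deleted for s in segments)


# Hypothetical index: two large segments and two small ones.
index = [Segment(900, 100), Segment(450, 50), Segment(80, 20), Segment(70, 30)]

before = deleted_total(index)
index = merge(index, picked={2, 3})  # only the two small segments are merged
after = deleted_total(index)

print(before, after)
```

In this model the deletes held by the two large segments survive the merge, which matches what the video shows: the total shrinks but does not reach zero unless every segment is rewritten, which is what an optimize does.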