On 6/10/2013 12:31 PM, Cosimo Streppone wrote:
> On 10/6/2013 19:15, Shawn Heisey wrote:
>> I really liked the LHC page. :)  Michael is correct here.  If you look
>> through that JIRA, you'll see that there are still very valid reasons
>> for doing an optimize, but the age-old reason of "improving performance"
>> is not one of them.
>
> That is interesting, because I am not running any manual optimize,
> but I can clearly see that Solr master is doing something to the index
> that periodically brings it down about 80% in size.
>
> After that, the query response time is much lower, and more
> importantly, has a much lower variance too.

Solr does merge segments according to your configuration. The default mergeFactor is 10, so once 10 candidate segments exist, those specific segments are merged into one larger segment. When that has happened ten times, the ten larger segments are merged into an even larger segment, and so on. When segments are merged, deleted documents are not copied to the new segment.
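
To make the cascade concrete, here is a toy Python model of it. This is only a sketch under simplifying assumptions: the add_segment function and the "levels" list are made up for illustration, and the real Lucene merge policy looks at segment sizes and other settings rather than counting levels like this.

```python
def add_segment(levels, merge_factor=10):
    """Flush one new small segment, then cascade merges upward.

    levels[i] counts the segments at level i, where a level-i segment
    is the result of merging merge_factor segments from level i - 1.
    Returns the number of merges this flush triggered.
    """
    if not levels:
        levels.append(0)
    levels[0] += 1
    merges = 0
    i = 0
    # Whenever a level fills up, its segments collapse into one
    # segment on the next level, which may fill that level in turn.
    while levels[i] >= merge_factor:
        levels[i] -= merge_factor
        if i + 1 == len(levels):
            levels.append(0)
        levels[i + 1] += 1
        merges += 1
        i += 1
    return merges


levels = []
total_merges = 0
for _ in range(100):
    total_merges += add_segment(levels)

print(levels, total_merges)  # [0, 0, 1] 11
```

In this model, 100 flushes cause ten first-level merges plus one second-level merge, leaving a single large segment.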

An optimize is just a special explicit merge that in most cases merges all segments down to one.

If you are seeing an 80% index size reduction on a regular basis just from merging, then it sounds like you do a lot of document deletes, reindexes, and/or atomic updates. When you reindex or do an atomic update on a document, the old one is deleted and the new one is inserted.

Document deletes don't actually remove anything from the index; they just mark specific document IDs as deleted, so the index doesn't get any smaller. Searches will still look at the deleted docs, but they get removed from the results after the search is done. Merging (or an optimize) is the only way that deleted documents actually get removed.
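
The effect on index size can be sketched with another toy Python model. The Segment class and the update and merge helpers are hypothetical names for illustration, not Lucene or Solr APIs, and the numbers are made up; the point is only that marked-deleted copies keep taking space until a merge drops them.

```python
class Segment:
    def __init__(self, docs):
        self.docs = dict(docs)   # doc id -> stored content
        self.deleted = set()     # ids marked deleted, still stored

    def size(self):
        # Deleted documents still occupy space in the segment.
        return len(self.docs)

    def live(self):
        return {k: v for k, v in self.docs.items() if k not in self.deleted}


def update(segments, doc_id, content):
    """Reindex a document: mark old copies deleted, add the new copy."""
    for seg in segments:
        if doc_id in seg.docs:
            seg.deleted.add(doc_id)
    segments.append(Segment({doc_id: content}))


def merge(segments):
    """Merge everything into one segment, copying only live documents."""
    merged = {}
    for seg in segments:
        merged.update(seg.live())
    return [Segment(merged)]


segments = [Segment({"a": "v0"})]
for n in range(1, 5):
    update(segments, "a", f"v{n}")

before = sum(s.size() for s in segments)
segments = merge(segments)
after = sum(s.size() for s in segments)

print(before, after)  # 5 1
```

Here one document reindexed four times leaves five stored copies, and the merge keeps only the single live one, which is an 80% reduction in this toy case.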

Thanks,
Shawn
