On 6/10/2013 12:31 PM, Cosimo Streppone wrote:
> On 10/6/2013 19:15, Shawn Heisey wrote:
>> I really liked the LHC page. :) Michael is correct here. If you look
>> through that JIRA, you'll see that there are still very valid reasons
>> for doing an optimize, but the age-old reason of "improving performance"
>> is not one of them.
>
> That is interesting, because I am not running any manual optimize,
> but I can clearly see that Solr master is doing something to the index
> that periodically brings it down about 80% in size.
> After that, the query response time is much lower, and more
> importantly, has a much lower variance too.

Solr does merge segments according to your configuration, with a default
mergeFactor of 10, so when 10 candidate segments exist, those specific
segments will be merged into one larger segment. When that has happened
ten times, the ten larger segments will be merged into an even larger
segment, and so on. When segments are merged, deleted documents are not
copied to the new segment.
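
To make the cascade above concrete, here is a rough Python sketch. It is a
deliberately simplified model, not Lucene's actual merge policy: it only counts
segments per size tier, and the function name add_segment is made up for
illustration.

```python
def add_segment(levels, merge_factor=10):
    """Record one newly flushed segment and cascade any merges.

    levels[i] is the number of segments currently sitting at size tier i.
    This is a toy model: real merge policies also look at segment sizes
    and deleted-document counts when picking candidates.
    """
    if not levels:
        levels.append(0)
    levels[0] += 1
    tier = 0
    # When a tier collects merge_factor segments, they become one
    # segment in the next tier up, which may trigger another merge.
    while levels[tier] >= merge_factor:
        levels[tier] -= merge_factor
        if tier + 1 == len(levels):
            levels.append(0)
        levels[tier + 1] += 1
        tier += 1
    return levels


levels = []
for _ in range(25):
    add_segment(levels)
print(levels)  # [5, 2]: five small segments, two merged ones
```

With mergeFactor 10 in this model, the 100th flush would merge ten small
segments and then immediately merge the resulting ten larger ones.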
An optimize is just a special explicit merge that in most cases merges
all segments down to one.
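
For reference, an explicit optimize can be requested through the update
handler with the optimize and maxSegments parameters. The sketch below only
builds the request URL; the host, port, and core name are placeholders I made
up, so substitute your own.

```python
from urllib.parse import urlencode


def optimize_url(base_url, core, max_segments=1):
    """Build the URL for an explicit optimize request.

    base_url and core are placeholders for your own installation.
    max_segments=1 asks for a merge down to a single segment.
    """
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url}/{core}/update?{params}"


# Hypothetical host and core name, for illustration only.
print(optimize_url("http://localhost:8983/solr", "collection1"))
```

Fetching that URL (with curl or a browser) is what starts the merge; the
function itself does not contact Solr.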
If you are seeing an 80% index size reduction on a regular basis just
from merging, then it sounds like you do a lot of document deletes,
reindexes, and/or atomic updates. When you reindex or do an atomic
update on a document, the old one is deleted and the new one is inserted.
Document deletes don't actually remove anything from the index; they
just mark specific document IDs as deleted. The index doesn't get any
smaller. Searches will still look at the deleted docs, but they get
removed from the results after the search is done. Merging (or an
optimize) is the only way that deleted documents actually get removed.
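
Here is a toy Python model of that behavior, to show how reindexing alone could
account for a large size drop on merge. The Segment class and helper names are
invented for this sketch and do not correspond to Lucene classes; "size" is
just a count of stored documents.

```python
class Segment:
    """Toy segment: stored docs plus a set of IDs marked as deleted."""

    def __init__(self, docs):
        self.docs = dict(docs)  # doc_id -> content, never shrinks
        self.deleted = set()    # IDs marked deleted, still stored


def mark_deleted(segments, doc_id):
    # A delete only flags the document; nothing is removed.
    for seg in segments:
        if doc_id in seg.docs:
            seg.deleted.add(doc_id)


def reindex(segments, doc_id, content):
    # Reindex or atomic update: delete the old copy, add the new one.
    mark_deleted(segments, doc_id)
    segments.append(Segment({doc_id: content}))


def merge(segments):
    # Merging copies only the live documents into the new segment.
    live = {}
    for seg in segments:
        for doc_id, content in seg.docs.items():
            if doc_id not in seg.deleted:
                live[doc_id] = content
    return [Segment(live)]


def stored_docs(segments):
    return sum(len(seg.docs) for seg in segments)


segments = [Segment({i: "v0" for i in range(10)})]
for version in range(1, 5):
    for i in range(10):
        reindex(segments, i, f"v{version}")

print(stored_docs(segments))         # 50 stored, only 10 of them live
print(stored_docs(merge(segments)))  # 10 after the merge
```

In this made-up scenario each document was reindexed four times, so the merge
drops 40 of 50 stored documents, an 80% reduction.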
Thanks,
Shawn