Here is one way to tell if the index is optimized. Look at this graph for example:
https://apps.sematext.com/spm/s/Dxn6SHjSLB

See the purple line labeled "delta"? If it's not 0, it means your index
has deletions. This index has over 100K deleted docs that have not been
expunged. That's because we never optimize it. See the difference between
max docs and num docs? That difference is the delta.

See that green jagged line? That's the number of segments. It goes up and
down as Lucene merges segments. There are close to 30 segments in this
index. If the index were optimized, it would have just 1 segment and that
green line would be down close to the X axis.

So that's one way to see if your index is optimized or not. But as you can
see here, we don't optimize this index at all. That's the index that's
behind http://search-hadoop.com/ btw. As you can see, it's constantly
growing.... slowly, but growing.... so we don't optimize.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/


On Mon, Jun 10, 2013 at 2:49 PM, Shawn Heisey <s...@elyograg.org> wrote:
> On 6/10/2013 12:31 PM, Cosimo Streppone wrote:
>>
>> On 10/6/2013 19:15, Shawn Heisey wrote:
>>>
>>> I really liked the LHC page. :) Michael is correct here. If you look
>>> through that JIRA, you'll see that there are still very valid reasons
>>> for doing an optimize, but the age-old reason of "improving performance"
>>> is not one of them.
>>
>>
>> That is interesting, because I am not running any manual optimize,
>> but I can clearly see that Solr master is doing something to the index
>> that periodically brings it down about 80% in size.
>>
>> After that, the query response time is much lower, and more
>> importantly, has a much lower variance too.
>
>
> Solr does merge segments according to your configuration, with a default
> mergeFactor of 10, so when 10 candidate segments exist, those specific
> segments will be merged into one larger segment. When that has happened ten
> times, the ten larger segments will be merged into an even larger segment,
> and so on. When segments are merged, deleted documents are not copied to
> the new segment.
>
> An optimize is just a special explicit merge that in most cases merges all
> segments down to one.
>
> If you are seeing an 80% index size reduction on a regular basis just from
> merging, then it sounds like you do a lot of document deletes, reindexes,
> and/or atomic updates. When you reindex or do an atomic update on a
> document, the old one is deleted and the new one is inserted.
>
> Document deletes don't actually remove anything from the index, they just
> mark specific document IDs as deleted. The index doesn't get any smaller.
> Searches will still look at the deleted docs, but they get removed from the
> results after the search is done. Merging (or an optimize) is the only way
> that deleted documents actually get removed.
>
> Thanks,
> Shawn
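
If you'd rather check this from a script than from a graph, the delta and segment-count test described at the top of this message could be sketched roughly as below. This is only an illustration, not something from the thread: the function names are made up, and the fetch helper assumes the core exposes the Luke request handler at /admin/luke and that its JSON "index" section carries numDocs, maxDoc and segmentCount. Please verify both against your own Solr version.

```python
import json
from urllib.request import urlopen


def index_status(stats):
    """Summarize optimize state from index-level stats.

    `stats` is a dict with the keys numDocs, maxDoc and segmentCount
    (key names assumed to match what the Luke handler reports).
    """
    # maxDoc counts deleted docs too, numDocs does not, so the
    # difference is the number of deleted docs not yet expunged.
    delta = stats["maxDoc"] - stats["numDocs"]
    segments = stats["segmentCount"]
    return {
        "deleted_docs": delta,
        "segments": segments,
        # Treat the index as optimized when it has at most one
        # segment and no deletions left in it.
        "optimized": segments <= 1 and delta == 0,
    }


def fetch_index_stats(core_url):
    """Fetch index stats from a Solr core.

    Assumption: the handler path and the "index" key in the response
    may differ on your install. Not called in the example below.
    """
    url = core_url.rstrip("/") + "/admin/luke?numTerms=0&wt=json"
    with urlopen(url) as resp:
        return json.load(resp)["index"]


# Made-up numbers, only to show the shape of the result.
example = {"numDocs": 900000, "maxDoc": 1000000, "segmentCount": 30}
print(index_status(example))
```

With those made-up numbers the delta is 100000 and the index is reported as not optimized, which is the same thing the purple and green lines show on the graph.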