Re: Period on-line index optimization

Walter Underwood Tue, 27 Nov 2018 10:32:57 -0800

There is one case where optimize makes sense. You do a full reload of content 
rarely, maybe once per day or once per week. You use a master/slave cluster. 
Your index isn’t huge (say under 1 million docs). We have exactly that setup 
for our textbook search. Even so, we do not run optimize. Our median response time is 3 
ms, 95th percentile is 200 ms.

When Ultraseek was the search engine for irs.gov, we did have them do a force 
merge once per day. On April 15th, those 15 slave servers needed all the 
throughput they could get.

When you force a full merge, you make a single huge segment. This interferes 
with the normal merge algorithms. That segment will probably hang around for a 
long time until there are enough deletes to trigger a merge for it. Eventually, 
there will be a big merge and it will happen at a time that you cannot control. 
It is almost like a chemical dependency—with manual merges, the auto merge is 
less effective. Solr isn’t quite “addicted” to manual merges, but it does 
hamper the automatic ones.

The very latest version of Solr (7.5) has some additional rules to try to 
avoid that problem. But you can easily avoid it by not doing an optimize. 
Details here:

https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/

The real cost of “optimize”? That this is my 120th post to the solr-user list 
about it. It is probably the #1 most misunderstood thing in Solr.

https://markmail.org/search/?q=solr-user%20optimize%20wunder#query:solr-user%20optimize%20wunder%20list%3Aorg.apache.lucene.solr-user+page:1+state:facets

The biggest problem with your cluster is doing a commit after every document. 
That will absolutely kill performance. You need to configure auto hard and soft 
commits with reasonable settings. That is about 1000X more important than 
optimize.
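
To make the commit advice concrete, here is a minimal sketch of a client that sends documents in batches and never issues an explicit commit. It relies on Solr's JSON update handler and its commitWithin request parameter. The collection name, batch size, and 60-second window are illustrative assumptions, not recommendations from this thread. The server-side counterpart is the autoCommit and autoSoftCommit settings in solrconfig.xml.

```python
import json
import urllib.request
from itertools import islice


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_update_request(base_url, collection, chunk, commit_within_ms):
    """Build one POST to the JSON update handler, with no explicit commit.

    commitWithin asks Solr to make the documents visible within the given
    number of milliseconds, so the server decides when to commit.
    """
    url = f"{base_url}/{collection}/update?commitWithin={commit_within_ms}"
    body = json.dumps(chunk).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_all(docs,
              base_url="http://localhost:8983/solr",  # assumed local Solr
              collection="textbooks",                 # hypothetical collection
              batch_size=500,                         # assumed batch size
              commit_within_ms=60000):                # assumed visibility window
    """Send all documents in batches; return the number of requests made."""
    sent = 0
    for chunk in batches(docs, batch_size):
        req = build_update_request(base_url, collection, chunk, commit_within_ms)
        with urllib.request.urlopen(req) as resp:
            resp.read()
        sent += 1
    return sent
```

With these assumed values, 10,000 documents go out as 20 requests and zero explicit commits, instead of 10,000 commits.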

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 27, 2018, at 9:43 AM, Christopher Schultz 
> <ch...@christopherschultz.net> wrote:
> 
> I understand that "optimize" makes it sound like, without performing
> that operation, that the index is "not optimized" which sounds bad.
> I'm not hung-up on the terminology.
> 
> In my live index, I can see total 20 segments. 7 of them are "all
> gray" and the other 13 are at various levels of "dark grayness". I
> haven't been able to find a reference for what those colors mean, but
> they don't seem to be correlated with any data I can see on each segment.
> 
> When I have run an "optimize" operation on a test index, I can see a
> single segment which is shown all in "light gray", whatever that means.
> 
> Other than wasting my time, are there any negative consequences for
> periodically "optimizing" (or merging) the index?
> 
> Thanks,
> -chris
