Question, Toke: in your "immutable" cases, don't the benefits of optimizing 
come mostly from eliminating deleted records?   Is there any material 
difference in heap, CPU, etc. between 1, 5 or 10 segments?   I.e. at how many 
segments/shard do you see a noticeable performance hit?

Also, I'm curious whether you have experimented much with the maxMergedSegmentMB 
and reclaimDeletesWeight properties of the TieredMergePolicy?

For frequently updated indexes, would setting maxMergedSegmentMB lower (say 512 
or 1024 MB, depending on total index size) and reclaimDeletesWeight higher (say 
2.5?) be a good practice?
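
To make the question concrete, here is a rough sketch of what I have in mind 
for solrconfig.xml. It assumes the Solr 4.x/5.x-style <mergePolicy> element 
inside <indexConfig>; the values are just the ones floated above, not tested 
recommendations. As far as I know the Lucene defaults are 5120 MB for 
maxMergedSegmentMB and 2.0 for reclaimDeletesWeight.

```xml
<indexConfig>
  <!-- Sketch only: values are the ones proposed in this thread, untested. -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- Cap merged segments at about 1 GB (Lucene default: 5120 MB). -->
    <double name="maxMergedSegmentMB">1024</double>
    <!-- Favor merges that reclaim deleted documents (Lucene default: 2.0). -->
    <double name="reclaimDeletesWeight">2.5</double>
  </mergePolicy>
</indexConfig>
```

Note that an explicit optimize (forceMerge) does not honor maxMergedSegmentMB 
in these versions, so this would only shape the regular background merges.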

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: Monday, June 29, 2015 3:56 PM
To: solr-user@lucene.apache.org
Subject: Re: optimize status

Reitzel, Charles <charles.reit...@tiaa-cref.org> wrote:
> Is there really a good reason to consolidate down to a single segment?

In the scenario spawning this thread it does not seem to be the best choice. 
Speaking more broadly, there are Solr setups out there that deal with immutable 
data, often tied to a point in time, e.g. log data. We have such a setup 
(harvested web resources) and are able to lower heap requirements significantly 
and increase speed by building fully optimized and immutable shards.

> Any incremental query performance benefit is tiny compared to the loss of 
> manageability.

True in many cases and I agree that the "Optimize"-wording is a bit of a trap. 
While technically correct, it implies that one should do it occasionally to 
keep any index fit. A different wording and maybe a tooltip saying something 
like "Only recommended for non-changing indexes" might be better.

Turning it around: To minimize the risk of occasional performance-degrading 
large merges, one might want an index where all the shards are below a certain 
size. Splitting larger shards into smaller ones would in that case also be an 
optimization, just towards a different goal.

- Toke Eskildsen
