Rahul:

bq:  we dont want the index sizes to grow too large and auto optimzie to kick in

Not quite what's going on. There is no "auto optimize". What there is
is background merging that takes _some_ segments and merges them
together. Very occasionally this will be the same as a full optimize,
if "some" just happens to mean all the segments.

bq: recovery takes a bit more time when it is not optimized

I'd be interested in formal measurements here. A recovery that copies
the _entire_ index down from the leader shouldn't really be that much
different between an optimized and a non-optimized index, but all
things are possible. If the recovery is a "peer sync" it shouldn't
matter at all.

If you're continually adding documents that _replace_ older documents,
optimizing will recover any "holes" left by the old updated docs. An
update is really a mark-as-deleted for the old version and a re-index
of the new. Since segments are write-once, the old data is left there
until the segment is merged. Now, one of the bits of information that
goes into deciding whether to merge a segment or not is the size.
Another is the percentage of deleted docs. When you optimize, you get
one huge segment. Now you have to update a lot of docs before that
segment has a large enough percentage of deleted documents to be
merged, and in the meantime the deleted docs waste space and memory.
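
To put rough numbers on that, here is a toy sketch in Python. It is
_not_ how TieredMergePolicy actually scores merges; the 20% threshold
and the function names are made up purely for illustration. It only
shows how the number of replaced docs needed to reach a given
deleted percentage scales with segment size.

```python
def deleted_fraction(live_docs, deleted_docs):
    """Fraction of a segment's docs that are marked as deleted."""
    total = live_docs + deleted_docs
    return deleted_docs / total if total else 0.0


def updates_needed(segment_docs, threshold_pct):
    """Docs in a segment that must be replaced before the segment's
    deleted percentage reaches threshold_pct.

    Segments are write-once, so a replaced doc stays in the segment
    (marked as deleted) and the segment's total doc count is unchanged.
    threshold_pct is a hypothetical integer percentage, not a real
    Lucene setting.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    return (segment_docs * threshold_pct + 99) // 100


if __name__ == "__main__":
    # Hypothetical sizes: one optimized segment vs. one small segment.
    for docs in (10_000_000, 50_000):
        print(docs, "docs ->", updates_needed(docs, 20), "replacements")
```

With these made-up numbers, the single optimized segment has to
accumulate 200 times as many dead docs as the small one before it
reaches the same deleted percentage.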

So it's a tradeoff. But if you're getting satisfactory performance
from what you have now, there's no reason to change.
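
If you do keep the nightly optimize, a minimal sketch of scripting it
in Python follows. It just builds and requests the same
update?optimize=true URL quoted further down in this thread. The host,
port, collection name, function names and timeout are all placeholders
I made up; adjust them for your setup.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(base_url, collection):
    """Build the optimize URL for a collection.

    base_url is assumed to look like "http://host:port/solr".
    """
    query = urlencode({"optimize": "true"})
    return f"{base_url.rstrip('/')}/{collection}/update?{query}"


def optimize(base_url, collection, timeout=3600):
    """Send the optimize request and return the HTTP status code.

    An optimize on a large index can run for a long time, hence the
    generous (and arbitrary) default timeout.
    """
    with urlopen(optimize_url(base_url, collection), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Placeholder host and collection; only prints the URL, sends nothing.
    print(optimize_url("http://localhost:8983/solr", "mycollection"))
```

Run it from cron (or similar) at whatever time your load is lowest.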

Here's a wonderful video about the process. You want the third one
down (TieredMergePolicy), as that's the default.

http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Best,
Erick

On Sun, Dec 20, 2015 at 8:26 PM, Rahul Ramesh <rr.ii...@gmail.com> wrote:
> Hi Erick,
> We index around several million documents/ day and we optimize everyday
> when the relative load is low. The reason we optimize is, we dont want the
> index sizes to grow too large and auto optimzie to kick in. When auto
> optimize kicks in, it results in unpredictable performance as it is CPU and
> IO intensive.
>
> In older solr (4.2), when the segment size grows too large, insertion used
> to fail .  Have we seen this problem in solr cloud?
>
> Also, we have observed, recovery takes a bit more time when it is not
> optimized. We dont have any quantitative measurement for the same. Its just
> an observation. Is this correct observation?
>
> If we optimize it every day, the indexes will not be skewed right?
>
> Please let me know if my understanding is correct.
>
> Regards,
> Rahul
>
> On Mon, Dec 21, 2015 at 9:54 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> You'll probably have to shard before you get to the TB range. At that
>> point, all the optimization is done individually on each shard so it
>> really doesn't matter how many shards you have.
>>
>> Just issuing
>> http://solr:port/solr/collection/update?optimize=true
>>
>> is sufficient, that'll forward the optimize command to all the shards
>> in the collection.
>>
>> Best,
>> Erick
>>
>> On Sun, Dec 20, 2015 at 8:19 PM, Zheng Lin Edwin Yeo
>> <edwinye...@gmail.com> wrote:
>> > Thanks for your information Erick.
>> >
>> > We have yet to decide how often we will update the index to include new
>> > documents that came in. Let's say we update the index once a day, then
>> > when the indexed is updated, we do the optimization (this will be done at
>> > night when there are not many users using the system).
>> > But my index size will probably grow quite big (potentially can go up to
>> > more than 1TB in the future), so does that have to be taken into
>> > consideration too?
>> >
>> > Regards,
>> > Edwin
>> >
>> >
>> > On 21 December 2015 at 12:12, Erick Erickson <erickerick...@gmail.com>
>> > wrote:
>> >
>> >> Much depends on how often the index is updated. If your index only
>> >> changes, say, once a day then it's probably a good idea. If you're
>> >> constantly updating your index, then I'd recommend that you do _not_
>> >> optimize.
>> >>
>> >> Optimizing will create one large segment. That segment will be
>> >> unlikely to be merged since it is so large relative to other segments
>> >> for quite a while, resulting in significant wasted space. So if you're
>> >> regularly indexing documents that _replace_ existing documents, this
>> >> will skew your index.
>> >>
>> >> Bottom line:
>> >> If you have a relatively static index that you can build and then use
>> >> for an extended time (as in 12 hours plus) it can be worth the time to
>> >> optimize. Otherwise I wouldn't bother.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Sun, Dec 20, 2015 at 7:57 PM, Zheng Lin Edwin Yeo
>> >> <edwinye...@gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > I would like to find out, will it be good to do write a script to do
>> >> > an auto-opitmization of the indexes at a certain time every day? Is there
>> >> > any advantage to do so?
>> >> >
>> >> > I found that optimization can reduce the index size by quite a
>> >> > signification amount, and allow the searching of the index to run
>> >> > faster.
>> >> > But will there be advantage if we do the optimization every day?
>> >> >
>> >> > I'm using Solr 5.3.0
>> >> >
>> >> > Regards,
>> >> > Edwin
>> >>
>>
