Thanks Erick!

Yes, most likely I'll have to do sharding soon, before the index size grows
too big.

Regards,
Edwin

On 21 December 2015 at 13:18, Rahul Ramesh <rr.ii...@gmail.com> wrote:

> Thanks Erick!
>
> Rahul
>
> On Mon, Dec 21, 2015 at 10:07 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > Rahul:
> >
> > bq: we don't want the index sizes to grow too large and auto optimize to
> > kick in
> >
> > Not quite what's going on. There is no "auto optimize". Instead,
> > there is background merging that takes _some_ segments and merges
> > them together. Very occasionally this will be the same as a full
> > optimize, if it just happens that "some" means all the segments.
> >
> > bq: recovery takes a bit more time when it is not optimized
> >
> > I'd be interested in formal measurements here. A recovery that copies
> > the _entire_ index down from the leader shouldn't really differ much
> > between an optimized and a non-optimized index, but all things are
> > possible. If the recovery is a "peer sync", it shouldn't matter at all.
> >
> > If you're continually adding documents that _replace_ older documents,
> > optimizing will recover any "holes" left by the old updated docs. An
> > update is really a mark-as-deleted for the old version and a re-index
> > of the new one. Since segments are write-once, the old data is left
> > there until the segment is merged. Now, one of the factors that goes
> > into deciding whether to merge a segment is its size. Another is the
> > percentage of deleted docs. When you optimize, you get one huge
> > segment. You then have to update a lot of docs before that segment has
> > a large enough percentage of deleted documents to be merged, and until
> > then those deleted docs waste space and memory.
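To put rough numbers on that tradeoff, here is a toy Python model. It is not Lucene's actual TieredMergePolicy scoring: it assumes every update replaces a distinct document living in one given segment, and it uses a hypothetical deleted-docs threshold (`threshold_pct`) as the point at which that segment would look worth merging. The segment sizes and update rate are made-up illustration values.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (avoids floating-point rounding)."""
    return (a + b - 1) // b


def days_until_threshold(segment_docs: int, updates_per_day: int, threshold_pct: int) -> int:
    """Days of updates before `threshold_pct` percent of a segment's docs are deleted.

    Toy model: each update marks exactly one not-yet-deleted doc in this
    segment as deleted; the new version goes into a fresh, small segment.
    """
    if segment_docs <= 0 or updates_per_day <= 0:
        raise ValueError("segment_docs and updates_per_day must be positive")
    if not 0 < threshold_pct <= 100:
        raise ValueError("threshold_pct must be in (0, 100]")
    # Deletions needed to reach the threshold, rounded up.
    needed = ceil_div(segment_docs * threshold_pct, 100)
    # A partial day still counts as a full day of updates.
    return ceil_div(needed, updates_per_day)


if __name__ == "__main__":
    updates = 100_000   # hypothetical: docs replaced per day
    threshold = 50      # hypothetical: % deleted before a merge looks attractive
    for size in (1_000_000, 10_000_000):
        print(f"{size:,} docs -> {days_until_threshold(size, updates, threshold)} days")
```

With these made-up numbers, a 1,000,000-doc segment reaches the threshold after 5 days, while a single optimized 10,000,000-doc segment takes 50 days, carrying its deleted docs the whole time.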
> >
> > So it's a tradeoff. But if you're getting satisfactory performance
> > from what you have now, there's no reason to change.
> >
> > Here's a wonderful video about the process. You want the third one
> > down (TieredMergePolicy), as that's the default.
> >
> >
> > http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
> >
> > Best,
> > Erick
> >
> > On Sun, Dec 20, 2015 at 8:26 PM, Rahul Ramesh <rr.ii...@gmail.com> wrote:
> > > Hi Erick,
> > > We index several million documents per day, and we optimize every day
> > > when the relative load is low. The reason we optimize is that we don't
> > > want the index sizes to grow too large and auto optimize to kick in.
> > > When auto optimize kicks in, it results in unpredictable performance,
> > > as it is CPU and IO intensive.
> > >
> > > In older Solr (4.2), when the segment size grew too large, insertion
> > > used to fail. Has this problem been seen in SolrCloud?
> > >
> > > Also, we have observed that recovery takes a bit more time when the
> > > index is not optimized. We don't have any quantitative measurement for
> > > this; it's just an observation. Is this a correct observation?
> > >
> > > If we optimize every day, the indexes will not be skewed, right?
> > >
> > > Please let me know if my understanding is correct.
> > >
> > > Regards,
> > > Rahul
> > >
> > > On Mon, Dec 21, 2015 at 9:54 AM, Erick Erickson <erickerick...@gmail.com>
> > > wrote:
> > >
> > >> You'll probably have to shard before you get to the TB range. At that
> > >> point, all the optimization is done individually on each shard so it
> > >> really doesn't matter how many shards you have.
> > >>
> > >> Just issuing
> > >> http://solr:port/solr/collection/update?optimize=true
> > >>
> > >> is sufficient; that'll forward the optimize command to all the shards
> > >> in the collection.
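For a scripted version of that request, here is a minimal Python sketch using only the standard library. The host and collection names are placeholders (8983 is Solr's default port), and it assumes Solr is reachable over plain HTTP without authentication. It builds and sends the same `update?optimize=true` request and nothing more.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(host: str, port: int, collection: str) -> str:
    """Build the collection-level optimize URL (host/port/collection are placeholders)."""
    query = urlencode({"optimize": "true"})
    return f"http://{host}:{port}/solr/{collection}/update?{query}"


def optimize(host: str, port: int, collection: str, timeout: float = 3600.0) -> int:
    """Send the optimize request and return the HTTP status code.

    Sent to the collection, the command is forwarded to every shard, and each
    shard optimizes on its own. Optimizing a large index can take a long time,
    hence the generous (assumed) timeout in seconds.
    """
    with urlopen(optimize_url(host, port, collection), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical host and collection; this only prints the URL, it does not send it.
    print(optimize_url("localhost", 8983, "collection1"))
```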
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Sun, Dec 20, 2015 at 8:19 PM, Zheng Lin Edwin Yeo
> > >> <edwinye...@gmail.com> wrote:
> > >> > Thanks for your information, Erick.
> > >> >
> > >> > We have yet to decide how often we will update the index to include
> > >> > new documents that come in. Let's say we update the index once a day;
> > >> > then, after the index is updated, we do the optimization (this will be
> > >> > done at night, when there are not many users using the system).
> > >> > But my index size will probably grow quite big (it could potentially
> > >> > go up to more than 1TB in the future), so does that have to be taken
> > >> > into consideration too?
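If the plan is to index once a day and optimize at night, the scheduling part could look roughly like this minimal Python sketch. It is not a production scheduler: the 02:00 start time is an assumed low-traffic window, `OPTIMIZE_URL` is a placeholder, and `send_optimize` is a hypothetical helper that simply issues the `update?optimize=true` request. A system scheduler such as cron is often the simpler way to run the same job.

```python
import time
from datetime import datetime, timedelta
from urllib.request import urlopen

# Placeholder URL (assumption); substitute your own host, port and collection.
OPTIMIZE_URL = "http://localhost:8983/solr/collection1/update?optimize=true"


def seconds_until(now: datetime, hour: int, minute: int = 0) -> float:
    """Seconds from `now` until the next hour:minute (naive local time, DST ignored)."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return (target - now).total_seconds()


def send_optimize(url: str = OPTIMIZE_URL) -> int:
    """Hypothetical helper: issue the optimize request and return the HTTP status."""
    with urlopen(url, timeout=3600) as resp:
        return resp.status


def run_nightly(hour: int = 2) -> None:
    """Sleep until the (assumed) nightly window, optimize, and repeat forever."""
    while True:
        time.sleep(seconds_until(datetime.now(), hour))
        print("optimize finished with HTTP", send_optimize())
```

Calling `run_nightly()` blocks forever, so it would typically run as its own long-lived process; it should also be started only after the daily indexing job has finished.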
> > >> >
> > >> > Regards,
> > >> > Edwin
> > >> >
> > >> >
> > >> > On 21 December 2015 at 12:12, Erick Erickson <erickerick...@gmail.com>
> > >> > wrote:
> > >> >
> > >> >> Much depends on how often the index is updated. If your index only
> > >> >> changes, say, once a day, then it's probably a good idea. If you're
> > >> >> constantly updating your index, then I'd recommend that you do _not_
> > >> >> optimize.
> > >> >>
> > >> >> Optimizing will create one large segment. Because it is so large
> > >> >> relative to the other segments, that segment is unlikely to be merged
> > >> >> for quite a while, resulting in significant wasted space. So if you're
> > >> >> regularly indexing documents that _replace_ existing documents, this
> > >> >> will skew your index.
> > >> >>
> > >> >> Bottom line:
> > >> >> If you have a relatively static index that you can build and then use
> > >> >> for an extended time (as in 12 hours plus), it can be worth the time
> > >> >> to optimize. Otherwise I wouldn't bother.
> > >> >>
> > >> >> Best,
> > >> >> Erick
> > >> >>
> > >> >> On Sun, Dec 20, 2015 at 7:57 PM, Zheng Lin Edwin Yeo
> > >> >> <edwinye...@gmail.com> wrote:
> > >> >> > Hi,
> > >> >> >
> > >> >> > I would like to find out whether it is a good idea to write a
> > >> >> > script that auto-optimizes the indexes at a certain time every day.
> > >> >> > Is there any advantage to doing so?
> > >> >> >
> > >> >> > I found that optimization can reduce the index size by quite a
> > >> >> > significant amount and allow searches on the index to run faster.
> > >> >> > But is there an advantage to doing the optimization every day?
> > >> >> >
> > >> >> > I'm using Solr 5.3.0
> > >> >> >
> > >> >> > Regards,
> > >> >> > Edwin
> > >> >>
> > >>
> >
>
