Thanks Erick! Yes, most likely I'll have to do sharding soon, before the index size grows too big.

Regards,
Edwin

On 21 December 2015 at 13:18, Rahul Ramesh <rr.ii...@gmail.com> wrote:

> Thanks Erick!
>
> Rahul
>
> On Mon, Dec 21, 2015 at 10:07 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > Rahul:
> >
> > bq: we don't want the index sizes to grow too large and auto optimize to kick in
> >
> > Not quite what's going on. There is no "auto optimize". What
> > there is is background merging that will take _some_ segments and
> > merge them together. Very occasionally this will be the same as a full
> > optimize if it just happens that "some" means all the segments.
> >
> > bq: recovery takes a bit more time when it is not optimized
> >
> > I'd be interested in formal measurements here. A recovery that copied
> > the _entire_ index down from the leader shouldn't really differ that
> > much between an optimized and a non-optimized index, but
> > all things are possible. If the recovery is a "peer sync" it shouldn't
> > matter at all.
> >
> > If you're continually adding documents that _replace_ older documents,
> > optimizing will recover any "holes" left by the old updated docs. An
> > update is really a mark-as-deleted for the old version and a re-index
> > of the new. Since segments are write-once, the old data is left there
> > until the segment is merged. Now, one of the bits of information that
> > goes into deciding whether to merge a segment or not is the size.
> > Another is the percentage of deleted docs. When you optimize, you get
> > one huge segment. Now you have to update a lot of docs for that
> > segment to have a large percentage of deleted documents and be merged,
> > thus wasting space and memory.
> >
> > So it's a tradeoff. But if you're getting satisfactory performance
> > from what you have now, there's no reason to change.
> >
> > Here's a wonderful video about the process. You want the third one
> > down (TieredMergePolicy) as that's the default.
> >
> > http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
> >
> > Best,
> > Erick
> >
> > On Sun, Dec 20, 2015 at 8:26 PM, Rahul Ramesh <rr.ii...@gmail.com> wrote:
> > > Hi Erick,
> > > We index several million documents/day and we optimize every day
> > > when the relative load is low. The reason we optimize is that we don't want the
> > > index sizes to grow too large and auto optimize to kick in. When auto
> > > optimize kicks in, it results in unpredictable performance as it is CPU and
> > > IO intensive.
> > >
> > > In older Solr (4.2), when the segment size grew too large, insertion used
> > > to fail. Have we seen this problem in SolrCloud?
> > >
> > > Also, we have observed that recovery takes a bit more time when it is not
> > > optimized. We don't have any quantitative measurement for the same. It's just
> > > an observation. Is this a correct observation?
> > >
> > > If we optimize it every day, the indexes will not be skewed, right?
> > >
> > > Please let me know if my understanding is correct.
> > >
> > > Regards,
> > > Rahul
> > >
> > > On Mon, Dec 21, 2015 at 9:54 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >
> > >> You'll probably have to shard before you get to the TB range. At that
> > >> point, all the optimization is done individually on each shard so it
> > >> really doesn't matter how many shards you have.
> > >>
> > >> Just issuing
> > >> http://solr:port/solr/collection/update?optimize=true
> > >>
> > >> is sufficient, that'll forward the optimize command to all the shards
> > >> in the collection.
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Sun, Dec 20, 2015 at 8:19 PM, Zheng Lin Edwin Yeo
> > >> <edwinye...@gmail.com> wrote:
> > >> > Thanks for your information Erick.
> > >> >
> > >> > We have yet to decide how often we will update the index to include new
> > >> > documents that came in. Let's say we update the index once a day, then when
> > >> > the index is updated, we do the optimization (this will be done at night
> > >> > when there are not many users using the system).
> > >> > But my index size will probably grow quite big (potentially can go up to
> > >> > more than 1TB in the future), so does that have to be taken into
> > >> > consideration too?
> > >> >
> > >> > Regards,
> > >> > Edwin
> > >> >
> > >> >
> > >> > On 21 December 2015 at 12:12, Erick Erickson <erickerick...@gmail.com> wrote:
> > >> >
> > >> >> Much depends on how often the index is updated. If your index only
> > >> >> changes, say, once a day then it's probably a good idea. If you're
> > >> >> constantly updating your index, then I'd recommend that you do _not_
> > >> >> optimize.
> > >> >>
> > >> >> Optimizing will create one large segment. That segment will be
> > >> >> unlikely to be merged since it is so large relative to other segments
> > >> >> for quite a while, resulting in significant wasted space. So if you're
> > >> >> regularly indexing documents that _replace_ existing documents, this
> > >> >> will skew your index.
> > >> >>
> > >> >> Bottom line:
> > >> >> If you have a relatively static index that you can build and then use
> > >> >> for an extended time (as in 12 hours plus) it can be worth the time to
> > >> >> optimize. Otherwise I wouldn't bother.
> > >> >>
> > >> >> Best,
> > >> >> Erick
> > >> >>
> > >> >> On Sun, Dec 20, 2015 at 7:57 PM, Zheng Lin Edwin Yeo
> > >> >> <edwinye...@gmail.com> wrote:
> > >> >> > Hi,
> > >> >> >
> > >> >> > I would like to find out, would it be good to write a script to do an
> > >> >> > auto-optimization of the indexes at a certain time every day? Is there any
> > >> >> > advantage to doing so?
> > >> >> >
> > >> >> > I found that optimization can reduce the index size by quite a
> > >> >> > significant amount, and allow the searching of the index to run faster.
> > >> >> > But will there be an advantage if we do the optimization every day?
> > >> >> >
> > >> >> > I'm using Solr 5.3.0
> > >> >> >
> > >> >> > Regards,
> > >> >> > Edwin
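
For anyone who wants to script the nightly optimize discussed in this thread, here is a minimal sketch in Python. It only builds the update?optimize=true request Erick quotes and applies his "12 hours plus" rule of thumb before sending it. The function names, the base URL (localhost:8983) and the collection name in the usage note are placeholders I made up for illustration, and the 12-hour threshold is a rule of thumb from the thread, not a Solr setting.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(base_url: str, collection: str) -> str:
    """Build the update URL that asks Solr to optimize a collection."""
    query = urlencode({"optimize": "true"})
    return f"{base_url.rstrip('/')}/{collection}/update?{query}"


def worth_optimizing(hours_between_updates: float,
                     min_static_hours: float = 12.0) -> bool:
    """Rule of thumb from the thread: optimize only a relatively static index."""
    return hours_between_updates >= min_static_hours


def optimize_if_static(base_url: str, collection: str,
                       hours_between_updates: float,
                       send: bool = False) -> Optional[str]:
    """Return the optimize URL (and optionally request it) if the index is static enough."""
    if not worth_optimizing(hours_between_updates):
        return None  # index changes too often; leave it to background merging
    url = optimize_url(base_url, collection)
    if send:
        # An optimize can run for a long time on a large index, so no timeout is set.
        with urlopen(url) as response:
            response.read()
    return url
```

Calling optimize_if_static("http://localhost:8983/solr", "mycollection", 24, send=True) from a nightly cron job would issue the request once per run; with send left at False it just returns the URL so it can be checked first.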