Hi Tomás,

No, I am not seeing reloads. I am trying to understand how hard commits,
soft commits, and transaction log updates interact in a TLOG cluster, for
both leader and follower replicas. For example, after getting new segments
from the leader, will the follower replica still apply the hard/soft commit?

PS: congratulations on the Berlin Buzzwords' talk. :)

Thanks!

On Mon, Dec 10, 2018 at 9:24 PM Tomás Fernández Löbbe <tomasflo...@gmail.com> wrote:

> I think this is a good point. The tricky part is that if TLOG replicas
> don't replicate often, their transaction logs will get too big too, so you
> want the replication interval of TLOG replicas to be tied to the
> auto(hard)Commit interval (by default at least). If you are using them for
> search, you may also not want to open a searcher for each fetch... for PULL
> replicas, maybe the best way is to use the autoSoftCommit interval to
> define the polling interval. That said, I'm not sure using different
> configurations is a good idea, some people may be mixing TLOG and PULL and
> querying them both alike.
>
> In the meantime, if you have different hosts for TLOG and PULL replicas,
> one workaround you can have is to define the autoCommit time with a system
> property, and use different properties for TLOGs vs PULL nodes.
>
> > There is no commit on TLOG/PULL follower replicas, only on the leader.
> > Followers fetch the segments and **reload the core** every 150 seconds
>
> Edward, "reload" shouldn't really happen in regular TLOG/PULL fetches. Are
> you seeing reloads?
>
> On Mon, Dec 10, 2018 at 4:41 PM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > bq. but not every poll attempt they fetch new segment from the leader
> >
> > Ah, right. Ignore my comment. Commit will only occur on the followers
> > when there are new segments to pull down, so you're right, roughly
> > every second poll would commit find things to bring down and open a
> > new searcher.
> > On Sun, Dec 9, 2018 at 4:14 PM Edward Ribeiro <edward.ribe...@gmail.com>
> > wrote:
> > >
> > > Hi Vadim,
> > >
> > > There is no commit on TLOG/PULL follower replicas, only on the leader.
> > > Followers fetch the segments and **reload the core** every 150 seconds
> > > (if there were new segments, I suppose).
> > > Yeah, followers don't pay the CPU price of indexing, but there are
> > > still cache invalidation, autowarming, etc., in addition to network
> > > and IO demand. Is that right, Erick?
> > >
> > > Besides that, Erick is pointing out that under a heavy indexing workload
> > > you could either have:
> > >
> > > 1. Very large transaction logs;
> > >
> > > 2. Very large numbers of segments. If that is the case, you could have the
> > > following scenario numerous times:
> > > 2.1. follower replica downloads segments A and B from leader;
> > > 2.2. leader merges segments A + B into C;
> > > 2.3. follower replicas discard A and B and download C on next poll;
> > >
> > > Under the second condition followers needlessly downloaded segments that
> > > would eventually be merged.
> > >
> > > IMO, you should carefully evaluate if the use of TLOG/PULL is really
> > > recommended for your cluster setup, plus indexing and querying workload.
> > > You can very much stay with a NRT setup if it suits you better. The videos
> > > below provide a nice set of hints for when to choose between NRT or some
> > > combination of TLOG and PULL.
> > >
> > > https://youtu.be/XIb8X3MwVKc
> > >
> > > https://youtu.be/dkWy2ykzAv0
> > >
> > > https://youtu.be/XqfTjd9KDWU
> > >
> > > Regards,
> > > Edward
> > >
> > > On Sun, Dec 9, 2018 at 16:56, <vadim.iva...@spb.ntk-intourist.ru> wrote:
> > >
> > > > If hard commit max time is 300 sec then commit happens every 300 sec on
> > > > tlog leader. And new segments pop up on the leader every 300 sec, during
> > > > indexing. Polling interval on other replicas is 150 sec, but not every poll
> > > > attempt they fetch new segment from the leader, afaiu. Erick, do you mean
> > > > that on all other tlog replicas (not leaders) commit occurs every poll?
> > > > Sunday, December 9, 2018, 19:21 +03:00 from Erick Erickson
> > > > erickerick...@gmail.com:
> > > >
> > > > >Not quite, 600000.
> > > > >The polling interval is half the commit interval....
> > > > >
> > > > >This has always bothered me a little bit, I wonder at the utility of a
> > > > >config param. We already have old-style replication with a
> > > > >configurable polling interval. Under very heavy indexing loads, it
> > > > >seems to me that either the tlogs will grow quite large or we'll be
> > > > >pulling a lot of unnecessary segments across the wire, segments
> > > > >that'll soon be merged away and the merged segment re-pulled.
> > > > >
> > > > >Apparently, though, nobody's seen this "in the wild", so it's
> > > > >theoretical at this point.
> > > > >On Sun, Dec 9, 2018 at 1:48 AM Vadim Ivanov <vadim.iva...@spb.ntk-intourist.ru> wrote:
> > > > >
> > > > > Thanks, Edward, for clues.
> > > > > What bothers me is newSearcher start, warming, cache clear... all that
> > > > > CPU consuming stuff in my heavy-indexing scenario.
> > > > > With NRT I had autoSoftCommit: 300000 .
> > > > > So I had new Searcher no more than every 5 min on every replica.
> > > > > To have more or less the same effect with TLOG - PULL collection,
> > > > > I suppose, I have to have : 300000
> > > > > (yes, I understand that newSearchers start asynchronously on leader and
> > > > > replicas)
> > > > > Am I right?
> > > > > --
> > > > > Vadim
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: Edward Ribeiro [mailto:edward.ribe...@gmail.com]
> > > > >> Sent: Sunday, December 09, 2018 12:42 AM
> > > > >> To: solr-user@lucene.apache.org
> > > > >> Subject: Re: Soft commit and new replica types
> > > > >>
> > > > >> Some insights in the new replica types below:
> > > > >>
> > > > >> On Sat, December 8, 2018 08:42, Vadim Ivanov <
> > > > >> vadim.iva...@spb.ntk-intourist.ru wrote:
> > > > >>
> > > > >>>
> > > > >>> From Ref guide we have:
> > > > >>> " NRT is the only type of replica that supports soft-commits..."
> > > > >>> "If TLOG replica does become a leader, it will behave the same as if it
> > > > >>> was a NRT type of replica."
> > > > >>> Does it mean that if we do not have NRT replicas in the cluster then
> > > > >>> the autoSoftCommit section in solrconfig.xml is ignored completely (even on
> > > > >>> TLOG leader)?
> > > > >>>
> > > > >>
> > > > >> No, not completely. Both TLOG and PULL nodes will periodically poll the
> > > > >> leader for changes in index segments' files and download those segments
> > > > >> from the leader. If hard commit max time is defined in solrconfig.xml the
> > > > >> polling interval of each replica will be half that value. Or else if the
> > > > >> soft commit max time is defined then the replicas will use half the soft
> > > > >> commit max time as the interval. If neither is defined then the poll
> > > > >> interval will be 3 seconds (hard coded). See here:
> > > > >> https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcaaaab117f309119053/solr/core/src/java/org/apache/solr/cloud/ReplicateFromLeader.java#L68-L77
> > > > >>
> > > > >> If the TLOG is the leader it will index locally and append the doc to
> > > > >> the transaction log as a NRT node would do, and it will also synchronously
> > > > >> replicate the data to other TLOG replicas' transaction logs (PULL nodes
> > > > >> don't have transaction logs). But TLOG/PULL replicas don't support soft
> > > > >> commits nor real time gets, afaik.
> > > > >>
> > > > >>>
> > > > >>
> > > > >>>
> > > > >>> 60000
> > > > >>>
> > > > >>>
> > > > >>> Should we say that in autoCommit section openSearcher is always true in
> > > > >>> that case?
> > > > >>
> > > > >>
> > > > >> 10000
> > > > >> 30000
> > > > >> 512m
> > > > >> false
> > > > >>
> > > > >>
> > > > >> Does it mean that new Searcher always starts on all replicas when hard
> > > > >> commit happens on leader?
> > > > >>
> > > > >>
> > > > >> Nope. Or at least, the searcher is not synchronously created. Each non
> > > > >> leader replica will periodically fetch the index changes from the leader
> > > > >> and open a new searcher to reflect those changes as seen here:
> > > > >> https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcaaaab117f309119053/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L653
> > > > >> But it's important to note the potential delay between the leader's
> > > > >> hard commit and the other replicas fetching those changes from the leader
> > > > >> and opening a new searcher to reflect latest changes.
> > > > >>
> > > > >> PS: I am still digging into these new replica types so I may have
> > > > >> misunderstood or missed some aspect of it.
> > > > >>
> > > > >> Regards,
> > > > >> Edward
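
For reference, the polling-interval rule described earlier in the thread (half the hard commit max time if it is set, otherwise half the soft commit max time, otherwise a 3 second default) can be sketched roughly as below. This is only an illustration in Python, not Solr code: the function name `poll_interval_ms`, its parameter names, and the use of `None` for "not configured" are assumptions made for this sketch. The linked ReplicateFromLeader.java remains the authoritative source.

```python
from typing import Optional

# Fallback taken from the thread: 3 seconds when neither commit time is set.
DEFAULT_POLL_MS = 3000


def poll_interval_ms(auto_commit_max_time_ms: Optional[int] = None,
                     auto_soft_commit_max_time_ms: Optional[int] = None) -> int:
    """Hypothetical helper: follower polling interval in milliseconds.

    None stands for "not configured" (an assumption of this sketch).
    """
    # The hard commit interval takes precedence when it is defined.
    if auto_commit_max_time_ms is not None:
        return auto_commit_max_time_ms // 2
    # Otherwise fall back to half the soft commit interval.
    if auto_soft_commit_max_time_ms is not None:
        return auto_soft_commit_max_time_ms // 2
    # Neither is defined: use the fixed default.
    return DEFAULT_POLL_MS


if __name__ == "__main__":
    # A 300 s hard commit gives the 150 s polling discussed in the thread.
    print(poll_interval_ms(auto_commit_max_time_ms=300_000))  # 150000
```

Under this rule, a hard commit max time of 600000 ms would be needed for followers to poll about every 5 minutes, which matches the "Not quite, 600000" reply above.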