The updates are fairly frequent (a few per minute) and have a tight freshness requirement. We really don’t want to show tutors who are not available. Luckily, it is a smallish collection, a few hundred thousand documents.
The traffic isn’t a problem and the cluster is working very well. This is about understanding our metrics.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Aug 21, 2018, at 8:46 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> A couple of notes:
>
> TLOG replicas will have the same issue. When I said that leaders
> forwarded to followers, what that's really about is that the follower
> guarantees that the docs have been written to the TLOG. So if you
> change your model to use TLOG replicas, don't expect a change.
>
> PULL replicas, OTOH, only pull down changed segments, but you
> replace the raw document forwarding with segment copying, so it's not
> clear to me how that would change the number of messages flying
> around.
>
> bq. All our updates are single documents. We need to track the
> availability of online tutors, so we don’t batch them.
>
> I'm inferring that this means the updates aren't all that frequent and
> that if you waited for, say, 100 changes you might wait a long time.
> FYI, here is the result of some experimentation I did on the difference
> between various batch sizes:
> https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/
> It may not apply if your tutors come and go slowly, of course.
>
> Best,
> Erick
>
> On Tue, Aug 21, 2018 at 8:02 AM, Walter Underwood <wun...@wunderwood.org> wrote:
>> Thanks, that is exactly what I was curious about.
>>
>> All our updates are single documents. We need to track the availability
>> of online tutors, so we don’t batch them.
>>
>> Right now, we have a replication factor of 36 (way too many), so each
>> update means 3 x 35 internal communications. Basically, a 100X update
>> amplification for our cluster.
>>
>> We’ll be reducing the cluster to four hosts as soon as we get out of the
>> current blackout on prod changes.
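[Editor's note: the batching-versus-freshness tradeoff discussed above can be sketched as a buffer that flushes on whichever comes first, a document count or a staleness deadline. This is a minimal illustrative sketch, not code from this thread; the `send` callback and the threshold values are hypothetical.]

```python
import time

class UpdateBatcher:
    """Buffer single-document updates and flush when the batch is full
    or the oldest buffered update gets too stale, whichever comes first.

    `send` is whatever actually posts the batch to Solr (hypothetical here)."""

    def __init__(self, send, max_docs=100, max_wait_s=1.0):
        self.send = send
        self.max_docs = max_docs
        self.max_wait_s = max_wait_s
        self.buffer = []
        self.first_add = None  # monotonic time of the oldest buffered doc

    def add(self, doc):
        if self.first_add is None:
            self.first_add = time.monotonic()
        self.buffer.append(doc)
        if len(self.buffer) >= self.max_docs:
            self.flush()

    def poll(self):
        # Call periodically; flushes if the oldest buffered doc is too stale.
        if self.buffer and time.monotonic() - self.first_add >= self.max_wait_s:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []
            self.first_add = None
```

With a tight freshness requirement `max_wait_s` would be small, so if tutors come and go slowly most batches will still contain a single document, which matches Erick's caveat.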
>>
>>> On Aug 20, 2018, at 10:05 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>
>>> Walter:
>>>
>>> Each update is roughly:
>>>
>>> request goes to leader (may be forwarded)
>>>
>>> leader sends the update to _each_ replica; depending on how many docs
>>> you're sending per update request, this may be more than one request.
>>> IIRC there was some JIRA a while ago where the forwarding wasn't all
>>> that efficient, but that's going from (shaky) memory.
>>>
>>> each follower acks back to the leader
>>>
>>> leader acks back to the client
>>>
>>> So perhaps you're seeing the individual forwards to followers? Your
>>> logs should show update requests with FROMLEADER for these
>>> sub-requests (updates and queries). Does that help?
>>>
>>> Erick
>>>
>>> On Mon, Aug 20, 2018 at 8:03 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>>>> I’m comparing request counts from New Relic, which is reporting 16 krpm
>>>> aggregate requests across the cluster, and the AWS load balancer is
>>>> reporting 1 krpm. Or it might be 1k requests per 5 minutes, because
>>>> CloudWatch is like that.
>>>>
>>>> This is a 36-node cluster, not sharded. We are going to shrink it, but
>>>> I’d like to understand it.
>>>>
>>>>> On Aug 20, 2018, at 7:02 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
>>>>>
>>>>> There is a single persistent HTTP connection open from the leader to
>>>>> each replica in the shard. All updates coming to the leader are
>>>>> expanded (for atomic updates) and streamed over that single connection.
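[Editor's note: Erick's per-update flow implies a rough internal message count: each of the RF - 1 followers costs at least a forward plus an ack. The sketch below uses that simplified model; the "legs per follower" constant is an assumption, and Walter's 3 x 35 figure suggests he counts three legs per follower rather than two.]

```python
def internal_messages_per_update(replication_factor, legs_per_follower=2):
    """Internal cluster messages for one update, under a simplified model:
    the leader exchanges `legs_per_follower` messages (e.g. forward + ack)
    with each of the replication_factor - 1 followers. The client's own
    request and response are not counted."""
    return legs_per_follower * (replication_factor - 1)

# With RF 36: 2 * 35 = 70 internal messages per update under this model;
# counting 3 legs per follower gives 3 * 35 = 105, i.e. roughly the 100X
# amplification described above. Shrinking to RF 4 drops it to 2 * 3 = 6.
for rf in (36, 4):
    print(rf, internal_messages_per_update(rf))
```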
>>>>> When using in-place docvalues updates, there is a possibility of the
>>>>> replica making a request to the leader if updates have been re-ordered
>>>>> and the replica does not have enough context to process the update.
>>>>>
>>>>> Can you quantify the "tons of internal traffic"? Are you seeing a
>>>>> higher number of open connections as well?
>>>>>
>>>>> On Fri, Aug 17, 2018 at 11:17 PM Walter Underwood <wun...@wunderwood.org> wrote:
>>>>>
>>>>>> How many messages are sent back and forth between a leader and replica
>>>>>> with NRT?
>>>>>>
>>>>>> We have a collection that gets frequent updates and we are seeing a
>>>>>> ton of internal cluster traffic.
>>>>>>
>>>>> --
>>>>> Regards,
>>>>> Shalin Shekhar Mangar.