Thanks, that is exactly what I was curious about. All our updates are single documents. We need to track the availability of online tutors, so we don’t batch them.
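Concretely, each of our updates is a single-document add, along these lines (a minimal pysolr sketch; the host, collection, and field names are illustrative, not our real schema):

    import pysolr

    # Illustrative endpoint and collection name, not our real cluster.
    solr = pysolr.Solr("http://solr-host:8983/solr/tutors", timeout=10)

    # One document per request: a tutor's availability changes too often to batch.
    solr.add([{
        "id": "tutor-12345",   # hypothetical document id
        "available": True,     # hypothetical availability field we update
    }])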
Right now, we have a replication factor of 36 (way too many), so each
update means 3 x 35 internal communications. Basically, a 100X update
amplification for our cluster. We’ll be reducing the cluster to four
hosts as soon as we get out of the current blackout on prod changes.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Aug 20, 2018, at 10:05 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> Walter:
>
> Each update is roughly:
>
> request goes to leader (may be forwarded)
>
> leader sends the update to _each_ replica. Depending on how many docs
> you're sending per update request, this may be more than one request.
> IIRC there was a JIRA a while ago where the forwarding wasn't all that
> efficient, but that's going from (shaky) memory.
>
> each follower acks back to the leader
>
> leader acks back to the client.
>
> So perhaps you're seeing the individual forwards to followers? Your
> logs should show update requests with FROMLEADER for these
> sub-requests (updates and queries). Does that help?
>
> Erick
>
> On Mon, Aug 20, 2018 at 8:03 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>> I’m comparing request counts from New Relic, which is reporting 16 krpm
>> aggregate requests across the cluster, while the AWS load balancer is
>> reporting 1 krpm. Or it might be 1k requests per 5 minutes, because
>> CloudWatch is like that.
>>
>> This is a 36-node cluster, not sharded. We are going to shrink it, but
>> I’d like to understand it first.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>>> On Aug 20, 2018, at 7:02 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
>>>
>>> There is a single persistent HTTP connection open from the leader to
>>> each replica in the shard. All updates coming to the leader are
>>> expanded (for atomic updates) and streamed over that single
>>> connection. When using in-place docvalues updates, there is a
>>> possibility of the replica making a request to the leader if updates
>>> have been re-ordered and the replica does not have enough context to
>>> process the update.
>>>
>>> Can you quantify the "tons of internal traffic"? Are you seeing a
>>> higher number of open connections as well?
>>>
>>> On Fri, Aug 17, 2018 at 11:17 PM Walter Underwood <wun...@wunderwood.org> wrote:
>>>
>>>> How many messages are sent back and forth between a leader and a
>>>> replica with NRT?
>>>>
>>>> We have a collection that gets frequent updates, and we are seeing
>>>> a ton of internal cluster traffic.
>>>>
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/ (my blog)
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
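P.S. For anyone following along, here is the back-of-the-envelope math
behind my “basically 100X” figure, assuming (per Erick’s outline)
roughly three internal messages per follower per update:

    # Rough amplification estimate for one single-document update.
    replication_factor = 36
    followers = replication_factor - 1   # leader forwards to the other 35 replicas

    msgs_per_follower = 3                # rough per-follower estimate from this thread
    internal_messages = msgs_per_follower * followers
    print(internal_messages)             # 105, i.e. roughly 100X one client request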
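And a quick way to sanity-check Erick’s FROMLEADER suggestion against a
log file, assuming the stock log format where distributed updates carry
update.distrib=FROMLEADER in their request params (the log path here is
a placeholder):

    # Count leader-to-replica update sub-requests in a Solr log.
    with open("solr.log") as log:
        fromleader = sum(1 for line in log if "update.distrib=FROMLEADER" in line)
    print(fromleader, "FROMLEADER sub-requests")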