The updates are fairly frequent (a few per minute) and have a tight freshness
requirement. We really don’t want to show tutors who are not available.
Luckily, it is a smallish collection, a few hundred thousand documents.

The traffic isn’t a problem and the cluster is working very well. This is about
understanding our metrics.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Aug 21, 2018, at 8:46 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> A couple of notes:
> 
> TLOG replicas will have the same issue. When I said that leaders
> forward to followers, what that really means is that the follower
> guarantees the docs have been written to the TLOG. So if you
> change your model to use TLOG replicas, don't expect a change.
> 
> PULL replicas, OTOH, only pull down changed segments, but you're
> replacing the raw document forwarding with segment copying, so it's
> not clear to me how that would change the number of messages flying
> around.
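> 
> If you want to experiment with the replica-type mix, here is a minimal
> SolrJ sketch (the collection name, config name, and ZooKeeper address
> are placeholders, not anything from your setup):
> 
>     import java.util.Collections;
>     import java.util.Optional;
> 
>     import org.apache.solr.client.solrj.impl.CloudSolrClient;
>     import org.apache.solr.client.solrj.request.CollectionAdminRequest;
> 
>     public class ReplicaTypesDemo {
>       public static void main(String[] args) throws Exception {
>         try (CloudSolrClient client = new CloudSolrClient.Builder(
>             Collections.singletonList("localhost:2181"),
>             Optional.empty()).build()) {
>           // 1 shard; 0 NRT, 1 TLOG (leader-eligible), 3 PULL replicas.
>           CollectionAdminRequest
>               .createCollection("tutors", "_default", 1, 0, 1, 3)
>               .process(client);
>         }
>       }
>     }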
> 
> bq. All our updates are single documents. We need to track the
> availability of online tutors, so we don’t batch them.
> 
> I'm inferring that this means the updates aren't all that frequent,
> and if you waited for, say, 100 changes you might wait a long time.
> FYI, here are the results of some experiments I did on the difference
> between various batch sizes:
> https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/
> It may not apply if your tutors come and go slowly, of course.
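> 
> If batching ever becomes an option, the SolrJ side is just the
> difference between add-per-document and add-per-list. A rough sketch
> (the URL and field names are placeholders):
> 
>     import java.util.ArrayList;
>     import java.util.List;
> 
>     import org.apache.solr.client.solrj.impl.HttpSolrClient;
>     import org.apache.solr.common.SolrInputDocument;
> 
>     public class BatchVsSingle {
>       static SolrInputDocument doc(int i) {
>         SolrInputDocument d = new SolrInputDocument();
>         d.addField("id", "tutor-" + i);
>         d.addField("available_b", true);  // placeholder field
>         return d;
>       }
> 
>       public static void main(String[] args) throws Exception {
>         try (HttpSolrClient client = new HttpSolrClient.Builder(
>             "http://localhost:8983/solr/tutors").build()) {
>           // One HTTP request (and one leader fan-out) per document:
>           for (int i = 0; i < 100; i++) {
>             client.add(doc(i));
>           }
>           // The same 100 documents in a single request:
>           List<SolrInputDocument> batch = new ArrayList<>();
>           for (int i = 0; i < 100; i++) {
>             batch.add(doc(i));
>           }
>           client.add(batch);
>           client.commit();
>         }
>       }
>     }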
> 
> Best,
> Erick
> 
> On Tue, Aug 21, 2018 at 8:02 AM, Walter Underwood <wun...@wunderwood.org> 
> wrote:
>> Thanks, that is exactly what I was curious about.
>> 
>> All our updates are single documents. We need to track the availability
>> of online tutors, so we don’t batch them.
>> 
>> Right now, we have a replication factor of 36 (way too many), so each
>> update means 3 x 35 internal communications. Basically, a 100x update
>> amplification for our cluster.
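>> 
>> (Spelling the arithmetic out: 35 followers times roughly three internal
>> messages per follower per update is 3 x 35 = 105 internal messages for
>> every one external request, which is where the roughly 100x comes from.)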
>> 
>> We’ll be reducing the cluster to four hosts as soon as we get out of
>> the current blackout on prod changes.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Aug 20, 2018, at 10:05 PM, Erick Erickson <erickerick...@gmail.com> 
>>> wrote:
>>> 
>>> Walter:
>>> 
>>> Each update is roughly:
>>> 
>>> request goes to the leader (may be forwarded)
>>> 
>>> leader sends the update to _each_ replica. Depending on how many docs
>>> you're sending per update request, this may be more than one request.
>>> IIRC there was some JIRA a while ago where the forwarding wasn't all
>>> that efficient, but that's going from (shaky) memory.
>>> 
>>> each follower acks back to the leader
>>> 
>>> leader acks back to the client.
>>> 
>>> So perhaps you're seeing the individual forwards to followers? Your
>>> logs should show update requests with FROMLEADER for these
>>> sub-requests (updates and queries). Does that help?
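>>> 
>>> If you want to put a number on it, a quick-and-dirty sketch that
>>> counts those FROMLEADER lines in a log file (the path is a guess at
>>> your layout):
>>> 
>>>     import java.nio.file.Files;
>>>     import java.nio.file.Paths;
>>>     import java.util.stream.Stream;
>>> 
>>>     public class CountFromLeader {
>>>       public static void main(String[] args) throws Exception {
>>>         // Count leader-to-replica sub-requests recorded in the log.
>>>         try (Stream<String> lines =
>>>             Files.lines(Paths.get("/var/solr/logs/solr.log"))) {
>>>           long subRequests =
>>>               lines.filter(l -> l.contains("FROMLEADER")).count();
>>>           System.out.println("FROMLEADER sub-requests: " + subRequests);
>>>         }
>>>       }
>>>     }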
>>> 
>>> Erick
>>> 
>>> 
>>> 
>>> On Mon, Aug 20, 2018 at 8:03 PM, Walter Underwood <wun...@wunderwood.org> 
>>> wrote:
>>>> I’m comparing request counts from New Relic, which is reporting 16 krpm
>>>> aggregate requests across the cluster, and the AWS load balancer is
>>>> reporting 1 krpm. Or it might be 1k requests per 5 minutes, because
>>>> CloudWatch is like that.
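>>>> 
>>>> (For scale: 1 krpm against 16 krpm would be a 16x gap; 1k requests per
>>>> 5 minutes is about 200 requests per minute, which would make it an 80x
>>>> gap.)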
>>>> 
>>>> This is a 36 node cluster, not sharded. We are going to shrink it, but I’d 
>>>> like to understand it.
>>>> 
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>> 
>>>>> On Aug 20, 2018, at 7:02 PM, Shalin Shekhar Mangar 
>>>>> <shalinman...@gmail.com> wrote:
>>>>> 
>>>>> There is a single persistent HTTP connection open from the leader to each
>>>>> replica in the shard. All updates coming to the leader are expanded (for
>>>>> atomic updates) and streamed over that single connection. When using
>>>>> in-place docvalues updates, there is a possibility of the replica making a
>>>>> request to the leader if updates have been re-ordered and the replica does
>>>>> not have enough context to process the update.
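>>>>> 
>>>>> For reference, a minimal SolrJ sketch of sending an atomic update (the
>>>>> names are placeholders; it can only be applied in-place when the target
>>>>> is a single-valued, non-indexed, non-stored numeric docValues field):
>>>>> 
>>>>>     import java.util.Collections;
>>>>> 
>>>>>     import org.apache.solr.client.solrj.impl.HttpSolrClient;
>>>>>     import org.apache.solr.common.SolrInputDocument;
>>>>> 
>>>>>     public class AtomicUpdate {
>>>>>       public static void main(String[] args) throws Exception {
>>>>>         try (HttpSolrClient client = new HttpSolrClient.Builder(
>>>>>             "http://localhost:8983/solr/tutors").build()) {
>>>>>           SolrInputDocument doc = new SolrInputDocument();
>>>>>           doc.addField("id", "tutor-42");
>>>>>           // The "set" map makes this an atomic update; the leader
>>>>>           // expands it before streaming it to the replicas.
>>>>>           doc.addField("availability_i",
>>>>>               Collections.singletonMap("set", 1));
>>>>>           client.add(doc);
>>>>>           client.commit();
>>>>>         }
>>>>>       }
>>>>>     }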
>>>>> 
>>>>> Can you quantify the "tons of internal traffic"? Are you seeing a higher
>>>>> number of open connections as well?
>>>>> 
>>>>> On Fri, Aug 17, 2018 at 11:17 PM Walter Underwood <wun...@wunderwood.org>
>>>>> wrote:
>>>>> 
>>>>>> How many messages are sent back and forth between a leader and replica
>>>>>> with NRT?
>>>>>> 
>>>>>> We have a collection that gets frequent updates and we are seeing a ton
>>>>>> of internal cluster traffic.
>>>>>> 
>>>>>> wunder
>>>>>> Walter Underwood
>>>>>> wun...@wunderwood.org
>>>>>> http://observer.wunderwood.org/  (my blog)
>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> Regards,
>>>>> Shalin Shekhar Mangar.
>>>> 
>> 
