Why would you want to write a load balancer when there are so many that are 
free and very fast?

For update traffic, there is very little benefit in sending updates directly to 
the shard leader. Forwarding an update to the leader is fast. Indexing is slow. 
So the bottleneck is the indexing work on the leader, no matter which node 
receives the update first.

Before you build anything, measure. Collect a large batch of updates and send 
it directly to the shard leader. Then send the same batch to a non-leader node 
and compare the times. If you are batching and indexing with multiple threads, 
I doubt you’ll see a meaningful difference. I commonly see a 10% difference 
between runs of identical load benchmarks, so the speedup has to be much larger 
than that to be real.
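
A rough sketch of that comparison in Python, in case it helps. The function 
names and the 5-run default are my own choices for illustration, and any URLs 
you pass in are yours to fill in. It assumes the standard JSON update handler 
at /solr/<collection>/update. It takes the median of several runs per target 
and only calls the difference real if it exceeds the 10% noise floor.

```python
import json
import statistics
import time
import urllib.request


def post_batch(update_url, docs):
    """POST one batch of documents as a JSON array to a Solr update handler."""
    body = json.dumps(docs).encode("utf-8")
    req = urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # urlopen raises on HTTP error statuses, so failed updates are not ignored.
    with urllib.request.urlopen(req) as resp:
        resp.read()


def time_runs(send, runs=5):
    """Call send() `runs` times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        send()
        durations.append(time.perf_counter() - start)
    return durations


def compare(leader_times, other_times, noise=0.10):
    """Return (relative difference of medians, True if it exceeds the noise)."""
    leader = statistics.median(leader_times)
    other = statistics.median(other_times)
    diff = (other - leader) / leader
    return diff, abs(diff) > noise


# Example use (hypothetical host names and collection, not run here):
#   leader = time_runs(lambda: post_batch(
#       "http://leader-host:8983/solr/mycoll/update", docs))
#   other = time_runs(lambda: post_batch(
#       "http://other-host:8983/solr/mycoll/update", docs))
#   print(compare(leader, other))
```

Use fresh document IDs or a clean collection for each run, otherwise the later 
runs are overwrites rather than new documents and the timings are not 
comparable.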

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 11, 2019, at 8:38 AM, Boban Acimovic <b...@it-agenten.com> wrote:
> 
> I would actually like to write a load balancer itself, but I want it to be 
> able to send the data as efficiently as possible. I know how to read ZK data, 
> but I don’t know how to figure out which shard is responsible for a document, 
> based on the data in the document that I want to index.
> 
>> On 11. Feb 2019, at 17:23, Walter Underwood <wun...@wunderwood.org> wrote:
>> 
>> We send all updates to the load balancer, so they’ll end up on the wrong 
>> shard, not on the leader, etc. Indexing speed is still limited by the CPU 
>> available on each leader. I don’t think that sending the update to the right 
>> leader makes any improvement in throughput.
>> 
>> On the other hand, the CloudSolrClient ignores errors from Solr, which makes 
>> it unacceptable for production use.
>> 
>> I would stay with your current indexing client and worry about something 
>> else.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
