Your summary pretty much nails it.

For (b), note that CloudSolrClient uses an internal software load
balancer to distribute queries across the live nodes it learns about
from ZooKeeper, FWIW.
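
To give a rough idea of what a "software load balancer" inside a client
amounts to, here is a minimal sketch. It is NOT SolrJ's code (I believe
SolrJ delegates this to LBHttpSolrClient, which is more sophisticated).
The class name RoundRobinFailover, the request() method, and the node
URLs are all made up for illustration, and the "send" function is a
stand-in for a real HTTP query. It only shows the two ideas discussed in
this thread: rotate across nodes, and fail over when a node is down.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Illustrative only, not SolrJ code: round-robin over nodes with failover. */
final class RoundRobinFailover {
    private final List<String> nodes;                 // hypothetical Solr base URLs
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinFailover(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.nodes = new ArrayList<>(nodes);          // defensive copy
    }

    /**
     * Tries each node at most once, starting at the next position in the
     * rotation, and returns the first successful result.
     */
    <T> T request(Function<String, T> send) {
        int start = Math.floorMod(next.getAndIncrement(), nodes.size());
        RuntimeException last = null;
        for (int i = 0; i < nodes.size(); i++) {
            String node = nodes.get((start + i) % nodes.size());
            try {
                return send.apply(node);
            } catch (RuntimeException e) {
                last = e;                             // node looks down; try the next one
            }
        }
        throw new IllegalStateException("all nodes failed", last);
    }

    public static void main(String[] args) {
        RoundRobinFailover lb = new RoundRobinFailover(
                Arrays.asList("http://solr1:8983/solr", "http://solr2:8983/solr"));

        // Stand-in for a real HTTP query: pretend solr1 is down.
        Function<String, String> send = url -> {
            if (url.contains("solr1")) {
                throw new RuntimeException("connection refused");
            }
            return "answered by " + url;
        };

        // Both calls end up on solr2: the first fails over, the second starts there.
        System.out.println(lb.request(send));
        System.out.println(lb.request(send));
    }
}
```

The difference with CloudSolrClient is that the node list is not fixed:
it comes from ZooKeeper and changes as nodes come and go. A client in
another language that does something like the above against a static
list gets failover, but still has to be told about new nodes.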



On Mon, Apr 18, 2016 at 7:52 AM, John Bickerstaff
<j...@johnbickerstaff.com> wrote:
> Thanks all - very helpful.
>
> @Shawn - your reply implies that even if I'm hitting the URL for a single
> endpoint via HTTP - the "balancing" will still occur across the Solr Cloud
> (I understand the caveat about that single endpoint being a potential point
> of failure).  I just want to verify that I'm interpreting your response
> correctly...
>
> (I have been asked to provide IT with a comprehensive list of options prior
> to a design discussion - which is why I'm trying to get clear about the
> various options)
>
> In a nutshell, I think I understand the following:
>
> a. Even if hitting a single URL, SolrCloud will "balance" searches across
> all available nodes.
>    Caveat: That single URL is a potential single point of failure, and
>    this should be taken into account.
>
> b. SolrJ's CloudSolrClient API provides the ability to distribute load,
> based on ZooKeeper's "knowledge" of all available Solr instances.
>    Note: This is more robust than "a" because it eliminates the single
>    point of failure.
>
> c. Use of a load balancer hitting all known Solr instances will be fine,
> although a search request may not run on the Solr instance the load
> balancer targeted, due to "a" above.
>
> Corrections or refinements welcomed...
>
> On Mon, Apr 18, 2016 at 7:21 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 4/17/2016 10:35 PM, John Bickerstaff wrote:
>> > My prior use of Solr in production was pre-SolrCloud.  We put a
>> > round-robin load balancer in front of replicas for searching.
>> >
>> > Do I understand correctly that a load balancer is unnecessary with
>> > SolrCloud?  I.e., Solr and ZooKeeper will balance the load, regardless of
>> > which replica's URL is getting hit?
>>
>> Your understanding is correct -- queries sent to a single SolrCloud node
>> will be balanced across the cloud, although the node you are sending the
>> queries to might represent a single point of failure.
>>
>> If your program is written in Java, you can use CloudSolrClient in SolrJ
>> -- this client talks to the ZooKeeper ensemble and dynamically adjusts
>> to the addition and removal of Solr nodes in the cloud.  All
>> notifications from the cloud to the client about servers going up or
>> down are nearly instantaneous -- the client does not need to poll for
>> status.
>>
>> For other programming languages, if your client code is not capable of
>> failing over to a second node when the primary goes down, then you would
>> still need a load balancer.
>>
>> Thanks,
>> Shawn
>>
>>
