Your summary pretty much nails it. For (b) note that CloudSolrClient uses an internal software load balancer to distribute queries, FWIW.
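
To make the client-side balancing idea concrete, here is a minimal sketch in plain Java of round-robin selection with failover across a fixed list of nodes. This is not SolrJ's actual implementation: the class name RoundRobinFailover is hypothetical and the caller supplies whatever node URLs it wants. CloudSolrClient additionally learns the live node list from ZooKeeper, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper for illustration only; not part of SolrJ. */
class RoundRobinFailover {
    private final List<String> nodeUrls;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinFailover(List<String> nodeUrls) {
        if (nodeUrls.isEmpty()) {
            throw new IllegalArgumentException("need at least one node URL");
        }
        this.nodeUrls = new ArrayList<>(nodeUrls);
    }

    /**
     * Runs the request against the next node in rotation. If that node
     * throws, the remaining nodes are tried in order before giving up.
     */
    <T> T execute(Function<String, T> request) {
        int size = nodeUrls.size();
        // floorMod keeps the index valid even if the counter overflows.
        int start = Math.floorMod(next.getAndIncrement(), size);
        RuntimeException last = null;
        for (int i = 0; i < size; i++) {
            String url = nodeUrls.get((start + i) % size);
            try {
                return request.apply(url);
            } catch (RuntimeException e) {
                last = e; // remember the failure and fall through to the next node
            }
        }
        throw new IllegalStateException("all nodes failed", last);
    }
}
```

A caller would pass a function that sends the query to the given base URL and throws on a connection error. The point is only that failover lives in the client, so no single node URL is a single point of failure.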

On Mon, Apr 18, 2016 at 7:52 AM, John Bickerstaff <j...@johnbickerstaff.com> wrote:

> Thanks all - very helpful.
>
> @Shawn - your reply implies that even if I'm hitting the URL for a single
> endpoint via HTTP - the "balancing" will still occur across the Solr Cloud
> (I understand the caveat about that single endpoint being a potential point
> of failure). I just want to verify that I'm interpreting your response
> correctly...
>
> (I have been asked to provide IT with a comprehensive list of options prior
> to a design discussion - which is why I'm trying to get clear about the
> various options)
>
> In a nutshell, I think I understand the following:
>
> a. Even if hitting a single URL, the Solr Cloud will "balance" across all
> available nodes for searching
>    Caveat: That single URL represents a potential single point of
>    failure and this should be taken into account
>
> b. SolrJ's CloudSolrClient API provides the ability to distribute load --
> based on Zookeeper's "knowledge" of all available Solr instances.
>    Note: This is more robust than "a" due to the fact that it
>    eliminates the "single point of failure"
>
> c. Use of a load balancer hitting all known Solr instances will be fine -
> although the search requests may not run on the Solr instance the load
> balancer targeted - due to "a" above.
>
> Corrections or refinements welcomed...
>
> On Mon, Apr 18, 2016 at 7:21 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 4/17/2016 10:35 PM, John Bickerstaff wrote:
>> > My prior use of SOLR in production was pre SOLR cloud. We put a
>> > round-robin load balancer in front of replicas for searching.
>> >
>> > Do I understand correctly that a load balancer is unnecessary with SOLR
>> > Cloud? I. E. -- SOLR and Zookeeper will balance the load, regardless of
>> > which replica's URL is getting hit?
>>
>> Your understanding is correct -- queries sent to a single SolrCloud node
>> will be balanced across the cloud, although the node you are sending the
>> queries to might represent a single point of failure.
>>
>> If your program is written in Java, you can use CloudSolrClient in SolrJ
>> -- this client talks to the zookeeper ensemble and dynamically adjusts
>> to the addition and removal of Solr nodes in the cloud. All
>> notifications from the cloud to the client about servers going up or
>> down are nearly instantaneous -- the client does not need to poll for
>> status.
>>
>> For other programming languages, if your client code is not capable of
>> failing over to a second node when the primary goes down, then you would
>> still need a load balancer.
>>
>> Thanks,
>> Shawn