We use an AWS ALB for all of our Solr clusters. One of them has 40 instances.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jun 29, 2018, at 8:33 PM, Sushant Vengurlekar <svengurle...@curvolabs.com> 
> wrote:
> 
> What are some of the suggested load balancers for SolrCloud? Can AWS ELB be
> used for load balancing?
> 
> On Fri, Jun 29, 2018 at 8:04 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
> 
>> In your setup, the load balancer prevents single points of failure.
>> 
>> Since you're pinging a URL, what happens if that node dies or is
>> turned off?
>> Your PHP program has no way of knowing what to do, but the load
>> balancer does.
>> 
>> Your understanding of Zookeeper's role shows a common misconception.
>> 
>> Zookeeper keeps track of the topology of the collections: which nodes are
>> up, which ones are down, etc. It does _not_ have anything to do with
>> distributing queries or updates. Imagine a 1,000-node collection. If each
>> and every request had to go through Zookeeper, that would be a bottleneck.
>> 
>> Instead, when each node's state changes, it informs Zookeeper, which in turn
>> informs all the other Solr nodes that care. It looks like this:
>> - a node starts up.
>> - as each replica comes up, it informs Zookeeper that it is now "active".
>> - for each collection with any replica on that node, a "watch" is set on
>>   the collection's state.json node in Zookeeper.
>> - every time that state.json node changes, Zookeeper notifies
>>   the node.
>> - eventually everything has started, all the state changes have been
>>   broadcast, and Zookeeper just sits there.
>> - each Solr node holds a session with Zookeeper and sends it periodic
>>   heartbeats; if a node goes away, its session expires, Zookeeper informs
>>   all the other Solr nodes that this node is dead, and each of them
>>   updates its snapshot of the cluster's topology.
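The watch-and-notify cycle described above can be modelled with a toy, in-memory sketch (Python, for illustration only). It is not the ZooKeeper or SolrJ API: `ToyCoordinator`, `SolrNode`, `set_state`, `register`, and `expire` are hypothetical names, and the paths only loosely mirror Solr's layout. As with classic ZooKeeper watches, each watch here fires once and is re-armed by the node that receives it.

```python
from collections import defaultdict


class ToyCoordinator:
    """Hypothetical stand-in for Zookeeper: stores data and fires one-shot watches."""

    def __init__(self):
        self.data = {}                    # path -> value (e.g. a collection's state.json)
        self.watches = defaultdict(list)  # path -> callbacks waiting for the next change
        self.live_nodes = set()

    def get(self, path, watcher=None):
        # Reading a path can also register a watch on it.
        if watcher is not None:
            self.watches[path].append(watcher)
        return self.data.get(path)

    def set_state(self, path, value):
        self.data[path] = value
        # Watches are one-shot: detach the current list before firing it.
        pending, self.watches[path] = self.watches[path], []
        for callback in pending:
            callback(path)

    def register(self, node_name):
        self.live_nodes.add(node_name)
        self.set_state("/live_nodes", sorted(self.live_nodes))

    def expire(self, node_name):
        # Models a node's session expiring: it drops out of live_nodes.
        self.live_nodes.discard(node_name)
        self.set_state("/live_nodes", sorted(self.live_nodes))


class SolrNode:
    """Hypothetical Solr node that keeps a local snapshot of cluster state."""

    def __init__(self, name, zk, collections):
        self.name = name
        self.zk = zk
        self.snapshot = {}
        for coll in collections:
            self._watch(f"/collections/{coll}/state.json")
        self._watch("/live_nodes")
        zk.register(name)

    def _watch(self, path):
        # Read the current value and re-arm the watch so the next change is seen.
        self.snapshot[path] = self.zk.get(path, watcher=self._on_change)

    def _on_change(self, path):
        self._watch(path)


zk = ToyCoordinator()
a = SolrNode("a", zk, ["books"])
b = SolrNode("b", zk, ["books"])
zk.set_state("/collections/books/state.json", {"shard1": ["a", "b"]})
zk.expire("b")
print(a.snapshot["/live_nodes"])  # ['a']
```

Once everything has started, nothing in this model calls the coordinator again until some state changes, which is the "Zookeeper just sits there" phase.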
>> 
>> A query comes in to a Solr node and this is what happens:
>> - the Solr node looks in its Zookeeper information to see
>>   where all the replicas for the collection are.
>> - Solr picks one replica from each shard and sends the
>>   subquery to them
>> - Solr assembles the response from the subrequests
>> - Solr sends the response to the client.
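The four query steps above can be sketched as a scatter-gather over a cached topology. This is a simplified, single-process illustration: `TOPOLOGY`, `SHARD_DOCS`, `search_replica`, and `distributed_query` are hypothetical, the HTTP subrequests are simulated, and real Solr distributed search typically runs in more than one phase (top ids and scores first, then stored fields), which this collapses into one.

```python
import heapq
import random

# Hypothetical snapshot of cluster topology: collection -> shard -> replica nodes.
TOPOLOGY = {
    "books": {
        "shard1": ["node1", "node2"],
        "shard2": ["node3", "node4"],
    }
}

# Hypothetical index contents: every replica of a shard holds the same (doc id, score) hits.
SHARD_DOCS = {
    "shard1": [("doc1", 0.9), ("doc3", 0.4)],
    "shard2": [("doc2", 0.7), ("doc4", 0.8)],
}


def search_replica(shard, rows):
    """Stand-in for one subquery: the shard's top `rows` hits by score."""
    hits = sorted(SHARD_DOCS[shard], key=lambda hit: hit[1], reverse=True)
    return hits[:rows]


def distributed_query(collection, rows, rng=random):
    chosen, partials = {}, []
    # 1. Look up where the replicas are, using the local topology snapshot.
    for shard, replicas in TOPOLOGY[collection].items():
        # 2. Pick one replica from each shard and send it the subquery (simulated).
        chosen[shard] = rng.choice(replicas)
        partials.append(search_replica(shard, rows))
    # 3. Assemble the response: merge partial hits and keep the overall top `rows`.
    all_hits = (hit for part in partials for hit in part)
    merged = heapq.nlargest(rows, all_hits, key=lambda hit: hit[1])
    # 4. The merged list is what the node returns to the client.
    return merged, chosen


hits, chosen = distributed_query("books", rows=3)
print(hits)  # [('doc1', 0.9), ('doc4', 0.8), ('doc2', 0.7)]
```

Each shard is asked for its own top `rows` hits because the global top N could, in the worst case, come entirely from a single shard.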
>> 
>> Note that Zookeeper isn't involved at all. In fact, Zookeeper
>> can go away completely and each Solr node will work on its
>> last snapshot of the topology of the network and answer
>> _queries_. Updates will fail completely if Zookeeper falls
>> below quorum, but Zookeeper isn't handling the _update_.
>> It's still Solr knowing that Zookeeper is below quorum
>> and refusing to process an update.
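The quorum rule can be made concrete with a little arithmetic: an ensemble of n Zookeeper servers needs a strict majority, floor(n/2) + 1, of them up. The `can_accept` function below is a hypothetical gatekeeper that only restates the behaviour described above (queries served from the last snapshot, updates refused below quorum); it is not Solr code.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of a Zookeeper ensemble."""
    return ensemble_size // 2 + 1


def can_accept(request_kind, ensemble_size, zk_servers_up):
    """Hypothetical check mirroring the behaviour described in the thread."""
    has_quorum = zk_servers_up >= quorum_size(ensemble_size)
    if request_kind == "query":
        # Queries are answered from the node's last known topology snapshot,
        # so in this model they do not depend on Zookeeper being reachable.
        return True
    if request_kind == "update":
        # Solr itself refuses updates once Zookeeper has fallen below quorum.
        return has_quorum
    raise ValueError(f"unknown request kind: {request_kind!r}")


for up in range(3, -1, -1):
    print(up, can_accept("query", 3, up), can_accept("update", 3, up))
# 3 True True
# 2 True True
# 1 True False
# 0 True False
```

With three servers the quorum is two, so the ensemble survives one failure; five servers (quorum three) survive two. Four servers still survive only one failure, which is why odd ensemble sizes are the usual choice.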
>> 
>> There's more going on of course, but that's the general outline.
>> 
>> Since you're using PHP, your client doesn't know about Zookeeper; all it
>> has is a URL. So, as I mentioned above, if that node goes away, it's your
>> PHP program that isn't Zookeeper-aware and has no way to route around it.
>> 
>> If you were using "CloudSolrClient" in SolrJ, it _is_ Zookeeper
>> aware and you would not need a load balancer. But again
>> that's because it knows the cluster topology (it registers its own
>> watchers) and can "do the right thing" if something goes away.
>> Zookeeper is still not directly involved in processing queries
>> or updates.
>> 
>> Best,
>> Erick
>> 
>> On Fri, Jun 29, 2018 at 7:31 PM, Sushant Vengurlekar
>> <svengurle...@curvolabs.com> wrote:
>>> Thanks for your reply. I have a follow-up question. Why is a load balancer
>>> needed? Isn't it Zookeeper's job to load-balance queries across Solr
>>> nodes?
>>>
>>> I was under the impression that you send a query to Zookeeper, and it
>>> handles the rest and sends the response back. Can you please enlighten me
>>> on that one?
>>> 
>>> Thank you
>>> 
>>> On Fri, Jun 29, 2018 at 7:19 PM, Shalin Shekhar Mangar <
>>> shalinman...@gmail.com> wrote:
>>> 
>>>> You send your queries and updates directly to Solr's collection, e.g.
>>>> http://host:port/solr/<your_collection_name>. You can use any Solr node
>>>> for this request. If the node does not have the collection being queried,
>>>> then the request will be forwarded internally to a Solr instance which
>>>> has that collection.
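The internal forwarding described above can be sketched as follows, assuming a hypothetical `HOSTED` map of which nodes host which collection and a hypothetical `handle` function standing in for a node's request handler. In Solr the forwarding is an HTTP proxy between nodes; here it is just a function call.

```python
import random

# Hypothetical cluster state: node name -> collections with a replica on that node.
HOSTED = {
    "node1": {"books"},
    "node2": {"books", "music"},
    "node3": {"music"},
}


def handle(node, collection, query, rng=random):
    """Answer locally if `node` hosts `collection`, otherwise forward to a node that does."""
    if collection in HOSTED[node]:
        return node, f"results for {query!r} from {collection}"
    candidates = sorted(n for n, colls in HOSTED.items() if collection in colls)
    if not candidates:
        raise LookupError(f"no node hosts collection {collection!r}")
    target = rng.choice(candidates)
    # A real node would proxy the HTTP request to `target`; the toy just calls it.
    return handle(target, collection, query, rng)


answered_by, body = handle("node3", "books", "q=title:solr")
print(answered_by, body)
```

This is why any node's URL works for the client: the node that receives the request does the lookup, not the client.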
>>>> 
>>>> ZooKeeper is used by Solr's Java client to look up the list of Solr nodes
>>>> having the collection being queried. But if you are using PHP, then you
>>>> can probably keep a list of Solr nodes in configuration and randomly
>>>> choose one. A better implementation would be to set up a load balancer,
>>>> put all Solr nodes behind it, and query the load balancer URL in your
>>>> application.
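For the first option above (a node list in configuration, one chosen at random), a minimal client-side sketch might look like this. Python is used only for illustration; the node URLs, `SOLR_NODES`, `query_any_node`, and `fake_fetch` are hypothetical, and the `/select` path assumes Solr's standard search handler is enabled. Unlike picking one node and hoping, it falls back to the remaining nodes when a request fails, which is roughly the job a load balancer takes over for you.

```python
import random
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical node list kept in application configuration.
SOLR_NODES = [
    "http://solr1:8983/solr",
    "http://solr2:8983/solr",
    "http://solr3:8983/solr",
]


def _http_get(url, timeout=5.0):
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def query_any_node(collection, params, nodes=SOLR_NODES, fetch=_http_get):
    """Try the nodes in random order; return the first successful response body."""
    query_string = urllib.parse.urlencode(params)
    errors = []
    # random.sample shuffles a copy, so the configured list is never mutated.
    for base in random.sample(nodes, len(nodes)):
        url = f"{base}/{collection}/select?{query_string}"
        try:
            return fetch(url)
        except OSError as exc:
            # URLError (and its subclass HTTPError) derive from OSError, so a
            # refused connection, timeout, or HTTP error all move on to the next node.
            errors.append((base, exc))
    raise RuntimeError(f"all Solr nodes failed: {errors}")


def fake_fetch(url):
    """Offline stand-in for _http_get: only solr3 is 'up'."""
    if "//solr3:" not in url:
        raise urllib.error.URLError("connection refused")
    return b'{"response": {"numFound": 0}}'


print(query_any_node("books", {"q": "*:*", "wt": "json"}, fetch=fake_fetch))
# b'{"response": {"numFound": 0}}'
```

Shuffling the whole list rather than calling `random.choice` repeatedly means each node is tried at most once per request.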
>>>> 
>>>> On Sat, Jun 30, 2018 at 7:31 AM Sushant Vengurlekar <
>>>> svengurle...@curvolabs.com> wrote:
>>>> 
>>>>> I have a question regarding querying in solrcloud.
>>>>> 
>>>>> I am working on PHP code to query SolrCloud for search results. Do I
>>>>> send the query to Zookeeper or to a particular Solr node? How does the
>>>>> querying process work in general?
>>>>> 
>>>>> Thank you
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Regards,
>>>> Shalin Shekhar Mangar.
>>>> 
>> 
