We use an AWS ALB for all of our Solr clusters. One of them has 40 instances.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
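A load balancer such as an ALB puts every Solr node behind a single URL and routes each request only to nodes that pass its health check. A node that dies or is turned off therefore stops receiving traffic. The Python snippet below is a minimal toy model of that behavior. It assumes round-robin selection and a caller-supplied `is_healthy` check. It is not how an ALB is implemented, and the class name and node addresses are hypothetical.

```python
import itertools
from typing import Callable, Iterable


class RoundRobinBalancer:
    """Toy load balancer (hypothetical): rotates through targets, skipping unhealthy ones."""

    def __init__(self, targets: Iterable[str], is_healthy: Callable[[str], bool]):
        self.targets = list(targets)
        if not self.targets:
            raise ValueError("at least one target is required")
        # Stands in for the result of the balancer's most recent health check.
        self.is_healthy = is_healthy
        self._cycle = itertools.cycle(self.targets)

    def pick(self) -> str:
        # Try each target at most once per call, so a fully failed pool
        # raises instead of looping forever.
        for _ in range(len(self.targets)):
            target = next(self._cycle)
            if self.is_healthy(target):
                return target
        raise RuntimeError("no healthy Solr targets")


if __name__ == "__main__":
    down = {"solr2:8983"}  # hypothetical node that has been turned off
    lb = RoundRobinBalancer(
        ["solr1:8983", "solr2:8983", "solr3:8983"],
        is_healthy=lambda t: t not in down,
    )
    print([lb.pick() for _ in range(4)])
```

A real balancer runs its health checks periodically in the background rather than on every request. The effect is the same, though: clients keep using one URL, and dead nodes are skipped.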
> On Jun 29, 2018, at 8:33 PM, Sushant Vengurlekar <svengurle...@curvolabs.com>
> wrote:
>
> What are some of the suggested load balancers for SolrCloud? Can AWS ELB be
> used for load balancing?
>
> On Fri, Jun 29, 2018 at 8:04 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> In your setup, the load balancer prevents single points of failure.
>>
>> Since you're sending requests to one node's URL, what happens if that
>> node dies or is turned off? Your PHP program has no way of knowing what
>> to do, but the load balancer does.
>>
>> Your understanding of Zookeeper's role shows a common misconception.
>>
>> Zookeeper keeps track of the topology of the collections: which nodes are
>> up, which ones are down, etc. It does _not_ have anything to do with
>> distributing queries or updates. Imagine a 1,000-node collection. If each
>> and every request had to go through Zookeeper, that would be a bottleneck.
>>
>> Instead, when a node's state changes, it informs Zookeeper, which in turn
>> informs all the other Solr nodes that care. It looks like this:
>> - A node starts up.
>> - As each replica comes up, it informs Zookeeper that it is now "active".
>> - For each collection with any replica on that node, a "watch" is set on
>>   the collection's state.json node in Zookeeper.
>> - Every time that state.json node changes, Zookeeper notifies
>>   the node.
>> - Eventually everything has started, all the state changes have been
>>   broadcast, and Zookeeper just sits there.
>> - Periodically Zookeeper pings each Solr node, and if one has gone away,
>>   it informs all the Solr nodes that this node is dead, and each Solr
>>   node updates its snapshot of the cluster's topology.
>>
>> A query comes in to a Solr node and this is what happens:
>> - The Solr node looks in its Zookeeper information to see
>>   where all the replicas for the collection are.
>> - Solr picks one replica from each shard and sends the
>>   subquery to them.
>> - Solr assembles the response from the subrequests.
>> - Solr sends the response to the client.
>>
>> Note that Zookeeper isn't involved at all. In fact, Zookeeper
>> can go away completely and each Solr node will keep working from its
>> last snapshot of the network topology and answer _queries_. Updates
>> will fail completely if Zookeeper falls below quorum, but Zookeeper
>> isn't handling the _update_. It's still Solr that knows Zookeeper is
>> below quorum and refuses to process the update.
>>
>> There's more going on, of course, but that's the general outline.
>>
>> Since you're using PHP, your program doesn't know about Zookeeper; all
>> it has is a URL. So, as I mentioned above, if that node goes away, your
>> PHP program can't recover, because it isn't Zookeeper-aware.
>>
>> If you were using "CloudSolrClient" in SolrJ, it _is_ Zookeeper-aware
>> and you would not need a load balancer. But again, that's because it
>> knows the cluster topology (it registers its own watchers) and can
>> "do the right thing" if something goes away. Zookeeper is still not
>> directly involved in processing queries or updates.
>>
>> Best,
>> Erick
>>
>> On Fri, Jun 29, 2018 at 7:31 PM, Sushant Vengurlekar
>> <svengurle...@curvolabs.com> wrote:
>>> Thanks for your reply. I have a follow-up question. Why is a load
>>> balancer needed? Isn't it Zookeeper's job to load balance queries
>>> across Solr nodes?
>>>
>>> I was under the impression that you send the query to Zookeeper and it
>>> handles the rest and sends the response back. Can you please enlighten
>>> me on that one?
>>>
>>> Thank you
>>>
>>> On Fri, Jun 29, 2018 at 7:19 PM, Shalin Shekhar Mangar <
>>> shalinman...@gmail.com> wrote:
>>>
>>>> You send your queries and updates directly to Solr's collection, e.g.
>>>> http://host:port/solr/<your_collection_name>. You can use any Solr node
>>>> for this request.
>>>> If the node does not have the collection being queried, then
>>>> the request will be forwarded internally to a Solr instance that has
>>>> that collection.
>>>>
>>>> ZooKeeper is used by Solr's Java client to look up the list of Solr
>>>> nodes that have the collection being queried. But if you are using PHP,
>>>> you can probably keep a list of Solr nodes in configuration and randomly
>>>> choose one. A better implementation would be to set up a load balancer,
>>>> put all Solr nodes behind it, and query the load balancer URL in your
>>>> application.
>>>>
>>>> On Sat, Jun 30, 2018 at 7:31 AM Sushant Vengurlekar <
>>>> svengurle...@curvolabs.com> wrote:
>>>>
>>>>> I have a question regarding querying in SolrCloud.
>>>>>
>>>>> I am working on PHP code to query SolrCloud for search results. Do I
>>>>> send the query to Zookeeper or send it to a particular Solr node? How
>>>>> does the querying process work in general?
>>>>>
>>>>> Thank you
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Shalin Shekhar Mangar.
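Shalin's suggestion of keeping a list of Solr nodes in configuration and picking one at random can be extended with simple failover when no load balancer is in place. The sketch below is in Python rather than PHP and is only an illustration under stated assumptions. The node URLs and the `query_any_node` helper are hypothetical. Queries go to the collection's `/select` handler with `wt=json`. Any connection or HTTP error is treated as "try the next node".

```python
import json
import random
import urllib.parse
import urllib.request
from typing import Callable, Optional, Sequence

# Hypothetical node list kept in application configuration.
SOLR_NODES = ["http://solr1:8983", "http://solr2:8983", "http://solr3:8983"]


def http_get_json(url: str, timeout: float = 5.0) -> dict:
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def query_any_node(
    nodes: Sequence[str],
    collection: str,
    params: dict,
    fetch: Callable[[str], dict] = http_get_json,
    rng: Optional[random.Random] = None,
) -> dict:
    """Query a randomly chosen node, failing over to the remaining nodes.

    Any live node can take the request: if it hosts no replica of the
    collection, Solr forwards the request internally.
    """
    rng = rng or random.Random()
    order = list(nodes)
    rng.shuffle(order)  # random first choice, then the others in turn
    query = urllib.parse.urlencode({**params, "wt": "json"})
    errors = []
    for base in order:
        url = f"{base}/solr/{collection}/select?{query}"
        try:
            return fetch(url)
        except OSError as exc:
            # URLError, HTTPError and timeouts are all OSError subclasses:
            # treat the node as unavailable and try the next one.
            errors.append((base, exc))
    raise RuntimeError(f"all Solr nodes failed: {errors}")
```

For example, `query_any_node(SOLR_NODES, "mycollection", {"q": "*:*", "rows": 10})` would return the decoded response from the first node that answers. The `fetch` parameter lets the network call be replaced in tests. Unlike CloudSolrClient, this list is static, so nodes added to or removed from the cluster are not picked up automatically. That gap is one reason a load balancer is the better option.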
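Erick's description of the query path can be modeled in a few lines. The receiving node consults its cached copy of the cluster topology, picks one live replica per shard, runs the subquery on each, and merges the results. The Python sketch below is a heavily simplified toy under stated assumptions. The topology dict, replica names, and `run_subquery` callback are hypothetical, and results are `(score, doc_id)` pairs merged by score. Real Solr does considerably more. Nothing in the sketch contacts Zookeeper at query time.

```python
import heapq
import random
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Doc = Tuple[float, str]  # (score, doc_id)


def distributed_query(
    topology: Dict[str, Sequence[str]],
    live_replicas: Set[str],
    run_subquery: Callable[[str], List[Doc]],
    rows: int = 10,
    rng: Optional[random.Random] = None,
) -> List[Doc]:
    """Fan a query out to one live replica per shard and merge the top hits.

    `topology` and `live_replicas` stand in for the node's cached snapshot
    of the cluster state, which is kept current by Zookeeper watches.
    """
    rng = rng or random.Random()
    hits: List[Doc] = []
    for shard, replicas in topology.items():
        candidates = [r for r in replicas if r in live_replicas]
        if not candidates:
            raise RuntimeError(f"no live replica for {shard}")
        # One subquery per shard, sent to a randomly chosen live replica.
        hits.extend(run_subquery(rng.choice(candidates)))
    # Assemble the response: keep the `rows` highest-scoring documents.
    return heapq.nlargest(rows, hits)


if __name__ == "__main__":
    topology = {"shard1": ["s1r1", "s1r2"], "shard2": ["s2r1", "s2r2"]}
    results = {"s1r1": [(0.9, "a"), (0.2, "b")], "s2r2": [(0.5, "c")]}
    live = {"s1r1", "s2r2"}  # s1r2 and s2r1 are down in this example
    print(distributed_query(topology, live, results.__getitem__, rows=2))
```

With the inputs in the `__main__` block, each shard has exactly one live replica, so the merged result is `[(0.9, 'a'), (0.5, 'c')]`.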
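Erick notes that updates fail once Zookeeper falls below quorum. A Zookeeper ensemble of n servers needs a strict majority, floor(n/2) + 1, to be running. The short Python helpers below illustrate that arithmetic. The function names are hypothetical. They show why ensembles are usually sized with an odd number of servers.

```python
def quorum_size(ensemble_size: int) -> int:
    """Servers that must be up for the ensemble to keep a quorum (strict majority)."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail before the ensemble drops below quorum."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

A 4-server ensemble tolerates only one failure, the same as a 3-server ensemble, so the extra server adds no fault tolerance. Per Erick's description, queries keep working from each node's last topology snapshot even below quorum; only updates are refused.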