What are some of the suggested load balancers for SolrCloud? Can AWS ELB be used for load balancing?
On Fri, Jun 29, 2018 at 8:04 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> In your setup, the load balancer prevents single points of failure.
>
> Since you're pinging a URL, what happens if that node dies or is turned off?
> Your PHP program has no way of knowing what to do, but the load balancer does.
>
> Your understanding of Zookeeper's role shows a common misconception.
>
> Zookeeper keeps track of the topology of the collections, what nodes are up,
> what ones down etc. It does _not_ have anything to do with distributing
> queries or updates. Imagine a 1,000 node collection. If each and every
> request had to go through Zookeeper, that would be a bottleneck.
>
> Instead, when each node's state changes, it informs Zookeeper which in turn
> informs all the other Solr nodes who care. It looks like this.
> - node starts up.
> - as each replica comes up, it informs Zookeeper that it is now "active".
> - for each collection with any replica on that node, a "watch" is set on
>   the collection's state.json node in Zookeeper
> - every time that state.json node changes, Zookeeper notifies the node.
> - eventually everything starts, all the state changes are broadcast,
>   and Zookeeper just sits there.
> - periodically Zookeeper pings each Solr node and if it has gone away
>   it informs all the Solr nodes that this node is dead
>   and the Solr node updates its snapshot of the cluster's topology
>
> A query comes in to a Solr node and this is what happens:
> - the Solr node looks in its Zookeeper information to see
>   where all the replicas for the collection are.
> - Solr picks one replica from each shard and sends the
>   subquery to them
> - Solr assembles the response from the subrequests
> - Solr sends the response to the client.
>
> Note that Zookeeper isn't involved at all. In fact, Zookeeper
> can go away completely and each Solr node will work on its
> last snapshot of the topology of the network and answer
> _queries_. Updates will fail completely if Zookeeper falls
> below quorum, but Zookeeper isn't handling the _update_.
> It's still Solr knowing that Zookeeper is below quorum
> and refusing to process an update.
>
> There's more going on of course, but that's the general outline.
>
> Since you're using PHP, it doesn't know about Zookeeper. All it
> has is a URL, so as I mentioned above, if that node goes away
> your PHP program can't react, because it's not Zookeeper-aware.
>
> If you were using "CloudSolrClient" in SolrJ, it _is_ Zookeeper
> aware and you would not need a load balancer. But again
> that's because it knows the cluster topology (it registers its own
> watchers) and can "do the right thing" if something goes away.
> Zookeeper is still not directly involved in processing queries
> or updates.
>
> Best,
> Erick
>
> On Fri, Jun 29, 2018 at 7:31 PM, Sushant Vengurlekar
> <svengurle...@curvolabs.com> wrote:
> > Thanks for your reply. I have a follow-up question. Why is a load balancer
> > needed? Isn't it the job of Zookeeper to load-balance queries across Solr
> > nodes?
> >
> > I was under the impression that you send the query to Zookeeper and it
> > handles the rest and sends the response back. Can you please enlighten me
> > on that one?
> >
> > Thank you
> >
> > On Fri, Jun 29, 2018 at 7:19 PM, Shalin Shekhar Mangar
> > <shalinman...@gmail.com> wrote:
> >
> >> You send your queries and updates directly to Solr's collection e.g.
> >> http://host:port/solr/<your_collection_name>. You can use any Solr node
> >> for this request. If the node does not have the collection being queried
> >> then the request will be forwarded internally to a Solr instance which
> >> has that collection.
> >>
> >> ZooKeeper is used by Solr's Java client to look up the list of Solr nodes
> >> having the collection being queried. But if you are using PHP then you
> >> can probably keep a list of Solr nodes in configuration and randomly
> >> choose one. A better implementation would be to set up a load balancer
> >> and put all Solr nodes behind it and query the load balancer URL in your
> >> application.
> >>
> >> On Sat, Jun 30, 2018 at 7:31 AM Sushant Vengurlekar
> >> <svengurle...@curvolabs.com> wrote:
> >>
> >> > I have a question regarding querying in SolrCloud.
> >> >
> >> > I am working on PHP code to query SolrCloud for search results. Do I
> >> > send the query to Zookeeper or send it to a particular Solr node? How
> >> > does the querying process work in general?
> >> >
> >> > Thank you
> >>
> >> --
> >> Regards,
> >> Shalin Shekhar Mangar.
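
For readers who want to see the simpler client-side option from the thread (keep a list of Solr nodes in configuration, pick one at random, and move on to another if it is down), here is a minimal sketch in Python. It is an illustration under assumptions, not code from anyone in the thread: the host names, the function names `http_get` and `query_any_node`, and the use of the `/select` handler under `http://host:port/solr/<your_collection_name>` are all hypothetical choices made for the example. A real load balancer with health checks, as recommended above, would normally replace this logic.

```python
import random
import urllib.parse
import urllib.request

# Hypothetical node list; substitute the base URLs of your own Solr nodes.
SOLR_NODES = [
    "http://solr1.example.com:8983/solr",
    "http://solr2.example.com:8983/solr",
    "http://solr3.example.com:8983/solr",
]


def http_get(url, timeout=5.0):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def query_any_node(nodes, collection, params, fetch=http_get):
    """Send the query to the nodes in random order until one answers.

    Any Solr node can accept the request, so the only job here is to
    avoid depending on a single URL that might be down.
    """
    query = urllib.parse.urlencode(params)
    last_error = None
    for base in random.sample(nodes, len(nodes)):
        # Assumes the standard /select handler on the collection.
        url = f"{base}/{collection}/select?{query}"
        try:
            return fetch(url)
        except OSError as exc:
            # URLError, timeouts and refused connections are all OSError
            # subclasses; remember the failure and try the next node.
            last_error = exc
    raise RuntimeError(f"no Solr node answered; last error: {last_error}")
```

A call would look like `query_any_node(SOLR_NODES, "your_collection_name", {"q": "title:solr"})`. The `fetch` parameter exists only so the retry logic can be exercised without a running cluster. Note that the node list here is static, which is exactly the limitation Erick describes: unlike CloudSolrClient, this client never learns about topology changes from Zookeeper.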