bq: I could just set up a load balancer on the two Solr instances and let client query requests use the load balancer to find a working instance.
That's all you need to do. The client doesn't really even need to be aware that Zookeeper exists; there's no need to query ZK and route your requests yourself. The _Solr_ instances query ZK, "know" about each other's state, and are notified of any problems, i.e. nodes going up/down etc. Once a request hits any running Solr node, it'll be routed around any problems. In the setup you describe, i.e. not using SolrJ, your load balancer just needs to know which nodes are up and route your requests around any hosed machines.

If you _do_ decide to use SolrJ sometime, CloudSolrServer (renamed CloudSolrClient in 5.x) _does_ take the ZK ensemble and do some smart routing on the client side, including simple load balancing, and it responds to any Solr nodes going up/down for you. Putting a load balancer (or some other type of connection with failover) in front of Solr will accomplish much the same thing if Java isn't an option; the SolrJ stuff is just more sophisticated.

Best,
Erick

On Sun, Mar 1, 2015 at 3:51 AM, Julian Perry <ju...@limitless.co.uk> wrote:
> Hi
>
> I'm really after best practice guidelines for making queries to
> an index on a Solr cluster. I'm not calling from Java.
>
> I have Solr 4.10.2 up and running, seems stable.
>
> I have about 6 indexes/collections - am running SolrCloud with
> two Solr instances (both currently running on the same dev. box -
> just one shard each) and standalone Zookeeper with 3 instances.
> All seems fine. I can do queries against either instance, and
> perform index updates and replication works fine.
>
> I'm not using Java to talk to Solr - the web pages are built with
> PHP (or something similar - happy to call zk/Solr from C). So I
> need to call Solr from the web page code. Clearly I need
> resilience and so don't want to specifically call one of the Solr
> instances directly.
>
> I could just set up a load balancer on the two Solr instances and
> let client query requests use the load balancer to find a working
> instance.
>
> From what I have read though - I am supposed to make a call to
> zookeeper to ask which Solr instances are running up to date and
> working replicas of the collection that I need. Is that right?
> I should do that every time I need to make a query?
>
> There seems to be a zookeeper client library in the zk dist - in
> zookeeper-3.4.6/src/c/ - can I use that? It looks like I can
> pass in a list of potential zk host:port pairs and it will find
> a working zk for me - is that right?
>
> Then I need to ask the working zk which solr instance I should
> connect to for the given index/collection - how do I do that -
> is that held in clusterstate.json?
>
> So the steps to make a Solr query against my cluster would be:
>
> a) call zk client library with list of zk host/ports
>
> b) ask zk for clusterstate.json
>
> c) pick an active server (at random) for the relevant collection
> (is there some load balancing option in there)
>
> d) call the Solr server returned by (c)
>
> Is that best practice - or am I missing something?
>
> --
> Cheers
> Jules.
>
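For anyone who can't put a dedicated load balancer in front of Solr, the same idea can live in the client: keep a list of Solr base URLs, send each query to one of them, and fall through to the next if a node doesn't answer. SolrCloud routes the request from there, so no ZooKeeper access is needed. Below is a minimal sketch in Python (chosen only for brevity; the logic ports directly to PHP or C). The node URLs, the 2-second timeout and the function names are hypothetical, and it assumes queries go to the standard /select handler with wt=json.

```python
import json
import urllib.parse
import urllib.request
from itertools import count

# Hypothetical node list (e.g. two instances on one dev box); use your own base URLs.
SOLR_NODES = ["http://localhost:8983/solr", "http://localhost:8984/solr"]

# Rotates the starting node on each call so queries are spread round-robin.
_next_start = count()


def http_get_json(url, timeout=2.0):
    """Default fetcher (hypothetical helper): GET the URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def query_with_failover(nodes, collection, params, fetch=http_get_json):
    """Send a /select query to the first node that answers.

    Any live node will do: SolrCloud routes the request internally,
    so the client never talks to ZooKeeper.
    """
    query = urllib.parse.urlencode(dict(params, wt="json"))
    start = next(_next_start) % len(nodes)
    errors = []
    for i in range(len(nodes)):
        node = nodes[(start + i) % len(nodes)]
        url = f"{node}/{collection}/select?{query}"
        try:
            return fetch(url)
        except (OSError, ValueError) as exc:
            # OSError covers URLError, HTTPError and timeouts; ValueError covers
            # a body that isn't valid JSON. Either way, try the next node.
            errors.append((node, exc))
    raise RuntimeError(f"all Solr nodes failed: {errors}")
```

Usage would be something like query_with_failover(SOLR_NODES, "collection1", {"q": "*:*"}), where "collection1" stands in for one of your collections. A real load balancer adds health checks and connection reuse, so treat this as a fallback rather than a replacement.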
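For completeness, steps (b) and (c) of the question boil down to something like the following, which is roughly the bookkeeping CloudSolrServer does for you. This is a hedged sketch in Python: it assumes the Solr 4.x layout of /clusterstate.json (later versions store cluster state differently) and that you have already read that znode and the children of /live_nodes with a ZooKeeper client, for example the C client mentioned in the question (not shown). A replica is counted only if its state is "active" _and_ its node_name appears under /live_nodes, since clusterstate.json can still report "active" for a node that has just died. The function name is hypothetical.

```python
import json


def active_replica_urls(clusterstate_json, live_nodes, collection):
    """Return base URLs of replicas of `collection` that are active and on a live node.

    Assumed Solr 4.x layout:
    {collection: {"shards": {shard: {"replicas": {name: {"state": ...,
        "base_url": ..., "node_name": ...}}}}}}
    `live_nodes` is the list of child names under /live_nodes in ZooKeeper.
    """
    state = json.loads(clusterstate_json)
    live = set(live_nodes)
    urls = set()
    for shard in state[collection]["shards"].values():
        for replica in shard.get("replicas", {}).values():
            # "active" alone isn't enough: the hosting node must also be live.
            if replica.get("state") == "active" and replica.get("node_name") in live:
                urls.add(replica["base_url"])
    return sorted(urls)
```

Each returned value is a base URL, so a query would then go to <base_url>/<collection>/select. With a load balancer in front of Solr, none of this is needed.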