Mmm, I think there is a misconception here:

On 10 January 2016 at 19:00, Robert Brown <r...@intelcompute.com> wrote:
> I'm thinking more about how the external load-balancer will know if a node
> is down, as to take it out the pool of active servers to even attempt
> sending a query to.
>

This is SolrCloud's responsibility; in particular, ZooKeeper knows the
topology of the cluster, so a query will not be routed to a dead node.
You should use a SolrCloud-aware client (like the SolrJ one).
If you want to use a different load balancer because you don't like the
SolrCloud one, it will not be that easy, because the distribution of the
queries happens automatically.

Cheers

>
> I could ping tho that just means the IP is alive.  I could configure the
> load-balancer to actually try a query, but this may be (even a tiny)
> performance hit.
>
> Is there another recommended way of configuring external load-balancers to
> know when a node is not accepting queries?
>
>
> On 10/01/16 18:25, Erick Erickson wrote:
>
>> For health checks, you can go ahead and get the real IP addresses and
>> ping them directly if you care to.... Or just let Zookeeper do that
>> for you. One of the tasks of Zookeeper is pinging all the machines
>> with all the replicas and, if any of them are unreachable, telling the
>> rest of the cluster that that machine is down.
>>
>> Best,
>> Erick
>>
>> On Sun, Jan 10, 2016 at 5:19 AM, Robert Brown <r...@intelcompute.com>
>> wrote:
>>
>>> Thanks Erick,
>>>
>>> For the health-checks on the load-balancer side, would you recommend a
>>> simple query, or is there a reliable ping or similar for this scenario?
>>>
>>> Cheers,
>>> Rob
>>>
>>>
>>> On 09/01/16 23:44, Erick Erickson wrote:
>>>
>>>> bq: is it best/good to get the CLUSTERSTATUS via the collection API
>>>> and explicitly send queries to a replica to ensure I don't send
>>>> queries to the leaders of my collection
>>>>
>>>> In a word _no_. SolrCloud is vastly different than the old
>>>> master/slave. In SolrCloud, each and every node (leader and replicas)
>>>> index all the docs and serve queries. The additional burden the leader
>>>> has is actually very small. There's absolutely no reason to _not_ use
>>>> the leader to serve queries.
>>>>
>>>> As far as sending updates, there would be a _little_ benefit to
>>>> sending the updates directly to the leader, but _far_ more benefit in
>>>> using SolrJ. If you use SolrJ (and CloudSolrClient), then the
>>>> documents are split up on the _client_ and only the docs for a
>>>> particular shard are automatically sent to the leader for that shard.
>>>> Using SolrJ you can essentially scale indexing linearly with the
>>>> number of shards you have. Just using HTTP does not scale linearly.
>>>> Your particular app may not care, but in high-throughput situations
>>>> this can be significant.
>>>>
>>>> So rather than spend time and effort sending updates directly to a
>>>> leader and have the leader then forward the docs to the correct shard,
>>>> I recommend investing the time in using SolrJ for updates rather than
>>>> sending updates to the leader over HTTP. Or just ignore the problem
>>>> and devote your efforts to something that are more valuable.
>>>>
>>>> So in short:
>>>> 1> just stick a load balancer in front of _all_ your Solr nodes for
>>>> queries. And note that there's an internal load balancer already in
>>>> Solr that routes things around anyway, although putting a load
>>>> balancer in front of your entire cluster makes it so there's not a
>>>> single point of failure.
>>>> 2> Depending on your throughput needs, either
>>>> 2a> use SolrJ to index
>>>> 2b> don't worry about it and send updates through the load balancer as
>>>> well. There'll be an extra hop if you send updates to a replica, but
>>>> if that's significant you should be using SolrJ
>>>>
>>>> As for 5.5, it's not at all clear that there _will_ be a 5.5. 5.4 was
>>>> just released in early December. There's usually a several month lag
>>>> between point releases and there's some agitation to start the 6.0
>>>> release process, so it's up in the air.
>>>>
>>>>
>>>> On Sat, Jan 9, 2016 at 12:04 PM, Robert Brown <r...@intelcompute.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> (btw, when is 5.5 due?  I see the docs reference it, but not the
>>>>> download page)
>>>>>
>>>>> Anyway, I index and query Solr over HTTP (no SolrJ, etc.) - is it
>>>>> best/good to get the CLUSTERSTATUS via the collection API and
>>>>> explicitly send queries to a replica to ensure I don't send queries
>>>>> to the leaders of my collection, to improve performance?  Like-wise
>>>>> with sending updates directly to a Leader?
>>>>>
>>>>> My leaders will receive full updates of the entire collection once a
>>>>> day, so I would assume if the leader is handling queries too,
>>>>> performance would be hit?
>>>>>
>>>>> Is the CLUSTERSTATUS API the only way to do this btw without SolrJ,
>>>>> etc.?  I wasn't sure if ZooKeeper would be able to tell me also.
>>>>>
>>>>> Do I also need to do anything to ensure the leaders are never sent
>>>>> queries from the replica's?
>>>>>
>>>>> Does this all sound sane?
>>>>>
>>>>> One of my collections is 3 shards, with 2 replica's each (9 total
>>>>> nodes), 70m docs in total.
>>>>>
>>>>> Thanks,
>>>>> Rob
>>>>>

--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
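
For anyone who stays on plain HTTP with an external load balancer anyway,
the CLUSTERSTATUS response mentioned in the thread can be reduced to a pool
of candidate nodes. What follows is only a minimal sketch in Python, not
something posted in the thread. It assumes the CLUSTERSTATUS JSON layout as
I understand it (a "cluster" object holding "live_nodes" and, per collection,
"shards" and "replicas", with each replica carrying "state", "node_name" and
"base_url"); check it against your Solr version. The function name
active_replica_urls, the collection name "products" and the sample data are
all hypothetical.

```python
import json


def active_replica_urls(cluster_status, collection):
    """Return sorted base URLs of replicas that report state 'active'
    and whose node also appears in live_nodes.

    cluster_status is the parsed JSON of a CLUSTERSTATUS call
    (layout assumed, see the note above).
    """
    cluster = cluster_status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))
    shards = cluster["collections"][collection]["shards"]

    urls = set()
    for shard in shards.values():
        for replica in shard.get("replicas", {}).values():
            is_active = replica.get("state") == "active"
            is_live = replica.get("node_name") in live_nodes
            if is_active and is_live:
                urls.add(replica["base_url"])
    return sorted(urls)


# Hypothetical, hand-written sample in the assumed layout (not real output).
SAMPLE = json.loads("""
{
  "cluster": {
    "collections": {
      "products": {
        "shards": {
          "shard1": {"replicas": {
            "core_node1": {"state": "active", "leader": "true",
                           "node_name": "10.0.0.1:8983_solr",
                           "base_url": "http://10.0.0.1:8983/solr"},
            "core_node2": {"state": "down",
                           "node_name": "10.0.0.2:8983_solr",
                           "base_url": "http://10.0.0.2:8983/solr"}}},
          "shard2": {"replicas": {
            "core_node3": {"state": "active", "leader": "true",
                           "node_name": "10.0.0.3:8983_solr",
                           "base_url": "http://10.0.0.3:8983/solr"},
            "core_node4": {"state": "recovering",
                           "node_name": "10.0.0.1:8983_solr",
                           "base_url": "http://10.0.0.1:8983/solr"}}}
        }
      }
    },
    "live_nodes": ["10.0.0.1:8983_solr", "10.0.0.3:8983_solr"]
  }
}
""")

if __name__ == "__main__":
    for url in active_replica_urls(SAMPLE, "products"):
        print(url)
```

Leaders are deliberately not filtered out, in line with Erick's point that
there is no reason to keep queries away from the leader. The sketch only
reads cluster state; it does not replace the routing that a SolrCloud-aware
client such as SolrJ does for you.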