For health checks, you can get the real IP addresses and ping them
directly if you care to... Or just let ZooKeeper do that for you. One
of ZooKeeper's tasks is keeping track of all the machines hosting
replicas and, if any of them become unreachable, telling the rest of
the cluster that the machine is down.
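
If you do want to check the nodes yourself, here's a rough sketch in
Python of what that might look like. It assumes you've already fetched
the CLUSTERSTATUS response as JSON, that each replica entry carries
"base_url" and "core" fields, and that the ping handler is enabled at
/admin/ping on each core. The function names are made up for
illustration, so adjust to your setup.

```python
import json
from urllib.request import urlopen


def replica_ping_urls(cluster_status, collection):
    """Build one ping URL per replica core from a parsed CLUSTERSTATUS response.

    Assumes the usual layout: cluster -> collections -> <name> -> shards
    -> <shard> -> replicas -> <replica> with "base_url" and "core" keys.
    """
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    urls = []
    for shard in shards.values():
        for replica in shard["replicas"].values():
            urls.append(
                f'{replica["base_url"]}/{replica["core"]}/admin/ping?wt=json'
            )
    return sorted(urls)


def ping_ok(body):
    """Return True if a ping response body (JSON text) reports status OK."""
    try:
        parsed = json.loads(body)
    except ValueError:
        return False
    return isinstance(parsed, dict) and parsed.get("status") == "OK"


def check(url, timeout=2.0):
    """Fetch one ping URL; any network or HTTP error counts as unhealthy."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return ping_ok(resp.read().decode("utf-8"))
    except OSError:
        return False
```

Most load balancers can hit the same ping URL on their own, so you may
only need something like this for ad hoc monitoring.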

Best,
Erick

On Sun, Jan 10, 2016 at 5:19 AM, Robert Brown <r...@intelcompute.com> wrote:
> Thanks Erick,
>
> For the health-checks on the load-balancer side, would you recommend a
> simple query, or is there a reliable ping or similar for this scenario?
>
> Cheers,
> Rob
>
>
> On 09/01/16 23:44, Erick Erickson wrote:
>>
>> bq: is it best/good to get the CLUSTERSTATUS via the collection API
>> and explicitly send queries to a replica to ensure I don't send
>> queries to the leaders of my collection
>>
>> In a word, _no_. SolrCloud is vastly different from the old
>> master/slave. In SolrCloud, each and every node (leader and replicas)
>> indexes all the docs and serves queries. The additional burden the
>> leader has is actually very small. There's absolutely no reason to
>> _not_ use the leader to serve queries.
>>
>> As far as sending updates, there would be a _little_ benefit to
>> sending the updates directly to the leader, but _far_ more benefit in
>> using SolrJ. If you use SolrJ (and CloudSolrClient), then the
>> documents are split up on the _client_ and only the docs for a
>> particular shard are automatically sent to the leader for that shard.
>> Using SolrJ you can essentially scale indexing linearly with the
>> number of shards you have. Just using HTTP does not scale linearly.
>> Your particular app may not care, but in high-throughput situations
>> this can be significant.
>>
>> So rather than spend time and effort sending updates directly to a
>> leader and have the leader then forward the docs to the correct shard,
>> I recommend investing the time in using SolrJ for updates rather than
>> sending updates to the leader over HTTP. Or just ignore the problem
>> and devote your efforts to something that is more valuable.
>>
>> So in short:
>> 1> just stick a load balancer in front of _all_ your Solr nodes for
>> queries. And note that there's an internal load balancer already in
>> Solr that routes things around anyway, although putting a load
>> balancer in front of your entire cluster makes it so there's not a
>> single point of failure.
>> 2> Depending on your throughput needs, either
>> 2a> use SolrJ to index
>> 2b> don't worry about it and send updates through the load balancer as
>> well. There'll be an extra hop if you send updates to a replica, but
>> if that's significant you should be using SolrJ
>>
>> As for 5.5, it's not at all clear that there _will_ be a 5.5. 5.4 was
>> just released in early December. There's usually a several month lag
>> between point releases and there's some agitation to start the 6.0
>> release process, so it's up in the air.
>>
>>
>> On Sat, Jan 9, 2016 at 12:04 PM, Robert Brown <r...@intelcompute.com> wrote:
>>>
>>> Hi,
>>>
>>> (btw, when is 5.5 due?  I see the docs reference it, but not the download
>>> page)
>>>
>>> Anyway, I index and query Solr over HTTP (no SolrJ, etc.) - is it
>>> best/good to get the CLUSTERSTATUS via the collection API and
>>> explicitly send queries to a replica to ensure I don't send queries
>>> to the leaders of my collection, to improve performance?  Likewise
>>> with sending updates directly to a leader?
>>>
>>> My leaders will receive full updates of the entire collection once a
>>> day, so I would assume if the leader is handling queries too,
>>> performance would be hit?
>>>
>>> Is the CLUSTERSTATUS API the only way to do this btw without SolrJ,
>>> etc.? I wasn't sure if ZooKeeper would be able to tell me also.
>>>
>>> Do I also need to do anything to ensure the leaders are never sent
>>> queries from the replicas?
>>>
>>> Does this all sound sane?
>>>
>>> One of my collections is 3 shards, with 2 replicas each (9 total nodes),
>>> 70m docs in total.
>>>
>>> Thanks,
>>> Rob
>>>
>
