Re: Querying only replica's

Alessandro Benedetti Mon, 11 Jan 2016 09:16:35 -0800

Hmm, I think there is a misconception here:

On 10 January 2016 at 19:00, Robert Brown <r...@intelcompute.com> wrote:


> I'm thinking more about how the external load-balancer will know if a node
> is down, so as to take it out of the pool of active servers before even
> attempting to send a query to it.
>
This is SolrCloud's responsibility; in particular, ZooKeeper knows the
topology of the cluster.
With a SolrCloud-aware client, a query will not be routed to a dead node.
You should use such a client (like the SolrJ one, CloudSolrClient).
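
If you really cannot use SolrJ, the same topology can be read over plain
HTTP from the Collections API. What follows is only a rough sketch in
Python, not a recommended implementation: the helper names and host names
are hypothetical, and it assumes the usual CLUSTERSTATUS JSON layout
(cluster -> live_nodes, and cluster -> collections -> shards -> replicas).
It keeps only replicas that are marked "active" and whose node is also
listed in live_nodes, since a recorded replica state alone can be stale.

```python
import json
from urllib.request import urlopen


def fetch_cluster_status(base_url, timeout=2.0):
    """Fetch CLUSTERSTATUS from any Solr node.

    base_url is hypothetical, e.g. 'http://localhost:8983/solr'.
    """
    url = base_url.rstrip("/") + "/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def queryable_replicas(status, collection):
    """Return base URLs of replicas that are 'active' and on a live node."""
    cluster = status["cluster"]
    live = set(cluster.get("live_nodes", []))
    urls = set()
    for shard in cluster["collections"][collection]["shards"].values():
        for replica in shard["replicas"].values():
            # Require both conditions: the state alone may be out of date
            # if the node went away without updating it.
            if replica.get("state") == "active" and replica.get("node_name") in live:
                urls.add(replica["base_url"])
    return sorted(urls)
```

Calling queryable_replicas(fetch_cluster_status(...), "mycollection") would
give a list of base URLs to feed into an external pool, but it has to be
polled, which is exactly the work a SolrCloud-aware client does for you.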

If you want to use a different load-balancer because you don't like the
SolrCloud one, it will not be that easy, because the distribution of the
queries happens automatically.
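
If you do go down the external load-balancer road anyway, a health check
could look roughly like the sketch below. It assumes the collection exposes
Solr's ping request handler at /admin/ping and that the JSON response
carries a top-level "status" of "OK"; the function names, host and
collection are made up for illustration.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def ping_body_ok(body):
    """True if a ping response body (JSON text) reports status OK."""
    try:
        return json.loads(body).get("status") == "OK"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: treat as unhealthy.
        return False


def node_accepts_queries(base_url, collection, timeout=1.0):
    """Ping one node; any error, timeout or non-OK answer counts as down."""
    url = "%s/%s/admin/ping?wt=json" % (base_url.rstrip("/"), collection)
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and ping_body_ok(resp.read().decode("utf-8"))
    except (URLError, OSError, ValueError):
        return False
```

For example, node_accepts_queries("http://host1:8983/solr", "mycollection")
returns False when the node is unreachable or answers with an error status,
so it checks more than "the IP is alive".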

Cheers

>
> I could ping, though that just means the IP is alive.  I could configure the
> load-balancer to actually try a query, but this may carry a (possibly tiny)
> performance hit.
>
> Is there another recommended way of configuring external load-balancers to
> know when a node is not accepting queries?
>
>
>
>
> On 10/01/16 18:25, Erick Erickson wrote:
>
>> For health checks, you can go ahead and get the real IP addresses and
>> ping them directly if you care to.... Or just let Zookeeper do that
>> for you. One of the tasks of Zookeeper is pinging all the machines
>> with all the replicas and, if any of them are unreachable, telling the
>> rest of the cluster that that machine is down.
>>
>> Best,
>> Erick
>>
>> On Sun, Jan 10, 2016 at 5:19 AM, Robert Brown <r...@intelcompute.com>
>> wrote:
>>
>>> Thanks Erick,
>>>
>>> For the health-checks on the load-balancer side, would you recommend a
>>> simple query, or is there a reliable ping or similar for this scenario?
>>>
>>> Cheers,
>>> Rob
>>>
>>>
>>> On 09/01/16 23:44, Erick Erickson wrote:
>>>
>>>> bq: is it best/good to get the CLUSTERSTATUS via the collection API
>>>> and explicitly send queries to a replica to ensure I don't send
>>>> queries to the leaders of my collection
>>>>
>>>> In a word _no_. SolrCloud is vastly different than the old
>>>> master/slave. In SolrCloud, each and every node (leader and replicas)
>>>> indexes all the docs and serves queries. The additional burden the leader
>>>> has is actually very small. There's absolutely no reason to _not_ use
>>>> the leader to serve queries.
>>>>
>>>> As far as sending updates, there would be a _little_ benefit to
>>>> sending the updates directly to the leader, but _far_ more benefit in
>>>> using SolrJ. If you use SolrJ (and CloudSolrClient), then the
>>>> documents are split up on the _client_ and only the docs for a
>>>> particular shard are automatically sent to the leader for that shard.
>>>> Using SolrJ you can essentially scale indexing linearly with the
>>>> number of shards you have. Just using HTTP does not scale linearly.
>>>> Your particular app may not care, but in high-throughput situations
>>>> this can be significant.
>>>>
>>>> So rather than spend time and effort sending updates directly to a
>>>> leader and have the leader then forward the docs to the correct shard,
>>>> I recommend investing the time in using SolrJ for updates rather than
>>>> sending updates to the leader over HTTP. Or just ignore the problem
>>>> and devote your efforts to something that is more valuable.
>>>>
>>>> So in short:
>>>> 1> just stick a load balancer in front of _all_ your Solr nodes for
>>>> queries. And note that there's an internal load balancer already in
>>>> Solr that routes things around anyway, although putting a load
>>>> balancer in front of your entire cluster makes it so there's not a
>>>> single point of failure.
>>>> 2> Depending on your throughput needs, either
>>>> 2a> use SolrJ to index
>>>> 2b> don't worry about it and send updates through the load balancer as
>>>> well. There'll be an extra hop if you send updates to a replica, but
>>>> if that's significant you should be using SolrJ
>>>>
>>>> As for 5.5, it's not at all clear that there _will_ be a 5.5. 5.4 was
>>>> just released in early December. There's usually a several month lag
>>>> between point releases and there's some agitation to start the 6.0
>>>> release process, so it's up in the air.
>>>>
>>>>
>>>> On Sat, Jan 9, 2016 at 12:04 PM, Robert Brown <r...@intelcompute.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> (btw, when is 5.5 due?  I see the docs reference it, but not the
>>>>> download page)
>>>>>
>>>>> Anyway, I index and query Solr over HTTP (no SolrJ, etc.) - is it
>>>>> best/good to get the CLUSTERSTATUS via the collection API and
>>>>> explicitly send queries to a replica to ensure I don't send queries
>>>>> to the leaders of my collection, to improve performance?  Likewise
>>>>> with sending updates directly to a Leader?
>>>>>
>>>>> My leaders will receive full updates of the entire collection once a
>>>>> day, so I would assume if the leader is handling queries too,
>>>>> performance would be hit?
>>>>>
>>>>> Is the CLUSTERSTATUS API the only way to do this btw without SolrJ,
>>>>> etc.?  I wasn't sure if ZooKeeper would be able to tell me also.
>>>>>
>>>>> Do I also need to do anything to ensure the leaders are never sent
>>>>> queries from the replicas?
>>>>>
>>>>> Does this all sound sane?
>>>>>
>>>>> One of my collections is 3 shards, with 2 replicas each (9 total
>>>>> nodes), 70m docs in total.
>>>>>
>>>>> Thanks,
>>>>> Rob
>>>>>
>>>>>
>


-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
