Close, but Zookeeper is not involved in routing requests. Each Solr node
queries Zookeeper to get the topology of the cluster, and thereafter
Zookeeper notifies each node whenever the topology changes, e.g.
a node goes up or down, a replica goes into recovery, etc. Zookeeper
does _not_ get involved in each request, since each Solr node already
has all the topology information it needs cached locally.
This is a common misunderstanding.
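
If you want to see that notification mechanism in action, here’s a
minimal sketch using the plain Zookeeper client API against Solr’s
/live_nodes path. The ensemble address is a placeholder, and this just
illustrates the watch pattern, not how Solr implements it internally:

import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class LiveNodesWatch {
    public static void main(String[] args) throws Exception {
        // Placeholder 3-node Zookeeper ensemble; adjust to your setup.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30_000,
                event -> { /* session/connection events arrive here */ });

        // Read the current live Solr nodes and register a one-shot watch;
        // Zookeeper pushes a notification when the set of children changes.
        List<String> liveNodes = zk.getChildren("/live_nodes",
                event -> System.out.println("Topology changed: " + event.getType()));
        System.out.println("Live nodes: " + liveNodes);

        Thread.sleep(60_000); // keep the session open long enough to see a change
        zk.close();
    }
}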

So, nodeA gets the topology of the cluster, including the IP addresses
of each and every node in the cluster. Now you send a query directly to
nodeA. An internal load balancer routes that request to one of the nodes
in the cluster, perhaps nodeA itself, perhaps nodeB, etc. That way,
nodeA doesn’t do all the aggregating.

Aggregating? Well, a top-level request comes in to nodeB. Let’s say
rows=10. NodeB must send a sub-request to one replica of every shard
and get the IDs and sort values of the top 10 from each one. It then
merges the lists according to the sort criteria and sends a second
request to the replicas that contributed documents to the global top 10
to fetch the actual documents. Why the 2nd round trip? Well, imagine
there are 100 shards (and I’ve seen more). If each sub-request returned
its top 10 full documents, 1,000 documents would be fetched, 990 of
which would be thrown away.
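
In sketch form (a toy illustration of the two phases, not Solr’s
actual code):

import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Toy sketch of a two-phase distributed top-k search.
public class TwoPhaseTopK {

    // Phase 1 returns only doc IDs and sort values per shard, not full docs.
    record ShardHit(String shardId, String docId, float score) {}

    // Merge the per-shard top-k lists into the global top k.
    static List<ShardHit> globalTopK(Map<String, List<ShardHit>> perShardTop, int k) {
        return perShardTop.values().stream()
                .flatMap(List::stream)
                .sorted(Comparator.comparingDouble((ShardHit h) -> h.score()).reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    // Phase 2 fetches full documents only from the shards that actually
    // contributed hits to the global top k.
    static Set<String> shardsToFetch(List<ShardHit> globalTop) {
        return globalTop.stream().map(ShardHit::shardId).collect(Collectors.toSet());
    }
}

With 100 shards and rows=10, phase 1 moves 1,000 tiny (id, sort value)
pairs, and phase 2 fetches at most 10 full documents, rather than
fetching 1,000 full documents up front.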

As it stands, your setup has a single point of failure.
Ideally, nodeA has one replica of each shard and nodeB also
has one replica of each shard, so either one can go down and your system
can still serve requests. However, since your app is sending all its
requests to the same node, you don’t have that robustness: if that
node goes down, so does your entire application.

You should be doing one of two things:
1> use a load balancer between your app and your Solr nodes,
or
2> have your app use SolrJ and CloudSolrClient (a sketch follows below).
That class is “just another Solr node” as far as Zookeeper is concerned.
It goes through the exact same process as a Solr node: when it starts,
it gets a snapshot of the topology of the cluster and “does the right
thing” with requests, including dealing with any changes to the topology,
e.g. nodes stopping/starting, replicas going into recovery, new
collections being added, etc.
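
Here’s a minimal sketch of option 2>, assuming a 3-node Zookeeper
ensemble and a collection named "mycollection" (both placeholders for
your actual setup):

import java.util.List;
import java.util.Optional;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudClientExample {
    public static void main(String[] args) throws Exception {
        // Point the client at Zookeeper, not at any single Solr node.
        List<String> zkHosts = List.of("zk1:2181", "zk2:2181", "zk3:2181");

        try (CloudSolrClient client =
                 new CloudSolrClient.Builder(zkHosts, Optional.empty()).build()) {
            SolrQuery query = new SolrQuery("*:*");
            query.setRows(10);
            // The client routes the request to a live node based on the
            // topology it reads from Zookeeper, so there is no single
            // point of failure on the client side.
            QueryResponse rsp = client.query("mycollection", query);
            System.out.println("Found " + rsp.getResults().getNumFound() + " docs");
        }
    }
}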

HTH,
Erick

> On Jun 4, 2020, at 7:11 PM, Odysci <ody...@gmail.com> wrote:
> 
> Erick,
> thanks for the reply.
> Your last line puzzled me a bit. You wrote
> *"The theory is that all the top-level requests shouldn’t be handled by the
> same Solr instance if a client is directly using the http address of a
> single node in the cluster for all requests."*
> 
> We are using 2 machines (2 different IPs), 2 shards with 2 replicas each.
> We have an application which sends all solr requests to the same http
> address of our machine A. I assumed that Zookeeper would distribute the
> requests among the nodes.
> Is this not the right thing to do? Should I have the application alternate
> the solr machine to send requests to?
> Thanks
> 
> Reinaldo
> 
> 
> On Wed, May 27, 2020 at 12:37 PM Erick Erickson <erickerick...@gmail.com>
> wrote:
> 
>> The base algorithm for searches picks out one replica from each
>> shard in a round-robin fashion, without regard to whether it’s on
>> the same machine or not.
>> 
>> You can alter this behavior, see:
>> https://lucene.apache.org/solr/guide/8_1/distributed-requests.html
>> 
>> When you say “the exact same search”, it isn’t quite in the sense that
>> it’s going to a different shard as evidenced by &distrib=false being
>> on the URL (I’d guess you already know that, but…). The top-level
>> request _may_ be forwarded as is, there’s an internal load balancer
>> that does this. The theory is that all the top-level requests shouldn’t
>> be handled by the same Solr instance if a client is directly using
>> the http address of a single node in the cluster for all requests.
>> 
>> Best,
>> Erick
>> 
>> 
>> 
>>> On May 27, 2020, at 11:12 AM, Odysci <ody...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I have a question regarding solrcloud searches on both replicas of an
>> index.
>>> I have a solrcloud setup with 2 physical machines (let's call them A and
>>> B), and my index is divided into 2 shards, and 2 replicas, such that each
>>> machine has a full copy of the index. My Zookeeper setup uses 3
>> instances.
>>> The nodes and replicas are as follows:
>>> Machine A:
>>>     core_node3 / shard1_replica_n1
>>>     core_node7 / shard2_replica_n4
>>> Machine B:
>>>     core_node5 / shard1_replica_n2
>>>     core_node8 / shard2_replica_n6
>>> 
>>> I'm using solrJ and I create the solr client using
>> Http2SolrClient.Builder
>>> and the IP of machineA.
>>> 
>>> Here is my question:
>>> when I do a search (using solrJ) and I look at the search logs on both
>>> machines, I see that the same search is being executed on both machines.
>>> But if the full index is present on both machines, wouldn't it be enough
>>> just to search on one of machines?
>>> In fact, if I turn off machine B, the search returns the correct results
>>> anyway.
>>> 
>>> Thanks a lot.
>>> 
>>> Reinaldo
>> 
>> 
