OK, now we’re cooking with oil. First, nodes in recovery shouldn’t make any difference to a query. They should not serve any part of a query so I think/hope that’s a red herring. At worst a node in recovery should pass the query on to another replica that is _not_ recovering.
When you’re looking at this, be aware that as long as _Solr_ is up and running on a node, it’ll accept queries. For simplicity let’s say Solr1 hosts _only_ collection1_shard1_replica1 (cs1r1). Now you fire a query at Solr1. It has the topology from ZooKeeper as well as its own internal knowledge of hosted replicas. For a top-level query it should send sub-queries out only to healthy replicas, bypassing its own recovering replica.

Now let’s say you fire the query at Solr2 instead. First, if there’s been time to propagate the down state of cs1r1 to ZooKeeper and Solr2 has picked up that state, it shouldn’t even send a subrequest to cs1r1. If Solr2 hasn’t gotten the message yet and does send a query to cs1r1, then cs1r1 should know its state is recovering and either return an error to Solr2 (which will pick a new replica to send that subrequest to) or forward it on to another healthy replica, I’m not quite sure which. In any case cs1r1 should _not_ service the request. If you do prove that a node that is really in recovery is serving requests, that’s a fairly serious bug and we need to know lots of details.

Second, even if you did have the URL Solr sends the query to, it wouldn’t help. Once a Solr node receives a query, it does its _own_ round robin for a subrequest to one replica of each shard, gets the replies back, then goes back out to the same replica for the final documents. So you still wouldn’t know which replica served the queries.

The fact that you say things come back into sync after a commit points to autocommit timing. I’m assuming you have an autocommit setting that opens a new searcher (<openSearcher>true</openSearcher> in the autoCommit section, or any positive time in the autoSoftCommit section of solrconfig.xml). These commit points fire at different wall-clock times on each replica, resulting in replicas temporarily having different searchable documents.
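To make the autocommit point concrete, here is a sketch of what the relevant section of solrconfig.xml might look like. The interval values are illustrative assumptions, not recommendations:

```xml
<!-- Illustrative fragment: a hard autoCommit that flushes to disk but does
     NOT open a new searcher, plus an autoSoftCommit that controls document
     visibility. Intervals here are made up for the example. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>            <!-- hard commit every 60s -->
    <openSearcher>false</openSearcher>  <!-- does not change visibility -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>15000</maxTime>            <!-- new searcher (visibility) every 15s -->
  </autoSoftCommit>
</updateHandler>
```

With settings like these, each replica’s soft-commit timer fires on its own wall clock, so for up to the soft-commit interval the replicas can legitimately return different counts without anything being wrong.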
BTW, the same thing applies if you send “commitWithin” in a SolrJ CloudSolrClient.add call…

Anyway, if you fire a query at a specific replica and add &distrib=false, that replica will bring back only documents from that replica. We’re talking to the replica directly, so part of the URL will be the complete replica name, like "…./solr/collection1_shard1_replica_n1/query?q=*:*&distrib=false".

A very quick test: when you have a replica in recovery, stop indexing and wait for your autocommit interval to expire (one that opens a new searcher), or issue a commit to the collection. My bet/hope is that your counts will be just fine. You can use the &distrib=false parameter to query each replica of the relevant shard directly…

Best,
Erick

> On May 22, 2019, at 8:09 AM, Russell Taylor <russell.tay...@theice.com> wrote:
>
> Hi Erick,
> Every time any of the replication nodes goes into recovery mode we start
> seeing queries which don't match the correct count. I'm being told zookeeper
> will give me the correct node (not one in recovery), but I want to prove it
> as the query issue only comes up when any of the nodes are in recovery mode.
> The application loading the data shows the correct counts and after
> committing we check the results and they look correct.
>
> If I can get the URL I can prove that the problem is due to doing the query
> against a node in recovery mode.
>
> I hope that explains the problem, thanks for your time.
>
> Regards
>
> Russell Taylor
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 22 May 2019 15:50
> To: solr-user@lucene.apache.org
> Subject: Re: CloudSolrClient (any version). Find the node your query has connected to.
>
> WARNING - External email from lucene.apache.org
>
> Why do you want to know? You’ve asked how to do X without telling us what
> problem Y you’re trying to solve (the XY problem) and frequently that leads
> to a lot of wasted time…
>
> Under the covers CloudSolrClient uses a pretty simple round-robin load
> balancer to pick a Solr node to send the query to, so “it depends”…
>
>> On May 22, 2019, at 5:51 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>> You have to provide the addresses of the ZooKeeper ensemble - it will
>> figure it out on its own based on information in ZooKeeper.
>>
>>> On 22.05.2019, at 14:38, Russell Taylor <russell.tay...@theice.com> wrote:
>>>
>>> Hi,
>>> Using CloudSolrClient, how do I find the node (I have 3 nodes for this
>>> collection on our 6 node cluster) the query has connected to.
>>> I'm hoping to get the full URL if possible.
>>>
>>> Regards
>>>
>>> Russell Taylor
>>>
>>> ________________________________
>>>
>>> This message may contain confidential information and is intended for
>>> specific recipients unless explicitly noted otherwise. If you have reason
>>> to believe you are not an intended recipient of this message, please
>>> delete it and notify the sender. This message may not represent the
>>> opinion of Intercontinental Exchange, Inc. (ICE), its subsidiaries or
>>> affiliates, and does not constitute a contract or guarantee. Unencrypted
>>> electronic mail is not secure and the recipient of this message is
>>> expected to provide safeguards from viruses and pursue alternate means of
>>> communication where privacy or a binding message is desired.
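Putting Erick’s quick test into commands: a minimal sketch, assuming a node at localhost:8983 and the collection/replica names shown (take the real ones from the Cloud view of the Solr admin UI). The curl calls are left commented out so the script only prints the URLs it would hit:

```shell
#!/bin/sh
# Sketch of the per-replica count check described in the thread.
# Host, port, collection, and replica names are assumptions.
base="http://localhost:8983/solr"

# 1) Force a commit that opens a new searcher across the collection.
commit_url="${base}/collection1/update?commit=true"
echo "commit: ${commit_url}"
# curl -s "$commit_url"            # uncomment against a live cluster

# 2) Ask each replica of the shard for its own count, bypassing the
#    distributed round-robin with distrib=false.
for r in collection1_shard1_replica_n1 collection1_shard1_replica_n2; do
  query_url="${base}/${r}/select?q=*:*&rows=0&distrib=false"
  echo "query:  ${query_url}"
  # curl -s "$query_url"           # compare the numFound values
done
```

If the numFound values agree after the commit, the earlier discrepancy was just the replicas’ commit timers firing at different wall-clock times, not a recovering replica serving queries.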