I have noticed strange behavior when using cursorMark for deep paging in an
application. We use SolrCloud for searching. We have several clouds, two of
which serve our development systems. One of those has 2 shards with 1
replica per shard. All of our other clouds are set up with 2 shards and 2
replicas per shard.

The application sorts the data by score descending, then by the schema's
unique id ascending. According to the documentation, cursorMark requires
that the sort include the schema's unique id as the tie breaker.

When I run against the first cloud, I always get consistent results for the
same query. That is not the case with the second cloud: some queries return
a different number of results each time they are run. In the code I record
the numFound reported by Solr, and I count the documents returned across
all iterations of the cursorMark loop. Sometimes the loop returns more rows
than numFound and sometimes fewer.
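
The counting logic can be sketched as follows. This is a simplified,
self-contained illustration rather than the application's actual code:
`Page`, `PageSource`, and `inMemory` are hypothetical stand-ins for the SolrJ
request and response, and the in-memory source sorts on id only. The loop
follows the documented cursorMark contract: start at `*` and stop when the
returned nextCursorMark equals the mark that was sent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CursorCountSketch {

    // Hypothetical stand-in for one cursorMark response: the ids on the page
    // plus the mark to send with the next request.
    static final class Page {
        final List<String> ids;
        final String nextCursorMark;

        Page(List<String> ids, String nextCursorMark) {
            this.ids = ids;
            this.nextCursorMark = nextCursorMark;
        }
    }

    // Hypothetical stand-in for the search backend (SolrJ in the real code).
    interface PageSource {
        Page fetch(String cursorMark, int rows);
    }

    // Walks the cursor from "*" and counts every document returned.
    // Paging ends when the backend hands back the same mark it was given.
    static long countAll(PageSource source, int rows) {
        String cursor = "*";
        long counted = 0;
        while (true) {
            Page page = source.fetch(cursor, rows);
            counted += page.ids.size();
            if (page.nextCursorMark.equals(cursor)) {
                break;
            }
            cursor = page.nextCursorMark;
        }
        return counted;
    }

    // In-memory backend over ids already sorted ascending.
    // The cursor is simply the last id returned on the previous page.
    static PageSource inMemory(List<String> sortedIds) {
        return (cursor, rows) -> {
            List<String> out = new ArrayList<>();
            for (String id : sortedIds) {
                if (out.size() == rows) {
                    break;
                }
                if ("*".equals(cursor) || id.compareTo(cursor) > 0) {
                    out.add(id);
                }
            }
            String next = out.isEmpty() ? cursor : out.get(out.size() - 1);
            return new Page(out, next);
        };
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a", "b", "c", "d", "e");
        long numFound = ids.size();
        long counted = countAll(inMemory(ids), 2);
        System.out.println("numFound=" + numFound + " counted=" + counted);
    }
}
```

Against a consistent backend the two numbers agree. In the real application
it is the counted value that drifts above or below numFound on the second
cloud.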

I figured that the problem was in my code or in the data. To make it easier
to find, I changed the sort to use only the schema's unique id. The problem
went away.

1. The numFound reported by Solr was always the same.
2. It worked when there was only 1 replica per shard.
3. From debug statements it appears to return different total counts from
different replicas. When there were 2 replicas per shard I saw 4 different
values being returned.
4. Sorting only on the unique id, without score, provides consistent
results.

So it appears that we should not include score in the sort when using
cursorMark with SolrCloud.
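
One mechanism that would produce exactly this pattern is that two replicas
of the same shard compute slightly different scores for the same document,
so a cursor position taken from one replica does not line up with the
ordering on the other. That is an assumption, not something confirmed for
these clusters. The toy model below uses made-up ids and scores (`IDS`,
`SCORES`, and `countAll` are all hypothetical) and alternates replicas
between pages.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplicaScoreDriftSketch {

    static final String[] IDS = {"A", "B", "C", "D"};

    // SCORES[r][i] is the score replica r assigns to IDS[i].
    // ASSUMPTION for illustration: the replicas disagree on the score of "B".
    static final double[][] SCORES = {
        {1.0, 0.90, 0.8, 0.7},
        {1.0, 0.75, 0.8, 0.7},
    };

    // Pages through all documents, sending each request to the next replica
    // in turn, and returns how many documents were handed back in total.
    static int countAll(boolean sortByScore, int firstReplica, int rows) {
        double lastScore = Double.POSITIVE_INFINITY; // cursor: before everything
        String lastId = "";
        int counted = 0;
        int replica = firstReplica;
        while (true) {
            double[] scores = SCORES[replica];

            // Keep only the documents that sort strictly after the cursor,
            // judged by the scores of the replica answering this request.
            List<Integer> hits = new ArrayList<>();
            for (int i = 0; i < IDS.length; i++) {
                boolean after = sortByScore
                    ? scores[i] < lastScore
                        || (scores[i] == lastScore && IDS[i].compareTo(lastId) > 0)
                    : IDS[i].compareTo(lastId) > 0;
                if (after) {
                    hits.add(i);
                }
            }

            hits.sort((a, b) -> {
                if (sortByScore) {
                    int c = Double.compare(scores[b], scores[a]); // score desc
                    if (c != 0) {
                        return c;
                    }
                }
                return IDS[a].compareTo(IDS[b]); // id asc tie breaker
            });

            List<Integer> page = hits.subList(0, Math.min(rows, hits.size()));
            if (page.isEmpty()) {
                break; // cursor did not advance, so paging is finished
            }
            counted += page.size();
            int last = page.get(page.size() - 1);
            lastScore = scores[last];
            lastId = IDS[last];
            replica = (replica + 1) % SCORES.length; // next request, other replica
        }
        return counted;
    }

    public static void main(String[] args) {
        System.out.println("numFound            = " + IDS.length);
        System.out.println("score,id  start r0  = " + countAll(true, 0, 2));
        System.out.println("score,id  start r1  = " + countAll(true, 1, 2));
        System.out.println("id only   start r0  = " + countAll(false, 0, 2));
    }
}
```

With these made-up numbers the score sort returns 5 rows when paging starts
on the first replica ("B" is returned twice) and 3 rows when it starts on
the second ("B" is skipped), while the id-only sort always returns 4. With a
single replica per shard there is nothing to disagree with, which would
match the behavior of the first cloud.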

We use SolrJ and CloudSolrClient. We are currently using the Solr 6.2 SolrJ
client with Solr 7.2 in our dev environment. We are in the process of
moving completely to 7.2.

Is this a known issue with cursorMark and SolrCloud?
For debugging purposes, can I determine which Solr node CloudSolrClient is
using for a particular query?

I have not yet created a standalone test case for the issue. I'm still not
100% convinced that it is SolrCloud, but it certainly looks like it is.

Thanks,
Webster
