On 1/15/2018 11:56 AM, Webster Homer wrote:
I have noticed strange behavior using cursorMark for deep paging in an
application. We use solrcloud for searching. We have several clouds for
development. For our development systems we have two different clouds. One
cloud has 2 shards with 1 replica per shard. All of our other clouds are
set up with 2 shards and 2 replicas per shard.

A cloud doesn't get set up with shards and replicas.  A collection does.  One SolrCloud cluster can contain many collections.

When you say "cloud" are you referring to a collection, or are you referring to a set of servers running ZooKeeper and Solr? The latter is what I would expect cloud to mean.

When I run against the first cloud, I always get consistent results for the
same query. That is not the case with the second cloud. Some queries return
different numbers of results each time they are called. In the code I return
the number found from solr, and I count the number of results for all
iterations against the cursor mark. Sometimes it returns more rows than the
numFound and sometimes less.

I figured that the problem was in my code or in the data. To make it easier
to find the problem, I changed the sort to just be the unique id from the
schema. The problem went away.

1. The Number Found from solr was always the same
2. It worked when there was only 1 replica per shard
3. From debug statements it appears to return different total counts from
different replicas. When there were 2 replicas per shard I saw 4 different
values being returned.
4. Sorting only on the unique id, and not on score, provides consistent
results.

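When you have multiple replicas, each replica may have a different number of deleted documents. Deleted documents will almost always affect scoring, because they still count toward the index statistics used for relevance until their segments are merged away. Because SolrCloud load balances across replicas, one page of your cursorMark query can be served by a different replica than the next one, so the order of results can differ, and the cursor can then skip or repeat documents.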
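
To make the scoring effect concrete, here is a rough sketch in Python. It assumes the BM25-style idf term, ln(1 + (N - n + 0.5) / (n + 0.5)), and the replica statistics are made-up numbers purely for illustration, not anything measured on your system.

```python
import math

def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25-style idf: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical statistics for the same shard on two replicas.
# Both hold the same 1000 live documents, 100 of which match the term.
# Replica B additionally carries 200 deleted documents that have not
# been merged away yet, 50 of which contain the term.
replica_a = {"doc_count": 1000, "doc_freq": 100}
replica_b = {"doc_count": 1200, "doc_freq": 150}

idf_a = bm25_idf(**replica_a)
idf_b = bm25_idf(**replica_b)

# The same live document gets a different idf, and so a different score,
# depending on which replica answers the request.
print(f"replica A idf = {idf_a:.4f}")
print(f"replica B idf = {idf_b:.4f}")
```

Since the cursorMark encodes the sort values of the last document returned, a score computed on one replica does not necessarily line up with the scores the next replica computes.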

When sorting by unique ID, deleted documents will not affect sort order.  When there is only one replica, then sorting by score will always produce the same order, unless the index gets modified.
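
For reference, a cursorMark loop sorted only on the unique key might look something like the sketch below. The parameter names (q, rows, sort, cursorMark) and the nextCursorMark field in the response are the ones Solr uses, but the fetch callable and the in-memory fake behind it are hypothetical stand-ins so the loop can be run without a cluster. In real code, fetch would send the parameters to your collection's select handler and return the parsed JSON.

```python
from typing import Callable, Dict, Iterator, List

def iterate_cursor(fetch: Callable[[Dict[str, str]], dict], query: str,
                   unique_key: str = "id", rows: int = 100) -> Iterator[dict]:
    """Yield every document for a query, paging with cursorMark."""
    params = {"q": query, "rows": str(rows),
              "sort": f"{unique_key} asc",  # stable order on every replica
              "cursorMark": "*"}            # "*" starts a new cursor
    while True:
        response = fetch(params)
        yield from response["response"]["docs"]
        next_mark = response["nextCursorMark"]
        if next_mark == params["cursorMark"]:
            break  # the cursor did not advance, so we are done
        params["cursorMark"] = next_mark

def make_fake_fetch(ids: List[str]) -> Callable[[Dict[str, str]], dict]:
    """Hypothetical in-memory stand-in for a Solr request (illustration only)."""
    ordered = sorted(ids)
    def fetch(params: Dict[str, str]) -> dict:
        mark, rows = params["cursorMark"], int(params["rows"])
        remaining = ordered if mark == "*" else [i for i in ordered if i > mark]
        page = remaining[:rows]
        return {"response": {"numFound": len(ordered),
                             "docs": [{"id": i} for i in page]},
                "nextCursorMark": page[-1] if page else mark}
    return fetch

docs = list(iterate_cursor(make_fake_fetch(["d3", "d1", "d2", "d5", "d4"]),
                           "*:*", rows=2))
print([d["id"] for d in docs])
```

If you do need relevance order, the sort still has to end with the unique key as a tiebreaker (for example "score desc, id asc"), and the replica differences described above still apply.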

Thanks,
Shawn
