On 5/9/2017 10:59 AM, moscovig wrote:
> We are running on solrj 6.2.0, server 6.2.1
> and trying to fetch 100 records at a time, with nextCursorMark,
> while sorting on: score desc, key asc
>
> The collection is made of 2 shards, with 3 replicas each.
>
> We get inconsistent results when not specifying specific replica for each
> shard.
>
> Sometimes the 3rd, and sometime the 10th fetch will contain results that we
> expected to see in the 15th batch.
> Something went wrong with the score sorting.
>
> When we specify a replica for each shard to query from with
> shards=solr1:8983/solr/tweets_shard1_replica2/,solr26:8983/solr/tweets_shard2_replica3
>
> It is working as expected.
>
> It seems as if the cursor doesn't keep the sort between different replicas
> of each shard.
The way that SolrCloud accomplishes its data replication can result in
replicas that contain different numbers of deleted documents, even when
each replica contains the exact same documents that *aren't* deleted.
Deleted documents are still part of the index until their segments are
merged, so they still count toward the term statistics (IDF in
particular) that feed into the score. This means that the score for the
same document can be slightly different depending on which replica
answers the query, and a cursor that sorts on score can then see a
different ordering from one request to the next.

If you want to be absolutely certain that everything is identical across
all replicas, you could optimize the collection, but this could take a
very long time if the collection is large. You would also need to be
sure that you do not make any changes to the index until your cursorMark
pagination is complete.

Any changes to the index will likely affect scores from one query to the
next, which can affect the order of documents in your cursorMark
results. You could miss documents, or find that you've retrieved the
same document more than once.

Thanks,
Shawn
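
As a rough illustration of the deleted-documents point above, here is a minimal Python sketch. It assumes a BM25-style IDF formula (classic TF-IDF behaves similarly), and every number in it is hypothetical rather than taken from the collection discussed in this thread. It shows how two replicas holding the same live documents can compute slightly different IDF values when one of them still carries deleted documents.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """BM25-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5)).

    doc_freq (n) and doc_count (N) are index-level statistics, so
    documents that are deleted but not yet merged away still count.
    """
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical numbers: both replicas hold the same 1000 live documents,
# 50 of which contain the query term.
# Replica A has no deleted documents left in its segments.
replica_a = bm25_idf(doc_freq=50, doc_count=1000)

# Replica B still carries 200 deleted documents, 10 of them containing
# the term, so its statistics are inflated.
replica_b = bm25_idf(doc_freq=60, doc_count=1200)

print("replica A idf:", replica_a)
print("replica B idf:", replica_b)
print("difference:   ", abs(replica_a - replica_b))
```

The difference is small, but when many documents have nearly equal scores it can be enough to reorder them, which is what breaks a cursor sorted on score desc when consecutive requests are served by different replicas.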