On 5/9/2017 10:59 AM, moscovig wrote:
> We are running on solrj 6.2.0, server 6.2.1 
> and trying to fetch 100 records at a time, with nextCursorMark, 
> while sorting on: score desc, key asc
>
> The collection is made of 2 shards, with 3 replicas each.
>
> We get inconsistent results when we do not specify a particular replica
> for each shard.
>
> Sometimes the 3rd, and sometimes the 10th fetch will contain results that we
> expected to see in the 15th batch.
> Something goes wrong with the score sorting.
>
> When we specify a replica for each shard to query from with
> shards=solr1:8983/solr/tweets_shard1_replica2/,solr26:8983/solr/tweets_shard2_replica3
>
> It is working as expected.
>
> It seems as if the cursor doesn't keep the sort between different replicas
> of each shard.

The way that SolrCloud accomplishes its data replication can result in
replicas that contain different numbers of deleted documents, even when
each replica contains the exact same documents that *aren't* deleted.
Deleted documents are still part of the index until their segments are
merged, so they still count toward index-wide term statistics such as
document frequency, which feed the IDF component of the score.  This
means that the score can be slightly different depending on which
replica answers the query.
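
To see how leftover deleted documents can shift a score, here is a
rough sketch.  It assumes the BM25-style IDF formula used by Lucene's
BM25Similarity; the function name and all of the numbers are made up
for illustration and are not taken from your index.

```python
import math

def bm25_idf(doc_freq, doc_count):
    """BM25-style IDF: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical statistics for the same term on two replicas of one shard.
# Replica B still carries deleted documents that have not been merged away,
# so both its document count and its document frequency are inflated.
idf_replica_a = bm25_idf(doc_freq=100, doc_count=10_000)
idf_replica_b = bm25_idf(doc_freq=130, doc_count=10_400)

print(f"replica A idf: {idf_replica_a:.3f}")
print(f"replica B idf: {idf_replica_b:.3f}")
```

The two values differ even though both replicas hold the same live
documents, and that difference carries through to the final score.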

If you want to be absolutely certain that everything is identical across
all replicas, you could optimize the collection, which merges segments
and purges deleted documents, but this could take a very long time if
the collection is large.  You would also need to be sure that you do not
make any changes to the index until your cursorMark pagination is
complete.  Any changes to the index will likely affect scores from one
query to the next, which can affect the order of documents as seen by
the cursor.  You could miss documents, or find that you've retrieved the
same document more than once.
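
The following is a simplified model, not Solr code, of why a cursor
sorted on "score desc, key asc" can skip or repeat documents when
consecutive pages are answered by replicas that score slightly
differently.  It assumes a single shard, treats the cursor as the sort
values of the last document returned, and uses invented scores; the
names fetch_page and paginate are hypothetical.

```python
from itertools import cycle

def fetch_page(replica_scores, cursor, rows):
    """Return up to `rows` keys sorted by (score desc, key asc) after `cursor`."""
    ordered = sorted((-score, key) for key, score in replica_scores.items())
    after = [sv for sv in ordered if cursor is None or sv > cursor]
    page = after[:rows]
    next_cursor = page[-1] if page else cursor
    return [key for _, key in page], next_cursor

def paginate(replicas, rows):
    """Walk the cursor to the end, taking each page from the next replica in turn."""
    seen, cursor = [], None
    for replica in cycle(replicas):
        page, cursor = fetch_page(replica, cursor, rows)
        if not page:
            return seen
        seen.extend(page)

# Same four live documents, slightly different scores on each replica.
replica_a = {"a": 1.30, "b": 1.20, "c": 1.10, "d": 1.00}
replica_b = {"a": 1.30, "b": 1.08, "c": 1.22, "d": 1.00}

pinned = paginate([replica_a], rows=2)             # always the same replica
mixed = paginate([replica_a, replica_b], rows=2)   # replicas alternate per page

print("pinned:", pinned)
print("mixed: ", mixed)
```

With one replica pinned, every document comes back exactly once.  When
the replicas alternate, the second page is evaluated against different
scores, so "b" shows up twice and "c" is never returned.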

Thanks,
Shawn
