Hey Solr community,

I'm using Solr's cursorMark feature and noticing duplicates when paging
through results. The duplicates occur intermittently and appear at the end of
one page and the beginning of the next (but not at every page boundary). So
with rows=20, the duplicated record would be document 20 on page 1 and
document 21 on page 2. The document ids come from a database, where that
field is a unique primary key, so I'm confident that there are no duplicate
document ids in my corpus. Additionally, no updates are being made to the
index (it's completely static). My result sort order is id (a string
representation of a timestamp (YYYY-MM-DD HH:MM.SSSSSS)), then score. In this
Solr community post
(https://lucene.472066.n3.nabble.com/Solr-document-duplicated-during-pagination-td4269176.html)
Shawn Heisey suggests:


"There are two ways this can happen.  One is that the index has changed
between different queries, pushing or pulling results between the end of
one page and the beginning of the next page.  The other is having the
same uniqueKey value in more than one shard."

In the Solr query below, for one of the example duplicates in question, I can
see that a search by id returns only a single document. The replication factor
for the collection is 2, so the id will also appear in this shard's replica.
Taking Shawn's advice above into consideration, my question is: does a shard
replica count as the document having a duplicate id in another shard, and
could it introduce duplicates into my paged results? If not, could anyone
suggest another scenario in which duplicates could be introduced?
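
To make the symptom concrete, here is a minimal sketch (in Python) of how the duplicates can be detected while walking the cursor. It is not my actual client code: fetch_page is a hypothetical callable that takes a cursorMark value and returns the page's docs together with the nextCursorMark from the response. The loop starts at "*" and stops when nextCursorMark equals the cursorMark that was sent, which is how Solr signals the end of the results.

```python
from typing import Callable, Dict, List, Tuple

# A page is (docs, next_cursor_mark); each doc is a dict with at least "id".
Page = Tuple[List[Dict[str, str]], str]


def find_duplicates(fetch_page: Callable[[str], Page]) -> List[Tuple[int, str]]:
    """Walk a cursorMark result set and report ids returned more than once.

    fetch_page is a hypothetical helper (an assumption, not a Solr API):
    it sends one request with the given cursorMark and returns the docs
    plus the nextCursorMark taken from the response.
    """
    first_seen_on: Dict[str, int] = {}          # id -> page where first seen
    duplicates: List[Tuple[int, str]] = []      # (page number, repeated id)
    cursor = "*"                                # initial cursorMark
    page_no = 0
    while True:
        docs, next_cursor = fetch_page(cursor)
        page_no += 1
        for doc in docs:
            doc_id = doc["id"]
            if doc_id in first_seen_on:
                duplicates.append((page_no, doc_id))
            else:
                first_seen_on[doc_id] = page_no
        # An unchanged cursor means there are no more results.
        if next_cursor == cursor:
            break
        cursor = next_cursor
    return duplicates
```

Each entry in the returned list is the page number on which an id reappeared, so a duplicate at a page boundary shows up as the first id of the following page.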

As always, any advice would be greatly appreciated,

Thanks,

Dwane

Environment
SolrCloud (7.7.2)
8-shard collection, replication factor 2

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":2072,
    "params":{
      "q":"id:myUUID(YYYY-MM-DD HH:MM.SSSSSS)",
      "fl":"id,[shard]"}},
  "response":{"numFound":1,"start":0,"maxScore":17.601822,"docs":[
      {
        "id":"myUUID(YYYY-MM-DD HH:MM.SSSSSS)",
        "[shard]":"https://solr1:9014/solr/MyCollection_shard4_replica_n12/|https://solr2:9011/solr/MyCollection_shard4_replica_n35/"}]
  }}
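
One further check that might help is to query each replica listed in the [shard] value directly with distrib=false, to see whether both replicas return the same single document. The sketch below (Python) only builds the request URLs. The function name is made up for illustration, and it assumes that the [shard] value is a "|"-separated list of replica core URLs (as in the response above) and that each core exposes the default /select handler.

```python
from typing import List
from urllib.parse import urlencode


def replica_query_urls(shard_value: str, doc_id: str) -> List[str]:
    """Build one non-distributed id lookup URL per replica.

    shard_value is the "[shard]" field from a response, assumed to be a
    "|"-separated list of replica core URLs. The /select handler is an
    assumption about the collection's configuration.
    """
    # Quote the id as a phrase, since it contains spaces and colons.
    escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    params = urlencode({
        "q": f'id:"{escaped}"',
        "fl": "id",
        "distrib": "false",   # query this core only, do not fan out
        "wt": "json",
    })
    return [
        f"{base.rstrip('/')}/select?{params}"
        for base in shard_value.split("|")
        if base.strip()
    ]
```

Passing the [shard] string from the response above yields two URLs, one per replica of shard4, and the numFound value from each can then be compared.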

