Hi Anshum

 

I am using SolrCloud with NRT replicas. I am trying to use cursorMark for deep 
pagination, but I am getting inconsistent results. By inconsistent results I 
mean that the scores of the same documents keep changing within a single 
iteration.

 

I know why that is happening. NRT replicas in a SolrCloud index can have 
different numbers of deleted documents. Even though deleted documents do not 
appear in search results, they ARE still part of the index, and can affect 
scoring. Since SolrCloud load balances requests across replicas, page 1 may use 
different replicas than page 2, and end up with different scoring, which can 
affect the order of results and change which page number they end up on.
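
To illustrate the effect, here is a minimal sketch. It assumes the idf term 
of Lucene's BM25 similarity, and the document counts are made up for the 
example. They are not taken from our collection.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25 idf term: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical index statistics for the same shard on two replicas.
# Replica B has not yet merged away 200 deleted documents, 50 of which
# contain the query term, so they still count towards its statistics.
replica_a = {"doc_count": 1000, "doc_freq": 100}
replica_b = {"doc_count": 1200, "doc_freq": 150}

idf_a = bm25_idf(**replica_a)
idf_b = bm25_idf(**replica_b)

# The same term gets a different weight on each replica, so the same
# document can receive a different score depending on which replica answers.
print(f"replica A idf: {idf_a:.4f}")
print(f"replica B idf: {idf_b:.4f}")
```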

 

I was looking for a way to get cursorMark to work with NRT replicas on 
SolrCloud. I am planning to use the replica.base property of the 
shards.preference parameter 
<https://solr.apache.org/guide/8_11/distributed-requests.html#shards-preference-parameter>.
I have tested it on a small set of data, and it seems to be working. I will 
need to spend more time to see whether it affects query performance and how it 
behaves when I iterate through the whole index.
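
Roughly, the iteration I have in mind looks like the sketch below. It is 
only a sketch: the fetch callable is a hypothetical stand-in for whatever 
HTTP client sends the parameters to the /select handler and returns the 
parsed JSON, and the value replica.base:stable is my reading of the 
reference guide rather than something I have verified at scale.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical type: a function that sends the parameters to Solr and
# returns the parsed JSON response as a dict.
Fetch = Callable[[Dict[str, str]], Dict[str, Any]]


def iterate_cursor(fetch: Fetch, query: str, sort: str,
                   rows: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every document for the query, one cursorMark page at a time."""
    cursor = "*"  # cursorMark value for the first request
    while True:
        params = {
            "q": query,
            "sort": sort,  # must end with the uniqueKey field, e.g. "id asc"
            "rows": str(rows),
            "cursorMark": cursor,
            # Assumption: ask Solr to keep choosing the same replica per shard.
            "shards.preference": "replica.base:stable",
        }
        response = fetch(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            # The cursor did not advance, so the result set is exhausted.
            break
        cursor = next_cursor
```

For our query the sort would be 
"score desc,publication_date_dt desc,id asc", and the loop stops once 
nextCursorMark equals the cursorMark that was sent.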

 

Best Regards,

Monika


On 2022/03/01 17:57:22 Anshum Gupta wrote:
> Hi Monika,
> 
> Can you clarify what it is that you mean by 'inconsistent results'? You
> should expect the value of cursorMark and nextCursorMark to be the same for
> the last batch and if your numFound is '6', your first batch would be your
> last batch if you set the rows parameter to a value <= 6.
> 
> In case you haven't already looked at the reference guide for this, here's
> the link for the same:
> https://solr.apache.org/guide/8_11/pagination-of-results.html
> 
> 
> On Tue, Mar 1, 2022 at 8:19 AM Naagar, Monika (ELS-AMS) <
> [email protected]> wrote:
> 
> > Hi Team,
> >
> > We are running on Solr 8.11.1. We are trying to fetch records using
> > CursorMark
> > while sorting on: score+desc,publication_date_dt+desc,id+asc.
> >
> > The collection is made of 5 shards, with 3 replicas each (NRT replica).
> >
> > We get inconsistent results when we do not specify a specific replica for
> > each shard.
> >
> > For example: I am trying to use cursor for this query
> > q="COVID-19+and+Women’s+Health".
> >
> > numFound is 6
> >
> > I keep getting different values for nextCursorMark. nextCursorMark is
> > not the same as cursorMark, so I am not able to finish pagination. Sometimes
> > nextCursorMark and cursorMark are the same, but the behavior is inconsistent.
> >
> >
> > Thank you for the help!
> >
> > Best Regards,
> > Monika
> >
> >
> > ________________________________
> >
> > Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The
> > Netherlands, Registration No. 33158992, Registered in The Netherlands.
> >
> 
> 
> -- 
> Anshum Gupta
> 
