On 2/26/2018 10:26 AM, Webster Homer wrote:
> We need the results by relevancy so the application sorts the results by
> score desc, and the unique id ascending as the tie breaker

This is the reason for the discrepancy, and why the different replica
types don't have the same issue.

Each NRT replica can have different deleted documents than the others,
just due to the way that NRT replicas work.  Deleted documents affect
relevancy scoring, because they still count toward term statistics
until the segments containing them are merged.  When one replica has,
say, 5000 deleted documents and another has 200, or has 5000 but
they're different docs, a relevancy sort can end up different.  So when
Solr goes to one replica for page 1
and another for page 2 (which is expected due to SolrCloud's internal
load balancing), you may end up with duplicate documents or documents
missing.  Because deleted documents are not counted or returned,
numFound will be consistent, as long as the index doesn't change between
the queries for pages.
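
To make the effect concrete, here is a toy model, not Solr code.  It
assumes the IDF formula in the form used by Lucene's BM25Similarity,
holds the term-frequency part of the score constant, and uses made-up
statistics.  The names (bm25_idf, ranked_ids, the doc ids and terms) are
hypothetical.  Both replicas hold the same two live documents, but
deleted documents that have not been merged away inflate the document
frequency of a different term on each one:

```python
import math

def bm25_idf(doc_freq, doc_count):
    # IDF in the form used by Lucene's BM25Similarity.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def ranked_ids(live_docs, doc_freq, doc_count):
    # Sort by score descending, then unique id ascending as the tie breaker.
    scored = [(-bm25_idf(doc_freq[term], doc_count), doc_id)
              for doc_id, term in live_docs.items()]
    return [doc_id for _, doc_id in sorted(scored)]

# Same live documents everywhere; each one matches a single query term.
live_docs = {"A": "x", "B": "y"}

# Assumed statistics: deleted docs skew doc_freq differently per replica.
replica1 = ranked_ids(live_docs, {"x": 500, "y": 10}, doc_count=1000)
replica2 = ranked_ids(live_docs, {"x": 10, "y": 500}, doc_count=1000)

page1 = replica1[0:1]  # start=0, rows=1, answered by replica 1
page2 = replica2[1:2]  # start=1, rows=1, answered by replica 2
print(replica1, replica2, page1 + page2)
```

In this sketch the two replicas rank A and B in opposite orders, so
paging across them returns B twice and never returns A, even though
numFound would be 2 on both.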

If you were sorting on field values rather than relevancy, this
wouldn't be happening, because deleted documents have no influence on
that kind of sort.
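
A similar toy sketch shows why a sort on field values is unaffected.
The "price" field, the "deleted" flag, and the page() helper are
hypothetical, and this models only the ordering logic, not how Solr
stores or filters documents.  Deleted documents are dropped before
sorting and the sort key uses only the live documents' own values, so
it does not matter which deleted documents each replica still holds:

```python
def page(index_docs, start, rows):
    # Deleted docs are filtered out; only live field values drive the order.
    live = [d for d in index_docs if not d["deleted"]]
    ordered = sorted(live, key=lambda d: (d["price"], d["id"]))
    return [d["id"] for d in ordered[start:start + rows]]

live = [
    {"id": "A", "price": 5, "deleted": False},
    {"id": "B", "price": 3, "deleted": False},
    {"id": "C", "price": 3, "deleted": False},
]

# Each replica carries different deleted documents alongside the live ones.
replica1 = live + [{"id": "old1", "price": 1, "deleted": True}]
replica2 = live + [{"id": "old2", "price": 9, "deleted": True},
                   {"id": "old3", "price": 4, "deleted": True}]

first = page(replica1, 0, 2)   # page 1 answered by replica 1
second = page(replica2, 2, 2)  # page 2 answered by replica 2
print(first + second)
```

Here the combined pages contain every live document exactly once,
whichever replica answers each request.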

With TLOG or PULL, the replicas are absolutely identical, so there is no
difference, unless the index is changing as you page through the results.

I think changing replica types is the only solution here.  NRT replicas
are working as they were designed -- there's no bug, even though
problems like this do sometimes turn up.

Thanks,
Shawn
