Inconsistency in results between replicas using CloudSolrClient

Chris Troullis Tue, 01 Aug 2017 13:50:42 -0700

Hi,

I think I know the answer to this question, but I wanted to verify it and see
what other people do to address this concern.


I have a SolrCloud setup (6.6.0) with 2 nodes and 1 collection with 1 shard
and 2 replicas (1 replica per node). My use case requires frequent updates
to Solr: documents are added constantly throughout the day. I am using
SolrJ's CloudSolrClient to query the collection, which load balances
requests across the 2 replicas.

Here's my question:

As I understand it, SolrCloud is eventually consistent, and the soft commits
on the 2 replicas are not necessarily in sync. So couldn't a document be
indexed on replica 1 just before a soft commit, but on replica 2 just after
one? In that case, with the load-balanced CloudSolrClient, a user could run
a search, see the new document because the request went to replica 1, then
search again and have the document disappear from their results because
this time the request went to replica 2, where the next soft commit hasn't
happened yet.
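
To make the timing concrete, here is a toy model, not Solr's actual commit scheduler: each replica is assumed to soft-commit on a fixed period with its own offset, and a document becomes searchable on a replica at the first soft commit at or after it arrives there. The class name, methods, and millisecond values are hypothetical, chosen only for illustration.

```java
/** Toy model (hypothetical, simplified): when does a document become searchable on one replica? */
public final class SoftCommitVisibility {

    /**
     * Returns the time of the first soft commit at or after arrivalMs, assuming the
     * replica soft-commits at offsetMs, offsetMs + intervalMs, offsetMs + 2 * intervalMs, ...
     */
    public static long visibleFrom(long arrivalMs, long offsetMs, long intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        if (arrivalMs <= offsetMs) {
            return offsetMs;
        }
        // Ceiling division: number of whole intervals needed to reach or pass arrivalMs.
        long k = (arrivalMs - offsetMs + intervalMs - 1) / intervalMs;
        return offsetMs + k * intervalMs;
    }

    /** True if a search at queryMs on this replica would see the document (in this model). */
    public static boolean isVisible(long queryMs, long arrivalMs, long offsetMs, long intervalMs) {
        return queryMs >= visibleFrom(arrivalMs, offsetMs, intervalMs);
    }
}
```

With a 1000 ms interval, replica 1 committing at 0, 1000, ... and receiving the document at 990 ms makes it visible from 1000 ms, while replica 2 committing at 20, 1020, ... and receiving it at 1030 ms makes it visible only from 2020 ms. Any search in between sees the document on replica 1 but not on replica 2.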

If so, how do people typically handle this in NRT search cases? It seems
like a poor user experience if documents keep randomly disappearing from
and reappearing in search results. Currently my only idea for preventing
this is to write (or extend) my own Solr client that pins each user's
session to a specific replica (unless that replica goes down) while still
load balancing users across the replicas. But then I would have to manage
manually everything CloudSolrClient handles for me, such as tracking the
cluster state.
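
A minimal sketch of the session-to-replica pinning described above, assuming the application already has the list of live replica core URLs (for example, read from the cluster state; that part is not shown). It uses rendezvous (highest-random-weight) hashing, a technique swapped in here for illustration, so that when one replica drops out only the sessions pinned to it move elsewhere. `StickyReplicaPicker` and `pickReplica` are hypothetical names, not part of SolrJ.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical helper: pins a session to one replica via rendezvous hashing. */
public final class StickyReplicaPicker {

    // MurmurHash3 64-bit finalizer, used to spread the combined hash codes.
    private static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb185443e9b53L;
        h ^= h >>> 33;
        return h;
    }

    /**
     * Returns the live replica URL with the highest score for this session.
     * The result does not depend on list order, and removing a replica only
     * reassigns the sessions that were pinned to that replica.
     */
    public static String pickReplica(String sessionId, List<String> liveReplicaUrls) {
        Objects.requireNonNull(sessionId, "sessionId");
        if (liveReplicaUrls == null || liveReplicaUrls.isEmpty()) {
            throw new IllegalArgumentException("no live replicas");
        }
        String best = null;
        long bestScore = Long.MIN_VALUE;
        for (String url : liveReplicaUrls) {
            long score = mix(((long) sessionId.hashCode() << 32) ^ (url.hashCode() & 0xffffffffL));
            // Break score ties by URL so the choice stays deterministic.
            if (best == null || score > bestScore || (score == bestScore && url.compareTo(best) < 0)) {
                best = url;
                bestScore = score;
            }
        }
        return best;
    }
}
```

The returned URL could then be queried with an `HttpSolrClient` built for that core (e.g. `new HttpSolrClient.Builder(url).build()`), giving up CloudSolrClient's automatic routing for those requests. Pinning only keeps each user's view stable while their replica stays up; it does not make the replicas consistent with each other.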

Can anyone confirm or correct my understanding of how this works, or offer
any suggestions to keep this scenario from occurring?

Thanks,

Chris
