Hi Gili,

Great question!
A write in Solr is, by default, only guaranteed to exist in one place, i.e.
the leader. The safety valves we have to preserve these writes are:

1. The leaderVoteWait period, during which leader election is suspended
until enough live replicas are available (see the solr.xml sketch below)
2. The two-way peer-sync between the leader candidate and the other replicas
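
For reference, the leaderVoteWait period can be tuned in solr.xml. Here's a
minimal sketch of the new-style solr.xml (value in milliseconds; the number
shown is just an example, adjust it to your setup):

    <solr>
      <solrcloud>
        <str name="host">${host:}</str>
        <int name="hostPort">${jetty.port:8983}</int>
        <!-- how long a leader candidate waits for the shard's other replicas
             to come up before taking over leadership anyway -->
        <int name="leaderVoteWait">180000</int>
      </solrcloud>
    </solr>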

The other safety valve is on the client side: the "min_rf" parameter
introduced by SOLR-5468 in Solr 4.9. If you set this param to 2 while making
the request, then Solr will return the number of replicas to which it could
successfully send the update. Depending on that response, you can decide to
retry the update at a later time, assuming it is idempotent. This kind of
puts the onus of ensuring consistency on the client side, which is not ideal
but better than nothing. See SOLR-5468 for more discussion on this topic.
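
As a rough sketch of what that client-side retry could look like (Python
with the requests library; the collection URL, field name, and retry policy
are placeholders, I'm assuming the achieved replication factor comes back
as "rf" in the responseHeader per SOLR-5468, and note that min_rf needs
4.9+, not the 4.7.2 in your test):

    import time
    import requests

    # Placeholder URL; point this at your collection's update handler.
    UPDATE_URL = "http://localhost:8983/solr/quick-results-collection/update"

    def update_with_min_rf(docs, min_rf=2, retries=3, backoff_secs=5):
        # Ask Solr how many replicas acknowledged the update and retry later
        # if it is fewer than min_rf. Only safe if the update is idempotent.
        params = {"min_rf": min_rf, "commit": "true", "wt": "json"}
        for attempt in range(retries):
            rsp = requests.post(UPDATE_URL, params=params, json=docs).json()
            achieved = rsp.get("responseHeader", {}).get("rf", 0)
            if achieved >= min_rf:
                return rsp
            time.sleep(backoff_secs)
        raise RuntimeError("update never reached %d replicas" % min_rf)

    # e.g. update_with_min_rf([{"id": "change me", "value_s": "B"}])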

In your particular example, none of these safeties is invoked: you start
node2 while node1 is down, and node2 goes ahead with leader election after
the wait period. Also, since node1 is down during leader election, peer-sync
doesn't happen, and node2 becomes the leader.

When node1 comes back online and joins as a replica, it recovers from the
leader using peer-sync (which returns the newest 100 updates) and finds
that there's nothing newer on the leader. However, there is no check that
the replica doesn't have a newer update itself, which is why you end up
with the inconsistent replica. If there had been a lot of updates on node2
(more than 100) while node1 was down, peer-sync wouldn't have been
applicable; node1 would then have done a replication recovery and this
inconsistency would have been resolved.
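
To restate that gap as purely illustrative code, the decision is roughly
shaped like this (a sketch of the behaviour described above, not Solr's
actual recovery code):

    PEER_SYNC_WINDOW = 100  # peer-sync only exchanges the newest N updates

    def choose_recovery(updates_missed_from_leader, replica_has_newer_updates):
        # Sketch only. The bug: the second argument is never consulted.
        if updates_missed_from_leader > PEER_SYNC_WINDOW:
            # Too far behind for peer-sync: full replication from the leader,
            # which would also have discarded node1's extra 'B' update.
            return "replication"
        # Peer-sync sees nothing newer on the leader and declares success,
        # so node1 keeps 'B' while the leader (node2) still has 'A'.
        return "peer-sync"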

So yeah, we have a valid consistency bug: one that leaves us with
inconsistent replicas in a steady state. I wonder whether the right fix is
to bump min_rf to a higher value or to peer-sync both ways during replica
recovery. I'll need to think more about this.


On Thu, Dec 11, 2014 at 4:21 PM, Gili Nachum <gilinac...@gmail.com> wrote:

> I know Solr's CAP properties are CP, but I don't see it holding in a very
> basic test. Am I doing something wrong?
>
> With two Solr nodes, I index doc1 to both, stop node2, update doc1, stop
> node1, start node2, start node1, and I get two different versions of the
> doc depending on which replica I query.
> I would expect node2 to update itself.
> Attaching Solr logs from both nodes.
>
> *Config*
> Solr 4.7.2 / Jetty.
> SolrCloud on two nodes and 3 ZK instances, all running on localhost.
> Single collection: one shard with two replicas.
>
> *Reproducing:*
> start node1 9.148.58.114:8983
> start node2 9.148.58.114:8984
> Cluster state: node1 leader. node2 active.
>
> index value 'A' (id="change me").
> query and expect 'A' -> success
>
> Stop node2
> Cluster state: node1 leader. node2 gone.
> query and expect 'A' -> success
>
> Update document value from 'A'->'B'
> query and expect 'B' -> success
>
> Stop node1
> then
> Start node2.
> Cluster state: node1 gone. node2 down.
>
> *    104510 [coreZkRegister-1-thread-1] INFO
> org.apache.solr.cloud.ShardLeaderElectionContext Waiting until we see more
> replicas up for shard shard1: total=2 found=1 timeoutin=5.27665925E14ms*
>
> wait 3m.
>
> *    184679 [coreZkRegister-1-thread-1] INFO
> org.apache.solr.cloud.ShardLeaderElectionContext  I am the new leader:
> http://9.148.58.114:8984/solr/quick-results-collection_shard1_replica2/
> shard1    *
> Cluster state: node1 gone. node2 leader.
>
> query and expect 'A' (old value) -> success
>
> start node1
> Cluster state: node1 active. node2 leader.
>
> *Inconsistency: *
> *    Querying node1 always returns 'B'. *
>
> http://localhost:8983/solr/quick-results-collection_shard1_replica1/select?q=*%3A*&wt=json&indent=true
> *    Querying node2 always returns 'A'. *
>
> http://localhost:8984/solr/quick-results-collection_shard1_replica2/select?q=*%3A*&wt=json&indent=true
>



-- 
Regards,
Shalin Shekhar Mangar.
