I've been thinking about how SolrCloud deals with write-availability using
in-sync replica sets, in which writes will continue to be accepted so long
as there is at least one healthy node per shard.

For a little background (and to verify my understanding of the process is
correct), SolrCloud only considers active/healthy replicas when
acknowledging a write. Specifically, when a shard leader accepts an update
request, it forwards the request to all active/healthy replicas and only
considers the write successful if all active/healthy replicas ack the
write. Any down / gone replicas are not considered and will sync up with
the leader when they come back online using peer sync or snapshot
replication. For instance, if a shard has 3 nodes, A, B, C with A being the
current leader, then writes to the shard will continue to succeed even if B
& C are down.
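
To make the acknowledgment rule concrete, here is a minimal toy model of it as I understand it. It is only a sketch: the `Replica` class and `leader_write` function are hypothetical names of my own, not Solr code, and the in-memory lists stand in for the real update log.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    active: bool = True
    log: list = field(default_factory=list)

    def apply(self, update):
        # Record the update and acknowledge it.
        self.log.append(update)
        return True


def leader_write(leader, replicas, update):
    """Apply on the leader, forward to active replicas only, and
    succeed only if every active replica acknowledges."""
    leader.log.append(update)
    acks = [r.apply(update) for r in replicas if r.active]
    # With no active replicas the list is empty and all([]) is True,
    # so the write is still acknowledged.
    return all(acks)


a, b, c = Replica("A"), Replica("B"), Replica("C")
b.active = c.active = False          # B and C are down
ok = leader_write(a, [b, c], "doc1")
print(ok, a.log, b.log, c.log)
```

The write succeeds with B and C down, and only A holds the update.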

The issue is that if a shard leader continues to accept updates even if it
loses all of its replicas, then we have acknowledged updates on only 1
node. If that node, call it A, then fails and one of the previous replicas,
call it B, comes back online before A does, then any writes that A accepted
while the other replicas were offline are at risk of being lost.
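
The failure sequence can be walked through with a small simulation. This is a sketch under simplifying assumptions of my own: each node is just a list of updates, `write` is a hypothetical helper, and B takes over as leader without any recovery from A.

```python
# Toy timeline: each node is modeled as the list of updates it holds.
logs = {"A": [], "B": [], "C": []}
up = {"A", "B", "C"}   # nodes currently online
leader = "A"


def write(update):
    # The leader and every online replica apply the update.
    for node in up:
        logs[node].append(update)


write("u1")            # all three nodes have u1
up -= {"B", "C"}       # B and C go down
write("u2")
write("u3")            # acknowledged, but stored only on A
up = {"B"}             # A fails; B comes back first
leader = "B"           # B takes over without A's recent updates

lost = [u for u in logs["A"] if u not in logs[leader]]
print(lost)
```

Here `lost` ends up holding `u2` and `u3`: both were acknowledged to the client but exist only on the node that is now offline.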

SolrCloud does provide a safeguard mechanism for this problem with the
leaderVoteWait setting, which puts any replicas that come back online
before node A into a temporary wait state. If A comes back online within
the wait period, then all is well as it will become the leader again and no
writes will be lost. As a side note, sys admins definitely need to be made
more aware of this situation as when I first encountered it in my cluster,
I had no idea what it meant.
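
The effect of the wait period, as I described it above, can be sketched as a simple decision. This models only my description, not Solr's actual election code: `elect_leader` and its parameters are hypothetical, and the 180-second wait is an arbitrary illustrative value.

```python
def elect_leader(returning_replica, old_leader_return_s, leader_vote_wait_s):
    """Return (leader, writes_at_risk) for the toy scenario.

    old_leader_return_s is how long after the replica's return the old
    leader A comes back, or None if it never does.
    """
    if old_leader_return_s is not None and old_leader_return_s <= leader_vote_wait_s:
        # A is back within the wait period and resumes leadership.
        return "A", False
    # The wait expired: the returning replica leads without A's updates.
    return returning_replica, True


in_time = elect_leader("B", 30, 180)
too_late = elect_leader("B", 600, 180)
never = elect_leader("B", None, 180)
print(in_time, too_late, never)
```

Only the first case is safe; in the other two, whatever A accepted alone is at risk.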

My question is whether we want to consider an approach where SolrCloud will
not accept writes unless there is a majority of replicas available to
accept the write? For my example, under this approach, we wouldn't accept
writes if both B & C failed, but would if only C did, leaving A & B online.
Admittedly, this lowers the write-availability of the system, so it may be
something that should be tunable. Just wanted to put this out there as
something I've been thinking about lately ...
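
A rough sketch of the check I have in mind, with the tunable part included. The function `accepts_write` and the `min_live` parameter are hypothetical names for illustration; nothing like this exists in Solr as far as I know.

```python
def accepts_write(total_replicas, live_replicas, min_live=None):
    """Accept a write only if enough replicas (leader included) are live.

    min_live defaults to a strict majority; setting it to 1 gives
    today's behavior of accepting writes with a lone leader.
    """
    if min_live is None:
        min_live = total_replicas // 2 + 1
    return live_replicas >= min_live


only_a = accepts_write(3, 1)                  # B and C down
a_and_b = accepts_write(3, 2)                 # only C down
lone_leader_ok = accepts_write(3, 1, min_live=1)
print(only_a, a_and_b, lone_leader_ok)
```

With three nodes, the majority default rejects writes when only A is up and accepts them when A and B are up.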

Cheers,
Tim
