I've been thinking about how SolrCloud deals with write-availability using in-sync replica sets, in which writes will continue to be accepted so long as there is at least one healthy node per shard.
For a little background (and to verify that my understanding of the process is correct), SolrCloud only considers active/healthy replicas when acknowledging a write. Specifically, when a shard leader accepts an update request, it forwards the request to all active/healthy replicas and only considers the write successful if all of them ack the write. Any down/gone replicas are not considered and will sync up with the leader when they come back online, using peer sync or snapshot replication. For instance, if a shard has 3 nodes, A, B, and C, with A being the current leader, then writes to the shard will continue to succeed even if B and C are down.
The issue is that if a shard leader continues to accept updates even after it loses all of its replicas, then we have acknowledged updates on only 1 node. If that node, call it A, then fails and one of the previous replicas, call it B, comes back online before A does, then any writes that A accepted while the other replicas were offline are at risk of being lost.
SolrCloud does provide a safeguard for this problem with the leaderVoteWait setting, which puts any replicas that come back online before node A into a temporary wait state. If A comes back online within the wait period, then all is well, as it will become the leader again and no writes will be lost. As a side note, sys admins definitely need to be made more aware of this situation; when I first encountered it in my cluster, I had no idea what it meant.
My question is whether we want to consider an approach where SolrCloud will not accept writes unless a majority of replicas is available to accept the write. For my example, under this approach, we wouldn't accept writes if both B and C failed, but would if only C did, leaving A and B online. Admittedly, this lowers the write-availability of the system, so it may be something that should be tunable.
Just wanted to put this out there as something I've been thinking about lately ...
Cheers,
Tim
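
P.S. To make the comparison concrete, here is a rough Python sketch of the two policies. It is not Solr code: `accepts_write`, `majority`, and the `min_active` knob are hypothetical names made up for illustration, and the sketch only models counting healthy nodes, not update forwarding, leader election, or recovery.

```python
def majority(replica_count):
    """Smallest number of nodes that forms a strict majority."""
    return replica_count // 2 + 1


def accepts_write(nodes, active, leader, min_active=1):
    """Decide whether a shard would accept a write (illustrative model only).

    nodes      -- all nodes hosting the shard, e.g. ("A", "B", "C")
    active     -- set of nodes currently active/healthy
    leader     -- the current shard leader
    min_active -- hypothetical tunable: how many healthy nodes (leader
                  included) must be available before a write is accepted
    """
    healthy = set(active) & set(nodes)
    # The leader itself has to be up, and enough nodes must be healthy.
    return leader in healthy and len(healthy) >= min_active


shard = ("A", "B", "C")

# Behavior described above: one healthy node (the leader) is enough.
print(accepts_write(shard, {"A"}, "A", min_active=1))

# Majority approach: with B and C down, the write is rejected ...
print(accepts_write(shard, {"A"}, "A", min_active=majority(len(shard))))

# ... but with only C down, A and B still form a majority.
print(accepts_write(shard, {"A", "B"}, "A", min_active=majority(len(shard))))
```

With `min_active=1` the function models the behavior described above, and with `min_active=majority(len(shard))` it models the majority approach, so a single tunable threshold would cover both ends of the availability/durability trade-off.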