I think I've figured out how to express it: a zk node can offer its services
only if it can communicate with more than half of the specified ensemble
size (counting itself). That assures there is no split brain, where two or
more competing groups of inter-communicating nodes offer conflicting
services, since only nodes that can communicate can agree on services.
1 zk = works, but no HA: single point of failure
2 zk = works, but no HA: 1 is not more than half of 2, so serving with
either node unreachable would allow split brain
3 zk = allows 1 unreachable/down node, since 2 is more than half of 3 and
assures no split brain; a single isolated node could be split brain
4 zk = allows 1 unreachable/down node, but not 2, since 2 is not more than
half of 4 and two groups of 2 could mean split brain
5 zk = allows 1 or 2 unreachable/down, since 3 is more than half of 5 and
assures no split brain; only 1 or 2 reachable nodes could be split brain
6 zk = allows 1 or 2 unreachable/down, since 4 is more than half of 6 and
assures no split brain; 3 or fewer communicating nodes could be split
brain
And, finally, it is not the number of nodes that are "down" per se, but how
many nodes a given node can communicate with, and whether that count is a
simple majority of the specified ensemble size.
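
For anyone who wants to check the arithmetic in the list above, here is a
minimal Python sketch. It is not ZooKeeper code; the function names
(quorum_size, tolerated_failures, can_serve) are made up for illustration,
and "reachable" is assumed to count the node itself plus the peers it can
talk to.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest node count that is strictly more than half the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can be unreachable while a quorum still exists."""
    return ensemble_size - quorum_size(ensemble_size)


def can_serve(reachable: int, ensemble_size: int) -> bool:
    """Hypothetical check: 'reachable' includes the node doing the counting."""
    return reachable >= quorum_size(ensemble_size)


# Print the quorum size and tolerated failures for the ensemble sizes above.
for n in range(1, 7):
    print(f"{n} zk: quorum = {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} unreachable")
```

Note that the comparison is always against the configured ensemble size, not
against the number of nodes currently up, which is why going from 3 to 4 (or
5 to 6) nodes does not buy any extra tolerance.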
-- Jack Krupansky
-----Original Message-----
From: Yonik Seeley
Sent: Thursday, December 06, 2012 11:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Minimum HA Setup with SolrCloud
On Thu, Dec 6, 2012 at 8:42 PM, Jack Krupansky <j...@basetechnology.com>
wrote:
> And this is precisely why the mystery remains - because you're only
> describing half the picture! Describe the rest of the picture - including
> what exactly those two zks can and can't do, including resolution of ties
> and the concept of "constituting a majority" and a quorum.
The high-level description is simple: 2 nodes that can still talk to
each other are more than 50% of the original 3, hence they know that
they can make decisions, and that it's impossible for another
partitioned group to make contrary decisions (since any other
partitioned group must be less than 50% of the original cluster).
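
A rough way to convince yourself of this is to enumerate the possible group
sizes. The Python sketch below is only an illustration of the counting
argument, not of how ZooKeeper is implemented; the names is_majority,
is_at_least_half and conflicting_split_exists are hypothetical. It checks
whether two disjoint groups could both be allowed to make decisions, first
under the strict "more than half" rule and then under a looser "at least
half" rule for contrast.

```python
from typing import Callable


def is_majority(group_size: int, ensemble_size: int) -> bool:
    """Strictly more than half of the configured ensemble."""
    return 2 * group_size > ensemble_size


def is_at_least_half(group_size: int, ensemble_size: int) -> bool:
    """Looser rule, shown only for contrast: half is considered enough."""
    return 2 * group_size >= ensemble_size


def conflicting_split_exists(
    ensemble_size: int, may_decide: Callable[[int, int], bool]
) -> bool:
    """True if two disjoint groups could both be allowed to decide."""
    for a in range(1, ensemble_size):
        # The second group can use at most the nodes the first one left over.
        for b in range(1, ensemble_size - a + 1):
            if may_decide(a, ensemble_size) and may_decide(b, ensemble_size):
                return True
    return False


for n in range(1, 7):
    print(n,
          conflicting_split_exists(n, is_majority),
          conflicting_split_exists(n, is_at_least_half))
```

Under the strict rule no conflicting split exists for any ensemble size,
because two disjoint groups cannot both hold more than half of the nodes.
Under the looser rule an even-sized ensemble can split into two equal halves
that would both qualify.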
The (very) low-level detail is out of scope here... I'd suggest starting
here though: http://en.wikipedia.org/wiki/Paxos_(computer_science)
and following up on the zookeeper lists.
-Yonik
http://lucidworks.com