Yeah, sorry, my maths was clearly flawed today. Thanks for correcting me, Shawn.
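For the record, here is the arithmetic I should have done. ZooKeeper needs a strict majority of the ensemble to be up, so N servers need floor(N/2) + 1 alive and can tolerate the rest failing. A quick Python sketch (the function names are mine, purely for illustration, not from any Solr or ZooKeeper API):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 5):
        print(f"{n} ZKs: quorum={quorum_size(n)}, "
              f"can lose {tolerated_failures(n)}")
```

So 3 ZKs survive one loss and 5 ZKs survive two, which is the case Shawn describes: one crash plus one machine down for maintenance.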
What I meant was that in a 3 ZK setup, if you lose one machine you are still okay, but you are also "at risk", since losing one more would lose quorum. So in our NRT-style scenario, we would have to get that dead machine back ASAP.

As Shawn says, we have a larger ensemble to allow for another machine crashing during a planned maintenance window (so we are down 2 ZKs for some period of time, and that is still ok).

It all depends on how much DR you need.

On 13 April 2016 at 16:48, Shawn Heisey <apa...@elyograg.org> wrote:

> On 4/13/2016 9:34 AM, Daniel Collins wrote:
> > Just to chip in, more ZKs are probably only necessary if you are doing NRT
> > indexing.
> >
> > Loss of a single ZK (in a 3 machine setup) will block indexing for the time
> > it takes to get that machine/instance back up
>
> That would NOT block indexing. If you have three zookeepers and you
> lose one, SolrCloud functionality will not change. If you lose TWO,
> then you would no longer be able to index.
>
> If you've seen a situation where losing one zookeeper out of three
> causes indexing to stop, then either something is not configured
> correctly, or you've encountered a bug. I would bet more on a
> misconfiguration than a bug.
>
> A 5-node ensemble would allow you to lose a server and still be able to
> take down another server for maintenance, without affecting SolrCloud
> operation.
>
> Thanks,
> Shawn