But there's still the latency issue. Draw a diagram of all the
communications that have to go on to do an update and it's a _lot_ of
arrows going across DCs.
My suspicion is that it'll be much easier to just treat the separate DCs as
separate clusters that don't know about each other. That is, you …
The way Zookeeper is set up, requiring 'quorum' is aimed at avoiding
'split brain' where two halves of your cluster start to operate
independently. This means that you *have* to favour one half of your
cluster over the other whenever the two halves cannot communicate with
each other.
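To make the quorum rule concrete, here is a minimal Python sketch of majority quorum as ZK applies it to its voting servers: an ensemble of n servers needs more than n/2 of them reachable. The function names are illustrative only. The sketch also shows why ensembles are usually sized with an odd count, since a fourth server adds no failure tolerance over three.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def has_quorum(reachable: int, ensemble_size: int) -> bool:
    """True if the servers one side of a partition can reach form a majority."""
    return reachable >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while the rest still form a majority."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # A partition splits a 3-server ensemble into sides of 2 and 1:
    # only the 2-server side may keep operating, so there is no split brain.
    print(has_quorum(2, 3), has_quorum(1, 3))  # True False
    # Even sizes buy nothing: 4 servers tolerate the same single loss as 3.
    for n in (3, 4, 5):
        print(n, tolerated_failures(n))
```

Note that a 2/2 split of a four-server ensemble leaves *neither* side with quorum, which is the "favour one half" trade-off in its starkest form.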
For example, i…
This is exactly the problem we are encountering as well: how to deal with
the ZK quorum when we have multiple DCs. Our index is spread so that each
DC has a complete copy and *should* be able to survive on its own, but how
do we arrange ZK to handle that? The problem with quorum is that we need an odd …
For the ZK quorum issue, we'll put nodes in three different AZs so we can lose
one AZ and still establish quorum with the other two.
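One way to sanity-check a plan like this is a small Python sketch that takes a layout (zone name mapped to the number of ZK servers placed there) and tests whether a majority survives the loss of any single zone. The zone names are placeholders, and the check assumes every server is a voting member of one ensemble.

```python
from typing import Dict


def survives_any_single_zone_loss(layout: Dict[str, int]) -> bool:
    """True if, after losing any one zone, the remaining servers still form a majority."""
    total = sum(layout.values())
    majority = total // 2 + 1
    # Losing a zone removes exactly the servers placed in it.
    return all(total - lost >= majority for lost in layout.values())


if __name__ == "__main__":
    # One server per AZ: losing any AZ leaves 2 of 3.
    print(survives_any_single_zone_loss({"az-a": 1, "az-b": 1, "az-c": 1}))  # True
    # Five servers spread 2/2/1: losing any AZ leaves at least 3 of 5.
    print(survives_any_single_zone_loss({"az-a": 2, "az-b": 2, "az-c": 1}))  # True
    # Five servers spread 3/1/1: losing az-a leaves only 2 of 5.
    print(survives_any_single_zone_loss({"az-a": 3, "az-b": 1, "az-c": 1}))  # False
```

Three zones alone are not enough; the servers also have to be spread so that no single zone holds a majority.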
On Tue, Jan 22, 2013 at 10:44 PM, Timothy Potter wrote:
Hi Markus,
Thanks for the insight. There's a pretty high cost to using the approach
you suggest, in that I'd have to double my node count, which won't make my
accounting department very happy.
As for cross-AZ latency, I'm already running my cluster with nodes in three
different AZs and our distributed query …
Aside from the latency, how would you deal with the Zookeeper quorum?
Say DC1 had ZK1 and ZK2, and DC2 had ZK3.
Now any time a server in DC2 can't talk to DC1, the DC2 side has no Zookeeper
quorum, because ZK3 alone is not a majority. So if DC1 goes down, having nodes
in DC2 doesn't do you any good, since there's no ZK quorum. I guess things …
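The same majority arithmetic shows that the problem is not specific to a 2 + 1 split. The Python sketch below (an illustration, not anything Zookeeper or SolrCloud provides) tries every way to place n ZK servers across exactly two DCs and finds none that keeps quorum when either DC is lost. Surviving the loss of DC1 requires DC2 to hold a strict majority on its own, and vice versa, and both cannot be true at once.

```python
from typing import List, Tuple


def two_dc_layouts_surviving_either_loss(n: int) -> List[Tuple[int, int]]:
    """Return every (dc1, dc2) split of n servers that keeps a majority
    no matter which single DC goes down."""
    majority = n // 2 + 1
    surviving = []
    for dc1 in range(n + 1):
        dc2 = n - dc1
        # Losing DC1 leaves only dc2 servers; losing DC2 leaves only dc1.
        if dc2 >= majority and dc1 >= majority:
            surviving.append((dc1, dc2))
    return surviving


if __name__ == "__main__":
    for n in range(1, 10):
        print(n, two_dc_layouts_surviving_either_loss(n))  # always []
```

This is why the options raised elsewhere in this thread are a third location for ZK nodes or fully independent clusters per DC.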
Hi,
Regarding availability: since SolrCloud is not DC-aware at this moment, we
'solve' the problem by simply operating multiple identical clusters in
different DCs and sending updates to them all. This works quite well, but it
requires some manual intervention if a DC is down due to a prolonged DoS …
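The multi-cluster approach can be sketched as a client-side fan-out: each update batch goes to every cluster, and batches a cluster missed are queued per cluster so an unreachable DC can be caught up later. This is a hypothetical Python sketch, not the poster's setup. `FanOutUpdater`, the cluster names and the `send` callable are all assumptions; in practice `send` would post the batch to that cluster's Solr update handler. The backlog here lives only in memory, so a real deployment would need a durable queue.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hypothetical transport: send(cluster, docs) raises an exception on failure.
Sender = Callable[[str, List[dict]], None]


class FanOutUpdater:
    """Send every update batch to all clusters; queue batches a cluster missed."""

    def __init__(self, clusters: Iterable[str], send: Sender) -> None:
        self.clusters = list(clusters)
        self.send = send
        # In-memory only: per-cluster list of undelivered batches, oldest first.
        self.backlog: Dict[str, List[List[dict]]] = defaultdict(list)

    def update(self, docs: List[dict]) -> List[str]:
        """Send docs to every cluster; return the clusters that did not get them."""
        missed = []
        for cluster in self.clusters:
            if self.backlog[cluster]:
                # Preserve ordering: new batches wait behind older undelivered ones.
                self.backlog[cluster].append(docs)
                missed.append(cluster)
                continue
            try:
                self.send(cluster, docs)
            except Exception:
                self.backlog[cluster].append(docs)
                missed.append(cluster)
        return missed

    def replay(self, cluster: str) -> int:
        """Resend queued batches in order, stopping at the first failure.
        Return how many batches were delivered."""
        sent = 0
        queue = self.backlog[cluster]
        while queue:
            try:
                self.send(cluster, queue[0])
            except Exception:
                break  # still unreachable; leave the rest queued
            queue.pop(0)
            sent += 1
        return sent
```

Calling `replay("dc2")` once the DC is reachable again replaces the manual catch-up step with an explicit, ordered resend.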