But there's still the latency issue. Draw a diagram of all the
communication that has to happen to perform an update and it's a _lot_ of
arrows going across DCs.

My suspicion is that it'll be much easier to just treat the separate DCs as
separate clusters that don't know about each other. That is, your indexing
process just has to send each update request to every cluster. Keeping them
synchronized afterwards is assuredly a problem, though...
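
Something like the rough SolrJ sketch below: two completely independent
clusters, each with its own ZooKeeper ensemble, and the indexer sends every
update to both. The ZK connect strings, collection name and class name are
made up for illustration, and there's no retry or buffering here, which is
exactly the part that makes resynchronization hard.

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DualDcIndexer {
  public static void main(String[] args) throws Exception {
    // one CloudSolrServer per DC, each pointing at that DC's own ZK ensemble
    CloudSolrServer dc1 = new CloudSolrServer("zk1.dc1:2181,zk2.dc1:2181,zk3.dc1:2181");
    CloudSolrServer dc2 = new CloudSolrServer("zk1.dc2:2181,zk2.dc2:2181,zk3.dc2:2181");
    dc1.setDefaultCollection("collection1");
    dc2.setDefaultCollection("collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    doc.addField("title", "example");

    // send the same update to every cluster; if one DC is unreachable the
    // indexer has to record the failure and re-feed that DC when it returns
    for (CloudSolrServer cluster : new CloudSolrServer[] { dc1, dc2 }) {
      cluster.add(doc);
      cluster.commit();
    }
  }
}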

TNSTAAFL


On Wed, Jan 23, 2013 at 4:46 AM, Upayavira <u...@odoko.co.uk> wrote:

> The way Zookeeper is set up, requiring a 'quorum' is aimed at avoiding
> 'split brain', where two halves of your cluster start to operate
> independently. This means that you *have* to favour one half of your
> cluster over the other in the case that they cannot communicate with
> each other.
>
> For example, if you have three Zookeepers, you'll put two in one DC and
> one in the other. The DC with two Zookeepers will stay active should the
> link between them go down.
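
To make that 2+1 layout concrete, it's just the ensemble definition in each
node's zoo.cfg. A minimal sketch, with invented hostnames, would be roughly
the following: with two of the three voters in DC1, only DC1 can still form
a quorum (2 of 3) if the inter-DC link drops.

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# two voting members in DC1...
server.1=zk1.dc1.example.com:2888:3888
server.2=zk2.dc1.example.com:2888:3888
# ...and one in DC2
server.3=zk1.dc2.example.com:2888:3888
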
>
> I'm not entirely sure what happens in the DC that is left with only one
> zookeeper. I'd like to think it can still serve queries - it *could* work
> out which nodes are accessible to it - but it will certainly not be doing
> updates (they should be buffered until the other DC returns).
>
> If you want true geographical redundancy, I think Markus' suggestion is
> a sensible one.
>
> Upayavira
>
> On Tue, Jan 22, 2013, at 10:11 PM, Markus Jelsma wrote:
> > Hi,
> >
> > Regarding availability: since SolrCloud is not DC-aware at the moment, we
> > 'solve' the problem by simply operating multiple identical clusters in
> > different DCs and sending updates to them all. This works quite well but
> > it requires some manual intervention if a DC is down due to a prolonged
> > DOS attack or a network or power failure.
> >
> > I don't think it's a very good idea to change clusterstate.json, because
> > Solr will modify it when, for example, a node goes down, and your
> > preconfigured state won't exist anymore. It's also a bad idea because
> > distributed queries are going to be sent to remote locations, adding a
> > lot of latency - again, because it's not DC-aware.
> >
> > Any good solution to this problem should be in Solr itself.
> >
> > Cheers,
> >
> >
> > -----Original message-----
> > > From:Timothy Potter <thelabd...@gmail.com>
> > > Sent: Tue 22-Jan-2013 22:46
> > > To: solr-user@lucene.apache.org
> > > Subject: Manually assigning shard leader and replicas during initial
> > > setup on EC2
> > >
> > > Hi,
> > >
> > > I want to split my existing Solr 4 cluster across 2 different
> > > availability zones in EC2, i.e. have my initial leaders in one zone and
> > > their replicas in the other AZ. My thinking here is that if one zone
> > > goes down, my cluster stays online. This is the recommendation of the
> > > Amazon EC2 docs.
> > >
> > > My plan here is to just cook up a clusterstate.json file that manually
> > > sets my desired shard / replica assignments to specific nodes, after
> > > which I can update the clusterstate.json file in Zk and then bring the
> > > nodes online.
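
For reference, a hand-built clusterstate.json for that kind of layout would
look roughly like the sketch below. This is from memory of the 4.x format,
so the replica keys, field names, hosts and collection name are illustrative
only - and note that the Overseer and leader election rewrite this file (the
'leader' flag included), which is exactly the pitfall Markus points out
above.

{"collection1":{
    "shards":{
      "shard1":{
        "replicas":{
          "10.0.1.10:8983_solr_collection1_shard1_replica1":{
            "state":"active",
            "core":"collection1_shard1_replica1",
            "node_name":"10.0.1.10:8983_solr",
            "base_url":"http://10.0.1.10:8983/solr",
            "leader":"true"},
          "10.0.2.10:8983_solr_collection1_shard1_replica2":{
            "state":"active",
            "core":"collection1_shard1_replica2",
            "node_name":"10.0.2.10:8983_solr",
            "base_url":"http://10.0.2.10:8983/solr"}}}}}}
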
> > >
> > > The other thing to mention is that I have existing indexes that need
> > > to be preserved, as I don't want to re-index. For this I'm planning to
> > > just move the data directories to where they need to be, based on my
> > > changes to clusterstate.json.
> > >
> > > Does this sound reasonable? Any pitfalls I should look out for?
> > >
> > > Thanks.
> > > Tim
> > >
>
