Someone really needs to test this with EC2 availability zones. I haven't had the time, but I know other clustered NoSQL solutions like HBase and Cassandra can deal with it.

Michael Della Bitta
Applications Developer
o: +1 646 532 3062 | c: +1 917 477 7906

appinions inc.
"The Science of Influence Marketing"
18 East 41st Street
New York, NY 10017
t: @appinions <https://twitter.com/Appinions> | g+: plus.google.com/appinions <https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
w: appinions.com <http://www.appinions.com/>


On Thu, Aug 29, 2013 at 12:20 PM, Walter Underwood <wun...@wunderwood.org> wrote:

> Here is a really different approach.
>
> Make the two data centers one Solr Cloud cluster and use a third data
> center (or EC2 region) for one additional Zookeeper node. When you lose
> a DC, Zookeeper still functions.
>
> There would be more traffic between datacenters.
>
> wunder
>
> On Aug 29, 2013, at 4:11 AM, Erick Erickson wrote:
>
> > Yeah, reality gets in the way of simple solutions a lot.....
> >
> > And making it even more fun, you'd really want to only bring up one
> > node for each shard in the broken DC and let that one be fully synched.
> > Then bring up the replicas in a controlled fashion so you didn't
> > saturate the local network with replications. And then you'd.....
> >
> > But as Shawn says, this is certainly functionality that would be waaay
> > cool; there's just been no time to make it all work. The main folks
> > who've been working in this area all have a mountain of higher-priority
> > stuff to get done first....
> >
> > There's been talk of making SolrCloud "rack aware", which could extend
> > into some kind of work in this area, but that's also on the "future"
> > plate. As you're well aware, it's not a trivial problem!
> >
> > Hmmm, what you really want here is the ability to say to a recovering
> > cluster "do your initial synch using nodes that the ZK ensemble located
> > at XXX knows about, then switch to your very own ensemble". Something
> > like a "remote recovery" option..... Which is _still_ kind of tricky;
> > I sure hope you have identical sharding schemes.....
> >
> > FWIW,
> > Erick
> >
> > On Wed, Aug 28, 2013 at 1:12 PM, Shawn Heisey <s...@elyograg.org> wrote:
> >
> >> On 8/28/2013 10:48 AM, Daniel Collins wrote:
> >>
> >>> What ideally I would like to do is, at the point that I kick off
> >>> recovery, divert the indexing feed for the "broken" DC into a
> >>> transaction log on those machines, run the replication and swap the
> >>> index in, then replay the transaction log to bring it all up to date.
> >>> That process (conceptually) is the same as the
> >>> org.apache.solr.cloud.RecoveryStrategy code.
> >>
> >> I don't think any such mechanism exists currently. It would be
> >> extremely awesome if it did. If there's not an existing Jira issue,
> >> I recommend that you file one. Being able to set up a multi-datacenter
> >> cloud with automatic recovery would be awesome. Even if it took a long
> >> time, having it be fully automated would be exceptionally useful.
> >>
> >>> Yes, if I could divert that feed at the application level, then I can
> >>> do what you suggest, but it feels like more work to do that (and build
> >>> an external transaction log) whereas the code seems to already be in
> >>> Solr itself, I just need to hook it all up (famous last words!)
> >>> Our indexing pipeline does a lot of pre-processing work (it's not
> >>> just pulling data from a database), and since we are only talking
> >>> about the time taken to do the replication (should be an hour or
> >>> less), it feels like we ought to be able to store that in a Solr
> >>> transaction log (i.e. the last point in the indexing pipeline).
> >>
> >> I think it would have to be a separate transaction log. One problem
> >> with really big regular tlogs is that when Solr gets restarted, the
> >> entire transaction log that's currently on the disk gets replayed. If
> >> it were big enough to recover the last several hours to a duplicate
> >> cloud, it would take forever to replay on Solr restart. If the regular
> >> tlog were kept small but a second log with the last 24 hours were
> >> available, it could replay updates when the second cloud came back up.
> >>
> >> I do import from a database, so the application-level tracking works
> >> really well for me.
> >>
> >> Thanks,
> >> Shawn
>
> --
> Walter Underwood
> wun...@wunderwood.org
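
Regarding Walter's three-data-center suggestion above: the point is that the ZooKeeper ensemble spans three locations, so a 2-of-3 quorum survives the loss of any single DC and the cluster keeps electing leaders and accepting updates. Here is a minimal sketch of an indexing client pointed at such an ensemble, assuming SolrJ 4.x (CloudSolrServer) and hypothetical host, collection, and field names:

    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    import java.io.IOException;

    public class CrossDcIndexer {
        public static void main(String[] args) throws IOException, SolrServerException {
            // One ZooKeeper node per location: DC1, DC2, and a third
            // "tie-breaker" site. Losing any one DC still leaves a 2-of-3
            // quorum, so cluster state and leader election keep working.
            String zkHost = "zk1.dc1.example.com:2181,"
                          + "zk2.dc2.example.com:2181,"
                          + "zk3.dc3.example.com:2181";

            CloudSolrServer solr = new CloudSolrServer(zkHost);
            solr.setDefaultCollection("collection1");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("title_t", "cross-DC indexing test");

            solr.add(doc);   // routed to the shard leader via the ZK cluster state
            solr.commit();
            solr.shutdown();
        }
    }

As Walter notes, the price is more inter-DC traffic: every update crosses the WAN to reach replicas in the other data center, and ZooKeeper traffic does too, so latency between the sites matters.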
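
Daniel's divert-and-replay idea is conceptually what a single SolrCloud node already does when it recovers (buffer incoming updates, pull the index from a peer, then apply the buffered updates); as Shawn says, nothing in Solr exposes that across two separate clouds today. Until it does, the same effect can be approximated in the indexing application: keep a bounded window of recent updates alongside the live feed and replay it into the recovered cloud once its bulk replication has finished. A rough, hypothetical sketch (in-memory only, SolrJ 4.x, invented class and names), not existing Solr functionality:

    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    import java.io.IOException;
    import java.util.ArrayDeque;
    import java.util.Deque;

    /** Application-side "replay log": every update sent to the healthy cloud
     *  is also kept for a bounded window so it can be replayed into the other
     *  cloud after that cloud has caught up via bulk replication. */
    public class DualFeedIndexer {
        private final CloudSolrServer primary;
        private final Deque<SolrInputDocument> replayLog = new ArrayDeque<SolrInputDocument>();
        private final int maxBuffered;

        public DualFeedIndexer(String primaryZkHost, int maxBuffered) {
            this.primary = new CloudSolrServer(primaryZkHost);
            this.primary.setDefaultCollection("collection1");
            this.maxBuffered = maxBuffered;
        }

        /** Index into the healthy cloud and remember the doc for later replay. */
        public synchronized void index(SolrInputDocument doc)
                throws IOException, SolrServerException {
            primary.add(doc);
            replayLog.addLast(doc);
            while (replayLog.size() > maxBuffered) {
                replayLog.removeFirst();   // keep only the recent tail, like a capped tlog
            }
        }

        /** Once the broken cloud has a full copy of the index, push the buffered tail. */
        public synchronized void replayInto(CloudSolrServer recovered)
                throws IOException, SolrServerException {
            // Full-document adds overwrite by uniqueKey, so replaying docs that
            // are already in the replicated snapshot is harmless.
            for (SolrInputDocument doc : replayLog) {
                recovered.add(doc);
            }
            recovered.commit();
        }
    }

A real version would need what Shawn describes: a durable on-disk log (a second, much larger "tlog") sized to cover the whole outage-plus-replication window, kept separate from Solr's own tlog so normal restarts stay fast. Replaying full documents keyed by uniqueKey keeps overlap with the replicated snapshot harmless, but this sketch doesn't handle atomic updates or deletes that fall outside the buffered window.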