Yeah, reality gets in the way of simple solutions a lot... And making it even more fun, you'd really want to bring up only one node for each shard in the broken DC and let that one be fully synced. Then bring up the replicas in a controlled fashion so you don't saturate the local network with replication traffic. And then you'd...
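Just to make that sequencing concrete, here's the kind of dance I mean, as a rough, completely untested sketch in Python: it reads the old-style /clusterstate.json from ZK (via kazoo) and only starts the remaining replicas once every shard already has an active one. The ZK hosts, collection name, node list and start_node() hook are all placeholders for whatever your deployment actually uses.

    # Rough sketch (untested): bring up one node per shard first, wait for
    # each shard to show an active replica, then start the remaining
    # replicas one at a time so recovery traffic doesn't swamp the network.
    import json
    import time

    from kazoo.client import KazooClient

    ZK_HOSTS = "zk1:2181,zk2:2181,zk3:2181"   # placeholder ensemble
    COLLECTION = "collection1"                # placeholder collection
    REMAINING_NODES = ["solr2-a", "solr2-b"]  # placeholder replica hosts


    def shard_states(zk):
        """Return {shard_name: [replica states]} from /clusterstate.json."""
        data, _stat = zk.get("/clusterstate.json")
        cluster = json.loads(data)
        shards = cluster[COLLECTION]["shards"]
        return {name: [r["state"] for r in shard["replicas"].values()]
                for name, shard in shards.items()}


    def every_shard_has_active_replica(zk):
        return all("active" in states for states in shard_states(zk).values())


    def all_known_replicas_active(zk):
        return all(all(s == "active" for s in states)
                   for states in shard_states(zk).values())


    def start_node(node):
        # Hypothetical hook: however you actually start a Solr node
        # (ssh + init script, orchestration tooling, ...).
        raise NotImplementedError


    zk = KazooClient(hosts=ZK_HOSTS)
    zk.start()
    try:
        # Phase 1: one node per shard is assumed to be starting already;
        # wait until each shard shows one fully synced (active) replica.
        while not every_shard_has_active_replica(zk):
            time.sleep(10)

        # Phase 2: add the rest of the replicas one node at a time.
        for node in REMAINING_NODES:
            start_node(node)
            time.sleep(30)  # give its replicas a moment to register in ZK
            while not all_known_replicas_active(zk):
                time.sleep(10)
    finally:
        zk.stop()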
But as Shawn says, this is certainly functionality that would be waaay cool; there's just been no time to make it all work, and the main folks who've been working in this area all have a mountain of higher-priority stuff to get done first. There's been talk of making SolrCloud "rack aware", which could extend into some kind of work in this area, but that's also on the "future" plate. As you're well aware, it's not a trivial problem!

Hmmm, what you really want here is the ability to say to a recovering cluster "do your initial sync using nodes that the ZK ensemble located at XXX knows about, then switch to your very own ensemble". Something like a "remote recovery" option... which is _still_ kind of tricky; I sure hope you have identical sharding schemes...

FWIW,
Erick

On Wed, Aug 28, 2013 at 1:12 PM, Shawn Heisey <s...@elyograg.org> wrote:
> On 8/28/2013 10:48 AM, Daniel Collins wrote:
>
>> What ideally I would like to do is, at the point that I kick off
>> recovery, divert the indexing feed for the "broken" DC into a
>> transaction log on those machines, run the replication and swap the
>> index in, then replay the transaction log to bring it all up to date.
>> That process (conceptually) is the same as the
>> org.apache.solr.cloud.RecoveryStrategy code.
>
> I don't think any such mechanism exists currently. It would be extremely
> awesome if it did. If there's not an existing Jira issue, I recommend
> that you file one. Being able to set up a multi-datacenter cloud with
> automatic recovery would be awesome. Even if it took a long time, having
> it be fully automated would be exceptionally useful.
>
>> Yes, if I could divert that feed at the application level, then I can
>> do what you suggest, but it feels like more work to do that (and build
>> an external transaction log), whereas the code seems to already be in
>> Solr itself; I just need to hook it all up (famous last words!). Our
>> indexing pipeline does a lot of pre-processing work (it's not just
>> pulling data from a database), and since we are only talking about the
>> time taken to do the replication (should be an hour or less), it feels
>> like we ought to be able to store that in a Solr transaction log (i.e.
>> the last point in the indexing pipeline).
>
> I think it would have to be a separate transaction log. One problem with
> really big regular tlogs is that when Solr gets restarted, the entire
> transaction log that's currently on the disk gets replayed. If it were
> big enough to recover the last several hours to a duplicate cloud, it
> would take forever to replay on Solr restart. If the regular tlog were
> kept small but a second log with the last 24 hours were available, it
> could replay updates when the second cloud came back up.
>
> I do import from a database, so the application-level tracking works
> really well for me.
>
> Thanks,
> Shawn
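P.S. For the "divert the feed at the application level" route Shawn mentions, the minimal version doesn't have to be much more than an append-only file plus a replay loop. Here's a bare-bones, untested sketch; the Solr URL and log path are placeholders, and it assumes your feed hands you batches of documents as plain dicts:

    # Bare-bones application-side "transaction log": while the broken DC is
    # recovering, append each batch of documents to a local JSON-lines file;
    # once replication has finished, replay the file against /update.
    import json

    import requests

    SOLR_URL = "http://solr-dc2:8983/solr/collection1"  # placeholder
    LOG_PATH = "/var/tmp/dc2-catchup.jsonl"              # placeholder


    def divert(docs):
        """Append one batch of documents to the local catch-up log."""
        with open(LOG_PATH, "a") as log:
            log.write(json.dumps(docs) + "\n")


    def replay():
        """Send every logged batch to the recovered DC, then commit."""
        with open(LOG_PATH) as log:
            for line in log:
                docs = json.loads(line)
                resp = requests.post(SOLR_URL + "/update",
                                     json=docs, timeout=60)
                resp.raise_for_status()
        requests.get(SOLR_URL + "/update", params={"commit": "true"},
                     timeout=60).raise_for_status()

It obviously doesn't dedupe or worry about ordering against deletes, which is exactly why having Solr's own tlog machinery (or Shawn's "second log" idea) handle this would be so much nicer.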