Re: Multiple database backup strategy

Benjamin Bastian Sat, 19 Mar 2016 14:48:30 -0700

When a shard is updated, it'll trigger a "database updated" event. CouchDB
will hold those updates in memory for a configurable amount of time in
order to dedupe updates. It'll then cast lists of updated databases to
nodes which host the relevant _db_updates shards for further deduplication.
It's only at that point that the updates are persisted. Only a single
update needs to reach the _db_updates DB. IIRC, _db_updates triggers up to
n^3 of them (assuming the _db_updates DB and the updated DB have the same
n), so it may be a bit tricky for all of them to fail. You'd need
coordinated node failure, perhaps something like datacenter power loss.
Another possible issue is if all the nodes which host a shard range of the
_db_updates DB are unreachable by the nodes which host a shard range of any
other DB. Even a momentary partition like that would cause messages to be
dropped from the _db_updates feed.
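
To make the "hold in memory, dedupe, then send a list" step concrete, here
is a minimal sketch in Python. It is not CouchDB's actual code (which is
Erlang); the class name UpdateBuffer, the window_seconds parameter and the
injectable clock are all made up for illustration. It only shows why
repeated updates to one database collapse into a single entry, and why
anything still sitting in the buffer is lost if the node goes away before
the flush.

```python
import time


class UpdateBuffer:
    """Hypothetical in-memory buffer: each database name is released once per window."""

    def __init__(self, window_seconds, clock=time.monotonic):
        # window_seconds stands in for the "configurable amount of time";
        # the real setting name is not given in this thread.
        self.window_seconds = window_seconds
        self._clock = clock
        self._pending = set()       # a set dedupes repeated updates to the same DB
        self._window_start = None   # time of the first update in the current window

    def notify(self, db_name):
        """Record a "database updated" event for db_name."""
        if self._window_start is None:
            self._window_start = self._clock()
        self._pending.add(db_name)

    def flush_if_due(self):
        """Return the sorted list of updated DBs once the window has elapsed, else None."""
        if self._window_start is None:
            return None
        if self._clock() - self._window_start < self.window_seconds:
            return None
        batch = sorted(self._pending)
        self._pending.clear()
        self._window_start = None
        return batch
```

With a fake clock, three notify() calls for "db_a", "db_a" and "db_b"
return None from flush_if_due() until the window has passed, and then a
single list containing "db_a" and "db_b" once each. Until that flush
happens the names exist only in memory.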


For n=3 DBs, it seems like it'd be difficult for all of those things to go
wrong (except perhaps in the case of power loss or catastrophic network
failure). For n=1 DBs, simply rebooting a node soon after an update would
be enough to lose it.

On Sat, Mar 19, 2016 at 1:31 PM, Adam Kocoloski <[email protected]> wrote:

> Hi Bob, comments inline:
>
> > On Mar 19, 2016, at 2:36 PM, Robert Samuel Newson <[email protected]>
> wrote:
> >
> > Hi,
> >
> > The problem is that _db_updates is not guaranteed to see every update,
> so I think it falls at the first hurdle.
>
> Do you mean to say that a listener of _db_updates is not guaranteed to see
> every updated *database*? I think it would be helpful for the discussion to
> describe the scenario in which an updated database permanently fails to
> show up in the feed. My recollection is that it’s quite byzantine.
>
> > What couch_replicator_manager does in couchdb 2.0 (though not in the
> version that Cloudant originally contributed) is to us ecouch_event, notice
> which are to _replicator shards, and trigger management work from that.
>
> Did you mean to say “couch_event”? I assume so. You’re describing how the
> replicator manager discovers new replication jobs, not how the jobs
> discover new updates to source databases specified by replication jobs.
> Seems orthogonal to me unless I missed something.
>
> > Some work I'm embarking on, with a few other devs here at Cloudant, is
> to enhance the replicator manager to not run all jobs at once and it is
> indeed the plan to have each of those jobs run for a while, kill them (they
> checkpoint then close all resources) and reschedule them later. It's TBD
> whether we'd always strip feed=continuous from those. We _could_ let each
> job run to completion (i.e, caught up to the source db as of the start of
> the replication job) but I think we have to be a bit smarter and allow
> replication jobs that constantly have work to do (i.e, the source db is
> always busy), to run as they run today, with feed=continuous, unless
> forcibly ousted by a scheduler due to some configuration concurrency
> setting.
>
> So I think this is really the crux of the issue. My contention is that
> permanently occupying a socket for each continuous replication with the
> same source and mediator is needlessly expensive, and that _db_updates
> could be an elegant replacement.
>
> > I note for completeness that the work we're planning explicitly
> includes "multi database" strategies, you'll hopefully be able to make a
> single _replicator doc that represents your entire intention (e.g,
> "replicate _all_ dbs from server1 to server2").
>
> Nice! It’ll be good to hear more about that design as it evolves,
> particularly in aspects like discovery of newly created source databases
> and reporting of 403s and other fatal errors.
>
> Adam
