Hi,

The problem is that _db_updates is not guaranteed to see every update, so I 
think it falls at the first hurdle.

What couch_replicator_manager does in couchdb 2.0 (though not in the version 
that Cloudant originally contributed) is to use couch_event, notice which 
updates are to _replicator shards, and trigger management work from that.
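Roughly this pattern, sketched in Python for brevity (the real code is Erlang, and the function names below are made-up stand-ins, not CouchDB's API):

```python
# Illustrative sketch only; is_replicator_shard and
# trigger_management_work are invented names for this email.

managed = []  # shards we have kicked off management work for

def is_replicator_shard(db_name):
    # Clustered shard names look like:
    #   shards/00000000-1fffffff/foo/_replicator.1458336984
    return db_name == "_replicator" or "/_replicator." in db_name

def trigger_management_work(db_name):
    managed.append(db_name)

def handle_event(db_name, event):
    # Called for every db event; we only care about updates
    # that land on _replicator shards.
    if event == "updated" and is_replicator_shard(db_name):
        trigger_management_work(db_name)
```

The point being: rather than tailing a feed that may skip updates, you react to every event as it happens and filter down to the shards you care about.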

Some work I'm embarking on, with a few other devs here at Cloudant, is to 
enhance the replicator manager so it doesn't run all jobs at once. The plan is 
indeed to have each job run for a while, kill it (it checkpoints, then closes 
all its resources) and reschedule it later. It's TBD whether we'd always strip 
feed=continuous from those. We _could_ let each job run to completion (i.e., 
caught up to the source db as of the start of the replication job), but I think 
we have to be a bit smarter and allow replication jobs that constantly have 
work to do (i.e., the source db is always busy) to run as they run today, with 
feed=continuous, unless forcibly ousted by the scheduler due to some configured 
concurrency limit.
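To make that concrete, here's a toy sketch of one scheduler tick in Python (everything here, names and structure alike, is an assumption for illustration, not the eventual couch_replicator design):

```python
# Toy scheduler sketch; MAX_JOBS stands in for the configured
# concurrency limit. Not the real implementation.
from collections import deque

MAX_JOBS = 4

class Job:
    def __init__(self, doc_id):
        self.doc_id = doc_id

def schedule(running, pending):
    """One tick: oust running jobs only as needed to make room,
    then start pending ones up to the concurrency limit."""
    while pending and len(running) >= MAX_JOBS:
        victim = running.popleft()
        # victim checkpoints and closes its resources here (elided),
        # then goes to the back of the pending queue.
        pending.append(victim)
    while pending and len(running) < MAX_JOBS:
        running.append(pending.popleft())
    return running, pending
```

Note that with nothing pending, a busy continuous job is never ousted, which is the behaviour I'd like to preserve for sources that always have work.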

I note for completeness that the work we're planning explicitly includes 
"multi database" strategies; you'll hopefully be able to make a single 
_replicator doc that represents your entire intention (e.g., "replicate _all_ 
dbs from server1 to server2").
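Something along these lines, perhaps (purely hypothetical shape; the field names are invented and the design is not settled):

```json
{
  "_id": "backup-everything",
  "source": "https://server1.example.com/",
  "target": "https://server2.example.com/",
  "all_dbs": true,
  "continuous": true
}
```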

B.


> On 14 Mar 2016, at 02:40, Adam Kocoloski <[email protected]> wrote:
> 
> 
>> On Mar 10, 2016, at 3:18 AM, Jan Lehnardt <[email protected]> wrote:
>> 
>>> 
>>> On 09 Mar 2016, at 21:29, Nick Wood <[email protected]> wrote:
>>> 
>>> Hello,
>>> 
>>> I'm looking to back up a CouchDB server with multiple databases. Currently
>>> 1,400, but it fluctuates up and down throughout the day as new databases
>>> are added and old ones deleted. ~10% of the databases are written to within
>>> any 5 minute period of time.
>>> 
>>> Goals
>>> - Maintain a continual off-site snapshot of all databases, preferably no
>>> older than a few seconds (or minutes)
>>> - Be efficient with bandwidth (i.e. not copy the whole database file for
>>> every backup run)
>>> 
>>> My current solution watches the global _changes feed and fires up a
>>> continuous replication to an off-site server whenever it sees a change. If
>>> it doesn't see a change from a database for 10 minutes, it kills that
>>> replication. This means I only have ~150 active replications running on
>>> average at any given time.
>> 
>> How about instead of using continuous replications and killing them,
>> use non-continuous replications based on _db_updates? They end
>> automatically and should use fewer resources then.
>> 
>> Best
>> Jan
>> --
> 
> In my opinion this is actually a design we should adopt for CouchDB’s own 
> replication manager. Keeping all those _changes listeners running is 
> needlessly expensive now that we have _db_updates.
> 
> Adam
