I missed a point in Adam's earlier post.

The current scheme uses couch_event for runtime changes to _replicator docs, 
but it has to read all updates of every _replicator database at startup. In 
the steady state it just receives couch_event notifications. The /_db_updates 
option would change that only slightly: we'd read /_db_updates from 0 to find 
all the _replicator databases, rather than reading the changes feed of the 
node-local 'dbs' database.

CouchDB itself has a single /_replicator database, of course, but the code 
will consider any database whose name ends that way to be a _replicator 
database. That is, if you created a database called foo/_replicator today, 
the system would treat it as a _replicator database (and inject the ddoc, 
etc.).
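
For illustration, the check is just a suffix match on the name, and the same 
test would filter a /_db_updates scan; a minimal sketch (the helper is mine, 
not the actual code):

    %% Treat any database whose name ends in "_replicator" as a
    %% replicator database, e.g. <<"foo/_replicator">>.
    is_replicator_db(<<"_replicator">>) ->
        true;
    is_replicator_db(DbName) when is_binary(DbName) ->
        Suffix = <<"/_replicator">>,
        binary:longest_common_suffix([DbName, Suffix]) =:= byte_size(Suffix).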

B.

> On 20 Mar 2016, at 14:31, Robert Samuel Newson <[email protected]> wrote:
> 
> Since I'm typing anyway, and haven't yet been dinged for top-posting, I 
> wanted to mention one other optimization we had in mind.
> 
> Currently each replicator job has its own connection pool. When we 
> introduce the notion that we can stop and restart jobs, those pools become 
> approximately useless, so we will obviously hoist them 'up' and manage 
> connection pools at the manager level.
> 
> One optimization that seems obvious from the Cloudant perspective is to allow 
> reuse of connections to the same destinations even though they are ostensibly 
> for different domains. That is, a connection to rnewson.cloudant.com is 
> ultimately a connection to lbX.jenever.cloudant.com. This connection could 
> just as easily be used for any other user in the jenever cluster. Thus, if 
> it's idle, we could borrow that connection rather than create a new one.
> 
>> host rnewson.cloudant.com
> rnewson.cloudant.com is an alias for jenever.cloudant.com.
> jenever.cloudant.com is an alias for lb2.jenever.cloudant.com.
> lb2.jenever.cloudant.com has address 5.153.0.207
> 
> Rather than add rnewson.cloudant.com -> 5.153.0.207 to the pool, we would add 
> lb2.jenever.cloudant.com -> 5.153.0.207 and resolve rnewson.cloudant.com to 
> its ultimate CNAME before consulting the pool.
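> 
> Something like this is what I have in mind (canonical_host/1 is only a 
> sketch; the pool API here is invented):
> 
>     %% Follow the CNAME chain so the pool is keyed by the canonical
>     %% host (lb2.jenever.cloudant.com above) rather than the alias.
>     canonical_host(Host) ->
>         case inet_res:lookup(Host, in, cname) of
>             [Canonical | _] -> canonical_host(Canonical);
>             []              -> Host
>         end.
> 
>     %% Consult the shared pool under the canonical name and port.
>     checkout(Pool, Host, Port) ->
>         pool_checkout(Pool, {canonical_host(Host), Port}).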
> 
> Does this optimization help anywhere other than Cloudant?
> 
>> On 20 Mar 2016, at 14:22, Robert Samuel Newson <[email protected]> wrote:
>> 
>> My point is that we can (and currently do) trigger the replication 
>> manager on receipt of the database updated event, which avoids all the 
>> other parts of the sequence you describe that could fail.
>> 
>> The obvious difference, and I suspect this is what motivates Adam's 
>> position, is that _db_updates can be called remotely. A solution using 
>> /_db_updates as its feed could run somewhere else; it wouldn't even need 
>> to be a CouchDB cluster. With the current 2.0 scheme, the _replicator db 
>> has to live on the nodes performing replication management (and therefore 
>> depends on couch_{btree,file}, etc.). That's a huge incentive to go the 
>> /_db_updates route, and it would serve as a model for others, like 
>> PouchDB, that cannot choose to co-locate.
>> 
>> One side-benefit we get from using database updated events from the 
>> _replicator shards, though, is that they help us determine which node will 
>> run any particular job. We allocate a job to the lowest live Erlang node 
>> that hosts the document. If we go with /_db_updates, we'll need some other 
>> scheme. That's not a bad thing (indeed, it could be a very good thing), but 
>> it would need more thought. In Seattle we discussed both directions at some 
>> length and believe we'd need some form of leader election; the leader would 
>> then assign (and rebalance) replication jobs across the Erlang cluster. As 
>> a possible starting point, I pointed at a proof-of-concept implementation 
>> of an algorithm I trust, which I wrote a while back, at 
>> https://github.com/cloudant/sobhuza.
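>> 
>> For concreteness, the current allocation amounts to something like this 
>> (sketch only; assumes mem3's #shard{} record from mem3.hrl):
>> 
>>     %% Owner of a replication doc: the lowest live Erlang node among
>>     %% those hosting the doc's shard.
>>     owner(DbName, DocId) ->
>>         Live = [node() | nodes()],
>>         Nodes = [N || #shard{node = N} <- mem3:shards(DbName, DocId),
>>                       lists:member(N, Live)],
>>         case lists:sort(Nodes) of
>>             [Owner | _] -> Owner;
>>             []          -> undefined
>>         end.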
>> 
>> B.
>> 
>> P.S. I'm using Mail.app and simply replying where it sticks the cursor (at 
>> the top), but in other forums I've been berated for top-posting. Should I 
>> modify my reply style here?
>> 
>> On 19 Mar 2016, at 21:42, Benjamin Bastian <[email protected]> wrote:
>>> 
>>> When a shard is updated, it'll trigger a "database updated" event. CouchDB
>>> will hold those updates in memory for a configurable amount of time in
>>> order to dedupe updates. It'll then cast lists of updated databases to
>>> nodes which host the relevant _db_updates shards for further deduplication.
>>> It's only at that point that the updates are persisted. Only a single
>>> update needs to reach the _db_updates DB. IIRC, _db_updates triggers up to
>>> N^3 (assuming the _db_updates DB and the updated DB have the same N), so it
>>> may be a bit tricky for all of them to fail. You'd need coordinated node
>>> failure. Perhaps something like datacenter power loss. Another possible
>>> issue is if all the nodes which host a shard range of the _db_updates DB
>>> are unreachable by the nodes which host a shard range of any other DB. Even
>>> if it was momentary, it'd cause messages to be dropped from the _db_updates
>>> feed.
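>>> 
>>> A simplified sketch of that buffering (illustrative only; the real 
>>> couch_event internals differ, and this version flushes after a quiet 
>>> period rather than on a fixed window):
>>> 
>>>     %% Collect updated db names, dedupe them in a set, and forward
>>>     %% the unique list once no update arrives for Interval ms.
>>>     %% forward/1 stands in for the cast to the _db_updates nodes.
>>>     buffer_loop(Interval, Seen) ->
>>>         receive
>>>             {db_updated, DbName} ->
>>>                 buffer_loop(Interval, sets:add_element(DbName, Seen))
>>>         after Interval ->
>>>             forward(sets:to_list(Seen)),
>>>             buffer_loop(Interval, sets:new())
>>>         end.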
>>> 
>>> For n=3 DBs, it seems like it'd be difficult for all of those things to go
>>> wrong (except perhaps in the case of power loss or catastrophic network
>>> failure). For n=1 DBs, you'd simply need to reboot a node soon after an
>>> update.
>>> 
>>> On Sat, Mar 19, 2016 at 1:31 PM, Adam Kocoloski <[email protected]> wrote:
>>> 
>>>> Hi Bob, comments inline:
>>>> 
>>>>> On Mar 19, 2016, at 2:36 PM, Robert Samuel Newson <[email protected]> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> The problem is that _db_updates is not guaranteed to see every update,
>>>>> so I think it falls at the first hurdle.
>>>> 
>>>> Do you mean to say that a listener of _db_updates is not guaranteed to see
>>>> every updated *database*? I think it would be helpful for the discussion to
>>>> describe the scenario in which an updated database permanently fails to
>>>> show up in the feed. My recollection is that it’s quite byzantine.
>>>> 
>>>>> What couch_replicator_manager does in couchdb 2.0 (though not in the
>>>>> version that Cloudant originally contributed) is to us ecouch_event, notice
>>>>> which are to _replicator shards, and trigger management work from that.
>>>> 
>>>> Did you mean to say “couch_event”? I assume so. You’re describing how the
>>>> replicator manager discovers new replication jobs, not how the jobs
>>>> discover new updates to source databases specified by replication jobs.
>>>> Seems orthogonal to me unless I missed something.
>>>> 
>>>>> Some work I'm embarking on, with a few other devs here at Cloudant, is
>>>>> to enhance the replicator manager so that it does not run all jobs at
>>>>> once. The plan is indeed to have each job run for a while, kill it (jobs
>>>>> checkpoint and then close all resources), and reschedule it later. It's
>>>>> TBD whether we'd always strip feed=continuous from those. We _could_ let
>>>>> each job run to completion (i.e., caught up to the source db as of the
>>>>> start of the replication job), but I think we have to be a bit smarter
>>>>> and allow replication jobs that constantly have work to do (i.e., the
>>>>> source db is always busy) to run as they run today, with feed=continuous,
>>>>> unless forcibly ousted by the scheduler due to some configured
>>>>> concurrency setting.
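>>>>> 
>>>>> To make that concrete, a crude sketch of the rotation step (all names
>>>>> hypothetical; stopping a job checkpoints it and closes its sockets):
>>>>> 
>>>>>     %% Run at most MaxJobs at once; when over the limit, stop the
>>>>>     %% oldest running job and start the longest-waiting pending one.
>>>>>     rotate(Running, Pending, MaxJobs)
>>>>>       when length(Running) >= MaxJobs, Pending =/= [] ->
>>>>>         [Oldest | StillRunning] = Running,   %% oldest first
>>>>>         ok = stop_job(Oldest),
>>>>>         [Next | StillPending] = Pending,
>>>>>         ok = start_job(Next),
>>>>>         {StillRunning ++ [Next], StillPending ++ [Oldest]};
>>>>>     rotate(Running, Pending, _MaxJobs) ->
>>>>>         {Running, Pending}.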
>>>> 
>>>> So I think this is really the crux of the issue. My contention is that
>>>> permanently occupying a socket for each continuous replication with the
>>>> same source and mediator is needlessly expensive, and that _db_updates
>>>> could be an elegant replacement.
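>>>> 
>>>> For the sake of argument, the consuming side could be as simple as this
>>>> (hypothetical names; assumes some client for the _db_updates feed):
>>>> 
>>>>     %% Wake only the jobs whose source matches the updated database,
>>>>     %% instead of each job holding its own continuous _changes socket.
>>>>     handle_db_updated(DbName, Jobs) ->
>>>>         [wake_job(Job) || Job <- Jobs, source_db(Job) =:= DbName],
>>>>         ok.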
>>>> 
>>>>> I note for completeness that the work we're planning explicitly includes
>>>>> "multi database" strategies: you'll hopefully be able to make a single
>>>>> _replicator doc that represents your entire intention (e.g.,
>>>>> "replicate _all_ dbs from server1 to server2").
>>>> 
>>>> Nice! It’ll be good to hear more about that design as it evolves,
>>>> particularly in aspects like discovery of newly created source databases
>>>> and reporting of 403s and other fatal errors.
>>>> 
>>>> Adam
>> 
> 
