Final note: we've conflated two uses of /_db_updates that I want to be very clear on:
1) using /_db_updates to detect active source databases of a replication job. 2) using /_db_updates to hear about new/updated/deleted _replicator documents. It was the 2nd case where the unreliability was a concern, since the update frequency for _replicator databases is, one expects, very low.

> On 20 Mar 2016, at 14:44, Robert Samuel Newson <[email protected]> wrote:
>
> (I swear I'll stop soon...)
>
> Using /_db_updates as a cheap mechanism to detect activity at the source for any database we're interested in is an important optimization. We didn't discuss it this past week as we felt that /_db_updates wasn't sufficiently reliable. We can save a lot of churn in the scheduler by simply not resuming any job unless we have seen an update to the source database.
>
> B.
>
>> On 20 Mar 2016, at 14:36, Robert Samuel Newson <[email protected]> wrote:
>>
>> I missed a point in Adam's earlier post.
>>
>> The current scheme uses couch_event for runtime changes to _replicator docs but has to read all updates of all _replicator databases at startup. In the steady state it is just receiving couch_event notifications. The /_db_updates option would change that only slightly (we'd read /_db_updates from 0 to find all _replicator databases, rather than reading the changes feed for the node-local 'dbs' database).
>>
>> CouchDB itself has a single /_replicator database, of course, but the code will consider any database to be a _replicator database if the name ends that way. I.e., today, if you made a database called foo/_replicator it would be considered a _replicator database by the system (and we'd inject the ddoc, etc).
>>
>> B.
>>
>>> On 20 Mar 2016, at 14:31, Robert Samuel Newson <[email protected]> wrote:
>>>
>>> Since I'm typing anyway, and haven't yet been dinged for top-posting, I wanted to mention one other optimization we had in mind.
>>>
>>> Currently each replicator job has its own connection pool. When we introduce the notion that we can stop and restart jobs, those pools become approximately useless. So we will obviously hoist them 'up' to a higher level and manage connection pools at the manager level.
>>>
>>> One optimization that seems obvious from the Cloudant perspective is to allow reuse of connections to the same destinations even though they are ostensibly for different domains. That is, a connection to rnewson.cloudant.com is ultimately a connection to lbX.jenever.cloudant.com. This connection could just as easily be used for any other user in the jenever cluster. Thus, if it's idle, we could borrow that connection rather than create a new one.
>>>
>>>> host rnewson.cloudant.com
>>> rnewson.cloudant.com is an alias for jenever.cloudant.com.
>>> jenever.cloudant.com is an alias for lb2.jenever.cloudant.com.
>>> lb2.jenever.cloudant.com has address 5.153.0.207
>>>
>>> Rather than add rnewson.cloudant.com -> 5.153.0.207 to the pool, we would add lb2.jenever.cloudant.com -> 5.153.0.207 and resolve rnewson.cloudant.com to its ultimate CNAME before consulting the pool.
>>>
>>> Does this optimization help elsewhere than Cloudant?
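To make the pooling idea above concrete, a minimal sketch in Python of the CNAME-collapsing lookup. It assumes the dnspython package; the pool structure and the open_new_connection helper are hypothetical illustrations, not CouchDB's actual connection pool:

    # Sketch only: key the shared pool by the ultimate canonical name, so
    # rnewson.cloudant.com and any other alias of lb2.jenever.cloudant.com
    # can borrow each other's idle connections.
    import dns.resolver  # assumes dnspython is installed

    _pools = {}  # (canonical host, port) -> list of idle connections

    def canonical_host(hostname):
        # Follow the CNAME chain to the ultimate name before consulting the pool.
        answer = dns.resolver.resolve(hostname, "A")
        return str(answer.canonical_name).rstrip(".")

    def checkout(hostname, port=443):
        key = (canonical_host(hostname), port)
        idle = _pools.setdefault(key, [])
        return idle.pop() if idle else open_new_connection(key)  # hypothetical helper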
>>>
>>>> On 20 Mar 2016, at 14:22, Robert Samuel Newson <[email protected]> wrote:
>>>>
>>>> My point is that we can (and currently do) trigger the replication manager on receipt of the database updated event, so it avoids all of the other parts of the sequence you describe which could fail.
>>>>
>>>> The obvious difference, and I suspect this is what motivates Adam's position, is that _db_updates can be called remotely. A solution using /_db_updates as its feed can run somewhere else; it wouldn't even need to be a couchdb cluster. With the current 2.0 scheme, the _replicator db has to live on the nodes performing replication management (and therefore it depends on couch_{btree,file} etc). That's a huge incentive to go the /_db_updates route, and it would serve as a model for others like pouchdb that cannot choose to co-locate.
>>>>
>>>> One side-benefit we get from using database updated events from the _replicator shards, though, is that it helps us determine which node will run any particular job. We allocate a job to the lowest live erlang node that hosts the document. If we go with /_db_updates, we'll need some other scheme. That's not a bad thing (indeed, it could be a very good thing), but it would need more thought. While in Seattle we did discuss both directions at some length and believe we'd need some form of leader election system; the leader would then assign (and rebalance) replication jobs across the erlang cluster. I pointed at a proof-of-concept implementation of an algorithm I trust that I wrote a while back at https://github.com/cloudant/sobhuza as a possible starting point.
>>>>
>>>> B.
>>>>
>>>> P.S. I'm using Mail.app and simply replying where it sticks the cursor (at the top), but in other forums I've been berated for top-posting. Should I modify my reply style here?
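As a concrete illustration of the placement rule Bob describes ("the lowest live erlang node that hosts the document"), a small sketch; the function and the example node names are hypothetical, not the actual couch_replicator code:

    # Sketch: a job is owned by the lowest live node that hosts a shard
    # copy of its _replicator document; None means no replica is live.
    def owner_node(doc_hosting_nodes, live_nodes):
        candidates = sorted(set(doc_hosting_nodes) & set(live_nodes))
        return candidates[0] if candidates else None

    # e.g. owner_node({"node1", "node2", "node3"}, {"node2", "node3"}) == "node2"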
>>>>
>>>> On 19 Mar 2016, at 21:42, Benjamin Bastian <[email protected]> wrote:
>>>>
>>>>> When a shard is updated, it'll trigger a "database updated" event. CouchDB will hold those updates in memory for a configurable amount of time in order to dedupe updates. It'll then cast lists of updated databases to nodes which host the relevant _db_updates shards for further deduplication. It's only at that point that the updates are persisted. Only a single update needs to reach the _db_updates DB. IIRC, _db_updates triggers up to n^3 messages (assuming the _db_updates DB and the updated DB have the same N), so it may be a bit tricky for all of them to fail. You'd need coordinated node failure. Perhaps something like datacenter power loss. Another possible issue is if all the nodes which host a shard range of the _db_updates DB are unreachable by the nodes which host a shard range of any other DB. Even if it was momentary, it'd cause messages to be dropped from the _db_updates feed.
>>>>>
>>>>> For n=3 DBs, it seems like it'd be difficult for all of those things to go wrong (except perhaps in the case of power loss or catastrophic network failure). For n=1 DBs, you'd simply need to reboot a node soon after an update.
>>>>>
>>>>> On Sat, Mar 19, 2016 at 1:31 PM, Adam Kocoloski <[email protected]> wrote:
>>>>>
>>>>>> Hi Bob, comments inline:
>>>>>>
>>>>>>> On Mar 19, 2016, at 2:36 PM, Robert Samuel Newson <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> The problem is that _db_updates is not guaranteed to see every update, so I think it falls at the first hurdle.
>>>>>>
>>>>>> Do you mean to say that a listener of _db_updates is not guaranteed to see every updated *database*? I think it would be helpful for the discussion to describe the scenario in which an updated database permanently fails to show up in the feed. My recollection is that it's quite byzantine.
>>>>>>
>>>>>>> What couch_replicator_manager does in couchdb 2.0 (though not in the version that Cloudant originally contributed) is to us ecouch_event, notice which are to _replicator shards, and trigger management work from that.
>>>>>>
>>>>>> Did you mean to say "couch_event"? I assume so. You're describing how the replicator manager discovers new replication jobs, not how the jobs discover new updates to source databases specified by replication jobs. Seems orthogonal to me unless I missed something.
>>>>>>
>>>>>>> Some work I'm embarking on, with a few other devs here at Cloudant, is to enhance the replicator manager to not run all jobs at once, and it is indeed the plan to have each of those jobs run for a while, kill them (they checkpoint, then close all resources) and reschedule them later. It's TBD whether we'd always strip feed=continuous from those. We _could_ let each job run to completion (i.e., caught up to the source db as of the start of the replication job) but I think we have to be a bit smarter and allow replication jobs that constantly have work to do (i.e., the source db is always busy) to run as they run today, with feed=continuous, unless forcibly ousted by a scheduler due to some configured concurrency setting.
>>>>>>
>>>>>> So I think this is really the crux of the issue. My contention is that permanently occupying a socket for each continuous replication with the same source and mediator is needlessly expensive, and that _db_updates could be an elegant replacement.
>>>>>>
>>>>>>> I note for completeness that the work we're planning explicitly includes "multi database" strategies; you'll hopefully be able to make a single _replicator doc that represents your entire intention (e.g., "replicate _all_ dbs from server1 to server2").
>>>>>>
>>>>>> Nice! It'll be good to hear more about that design as it evolves, particularly in aspects like discovery of newly created source databases and reporting of 403s and other fatal errors.
>>>>>>
>>>>>> Adam
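For reference, a minimal sketch of the pattern Adam is advocating: one shared /_db_updates listener marks source databases dirty, so the scheduler resumes only jobs with pending work, rather than holding one continuous _changes socket per replication. The feed and its parameters are the public CouchDB API; the scheduler object and its mark_dirty method are hypothetical:

    # Sketch only: a single /_db_updates socket for the whole server; the
    # scheduler resumes a replication job only once its source is dirty.
    import json
    import requests

    def watch_db_updates(server, scheduler, since="now"):
        resp = requests.get(
            server + "/_db_updates",
            params={"feed": "continuous", "since": since},
            stream=True,
        )
        for line in resp.iter_lines():
            if not line:
                continue  # skip heartbeat newlines
            event = json.loads(line)
            if event.get("type") in ("created", "updated"):
                scheduler.mark_dirty(event["db_name"])  # hypothetical scheduler API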
