I’ll never berate anyone for top-posting (or bottom-posting for that matter). I 
just follow suit with whatever the current thread is doing — in this, very very 
clearly top-posting ;)

Thank you for making this distinction clear. Personally I was only ever 
interested in the first case. Scoping the replicator manager to only learn 
about _replicator docs on the local cluster through internal APIs is a smart 
move — I wasn’t suggesting anything different there. I do think we should have 
an efficient way for a client to learn about the existence of new updates to an 
arbitrary number of databases on a remote cluster using a single socket.

Adam

> On Mar 20, 2016, at 10:45 AM, Robert Samuel Newson <[email protected]> wrote:
> 
> Final note, we've conflated two uses of /_db_updates that I want to be very 
> clear on;
> 
> 1) using /_db_updates to detect active source databases of a replication job.
> 2) using /_db_updates to hear about new/updated/deleted _replicator documents.
> 
> It was the 2nd case where the unreliability was a concern, since the update 
> frequency is very low, one expects, for _replicator databases.
> 
>> On 20 Mar 2016, at 14:44, Robert Samuel Newson <[email protected]> wrote:
>> 
>> (I swear I'll stop soon...)
>> 
>> Using /_db_updates as a cheap mechanism to detect activity at the source for 
>> any database we're interested in is an important optimization. We didn't 
>> discuss it this past week as we felt that /_db_updates wasn't sufficiently 
>> reliable. We can save a lot of churn in the scheduler by simply not resuming 
>> any job unless we have seen an update to the source database.
>> 
>> B.
>> 
>>> On 20 Mar 2016, at 14:36, Robert Samuel Newson <[email protected]> wrote:
>>> 
>>> I missed a point in Adam's earlier post.
>>> 
>>> The current scheme uses couch_event for runtime changes to _replicator docs 
>>> but has to read all updates of all _replicator databases at startup. In the 
>>> steady state it is just receiving couch_event notifications. The 
>>> /_db_updates option would change that only slightly (we'd read /_db_updates 
>>> from 0 to find all _replicator databases, rather than reading the changes 
>>> feed for the node-local 'dbs' database).
>>> 
>>> CouchDB itself has a single /_replicator database, of course, but the code 
>>> will consider any database to be a /_replicator database if the name ends 
>>> that way. i.e, today, if you made a database called foo/_replicator it 
>>> would be considered a /_replicator database by the system (and we'd inject 
>>> the ddoc, etc).
>>> 
>>> B.
>>> 
>>>> On 20 Mar 2016, at 14:31, Robert Samuel Newson <[email protected]> wrote:
>>>> 
>>>> Since I'm typing anyway, and haven't yet been dinged for top-posting, I 
>>>> wanted to mention one other optimization we had in mind.
>>>> 
>>>> Currently each replicator job has its own connection pool. When we 
>>>> introduce the notion that we can stop and restart jobs, those become 
>>>> approximately useless. So we will obvious hoist that 'up' to a higher 
>>>> level and manage connection pools at the manager level.
>>>> 
>>>> One optimization that seems obvious from the Cloudant perspective is to 
>>>> allow reuse of connections to the same destinations even though they are 
>>>> ostensibly for different domains. That is, a connection to 
>>>> rnewson.cloudant.com is ultimately a connection to 
>>>> lbX.jenever.cloudant.com. This connection could just as easily be used for 
>>>> any other user in the jenever cluster. Thus, if it's idle, we could borrow 
>>>> that connection rather than create a new one.
>>>> 
>>>>> host rnewson.cloudant.com
>>>> rnewson.cloudant.com is an alias for jenever.cloudant.com.
>>>> jenever.cloudant.com is an alias for lb2.jenever.cloudant.com.
>>>> lb2.jenever.cloudant.com has address 5.153.0.207
>>>> 
>>>> Rather than add rnewson.cloudant.com > 5.153.0.207 to the pool, we would 
>>>> add lb2.jenever.cloudant.com -> 5.153.0.207 and resolve 
>>>> rnewson.cloudant.com to its ultimate CNAME before consulting the pool.
>>>> 
>>>> Does this optimization help elsewhere than Cloudant?
>>>> 
>>>>> On 20 Mar 2016, at 14:22, Robert Samuel Newson <[email protected]> wrote:
>>>>> 
>>>>> My point is that we can (and currently do) trigger the replication 
>>>>> manager on receipt of the database updated event, so it avoids all of the 
>>>>> other parts of the sequence you describe which could fail.
>>>>> 
>>>>> The obvious difference, and I suspect this is what motivates Adam's 
>>>>> position, is that _db_updates can be called remotely. A solution using 
>>>>> /_db_updates as its feed can run somewhere else, it wouldn't even need to 
>>>>> be a couchdb cluster. With the current 2.0 scheme, the _replicator db has 
>>>>> to live on the nodes performing replication management (and therefore it 
>>>>> depends on couch_{btree,file} etc). That's a huge incentive to go the 
>>>>> /_db_updates route and it would serve as a model for others like pouchdb 
>>>>> that cannot choose to co-locate.
>>>>> 
>>>>> One side-benefit we get from using database updated events from the 
>>>>> _replicator shards, though, is that it helps us determine which node will 
>>>>> run any particular job. We allocate a job to the lowest live erlang node 
>>>>> that hosts the document. If we go with /_db_updates, we'll need some 
>>>>> other scheme. That's not a bad thing (indeed, it could be a very good 
>>>>> thing), but it would need more thought. While in Seattle we did discuss 
>>>>> both directions at some length and believe we'd need some form of leader 
>>>>> election system, the leader would then assign (and rebalance) replication 
>>>>> jobs across the erlang cluster. I pointed at a proof-of-concept 
>>>>> implementation of an algorithm I trust that I wrote a while back at 
>>>>> https://github.com/cloudant/sobhuza as a possible starting point.
>>>>> 
>>>>> B.
>>>>> 
>>>>> P.S. I'm using Mail.app and simply replying where it sticks the cursor 
>>>>> (at the top), but in other forums I've been berated for top-posting. 
>>>>> Should I modify my reply style here?
>>>>> 
>>>>> On 19 Mar 2016, at 21:42, Benjamin Bastian <[email protected]> wrote:
>>>>>> 
>>>>>> When a shard is updated, it'll trigger a "database updated" event. 
>>>>>> CouchDB
>>>>>> will hold those updates in memory for a configurable amount of time in
>>>>>> order to dedupe updates. It'll then cast lists of updated databases to
>>>>>> nodes which host the relevant _db_updates shards for further 
>>>>>> deduplication.
>>>>>> It's only at that point that the updates are persisted. Only a single
>>>>>> update needs to reach the _db_updates DB. IIRC, _db_updates triggers up 
>>>>>> to
>>>>>> n^3 (assuming the _db_updates DB and the updated DB have the same N), so 
>>>>>> it
>>>>>> may be a bit tricky for all of them to fail. You'd need coordinated node
>>>>>> failure. Perhaps something like datacenter power loss. Another possible
>>>>>> issue is if all the nodes which host a shard range of the _db_updates DB
>>>>>> are unreachable by the nodes which host a shard range of any other DB. 
>>>>>> Even
>>>>>> if it was momentary, it'd cause messages to be dropped from the 
>>>>>> _db_updates
>>>>>> feed.
>>>>>> 
>>>>>> For n=3 DBs, it seems like it'd be difficult for all of those things to 
>>>>>> go
>>>>>> wrong (except perhaps in the case of power loss or catastrophic network
>>>>>> failure). For n=1 DBs, you'd simply need to reboot a node soon after an
>>>>>> update.
>>>>>> 
>>>>>> On Sat, Mar 19, 2016 at 1:31 PM, Adam Kocoloski <[email protected]> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi Bob, comments inline:
>>>>>>> 
>>>>>>>> On Mar 19, 2016, at 2:36 PM, Robert Samuel Newson <[email protected]>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> The problem is that _db_updates is not guaranteed to see every update,
>>>>>>> so I think it falls at the first hurdle.
>>>>>>> 
>>>>>>> Do you mean to say that a listener of _db_updates is not guaranteed to 
>>>>>>> see
>>>>>>> every updated *database*? I think it would be helpful for the 
>>>>>>> discussion to
>>>>>>> describe the scenario in which an updated database permanently fails to
>>>>>>> show up in the feed. My recollection is that it’s quite byzantine.
>>>>>>> 
>>>>>>>> What couch_replicator_manager does in couchdb 2.0 (though not in the
>>>>>>> version that Cloudant originally contributed) is to us ecouch_event, 
>>>>>>> notice
>>>>>>> which are to _replicator shards, and trigger management work from that.
>>>>>>> 
>>>>>>> Did you mean to say “couch_event”? I assume so. You’re describing how 
>>>>>>> the
>>>>>>> replicator manager discovers new replication jobs, not how the jobs
>>>>>>> discover new updates to source databases specified by replication jobs.
>>>>>>> Seems orthogonal to me unless I missed something.
>>>>>>> 
>>>>>>>> Some work I'm embarking on, with a few other devs here at Cloudant, is
>>>>>>> to enhance the replicator manager to not run all jobs at once and it is
>>>>>>> indeed the plan to have each of those jobs run for a while, kill them 
>>>>>>> (they
>>>>>>> checkpoint then close all resources) and reschedule them later. It's TBD
>>>>>>> whether we'd always strip feed=continuous from those. We _could_ let 
>>>>>>> each
>>>>>>> job run to completion (i.e, caught up to the source db as of the start 
>>>>>>> of
>>>>>>> the replication job) but I think we have to be a bit smarter and allow
>>>>>>> replication jobs that constantly have work to do (i.e, the source db is
>>>>>>> always busy), to run as they run today, with feed=continuous, unless
>>>>>>> forcibly ousted by a scheduler due to some configuration concurrency
>>>>>>> setting.
>>>>>>> 
>>>>>>> So I think this is really the crux of the issue. My contention is that
>>>>>>> permanently occupying a socket for each continuous replication with the
>>>>>>> same source and mediator is needlessly expensive, and that _db_updates
>>>>>>> could be an elegant replacement.
>>>>>>> 
>>>>>>>> I note  for completeness that the work we're planning explicitly
>>>>>>> includes "multi database" strategies, you'll hopefully be able to make a
>>>>>>> single _replicator doc that represents your entire intention (e.g,
>>>>>>> "replicate _all_ dbs from server1 to server2”).
>>>>>>> 
>>>>>>> Nice! It’ll be good to hear more about that design as it evolves,
>>>>>>> particularly in aspects like discovery of newly created source databases
>>>>>>> and reporting of 403s and other fatal errors.
>>>>>>> 
>>>>>>> Adam
>>>>> 
>>>> 
>>> 
>> 
> 

Reply via email to