To those following along at home, we managed to get PouchDB’s test suite to pass against CouchDB 2 RC4 in Node, but not in Firefox.
https://github.com/pouchdb/pouchdb/pull/5628#issuecomment-244584069

Since this is a timeout, it may just be a test artifact, but 100 seconds seems like a pretty long timeout to not be a true failure.

–Nolan

> On Sep 3, 2016, at 8:24 PM, Russell Branca <[email protected]> wrote:
>
> N=1 might reduce some of the issues, but it won't eliminate the problem
> entirely. The fundamental issue is that the "_dbs" db, which contains a
> document corresponding to every clustered database in the system, does not
> provide immediate consistency guarantees, and cycling databases can result
> in conflicts arising in these docs. The docs contain the shard/node
> mappings and conflicts can cause different nodes to have different views of
> the world.
>
> It's important to remember that the "_dbs" db powers the db -> shards
> mapping and is a fundamental component of the quorum system, so
> unfortunately the standard clustered quorum semantics are not available in
> the "_dbs" db as it operates at a lower level. You can see the initial
> synchronization during bootup in [1] which circles its way back to [2] by
> way of mem3_sync_nodes.erl. You can further see where the "_dbs" db is a
> local db in the way in which shards are loaded in [3] and the fallback for
> creating the "_dbs" db in [4].
>
> So in summary, the "_dbs" db operates at a lower level than the quorum
> system as the db is a core component that powers the shard mappings, and
> therefore uses a different approach for synchronization where each node has
> a full copy of the "_dbs" db and syncs directly with the other nodes. This
> is a known weak point as can be seen by the impact of cycling databases too
> quickly, and so recommended best practice is to not cycle databases
> quickly. Obviously this is not ideal, and this is one of the areas where a
> CP config store of some sort would be a significant boon, but bolting on a
> CP system to an AP system is fraught with a new set of complexities.
>
> (A clarification on N=1: with N=1 you only have one replica of the
> database, and the database exists on only one node. The rest of the nodes
> still need to get the updated "_dbs" db doc so they know where the database
> exists, because any node in the cluster can handle any request and it will
> need to know where the database exists. In general, you have one
> coordinating node and N replica nodes containing the N replicas (of each
> shard) for the given database. In a three node cluster with N=3, whatever
> coordinating node the request is handled by will also have a local shard
> replica, but this is a special case. In a cluster with more than 3 nodes,
> say 15 nodes, the coordinating node will only have a 3/15 chance to contain
> a local shard (assuming round robin load balancing across nodes). So
> basically every node must know where every database exists because every
> node can coordinate every request.)
>
>
> -Russell
>
>
> [1] https://github.com/apache/couchdb-mem3/blob/15615b295ec970ca9b12b7b54107a80b95149511/src/mem3_sync.erl#L234-L236
> [2] https://github.com/apache/couchdb-mem3/blob/15615b295ec970ca9b12b7b54107a80b95149511/src/mem3_sync.erl#L230-L232
> [3] https://github.com/apache/couchdb-mem3/blob/699308f510d335d05bfd0416ad5e893b68a7ec1d/src/mem3_shards.erl#L266-L283
> [4] https://github.com/apache/couchdb-mem3/blob/699308f510d335d05bfd0416ad5e893b68a7ec1d/src/mem3_util.erl#L214-L222
>
> On Fri, Sep 2, 2016 at 10:43 AM, Nolan Lawson <[email protected]> wrote:
>
>> Thanks, Dale. That was my recollection as well.
>>
>> Basically PouchDB does PUT -> DELETE -> PUT between every test, so since
>> there are 1000s of tests, this race condition comes up pretty easily. We
>> can add a timeout or do a random DB name, but without doing that we don't
>> know if Couch 2.x is truly "passing" the test suite or not.
>>
>> I have some time this weekend, so I'll look into adding a patch to do the
>> workaround for Couch 2.
>> I tend to side with Jan that in a clustered system
>> it can't reliably tell us when a database was truly deleted without
>> sacrificing the A in CAP. PouchDB users are already familiar with the weird
>> ways that databases start to behave when you actually DELETE them (e.g.
>> replication gets unreliable), hence workarounds like
>> https://www.npmjs.com/package/pouchdb-erase . In practice I expect PouchDB
>> users to never delete databases, so this is just an artifact of our test
>> suite IMO.
>>
>> –Nolan
>>
>>
>> On Fri, Sep 2, 2016 at 3:14 AM, Dale Harvey <[email protected]> wrote:
>>
>>> In PouchDB we can look into a workaround that uses random names only when
>>> the tests are run against Couch 2.0, however I would really like to make
>>> sure that a database not being fully deleted when we get a successful
>>> confirmation of deletion is considered a bug, it has impacts beyond the
>>> test suite, it's really hard to create a reliable system when there is no
>>> way for you to be certain when a database is deleted.
>>>
>>> Will found it easiest to reproduce this using concurrent scripts but would
>>> like to clarify that Pouch doesn't run the test suite in parallel, this bug
>>> can be hit by doing CREATE -> DELETE -> CREATE, it's extremely hard to nail
>>> down and reproduce (the similar bug in PouchDB took many attempts +
>>> months). I will take a look at seeing if I can make easier and clearer
>>> steps to reproduce.
>>>
>>> On 2 September 2016 at 11:01, Jan Lehnardt <[email protected]> wrote:
>>>
>>>>
>>>>> On 02 Sep 2016, at 11:58, Will Holley <[email protected]> wrote:
>>>>>
>>>>> Jan - I can understand that being the case in a clustered setup with
>>>>> distributed shard maps but shouldn't n=1 mitigate that?
>>>>
>>>> n=1 still does q=8 (8 shards per node) and the software makes
>>>> no consistency guarantees whatsoever.
>>>>
>>>> n=1 && q=1 might work as a side-effect, but not sure how that is useful
>>>> for reliable tests :)
>>>>
>>>> Best
>>>> Jan
>>>> --
>>>>
>>>>
>>>>>
>>>>> On 2 September 2016 at 10:53, Jan Lehnardt <[email protected]> wrote:
>>>>>>
>>>>>>> On 02 Sep 2016, at 11:45, Dale Harvey <[email protected]> wrote:
>>>>>>>
>>>>>>> In PouchDB we used to generate unique database names for tests, however we
>>>>>>> removed it for several reasons, one large reason being it indicates a race
>>>>>>> condition in critical code if we cannot reliably create -> delete -> create
>>>>>>> the same database (we have uncovered and fixed a lot of bugs in PouchDB due
>>>>>>> to this). While it's not my call how to prioritise those bugs, I really do
>>>>>>> not think we should be closing what are fairly serious bugs because it
>>>>>>> wasn't inconvenient to work around them in the couch test suite.
>>>>>>
>>>>>> It’s just that a CouchDB 2.0 cluster is an AP system, and recreating databases
>>>>>> in quick succession reliably basically requires a CA system and that’s
>>>>>> not what we can do easily.
>>>>>>
>>>>>> (I hope I got the CAP letters right, but I think it is clear what I mean)
>>>>>>
>>>>>> That is, maybe we skip those tests when run against a CouchDB 2.0
>>>>>> endpoint and keep them for PouchDB?
>>>>>>
>>>>>> Best
>>>>>> Jan
>>>>>> --
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On 2 September 2016 at 10:31, Joan Touzet <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Nolan, Will:
>>>>>>>>
>>>>>>>> A further update from looking deeper with @janl. It appears that we
>>>>>>>> have a pending fix for COUCHDB-3017 and we'll work on getting that
>>>>>>>> merged before 2.0.
>>>>>>>>
>>>>>>>> COUCHDB-3034 is a WONTFIX. FYI in CouchDB itself we changed all of
>>>>>>>> our tests to use unique database names. I'll update the bug myself
>>>>>>>> shortly.
>>>>>>>>
>>>>>>>> -Joan
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: "Joan Touzet" <[email protected]>
>>>>>>>>> To: [email protected]
>>>>>>>>> Sent: Friday, September 2, 2016 5:15:00 AM
>>>>>>>>> Subject: Re: Getting libraries to test RCs
>>>>>>>>>
>>>>>>>>> Hi Will,
>>>>>>>>>
>>>>>>>>> Neither of these are currently tagged as blocking issues for CouchDB
>>>>>>>>> 2.0, only major priority. If you want to flag them as such, this is
>>>>>>>>> your last chance, and even still, there's no guarantee fixes for them
>>>>>>>>> will hit 2.0.
>>>>>>>>>
>>>>>>>>> Erlangers, is there any chance of at least triaging these today?
>>>>>>>>>
>>>>>>>>> -Joan
>>>>>>>>>
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> From: "Will Holley" <[email protected]>
>>>>>>>>>> To: [email protected], "Joan Touzet" <[email protected]>
>>>>>>>>>> Sent: Friday, September 2, 2016 4:43:48 AM
>>>>>>>>>> Subject: Re: Getting libraries to test RCs
>>>>>>>>>>
>>>>>>>>>> Assuming nothing's changed in the last few weeks, there are 2 issues
>>>>>>>>>> which cause the PouchDB tests to fail against master: COUCHDB-3017 and
>>>>>>>>>> COUCHDB-3034.
>>>>>>>>>>
>>>>>>>>>> Both could be addressed in the test suite by using different database
>>>>>>>>>> names for each test, but that's quite a disruptive change.
>>>>>>>>>>
>>>>>>>>>> On 2 September 2016 at 03:15, Joan Touzet <[email protected]> wrote:
>>>>>>>>>>> Hi Nolan, you state that it's 'failing for known reasons.' Is that
>>>>>>>>>>> reasons in PouchDB or anything you need to push back on us? We'd like
>>>>>>>>>>> to know ASAP as we're very, very close to releasing 2.0 now.
>>>>>>>>>>>
>>>>>>>>>>> I have zero PouchDB knowledge so I'm hoping you can give us a short
>>>>>>>>>>> summary of what you think is wrong.
>>>>>>>>>>>
>>>>>>>>>>> All the best,
>>>>>>>>>>> Joan
>>>>>>>>>>>
>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>> From: "Nolan Lawson" <[email protected]>
>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>> Sent: Thursday, September 1, 2016 7:56:42 PM
>>>>>>>>>>>> Subject: Re: Getting libraries to test RCs
>>>>>>>>>>>>
>>>>>>>>>>>> We have been testing CouchDB master in PouchDB for months now, but as
>>>>>>>>>>>> an allowed failure because I believe it’s failing for known reasons.
>>>>>>>>>>>> We test both using Node.js and the browser.
>>>>>>>>>>>>
>>>>>>>>>>>> Node: https://travis-ci.org/pouchdb/pouchdb/jobs/156198210
>>>>>>>>>>>> Browser: https://travis-ci.org/pouchdb/pouchdb/jobs/156198211
>>>>>>>>>>>>
>>>>>>>>>>>> For anyone who wants to run the Pouch test suite against CouchDB,
>>>>>>>>>>>> it’s just:
>>>>>>>>>>>>
>>>>>>>>>>>> git clone https://github.com/pouchdb/pouchdb.git
>>>>>>>>>>>> cd pouchdb
>>>>>>>>>>>> npm i
>>>>>>>>>>>> COUCH_HOST=http://localhost:5984 BAIL=0 npm t
>>>>>>>>>>>>
>>>>>>>>>>>> BAIL=0 will tell it to run the full test suite and not stop on any
>>>>>>>>>>>> failures. That way you can inspect the failures and see if they’re
>>>>>>>>>>>> serious or not.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Nolan
>>>>>>>>>>>>
>>>>>>>>>>>>> On Aug 29, 2016, at 12:15 PM, Jan Lehnardt <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anyone on this list who could help with this?
>>>>>>>>>>>>> The work items are
>>>>>>>>>>>>> fairly self-explanatory and not very big individually <3
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best
>>>>>>>>>>>>> Jan
>>>>>>>>>>>>> --
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10 Aug 2016, at 09:37, Jan Lehnardt <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> from Joan’s excellent blog post about testing Release Candidates:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To our valued CouchDB application and library developers: please,
>>>>>>>>>>>>>>> please run your software against each of the options below.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> — https://blog.couchdb.org/2016/08/08/release-candidates/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think we can be a little more proactive about this for CouchDB
>>>>>>>>>>>>>> client libraries: let’s open issues on all the CouchDB-compatible
>>>>>>>>>>>>>> client software we care about to test an RC.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since there are a lot of projects, and we don’t necessarily know
>>>>>>>>>>>>>> which one we “care” about, we should try to be clever about it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe something like this can work:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. We prepare an issue text explaining the thing: Heya, CouchDB
>>>>>>>>>>>>>> team here, major new version coming up, you should test it like
>>>>>>>>>>>>>> so: <include instructions to test against a 3-node cluster. Maybe
>>>>>>>>>>>>>> even provide a cluster to do this, or Cloudant can sponsor
>>>>>>>>>>>>>> something?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. Post this message with a call to action on [email protected], the
>>>>>>>>>>>>>> weekly news, and our other (social) media channels.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3. Ask people who submitted an issue to report back with a link.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 4. Collect the link in an issue or JIRA (this could be done in 3.,
>>>>>>>>>>>>>> but then everybody needs to be added to the wiki write group, and
>>>>>>>>>>>>>> that’s just extra overhead we don’t need). Maybe we borrow a gist
>>>>>>>>>>>>>> for this, or a Google doc.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That way we encourage client software to check out RCs and we can
>>>>>>>>>>>>>> keep track, while the community helps to select which software to
>>>>>>>>>>>>>> encourage to test 2.0 compat, and helps spread the word and the
>>>>>>>>>>>>>> burden is not left with just a few folks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>> Jan
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Professional Support for Apache CouchDB:
>>>>>>>>>>>>> https://neighbourhood.ie/couchdb-support/
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Professional Support for Apache CouchDB:
>>>>>> https://neighbourhood.ie/couchdb-support/
>>>>>>
>>>>
>>>> --
>>>> Professional Support for Apache CouchDB:
>>>> https://neighbourhood.ie/couchdb-support/
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> Nolan Lawson
>> nolanlawson.com
>> github.com/nolanlawson
>>
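
For reference, the unique-name workaround mentioned several times in this thread (a random DB name per test, which is also what CouchDB's own tests moved to) could look roughly like the sketch below. This is only an illustration in Python, not code from PouchDB or CouchDB: the helper name `unique_db_name`, the `testdb` prefix, and the choice of a UUID suffix are all assumptions. The only thing it leans on is CouchDB's naming rule that a database name starts with a lowercase letter and contains only lowercase letters, digits, and `_ $ ( ) + / -`.

```python
import re
import uuid

# CouchDB database names must start with a lowercase letter and may contain
# only lowercase letters, digits, and the characters _ $ ( ) + / -
VALID_DB_NAME = re.compile(r"^[a-z][a-z0-9_$()+/-]*$")


def unique_db_name(prefix: str = "testdb") -> str:
    """Return a database name that is never reused between tests.

    Hypothetical helper: because each test gets a fresh name, the suite never
    performs CREATE -> DELETE -> CREATE on the same database name.
    """
    # uuid4().hex is 32 lowercase hexadecimal characters, so it is always
    # valid inside a CouchDB database name.
    name = f"{prefix}_{uuid.uuid4().hex}"
    if not VALID_DB_NAME.match(name):
        # Only reachable when the caller passes an invalid prefix.
        raise ValueError(f"invalid CouchDB database name: {name!r}")
    return name


if __name__ == "__main__":
    # Each call yields a different name, e.g. one per test case.
    print(unique_db_name())
    print(unique_db_name("pouch_test"))
```

The trade-off is the one Dale raises above: unique names sidestep the race in the "_dbs" db, but they also stop the suite from exercising create -> delete -> create on a single name, so they hide that class of bug rather than test for it.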
