Don't think this is very relevant, but thought some people might be interested. This is how we generate a revision ID in Drupal to use with CouchDB. https://github.com/dickolsson/drupal-multiversion/blob/8.x-1.x/src/MultiversionManager.php#L436-L457
On 23 March 2016 at 16:41, Jan Lehnardt <[email protected]> wrote:

> Great sleuthing Michael!
>
> In addition to the recommendation to upgrade to {minor_version, 1}, which
> could be a good first step, how about going the extra mile to make _rev
> generation easier across platforms? This would benefit PouchDB and others.
>
> Best
> Jan
> --
>
> > On 23 Mar 2016, at 01:30, Michael Fair <[email protected]> wrote:
> >
> > Greetings CouchDBers!
> >
> > I've been modifying a BERT library to recreate the md5 calc of a
> > RevisionID in Java.
> >
> > I haven't tackled attachments yet; however, with the awesome help of
> > rnewson on the IRC channel, I've succeeded in recreating the md5 for all
> > the documents I've tried so far, which include docs with values of
> > strings, big and small integers, lists of big integers, lists of small
> > integers, true, false, null, and objects. The glaring exception is floats.
> >
> > The {minor_version, 0} format used for floats (a 31-byte, string-based
> > representation in %.20e format) is dependent on the host environment
> > doing the encoding and can't be reliably duplicated on other machines or
> > in other languages.
> >
> > For instance, here are examples of encoding 3.14159 as a %.20e string on
> > this laptop:
> > erlang: 3.1415899999999999000e+00 (this is what term_to_binary is using)
> > python: 3.14158999999999988262e+00
> > java: 3.14159000000000000000e+00
> >
> > These minor numerical differences unfortunately make the md5 computation
> > untenable. Further, it seems that even different OTP versions and
> > different hardware will encode the {minor_version, 0} format slightly
> > differently on different Couch instances (a couple of people on IRC
> > shared with me what their OTP produced).
> >
> > To make a long story short and spare folks the mind-numbing details:
> > without changing something, replicating the md5 for the revision id of
> > documents with floats just can't be done sanely.
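To make the float problem concrete, here is a minimal Python sketch that drops each of the three %.20e strings above into the {minor_version, 0} layout and hashes the result. The layout follows the Erlang external term format (version byte 131, FLOAT_EXT tag 99, then a 31-byte text field); padding the unused tail with NUL bytes is an assumption here, and the helper name is hypothetical. CouchDB's revision hash covers a larger term that includes the whole document body, so these digests are not revision ids; they only show that once the float text differs, any MD5 computed over it differs too.

```python
import hashlib

VERSION_MAGIC = 131  # first byte of every term_to_binary result
FLOAT_EXT = 99       # tag for the {minor_version, 0} float encoding


def float_ext_term(digits: str) -> bytes:
    """Build a bare FLOAT_EXT term from an already formatted '%.20e' string.

    Hypothetical helper. The 31-byte text field is padded with NUL bytes;
    that padding is an assumption, not a quote from the spec.
    """
    raw = digits.encode("ascii")
    if len(raw) > 31:
        raise ValueError("FLOAT_EXT holds at most 31 bytes of text")
    return bytes([VERSION_MAGIC, FLOAT_EXT]) + raw.ljust(31, b"\x00")


# The three renderings of 3.14159 quoted in the message above.
renderings = {
    "erlang": "3.1415899999999999000e+00",
    "python": "3.14158999999999988262e+00",
    "java": "3.14159000000000000000e+00",
}

# MD5 of each FLOAT_EXT term; different text in, different digest out.
digests = {host: hashlib.md5(float_ext_term(s)).hexdigest()
           for host, s in renderings.items()}

if __name__ == "__main__":
    # Python's own formatter reproduces the "python" line.
    print("%.20e" % 3.14159)
    for host, digest in digests.items():
        print(f"{host:7} {digest}")
```

Python's '%.20e' reproduces the "python" line exactly; matching the "erlang" line from another language would mean reimplementing whatever formatting routine a particular OTP build happens to use.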
> >
> > As things are now, like I mentioned, even different installations of
> > CouchDB can disagree on the MD5 revision id for the document
> > {"pi":3.14159}.
> >
> > So where does this create an issue?
> >
> > It shows up as a conflict document during replication when the two
> > servers calculated different revision ids for the same document update.
> > That only happens with a multi-master update, i.e. one where both sides
> > were updated before replicating (like separate laptops on separate
> > planes each doing the same thing).
> >
> > If only one side or the other was updated, it doesn't cause a problem.
> >
> > My goal is enabling people to upload documents from multiple server
> > applications using JSON, with Couch handling the replication bits.
> >
> > To give this heterogeneous environment the same multi-master intelligence
> > that Couch has, those applications need to be able to compute the same
> > revision id that Couch would compute; otherwise documents modified
> > directly in Couch could create these kinds of multi-master conflicts.
> >
> > ----
> >
> > What to do (aside from simply doing nothing)?
> >
> > At the least, I recommend changing the term_to_binary computation to use
> > the {minor_version, 1} option in the rev_id calculation.
> >
> > This changes how floats are encoded to the 64-bit IEEE format. It became
> > the standard way of encoding floats in OTP 17.0+ and is available as an
> > option all the way back to OTP 11. As long as it's explicitly requested
> > as an option in the term_to_binary call, all currently deployed OTP
> > installations for Couch can do it.
> >
> > Doing this normalizes the md5 calculation for floats regardless of the
> > OTP platform, and should make it feasible for third-party applications
> > to replicate the encoding.
> >
> > I have some other ideas beyond that, but they would require changes to
> > the replication protocol to support.
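For comparison, a sketch of the {minor_version, 1} encoding. In Erlang this is term_to_binary(3.14159, [{minor_version, 1}]); the Python below builds the same layout by hand, assuming the documented external term format: version byte 131, NEW_FLOAT_EXT tag 70, then the 8-byte big-endian IEEE 754 double. The helper names are hypothetical, and as above this covers a bare float term only, not the full input CouchDB hashes for a revision.

```python
import hashlib
import struct

VERSION_MAGIC = 131  # first byte of every term_to_binary result
NEW_FLOAT_EXT = 70   # tag for the {minor_version, 1} float encoding


def new_float_ext_term(x: float) -> bytes:
    """Hypothetical helper: tag plus 8-byte big-endian IEEE 754 double.

    No decimal text formatting is involved, so the bytes depend only on
    the double's bit pattern.
    """
    return bytes([VERSION_MAGIC, NEW_FLOAT_EXT]) + struct.pack(">d", x)


def decode_new_float_ext(term: bytes) -> float:
    """Inverse of new_float_ext_term, for a bare float term only."""
    if len(term) != 10 or term[:2] != bytes([VERSION_MAGIC, NEW_FLOAT_EXT]):
        raise ValueError("not a bare NEW_FLOAT_EXT term")
    (value,) = struct.unpack(">d", term[2:])
    return value


if __name__ == "__main__":
    term = new_float_ext_term(3.14159)
    print(term.hex())                    # the 10 bytes of the term
    print(hashlib.md5(term).hexdigest())  # MD5 of the bare float term
    assert decode_new_float_ext(term) == 3.14159
```

Because no decimal formatting is involved, any platform with IEEE 754 doubles produces the same 8 bytes; in Java, DataOutputStream.writeDouble writes exactly this big-endian layout.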
> >
> > ----
> >
> > For anyone interested, I'd be happy to share the code I have. It's still
> > a bit rough in the document construction part, but once a document is
> > constructed, getting the binary encoding and the revision id are each
> > just a single call.
> >
> > Thanks,
> > Mike
>
> --
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
