Allowing it to be specified directly, falling back to gen 3, 2, 1 calculation seems ok, but getting baroque. Davisp had a suggestion for a layered replicator where each optimisation was a separate feature subject to negotiation. The ability to checkpoint and methods to find recorded checkpoints are candidate features.
Stabilising replication id with a per-server uuid seems a modest step. It helps two cases, the dynamic port one and the cluster one (where any node can be substituted for another). A tidier solution in the latter case than the one in bigcouch today. So this tidies something just in time for the merge. Sent from the ocean floor On 11 Oct 2012, at 14:59, Bob Dionne <[email protected]> wrote: > Incorporating a unique id from the source and target seems like a good way to > go but I'm wondering if an id from an ini file will > work in the clustered BigCouch case. Would an API level request work better? > Something the replicator would interrogate > for both the source and the target. > > > On Oct 11, 2012, at 5:42 AM, Robert Newson <[email protected]> wrote: > >> I'll note here that the attached patch is wrong. It uses a single uuid >> from the node running replication, which might not be the source or >> target. Instead, the uuid of source and target must be retrieved and >> used instead of the host:port. Jason's suggestion to add the uuid >> (stored in the ini file) to the welcome message sounds really good to >> me. >> >> Can't attach this to the ticket today as I don't have my Jira creds. >> >> Sent from the ocean floor >> >> On 10 Oct 2012, at 21:40, Jan Lehnardt <[email protected]> wrote: >> >>> flagged. >>> >>> On Oct 10, 2012, at 22:34 , Robert Newson <[email protected]> wrote: >>> >>>> Jan, >>>> >>>> Flag that as fix-for 1.3? I don't have my creds on my phone to do it. >>>> >>>> I like the ini uuid idea best, modelled after the cookie with secret. >>>> If we have the uuid, we'd omit host name as well as port, right? >>>> >>>> Sent from the ocean floor >>>> >>>> On 10 Oct 2012, at 21:12, Jan Lehnardt <[email protected]> wrote: >>>> >>>>> Filipe tells me this is https://issues.apache.org/jira/browse/COUCHDB-1259 >>>>> >>>>> Cheers >>>>> Jan >>>>> -- >>>>> >>>>> On Oct 4, 2012, at 02:28 , Dustin Sallings <[email protected]> wrote: >>>>> >>>>>> >>>>>> I'm bringing this back up as requested. I'm currently simultaneously in >>>>>> the "not replicating interesting things" and "has duplicate replicates >>>>>> state". I think the stuff below shows the "not replicating" stuff. >>>>>> >>>>>> Active tasks shows the other (these are based on replicator DB documents >>>>>> (example below): >>>>>> >>>>>> [ >>>>>> { >>>>>> "checkpointed_source_seq": 2022317, >>>>>> "continuous": true, >>>>>> "doc_id": "cbstats-from-dogbowl", >>>>>> "doc_write_failures": 0, >>>>>> "docs_read": 300, >>>>>> "docs_written": 300, >>>>>> "missing_revisions_found": 300, >>>>>> "pid": "<0.10466.12>", >>>>>> "progress": 100, >>>>>> "replication_id": "50daecd0a29f4b7e5d102990831f3d64+continuous", >>>>>> "revisions_checked": 304, >>>>>> "source": "http://dustin:*****@single.couchbase.net/cbstats/", >>>>>> "source_seq": 2022317, >>>>>> "started_on": 1349309457, >>>>>> "target": "cbstats", >>>>>> "type": "replication", >>>>>> "updated_on": 1349310442 >>>>>> }, >>>>>> { >>>>>> "checkpointed_source_seq": 2022317, >>>>>> "continuous": true, >>>>>> "doc_id": "cbstats-from-dogbowl", >>>>>> "doc_write_failures": 0, >>>>>> "docs_read": 62, >>>>>> "docs_written": 62, >>>>>> "missing_revisions_found": 62, >>>>>> "pid": "<0.11019.12>", >>>>>> "progress": 100, >>>>>> "replication_id": "411e341d5aa9a3fe636cf4ea8ba71720+continuous", >>>>>> "revisions_checked": 304, >>>>>> "source": "http://dustin:*****@single.couchbase.net/cbstats/", >>>>>> "source_seq": 2022317, >>>>>> "started_on": 1349309471, >>>>>> "target": "cbstats", >>>>>> "type": "replication", >>>>>> "updated_on": 1349310443 >>>>>> }, >>>>>> { >>>>>> "checkpointed_source_seq": 107068, >>>>>> "continuous": true, >>>>>> "doc_id": "gerrit-from-prod", >>>>>> "doc_write_failures": 0, >>>>>> "docs_read": 22, >>>>>> "docs_written": 22, >>>>>> "missing_revisions_found": 22, >>>>>> "pid": "<0.11086.12>", >>>>>> "progress": 100, >>>>>> "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous", >>>>>> "revisions_checked": 26, >>>>>> "source": "http://dustinphoto.iriscouch.com/gerrit/", >>>>>> "source_seq": 107068, >>>>>> "started_on": 1349309487, >>>>>> "target": "gerrit", >>>>>> "type": "replication", >>>>>> "updated_on": 1349310445 >>>>>> }, >>>>>> { >>>>>> "checkpointed_source_seq": 107068, >>>>>> "continuous": true, >>>>>> "doc_id": "gerrit-from-prod", >>>>>> "doc_write_failures": 0, >>>>>> "docs_read": 17, >>>>>> "docs_written": 17, >>>>>> "missing_revisions_found": 17, >>>>>> "pid": "<0.11107.12>", >>>>>> "progress": 100, >>>>>> "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous", >>>>>> "revisions_checked": 26, >>>>>> "source": "http://dustinphoto.iriscouch.com/gerrit/", >>>>>> "source_seq": 107068, >>>>>> "started_on": 1349309488, >>>>>> "target": "gerrit", >>>>>> "type": "replication", >>>>>> "updated_on": 1349310445 >>>>>> } >>>>>> ] >>>>>> >>>>>> >>>>>> The replicator document for the latter, for example is this: >>>>>> >>>>>> { >>>>>> "_id": "gerrit-from-prod", >>>>>> "_rev": "2235-36de10fb757581a1782dacbb26ee4809", >>>>>> "source": "http://dustinphoto.iriscouch.com/gerrit", >>>>>> "target": "gerrit", >>>>>> "continuous": true, >>>>>> "user_ctx": { >>>>>> "roles": [ >>>>>> "_admin" >>>>>> ] >>>>>> }, >>>>>> "_replication_state_time": "2012-10-03T17:11:27-07:00", >>>>>> "_replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9", >>>>>> "_replication_state": "triggered" >>>>>> } >>>>>> >>>>>> >>>>>> Begin forwarded message: >>>>>> >>>>>>> From: Dustin Sallings <[email protected]> >>>>>>> Subject: Re: replication problems >>>>>>> Date: June 15, 2012 0:10:04 PDT >>>>>>> To: [email protected] >>>>>>> Reply-To: [email protected] >>>>>>> >>>>>>> >>>>>>> On Jun 14, 2012, at 11:28 PM, Benoit Chesneau wrote: >>>>>>> >>>>>>>> Ar you using _replicate or _replicator ? Anything interresting in logs? >>>>>>> >>>>>>> >>>>>>> I'm using _replicator (wonderful feature, I just kill the DB and >>>>>>> everything goes back the way I want it). >>>>>>> >>>>>>> Hmm... I do think I found some stuff digging through the logs. This >>>>>>> is the local DB I noticed not doing its thing, although there were tons >>>>>>> of errors all around this. Looks like the server got into some kind of >>>>>>> bad state and sort of half-crashed. >>>>>>> >>>>>>> >>>>>>> [Thu, 14 Jun 2012 23:20:12 GMT] [error] [<0.133.0>] Replication >>>>>>> `ae601df0373da82d1b4a9ff741c8ba18+continuous` (`rpics` -> >>>>>>> `rpics-processed`) failed: >>>>>>> {{timeout,{gen_server,call,[<0.213.0>,{open_ref_count,<0.4 >>>>>>> 42.0>}]}}, >>>>>>> {gen_server,call, >>>>>>> [couch_server, >>>>>>> {open,<<"rpics">>, >>>>>>> [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}]}, >>>>>>> infinity]}} >>>>>>> [Thu, 14 Jun 2012 23:20:25 GMT] [error] [<0.383.0>] ** Generic server >>>>>>> <0.383.0> terminating >>>>>>> ** Last message in was {'EXIT',<0.384.0>, >>>>>>> {{timeout, >>>>>>> {gen_server,call, >>>>>>> [<0.213.0>,{open_ref_count,<0.442.0>}]}}, >>>>>>> {gen_server,call, >>>>>>> [couch_server, >>>>>>> {open,<<"cbstats">>, >>>>>>> [{user_ctx, >>>>>>> {user_ctx,null,[<<"_admin">>],undefined}}, >>>>>>> {user_ctx, >>>>>>> {user_ctx,null,[<<"_admin">>],undefined}}]}, >>>>>>> infinity]}}} >>>>>>> >>>>>>> ** When Server state == {state,<0.272.0>,<0.384.0>,20, >>>>>>> {httpdb, >>>>>>> >>>>>>> "http://dustin:[email protected]/cbstats/", >>>>>>> nil, >>>>>>> [{"Accept","application/json"}, >>>>>>> {"User-Agent","CouchDB/1.2.0"}], >>>>>>> 30000, >>>>>>> [{socket_options, >>>>>>> [{keepalive,true},{nodelay,false}]}], >>>>>>> 10,250,<0.273.0>,20}, >>>>>>> {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>, >>>>>>> <0.290.0>,<0.286.0>,<0.367.0>, >>>>>>> {db_header,6,984356,0, >>>>>>> {860345646,{737369,975,640891414},59433736}, >>>>>>> {860348005,738344,42056446}, >>>>>>> {860352635,[],5737}, >>>>>>> 0,nil,nil,1000}, >>>>>>> 984356, >>>>>>> {btree,<0.286.0>, >>>>>>> {860345646,{737369,975,640891414},59433736}, >>>>>>> #Fun<couch_db_updater.10.57960608>, >>>>>>> #Fun<couch_db_updater.11.57960608>, >>>>>>> #Fun<couch_btree.5.133731799>, >>>>>>> #Fun<couch_db_updater.12.57960608>,snappy}, >>>>>>> {btree,<0.286.0>, >>>>>>> {860348005,738344,42056446}, >>>>>>> #Fun<couch_db_updater.13.57960608>, >>>>>>> #Fun<couch_db_updater.14.57960608>, >>>>>>> #Fun<couch_btree.5.133731799>, >>>>>>> #Fun<couch_db_updater.15.57960608>,snappy}, >>>>>>> {btree,<0.286.0>, >>>>>>> {860352635,[],5737}, >>>>>>> #Fun<couch_btree.3.133731799>, >>>>>>> #Fun<couch_btree.4.133731799>, >>>>>>> #Fun<couch_btree.5.133731799>,nil,snappy}, >>>>>>> 984356,<<"cbstats">>, >>>>>>> "/Volumes/terror/db/couchdb/cbstats.couch",[],[], >>>>>>> nil, >>>>>>> {user_ctx,null,[<<"_admin">>],undefined}, >>>>>>> nil,1000, >>>>>>> [before_header,after_header,on_file_open], >>>>>>> [{user_ctx, >>>>>>> {user_ctx,null,[<<"_admin">>],undefined}}], >>>>>>> snappy,nil,nil}, >>>>>>> [],nil,nil,nil, >>>>>>> {rep_stats,0,0,0,0,0}, >>>>>>> nil,<0.385.0>, >>>>>>> {batch,[],0}} >>>>>>> ** Reason for termination == >>>>>>> ** {noproc,{gen_server,call,[<0.367.0>,{drop,<0.383.0>},infinity]}} >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Scrolling to the beginning of the errors, I find this: >>>>>>> >>>>>>> >>>>>>> [Thu, 14 Jun 2012 23:15:54 GMT] [error] [<0.164.0>] Replication >>>>>>> `543f76281e8d52d6ce5b51fddf0588e7+continuous` (`photo` -> >>>>>>> `http://dustin:*****@dustinphoto.couchone.com/photo/`) failed: >>>>>>> source_db_down >>>>>>> [Thu, 14 Jun 2012 23:18:57 GMT] [info] [<0.358.0>] 127.0.0.1 - - GET >>>>>>> /_all_dbs 200 >>>>>>> [Thu, 14 Jun 2012 23:19:52 GMT] [error] [<0.289.0>] ** Generic server >>>>>>> <0.289.0> terminating >>>>>>> ** Last message in was {update_docs,<0.272.0>,[], >>>>>>> [{{doc, >>>>>>> <<"_local/c4cc070f896d7267e52ba012856fed4b">>, >>>>>>> {0,[<<"346185">>]}, >>>>>>> {[{<<"session_id">>, >>>>>>> <<"9fb3475683d44bb1e151031dd42cc59f">>}, >>>>>>> {<<"source_last_seq">>,1419004}, >>>>>>> {<<"replication_id_version">>,2}, >>>>>>> {<<"history">>, >>>>>>> [{[{<<"session_id">>, >>>>>>> >>>>>>> <<"9fb3475683d44bb1e151031dd42cc59f">>}, >>>>>>> {<<"start_time">>, >>>>>>> <<"Thu, 14 Jun 2012 01:35:02 GMT">>}, >>>>>>> {<<"end_time">>, >>>>>>> <<"Thu, 14 Jun 2012 23:15:29 GMT">>}, >>>>>>> {<<"start_last_seq">>,1410146}, >>>>>>> {<<"end_last_seq">>,1419004}, >>>>>>> {<<"recorded_seq">>,1419004}, >>>>>>> {<<"missing_checked">>,8100}, >>>>>>> {<<"missing_found">>,8100}, >>>>>>> {<<"docs_read">>,8100}, >>>>>>> {<<"docs_written">>,8100}, >>>>>>> {<<"doc_write_failures">>,0}]}, >>>>>>> {[{<<"session_id">>, >>>>>>> >>>>>>> <<"3edd7c50327eab7ec0768451e34efa8b">>}, >>>>>>> {<<"start_time">>, >>>>>>> <<"Tue, 12 Jun 2012 05:51:17 GMT">>}, >>>>>>> {<<"end_time">>, >>>>>>> <<"Tue, 12 Jun 2012 13:02:37 GMT">>}, >>>>>>> {<<"start_last_seq">>,1407186}, >>>>>>> {<<"end_last_seq">>,1410146}, >>>>>>> {<<"recorded_seq">>,1410146}, >>>>>>> {<<"missing_checked">>,2583}, >>>>>>> {<<"missing_found">>,2577}, >>>>>>> {<<"docs_read">>,2577}, >>>>>>> {<<"docs_written">>,2577}, >>>>>>> {<<"doc_write_failures">>,0}]}, >>>>>>> {[{<<"session_id">>, >>>>>>> >>>>>>> <<"172de62044281a01b1584a9d099f42af">>}, >>>>>>> {<<"start_time">>, >>>>>>> <<"Mon, 11 Jun 2012 03:40:11 GMT">>}, >>>>>>> {<<"end_time">>, >>>>>>> <<"Mon, 11 Jun 2012 15:16:24 GMT">>}, >>>>>>> {<<"start_last_seq">>,1405428}, >>>>>>> {<<"end_last_seq">>,1407186}, >>>>>>> {<<"recorded_seq">>,1407186}, >>>>>>> {<<"missing_checked">>,1721}, >>>>>>> {<<"missing_found">>,1721}, >>>>>>> {<<"docs_read">>,1721}, >>>>>>> {<<"docs_written">>,1721}, >>>>>>> {<<"doc_write_failures">>,0}]}, >>>>>>> {[{<<"session_id">>, >>>>>>> >>>>>>> <<"e60a126a2036c5fab00a1249101820c8">>}, >>>>>>> {<<"start_time">>, >>>>>>> <<"Sat, 09 Jun 2012 07:47:22 GMT">>}, >>>>>>> {<<"end_time">>, >>>>>>> <<"Sun, 10 Jun 2012 21:16:20 GMT">>}, >>>>>>> {<<"start_last_seq">>,1386289}, >>>>>>> {<<"end_last_seq">>,1405428}, >>>>>>> {<<"recorded_seq">>,1405428}, >>>>>>> {<<"missing_checked">>,16977}, >>>>>>> {<<"missing_found">>,16977}, >>>>>>> {<<"docs_read">>,16977}, >>>>>>> {<<"docs_written">>,16977}, >>>>>>> {<<"doc_write_failures">>,0}]}, >>>>>>> {[{<<"session_id">>, >>>>>>> >>>>>>> <<"ef3e4333d340dcf73ddfa3fe8c720042">>}, >>>>>>> {<<"start_time">>, >>>>>>> <<"Mon, 04 Jun 2012 02:39:44 GMT">>}, >>>>>>> {<<"end_time">>, >>>>>>> <<"Mon, 04 Jun 2012 12:35:50 GMT">>}, >>>>>>> {<<"start_last_seq">>,1384738}, >>>>>>> {<<"end_last_seq">>,1386289}, >>>>>>> {<<"recorded_seq">>,1386289}, >>>>>>> {<<"missing_checked">>,1551}, >>>>>>> {<<"missing_found">>,1550}, >>>>>>> {<<"docs_read">>,1550}, >>>>>>> {<<"docs_written">>,1550}, >>>>>>> {<<"doc_write_failures">>,0}]}, >>>>>>> {[{<<"session_id">>, >>>>>>> >>>>>>> <<"d5123a3caf462794aaf5a47be1bb3b6e">>}, >>>>>>> {<<"start_time">>, >>>>>>> <<"Wed, 30 May 2012 20:41:43 GMT">>}, >>>>>>> {<<"end_time">>, >>>>>>> <<"Mon, 04 Jun 2012 02:37:33 GMT">>}, >>>>>>> {<<"start_last_seq">>,1372404}, >>>>>>> {<<"end_last_seq">>,1384738}, >>>>>>> {<<"recorded_seq">>,1384738}, >>>>>>> {<<"missing_checked">>,12334}, >>>>>>> {<<"missing_found">>,12333}, >>>>>>> {<<"docs_read">>,12333}, >>>>>>> {<<"docs_written">>,12333}, >>>>>>> {<<"doc_write_failures">>,0}]}, >>>>>>> {[{<<"session_id">>, >>>>>>> >>>>>>> <<"52a16e8832f70dc094f6fff5e9b7d75b">>}, >>>>>>> {<<"start_time">>, >>>>>>> <<"Sun, 27 May 2012 23:36:41 GMT">>}, >>>>>>> {<<"end_time">>, >>>>>>> <<"Wed, 30 May 2012 20:40:14 GMT">>}, >>>>>>> {<<"start_last_seq">>,1361049}, >>>>>>> {<<"end_last_seq">>,1372404}, >>>>>>> {<<"recorded_seq">>,1372404}, >>>>>>> {<<"missing_checked">>,11355}, >>>>>>> {<<"missing_found">>,11355}, >>>>>>> {<<"docs_read">>,11355}, >>>>>>> {<<"docs_written">>,11355}, >>>>>>> {<<"doc_write_failures">>,0}]}, >>>>>>> [...lots of these...] >>>>>>> >>>>>>> [],false,[]}, >>>>>>> #Ref<0.0.15.159973>}], >>>>>>> false,false} >>>>>>> ** When Server state == >>>>>>> {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>, >>>>>>> <0.290.0>,<0.286.0>,<0.367.0>, >>>>>>> {db_header,6,992456,0, >>>>>>> {943280145,{744250,975,647546641},60017672}, >>>>>>> {943282327,745225,42485979}, >>>>>>> {943267963,[],5753}, >>>>>>> 0,nil,nil,1000}, >>>>>>> 992456, >>>>>>> {btree,<0.286.0>, >>>>>>> {943280145,{744250,975,647546641},60017672}, >>>>>>> #Fun<couch_db_updater.10.57960608>, >>>>>>> #Fun<couch_db_updater.11.57960608>, >>>>>>> #Fun<couch_btree.5.133731799>, >>>>>>> #Fun<couch_db_updater.12.57960608>,snappy}, >>>>>>> {btree,<0.286.0>, >>>>>>> {943282327,745225,42485979}, >>>>>>> #Fun<couch_db_updater.13.57960608>, >>>>>>> #Fun<couch_db_updater.14.57960608>, >>>>>>> #Fun<couch_btree.5.133731799>, >>>>>>> #Fun<couch_db_updater.15.57960608>,snappy}, >>>>>>> {btree,<0.286.0>, >>>>>>> {943267963,[],5753}, >>>>>>> #Fun<couch_btree.3.133731799>, >>>>>>> #Fun<couch_btree.4.133731799>, >>>>>>> #Fun<couch_btree.5.133731799>,nil,snappy}, >>>>>>> 992456,<<"cbstats">>, >>>>>>> "/Volumes/terror/db/couchdb/cbstats.couch",[],[], >>>>>>> nil, >>>>>>> {user_ctx,null,[],undefined}, >>>>>>> nil,1000, >>>>>>> [before_header,after_header,on_file_open], >>>>>>> [{user_ctx, >>>>>>> {user_ctx,null,[<<"_admin">>],undefined}}], >>>>>>> snappy,nil,nil} >>>>>>> ** Reason for termination == >>>>>>> ** {timeout, >>>>>>> {gen_server,call, >>>>>>> [<0.288.0>, >>>>>>> {db_updated, >>>>>>> {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,<0.290.0>, >>>>>>> <0.286.0>,<0.367.0>, >>>>>>> {db_header,6,992456,0, >>>>>>> {943280145,{744250,975,647546641},60017672}, >>>>>>> {943282327,745225,42485979}, >>>>>>> {943267963,[],5753}, >>>>>>> 0,nil,nil,1000}, >>>>>>> 992456, >>>>>>> {btree,<0.286.0>, >>>>>>> {943280145,{744250,975,647546641},60017672}, >>>>>>> #Fun<couch_db_updater.10.57960608>, >>>>>>> #Fun<couch_db_updater.11.57960608>, >>>>>>> #Fun<couch_btree.5.133731799>, >>>>>>> #Fun<couch_db_updater.12.57960608>,snappy}, >>>>>>> {btree,<0.286.0>, >>>>>>> {943282327,745225,42485979}, >>>>>>> #Fun<couch_db_updater.13.57960608>, >>>>>>> #Fun<couch_db_updater.14.57960608>, >>>>>>> #Fun<couch_btree.5.133731799>, >>>>>>> #Fun<couch_db_updater.15.57960608>,snappy}, >>>>>>> {btree,<0.286.0>, >>>>>>> {943284347,[],5756}, >>>>>>> #Fun<couch_btree.3.133731799>, >>>>>>> #Fun<couch_btree.4.133731799>, >>>>>>> #Fun<couch_btree.5.133731799>,nil,snappy}, >>>>>>> 992456,<<"cbstats">>, >>>>>>> "/Volumes/terror/db/couchdb/cbstats.couch",[],[],nil, >>>>>>> {user_ctx,null,[],undefined}, >>>>>>> #Ref<0.0.15.160107>,1000, >>>>>>> [before_header,after_header,on_file_open], >>>>>>> [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}], >>>>>>> snappy,nil,nil}}]}} >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> dustin sallings >>>>>> >>>>>> -- >>>>>> dustin sallings >
