Package: slapd Version: 2.4.44+dfsg-5+deb9u1 Severity: important Dear Maintainer,
I noticed an issue related to delta-syncrepl in a 3-way multi-mastersetup. The issue seems to be caused by a bug in upstream. A patch isavailable upstream for version 2.4.45.The version2.4.45+dfsg-1~bpo9+1 is also affected. Upstream ITS: ITS#8444: Out-of-sync issue with memberOf overlay, Delta-syncreplMMR and >2 nodes https://www.openldap.org/its/index.cgi/Software%20Bugs?id=8444;selectid=8444 Upstream patch: ftp://ftp.openldap.org/incoming/Ondrej-Kuznik-20170825-ITS8444.patch I tested a build of openldap-2.4.45+dfsg and openldap-2.4.44+dfsg with the patchapplied. The issue is resolved in both versions. Please include the upstream patch in stretch and stretch-backports. Description: The issueis related to the cn=config attribute: olcSpCheckpoint <ops> <minutes> On the first ldap_modify operation after a checkpoint has occured, the contextCSN values do not get updated in the accesslog database (provider), while the contextCSN values in the corresponding primary database (consumer) are updated successfully. This leaves slapd in an inconsistent state, where consumer and provider are out of sync. No data is lost, but it is applied multiple times, depending on the number of masters, due to a replication loop caused by the outdated contextCSN values in the accesslog database. If slapd is restartet in this out-of-sync state, the replication is broken. Log output after the ldap_modify operation: sid=000: conn=656113 op=1 MOD dn="cn=phd-test-aa1,ou=hb,ou=syncrepl,dc=phys,dc=ethz,dc=ch" conn=656113 op=1 MOD attr=heartbeatTimestamp slap_queue_csn: queueing 0x7f2fcd418840 20170929065340.101419Z#000000#000#000000 slap_graduate_commit_csn: removing 0x7f2fcd418840 20170929065340.101419Z#000000#000#000000 conn=656113 op=1 RESULT tag=103 err=0 text= syncprov_sendresp: to=002, cookie=rid=004 syncprov_sendresp: to=001, cookie=rid=004 conn=656113 op=2 UNBIND conn=656113 fd=28 closed do_syncrep2: rid=006 cookie=rid=006,sid=002 slap_queue_csn: queueing 0x7f2fc5417c60 20170929065340.101419Z#000000#000#000000 slap_graduate_commit_csn: removing 0x7f2fc5417c60 20170929065340.101419Z#000000#000#000000 syncrepl_message_to_op: rid=006 be_modify cn=phd-test-aa1,ou=hb,ou=syncrepl,dc=phys,dc=ethz,dc=ch (0) do_syncrep2: rid=005 cookie=rid=005,sid=001 slap_queue_csn: queueing 0x7f2fe0012210 20170929065340.101419Z#000000#000#000000 slap_graduate_commit_csn: removing 0x7f2fe0012210 20170929065340.101419Z#000000#000#000000 syncrepl_message_to_op: rid=005 be_modify cn=phd-test-aa1,ou=hb,ou=syncrepl,dc=phys,dc=ethz,dc=ch (0) sid=001: do_syncrep2: rid=004 cookie=rid=004 slap_queue_csn: queueing 0x7f1f6415de10 20170929065340.101419Z#000000#000#000000 slap_graduate_commit_csn: removing 0x7f1f6415de10 20170929065340.101419Z#000000#000#000000 do_syncrep2: rid=006 cookie=rid=006,sid=002 slap_queue_csn: queueing 0x7f1f58157630 20170929065340.101419Z#000000#000#000000 slap_graduate_commit_csn: removing 0x7f1f58157630 20170929065340.101419Z#000000#000#000000 syncrepl_message_to_op: rid=006 be_modify cn=phd-test-aa1,ou=hb,ou=syncrepl,dc=phys,dc=ethz,dc=ch (0) syncprov_sendresp: to=002, cookie=rid=005,sid=001 syncrepl_message_to_op: rid=004 be_modify cn=phd-test-aa1,ou=hb,ou=syncrepl,dc=phys,dc=ethz,dc=ch (0) syncprov_sendresp: cookie=rid=005,sid=001 sid=002: do_syncrep2: rid=004 cookie=rid=004 slap_queue_csn: queueing 0x7f87e8138190 20170929065340.101419Z#000000#000#000000 slap_graduate_commit_csn: removing 0x7f87e8138190 20170929065340.101419Z#000000#000#000000 syncrepl_message_to_op: rid=004 be_modify cn=phd-test-aa1,ou=hb,ou=syncrepl,dc=phys,dc=ethz,dc=ch (0) syncprov_sendresp: cookie=rid=006,sid=002 syncprov_sendresp: to=001, cookie=rid=006,sid=002 do_syncrep2: rid=005 cookie=rid=005,sid=001 slap_queue_csn: queueing 0x7f8800124620 20170929065340.101419Z#000000#000#000000 slap_graduate_commit_csn: removing 0x7f8800124620 20170929065340.101419Z#000000#000#000000 syncrepl_message_to_op: rid=005 be_modify cn=phd-test-aa1,ou=hb,ou=syncrepl,dc=phys,dc=ethz,dc=ch (0) Log output after slapd restart: sid=000: conn=1028 op=2 BIND dn="uid=ldap/phd-test-aa2.ethz.ch,cn=gssapi,cn=auth" mech=GSSAPI sasl_ssf=56 ssf=56 conn=1028 op=2 RESULT tag=97 err=0 text= conn=1028 op=3 SRCH base="cn=deltalog" scope=2 deref=0 filter="(&(objectClass=auditWriteObject)(reqResult=0))" conn=1028 op=3 SRCH attr=reqDN reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN conn=1028 op=3 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider! ... conn=1026 op=2 BIND dn="uid=ldap/phd-test-aa3.ethz.ch,cn=gssapi,cn=auth" mech=GSSAPI sasl_ssf=56 ssf=56 conn=1026 op=2 RESULT tag=97 err=0 text= conn=1026 op=3 SRCH base="cn=deltalog" scope=2 deref=0 filter="(&(objectClass=auditWriteObject)(reqResult=0))" conn=1026 op=3 SRCH attr=reqDN reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN conn=1026 op=3 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider! sid=001: do_syncrep2: rid=004 LDAP_RES_SEARCH_RESULT do_syncrep2: rid=004 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform do_syncrep2: rid=004 (53) Server is unwilling to perform sid=002: do_syncrep2: rid=004 LDAP_RES_SEARCH_RESULT do_syncrep2: rid=004 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform do_syncrep2: rid=004 (53) Server is unwilling to perform Upstream patch: >From 1d960a78959fd460f894380b44bd5e866763a3af Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ond=C5=99ej=20Kuzn=C3=ADk?= <on...@mistotebe.net> Date: Fri, 25 Aug 2017 16:25:23 +0100 Subject: [PATCH] ITS#8444 Do not clear the pending operation when checkpointing When a checkpoint happens, if we remove the CSN from the pending list, accesslog won't pass it onto the accesslog DB. But in a delta-mmr scenario, an accesslog entry without a CSN faces a race where it might be applied twice - that usually fails and causes a full refresh, other times it can cause a silent desync - both are undesirable. --- servers/slapd/overlays/syncprov.c | 1 + 1 file changed, 1 insertion(+) diff --git a/servers/slapd/overlays/syncprov.c b/servers/slapd/overlays/syncprov.c index e2f79f03a..2f0f5e8fe 100644 --- a/servers/slapd/overlays/syncprov.c +++ b/servers/slapd/overlays/syncprov.c @@ -1492,6 +1492,7 @@ syncprov_checkpoint( Operation *op, slap_overinst *on ) opm.o_bd->bd_info = on->on_info->oi_orig; opm.o_managedsait = SLAP_CONTROL_NONCRITICAL; opm.o_no_schema_check = 1; + opm.o_opid = -1; opm.o_bd->be_modify( &opm, &rsm ); if ( rsm.sr_err == LDAP_NO_SUCH_OBJECT && -- 2.14.1 -- Sven Mäder <mae...@phys.ethz.ch>