Hi Howard,

I have tried the slapd -c option with a rid value, but it also tries to resync the entire directory, comparing CSNs as it goes. There is also a cid value which can be passed to the -c option, but I was unable to find an example of what to pass in there. Is it just a contextCSN value? Thanks.

cheers,

Ven

-----Original Message-----
From: Howard Chu [mailto:[email protected]]
Sent: August-02-11 2:35 PM
To: Mahadevan, Venkatasubramanian
Cc: Chris Jacobs; '[email protected]'
Subject: Re: syncrepl: consumer state is newer than provider

Mahadevan, Venkatasubramanian wrote:
> Hi David,
>
> Thanks much for your response.
> That's what I did but when I do that it seems to take forever to
> recover using syncrepl as it goes through all the entries in the
> databases comparing CSNs. So what I did was stop slapd and rebuild the
> database using slapadd with the -w option to preserve syncrepl
> information. After that, replication started working again, but it's a
> less than ideal way to recover from a replication failure. Perhaps the
> inherent nature of 2 master servers being updated leads to replication
> conflicts whereby the 2 servers get stuck in an infinite loop because their
> contextCSN values are out of sync?

Next time try the slapd -c option.

> cheers,
>
> Ven
>
> ________________________________________
> From: Chris Jacobs [[email protected]]
> Sent: Monday, August 01, 2011 8:33 AM
> To: Mahadevan, Venkatasubramanian; '[email protected]'
> Subject: Re: syncrepl: consumer state is newer than provider
>
> Apologies for top posting - blackberry.
>
> Short term fix:
> Pick a server, take it offline (stop slapd).
> Clear its database - be careful to not delete any db config files.
> Start it back up.
>
> If this happens again, then you'll want to up logging, etc. There's plenty of
> info on how to troubleshoot openldap.
>
> Note: I'm a sysadmin, not a systems engineer. It's possible the actual reason
> this broke is clear in your current logs, but not to me.
>
> - chris
>
> Chris Jacobs, Systems Administrator, Technology Services Group Apollo
> Group | Apollo Marketing and Product Development | Aptimus, Inc.
> 2001 6th Ave | Suite 3200 | Seattle, WA 98121 direct
> 206.839.8245 | cell 206.601.3256 |
> fax 206.839.8106 email
> [email protected]
>
> ________________________________
> From: [email protected] <[email protected]>
> To: [email protected] <[email protected]>
> Sent: Fri Jul 29 14:03:06 2011
> Subject: syncrepl: consumer state is newer than provider
>
> Hello,
>
> I have 2 OpenLDAP servers with the following configuration:
>
> -- OpenLDAP 2.4.26-Release running on Red Hat Enterprise 5.5
> -- The two servers are set up in a mirrored multi-master configuration.
> Below is the relevant portion of the slapd.conf:
>
>
> server1
> ----------
> syncrepl rid=002
>     provider=ldaps://server2
>     type=refreshAndPersist
>     retry="5 5 300 +"
>     searchbase="o=ourdomain.ca"
>     attrs="*,+"
>     bindmethod=simple
>     binddn="cn=Replication Manager,o=ubc.ca"
>     credentials=something
>
> mirrormode TRUE
> overlay syncprov
> syncprov-checkpoint 100 10
>
> server2
> ----------
> syncrepl rid=001
>     provider=ldaps://server1
>     type=refreshAndPersist
>     retry="5 5 300 +"
>     searchbase="o=ourdomain.ca"
>     attrs="*,+"
>     bindmethod=simple
>     binddn="cn=Replication Manager,o=ubc.ca"
>     credentials=something
>
> mirrormode TRUE
> overlay syncprov
> syncprov-checkpoint 100 10
>
> The servers have their clocks synchronized using ntp.
> Below is the output of ntpq:
>
> server1
> ----------
> ntpq> peer
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> +hub.ubc.ca      93.113.2.250     3 u  594 1024  377    1.252    1.110   1.520
> *dns3.ubc.ca     192.53.103.108   2 u   92 1024  377    1.648    2.670   0.157
>
> server2
> ----------
> ntpq> peer
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> +hub.ubc.ca      93.113.2.250     3 u  332 1024  377    0.706    3.487   0.900
> *dns3.ubc.ca     192.53.103.108   2 u  325 1024  377    1.631    3.668   0.022
>
>
> As far as I can tell the clocks appear to be in sync with each other,
> so hopefully this is not a cause of the replication issues I am having.
>
> The problem is that the servers are now refusing to synchronize with
> each other (replication was working before, but now it is not). The log
> files on the servers are filled with entries like:
>
> server1
> ----------
> Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 LDAP_RES_SEARCH_RESULT
> Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
> Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 (53) Server is unwilling to perform
> Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SRCH base="o=ubc.ca" scope=2 deref=0 filter="(objectClass=*)"
> Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SRCH attr=* +
> Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!
>
> server2
> ----------
> Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 LDAP_RES_SEARCH_RESULT
> Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
> Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 (53) Server is unwilling to perform
> Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SRCH base="o=ubc.ca" scope=2 deref=0 filter="(objectClass=*)"
> Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SRCH attr=* +
> Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!
>
>
> So it is looking like the ContextCSN cookies on both servers are out of sync.
> Digging further into this, I did a search for the ContextCSN values on both
> servers and got the following values:
>
> server1
> ----------
> 20110729165747.697237Z#000000#001#000000;20110726161604.535176Z#000000#002#000000
>
> server2
> ----------
> 20110728220449.050499Z#000000#001#000000;20110728223211.933995Z#000000#002#000000
>
>
> So my question is: how does one get the server synchronization cookies back
> into sync and ensure that replication is restarted successfully again?
> As of now, all I see is the log files filling up with messages as shown above
> and the sync cookies not being updated. Any help or pointers are appreciated.
> Thanks!
>
> cheers,
>
> Ven
>
> ________________________________
> This message is private and confidential. If you have received it in error,
> please notify the sender and remove it from your system.
>
>
>

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
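
For anyone reading this thread later, the two contextCSN values quoted above can be compared per server ID with a short script. This is only a rough sketch, not part of OpenLDAP: the helper names parse_context_csn and newer_on are made up for illustration, and it assumes each CSN has the timestamp#count#sid#mod layout seen in the values posted here. It also assumes, as the log text suggests, that the provider refuses a consumer whose CSN for some server ID is newer than its own.

```python
# Sketch only: helper names are hypothetical, not an OpenLDAP API.
# Assumes each CSN looks like timestamp#count#sid#mod, as in the thread.

def parse_context_csn(value):
    """Split a semicolon-separated contextCSN string into {sid: (timestamp, count, mod)}."""
    result = {}
    for csn in value.split(";"):
        timestamp, count, sid, mod = csn.strip().split("#")
        result[sid] = (timestamp, count, mod)
    return result


def newer_on(consumer, provider):
    """Return the SIDs for which the consumer's CSN is strictly newer than the provider's."""
    c = parse_context_csn(consumer)
    p = parse_context_csn(provider)
    # The fields are fixed-width strings, so tuple comparison orders them correctly.
    return sorted(sid for sid in c if sid in p and c[sid] > p[sid])


# Values copied from the original post.
server1 = ("20110729165747.697237Z#000000#001#000000;"
           "20110726161604.535176Z#000000#002#000000")
server2 = ("20110728220449.050499Z#000000#001#000000;"
           "20110728223211.933995Z#000000#002#000000")

print(newer_on(server1, server2))  # ['001']
print(newer_on(server2, server1))  # ['002']
```

With the posted values, server1 is ahead for SID 001 and server2 is ahead for SID 002. If the assumption above holds, each server looks "newer than provider" to the other, which would be consistent with both sides logging the same error.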
