On 3/3/21 8:58 PM, Quanah Gibson-Mount wrote:
> --On Wednesday, March 3, 2021 6:24 PM +0100 Emmanuel Seyman
> <[email protected]> wrote:
>
>> The problem is that I don't see any messages in the log that stand
>> out as being errors (granted, I'm not sure what I'm looking for).
>> In fact, the alert flaps every once in a while as the two nodes
>> come back in sync and drift away from each other again.
>>
>> I find these values surprising considering I've never seen a syncrepl
>> error in the 2 years before the upgrade. Is there a known issue with
>> replication in 2.4.57 that would explain these sync differences?
>
> The replication code in 2.4.44 was completely unreliable and could
> report being in sync regardless of whether or not that was true. It's
> also unknown to me if the nagios plugin is accurate for the current
> codebase.
>
> Generally what you want to look at are the contextCSN values in the root
> of the DIT of each server to see if they match.
My slapdcheck package [1] also implements exactly this check, and
sometimes it shows a difference although the changes have been correctly
replicated (normal syncrepl).

You can look at the code to verify what it's doing:

https://gitlab.com/ae-dir/slapdcheck/-/blob/master/slapdcheck/__init__.py#L1070

(It reads the actual syncrepl providers from cn=config before comparing
the contextCSN values for each serverID.)

I discussed this several times with Howard and Ondrej, but no idea came
up why that happens.

Ciao, Michael.

[1] https://www.stroeder.com/slapdcheck.html
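
For readers who want to try the comparison by hand, here is a minimal
sketch of the idea of comparing contextCSN values per serverID. It is not
the slapdcheck code; the function names (parse_csn, latest_by_sid,
csn_lag) are made up for illustration. It assumes you have already read
the contextCSN values from the suffix entry of each server (for example
with a base-scope search on the suffix) and that they have the usual form
timestamp#count#sid#mod with a hexadecimal sid.

```python
from datetime import datetime, timezone


def parse_csn(csn):
    """Return (timestamp, server ID) for one contextCSN string."""
    stamp, _count, sid, _mod = csn.split("#")
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ")
    return when.replace(tzinfo=timezone.utc), int(sid, 16)


def latest_by_sid(csn_values):
    """Map each server ID to its newest CSN string.

    CSN fields are fixed width, so plain string comparison orders them.
    """
    latest = {}
    for csn in csn_values:
        _when, sid = parse_csn(csn)
        if sid not in latest or csn > latest[sid]:
            latest[sid] = csn
    return latest


def csn_lag(local_csns, remote_csns):
    """Return {sid: seconds} for every server ID whose CSNs differ.

    A positive value means the remote side is ahead. None means the
    server ID is missing on one side. An empty dict means in sync.
    """
    local = latest_by_sid(local_csns)
    remote = latest_by_sid(remote_csns)
    lag = {}
    for sid in sorted(set(local) | set(remote)):
        if sid not in local or sid not in remote:
            lag[sid] = None
        elif local[sid] != remote[sid]:
            delta = parse_csn(remote[sid])[0] - parse_csn(local[sid])[0]
            lag[sid] = delta.total_seconds()
    return lag


if __name__ == "__main__":
    # Made-up sample values, not taken from a real deployment.
    node_a = [
        "20210303172400.000000Z#000000#001#000000",
        "20210303172500.000000Z#000000#002#000000",
    ]
    node_b = [
        "20210303172400.000000Z#000000#001#000000",
        "20210303172530.500000Z#000000#002#000000",
    ]
    print(csn_lag(node_a, node_b))
```

A check built this way reports any difference at the moment of the two
reads, so a monitoring plugin would probably want to tolerate a small lag
or re-read before alerting. That threshold is a design choice and is not
something this sketch decides.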
