On 3/3/21 8:58 PM, Quanah Gibson-Mount wrote:
> --On Wednesday, March 3, 2021 6:24 PM +0100 Emmanuel Seyman
> <[email protected]> wrote:
> 
>> The problem is that I don't see any messages in the log that stand
>> out as being errors (granted, I'm not sure what I'm looking for).
>> In fact, the alert flaps every once in a while as the two nodes
>> come back in sync and drift away from each other again.
>>
>> I find these values surprising considering I've never seen a syncrepl
>> error in the 2 years before the upgrade. Is there a known issue with
>> replication in 2.4.57 that would explain these sync differences?
> 
> The replication code in 2.4.44 was completely unreliable and could
> report being in sync regardless of whether or not that was true.  It's
> also unknown to me if the nagios plugin is accurate for the current
> codebase.
> 
> Generally what you want to look at are the contextCSN values in the root
> of the DIT of each server to see if they match.

My slapdcheck package [1] also implements exactly this check, and it
sometimes shows a difference even though the changes have been correctly
replicated (normal syncrepl).

You can look at the code to verify what it's doing:

https://gitlab.com/ae-dir/slapdcheck/-/blob/master/slapdcheck/__init__.py#L1070

(It reads the actual syncrepl providers from cn=config before comparing
the contextCSN values for each serverID.)
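
As a rough illustration of that kind of comparison (a minimal sketch, not
the actual slapdcheck code): the function names below are made up for this
example, and the contextCSN values are assumed to have already been read
from the suffix entry of each server as strings in the usual
timestamp#count#sid#mod form.

```python
from datetime import datetime, timezone


def parse_csn(csn):
    """Split one contextCSN string into (serverID, UTC timestamp).

    Assumes the usual OpenLDAP layout:
    YYYYmmddHHMMSS.uuuuuuZ#count#sid#mod, with the SID in hex.
    """
    time_part, _count, sid, _mod = csn.split("#")
    ts = datetime.strptime(time_part, "%Y%m%d%H%M%S.%fZ")
    return int(sid, 16), ts.replace(tzinfo=timezone.utc)


def csn_deltas(local_csns, remote_csns):
    """Return {serverID: abs. difference in seconds} for two servers.

    A serverID that is present on only one side maps to None.
    (Hypothetical helper for illustration only.)
    """
    local = dict(parse_csn(c) for c in local_csns)
    remote = dict(parse_csn(c) for c in remote_csns)
    deltas = {}
    for sid in sorted(set(local) | set(remote)):
        if sid in local and sid in remote:
            deltas[sid] = abs((local[sid] - remote[sid]).total_seconds())
        else:
            deltas[sid] = None
    return deltas
```

A monitoring check would typically alert only when a delta stays above some
tolerance for a while; the tolerance to use is a local decision and not
something this sketch prescribes.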

I discussed this several times with Howard and Ondrej, but nobody came
up with an explanation for why that happens.

Ciao, Michael.

[1] https://www.stroeder.com/slapdcheck.html
