On 3/4/21 12:20 PM, Ondřej Kuzník wrote:
> On Wed, Mar 03, 2021 at 09:52:26PM +0100, Michael Ströder wrote:
>> My slapdcheck package [1] also implements exactly this check and
>> sometimes it shows a difference although the changes have been correctly
>> replicated (normal syncrepl).
>>
>> You can look at the code to verify what it's doing:
>>
>> https://gitlab.com/ae-dir/slapdcheck/-/blob/master/slapdcheck/__init__.py#L1070
>>
>> (It reads the actual syncrepl providers from cn=config before comparing
>> the contextCSN values for each serverID.)
>>
>> I discussed this several times with Howard and Ondrej but no idea came
>> up why that happens.
> 
> I don't remember the discussion anymore but there's a corner case people
> writing syncrepl checking scripts often forget to address:
> 
> If it takes 1 second to replicate a change and a previous change
> happened x seconds before this one there's going to be a window of 1
> second where you see an x second CSN difference between the provider and
> consumer. In no way does it mean the consumer is x seconds behind.
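
To put numbers on the window described above, here is a minimal sketch. It assumes the usual OpenLDAP CSN layout (timestamp#count#sid#mod); the helper name csn_time and the two sample CSN values are made up for illustration, they are not taken from slapdcheck or syncmonitor.

```python
from datetime import datetime, timezone


def csn_time(csn: str) -> datetime:
    """Return the timestamp part of a CSN as a UTC datetime (hypothetical helper)."""
    stamp = csn.split("#", 1)[0]  # e.g. "20210304112000.000000Z"
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


# Invented sample values: the previous change was written at 11:00:00, the
# new one at 11:20:00 (x = 1200 s). For the one second the new change needs
# to replicate, the consumer still reports the older contextCSN.
provider_csn = "20210304112000.000000Z#000000#001#000000"
consumer_csn = "20210304110000.000000Z#000000#001#000000"

gap = (csn_time(provider_csn) - csn_time(consumer_csn)).total_seconds()
print(f"contextCSN difference: {gap:.0f} s")
```

The 1200 second figure is only the distance between the last two writes. The consumer in this example is at most one second behind.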

I'm talking about the contextCSN difference being visible for several
*hours* although the changes have already been replicated successfully.
The replication delay is very short and the syncrepl type is
refreshAndPersist.

> If there's an acceptable delay of n seconds, you better wait for that
> amount of time before raising an alarm,

And what's an appropriate value for n? 86400? ;-]
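
For reference, a rough sketch of what "wait n seconds before raising an alarm" could look like in a check script. This is not the syncmonitor or slapdcheck code; the class name CsnGraceTracker and its interface are hypothetical. It assumes the contextCSN values were already read from provider and consumer and are passed in as dicts keyed by serverID, and that all CSNs have the same fixed-width layout so plain string comparison orders them. It does not answer what n should be, and it only filters out the transient window, not a difference that persists for hours.

```python
import time
from typing import Dict, List, Optional, Tuple


class CsnGraceTracker:
    """Flag a serverID only if the consumer stays behind for longer than n seconds."""

    def __init__(self, grace_seconds: float) -> None:
        self.grace_seconds = grace_seconds
        # serverID -> (provider CSN the consumer has to reach, time first seen)
        self._pending: Dict[str, Tuple[str, float]] = {}

    def update(
        self,
        provider: Dict[str, str],
        consumer: Dict[str, str],
        now: Optional[float] = None,
    ) -> List[str]:
        """Record one poll and return the serverIDs that exceeded the grace period."""
        now = time.monotonic() if now is None else now
        stale: List[str] = []
        for sid, provider_csn in provider.items():
            consumer_csn = consumer.get(sid, "")
            pending = self._pending.get(sid)
            # The consumer reached the CSN we were waiting for: clear the timer.
            if pending is not None and consumer_csn >= pending[0]:
                del self._pending[sid]
                pending = None
            if consumer_csn >= provider_csn:
                continue  # in sync for this serverID
            if pending is None:
                # First poll that sees this gap: start the timer.
                pending = self._pending[sid] = (provider_csn, now)
            if now - pending[1] > self.grace_seconds:
                stale.append(sid)
        return stale
```

The timer is tied to the provider CSN seen when the gap first appeared, so a busy provider that keeps moving ahead does not trigger an alarm as long as the consumer keeps catching up within n seconds.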

> See the logic in syncmonitor[0]

Ideally I'd like to query cn=monitor whether slapd thinks replication is
in a healthy state.

Ciao, Michael.
