On 3/4/21 12:20 PM, Ondřej Kuzník wrote:
> On Wed, Mar 03, 2021 at 09:52:26PM +0100, Michael Ströder wrote:
>> My slapdcheck package [1] also implements exactly this check and
>> sometimes it shows a difference although the changes have been correctly
>> replicated (normal syncrepl).
>>
>> You can look at the code to verify what it's doing:
>>
>> https://gitlab.com/ae-dir/slapdcheck/-/blob/master/slapdcheck/__init__.py#L1070
>>
>> (It reads the actual syncrepl providers from cn=config before comparing
>> the contextCSN values for each serverID.)
>>
>> I discussed this several times with Howard and Ondrej but no idea came
>> up why that happens.
>
> I don't remember the discussion anymore but there's a corner case people
> writing syncrepl checking scripts often forget to address:
>
> If it takes 1 second to replicate a change and a previous change
> happened x seconds before this one there's going to be a window of 1
> second where you see an x second CSN difference between the provider and
> consumer. In no way does it mean the consumer is x seconds behind.

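For illustration, the corner case quoted above can be sketched in Python as a check that only reports a serverID once the consumer has failed to reach a given provider CSN for a whole grace period. This is a rough sketch under assumptions, not the logic of slapdcheck or syncmonitor: the names `server_id` and `GraceChecker` are made up here, the CSN layout is assumed to be time#count#sid#mod with a hexadecimal sid, and CSNs are assumed to be fixed-width so that plain string comparison orders them.

```python
def server_id(csn):
    """Extract the serverID, assuming the time#count#sid#mod CSN layout."""
    return int(csn.split("#")[2], 16)


class GraceChecker:
    """Hypothetical helper: flag a serverID only after `grace` seconds."""

    def __init__(self, grace):
        self.grace = grace
        self.pending = {}  # sid -> (provider CSN to reach, time first seen)

    def check(self, provider_csns, consumer_csns, now):
        """Return serverIDs still behind after the grace period.

        `now` is a plain number of seconds, e.g. time.monotonic().
        """
        consumer = {server_id(c): c for c in consumer_csns}
        stale = []
        for csn in provider_csns:
            sid = server_id(csn)
            have = consumer.get(sid, "")
            target, since = self.pending.get(sid, (csn, now))
            if have >= target:
                # The consumer reached the CSN we were waiting for.
                self.pending.pop(sid, None)
                if have >= csn:
                    continue
                # The provider has moved on: start a fresh grace window.
                target, since = csn, now
            self.pending[sid] = (target, since)
            if now - since >= self.grace:
                stale.append(sid)
        return sorted(stale)


# Two provider CSNs for serverID 1, written 300 seconds apart.
old = "20210304112000.000000Z#000000#001#000000"
new = "20210304112500.000000Z#000000#001#000000"

checker = GraceChecker(grace=5)
in_sync = checker.check([old], [old], now=0)        # []
just_changed = checker.check([new], [old], now=10)  # [] despite the 300 s CSN gap
still_behind = checker.check([new], [old], now=16)  # [1], grace period exceeded
caught_up = checker.check([new], [new], now=17)     # []
```

The second call shows the window described above: the CSN timestamps differ by 300 seconds, but the consumer has only just fallen behind, so nothing is reported until the mismatch has lasted longer than the grace period.
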
I'm talking about the contextCSN difference being visible for several
*hours* while the changes have been already successfully replicated.
Replication delay is very short, syncrepl type is refreshAndPersist.

> If there's an acceptable delay of n seconds, you better wait for that
> amount of time before raising an alarm,

And what's an appropriate value for n? 86400? ;-]

> See the logic in syncmonitor[0]

Ideally I'd like to query cn=monitor whether slapd thinks replication is
in a healthy state.

Ciao, Michael.
