On Wed, Dec 25, 2024 at 04:24:49PM -0000, Dave wrote:
> Hello Everyone,
>
> Hope you are enjoying the day.
>
> Was curious what people are doing to monitor their MMR clusters being
> out of sync.
>
> At least the implementation we found and have been trying is using
> telegraf and the ldap_org plugin to gather all objects on each ldap
> master and compare their count.
>
> How is the community doing it?
> Any better way to go about this?

Hi,

I've seen two main approaches, sometimes combined:

1. Tracking contextCSN/cookie:
   a) read out the contextCSN from the DB's top-level entry (poll)
   b) have the server push any cookie changes as they happen (push);
      you also get to discover the provider's serverID this way
2. Read the olmMDBEntries from cn=monitor and make sure those stay in
   sync
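
To make the poll variant (1a) a bit more concrete, here is a minimal sketch
of what you might do with the contextCSN values once you have read them from
the suffix entry (for instance with something like
`ldapsearch -s base -b <suffix> contextCSN`). It only parses the values and
keeps the newest CSN per serverID. The names `CSN`, `parse_csn` and
`csns_by_sid` are made up for this illustration, and the sample value in the
docstring is invented.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: datetime  # UTC time of the change
    count: int           # change counter within the same timestamp
    sid: int             # serverID that originated the change
    mod: int             # modification number within the operation


def parse_csn(value: str) -> CSN:
    """Parse a CSN such as '20241225162449.123456Z#000000#001#000000'."""
    stamp, count, sid, mod = value.strip().split("#")
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    # The three fields after the timestamp are hexadecimal.
    return CSN(when, int(count, 16), int(sid, 16), int(mod, 16))


def csns_by_sid(values):
    """Map serverID -> newest CSN seen for that serverID."""
    newest = {}
    for csn in map(parse_csn, values):
        if csn.sid not in newest or csn > newest[csn.sid]:
            newest[csn.sid] = csn
    return newest
```

Keeping one CSN per serverID is the point here: a contextCSN attribute
normally carries one value for each serverID that has ever written, and
each of those has to be tracked on its own.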

The former gives you more information and is my go-to, but its use in
monitoring can be confusing: each serverID's CSN has to be compared
independently, you cannot do straight time arithmetic for alerting, etc.
Some of that is abstracted away by syncmonitor[0] which should be easy
to adapt for most monitoring solutions and can even do real-time
monitoring + alerting. It is under active development, most recently in
the textual branch to expose a TUI frontend and refactor the library to
track the cookies on a per-SID basis for real-time replication delay
measurement.
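
If you want to roll your own check rather than use syncmonitor, the
per-serverID comparison could look roughly like the sketch below. This is
not how syncmonitor does it, just an illustration: `sid_of` and
`find_lagging` are hypothetical names, the input is assumed to be a mapping
of server name to the contextCSN values polled from it, and it relies on
CSNs being in the usual fixed-width format so that plain string comparison
orders them correctly within one serverID.

```python
def sid_of(csn: str) -> str:
    """Return the serverID field (third '#'-separated field) of a CSN string."""
    return csn.split("#")[2]


def find_lagging(csns_per_server):
    """Return {server: {sid: (have, want)}} for every SID where a server is behind.

    csns_per_server maps a server name to the contextCSN values read from it.
    """
    # Newest CSN per SID on each server (assumes fixed-width CSN strings).
    per_server = {}
    for server, values in csns_per_server.items():
        newest = {}
        for value in values:
            sid = sid_of(value)
            newest[sid] = max(newest.get(sid, ""), value)
        per_server[server] = newest

    # Newest CSN per SID anywhere in the cluster.
    cluster = {}
    for newest in per_server.values():
        for sid, value in newest.items():
            cluster[sid] = max(cluster.get(sid, ""), value)

    # A server lags on a SID when its CSN is missing or older than the newest.
    lagging = {}
    for server, newest in per_server.items():
        behind = {sid: (newest.get(sid), want)
                  for sid, want in cluster.items()
                  if newest.get(sid) != want}
        if behind:
            lagging[server] = behind
    return lagging
```

A server showing up here for a single poll is usually just replication in
flight, so for alerting you would probably want to require that the same
server stays behind on the same serverID for several consecutive polls.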

Both can be, and often are, run in tandem: entry count is useful for
catching misconfigurations where ACLs do not give the replication
identity the intended permissions (or only for the accesslog but not the
main DB). With deltasync you also have to make sure the accesslog DB
*never* runs out of space: the desyncs that result from such a failure
are not recoverable, save by identifying a canonical provider and using
it to reseed the cluster manually.
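
As a rough sketch of those two checks: the code below compares entry counts
(e.g. olmMDBEntries read from each server) and flags an accesslog DB that is
getting close to its maximum size. All function names and thresholds are
made up for the example. The page counters are assumed to come from the
accesslog database's entry under cn=monitor (attributes along the lines of
olmMDBPagesUsed, olmMDBPagesFree and olmMDBPagesMax; please check what your
version actually exposes), with free pages treated as reusable.

```python
def entry_count_spread(entries_per_server):
    """Return the difference between the largest and smallest entry count."""
    counts = list(entries_per_server.values())
    return max(counts) - min(counts)


def mdb_usage(pages_used, pages_free, pages_max):
    """Fraction of the MDB map in use, counting free pages as reusable."""
    return (pages_used - pages_free) / pages_max


def check(entries_per_server, accesslog_pages, max_spread=0, max_usage=0.8):
    """Return a list of human-readable warnings (empty when all is well).

    accesslog_pages maps a server name to a (used, free, max) page tuple.
    """
    warnings = []
    spread = entry_count_spread(entries_per_server)
    if spread > max_spread:
        warnings.append(f"entry counts differ by {spread}: {entries_per_server}")
    for server, (used, free, total) in accesslog_pages.items():
        usage = mdb_usage(used, free, total)
        if usage > max_usage:
            warnings.append(f"{server}: accesslog DB at {usage:.0%} of its maximum size")
    return warnings
```

Entry counts can differ for a moment while writes are being replicated, so
a small tolerance or a few retries before alerting is probably sensible.
This only covers the database's own size limit; free space on the
filesystem holding the accesslog needs watching separately.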

[0]. https://git.openldap.org/openldap/syncmonitor

Regards,
--
Ondřej Kuzník
Senior Software Engineer
Symas Corporation http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP