Re: [chrony-users] Silent Failure -- Enhancement Request

Carsten Finis Fri, 19 Apr 2024 09:57:46 -0700

> (..just in case anybody else using Prometheus is reading this :)

Not related to the original request, but related to Prometheus monitoring of 
time synchronization in general:
We took a low-level approach that catches a very broad range of problems by 
monitoring the node_timex_frequency_adjustment_ratio metric from the standard 
Prometheus node exporter. If this value has not changed for a certain period 
(90 minutes in our case), nothing is governing the kernel clock. 
This not only catches everything from chrony being down to servers unavailable, 
but it also works out of the box for other time synchronization systems like 
plain old ntpd.
When this alert goes off, it's usually pretty straightforward to find the root 
cause.


Regards,
  Carsten
