As a starting point, put the expression "probe_icmp_duration_seconds == 0"
into the PromQL expression browser in the Prometheus web UI, and zoom into
the time range where you expected the alert. What do you see?
One possible issue is that the time series appears and disappears; note
that 5 minutes happens to be the default staleness interval
(lookback-delta). Another problem is that a single scrape returns several
probe_icmp_duration_seconds series, distinguished by the "phase" label, and
"== 0" is evaluated against each of them separately:
    probe_icmp_duration_seconds{phase="resolve"} 1.5765e-05
    probe_icmp_duration_seconds{phase="rtt"} 0
    probe_icmp_duration_seconds{phase="setup"} 8.742e-05
For both of these reasons, it would be safer to use "probe_success == 0" as
your alerting expression. If the problem persists with that, it should be
easier to debug.
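A minimal sketch of the rule from the original question rewritten that way,
keeping the same group name, "for" duration, label and annotation. It
assumes the blackbox exporter's standard probe_success metric (1 when the
probe succeeds, 0 when it fails, one series per target with no "phase"
label) and a standalone Prometheus rules file; in a PrometheusRule resource
the same group would go under spec.groups instead.

```yaml
groups:
  - name: blackbox.rules.icmpFailed
    rules:
      - alert: BlackboxIcmpFailed
        # probe_success is a single series per probed target (0 = failed,
        # 1 = succeeded), so there are no per-phase series to match
        # separately, as there are with probe_icmp_duration_seconds.
        expr: probe_success == 0
        # The expression must keep returning this series at every
        # evaluation for 5 minutes before the alert goes from pending
        # to firing.
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Ping to Device Failed.
```

If the same Prometheus also scrapes other blackbox modules (HTTP probes,
for example), you would probably want to add a label matcher to the
expression so that it only covers the ICMP targets.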
Also, does the rule group have a non-default rule evaluation interval, or
have you globally set the default evaluation interval?
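For reference, in a standalone Prometheus setup these two settings live in
different files. This is a sketch with illustrative values and a
hypothetical rules file name (blackbox.rules.yml); the Prometheus Operator
exposes equivalent fields on its own resources, which are not shown here. A
group-level "interval", when set, overrides the global default, which is 1m
if evaluation_interval is left unset.

```yaml
# --- prometheus.yml (excerpt) ---
# Global default for every rule group that does not set its own interval.
# Prometheus uses 1m if evaluation_interval is omitted.
global:
  evaluation_interval: 1m

rule_files:
  - blackbox.rules.yml   # hypothetical file name
---
# --- blackbox.rules.yml (a separate file) ---
groups:
  - name: blackbox.rules.icmpFailed
    interval: 30s        # illustrative per-group override of evaluation_interval
    rules:
      - alert: BlackboxIcmpFailed
        expr: probe_success == 0
        for: 5m
```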
On Thursday, 19 December 2024 at 12:48:34 UTC Steven Relf wrote:
> Hey,
>
> Having an interesting issue with Prom and Alertmanager. I'm 99% sure it's
> a config issue, but I'm having a hard time figuring it out.
>
> We have a group of probes that use the blackbox exporter to ping some
> endpoints, once every 30 seconds.
>
> The rule looks like this
>
> - name: blackbox.rules.icmpFailed
>   rules:
>   - alert: BlackboxIcmpFailed
>     expr: probe_icmp_duration_seconds == 0
>     for: 5m
>     labels:
>       severity: critical
>     annotations:
>       summary: Ping to Device Failed.
>
> And our Alertmanager config looks like this:
>
> spec:
>   route:
>     groupBy: [ 'instance','severity' ]
>     groupWait: 30s
>     groupInterval: 5m
>     repeatInterval: 12h
>
> Now here is what I am seeing.
>
> If we have a single ping failure, an alert message is sent to Slack, and
> it then clears on the next 5-minute cycle.
>
> I thought "for: 5m" meant that an alert is ONLY sent if the condition has
> held for 5 consecutive minutes. As you can imagine, this leads to lots of
> angst :D
>
> Any ideas?
>
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/prometheus-users/46dfd7bd-4c46-40e7-810d-ffc347e7b28bn%40googlegroups.com.