As a starting point, enter the expression "probe_icmp_duration_seconds == 0" 
into the expression browser in the Prometheus web UI, switch to the graph 
view, and zoom in on the time range where the alert fired. What do you see?

One possible issue is a timeseries that appears and disappears in the result 
of that expression; note that 5 minutes happens to be the default lookback 
period (--query.lookback-delta), after which a series with no new samples is 
no longer returned.  Another problem is that a single scrape returns several 
probe_icmp_duration_seconds series, one per "phase" label value:

probe_icmp_duration_seconds{phase="resolve"} 1.5765e-05
probe_icmp_duration_seconds{phase="rtt"} 0
probe_icmp_duration_seconds{phase="setup"} 8.742e-05

For both these reasons, it would be safer to use probe_success == 0 as your 
alerting expression.  If the problem still exists with that, it should be 
easier to debug.
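
As a rough sketch, the rule from your message with only the expression 
swapped would look like the block below. I'm assuming a plain Prometheus rule 
file here (hence the top-level "groups:" key); if the rule lives in a 
Prometheus Operator PrometheusRule object, the same group goes under 
"spec.groups" instead.

```yaml
groups:
  - name: blackbox.rules.icmpFailed
    rules:
      - alert: BlackboxIcmpFailed
        # probe_success is a single series per target:
        # 1 if the probe succeeded, 0 if it failed.
        expr: probe_success == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Ping to Device Failed.
```

Note that without a label selector this matches every blackbox probe, not 
only the ICMP ones, so you may want to restrict it by job or module label to 
suit your setup.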

Also, does the rule group have a non-default rule evaluation interval, or 
have you globally set the default evaluation interval?
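
For reference, these are the two places where the interval can be set in a 
plain Prometheus setup. The values shown are only illustrative, not a 
recommendation, and the file names in the comments are assumptions about 
your layout.

```yaml
# prometheus.yml (main config)
global:
  # Default for all rule groups; Prometheus uses 1m if this is unset.
  evaluation_interval: 1m
---
# rule file
groups:
  - name: blackbox.rules.icmpFailed
    # Optional per-group override of the global evaluation interval.
    interval: 30s
    rules:
      - alert: BlackboxIcmpFailed
        expr: probe_success == 0
        for: 5m
```

The "for" condition is only checked when the rule is evaluated, so an 
interval that is long relative to "for: 5m" changes how many consecutive 
evaluations the alert must stay active before it fires.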

On Thursday, 19 December 2024 at 12:48:34 UTC Steven Relf wrote:

> Hey,
>
> Having an interesting issue with Prom and Alert manager, im 99% sure its a 
> config issue, but having a hard time figuring it out.
>
> We have a group of polls that use the blackbox exporter to ping some 
> endpoints. It pings once every 30 seconds. 
>
> The rule looks like this
>
> - name: blackbox.rules.icmpFailed
>   rules:
>     - alert: BlackboxIcmpFailed
>       expr: probe_icmp_duration_seconds == 0
>       for: 5m
>       labels:
>         severity: critical
>       annotations:
>         summary: Ping to Device Failed.
>
> And our alert manager config look like this
>
> spec:
>   route:
>     groupBy: [ 'instance','severity' ]
>     groupWait: 30s
>     groupInterval: 5m
>     repeatInterval: 12h
>
> Now here is what I am seeing.
>
> If we have a single ping failure then an alert message is sent to slack, 
> which immediately clears on the next 5 min cycle. 
>
> I thought having the  "for: 5m" should mean that an alert is ONLY sent if 
> that condition has been seen for 5 mins consecutively. As you can imagine 
> this leads to lots of angst :D
>
> Any ideas? 
>
