Brian, you, sir, are a gentleman and a scholar. I have removed that rule, as I already have the other rule you suggested in place, so there is no need for the bouncy one.
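
For reference, a rule using the probe_success == 0 expression Brian suggested could look roughly like the following. It is only a sketch, not the exact config in use: the group and alert names are hypothetical, the 5m hold and labels are carried over from the original rule, and the top-level "groups:" key assumes a plain Prometheus rule file (under the Prometheus Operator the same list would sit under spec.groups of a PrometheusRule).

```yaml
groups:
  - name: blackbox.rules.probeFailed    # hypothetical group name
    rules:
      - alert: BlackboxProbeFailed      # hypothetical alert name
        # probe_success is one series per target: 1 on success, 0 on failure,
        # so there is no per-phase label to match by accident.
        expr: probe_success == 0
        # The expression must return a result at every rule evaluation
        # for 5 minutes before the alert moves from pending to firing.
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Ping to Device Failed.
```

Because probe_success is exported on every scrape whether the ping works or not, the series should not appear and disappear the way a "== 0" match on a single phase of probe_icmp_duration_seconds can.
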
Rgds
Steve.

On Thu, 19 Dec 2024 at 13:33, 'Brian Candler' via Prometheus Users <[email protected]> wrote:

> As a starting point, put the expression "probe_icmp_duration_seconds == 0"
> into the PromQL web browser in Prometheus, and zoom into the expected time
> area. What do you see?
>
> One possible issue is if the timeseries appears and disappears; 5 minutes
> happens to be the default staleness interval (lookback-delta). Another
> problem is that from a single scrape, probe_icmp_duration_seconds has
> multiple values with different labels:
>
> probe_icmp_duration_seconds{phase="resolve"} 1.5765e-05
> probe_icmp_duration_seconds{phase="rtt"} 0
> probe_icmp_duration_seconds{phase="setup"} 8.742e-05
>
> For both these reasons, it would be safer to use probe_success == 0 as
> your alerting expression. If the problem still exists with that, it should
> be easier to debug.
>
> Also, does the rule group have a non-default rule evaluation interval, or
> have you globally set the default evaluation interval?
>
> On Thursday, 19 December 2024 at 12:48:34 UTC Steven Relf wrote:
>
>> Hey,
>>
>> Having an interesting issue with Prom and Alertmanager. I'm 99% sure it's
>> a config issue, but having a hard time figuring it out.
>>
>> We have a group of polls that use the blackbox exporter to ping some
>> endpoints. It pings once every 30 seconds.
>>
>> The rule looks like this
>>
>> - name: blackbox.rules.icmpFailed
>>   rules:
>>   - alert: BlackboxIcmpFailed
>>     expr: probe_icmp_duration_seconds == 0
>>     for: 5m
>>     labels:
>>       severity: critical
>>     annotations:
>>       summary: Ping to Device Failed.
>>
>> And our Alertmanager config looks like this
>>
>> spec:
>>   route:
>>     groupBy: [ 'instance','severity' ]
>>     groupWait: 30s
>>     groupInterval: 5m
>>     repeatInterval: 12h
>>
>> Now here is what I am seeing.
>>
>> If we have a single ping failure then an alert message is sent to Slack,
>> which immediately clears on the next 5 min cycle.
>>
>> I thought having the "for: 5m" should mean that an alert is ONLY sent if
>> that condition has been seen for 5 mins consecutively. As you can imagine
>> this leads to lots of angst :D
>>
>> Any ideas?
>>
>> This email contains information which is private and confidential, all
>> commercial rights to the details included are owned exclusively by Nscale.
>> Disclosure without written permission is strictly prohibited. If you have
>> received this email in error, please inform me as soon as possible.
>>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Prometheus Users" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/prometheus-users/uFK2aZdRT2A/unsubscribe
> .
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To view this discussion visit
> https://groups.google.com/d/msgid/prometheus-users/46dfd7bd-4c46-40e7-810d-ffc347e7b28bn%40googlegroups.com
> .

