Hey.
I have some simple alerts like:
- alert: node_upgrades_non-security_apt
  expr: 'sum by (instance,job) (
    apt_upgrades_pending{origin!~"(?i)^.*-security(?:\\PL.*)?$"} )'
- alert: node_upgrades_security_apt
  expr: 'sum by (instance,job) (
    apt_upgrades_pending{origin=~"(?i)^.*-security(?:\\PL.*)?$"} )'
If there are no upgrades, these return no value.
The same goes for all my other simple alerts, e.g. the one for free disk space:
1 - node_filesystem_avail_bytes{mountpoint="/", fstype!="rootfs",
instance!~"(?i)^.*\\.garching\\.physik\\.uni-muenchen\\.de$"} /
node_filesystem_size_bytes > 0.80
No value => all ok, some value => alert.
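For completeness, here is a sketch of the disk-space expression written as a rule in the same style as the apt alerts. The group name and the alert name are hypothetical (the expression above has no rule around it), and the expression itself is unchanged apart from being single-quoted like the others. Because a PromQL comparison without `bool` simply drops every series that doesn't satisfy it, an empty result means the alert is inactive:

```yaml
groups:
  - name: node  # hypothetical group name
    rules:
      # Hypothetical alert name; the expression is the one from above.
      # Series at or below 80% usage are filtered out by "> 0.80",
      # so no result means no alert.
      - alert: node_filesystem_root_almost_full
        expr: '1 - node_filesystem_avail_bytes{mountpoint="/", fstype!="rootfs", instance!~"(?i)^.*\\.garching\\.physik\\.uni-muenchen\\.de$"} / node_filesystem_size_bytes > 0.80'
```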
I do have some instances which are pretty unstable (i.e. scraping fails
every now and then, or more often than that), but which are mostly out of
my control, so I cannot do anything about that.
When such a target goes down, the alert clears, and as soon as the target is
back, the alert fires again, sending a fresh notification.
Now I've seen:
https://github.com/prometheus/prometheus/pull/11827
which describes keep_firing_for as "the minimum amount of time that an
alert should remain firing, after the expression does not return any
results", and the documentation at
https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/#rule
describes it as:
# How long an alert will continue firing after the condition that triggered it
# has cleared.
[ keep_firing_for: <duration> | default = 0s ]
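A minimal sketch of where keep_firing_for would go, assuming a Prometheus version recent enough to include the PR above. The group name and the 15m value are arbitrary examples. It is set per rule (next to `expr`), but within that rule it applies whenever the expression stops returning results, whatever the reason:

```yaml
groups:
  - name: node  # hypothetical group name
    rules:
      - alert: node_upgrades_security_apt
        expr: 'sum by (instance,job) (apt_upgrades_pending{origin=~"(?i)^.*-security(?:\\PL.*)?$"})'
        # Keep the alert firing for 15m (arbitrary example value) after the
        # expression stops returning results. Per the docs quoted above, this
        # does not distinguish a failed scrape from a genuine return to OK.
        keep_firing_for: 15m
```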
But AFAIU that would apply in every case where the alert clears, i.e. it
wouldn't just keep firing when the scrape failed, but also when the target
actually went back to an OK state, right?
IMO, however, that's rather undesirable.
Similarly, when a node goes completely down (for maintenance or so) and then
comes back up, all its alerts would start firing again (since even a generous
keep_firing_for would have been exceeded) and send new notifications.
Is there any way to solve this? In particular, so that no new notifications
are sent when the alert never really stopped?
At least I don't see how keep_firing_for would achieve this.
Thanks,
Chris.
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/df6e33c7-b621-4f93-b265-7aad0802ffffn%40googlegroups.com.