Hey.

I have some simple alerts like:
    - alert: node_upgrades_non-security_apt
      expr:  'sum by (instance,job) ( apt_upgrades_pending{origin!~"(?i)^.*-security(?:\\PL.*)?$"} )'
    - alert: node_upgrades_security_apt
      expr:  'sum by (instance,job) ( apt_upgrades_pending{origin=~"(?i)^.*-security(?:\\PL.*)?$"} )'

If there are no upgrades, these return no value.
Similarly, for all other simple alerts, like free disk space:
    1 - node_filesystem_avail_bytes{mountpoint="/", fstype!="rootfs", instance!~"(?i)^.*\\.garching\\.physik\\.uni-muenchen\\.de$"} / node_filesystem_size_bytes  >  0.80

No value => all ok, some value => alert.

I do have some instances which are pretty unstable (i.e. scraping fails 
every now and then, or more often than that). They are mostly out of my 
control, so I cannot do anything about that.

When the target goes down, the alert clears and as soon as it's back, it 
pops up again, sending a fresh alert notification.

Now I've seen:
https://github.com/prometheus/prometheus/pull/11827
which describes keep_firing_for as "the minimum amount of time that an 
alert should remain firing, after the expression does not return any 
results", and correspondingly in 
https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/#rule
:
# How long an alert will continue firing after the condition that triggered it
# has cleared.
[ keep_firing_for: <duration> | default = 0s ]

But AFAIU that would simply apply in all cases, i.e. the alert wouldn't just 
keep firing when the scraping failed, but also when it actually goes back to 
an ok state, right?
IMO that's rather undesirable.
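
For concreteness, this is roughly how I'd expect keep_firing_for to be attached 
to one of the rules above. It's only a sketch: the group name and the 15m 
duration are made up for illustration, and I haven't verified that it behaves 
the way I describe.

```yaml
groups:
  - name: apt_upgrades   # hypothetical group name
    rules:
      - alert: node_upgrades_security_apt
        expr: 'sum by (instance,job) ( apt_upgrades_pending{origin=~"(?i)^.*-security(?:\\PL.*)?$"} )'
        # Assumed value. As I understand it, the alert stays firing for this
        # long after expr stops returning results, regardless of whether that
        # was a failed scrape or a genuine return to an ok state.
        keep_firing_for: 15m
```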

Similarly, when a node goes completely down (maintenance or so) and then comes 
up again, all its alerts would start firing again (since even a generous 
keep_firing_for would have been exceeded)... and send new notifications.


Is there any way to solve this? In particular, so that no new notifications 
are sent when the alert condition never really went away?

At least I don't see how keep_firing_for would do this.

Thanks,
Chris.
