> What can be done?

Perhaps the alert condition resolved very briefly. The solution with modern versions of Prometheus (v2.42.0 <https://github.com/prometheus/prometheus/releases/v2.42.0> or later) is to do this:

    for: 2d
    keep_firing_for: 10m

The alert won't be resolved unless it has been *continuously* absent for 10 minutes. (Of course, this means your "resolved" notifications will be delayed by 10 minutes - but that's basically the whole point, don't send them until you're sure they're not going to retrigger)

The other alternative is simply to turn off resolved notifications entirely. This approach sounds odd but has a lot to recommend it:

https://www.robustperception.io/running-into-burning-buildings-because-the-fire-alarm-stopped
https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/edit
https://blog.cloudflare.com/alerts-observability

The point is that if a problem occurred which was serious enough to alert on, then it requires investigation before the case can be "closed": either there's an underlying problem, or if it was a false positive then the alert condition needs tuning. Sending a resolved message encourages laziness ("oh, it fixed itself, no further work required").

Also, turning off resolved messages instantly reduces your notifications by 50%.

On Saturday 18 May 2024 at 19:50:32 UTC+1 Sarah Dundras wrote:

> Hi, this problem is driving me mad:
>
> I am monitoring backups that log their backup results to a textfile. It is
> being picked up and all is well, also the alert are ok, BUT! Alertmanager
> frequently sends out odd "resolved" notifications although the firing
> status never changed!
>
> Here's such an alert rule that does this:
>
> - alert: Restic Prune Freshness
>   expr: restic_prune_status{uptodate!="1"} and restic_prune_status{alerts!="0"}
>   for: 2d
>   labels:
>     topic: backup
>     freshness: outdated
>     job: "{{ $labels.restic_backup }}"
>     server: "{{ $labels.server }}"
>     product: veeam
>   annotations:
>     description: "Restic Prune for '{{ $labels.backup_name }}' on host '{{ $labels.server_name }}' is not up-to-date (too old)"
>     host_url: "https://backups.example.com/d/3be21566-3d15-4238-a4c5-508b059dccec/restic?orgId=2&var-server_name={{ $labels.server_name }}&var-result=0&var-backup_name=All"
>     service_url: "https://backups.example.com/d/3be21566-3d15-4238-a4c5-508b059dccec/restic?orgId=2&var-server_name=All&var-result=0&var-backup_name={{ $labels.backup_name }}"
>     service: "{{ $labels.job_name }}"
>
> What can be done?
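
For reference, here is a rough sketch of how the quoted rule might look with keep_firing_for added. The group name is a placeholder I made up, the labels and annotations are trimmed for brevity, and it assumes Prometheus v2.42.0 or later; treat it as an illustration rather than a drop-in file:

```yaml
groups:
  - name: restic-backups            # hypothetical group name, not from the original rule file
    rules:
      - alert: Restic Prune Freshness
        expr: restic_prune_status{uptodate!="1"} and restic_prune_status{alerts!="0"}
        for: 2d                     # condition must hold for 2 days before the alert fires
        keep_firing_for: 10m        # stay firing until the condition has been absent for 10 minutes
        labels:
          topic: backup
          freshness: outdated
        annotations:
          description: "Restic Prune for '{{ $labels.backup_name }}' on host '{{ $labels.server_name }}' is not up-to-date (too old)"
```

Running "promtool check rules" against the rule file before reloading should catch indentation mistakes or an unsupported field on an older Prometheus.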

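
The other alternative mentioned above, turning off resolved notifications, is set per receiver integration in the Alertmanager config via send_resolved. A minimal sketch, assuming a webhook receiver; the receiver name and URL are made up for illustration, and the default value of send_resolved differs between integration types, so check the one you actually use:

```yaml
receivers:
  - name: backup-team                   # hypothetical receiver name
    webhook_configs:
      - url: https://example.com/hook   # hypothetical endpoint
        send_resolved: false            # do not send "resolved" notifications for this receiver
```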
