Monitoring for a metric vanishing is not a very good way to do alerting.
Metrics hang around for the "staleness" interval, which by default is 5
minutes. Ideally, you should monitor everything you care about
explicitly, expose a success metric like "up" (1 = working, 0 = not working),
and then alert on "up == 0" or equivalent. This is much more flexible and
timely.
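
For example, if each container is scraped as its own target (directly or
via an exporter), a rule along these lines alerts per failing target and
carries the target's labels into the notification. The group name, alert
name, and durations here are just illustrative:

    - name: targets
      rules:
      - alert: target_down
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} down"
          description: "{{ $labels.instance }} has been down for more than 1 minute."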

Having said that, there's a quick and dirty hack that might be good enough 
for you:

    expr: container_memory_usage_bytes offset 10m unless container_memory_usage_bytes

This will give you an alert if any metric container_memory_usage_bytes 
existed 10 minutes ago but does not exist now. The alert will resolve 
itself after 10 minutes.

The result of this expression is a vector, so it can alert on multiple
containers at once; each element of the vector carries the container name
in its "name" label.
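
Wrapped into a complete rule in the same shape as yours (the group and
alert names are illustrative), with the container name pulled into the
annotations:

    - name: containers
      rules:
      - alert: container_down
        expr: container_memory_usage_bytes offset 10m unless container_memory_usage_bytes
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.name }} down"
          description: "Container {{ $labels.name }} has been absent for more than 30 seconds."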

On Saturday 18 May 2024 at 19:50:48 UTC+1 Sleep Man wrote:

> I have a large number of containers. I learned that the following 
> configuration can monitor a single container down. How to configure it to 
> monitor all containers and send the container name once a container is down.
>
>
> - name: containers
>   rules:
>   - alert: jenkins_down
>     expr: absent(container_memory_usage_bytes{name="jenkins"})
>     for: 30s
>     labels:
>       severity: critical
>     annotations:
>       summary: "Jenkins down"
>       description: "Jenkins container is down for more than 30 seconds."
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/17dd75ea-16d6-4e2d-bdc1-3d2bb345c4fan%40googlegroups.com.