On 07.02.23 05:57, 'George Robinson' via Prometheus Developers wrote:
>
> While I appreciate the responsibility of writing correct templates is on
> the user, I have also been considering whether Alertmanager should be more
> tolerant of template errors, and attempt to send some kind of notification
> when this happens. For example, falling back to the default template that
> we have high confidence of being correct.
I think that makes sense. The fall-back template could call out very explicitly that the intended template failed to expand and that you are therefore getting a replacement, maybe even including the error message from the attempt to expand the original template.

But I'm not really an Alertmanager expert. And despite having a lot of historical context about Prometheus in general, I don't remember anything specific about error handling in alert templates. I only remember that trying out an alert "in production" is really hard since you need to trigger it. And if the moment you notice that your template doesn't work is also the moment when your alert is supposed to fire, that's really bad.

So better test tooling might help here, but even if we had that, I think there should be a safe fall-back so that no alert is ever swallowed because of a templating error.

--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03

