This seems to have taken about a week to appear on the list. In the meantime,
I have come up with the following:
- alert: outboundSocketCountChange
  expr: ((count({__name__=~"tcpsocket(.+)Inbound"} offset 30s) -
    count({__name__=~"tcpsocket(.+)Inbound"})) != bool 0) == 1
  labels:
    severity: critical
  annotations:
    summary: OB socket count has changed
This does what I need, but it makes me think I do not really understand how
expr works in Prometheus rules. Is it something that simply evaluates to
either 1 or 'true', as a Go bool type?
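
If I am reading the docs right, an alerting rule fires for every series that
expr returns, whatever its value, and the bool modifier makes a comparison
return 0 or 1 instead of dropping the series that do not match. That would
explain why the original rule never resolved: it always returned one series,
with value 0 or 1. On that assumption, dropping bool should be enough. A
sketch I have not tested, with the metric names and interval unchanged from
the rule above:

```yaml
- alert: outboundSocketCountChange
  # Without "bool", != acts as a filter: the expression returns a series
  # only while the two counts differ, and returns nothing once they match.
  expr: (count({__name__=~"tcpsocket(.+)Inbound"} offset 30s) -
    count({__name__=~"tcpsocket(.+)Inbound"})) != 0
  labels:
    severity: critical
  annotations:
    summary: OB socket count has changed
```

If that holds, the trailing == 1 in my version works only because it filters
out the 0 result again.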
c
On Friday, 13 December 2024 at 08:49:33 UTC cam wrote:
> Hello all,
>
> I have a rule which is trying to count time series that match a certain
> regexp and spot when this changes, to raise an alert more or less
> immediately (i.e. no for clause). This is counting a custom socket count
> metric that we need to catch any changes in.
>
> - alert: outboundSocketCountChange
>   expr: (count({__name__=~"tcpsocket(.+)Inbound"} offset 30s) -
>     count({__name__=~"tcpsocket(.+)Inbound"})) != bool 0
>   labels:
>     severity: critical
>   annotations:
>     summary: OB socket count has changed
>
> It triggers fine when the value changes but it appears to then be stuck in
> firing, rather than resolving when the next evaluation window completes.
> Graphing the promQL shows exactly what I would expect - a single spike to 1
> when the value changes and then back to zero. I would expect the alert to
> clear when it hits that zero.
>
> Scrape and evaluation intervals are both set to 15s. Prom v2.45.
>
> Am I missing something here?
>