> I do not really understand how expr works in prom rules - is it something
> that simply evaluates to either 1 or 'true' as a go bool type?
No. It's not boolean logic at all.
PromQL works with *vectors*: a vector contains zero or more values, each
with a distinct set of labels. An alert fires whenever the vector is
non-empty, regardless of the value. That is, a value of 0 triggers an alert
just as much as a value of 1000. It's the presence or absence of a value
which controls alerting.
Take, for example, the PromQL query "foo". It might return the following,
all current values of metric foo:
foo{instance="aaa"} 7
foo{instance="bbb"} 3
foo{instance="ccc"} 1
That's a vector with three values.
Now take the PromQL query "foo > 2". It returns a vector with 2 values:
foo{instance="aaa"} 7
foo{instance="bbb"} 3
If you use "foo > 2" as an alerting expression, then you'll have two alerts
firing. If the value of foo{instance="bbb"} drops to 2 or less, then the
alerting expression returns an instant vector with only one value, so the
bbb alert resolves, but the aaa alert continues.
This is the reason why "resolved" messages show the most recent value which
triggered the alert, not the current (non-alerting) value. The current
value is below the threshold, so is filtered out entirely from the PromQL
results.
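This is also why the original "!= bool 0" rule never resolved. With the "bool" modifier a comparison no longer filters: it returns 0 or 1 for every element, so the result is never empty. A small sketch, assuming the same three foo series as above:

```promql
# Assuming foo{instance="aaa"} 7, foo{instance="bbb"} 3, foo{instance="ccc"} 1
foo > 2        # filter: keeps aaa (7) and bbb (3), drops ccc entirely
foo > bool 2   # no filtering: {instance="aaa"} 1, {instance="bbb"} 1, {instance="ccc"} 0
```

Used as an alerting expression, the second form would fire for all three instances, ccc included, because its value of 0 is still a value.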
Now, an expression like count({__name__=~"tcpsocket(.+)Inbound"}) also
gives a vector as its result. If there are no timeseries inside the
parentheses, then it is the empty vector. If there are one or more
timeseries, then you get a single-element vector containing a single value
(which is the count of timeseries) and an empty label set. You can try
this for yourself in the PromQL query browser:
count({__name__=~"blah_nonexistent(.*)"}) # empty result
count({__name__=~"node_filesystem(.*)"}) # {} 1234, where {} means "empty label set"
Now, when you do a binary operation between two vector values, by default
the result vector has one entry for every label set which matches exactly
between the LHS and RHS vectors. Any label set on the LHS which is not
matched on the RHS, or vice versa, is discarded and gives no value in the
result vector. But in this case, since the LHS and RHS will (almost)
always have a single entry with empty label set, it will match.
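To illustrate the default matching, here is a sketch that reuses the three foo series from above together with a hypothetical metric bar (bar and its values are made up for this example):

```promql
# Assumed data:
#   foo{instance="aaa"} 7      bar{instance="aaa"} 2
#   foo{instance="bbb"} 3      bar{instance="ddd"} 5
#   foo{instance="ccc"} 1
foo + bar
# Only instance="aaa" exists on both sides, so the result has one element:
#   {instance="aaa"} 9
# bbb, ccc and ddd have no partner on the other side and are discarded.
```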
Therefore, what I think you want is simply:
expr: count({__name__=~"tcpsocket(.+)Inbound"} offset 30s) !=
count({__name__=~"tcpsocket(.+)Inbound"})
That should do what you want *unless* __name__=~"tcpsocket(.+)Inbound"
matches no timeseries at all, in which case the vector will be empty (on
either the LHS or the RHS) and therefore the count() will be empty, and
there's nothing to match to the other side. If this is an important case
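for you then you can fake up a vector with empty labels on each side (the
parentheses are needed because "or" binds less tightly than "!="):
expr: (count({__name__=~"tcpsocket(.+)Inbound"} offset 30s) or vector(0)) !=
(count({__name__=~"tcpsocket(.+)Inbound"}) or vector(0))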
Again, PromQL's "or" operator doesn't behave like a boolean expression. What
"or" does is to match the vectors on the LHS and the RHS:
- for any value on the LHS, use the value and label set from the LHS in the
result (whether or not it matches something in the RHS)
- for any value on the RHS whose label set does not exist in the LHS, add
it to the result.
vector(0) is a static value: an instant vector containing one element whose
label set is empty, with value 0. So if the expression on the LHS of "or"
doesn't contain an element with an empty label set, "... or vector(0)" adds
one to the result with value 0, and a missing count() is then treated as a
count of 0.
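As a quick check of that behaviour, the two count() queries from earlier can be extended with "or vector(0)" in the query browser (the 1234 is only an illustrative count, as before):

```promql
count({__name__=~"blah_nonexistent(.*)"}) or vector(0)   # {} 0     (LHS is empty, so the RHS element is added)
count({__name__=~"node_filesystem(.*)"}) or vector(0)    # {} 1234  (LHS already has the empty label set, so it wins)
```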
On Friday, 13 December 2024 at 09:39:02 UTC cam wrote:
> This took about a week to appear on the list? Meantime, I have come up
> with the following..
>
> - alert: outboundSocketCountChange
> expr: ((count({__name__=~"tcpsocket(.+)Inbound"} offset 30s) -
> count({__name__=~"tcpsocket(.+)Inbound"})) != bool 0) == 1
>
> labels:
> severity: critical
> annotations:
> summary: OB socket count has changed
>
> This does what I need but it makes me think I do not really understand how
> expr works in prom rules - is it something that simply evaluates to either
> 1 or 'true' as a go bool type?
>
> c
>
> On Friday, 13 December 2024 at 08:49:33 UTC cam wrote:
>
>> Hello all,
>>
>> I have a rule which is trying to count time series that match a certain
>> regexp and spot when this changes, to raise an alert more or less
>> immediately (i.e. no for clause). This is counting a custom socket count
>> metric that we need to catch any changes in.
>>
>> - alert: outboundSocketCountChange
>> expr: (count({__name__=~"tcpsocket(.+)Inbound"} offset 30s) -
>> count({__name__=~"tcpsocket(.+)Inbound"})) != bool 0
>> labels:
>> severity: critical
>> annotations:
>> summary: OB socket count has changed
>>
>> It triggers fine when the value changes but it appears to then be stuck
>> in firing, rather than resolving when the next evaluation window completes.
>> Graphing the promQL shows exactly what I would expect - a single spike to 1
>> when the value changes and then back to zero. I would expect the alert to
>> clear when it hits that zero.
>>
>> Scrape and evaluation intervals are both set to 15s. Prom v2.45.
>>
>> Am I missing something here?
>>
>