This is incredibly helpful, thanks for taking the time to write it. I don't
think there is anything like this level of description of how expr works in
the docs, but I may have missed it.

You also correctly anticipated that the missing-time-series scenario was an
issue for me in this work, so thanks for that too.

cam

On Fri, 13 Dec 2024 at 12:00, 'Brian Candler' via Prometheus Users <
[email protected]> wrote:

> > I do not really understand how expr works in prom rules - is it
> something that simply evaluates to either 1 or 'true' as a go bool type?
>
> No. It's not boolean logic at all.
>
> PromQL works with *vectors*: a vector contains zero or more values, each
> with a distinct set of labels. An alert fires whenever the vector is
> non-empty, regardless of the value. That is, a value of 0 triggers an alert
> just as much as a value of 1000. It's the presence or absence of a value
> which controls alerting.
>
> Take, for example, the promql query "foo". It might return the following,
> all current values of metric foo:
>
> foo{instance="aaa"} 7
> foo{instance="bbb"} 3
> foo{instance="ccc"} 1
>
> That's a vector with three values.
>
> Now take the promql query "foo > 2". It returns a vector with 2 values:
>
> foo{instance="aaa"} 7
> foo{instance="bbb"} 3
>
> If you use "foo > 2" as an alerting expression, then you'll have two
> alerts firing.  If the value of foo{instance="bbb"} drops to 2 or less,
> then the alerting expression returns an instant vector with only one value,
> so the bbb alert resolves, but the aaa alert continues.
>
> This is the reason why "resolved" messages show the most recent value
> which triggered the alert, not the current (non-alerting) value. The
> current value is below the threshold, so is filtered out entirely from the
> PromQL results.
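
A minimal sketch in Python (not PromQL) of the filtering behaviour described above. It assumes an instant vector can be modelled as a dict from label sets to values; the helper names greater_than and firing_instances are made up for illustration and are not part of Prometheus.

```python
# Model an instant vector as a dict: frozenset of (label, value) pairs -> sample value.
foo = {
    frozenset({("instance", "aaa")}): 7,
    frozenset({("instance", "bbb")}): 3,
    frozenset({("instance", "ccc")}): 1,
}


def greater_than(vector, threshold):
    """Mimic `vector > threshold`: keep the elements that pass, drop the rest."""
    return {labels: value for labels, value in vector.items() if value > threshold}


def firing_instances(vector):
    """One alert per element in the result, regardless of the element's value."""
    return sorted(dict(labels)["instance"] for labels in vector)


print(firing_instances(greater_than(foo, 2)))  # ['aaa', 'bbb']

# bbb drops to 2: it is filtered out of the result entirely, so its alert resolves.
foo[frozenset({("instance", "bbb")})] = 2
print(firing_instances(greater_than(foo, 2)))  # ['aaa']
```

The comparison never produces a "false" element; an element is either in the result or absent, which is why a result with value 0 would still fire.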
>
> Now, an expression like count({__name__=~"tcpsocket(.+)Inbound"}) also
> gives a vector as its result. If there are no timeseries inside the
> parentheses, then it is the empty vector. If there are one or more
> timeseries, then you get a single-element vector containing a single value
> (which is the count of timeseries) and an empty label set.  You can try
> this for yourself in the PromQL query browser:
>
> count({__name__=~"blah_nonexistent(.*)"})   # empty result
> count({__name__=~"node_filesystem(.*)"})    # {} 1234, where {} means "empty label set"
>
> Now, when you do a binary operation between two vector values, by default
> the result vector has one entry for every label set which matches exactly
> between the LHS and RHS vectors. Any label set on the LHS which is not
> matched on the RHS, or vice versa, is discarded and gives no value in the
> result vector.  But in this case, since the LHS and RHS will (almost)
> always have a single entry with empty label set, it will match.
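
A rough Python model (not PromQL) of the two behaviours just described: count() collapsing to a single element with an empty label set, and a binary operator keeping only label sets present on both sides. The function names are hypothetical, and the model ignores on()/ignoring() and group modifiers.

```python
def count(vector):
    """Mimic count(): empty input gives empty output, otherwise one element
    with an empty label set whose value is the number of input series."""
    if not vector:
        return {}
    return {frozenset(): len(vector)}


def not_equal(lhs, rhs):
    """Mimic `lhs != rhs` with default one-to-one matching: an element survives
    only if the same label set exists on both sides and the values differ."""
    return {
        labels: value
        for labels, value in lhs.items()
        if labels in rhs and value != rhs[labels]
    }


def series(n):
    """Build n distinct dummy series (hypothetical label name 'id')."""
    return {frozenset({("id", str(i))}): 1 for i in range(n)}


print(not_equal(count(series(3)), count(series(4))))  # one element: counts differ
print(not_equal(count(series(4)), count(series(4))))  # empty: counts are equal
print(not_equal(count(series(0)), count(series(4))))  # empty: nothing on the LHS to match
```

The last case is the one to watch: when one side has no series at all, the comparison has nothing to match and returns nothing, so no alert fires.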
>
> Therefore, what I think you want is simply:
>
> expr: count({__name__=~"tcpsocket(.+)Inbound"} offset 30s) !=
> count({__name__=~"tcpsocket(.+)Inbound"})
>
> That should do what you want *unless* __name__=~"tcpsocket(.+)Inbound"
> matches no timeseries at all, in which case the vector will be empty (on
> either the LHS or the RHS) and therefore the count() will be empty, and
> there's nothing to match to the other side.  If this is an important case
> for you then you can fake up a vector with empty labels:
>
> expr: (count({__name__=~"tcpsocket(.+)Inbound"} offset 30s) or vector(0)) !=
> (count({__name__=~"tcpsocket(.+)Inbound"}) or vector(0))
>
> Again, PromQL's "or" operator doesn't behave like a boolean expression. What
> "or" does is to match the vectors on the LHS and the RHS:
> - for any value on the LHS, use the value and label set from the LHS in
> the result (whether or not it matches something in the RHS)
> - for any value on the RHS whose label set does not exist in the LHS,
> add it to the result.
>
> vector(0) is a static value: an instant vector containing one element
> whose label set is empty with value 0.  So if count() returns an empty
> vector, "... or vector(0)" substitutes an element with empty label set and
> value 0. Each side of the "!=" then always has exactly one element, so the
> alert fires whenever the two counts differ, including when one of them is
> zero. The parentheses matter: "!=" binds more tightly than "or", so without
> them the "or vector(0)" would apply to the result of the comparison and the
> alert would fire on every evaluation.
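
The "or" rules above can be sketched in Python (not PromQL) as a union that prefers the LHS. This is a simplified model with hypothetical function names; it treats a vector as a dict from label sets to values and ignores on()/ignoring().

```python
def vector(value):
    """Mimic vector(v): a single element with an empty label set."""
    return {frozenset(): value}


def or_(lhs, rhs):
    """Mimic `lhs or rhs`: everything from the LHS, plus any RHS element
    whose label set is not already present on the LHS."""
    result = dict(lhs)
    for labels, value in rhs.items():
        result.setdefault(labels, value)
    return result


empty = {}
five = {frozenset(): 5}

print(or_(empty, vector(0)))  # the fallback element is added, value 0
print(or_(five, vector(0)))   # LHS already has the empty label set, so 5 is kept
```

Because the result of "or" is non-empty whenever either side is non-empty, using it as the outermost operator of an alert expression with vector(0) on the right yields a result on every evaluation.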
>
> On Friday, 13 December 2024 at 09:39:02 UTC cam wrote:
>
>> This took about a week to appear on the list? In the meantime, I have come
>> up with the following:
>>
>>   - alert: outboundSocketCountChange
>>     expr: ((count({__name__=~"tcpsocket(.+)Inbound"} offset 30s) -
>> count({__name__=~"tcpsocket(.+)Inbound"})) != bool 0) == 1
>>
>>     labels:
>>       severity: critical
>>     annotations:
>>       summary: OB socket count has changed
>>
>> This does what I need but it makes me think I do not really understand
>> how expr works in prom rules - is it something that simply evaluates to
>> either 1 or 'true' as a go bool type?
>>
>> c
>>
>> On Friday, 13 December 2024 at 08:49:33 UTC cam wrote:
>>
>>> Hello all,
>>>
>>> I have a rule which is trying to count time series that match a certain
>>> regexp and spot when this changes, to raise an alert more or less
>>> immediately (i.e. no for clause). This is counting a custom socket count
>>> metric that we need to catch any changes in.
>>>
>>>   - alert: outboundSocketCountChange
>>>     expr: (count({__name__=~"tcpsocket(.+)Inbound"} offset 30s) -
>>> count({__name__=~"tcpsocket(.+)Inbound"})) != bool 0
>>>     labels:
>>>       severity: critical
>>>     annotations:
>>>       summary: OB socket count has changed
>>>
>>> It triggers fine when the value changes but it appears to then be stuck
>>> in firing, rather than resolving when the next evaluation window completes.
>>> Graphing the promQL shows exactly what I would expect - a single spike to 1
>>> when the value changes and then back to zero. I would expect the alert to
>>> clear when it hits that zero.
>>>
>>> Scrape and evaluation intervals are both set to 15s. Prom v2.45.
>>>
>>> Am I missing something here?
>>>
>> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Prometheus Users" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/prometheus-users/AfVOhJ5rfOg/unsubscribe
> .
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To view this discussion visit
> https://groups.google.com/d/msgid/prometheus-users/77fed316-4283-4fc3-98d9-99bcf630e37bn%40googlegroups.com
> <https://groups.google.com/d/msgid/prometheus-users/77fed316-4283-4fc3-98d9-99bcf630e37bn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>


-- 
............................................................
[email protected]
