> But I was expecting an additional host=server2 tag on the ticket.

You won't get that, because CommonLabels is exactly how it sounds: those 
labels which are common to all the alerts in the group.  If one alert has 
instance=server1 and the other has instance=server2, but they're in the 
same alert group, then no 'instance' will appear in CommonLabels.
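
For example, with two alerts in one group (illustrative label values):

   alert 1: {alertname="DiskUsage", instance="server1:9100", team="support"}
   alert 2: {alertname="DiskUsage", instance="server2:9100", team="support"}

CommonLabels here is {alertname="DiskUsage", team="support"}: 'instance' is
dropped because its value differs between the two alerts.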

The documentation is here:
https://prometheus.io/docs/alerting/latest/notifications/

It looks like you could iterate over Alerts.Firing and then the Labels within 
each alert.
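
For example, here's a minimal, untested sketch, reusing the reReplaceAll call 
from your existing tags template (you'd keep your criteria/severity/team 
handling alongside it):

   tags: '{{ range .Alerts.Firing }}{{ with .Labels.instance }}{{ reReplaceAll "(.+):(.+)" "host=$1" . }},{{ end }}{{ end }}infra,monitor'

Note this can emit duplicate host= entries when several filesystems on the 
same host fire at once; I don't know whether Opsgenie deduplicates tags.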

Alternatively, you could disable grouping and let Opsgenie do the grouping 
(I don't know Opsgenie, so I don't know how good a job it would do of that).
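
Disabling grouping is the group_by: ["..."] setting from my earlier reply 
(quoted below):

   route:
     group_by: ["..."]

With that, each notification carries exactly one alert, so CommonLabels is 
just that alert's full label set (including instance), and your existing 
tags template would then produce a host= tag per ticket.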


On Wednesday 3 April 2024 at 09:11:24 UTC+1 mohan garden wrote:

> *Correction:*
> *Scenario 2:* While the server1 trigger is active, a second server's (say 
> server2) local disk usage reaches 50%.
>
> I see that the already-open Opsgenie ticket's details get updated as:
>
> ticket header name: local disk usage reached 50%
> ticket description: space on /var file system at server1:9100 server = 82%.
>                     space on /var file system at server2:9100 server = 80%.
> ticket tags: criteria: overuse, team: support, severity: critical, infra, monitor, host=server1
>
> [image: photo003.png]
>
>
>
> On Wednesday, April 3, 2024 at 1:37:12 PM UTC+5:30 mohan garden wrote:
>
>> Hi Brian, 
>> Thank you for the response. Here are some more details; I hope they give 
>> you a better understanding of the configuration and the method I am using 
>> to generate the tags:
>>
>>
>> 1. We collect data from the node exporter, and have created some rules 
>> around the collected data. Here is one example:
>>
>>     - alert: "Local Disk usage has reached 50%"
>>       expr: >
>>         (round(node_filesystem_avail_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
>>           / node_filesystem_size_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"} * 100, 0.1) >= y)
>>         and
>>         (round(node_filesystem_avail_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
>>           / node_filesystem_size_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"} * 100, 0.1) <= z)
>>       for: 5m
>>       labels:
>>         criteria: overuse
>>         severity: critical
>>         team: support
>>       annotations:
>>         summary: "{{ $labels.instance }}'s ({{ $labels.device }}) has low space."
>>         description: "space on {{ $labels.mountpoint }} file system at {{ $labels.instance }} server = {{ $value }}%."
>>
>> 2. At Alertmanager, we have created notification rules to notify when the 
>> aforementioned condition occurs:
>>
>>   smtp_from: '[email protected]'
>>   smtp_require_tls: false
>>   smtp_smarthost: '[email protected]:25'
>>
>> templates:
>>   - /home/ALERTMANAGER/conf/template/*.tmpl
>>
>> route:
>>   group_wait: 5m
>>   group_interval: 2h
>>   repeat_interval: 5h
>>   receiver: admin
>>   routes:
>>   - match_re:
>>       alertname: ".*Local Disk usage has reached .*%"
>>     receiver: admin
>>     routes:
>>     - match:
>>         criteria: overuse
>>         severity: critical
>>         team: support
>>       receiver: mailsupport
>>       continue: true
>>     - match:
>>         criteria: overuse
>>         team: support
>>         severity: critical
>>       receiver: opsgeniesupport
>>
>> receivers:
>>   - name: opsgeniesupport
>>     opsgenie_configs:
>>     - api_key: XYZ
>>       api_url: https://api.opsgenie.com
>>       message: '{{ .CommonLabels.alertname }}'
>>       description: "{{ range .Alerts }}{{ .Annotations.description }}\n\r{{ end }}"
>>       tags: '{{ range $k, $v := .CommonLabels }}{{ if or (eq $k "criteria") (eq $k "severity") (eq $k "team") }}{{ $k }}={{ $v }},{{ else if eq $k "instance" }}{{ reReplaceAll "(.+):(.+)" "host=$1" $v }},{{ end }}{{ end }},infra,monitor'
>>       priority: 'P1'
>>       update_alerts: true
>>       send_resolved: true
>> ...
>> So you can see that I derive a host=<hostname> tag from the instance 
>> label.
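>>
>> For example, with instance="server1:9100", the expression
>> {{ reReplaceAll "(.+):(.+)" "host=$1" "server1:9100" }} renders as
>> host=server1.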
>>
>>
>> *Scenario 1:* When server1's local disk usage reaches 50%, I see that an 
>> Opsgenie ticket is created with the following metadata:
>>
>> ticket header name: local disk usage reached 50%
>> ticket description: space on /var file system at server1:9100 server = 82%.
>> ticket tags: criteria: overuse, team: support, severity: critical, infra, monitor, host=server1
>>
>> So everything works as expected; no issues with Scenario 1.
>>
>>
>> *Scenario 2:* While the server1 trigger is active, a second server's (say 
>> server2) local disk usage reaches 50%.
>>
>> I see that the Opsgenie ticket gets updated as:
>>
>> ticket header name: local disk usage reached 50%
>> ticket description: space on /var file system at server1:9100 server = 82%.
>>                     space on /var file system at server2:9100 server = 80%.
>> ticket tags: criteria: overuse, team: support, severity: critical, infra, monitor, host=server1
>>
>>
>> But I was expecting an additional host=server2 tag on the ticket.
>> In summary: I see an updated description, but the tags are not updated.
>>
>> In the tags section of the Alertmanager-Opsgenie integration configuration, 
>> I had tried iterating over the Alerts and over CommonLabels, but I was 
>> unable to add the additional host=server2 tag:
>>
>>     {{ range $idx, $alert := .Alerts }}{{ range $k, $v := $alert.Labels }}{{ $k }}={{ $v }},{{ end }}{{ end }},test=test
>>     {{ range $k, $v := .CommonLabels }}....{{ end }}
>>
>>
>> At the moment, I am not sure what is preventing the update of the tags on 
>> the Opsgenie tickets.
>> If I can get confirmation that my Alertmanager configuration is good 
>> enough, then I can look at the Opsgenie configuration.
>>
>>
>> Please advise.
>>
>>
>> Regards
>> CP
>>
>>
>> On Tuesday, April 2, 2024 at 10:46:36 PM UTC+5:30 Brian Candler wrote:
>>
>>> FYI, those images are unreadable - copy-pasted text would be much better.
>>>
>>> My guess, though, is that you probably don't want to group alerts before 
>>> sending them to Opsgenie. You haven't shown your full Alertmanager config, 
>>> but if you have a line like
>>>
>>>    group_by: ['alertname']
>>>
>>> then try
>>>
>>>    group_by: ["..."]
>>>
>>> (literally, exactly that: a single string containing three dots, inside 
>>> square brackets)
>>>
>>> On Tuesday 2 April 2024 at 17:15:39 UTC+1 mohan garden wrote:
>>>
>>>> Dear Prometheus Community,
>>>> I am reaching out regarding an issue I have encountered with Prometheus 
>>>> alert tagging, specifically while creating tickets in Opsgenie.
>>>>
>>>>
>>>> I have configured Alertmanager to send alerts to Opsgenie with the 
>>>> following configuration:
>>>> [image: photo001.png]
>>>> A ticket is generated with the expected description and tags:
>>>> [image: photo002.png]
>>>>
>>>> Now, by default, the alerts are grouped by the alert name. So when a 
>>>> similar event happens on a different server, I see that the description 
>>>> is updated as:
>>>> [image: photo003.png]
>>>> but the tags on the ticket remain the same.
>>>> expected behavior: criteria=..., host=108, host=114, infra.....support
>>>>
>>>> I have set the update_alerts and send_resolved settings to true.
>>>> I am not sure whether I need additional configuration at Opsgenie or at 
>>>> Alertmanager to make it work as expected.
>>>>
>>>> I would appreciate any insight or guidance on how to resolve this issue 
>>>> and ensure that alerts for different servers are correctly tagged in 
>>>> Opsgenie.
>>>>
>>>> Thank you in advance.
>>>> Regards,
>>>> CP
>>>>
>>>
