Hi Brian,
Thank you for the response. Here are some more details; I hope they will help
you gain a better understanding of the configuration and the method I am
using to generate tags:
1. We collect data from the node exporter and have created some rules
around the collected data. Here is one example:
  - alert: "Local Disk usage has reached 50%"
    expr: >
      (round(node_filesystem_avail_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
             / node_filesystem_size_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
             * 100, 0.1) >= y)
      and
      (round(node_filesystem_avail_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
             / node_filesystem_size_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
             * 100, 0.1) <= z)
    for: 5m
    labels:
      criteria: overuse
      severity: critical
      team: support
    annotations:
      summary: "{{ $labels.instance }}'s ({{ $labels.device }}) has low space."
      description: "space on {{ $labels.mountpoint }} file system at {{ $labels.instance }} server = {{ $value }}%."
2. At the Alertmanager, we have created notification rules to notify when
the aforementioned condition occurs:
global:
  smtp_from: '[email protected]'
  smtp_require_tls: false
  smtp_smarthost: '[email protected]:25'
templates:
  - /home/ALERTMANAGER/conf/template/*.tmpl
route:
  group_wait: 5m
  group_interval: 2h
  repeat_interval: 5h
  receiver: admin
  routes:
    - match_re:
        alertname: ".*Local Disk usage has reached .*%"
      receiver: admin
      routes:
        - match:
            criteria: overuse
            severity: critical
            team: support
          receiver: mailsupport
          continue: true
        - match:
            criteria: overuse
            team: support
            severity: critical
          receiver: opsgeniesupport
receivers:
  - name: opsgeniesupport
    opsgenie_configs:
      - api_key: XYZ
        api_url: https://api.opsgenie.com
        message: '{{ .CommonLabels.alertname }}'
        description: "{{ range .Alerts }}{{ .Annotations.description }}\n\r{{ end }}"
        tags: '{{ range $k, $v := .CommonLabels }}{{ if or (eq $k "criteria") (eq $k "severity") (eq $k "team") }}{{ $k }}={{ $v }},{{ else if eq $k "instance" }}{{ reReplaceAll "(.+):(.+)" "host=$1" $v }},{{ end }}{{ end }},infra,monitor'
        priority: 'P1'
        update_alerts: true
        send_resolved: true
...
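To illustrate how a single alert flows through this routing tree, here is a
hand-walked sketch (label values are hypothetical):

  incoming alert labels:
    alertname = "Local Disk usage has reached 50%"
    criteria  = overuse
    severity  = critical
    team      = support
    instance  = server1:9100

  root route                           -> default receiver admin
    match_re on alertname              -> matches
      child 1 (criteria/severity/team) -> matches, notify mailsupport; continue: true
      child 2 (same matchers)          -> matches, notify opsgeniesupport

so a single firing alert is delivered to both mailsupport and opsgeniesupport.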
So you can see that I derive a tag host=<hostname> from the instance label.
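For example (value hypothetical): with instance = "server1:9100",

  {{ reReplaceAll "(.+):(.+)" "host=$1" "server1:9100" }}

renders as host=server1, which then appears in the ticket's tag list.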
*Scenario 1:* When server1's local disk usage reaches 50%, I see that an
Opsgenie ticket is created having:
Opsgenie ticket metadata:
ticket header name: local disk usage reached 50%
ticket description: space on /var file system at server1:9100 server = 82%.
ticket tags: criteria: overuse, team: support, severity: critical, infra, monitor, host=server1
So everything works as expected; no issues with Scenario 1.
*Scenario 2:* While the server1 trigger is still active, a second server's
(say server2) local disk usage reaches 50%.
I see that the Opsgenie ticket gets updated as:
ticket header name: local disk usage reached 50%
ticket description: space on /var file system at server1:9100 server = 82%.
ticket description: space on /var file system at server2:9100 server = 80%.
ticket tags: criteria: overuse, team: support, severity: critical, infra, monitor, host=server1
but I was expecting an additional host=server2 tag on the ticket.
In summary: I see the updated description, but I am unable to see updated tags.
In the tags section of the Alertmanager-Opsgenie integration configuration,
I tried iterating over Alerts and over CommonLabels, but I was unable to add
the additional host=server2 tag:
  {{ range $idx, $alert := .Alerts }}{{ range $k, $v := $alert.Labels }}{{ $k }}={{ $v }},{{ end }}{{ end }},test=test

  {{ range $k, $v := .CommonLabels }}....{{ end }}
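Another variant along the same lines that I have in mind (just a sketch; it
assumes every firing alert carries an instance label in host:port form) is:

  tags: >-
    {{ range .Alerts.Firing }}{{ reReplaceAll "(.+):(.+)" "host=$1" .Labels.instance }},{{ end }}criteria={{ .CommonLabels.criteria }},severity={{ .CommonLabels.severity }},team={{ .CommonLabels.team }},infra,monitor

This would emit one host=<name> tag per firing alert instead of relying on
CommonLabels, which only keeps the labels shared by every alert in the group.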
At the moment, I am not sure what is preventing the tags on the Opsgenie
tickets from being updated.
If I can get some clarity on whether my Alertmanager configuration is good
enough, then I can look at the Opsgenie configuration.
Please advise.
Regards
CP
On Tuesday, April 2, 2024 at 10:46:36 PM UTC+5:30 Brian Candler wrote:
> FYI, those images are unreadable - copy-pasted text would be much better.
>
> My guess, though, is that you probably don't want to group alerts before
> sending them to opsgenie. You haven't shown your full alertmanager config,
> but if you have a line like
>
> group_by: ['alertname']
>
> then try
>
> group_by: ["..."]
>
> (literally, exactly that: a single string containing three dots, inside
> square brackets)
>
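> As a sketch, the start of the top-level route would then look something
> like this (the receiver name is just a placeholder):
>
>   route:
>     group_by: ["..."]   # literally three dots: group by all labels
>     receiver: admin     # whatever your default receiver is
>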
> On Tuesday 2 April 2024 at 17:15:39 UTC+1 mohan garden wrote:
>
>> Dear Prometheus Community,
>> I am reaching out regarding an issue I have encountered with Prometheus
>> alert tagging, specifically while creating tickets in Opsgenie.
>>
>>
>> I have configured Alertmanager to send alerts to Opsgenie with the
>> following configuration:
>> [image: photo001.png]
>> A ticket is generated with the expected description and tags:
>> [image: photo002.png]
>>
>> Now, by default the alerts are grouped by the alert name (default
>> behavior). So when a similar event happens on a different server, I see
>> that the description is updated:
>> [image: photo003.png]
>> but the tags on the ticket remain the same.
>> expected behavior: criteria=..., host=108, host=114, infra.....support
>>
>> I have set the update_alerts and send_resolved settings to true.
>> I am not sure whether, in order to make this work as expected, I need
>> additional configuration at Opsgenie or at the Alertmanager.
>>
>> I would appreciate any insight or guidance on how to resolve this issue
>> and ensure that alerts for different servers are correctly tagged in
>> Opsgenie.
>>
>> Thank you in advance.
>> Regards,
>> CP
>>
>