*Correction:*
*Scenario 2:* While server1's trigger is active, a second server's (say,
server2's) local disk usage reaches 50%.
I see that the details of the already open Opsgenie ticket get updated as:
ticket header name: local disk usage reached 50%
ticket description: space on /var file system at server1:9100 server = 82%.
                    space on /var file system at server2:9100 server = 80%.
ticket tags: criteria: overuse, team: support, severity: critical,
infra,monitor,host=server1
[image: photo003.png]
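
For reference, the grouping suggestion made earlier in this thread could be applied to the Opsgenie route roughly as shown here. This is only a sketch: the nesting of the routes is my reading of the configuration quoted below, and grouping by alertname and instance is an assumption (the suggestion in the thread was group_by: ["..."], which disables grouping altogether). With either setting, server2 should open its own Opsgenie ticket instead of updating server1's, so instance stays in .CommonLabels and the host tag is rendered per server.

```yaml
# Sketch only: receivers and matchers are copied from the quoted config;
# the group_by line on the Opsgenie route is the assumed addition.
route:
  receiver: admin
  group_wait: 5m
  group_interval: 2h
  repeat_interval: 5h
  routes:
    - match_re:
        alertname: ".*Local Disk usage has reached .*%"
      receiver: admin
      routes:
        - match:
            criteria: overuse
            severity: critical
            team: support
          receiver: mailsupport
          continue: true
        - match:
            criteria: overuse
            severity: critical
            team: support
          receiver: opsgeniesupport
          # One notification group (and so one Opsgenie ticket) per server.
          # group_by: ['...'] would instead create one group per alert.
          group_by: ['alertname', 'instance']
```

As far as I can tell from the Alertmanager documentation, update_alerts only updates the message and description of an existing Opsgenie alert, which would match the unchanged tags seen above.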
On Wednesday, April 3, 2024 at 1:37:12 PM UTC+5:30 mohan garden wrote:
> Hi Brian,
> Thank you for the response. Here are some more details, which I hope will
> give you a better understanding of the configuration and the method I
> am using to generate tags:
>
>
> 1. We collect data from the node exporter, and have created some rules
> around the collected data. Here is one example:
>    - alert: "Local Disk usage has reached 50%"
>      expr: |
>        (
>          round(
>            node_filesystem_avail_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
>            / node_filesystem_size_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
>            * 100, 0.1
>          ) >= y
>        ) and (
>          round(
>            node_filesystem_avail_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
>            / node_filesystem_size_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
>            * 100, 0.1
>          ) <= z
>        )
>      for: 5m
>      labels:
>        criteria: overuse
>        severity: critical
>        team: support
>      annotations:
>        summary: "{{ $labels.instance }} 's ({{ $labels.device }}) has low space."
>        description: "space on {{ $labels.mountpoint }} file system at {{ $labels.instance }} server = {{ $value }}%."
>
> 2. In Alertmanager, we have created notification rules to notify us when
> the aforementioned condition occurs:
>
> smtp_from: '[email protected]'
> smtp_require_tls: false
> smtp_smarthost: '[email protected]:25'
>
> templates:
>   - /home/ALERTMANAGER/conf/template/*.tmpl
>
> route:
>   group_wait: 5m
>   group_interval: 2h
>   repeat_interval: 5h
>   receiver: admin
>   routes:
>     - match_re:
>         alertname: ".*Local Disk usage has reached .*%"
>       receiver: admin
>       routes:
>         - match:
>             criteria: overuse
>             severity: critical
>             team: support
>           receiver: mailsupport
>           continue: true
>         - match:
>             criteria: overuse
>             team: support
>             severity: critical
>           receiver: opsgeniesupport
>
> receivers:
>   - name: opsgeniesupport
>     opsgenie_configs:
>       - api_key: XYZ
>         api_url: https://api.opsgenie.com
>         message: '{{ .CommonLabels.alertname }}'
>         description: "{{ range .Alerts }}{{ .Annotations.description }}\n\r{{ end }}"
>         tags: '{{ range $k, $v := .CommonLabels}}{{ if or (eq $k "criteria") (eq $k "severity") (eq $k "team") }}{{$k}}={{$v}},{{ else if eq $k "instance" }}{{ reReplaceAll "(.+):(.+)" "host=$1" $v }},{{end}}{{end}},infra,monitor'
>         priority: 'P1'
>         update_alerts: true
>         send_resolved: true
> ...
> As you can see, I derive a host=<hostname> tag from the instance label.
>
>
> *Scenario 1:* When server1's local disk usage reaches 50%, I see that an
> Opsgenie ticket is created with the following metadata:
> ticket header name: local disk usage reached 50%
> ticket description: space on /var file system at server1:9100 server = 82%.
> ticket tags: criteria: overuse, team: support, severity: critical,
> infra,monitor,host=server1
>
> So everything works as expected; there are no issues with Scenario 1.
>
>
> *Scenario 2:* While server1's trigger is active, a second server's (say,
> server2's) local disk usage reaches 50%.
>
> I see that Opsgenie tickets are getting updated as:
> ticket header name: local disk usage reached 50%
> ticket description: space on /var file system at server1:9100 server = 82%.
> ticket description: space on /var file system at server2:9100 server = 80%.
> ticket tags: criteria: overuse, team: support, severity: critical,
> infra,monitor,host=server1
>
>
> But I was expecting an additional host=server2 tag on the ticket.
> In summary: I see the updated description, but not updated tags.
>
> In the tags section of the Alertmanager Opsgenie integration configuration,
> I tried iterating over both Alerts and CommonLabels, but was unable to
> add the additional host=server2 tag:
> {{ range $idx, $alert := .Alerts}}{{range $k, $v := $alert.Labels}}{{$k}}={{$v}},{{end}}{{end}},test=test
> {{ range $k, $v := .CommonLabels}}....{{end}}
>
>
> At the moment, I am not sure what is preventing the update of tags on
> the Opsgenie tickets.
> If I can get confirmation that my Alertmanager configuration is correct,
> then I can look at the Opsgenie configuration.
>
>
> Please advise.
>
>
> Regards
> CP
>
>
> On Tuesday, April 2, 2024 at 10:46:36 PM UTC+5:30 Brian Candler wrote:
>
>> FYI, those images are unreadable - copy-pasted text would be much better.
>>
>> My guess, though, is that you probably don't want to group alerts before
>> sending them to opsgenie. You haven't shown your full alertmanager config,
>> but if you have a line like
>>
>> group_by: ['alertname']
>>
>> then try
>>
>> group_by: ["..."]
>>
>> (literally, exactly that: a single string containing three dots, inside
>> square brackets)
>>
>> On Tuesday 2 April 2024 at 17:15:39 UTC+1 mohan garden wrote:
>>
>>> Dear Prometheus Community,
>>> I am reaching out regarding an issue I have encountered with Prometheus
>>> alert tagging, specifically when creating tickets in Opsgenie.
>>>
>>>
>>> I have configured Alertmanager to send alerts to Opsgenie with the
>>> following configuration:
>>> [image: photo001.png]
>>> A ticket is generated with the expected description and tags:
>>> [image: photo002.png]
>>>
>>> By default, the alerts are grouped by alert name (default behavior).
>>> So when a similar event happens on a different server, I see that the
>>> description is updated as:
>>> [image: photo003.png]
>>> but the tags on the ticket remain the same.
>>> Expected behavior: criteria=..., host=108, host=114, infra.....support
>>>
>>> I have set the update_alerts and send_resolved settings to true.
>>> I am not sure whether I need additional configuration in Opsgenie or in
>>> Alertmanager to make this work as expected.
>>>
>>> I would appreciate any insight or guidance on how to resolve this
>>> issue and ensure that alerts for different servers are correctly tagged in
>>> Opsgenie.
>>>
>>> Thank you in advance.
>>> Regards,
>>> CP
>>>
>>