Hi Brian,
Thank you for the response. Here are some more details; I hope they will help
you gain a better understanding of the configuration and the method I am
using to generate tags:
1. We collect data from the node exporter and have created some rules
around the collected data. Here is one example:
  - alert: "Local Disk usage has reached 50%"
    expr: >
      (round(node_filesystem_avail_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
             / node_filesystem_size_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
             * 100, 0.1) >= y)
      and
      (round(node_filesystem_avail_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
             / node_filesystem_size_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
             * 100, 0.1) <= z)
    for: 5m
    labels:
      criteria: overuse
      severity: critical
      team: support
    annotations:
      summary: "{{ $labels.instance }}'s ({{ $labels.device }}) has low space."
      description: "space on {{ $labels.mountpoint }} file system at {{ $labels.instance }} server = {{ $value }}%."
2. At the Alertmanager, we have created notification rules to notify when
the aforementioned condition occurs:
global:
  smtp_from: '[email protected]'
  smtp_require_tls: false
  smtp_smarthost: '[email protected]:25'
templates:
  - /home/ALERTMANAGER/conf/template/*.tmpl
route:
  group_wait: 5m
  group_interval: 2h
  repeat_interval: 5h
  receiver: admin
  routes:
    - match_re:
        alertname: ".*Local Disk usage has reached .*%"
      receiver: admin
      routes:
        - match:
            criteria: overuse
            severity: critical
            team: support
          receiver: mailsupport
          continue: true
        - match:
            criteria: overuse
            team: support
            severity: critical
          receiver: opsgeniesupport
receivers:
  - name: opsgeniesupport
    opsgenie_configs:
      - api_key: XYZ
        api_url: https://api.opsgenie.com
        message: '{{ .CommonLabels.alertname }}'
        description: "{{ range .Alerts }}{{ .Annotations.description }}\n\r{{ end }}"
        tags: '{{ range $k, $v := .CommonLabels }}{{ if or (eq $k "criteria") (eq $k "severity") (eq $k "team") }}{{ $k }}={{ $v }},{{ else if eq $k "instance" }}{{ reReplaceAll "(.+):(.+)" "host=$1" $v }},{{ end }}{{ end }},infra,monitor'
        priority: 'P1'
        update_alerts: true
        send_resolved: true
...
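To illustrate how a single alert flows through this routing tree, here is a
hand-walked sketch (label values are hypothetical):

  incoming alert labels:
    alertname = "Local Disk usage has reached 50%"
    criteria  = overuse
    severity  = critical
    team      = support
    instance  = server1:9100

  root route                           -> default receiver admin
    match_re on alertname              -> matches
      child 1 (criteria/severity/team) -> matches, notify mailsupport; continue: true
      child 2 (same matchers)          -> matches, notify opsgeniesupport

so a single firing alert is delivered to both mailsupport and opsgeniesupport.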
So you can see that I derive a tag host=<hostname> from the instance label.
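For example (value hypothetical): with instance = "server1:9100",

  {{ reReplaceAll "(.+):(.+)" "host=$1" "server1:9100" }}

renders as host=server1, which then appears in the ticket's tag list.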
*Scenario 1:* When server1's local disk usage reaches 50%, I see that an
Opsgenie ticket is created having:
Opsgenie ticket metadata:
ticket header name: local disk usage reached 50%
ticket description: space on /var file system at server1:9100 server = 82%.
ticket tags: criteria: overuse, team: support, severity: critical, infra, monitor, host=server1
So everything works as expected; no issues with Scenario 1.
*Scenario 2:* While the server1 trigger is still active, a second server's
(say server2) local disk usage reaches 50%.
I see that the Opsgenie ticket gets updated as:
ticket header name: local disk usage reached 50%
ticket description: space on /var file system at server1:9100 server = 82%.
ticket description: space on /var file system at server2:9100 server = 80%.
ticket tags: criteria: overuse, team: support, severity: critical, infra, monitor, host=server1
but I was expecting an additional host=server2 tag on the ticket.
In summary: I see the updated description, but I am unable to see updated tags.
In the tags section of the Alertmanager-Opsgenie integration configuration,
I tried iterating over Alerts and over CommonLabels, but I was unable to add
the additional host=server2 tag:
  {{ range $idx, $alert := .Alerts }}{{ range $k, $v := $alert.Labels }}{{ $k }}={{ $v }},{{ end }}{{ end }},test=test

  {{ range $k, $v := .CommonLabels }}....{{ end }}
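Another variant along the same lines that I have in mind (just a sketch; it
assumes every firing alert carries an instance label in host:port form) is:

  tags: >-
    {{ range .Alerts.Firing }}{{ reReplaceAll "(.+):(.+)" "host=$1" .Labels.instance }},{{ end }}criteria={{ .CommonLabels.criteria }},severity={{ .CommonLabels.severity }},team={{ .CommonLabels.team }},infra,monitor

This would emit one host=<name> tag per firing alert instead of relying on
CommonLabels, which only keeps the labels shared by every alert in the group.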
At the moment, I am not sure what is preventing the tags on the Opsgenie
tickets from being updated.
If I can get some clarity on whether my Alertmanager configuration is good
enough, then I can look at the Opsgenie configuration.
Please advise.
Regards
CP
On Tuesday, April 2, 2024 at 10:46:36 PM UTC+5:30 Brian Candler wrote:
> FYI, those images are unreadable - copy-pasted text would be much better.
>
> My guess, though, is that you probably don't want to group alerts before
> sending them to opsgenie. You haven't shown your full alertmanager config,
> but if you have a line like
>
> group_by: ['alertname']
>
> then try
>
> group_by: ["..."]
>
> (literally, exactly that: a single string containing three dots, inside
> square brackets)
>
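> As a sketch, the start of the top-level route would then look something
> like this (the receiver name is just a placeholder):
>
>   route:
>     group_by: ["..."]   # literally three dots: group by all labels
>     receiver: admin     # whatever your default receiver is
>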
> On Tuesday 2 April 2024 at 17:15:39 UTC+1 mohan garden wrote:
>
>> Dear Prometheus Community,
>> I am reaching out regarding an issue I have encountered with Prometheus
>> alert tagging, specifically while creating tickets in Opsgenie.
>>
>>
>> I have configured Alertmanager to send alerts to Opsgenie with the
>> following configuration:
>> [image: photo001.png]
>> A ticket is generated with the expected description and tags:
>> [image: photo002.png]
>>
>> Now, by default the alerts are grouped by the alert name (default
>> behavior). So when a similar event happens on a different server, I see
>> that the description is updated:
>> [image: photo003.png]
>> but the tags on the ticket remain the same.
>> expected behavior: criteria=..., host=108, host=114, infra.....support
>>
>> I have set the update_alerts and send_resolved settings to true.
>> I am not sure whether, in order to make this work as expected, I need
>> additional configuration at Opsgenie or at the Alertmanager.
>>
>> I would appreciate any insight or guidance on how to resolve this issue
>> and ensure that alerts for different servers are correctly tagged in
>> Opsgenie.
>>
>> Thank you in advance.
>> Regards,
>> CP
>>
>