[prometheus-users] Re: Prometheus alert tagging issue - multiple servers

mohan garden Sat, 27 Jul 2024 08:57:29 -0700

I plan to disable grouping only for the Opsgenie routes, and only for a 
specific set of alerts. Here is an example of my current Alertmanager 
configuration:


route:
  group_wait: 5m
  group_interval: 5m
  repeat_interval: 7h
  receiver: admin
  routes:
  - match_re:
      alertname: ".*Type1 Server is down.*"
    receiver: admingroup2
    routes:
    - match:
        team: support
        severity: critical
      receiver: opsgeniesupport
      group_wait: 1m
      group_interval: 5m
      repeat_interval: 6h
      continue: true
    - match:
        team: support
        severity: critical
      receiver: mailsupport
      group_wait: 1m
      group_interval: 1h
      repeat_interval: 12h

Q1: Is it possible to disable grouping for a specific type of alert (say, 
alerts with the Type1 keyword in the alert name) only for the Opsgenie route? 
I am looking for something like:

    - match:
        team: support
        severity: critical
      receiver: opsgeniesupport
      group_by: [instance]   # proposed addition
      group_wait: 1m
      group_interval: 5m
      repeat_interval: 6h
      continue: true
    - match:
        team: support
        severity: critical
      receiver: mailsupport
      group_by: [instance]   # proposed addition
      group_wait: 1m
      group_interval: 1h
      repeat_interval: 12h
Is this allowed by Alertmanager?
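
To illustrate what I am hoping group_by: [instance] would do, here is a rough Python model of grouping. This is only my own simplification, not Alertmanager code: the function name group_alerts and the sample label sets are made up for illustration, and it assumes that alerts sharing the same values for the group_by labels end up in one notification.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alert label sets by the values of the group_by labels.

    Simplified model: one bucket corresponds to one notification.
    An empty group_by puts every alert into a single bucket.
    """
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Hypothetical label sets resembling the two hosts in this thread.
alerts = [
    {"alertname": "Type1 Server is down", "instance": "server1:9100",
     "team": "support", "severity": "critical"},
    {"alertname": "Type1 Server is down", "instance": "server2:9100",
     "team": "support", "severity": "critical"},
]

no_grouping_key = group_alerts(alerts, [])        # no group_by, as in my current config
by_instance = group_alerts(alerts, ["instance"])  # the proposed setting

print(len(no_grouping_key))  # 1 -> both hosts share one notification
print(len(by_instance))      # 2 -> one notification per host
```

With one bucket per instance, each Opsgenie alert would carry exactly one host, which is the behaviour I am after for the tags.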


Q2: Is it possible to change the alert name in Prometheus before 
Prometheus dispatches the alert to Alertmanager? For example, given this rule:

    - alert: "Type1 down or process monitoring service is unreachable"
      expr: up{SERVER_CATEGORY='Type1'} == 0
      for: 2m
      labels:
        severity: critical
        team: support
      annotations:
        summary: "{{ $labels.instance }} is not reachable"
        description: "{{ $labels.instance }} is not reachable"

I would like the alert name to include the instance, something like:

    - alert: "Type1 down or process monitoring service is unreachable - {{ $labels.instance }}"

Hopefully this will help, as I am unable to get the appropriate tags in 
Opsgenie when grouping is enabled.
Having a host name tag would be useful: through the JIRA integration we 
could see how many incidents have occurred for a given host in the past.

Regards
MG
On Saturday, July 27, 2024 at 9:09:57 PM UTC+5:30 mohan garden wrote:

> Hi Brian,
> Thank you for the suggestion.
> I was able to set up a Flask application to capture the data Alertmanager 
> sends to Opsgenie, by pointing api_url at its endpoint.
> I had to create 3 endpoints:
> 1. POST for /
> 2. PUT for /v2/alerts/message
> 3. PUT for /v2/alerts/description
>
>
> *POST:*
> {'alias': '<mangled>71c5c169a773796b467cc741f70457c4', 'message': 'Type1 
> Server is down or node exporter is unreachable', 'description': 
> 'server1:9100 server is down or prometheus is unable to query the node 
> exporter service which should be up and running.\n\rserver2:9100 server is 
> down or prometheus is unable to query the node exporter service which 
> should be up and running.\n\r', 'details': {'SERVER_CATEGORY': 'Type1', 
> 'SERVER_SITE': 'ind', 'alertname': 'Type1 Server is down or node exporter 
> is unreachable', 'criteria': 'nodedown', 'job': 'default_nodeexporters', 
> 'severity': 'critical', 'team': 'infrasupport'}, 'source': '
> http://alertmanager:9093/#/alerts?receiver=opsgenie_support', 'tags': 
> ['SERVER_CATEGORY=Type1', 'SERVER_SITE=ind', 'criteria=nodedown', 
> 'severity=critical', 'team=support', 'support', 'monitor', 
> 'server1:9100', 'server2:9100'], 'priority': 'P1'}
> 10.73.6.210 - - [27/Jul/2024 07:32:04] "POST /v2/alerts HTTP/1.1" 200 -
>
> *First PUT:*
> {'message': 'Utility Server is down or node exporter is unreachable'}
> 10.73.6.210 - - [27/Jul/2024 07:32:04] "PUT /v2/alerts/<mangled>71c5c169a773796b467cc741f70457c4/message?identifierType=alias HTTP/1.1" 200 -
>
> *Second PUT:*
> {'description': 'server1:9100 server is down or prometheus is unable to 
> query the node exporter service which should be up and 
> running.\n\rserver2:9100 server is down or prometheus is unable to query 
> the node exporter service which should be up and running.\n\r'}
> 10.73.6.210 - - [27/Jul/2024 07:32:04] "PUT /v2/alerts/<mangled>71c5c169a773796b467cc741f70457c4/description?identifierType=alias HTTP/1.1" 200 -
>
> It seems Alertmanager would need to send another PUT request to update 
> the Opsgenie tags.
>
>
>
>
> On Wednesday, April 3, 2024 at 9:59:06 PM UTC+5:30 Brian Candler wrote:
>
>> On Wednesday 3 April 2024 at 16:01:21 UTC+1 mohan garden wrote:
>>
>> Is there a way I can see the entire message which Alertmanager sends out 
>> to Opsgenie? Somewhere in the Alertmanager logs or a text file?
>>
>>
>> You could try setting api_url to point to a webserver that you control.
>>
>
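
For anyone who wants to reproduce the capture described in the quoted message, here is a minimal sketch using only the Python standard library. It is a stand-in for the Flask application, not the code I actually ran: the names CaptureHandler, start_capture_server and captured are hypothetical, and it simply accepts any POST or PUT path and replies 200 with an empty JSON object.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# (method, path, decoded body) for every request received so far
captured = []


class CaptureHandler(BaseHTTPRequestHandler):
    """Records each POST/PUT request and always answers 200."""

    def _capture(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length) if length else b""
        try:
            body = json.loads(raw) if raw else None
        except ValueError:
            # Not valid JSON: keep the raw text so nothing is lost.
            body = raw.decode("utf-8", "replace")
        captured.append((self.command, self.path, body))

        reply = b"{}"
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    # Handle POST and PUT identically, whatever the path.
    do_POST = _capture
    do_PUT = _capture

    def log_message(self, fmt, *args):
        pass  # suppress the default per-request log line


def start_capture_server(port=0):
    """Start the capture server in a background thread and return it.

    port=0 lets the OS pick a free port; read it from server.server_address.
    """
    server = HTTPServer(("127.0.0.1", port), CaptureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Start it with start_capture_server(8080) in a script that stays alive, point api_url in the Opsgenie receiver at that host and port, and inspect the captured list (or print from _capture) to see the tags that were actually sent.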

