Re: [prometheus-developers] consul_exporter - Expose health statuses as values?

Matt Russi Mon, 04 Oct 2021 14:26:09 -0700

Bump. :) 

Example of what would be changed in the exporter itself:
https://github.com/mattrussi/consul_exporter/commit/41b04338a8a0e2af14b789e484d3bc33e057a56c
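
For readers who do not want to open the commit, the core of the idea can be sketched in a few lines of Go. This is only an illustration of the status-to-value mapping proposed further down the thread (-2=maintenance, -1=warning, 0=critical, 1=passing), not the code from the linked commit. The names statusValues and statusValue are hypothetical.

```go
package main

import "fmt"

// statusValues is a hypothetical lookup table using the values proposed in
// this thread: -2=maintenance, -1=warning, 0=critical, 1=passing.
var statusValues = map[string]float64{
	"maintenance": -2,
	"warning":     -1,
	"critical":    0,
	"passing":     1,
}

// statusValue returns the numeric value for a Consul health status string.
// The second return value is false for a status the table does not know,
// so a caller can skip the sample instead of exporting a misleading 0.
func statusValue(status string) (float64, bool) {
	v, ok := statusValues[status]
	return v, ok
}

func main() {
	for _, s := range []string{"passing", "critical", "warning", "maintenance", "unknown"} {
		if v, ok := statusValue(s); ok {
			fmt.Printf("%s => %g\n", s, v)
		} else {
			fmt.Printf("%s => (no mapping, sample skipped)\n", s)
		}
	}
}
```

Returning an explicit "not found" flag is a design choice in this sketch: because 0 already means critical, an unrecognized status should not silently fall back to the zero value.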


On Thursday, September 2, 2021 at 8:42:38 PM UTC-4 Matt Russi wrote:

> The query syntax would not be drastically impacted (though I understand 
> there is still a change). The significance would be a 75% reduction in the 
> number of series generated by these metrics. Less to store and compute.
>
> *> "what fraction of nodes is down"*
>
> Current:
>     count (consul_health_node_status{status!="passing"} == 1)
>     /
>     count (consul_health_node_status{status="passing"})
>
> Proposed:
>     count (consul_health_node_status == 0)
>     /
>     count (consul_health_node_status)
>
> *> "which nodes have multiple services down?"*
>
> Current:
>     count by (node) (consul_health_service_status{status!="passing"} == 1) > 1
>
> Proposed:
>     count by (node) (consul_health_service_status == 0) > 1
>
> *> What service checks are critical?*
>
> Current:
>     consul_health_service_status{status="critical"} == 1
>
> Proposed:
>     consul_health_service_status == 0
>
>
> On Tuesday, August 17, 2021 at 5:01:27 AM UTC-4 [email protected] wrote:
>
>> What would some common queries be that this affects, and how would they 
>> look in the future? For example, "what fraction of nodes is down" "which 
>> nodes have multiple services down?"
>>
>> /MR
>>
>> On Mon, Aug 16, 2021, 22:31 Matt Russi <[email protected]> wrote:
>>
>>> Currently, the consul_exporter exposes 4 series per health_node and 
>>> health_service status check, each with a label indicating the status 
>>> (maintenance, warning, critical, or passing). In larger environments, this 
>>> creates quite a few extra series.
>>>
>>> As somewhat of a precedent, the status is already being mapped to a 
>>> value for the consul_serf_lan_member_status metric (as Consul's API 
>>> provides this mapping).
>>> # HELP consul_serf_lan_member_status Status of member in the cluster. 1=Alive, 2=Leaving, 3=Left, 4=Failed.
>>>
>>> I wanted to get some thoughts around this before pursuing a PR.
>>>
>>> In my example, I used -2=maintenance, -1=warning, 0=critical, and 
>>> 1=passing to fall in line with the Prometheus paradigm of up=0 (down) and 
>>> up=1 (up). Since we have two additional values, the negative numbers play 
>>> more nicely when trying to do a value mapping in Grafana. Not married to 
>>> the values themselves though. :) 
>>>
>>> Present Example:
>>> consul_health_node_status{check="serfHealth",node="example_node",status="critical"} 0
>>> consul_health_node_status{check="serfHealth",node="example_node",status="maintenance"} 0
>>> consul_health_node_status{check="serfHealth",node="example_node",status="passing"} 1
>>> consul_health_node_status{check="serfHealth",node="example_node",status="warning"} 0
>>>
>>> consul_health_service_status{check="service:10.0.0.1_443",node="example_node",service_id="10.0.0.1_443",service_name="auth_service",status="critical"} 0
>>> consul_health_service_status{check="service:10.0.0.1_443",node="example_node",service_id="10.0.0.1_443",service_name="auth_service",status="maintenance"} 0
>>> consul_health_service_status{check="service:10.0.0.1_443",node="example_node",service_id="10.0.0.1_443",service_name="auth_service",status="passing"} 1
>>> consul_health_service_status{check="service:10.0.0.1_443",node="example_node",service_id="10.0.0.1_443",service_name="auth_service",status="warning"} 0
>>>
>>> Proposed Example:
>>> # HELP consul_health_node_status Status of health checks associated with a node. -2=maintenance, -1=warning, 0=critical, 1=passing
>>> consul_health_node_status{check="serfHealth",node="example_node"} 1
>>>
>>> # HELP consul_health_service_status Status of health checks associated with a service. -2=maintenance, -1=warning, 0=critical, 1=passing
>>> consul_health_service_status{check="service:10.0.0.1_443",node="example_node",service_id="10.0.0.1_443",service_name="auth_service"} 1
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Prometheus Developers" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/prometheus-developers/9bb6b446-728d-47d9-8a08-355dec88d572n%40googlegroups.com
>>>
>>
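
To make the difference between the present and proposed examples concrete, here is a rough Go sketch that collapses the four per-status series of one check into the single valued series. It assumes that exactly one of the four present-format series carries the value 1, as in the examples quoted above. The types checkKey and sample and the function collapse are hypothetical names for illustration, not part of consul_exporter.

```go
package main

import "fmt"

// checkKey identifies one health check; a hypothetical stand-in for the
// label set that remains once the status label is dropped.
type checkKey struct{ Node, Check string }

// sample models one present-format series: a status label and a 0/1 value.
type sample struct {
	Key    checkKey
	Status string
	Value  float64
}

// collapse turns the one-series-per-status form into one value per check,
// using the mapping proposed in the thread.
func collapse(samples []sample) map[checkKey]float64 {
	values := map[string]float64{
		"maintenance": -2,
		"warning":     -1,
		"critical":    0,
		"passing":     1,
	}
	out := make(map[checkKey]float64)
	for _, s := range samples {
		if s.Value != 1 {
			continue // only the series set to 1 names the active status
		}
		if v, ok := values[s.Status]; ok {
			out[s.Key] = v
		}
	}
	return out
}

func main() {
	key := checkKey{Node: "example_node", Check: "serfHealth"}
	present := []sample{
		{key, "critical", 0},
		{key, "maintenance", 0},
		{key, "passing", 1},
		{key, "warning", 0},
	}
	proposed := collapse(present)
	fmt.Printf("series: %d -> %d\n", len(present), len(proposed))
	fmt.Println("value:", proposed[key])
}
```

Four input series become one output series per check, which is where the 75% reduction mentioned earlier in the thread comes from.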

