What would some common queries be that this affects, and how would they look in the future? For example, "what fraction of nodes is down?" or "which nodes have multiple services down?"
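To make the first question concrete, here is a minimal sketch that computes "fraction of nodes down" against both encodings. It is only an illustration: the sample nodes, the helper names, and the choice to treat "down" as critical only are assumptions, and none of this is consul_exporter code. The value mapping is the one proposed in the quoted message below.

```python
# Hypothetical sketch: sample data and helper names are illustrative only.

# Value mapping proposed in the quoted message.
STATUS_VALUE = {"maintenance": -2, "warning": -1, "critical": 0, "passing": 1}

# Present encoding: one series per (node, status); the active status has value 1.
present = {
    ("node_a", "passing"): 1, ("node_a", "critical"): 0,
    ("node_a", "warning"): 0, ("node_a", "maintenance"): 0,
    ("node_b", "passing"): 0, ("node_b", "critical"): 1,
    ("node_b", "warning"): 0, ("node_b", "maintenance"): 0,
}


def to_proposed(series):
    """Collapse the per-status series into one numeric value per node."""
    return {node: STATUS_VALUE[status]
            for (node, status), value in series.items() if value == 1}


def fraction_down_present(series):
    """Nodes whose status="critical" series is 1, divided by all nodes."""
    nodes = {node for node, _ in series}
    down = {node for (node, status), value in series.items()
            if status == "critical" and value == 1}
    return len(down) / len(nodes)


def fraction_down_proposed(values):
    """Nodes whose single series equals 0 (assumed: down means critical)."""
    return sum(1 for v in values.values() if v == 0) / len(values)


proposed = to_proposed(present)
```

In the present encoding the query selects on the status label; in the proposed one it compares the sample value, so any query that currently filters on `status` would need to become a value comparison.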
/MR

On Mon, Aug 16, 2021, 22:31 Matt Russi <[email protected]> wrote:

> Currently, the consul_exporter exposes 4 series per health_node and
> health_service status check. Each with a label indicating the status
> (maintenance, warning, critical, or passing). In larger environments, this
> creates quite a few extra series.
>
> As somewhat of a precedent, the status is already being mapped to a value
> for the consul_serf_lan_member_status metric (as Consul's API provides this
> mapping).
> # HELP consul_serf_lan_member_status Status of member in the cluster. 1=Alive, 2=Leaving, 3=Left, 4=Failed.
>
> I wanted to get some thoughts around this before pursuing a PR.
>
> In my example, I used -2=maintenance, -1=warning, 0=critical, and
> 1=passing to fall in line with the Prometheus paradigm of up=0 (down) and
> up=1 (up). Since we have two additional values, the negative numbers play
> more nicely when trying to do a value mapping in Grafana. Not married to
> the values themselves though. :)
>
> Present Example:
> consul_health_node_status{check="serfHealth",node="example_node",status="critical"} 0
> consul_health_node_status{check="serfHealth",node="example_node",status="maintenance"} 0
> consul_health_node_status{check="serfHealth",node="example_node",status="passing"} 1
> consul_health_node_status{check="serfHealth",node="example_node",status="warning"} 0
>
> consul_health_service_status{check="service:10.0.0.1_443",node="example_node",service_id="10.0.0.1_443",service_name="auth_service",status="critical"} 0
> consul_health_service_status{check="service:10.0.0.1_443",node="example_node",service_id="10.0.0.1_443",service_name="auth_service",status="maintenance"} 0
> consul_health_service_status{check="service:10.0.0.1_443",node="example_node",service_id="10.0.0.1_443",service_name="auth_service",status="passing"} 1
> consul_health_service_status{check="service:10.0.0.1_443",node="example_node",service_id="10.0.0.1_443",service_name="auth_service",status="warning"} 0
>
> Proposed Example:
> # HELP consul_health_node_status Status of health checks associated with a node. -2=maintenance, -1=warning, 0=critical, 1=passing
> consul_health_node_status{check="serfHealth",node="example_node"} 1
>
> # HELP consul_health_service_status Status of health checks associated with a service. -2=maintenance, -1=warning, 0=critical, 1=passing
> consul_health_service_status{check="service:10.0.0.1_443",node="example_node",service_id="10.0.0.1_443",service_name="auth_service"} 1
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-developers/9bb6b446-728d-47d9-8a08-355dec88d572n%40googlegroups.com.

