I've recently started monitoring a large fleet of hardware devices using a 
combination of blackbox, snmp, node, and json exporters.
I started out using the *up* metric, but I noticed when using blackbox 
ping, *up* is *always* 1 even when the device is offline.  So I plan to 
switch to *probe_success* instead.  But I'm thinking about the implications 
of this when mixed with other exporters.  For example json-exporter does 
not offer a *probe_success* metric; instead it returns *up*=0 when the 
target times out.

My goal is to build a Grafana dashboard and alerts that monitor a 
combination of blackbox and other exporters.  For context, when certain 
devices crash, they remain pingable, but they report their failed state via 
a REST API.  So I'm pointing the json-exporter at an HTTP endpoint on the 
device.  I'm struggling to come up with a unified way of monitoring all 
these different device types.
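
To make the "unified" rule concrete, here is a rough sketch in Python 
(not PromQL) of the logic described above.  The function name 
device_healthy and its parameters are made up for illustration; it only 
assumes the behaviour already mentioned: *up* reflects the scrape itself, 
and *probe_success* is present only for exporters that provide it.

```python
from typing import Optional


def device_healthy(up: Optional[float],
                   probe_success: Optional[float] = None) -> bool:
    """Collapse exporter-specific signals into one health flag.

    up            -- value of the `up` series, or None if there is no sample
    probe_success -- value of `probe_success` if the exporter exposes it
                     (e.g. blackbox), otherwise None (e.g. json-exporter)
    """
    # A failed or missing scrape always counts as unhealthy.
    if up is None or up != 1:
        return False
    # When a probe result exists, it decides; `up` alone is not enough.
    if probe_success is not None:
        return probe_success == 1
    # No probe metric available: fall back to `up`.
    return True


# blackbox ping against an offline device: scrape ok, probe failed
print(device_healthy(up=1, probe_success=0))
# json-exporter against a target that timed out: scrape failed
print(device_healthy(up=0))
```

In words: a device counts as healthy only if the scrape worked and, where 
a probe result exists, the probe succeeded too.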
