On Sunday 21 July 2024 at 00:51:48 UTC+1 Christoph Anton Mitterer wrote:
Hey.
On Sat, 2024-07-20 at 10:26 -0700, 'Brian Candler' via Prometheus Users
wrote:
>
> If the label stays constant, then the amount of extra space required
> is tiny. There is an internal mapping between a bag of labels and a
> timeseries ID.
Is it the same if one uses a metric (like for the RPMs below) whose value
never changes? I mean, is that also efficient?
Yes:
smartraid_physical_drive_rotational_speed_rpm 7200
smartraid_info{rpm="7200"} 1
are both static timeseries. Prometheus does delta compression; if you store
the same value repeatedly the difference between adjacent points is zero.
It doesn't matter if the timeseries value is 1 or 7200.
> But if any label changes, that generates a completely new timeseries.
> This is not something you want to happen too often (a.k.a "timeseries
> churn"), but moderate amounts are OK.
Why exactly wouldn't one want this? I mean especially with respect to
such _info metrics.
It's just a general consideration. When a timeseries churns you get a new
index entry, new head blocks etc.
For info metrics which rarely change, it's fine.
The limiting worst case is where you have a label value that changes every
sample (for example, putting a timestamp in a label). Then every scrape
generates a new timeseries containing one point. Have a few hundred
thousand scrapes like that and your server will collapse.
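A contrived sketch of that worst case (hypothetical metric name):

  my_metric{scraped_at="2024-07-20T10:26:05Z"} 1
  my_metric{scraped_at="2024-07-20T10:26:20Z"} 1

Each scrape's label value differs, so every sample lands in a brand-new
one-point timeseries.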
Graphing _info time series doesn't make sense anyway... so it's not as
if one would get some usable time series/graph (like a temperature or
so) interrupted, if e.g. the state changes for a while from OK to
degraded.
Indeed, and Grafana has a swim-lanes-style view (the "State timeline"
panel) that works quite well for that. When a time series disappears, it
goes "stale". But the good news
is, for quite some time now, Prometheus has been automatically inserting
staleness markers for a timeseries which existed in a previous scrape but
not in the current scrape from the same job and target.
Prior to that, timeseries would only go stale if there had been no data
point ingested for 5 minutes, so it would be very unclear when the
timeseries had actually vanished.
I guess with appearing/disappearing you mean, that one has to take into
account, that e.g. pd_info{state="OK",pd_name="foo"} won't exist while
"foo" is failed... and thus e.g. when graphing the OK-times of a
device, it would by default show nothing during that time rather than a
value of zero?
Yes. And it's a bit harder to alert on that condition, but you just have to
approach it the right way. As you've realised, you can alert on the
presence of a timeseries with a label not "OK", which is easier than
alerting on the absence of a timeseries whose label is "OK".
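A minimal sketch of such an alert rule (fragment; metric and label names
assumed from your exporter):

  - alert: PhysicalDriveNotOK
    expr: pd_info{state!="OK"} == 1
    for: 5m

i.e. it fires on the presence of any non-OK series, rather than trying to
detect the absence of the OK series.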
> The other option, if the state values are integer enumerations at
> source (e.g. as from SNMP), is to store the raw numeric value:
>
> foo 3
>
> That means the querier has to know the meaning of these values.
> (Grafana can map specific values to textual labels and/or colours
> though).
But that also requires me to use a label like in enum_metric{value="3"},
No, I mean
my_metric{other_labels="dontcare"} 3
An example is ifOperStatus in SNMP, where the meaning of values 1, 2, 3
...etc is defined in the MIB.
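For example, with the raw-integer approach you can alert directly on the
encoded value, since the IF-MIB defines ifOperStatus down(2) (the ifDescr
label here is just an assumed example from a typical SNMP setup):

  ifOperStatus{ifDescr="eth0"} == 2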
or I have to construct metric names dynamically (which I could also
have done for the symbolic name), which seems however discouraged (and
I'd say for good reasons)?
Don't generate metric names dynamically. That's what labels are for. (In
any case, the metric name is itself just a hidden label called "__name__")
There is good advice at https://prometheus.io/docs/practices/naming/
I mean if both, label and metric, are equally efficient (in terms of
storage)... then using a metric would still have the advantage of being
able to do things like:
smartraid_logical_drive_chunk_size_bytes > (256*1024)
i.e. select those LDs, that use a chunk size > 256 KiB ... which I
cannot (as easily) do if it's in a label.
Correct. The flip side is if you want to see at a glance all the
information about a logical volume, you'll need to look at a bunch of
different metrics and associate them by some common label (e.g. a unique
volume ID).
Both approaches are valid. If you see a use case for the filtering or
arithmetic, that pushes you down the path of separate metrics.
If you're comparing a hundred static metrics versus a single metric with a
hundred labels then I'd *guess* the single metric would be a bit more
efficient in terms of storage and ingestion performance, but it's marginal
and shouldn't really be a consideration: data is there to be used, so put
it in whatever form allows you to make best use of it.
You can look at other exporters for guidance. For example, node_exporter
has node_md_* metrics for MD arrays. It provides a combined metric:
node_md_info{ActiveDevices="10", ChunkSize="512K", ConsistencyPolicy="none",
  CreationTime="Fri Feb 19 12:20:25 2021", FailedDevices="0",
  Layout="-unknown-", Name="dar6:127 (local to host dar6)",
  Persistence="Superblock is persistent", RaidDevices="10",
  RaidLevel="raid0", SpareDevices="0", State="clean ", TotalDevices="10",
  UUID="6c1f02c0:4ade9cee:17936d5f:1990e5db", Version="1.2",
  WorkingDevices="10", md_device="md127", md_metadata_version="1.2",
  md_name="dar6:127", md_num_raid_disks="10", raid_level="0"} 1
But there are also separate metrics for:
node_md_info_ActiveDevices
node_md_info_ArraySize
node_md_info_Events
node_md_info_FailedDevices
node_md_info_RaidDevices
node_md_info_SpareDevices
node_md_info_TotalDevices
node_md_info_WorkingDevices
Should I have made e.g. only one
smartraid_temperature{type="bla"}
metric (or perhaps with a bit more than just "type"), with "bla" being
e.g. controller, capacitor, cache_module or sensor?
I.e. putting all temperatures in one metric, rather than the 4
different ones I have now (where I have no "type" label).
Personally I'd make all "temperature readings" be one metric, with labels
to distinguish them. It's more useful for graphing and aggregation.
Having lots of different metrics is just harder to work with, in particular
when it comes to making dashboards for them. It's easy to go the other way
(e.g. select all temperature readings with type="bla").
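For example, assuming a "type" label as above, one metric lets you filter
or aggregate across all sensors in a single query:

  smartraid_temperature{type="capacitor"}
  max by (type) (smartraid_temperature)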
However as a guideline, I'd suggest that all the readings for a given
metric have the *same* set of labels, and they should all be non-empty
(since an empty label is exactly the same as an absent label). That is: if
you decide to categorise your temperature readings with two labels, say foo
and bar, then every temperature reading should have foo="..." and
bar="...".
It *can* work if you don't follow that rule, but it's much easier to make
mistakes especially with join queries, so don't make your life harder than
it needs to be.
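For example, a typical info-metric join (metric and label names assumed),
which only works cleanly when the shared labels line up:

  smartraid_temperature
    * on (controller) group_left (model)
      smartraid_info

This copies the "model" label from smartraid_info onto each temperature
reading; inconsistent or empty labels on either side make such joins fail
in confusing ways.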