To add to what Ben Kochie said, in general the distinction between
counters and gauges is not particularly strong at the PromQL level, in
that you can apply functions like rate() or resets() to either of them.
I believe it mostly affects the metadata (the TYPE line that sits next
to the HELP text), and perhaps some downstream things like Grafana will
try to prevent you from applying counter functions to metrics that are
officially declared as gauges.

(Client libraries to create Prometheus metrics may not be so
accommodating, but here you're directly turning things into metrics
instead of trying to track them.)
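As a rough illustration of the point about resets(), here is a minimal
Python model of its documented behaviour: any decrease between two
consecutive samples counts as one reset, whatever type the metric was
declared with. This is a sketch, not Prometheus's code, and the sample
values are hypothetical.

```python
# Minimal model of PromQL resets() semantics (not Prometheus's code):
# every decrease between consecutive samples counts as one reset,
# regardless of whether the metric is declared a counter or a gauge.

def resets(samples: list[float]) -> int:
    """Count decreases between consecutive samples in a range."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur < prev)


# Hypothetical hourly power-on-hours samples for one drive slot;
# the drive is replaced after the third sample.
poh = [41000.0, 41001.0, 41002.0, 12.0, 13.0]
print(resets(poh))  # 1
```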

Also, some drives may roll over their power-on-hours (POH) count all on
their own, even without being replaced. Possibly this is restricted to
consumer drives, but we definitely have some SSDs in our collection of
fileservers and regular servers that have POH figures that are far too
low for their actual service lifetimes.

(I would definitely include the serial number or other unique
identification in the per-drive metrics. If you keep this data for a
long enough time, it will definitely save you headaches later when you
go back to look at and analyze it.)

        - cks

> This is one of those tricky situations where there's not a strict correct
> answer.
>
> For power-on-hours I would probably go with a gauge.
> * You don't really have a "perfect" monotonic counter here.
> * I would also include the serial number as a label, just for the sake of
> unique identification.
> * Power-on-hours doesn't really have a lot of use as a counter. Do you
> actually want to display a counter like `rate(power_on_hours[1h])`?
>
> On Fri, Jul 19, 2024 at 4:54 AM Christoph Anton Mitterer
> wrote:
>
> > Hey.
> >
> > I'm writing an exporter for SmartRAID controllers.
> >
> > One of the metrics I'm collecting is the power-on-hours of physical
> > drives, where the value is the number of hours (which can obviously only
> > increase) and with labels that identify the drive by the controller number
> > and a drive name (which itself consists of the port/box/bay the drive is
> > connected to).
> >
> > Now I wonder whether this should be a counter or a gauge.
> >
> > As the value can only increase, it of course sounds like a counter...
> > ... however, a physical drive might be replaced (e.g. after it broke) and
> > the new drive would likely have the same drive name (because it's mounted
> > at the same place), yet the power-on-hours would of course be lower for a
> > new drive.
> >
> > So question is, especially with PromQL functions like increase() and
> > rate(), which detect counter resets, whether this should be a counter or a
> > gauge.
> >
> > Or whether the proper approach would be counter + another label which
> > makes the time series unique (like drive serial number), though I'd rather
> > not go that approach (because the serial is already stored in an info
> > metric, and I don't wanna store it twice).
> >
> > Thanks,
> > Chris.
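To make the drive-replacement concern concrete, here is a simplified
Python model of how increase() and rate() adjust for counter resets: a
drop is taken to mean the series restarted from zero, so the new value
is counted in full. Real PromQL also extrapolates to the edges of the
range window, which this sketch omits; the sample values are
hypothetical.

```python
# Simplified model of PromQL's counter-reset adjustment in increase():
# when a sample is lower than the previous one, the series is assumed
# to have restarted from zero, so the new value itself counts as growth.
# Extrapolation to the range-window edges is deliberately omitted.

def reset_adjusted_increase(samples: list[float]) -> float:
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


# Hypothetical hourly samples for one drive slot. In the second series
# the drive is swapped for a used one with 20000 hours on it.
same_drive = [41000.0, 41001.0, 41002.0, 41003.0]
swapped = [41000.0, 41001.0, 41002.0, 20000.0]

print(reset_adjusted_increase(same_drive))  # 3.0
print(reset_adjusted_increase(swapped))     # 20002.0
```

With a used replacement drive the adjustment reports about 20,000 hours
of "growth" in three hours, which is the kind of result a per-drive
serial label (one series per physical drive) avoids. For an undisturbed
drive the increase over an hour is about 1, i.e. a rate of roughly
1/3600 per second, which says little beyond "the drive was powered on".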

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/819365.1721402929%40apps0.cs.toronto.edu.
