You wouldn't even need the "sum" operator, since that sums across multiple
timeseries. There would be a single import_processed_total timeseries.
Queries might be:
import_processed_total - import_processed_total offset 1h
rate(import_processed_total[1h])
... etc
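As a minimal sketch of the "send a '+1' to statsd_exporter" idea discussed below — assuming a statsd_exporter listening on its default UDP port 9125, and an illustrative metric name:

```python
import socket

def send_increment(metric, host="localhost", port=9125):
    """Send a statsd counter increment ('+1') over UDP.

    statsd counter line format: '<name>:<value>|c'.
    UDP is fire-and-forget, so this doesn't fail even if nothing is listening.
    """
    payload = f"{metric}:1|c".encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload  # returned only so the format is easy to inspect

# Each pod calls this whenever it processes an import; statsd_exporter
# accumulates the total and exposes it as a single Prometheus counter,
# with no per-pod label.
send_increment("import_processed")
```

Because the accumulation happens in statsd_exporter rather than in the pods, the exposed counter survives pod restarts.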
On Friday, 20 December 2024 at 13:03:04 UTC Florian Luce wrote:
> Hey Brian,
>
> Thanks for the answer. If I understand correctly, Prometheus can be used for
> this use case, but I need to integrate a new component (i.e. statsd_exporter)
> to clean the metrics and remove the non-static label values.
>
> Your approach allows you to go from several time series, e.g.:
>
> - import_processed_total{pod="pod-sdsdsf"}
> - import_processed_total{pod="pod-zzze"}
> - import_processed_total{pod="pod-fdssfdf"}
> - ...
>
>
> to a single
>
> - import_processed_total{}
>
> And then I can use the classic "sum" operator to get the total number of
> processed imports, use "increase", etc.
>
> Is that the idea?
>
>
> On Friday, 20 December 2024 at 12:39:35 UTC+1, Brian Candler wrote:
>
> I see at least two distinct issues there.
>
> 1. "Is Prometheus and PromQL suitable for working on a metric that doesn't
> change much?" - quite simply, "yes". Prometheus uses delta compression, so
> adjacent identical values compress extremely well. Indeed, Prometheus is
> often used for metrics which *never* change, so long as the labels are
> static, for example:
> node_os_info{id="ubuntu", id_like="debian", name="Ubuntu",
> pretty_name="Ubuntu 22.04.5 LTS", version="22.04.5 LTS (Jammy Jellyfish)",
> version_codename="jammy", version_id="22.04"} 1
> The overhead of scraping this repeatedly is tiny.
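As a toy illustration of why adjacent identical samples compress so well (this is simple delta encoding, not Prometheus's actual chunk format, which uses a more sophisticated double-delta/XOR scheme):

```python
def delta_encode(samples):
    """Store the first value, then only each difference to the previous sample."""
    deltas = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        deltas.append(cur - prev)
    return deltas

# A metric that never changes (like node_os_info's constant 1) becomes
# a single value followed by a run of zeros, which encodes in a few bits.
samples = [1] * 10
print(delta_encode(samples))  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```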
>
> 2. You have a specific issue with distributed counters. Ideally you'd use
> sum(import_processed_total) to get the total amount of work done over all
> pods, but that's not reliable because parts of the counter will *vanish*
> when the pod terminates, and you don't want the total counter to go down.
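To make that failure mode concrete, a small sketch with hypothetical sample values: summing per-pod counters drops when a pod terminates, while an externally accumulated total does not:

```python
# Per-scrape snapshots of per-pod counter values; pod-a disappears at t=2.
scrapes = [
    {"pod-a": 30, "pod-b": 10},
    {"pod-a": 32, "pod-b": 11},
    {"pod-b": 12},  # pod-a terminated: its 32 units of work vanish
]

naive_sums = [sum(s.values()) for s in scrapes]
print(naive_sums)  # [40, 43, 12] -- the "total" counter goes *down*

# Accumulating increments externally (what statsd_exporter does) keeps
# the true running total, regardless of which pod did the work.
total = 30 + 10                   # work done before the first scrape
total += (32 - 30) + (11 - 10)    # increments between scrapes 1 and 2
total += (12 - 11)                # pod-b's increment; pod-a's loss is ignored
print(total)  # 44
```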
>
> I think the best solution is to accumulate your counters in some other
> external process, such as statsd_exporter. Send a '+1' whenever you do some
> work. The value scraped from statsd_exporter will be the total amount of
> work done, independently of which pod has performed the work. That is fine
> for both total work done and for calculating the overall rate of processing
> work.
>
> On Friday, 20 December 2024 at 10:52:37 UTC Florian Luce wrote:
>
> Hi everyone,
>
> I have a use case where I'm trying to track the use of a feature that
> isn't often used, and I've decided to use a counter.
>
> To give you some numbers: at the moment this counter is incremented about 50
> times per 24h on average.
>
> This functionality is implemented within a service that is deployed and
> replicated across 10 to 20 pods (on Kubernetes), with metrics scraped at a
> regular 30-second interval. We have a label on the metrics to identify the
> pods and avoid collisions, so this metric changes very little and is spread
> over a number of time series.
>
> Here's a small example of how "flat" this metric is:
>
> [image: Capture d’écran 2024-12-20 à 09.26.03.png]
>
> The first problem we had to solve was losing the 0-to-1 transition (we
> tested the beta feature created-timestamps zero injection
> <https://prometheus.io/docs/prometheus/latest/feature_flags/#created-timestamps-zero-injection>,
> but it generated significant CPU overhead, so we didn't enable it).
>
> So we went with a query like this:
>
> clamp_min(
>   sum(max_over_time(import_processed_total{}[1m]) or vector(0))
>   - sum(max_over_time(import_processed_total{}[1m] offset 1m) or vector(0)),
>   0
> )
>
> And I set the "Min interval" query option in Grafana to 1m.
>
> It's still imperfect at the ends of the time series, but it gives a result
> close to reality when I analyze it over 24/48-hour windows.
>
> However, this approach becomes unusable over 30 days.
>
> The questions I have are the following:
>
> - Is there a different approach (PromQL query) to exploit this metric
> without losing precision?
> - Is Prometheus suitable for this kind of use case?
> - Couldn't an "adaptive metrics"
> <https://grafana.com/blog/2023/05/09/adaptive-metrics-grafana-cloud-announcement/>
> approach be a solution for cleaning up this metric and generating a
> synthetic daily version, which can then be analyzed over 30 days?
>
> Thanks for reading, and thanks in advance for your answers.
>
>
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/prometheus-users/f7688590-95a7-4da0-909d-c173e0cb75dfn%40googlegroups.com.