You wouldn't even need the "sum" operator, since that sums across multiple
timeseries. There would be a single import_processed_total timeseries.
Queries might be:
import_processed_total - import_processed_total offset 1h
rate(import_processed_total[1h])
... etc
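As a minimal sketch of the "send a '+1' to statsd_exporter" idea discussed below — assuming a statsd_exporter listening on its default UDP port 9125, and an illustrative metric name:

```python
import socket

def send_increment(metric, host="localhost", port=9125):
    """Send a statsd counter increment ('+1') over UDP.

    statsd counter line format: '<name>:<value>|c'.
    UDP is fire-and-forget, so this doesn't fail even if nothing is listening.
    """
    payload = f"{metric}:1|c".encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload  # returned only so the format is easy to inspect

# Each pod calls this whenever it processes an import; statsd_exporter
# accumulates the total and exposes it as a single Prometheus counter,
# with no per-pod label.
send_increment("import_processed")
```

Because the accumulation happens in statsd_exporter rather than in the pods, the exposed counter survives pod restarts.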
On Friday, 20 December 2024 at 13:03:04 UTC Florian Luce wrote:
> Hey Brian,
>
> Thanks for the answer. If I understand correctly, Prometheus can be used for
> this use case, but I need to integrate a new component (i.e. statsd_exporter)
> to clean the metrics and remove the non-static label values.
>
> Your approach allows you to go from several time series, e.g.:
>
> - import_processed_total{pod="pod-sdsdsf"}
> - import_processed_total{pod="pod-zzze"}
> - import_processed_total{pod="pod-fdssfdf"}
> - ...
>
>
> to a single
>
> - import_processed_total{}
>
> And then I can use the classic "sum" operator to get the total number of
> processed imports, use "increase", etc.
>
> Is that the idea?
>
>
> On Friday, 20 December 2024 at 12:39:35 UTC+1, Brian Candler wrote:
>
> I see at least two distinct issues there.
>
> 1. "Is Prometheus and PromQL suitable for working on a metric that doesn't
> change much?" - quite simply, "yes". Prometheus uses delta compression, so
> adjacent identical values compress extremely well. Indeed, Prometheus is
> often used for metrics which *never* change, so long as the labels are
> static, for example:
> node_os_info{id="ubuntu", id_like="debian", name="Ubuntu",
> pretty_name="Ubuntu 22.04.5 LTS", version="22.04.5 LTS (Jammy Jellyfish)",
> version_codename="jammy", version_id="22.04"} 1
> The overhead of scraping this repeatedly is tiny.
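As a toy illustration of why adjacent identical samples compress so well (this is simple delta encoding, not Prometheus's actual chunk format, which uses a more sophisticated double-delta/XOR scheme):

```python
def delta_encode(samples):
    """Store the first value, then only each difference to the previous sample."""
    deltas = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        deltas.append(cur - prev)
    return deltas

# A metric that never changes (like node_os_info's constant 1) becomes
# a single value followed by a run of zeros, which encodes in a few bits.
samples = [1] * 10
print(delta_encode(samples))  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```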
>
> 2. You have a specific issue with distributed counters. Ideally you'd use
> sum(import_processed_total) to get the total amount of work done over all
> pods, but that's not reliable because parts of the counter will *vanish*
> when the pod terminates, and you don't want the total counter to go down.
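To make that failure mode concrete, a small sketch with hypothetical sample values: summing per-pod counters drops when a pod terminates, while an externally accumulated total does not:

```python
# Per-scrape snapshots of per-pod counter values; pod-a disappears at t=2.
scrapes = [
    {"pod-a": 30, "pod-b": 10},
    {"pod-a": 32, "pod-b": 11},
    {"pod-b": 12},  # pod-a terminated: its 32 units of work vanish
]

naive_sums = [sum(s.values()) for s in scrapes]
print(naive_sums)  # [40, 43, 12] -- the "total" counter goes *down*

# Accumulating increments externally (what statsd_exporter does) keeps
# the true running total, regardless of which pod did the work.
total = 30 + 10                   # work done before the first scrape
total += (32 - 30) + (11 - 10)    # increments between scrapes 1 and 2
total += (12 - 11)                # pod-b's increment; pod-a's loss is ignored
print(total)  # 44
```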
>
> I think the best solution is to accumulate your counters in some other
> external process, such as statsd_exporter. Send a '+1' whenever you do some
> work. The value scraped from statsd_exporter will be the total amount of
> work done, independently of which pod has performed the work. That is fine
> for both total work done and for calculating the overall rate of processing
> work.
>
> On Friday, 20 December 2024 at 10:52:37 UTC Florian Luce wrote:
>
> Hi everyone,
>
> I have a use case where I'm trying to track the use of a feature that
> isn't often used, and I've decided to use a counter.
>
> To give you some numbers: at the moment this counter is incremented about 50
> times per 24h on average.
>
> This functionality is implemented within a service that is deployed and
> replicated across 10 to 20 pods (on Kubernetes), with metrics scraped at a
> regular 30-second interval. We have a label on the metrics to identify the
> pods and avoid collisions, so this metric changes very little and is spread
> over a number of time series.
>
> Here's a small example of how "flat" this metric is:
>
> [image: Capture d’écran 2024-12-20 à 09.26.03.png]
>
> The first problem we had to solve was losing the 0-to-1 transition (we
> tested the beta feature created-timestamps zero injection
> <https://prometheus.io/docs/prometheus/latest/feature_flags/#created-timestamps-zero-injection>,
> but it generated significant CPU overhead, so we didn't enable it).
>
> So we went with a query like this:
>
> clamp_min(
>   sum(max_over_time(import_processed_total{}[1m]) or vector(0))
>   - sum(max_over_time(import_processed_total{}[1m] offset 1m) or vector(0)),
>   0
> )
>
> And I set the "Min interval" query option in Grafana to 1m.
>
> It's still imperfect at the ends of the time series, but it gives a result
> close to reality when I analyze it over 24/48-hour windows.
>
> However, this approach becomes unusable over 30 days.
>
> The questions I have are the following:
>
> - Is there a different approach (PromQL query) to exploit this metric
> without losing precision?
> - Is Prometheus suitable for this kind of use case?
> - Couldn't an "adaptive metrics"
> <https://grafana.com/blog/2023/05/09/adaptive-metrics-grafana-cloud-announcement/>
> approach be a solution for cleaning up this metric and generating a
> synthetic daily version, which can then be analyzed over 30 days?
>
> Thanks for reading, and thanks in advance for your answers.
>
>
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/prometheus-users/f7688590-95a7-4da0-909d-c173e0cb75dfn%40googlegroups.com.