[prometheus-users] Re: Questions about best way to monitor CPU usage and accuracy and container life-time

Simon Hardy-Francis Mon, 19 Aug 2024 10:34:55 -0700

I think I found one answer to question 2: the @ modifier, which is enabled 
with the "--enable-feature=promql-at-modifier" feature flag [1].
However, [2] says such "features ... are disabled by default since they are 
breaking changes or are considered experimental", so I'm not sure I want to 
use it.
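
For the fixed window in question 2 (between 1 and 2 hours ago), the plain 
"offset" modifier may already be enough, and it needs no feature flag. Below 
is a minimal sketch, not a tested answer. It assumes a 30 second scrape 
interval, and PROM_HOST is a hypothetical environment variable standing in 
for the server URL. The promql flags are the ones from my script below. The 
lifetime figure is only an approximation: it counts the samples seen in the 
window and multiplies by the scrape interval.

```shell
#!/bin/sh
# Assumption: Prometheus scrapes these targets every 30 seconds; adjust as needed.
scrape_interval=30

selector='container_cpu_usage_seconds_total{container!=""}'
labels='node, instance, namespace, pod, container'

# CPU seconds used in the window from 2 hours ago to 1 hour ago.
cpu_query="sum by (${labels}) (increase(${selector}[1h] offset 1h))"

# Approximate seconds each container was present in that window:
# number of samples in the window times the assumed scrape interval.
life_query="max by (${labels}) (count_over_time(${selector}[1h] offset 1h)) * ${scrape_interval}"

if [ -n "${PROM_HOST:-}" ]; then
  # PROM_HOST is hypothetical, e.g. the https URL of the Prometheus server.
  promql --timeout 300 --no-headers --host "$PROM_HOST" "$cpu_query"
  promql --timeout 300 --no-headers --host "$PROM_HOST" "$life_query"
else
  # No server configured: just print the queries for inspection.
  printf '%s\n' "$cpu_query" "$life_query"
fi
```

A container that only existed for 10 minutes of the window would show roughly 
20 samples here, so about 600 seconds, if the 30 second assumption holds.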


[1] https://prometheus.io/blog/2021/02/18/introducing-the-@-modifier/
[2] https://prometheus.io/docs/prometheus/2.45/feature_flags/

On Friday, August 16, 2024 at 9:30:27 PM UTC-7 Simon Hardy-Francis wrote:

> Hello!
>
> I am a relative newbie to Prometheus and PromQL and have the following 
> questions:
>
> Question 1: I tried to measure the CPU used by containers over the last 
> minute in 3 different ways via PromQL, and the results do not agree with 
> each other.
>
> The first way reads the absolute value of 
> "container_cpu_usage_seconds_total" before and after a 60 second wait and 
> subtracts the two. The second way uses "rate" and the third uses "increase". 
> Why do the "rate" and "increase" ways show so much less CPU being used?
>
> Question 2: Let's say I want to find the CPU used by all containers that 
> were active between 1 and 2 hours ago, and also how long each container was 
> active, e.g. for a container that only existed for 10 minutes of that 
> 60 minute window. How do I write a PromQL query to do that?
>
> Thanks,
> Simon
>
> P.S.: Here are my commands:
>
> $ cat promql.try-1m.sh
> (promql --timeout 300 --no-headers --host 'https://<my prometheus server>' \
>   'sort_desc(sum by (instance, namespace, node, pod, container) (container_cpu_usage_seconds_total{container!=""}))' \
>   > promql.before.txt) &
> sleep 60
> (promql --timeout 300 --no-headers --host 'https://<my prometheus server>' \
>   'sort_desc(sum by (instance, namespace, node, pod, container) (container_cpu_usage_seconds_total{container!=""}))' \
>   > promql.after.txt) &
> (promql --timeout 300 --no-headers --host 'https://<my prometheus server>' \
>   'sum(rate(container_cpu_usage_seconds_total{container!=""}[1m])) by (node, instance, namespace, pod, container)' \
>   > promql.rate-1m.txt) &
> (promql --timeout 300 --no-headers --host 'https://<my prometheus server>' \
>   'sum(increase(container_cpu_usage_seconds_total{container!=""}[1m])) by (node, instance, namespace, pod, container)' \
>   > promql.increase-1m.txt) &
>
> $ ./promql.try-1m.sh
>
> $ ls -al promql.*.txt
> -rw-rw-r-- 1 simon simon 3845801 Aug 16 15:36 promql.before.txt
> -rw-rw-r-- 1 simon simon 3844193 Aug 16 15:37 promql.after.txt
> -rw-rw-r-- 1 simon simon 3565377 Aug 16 15:37 promql.increase-1m.txt
> -rw-rw-r-- 1 simon simon 3591045 Aug 16 15:37 promql.rate-1m.txt
>
> $ cat promql.before.txt | egrep kube-router | head -1
> kube-router  10.34.28.252:10250  kube-system  <my k8s node>  kube-router-kjxnp  826909.35828385  2024-08-16T15:36:16-07:00
>
> $ cat promql.after.txt | egrep kube-router | egrep "kube-router.*kube-router-kjxnp"
> kube-router  10.34.28.252:10250  kube-system  <my k8s node>  kube-router-kjxnp  826929.923827853  2024-08-16T15:37:16-07:00
>
> $ perl -e 'printf qq[%f\n], 826929.923827853 - 826909.35828385;'
> 20.565544
>
> $ cat promql.increase-1m.txt | egrep kube-router | egrep "kube-router.*kube-router-kjxnp"
> kube-router  10.34.28.252:10250  kube-system  <my k8s node>  kube-router-kjxnp  5.908468843536704  2024-08-16T15:37:16-07:00
>
> $ cat promql.rate-1m.txt | egrep kube-router | egrep "kube-router.*kube-router-kjxnp"
> kube-router  10.34.28.252:10250  kube-system  <my k8s node>  kube-router-kjxnp  0.09846672408107424  2024-08-16T15:37:16-07:00
>
> $ perl -e 'printf qq[%f\n], 0.09846672408107424 * 60;'
> 5.908003
>
> $ perl -e 'printf qq[%f%%\n], (20.565544 - 5.908003) / 5.908003 * 100;'
> 248.096370%
>
>
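
One cross-check on the question 1 numbers above, offered as a hypothesis 
rather than an explanation: if CPU usage was steady at the reported rate, 
dividing the raw before/after delta by that rate gives the span of sample 
time the two snapshots would have to cover. An instant query returns the most 
recent sample within the lookback window (5 minutes by default), so the two 
snapshots need not be exactly 60 seconds apart in sample time. The sketch 
below uses only the values from the output above. The timestamp() query at 
the end is a suggested follow-up that I have not run.

```shell
#!/bin/sh
# Values copied from the kube-router-kjxnp lines above.
before=826909.35828385
after=826929.923827853
rate=0.09846672408107424

# Seconds of sample time the raw delta would span at the reported rate.
implied_seconds=$(awk -v b="$before" -v a="$after" -v r="$rate" \
  'BEGIN { printf "%.1f", (a - b) / r }')
echo "implied span between snapshots: ${implied_seconds}s (wall clock wait was 60s)"

# Suggested follow-up (not run): timestamp() returns the time of each sample,
# which would show how old the value behind each snapshot really was.
ts_query='timestamp(container_cpu_usage_seconds_total{container!=""})'
echo "$ts_query"
```

If the implied span comes out well above 60 seconds, comparing sample 
timestamps for the two snapshots would be the next thing to look at.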

