[prometheus-users] Questions about best way to monitor CPU usage and accuracy and container life-time

Simon Hardy-Francis Fri, 16 Aug 2024 21:30:24 -0700

Hello!

I am relatively new to Prometheus and PromQL and have the following
questions:


Question 1: I tried to measure the CPU time used by containers over the
last minute in three different ways with PromQL, and the results do not
agree with each other.

The first way is to read the absolute value of
"container_cpu_usage_seconds_total" before and after the minute and
subtract the two values. The second way uses "rate" and the third uses
"increase". Why do the "rate" and "increase" results show so much less CPU
being used (for the kube-router pod below, 5.9 versus 20.6 CPU-seconds)?

Question 2: Let's say I want to find the CPU used by all containers that
were active between 1 and 2 hours ago, and also how long each container was
active, for example when a container existed for only 10 minutes of that
60-minute window. How can I write a PromQL query to do that?
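
A possible starting point, not tested against this setup: PromQL's `offset` modifier moves a range selector back in time, so `increase(container_cpu_usage_seconds_total{container!=""}[1h] offset 1h)` looks at the hour that ended one hour ago, and `count_over_time(container_cpu_usage_seconds_total{container!=""}[1h] offset 1h)` counts how many samples each series has in that hour. Multiplying that count by the scrape interval (an assumption; the interval is not stated here) roughly approximates how long each container was present. The shell/awk sketch below applies the same arithmetic offline to one container's raw samples; `container_window_summary` is a hypothetical helper name, and it does not count CPU used before the container's first sample in the window.

```shell
# Sketch only: summarise one container's samples inside the window [start, end).
# Input on stdin: "unix_timestamp counter_value" lines, sorted by time.
# Args: window start, window end, assumed scrape interval (all in seconds).
# Prints: first_seen last_seen approx_active_seconds cpu_seconds
container_window_summary() {
  awk -v start="$1" -v end="$2" -v step="$3" '
    BEGIN { start += 0; end += 0; step += 0 }
    $1 >= start && $1 < end {
      if (n == 0) { first_t = $1 + 0; first_v = $2 + 0 }
      else if ($2 + 0 < prev_v) { resets += prev_v }  # counter reset seen
      prev_v = $2 + 0; last_t = $1 + 0; n++
    }
    END {
      if (n == 0) { print "no samples in window" > "/dev/stderr"; exit 1 }
      # Each sample is taken to stand for one scrape interval of presence.
      # CPU used before the first sample in the window is not counted.
      printf "%d %d %d %.6f\n", first_t, last_t, n * step, prev_v - first_v + resets
    }'
}
```

With 20 samples scraped every 30 seconds inside the window, the third column comes out as 600, i.e. the container is treated as present for about 10 of the 60 minutes.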

Thanks,
Simon

P.S.: Here are my commands:

$ cat promql.try-1m.sh
# Snapshot the raw counters now...
(promql --timeout 300 --no-headers --host 'https://<my prometheus server>' \
  'sort_desc(sum by (instance, namespace, node, pod, container) (container_cpu_usage_seconds_total{container!=""}))' \
  > promql.before.txt) &
sleep 60
# ...and again 60 seconds later,
(promql --timeout 300 --no-headers --host 'https://<my prometheus server>' \
  'sort_desc(sum by (instance, namespace, node, pod, container) (container_cpu_usage_seconds_total{container!=""}))' \
  > promql.after.txt) &
# together with rate() and increase() over the same 1-minute window.
(promql --timeout 300 --no-headers --host 'https://<my prometheus server>' \
  'sum(rate(container_cpu_usage_seconds_total{container!=""}[1m])) by (node, instance, namespace, pod, container)' \
  > promql.rate-1m.txt) &
(promql --timeout 300 --no-headers --host 'https://<my prometheus server>' \
  'sum(increase(container_cpu_usage_seconds_total{container!=""}[1m])) by (node, instance, namespace, pod, container)' \
  > promql.increase-1m.txt) &
wait  # let the background queries finish before the script exits

$ ./promql.try-1m.sh

$ ls -al promql.*.txt
-rw-rw-r-- 1 simon simon 3845801 Aug 16 15:36 promql.before.txt
-rw-rw-r-- 1 simon simon 3844193 Aug 16 15:37 promql.after.txt
-rw-rw-r-- 1 simon simon 3565377 Aug 16 15:37 promql.increase-1m.txt
-rw-rw-r-- 1 simon simon 3591045 Aug 16 15:37 promql.rate-1m.txt

$ cat promql.before.txt | egrep kube-router | head -1
kube-router  10.34.28.252:10250  kube-system  <my k8s node>  kube-router-kjxnp  826909.35828385  2024-08-16T15:36:16-07:00

$ cat promql.after.txt | egrep kube-router | egrep "kube-router.*kube-router-kjxnp"
kube-router  10.34.28.252:10250  kube-system  <my k8s node>  kube-router-kjxnp  826929.923827853  2024-08-16T15:37:16-07:00

$ perl -e 'printf qq[%f\n], 826929.923827853 - 826909.35828385;'
20.565544

$ cat promql.increase-1m.txt | egrep kube-router | egrep "kube-router.*kube-router-kjxnp"
kube-router  10.34.28.252:10250  kube-system  <my k8s node>  kube-router-kjxnp  5.908468843536704  2024-08-16T15:37:16-07:00

$ cat promql.rate-1m.txt | egrep kube-router | egrep "kube-router.*kube-router-kjxnp"
kube-router  10.34.28.252:10250  kube-system  <my k8s node>  kube-router-kjxnp  0.09846672408107424  2024-08-16T15:37:16-07:00

$ perl -e 'printf qq[%f\n], 0.09846672408107424 * 60;'
5.908003

$ perl -e 'printf qq[%f%%\n], (20.565544 - 5.908003) / 5.908003 * 100;'
248.096370%
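
The same before/after/increase comparison can be repeated for every series at once instead of one grepped pod. The shell/awk sketch below assumes the column layout seen in the output above (container, instance, namespace, node, pod, value, timestamp, whitespace-separated, with no spaces inside label values); `compare_methods` is a hypothetical helper name. For each series present in all three files it prints the label columns, the raw delta, the increase() value and their ratio.

```shell
# Sketch only: compare raw-counter deltas with increase() for every series.
# Assumed layout of each file (whitespace-separated, one series per line):
#   container instance namespace node pod value timestamp
# Usage: compare_methods BEFORE_FILE AFTER_FILE INCREASE_FILE
compare_methods() {
  awk '
    { key = $1 " " $2 " " $3 " " $4 " " $5 }   # the five label columns
    FILENAME == ARGV[1] { before[key] = $6; next }
    FILENAME == ARGV[2] { after[key] = $6; next }
    (key in before) && (key in after) && $6 > 0 {
      delta = after[key] - before[key]
      # labels, raw delta, increase(), ratio of the two
      printf "%s %.6f %.6f %.2f\n", key, delta, $6, delta / $6
    }
  ' "$1" "$2" "$3"
}
```

For example, `compare_methods promql.before.txt promql.after.txt promql.increase-1m.txt | sort -k8,8nr | head` would list the series with the largest ratio first; for the kube-router-kjxnp row above the ratio comes out at about 3.48.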

