[prometheus-users] Need a PromQL or Metric to see the HTTP request count or API calls against Prometheus endpoint

Moksha Reddy G Thu, 12 Sep 2024 18:38:05 -0700

Hi Everyone,

We are facing a strange and serious issue with our Prometheus pods 
running on Azure instances: the pods restart frequently whenever the load 
balancer or DNS points to Prometheus. I need a metric or a PromQL query 
that shows the incoming request count and API calls against the Prometheus 
endpoint at the time of the pod restarts. I would really appreciate your 
help with this, thank you!

Here is a little background on what we have tried so far:

- I tried the scrape_samples_scraped metric. It shows a spike once a
day, BUT the pods are restarting more than 20 times a day.
- I tried the http_requests_total metric as mentioned in
https://prometheus.io/docs/prometheus/latest/querying/examples/ BUT it
did not show any spike in requests at all.
- I also found the prometheus_http_requests_total metric, and I
don't see any spike there either.
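
For reference, here is a minimal sketch of how the request counter could be broken down per endpoint. It assumes Prometheus scrapes its own /metrics endpoint; the 5m window is an arbitrary choice. Since prometheus_http_requests_total is a counter, a spike only becomes visible after applying rate(), not when graphing the raw value. Requests that arrive while the pod is down or unresponsive may not be counted at all.

```promql
# Per-second rate of HTTP requests served by Prometheus itself,
# split by API handler and response code (5m window is an assumption)
sum by (handler, code) (rate(prometheus_http_requests_total[5m]))
```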

To remediate the pod restarts, we have taken the actions below to
understand their cause, but with no luck. *Do you
have any recommendation or solution to stop these restarts?*

1. We spun up new pods without pointing any LB or DNS at them, and there
were NO pod restarts. But this was just for testing, as we cannot go live
without an LB or DNS!
2. We checked the access logs to see whether any HTTP requests from
applications are causing the issue. We are seeing many readiness probe
failures for Prometheus. As a workaround, we increased the readiness
and liveness probe timeouts, but this didn't help.
3. We tried deleting the /wal directory to clear any broken files and avoid
the Prometheus pod restarts, but the issue came back after a few hours. At
least there were NO immediate restarts!
4. We scaled up the Azure instance type so that the pods have enough
resources to handle the load (which is invisible in Prometheus). Azure
monitoring does not show any spike or high usage, yet the Prometheus pods
are still getting restarted.
5. We had a call with the app teams to cross-check whether our Prometheus is
being hit by any applications/services or a load test. We still see
the restarts even after suspending some apps.
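
To line the restarts up against the request graphs, something like the sketch below might help. The job="prometheus" selector and the pod name pattern are assumptions about the setup, and the second query assumes kube-state-metrics is deployed in the cluster, which has not been confirmed here.

```promql
# Number of times the Prometheus process start time changed over the
# last day, i.e. restarts as seen by whatever scrapes this target
changes(process_start_time_seconds{job="prometheus"}[1d])

# Reason the container last terminated (for example OOMKilled or Error),
# as reported by kube-state-metrics; pod name pattern is hypothetical
kube_pod_container_status_last_terminated_reason{pod=~"prometheus.*"} == 1
```

If the last termination reason turns out to be OOMKilled or a failed liveness probe, that would point to a cause other than raw request volume.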

Best,
Moksh

