Check the File Format example for the timeout, retries, and max-repetition settings.
With Cisco I use a max-repetition value of 50 or 100, retries 0, and a timeout 1s or 500ms below the Prometheus scrape timeout.

Ben Kochie wrote on Saturday, 16 March 2024 at 06:31:17 UTC+1:

> This is very likely a problem with counter resets or some other kind of
> duplicate data.
>
> The best way to figure this out is to perform the query, but without the
> `rate()` function.
>
> This can be done via the Prometheus UI (harder to do in Grafana) in the
> "Table" view.
>
> Here is an example demo query
> <https://prometheus.demo.do.prometheus.io/graph?g0.expr=process_cpu_seconds_total%7Bjob%3D%22prometheus%22%7D%5B2m%5D&g0.tab=1&g0.display_mode=lines&g0.show_exemplars=0&g0.range_input=1h>
>
> The result is a list of the raw samples that are needed to debug.
>
> On Fri, Mar 15, 2024 at 11:41 PM Nick Carlton <[email protected]>
> wrote:
>
>> Hello Everyone,
>>
>> I have just seen something weird in my environment where I saw interface
>> bandwidth on a gigabit switch reach about 1 Tbps on some of the
>> interfaces.
>>
>> Here is the query I'm using:
>>
>> rate(ifHCInOctets{ifHCInOctetsIntfName=~".*.\\/.*.",instance="<device-name>"}[2m])
>> * 8
>>
>> Which I've never had a problem with. Here is an image of the graph showing
>> the massive increase in bandwidth and then decrease back to normal:
>>
>> [image: Screenshot 2024-03-15 222353.png]
>>
>> When I did some more investigation into what could have happened, I
>> could see that the 'snmp_scrape_duration_seconds' metric increases to around
>> 20s at the time. So the Cisco switch is taking 20 seconds to respond to
>> the SNMP request.
>>
>> [image: Screenshot 2024-03-15 222244.png]
>>
>> I'm a bit confused as to how this could cause the rate query to give
>> completely false data. Could the delay in data have caused Prometheus to
>> think there was more bandwidth on the interface? The switch certainly
>> cannot do the speeds the graph is claiming!
>>
>> I'm on v0.25.0 of the SNMP exporter and it's normally sat around 2s for the
>> scrapes. I'm not blaming the exporter for the high response times, that's
>> probably the switch. Just wondering if in some way the high response time
>> could cause the rate query to give incorrect data. The fact the graph went
>> back to normal after the high response times makes me think it wasn't the
>> switch giving duff data.
>>
>> Anyone seen this before and is there any way to mitigate? Happy to
>> provide more info if required :)
>>
>> Thanks
>> Nick
>>
>

-- 
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/5fc62bc2-42e0-46e6-8382-134f71836b9dn%40googlegroups.com.
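
To make the counter-reset point above concrete, here is a rough Python sketch of how a single sample that is lower than the previous one can blow up a rate. It is a simplification, not the Prometheus implementation: `simple_rate` is a hypothetical helper, it skips the extrapolation to window boundaries that `rate()` performs, and the sample values are invented for illustration. Whether this is what actually happened on the switch in this thread can only be confirmed by looking at the raw samples.

```python
def simple_rate(samples):
    """Per-second increase over (timestamp_seconds, value) samples.

    Simplified model of rate() on a counter: any decrease between two
    consecutive samples is treated as a counter reset. No extrapolation.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            # Decrease: assume the counter restarted from zero, so the
            # whole current value is counted as new increase.
            increase += value
        else:
            increase += value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Invented ifHCInOctets-style samples, one per 60s, about 10 Mbit/s of traffic.
base = 1_000_000_000_000
normal = [(0, base), (60, base + 75_000_000), (120, base + 150_000_000)]

# Same series, but the last scrape returns a slightly older (lower) value.
glitch = [(0, base), (60, base + 75_000_000), (120, base + 70_000_000)]

print("normal bits/s:", simple_rate(normal) * 8)
print("glitch bits/s:", simple_rate(glitch) * 8)
```

Because the octet counter is already very large, treating the small dip as a reset adds almost the entire counter value to the increase, which is how a gigabit port can appear to carry far more than line rate for one window.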

