Re: [prometheus-users] Correlation between snmp scrape time and massive rate output for ifHCInOctets

Alexander Wilke Sat, 16 Mar 2024 01:08:51 -0700

Check the File Format example, specifically the timeout, retries and 
max_repetitions settings.

With Cisco devices I use max_repetitions of 50 or 100, retries 0, and a 
timeout 1s or 500ms below the Prometheus scrape timeout.
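
As a rough sketch of where those settings go, assuming the module layout of 
the SNMP exporter's generator.yml: the module name "cisco_if_mib", the walked 
object and the 10s Prometheus scrape timeout below are placeholders for 
illustration, not values from this thread.

```yaml
# generator.yml fragment (sketch only).
# Assumptions: module name "cisco_if_mib" is hypothetical, and the
# Prometheus scrape_timeout for this job is assumed to be 10s.
modules:
  cisco_if_mib:
    walk:
      - ifHCInOctets
    max_repetitions: 50   # 50 or 100 with Cisco, as suggested above
    retries: 0            # no retries, so a slow device fails the scrape quickly
    timeout: 9s           # 1s below the assumed 10s Prometheus scrape_timeout
```

Adjust the timeout to whatever scrape_timeout the Prometheus job actually 
uses, then regenerate snmp.yml.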

Ben Kochie wrote on Saturday, 16 March 2024 at 06:31:17 UTC+1:

> This is very likely a problem with counter resets or some other kind of 
> duplicate data.
>
> The best way to figure this out is to perform the query, but without the 
> `rate()` function.
>
> This can be done via the Prometheus UI (harder to do in Grafana) in the 
> "Table" view.
>
> Here is an example demo query 
> <https://prometheus.demo.do.prometheus.io/graph?g0.expr=process_cpu_seconds_total%7Bjob%3D%22prometheus%22%7D%5B2m%5D&g0.tab=1&g0.display_mode=lines&g0.show_exemplars=0&g0.range_input=1h>
>
> The result is a list of the raw samples, which is what you need to debug.
>
> On Fri, Mar 15, 2024 at 11:41 PM Nick Carlton <[email protected]> 
> wrote:
>
>> Hello Everyone,
>>
>> I have just seen something weird in my environment: interface bandwidth 
>> on a gigabit switch appeared to reach about 1 Tbps on some of the 
>> interfaces.
>>
>> Here is the query I'm using:
>>
>> rate(ifHCInOctets{ifHCInOctetsIntfName=~".*.\\/.*.",instance="<device-name>"}[2m]) * 8
>>
>> I've never had a problem with it before. Here is an image of the graph 
>> showing the massive increase in bandwidth and then the drop back to normal:
>>
>> [image: Screenshot 2024-03-15 222353.png]
>>
>> When I investigated further, I could see that the 
>> 'snmp_scrape_duration_seconds' metric increased to around 20s at the 
>> time. So the Cisco switch was taking 20 seconds to respond to the SNMP 
>> request.
>>
>> [image: Screenshot 2024-03-15 222244.png]
>>
>> I'm a bit confused as to how this could cause the rate query to give 
>> completely false data. Could the delay in data have caused Prometheus to 
>> think there was more bandwidth on the interface? The switch certainly 
>> cannot do the speeds the graph is claiming!
>>
>> I'm on v0.25.0 of the SNMP exporter, and scrapes normally sit at around 
>> 2s. I'm not blaming the exporter for the high response times; that's 
>> probably the switch. I'm just wondering whether the high response time 
>> could in some way cause the rate query to give incorrect data. The fact 
>> that the graph went back to normal after the high response times makes me 
>> think it wasn't the switch giving duff data.
>>
>> Anyone seen this before and is there any way to mitigate? Happy to 
>> provide more info if required :)
>>
>> Thanks
>> Nick
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Prometheus Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/prometheus-users/6fd3dca6-2013-47ad-af8f-3344e79954a7n%40googlegroups.com.
>>
>
