Re: [prometheus-users] Correlation between snmp scrape time and massive rate output for ifHCInOctets

Alexander Wilke Sat, 16 Mar 2024 01:10:23 -0700

https://github.com/prometheus/snmp_exporter/tree/main/generator


Alexander Wilke wrote on Saturday, 16 March 2024 at 09:08:44 UTC+1:

> Check the File Format example.
>
> The relevant settings are timeout, retries, and max_repetitions.
>
> With Cisco devices I use max_repetitions of 50 or 100, retries of 0, and a 
> timeout 1s or 500ms below the Prometheus scrape timeout.
>
> Ben Kochie wrote on Saturday, 16 March 2024 at 06:31:17 UTC+1:
>
>> This is very likely a problem with counter resets or some other kind of 
>> duplicate data.
>>
>> The best way to figure this out is to perform the query, but without the 
>> `rate()` function.
>>
>> This can be done via the Prometheus UI (harder to do in Grafana) in the 
>> "Table" view.
>>
>> Here is an example demo query 
>> <https://prometheus.demo.do.prometheus.io/graph?g0.expr=process_cpu_seconds_total%7Bjob%3D%22prometheus%22%7D%5B2m%5D&g0.tab=1&g0.display_mode=lines&g0.show_exemplars=0&g0.range_input=1h>
>>
>> The result is a list of the raw samples that are needed to debug this.
>>
>> On Fri, Mar 15, 2024 at 11:41 PM Nick Carlton <[email protected]> 
>> wrote:
>>
>>> Hello Everyone,
>>>
>>> I have just seen something weird in my environment: interface bandwidth 
>>> on a gigabit switch apparently reached about 1 Tbps on some of the 
>>> interfaces.
>>>
>>> Here is the query I'm using:
>>>
>>> rate(ifHCInOctets{ifHCInOctetsIntfName=~".*.\\/.*.",instance="<device-name>"}[2m]) * 8
>>>
>>> I've never had a problem with it before. Here is an image of the graph 
>>> showing the massive increase in bandwidth and then the drop back to normal:
>>>
>>> [image: Screenshot 2024-03-15 222353.png]
>>>
>>> When I investigated what could have happened, I saw that the 
>>> 'snmp_scrape_duration_seconds' metric increased to around 20s at the 
>>> time. So the Cisco switch was taking 20 seconds to respond to the SNMP 
>>> request.
>>>
>>> [image: Screenshot 2024-03-15 222244.png]
>>>
>>> I'm a bit confused as to how this could cause the rate query to give 
>>> completely false data. Could the delay in data have caused Prometheus to 
>>> think there was more bandwidth on the interface? The switch certainly 
>>> cannot do the speeds the graph is claiming!
>>>
>>> I'm on v0.25.0 of the SNMP exporter, and scrapes normally take around 2s. 
>>> I'm not blaming the exporter for the high response times; that's 
>>> probably the switch. I'm just wondering whether the high response time 
>>> could in some way cause the rate query to give incorrect data. The fact 
>>> that the graph went back to normal after the high response times makes me 
>>> think it wasn't the switch giving duff data.
>>>
>>> Anyone seen this before and is there any way to mitigate? Happy to 
>>> provide more info if required :)
>>>
>>> Thanks
>>> Nick
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Prometheus Users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/prometheus-users/6fd3dca6-2013-47ad-af8f-3344e79954a7n%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/prometheus-users/6fd3dca6-2013-47ad-af8f-3344e79954a7n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>
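
Ben's point about counter resets can be illustrated with a simplified model of how rate() treats a counter that goes down. This is only a sketch: it ignores the extrapolation to the window boundaries that Prometheus performs, the names simple_rate and find_decreases are hypothetical helpers, and the sample values are made up for illustration, not taken from Nick's switch. Whether this is what actually happened in the case above can only be confirmed from the raw samples.

```python
from typing import List, Tuple

# (timestamp in seconds, counter value in octets)
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase over the samples, with reset handling.

    Simplified model: any decrease is treated as a counter reset to zero,
    so the full new value is counted as increase. No extrapolation.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            # Counter went down: assume it restarted from zero.
            increase += value
        else:
            increase += value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


def find_decreases(samples: List[Sample]) -> List[Sample]:
    """Return every sample whose value is lower than the one before it."""
    return [(t, v) for (_, p), (t, v) in zip(samples, samples[1:]) if v < p]


# Made-up data: a large 64-bit counter growing by 1 MB/s, scraped every 30s.
base = 5_000_000_000_000
healthy = [(0, base), (30, base + 30_000_000), (60, base + 60_000_000),
           (90, base + 90_000_000), (120, base + 120_000_000)]

# Same series, but the sample at t=90 is slightly lower than the one at t=60
# (for example a stale or out-of-order value).
glitch = [(0, base), (30, base + 30_000_000), (60, base + 60_000_000),
          (90, base + 59_000_000), (120, base + 120_000_000)]

print("healthy bit/s:", simple_rate(healthy) * 8)
print("glitch bit/s: ", simple_rate(glitch) * 8)
print("decreases:    ", find_decreases(glitch))
```

A single sample that is only slightly lower than its predecessor is enough to make the whole counter value count as increase, which is why the raw samples in the "Table" view are the thing to inspect: any value lower than the previous one, or a repeated value, points at this kind of problem.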

