[prometheus-users] Re: Correlation between snmp scrape time and massive rate output for ifHCInOctets

Alexander Wilke Fri, 15 Mar 2024 16:01:02 -0700

Hello,

1.) Is the 50s timeout the same in both the Prometheus scrape_config and the 
snmp.yml file?
2.) Is this really the name of the interface label? ifHCInOctetsIntfName
3.) The =~".*.\\/.*." matcher may select many interfaces, maybe some internal 
loopback which may count traffic twice? It may also select a PortChannel 
(Po), then its VLAN (Po.xy), and the physical interfaces!?
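
A quick way to check which names the matcher in 3.) would select is to run the same pattern offline. This is only a sketch: the interface names below are hypothetical, the real label values depend on the device and on snmp.yml, and Python's re module stands in for the RE2 engine Prometheus uses (the two should behave the same for this simple pattern). PromQL regex matchers are fully anchored, so fullmatch() is the closest equivalent.

```python
import re

# The PromQL string ".*.\\/.*." unescapes to the regex .*.\/.*.
# i.e. at least one character, a literal "/", at least one character.
PATTERN = re.compile(r".*.\/.*.")

# Hypothetical interface names, NOT taken from the original thread.
names = [
    "GigabitEthernet1/0/1",
    "Te1/1/4",
    "Port-channel1",
    "Port-channel1.100",
    "Vlan10",
    "Loopback0",
]

# fullmatch() mimics the implicit ^...$ anchoring of PromQL matchers.
matched = [name for name in names if PATTERN.fullmatch(name)]
print(matched)
```

With names like these, only the ones containing a "/" are selected, so whether port-channels, VLANs or loopbacks are included depends on what the label values actually look like on the device.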


I am not sure, but the screenshots show "stacked lines" - is it possible 
that in the first screenshot the throughput of all interfaces was stacked?
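
On the question of whether odd scrape behaviour can inflate rate(): rate() treats any decrease in a counter as a counter reset and counts the whole post-reset value as new traffic. If a sample ever arrives with a value lower than the previous one, the result can be enormous for a large 64-bit counter. Whether that happened here cannot be told from the thread; the sketch below only illustrates the mechanism. simple_rate() is a hypothetical, simplified stand-in for rate() (it ignores the window-boundary extrapolation Prometheus does), and all sample values are made up.

```python
def simple_rate(samples):
    """Simplified per-second rate over (timestamp_s, counter_value) pairs.

    Mirrors the counter-reset handling of PromQL rate(), but skips the
    extrapolation to the window edges that Prometheus also performs.
    """
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            # Counter went down: assumed to have restarted from 0,
            # so the entire new value is counted as increase.
            increase += value
        else:
            increase += value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical ifHCInOctets samples, one per minute (values in bytes).
normal = [(0, 5_000_000_000_000), (60, 5_000_750_000_000), (120, 5_001_500_000_000)]
# Same series, but the last sample is slightly LOWER than the previous one.
glitch = [(0, 5_000_000_000_000), (60, 5_000_750_000_000), (120, 5_000_700_000_000)]

normal_bps = simple_rate(normal) * 8  # bits per second
glitch_bps = simple_rate(glitch) * 8
print(f"normal: {normal_bps / 1e6:.0f} Mbit/s")
print(f"glitch: {glitch_bps / 1e9:.0f} Gbit/s")
```

In the raw data this would show up as a sample where ifHCInOctets is lower than the one before it, which can be checked with resets(ifHCInOctets{...}[5m]) or by looking at the raw samples around the spike.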


Nick Carlton wrote on Friday, 15 March 2024 at 23:43:19 UTC+1:

> To clarify, my scrapes for this data run every 1m and have a timeout of 50s
>
> On Friday 15 March 2024 at 22:41:52 UTC Nick Carlton wrote:
>
>> Hello Everyone,
>>
>> I have just seen something weird in my environment: interface bandwidth 
>> on a gigabit switch appeared to reach about 1 Tbps on some of the 
>> interfaces.
>>
>> Here is the query I'm using:
>>
>> rate(ifHCInOctets{ifHCInOctetsIntfName=~".*.\\/.*.",instance="<device-name>"}[2m]) * 8
>>
>> I've never had a problem with it before. Here is an image of the graph 
>> showing the massive increase in bandwidth and then the decrease back to normal:
>>
>> [image: Screenshot 2024-03-15 222353.png]
>>
>> When I investigated what could have happened, I could see that the 
>> 'snmp_scrape_duration_seconds' metric increased to around 20s at the 
>> time. So the Cisco switch was taking 20 seconds to respond to the SNMP 
>> request.
>>
>> [image: Screenshot 2024-03-15 222244.png]
>>
>> I'm a bit confused as to how this could cause the rate query to give 
>> completely false data. Could the delay in data have caused Prometheus to 
>> think there was more bandwidth on the interface? The switch certainly 
>> cannot do the speeds the graph is claiming!
>>
>> I'm on v0.25.0 of the SNMP exporter, and scrapes normally take around 2s. 
>> I'm not blaming the exporter for the high response times; that's 
>> probably the switch. I'm just wondering whether the high response time 
>> could in some way cause the rate query to give incorrect data. The fact 
>> that the graph went back to normal after the high response times makes me 
>> think it wasn't the switch giving duff data.
>>
>> Has anyone seen this before, and is there any way to mitigate it? Happy 
>> to provide more info if required :)
>>
>> Thanks
>> Nick
>>
>

