Thanks Alexander. I wasn't aware you could set up a timeout in snmp.yml, or at least I'm not familiar with how to do it?
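If it helps, this is roughly where I'd expect the two timeouts to live, going from the docs rather than a tested config (the snmp.yml module-level `timeout`/`retries` options are my assumption and may vary by exporter version, so please double-check against the snmp_exporter docs for v0.25.0):

```yaml
# prometheus.yml - the scrape-side timeout (this part I'm sure of):
scrape_configs:
  - job_name: snmp
    scrape_interval: 1m
    scrape_timeout: 50s

# snmp.yml - module-level walk options, if I'm reading the
# snmp_exporter docs right (hypothetical values, check your version):
modules:
  if_mib:
    walk:
      - ifHCInOctets
    timeout: 20s
    retries: 3
```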
The interface name is a regex string used to match only interface names with a '/' in them. On Cisco that will only match physical interfaces like Gig1/0/1, and not things like Loopback0 or Port-Channel1, so only physical interfaces are included.

I had to use the stacked-lines visualisation because when it was set to lines, for some reason all of the lines came out in really light colours, which meant you could barely see them on the graph. I can get another screenshot if need be. The value on the left shows the bytes per second the interfaces reached.

Thanks
Nick

On Fri, 15 Mar 2024 at 23:01, Alexander Wilke <[email protected]> wrote:

> Hello,
>
> 1.) Is the timeout of 50s the same in the Prometheus scrape_config and the snmp.yml file?
> 2.) Is this really the name of the interface? ifHCInOctetsIntfName
> 3.) Maybe the =~".*.\\/.*." matches many interfaces, perhaps some internal loopback which may count traffic twice? It may also match port channels (Po), VLAN subinterfaces (Po.xy) and physical interfaces?
>
> I am not sure, but the screenshots show "stacked lines". Is it possible that in the first screenshot the throughput of all interfaces was stacked?
>
> On Friday, 15 March 2024 at 23:43:19 UTC+1, Nick Carlton wrote:
>
>> To clarify, my scrapes for this data run every 1m and have a timeout of 50s.
>>
>> On Friday, 15 March 2024 at 22:41:52 UTC, Nick Carlton wrote:
>>
>>> Hello everyone,
>>>
>>> I have just seen something weird in my environment: interface bandwidth on a gigabit switch reached about 1 Tbps on some of the interfaces.
>>>
>>> Here is the query I'm using:
>>>
>>> rate(ifHCInOctets{ifHCInOctetsIntfName=~".*.\\/.*.",instance="<device-name>"}[2m]) * 8
>>>
>>> I've never had a problem with it before.
>>> Here is an image of the graph showing the massive increase in bandwidth and then the decrease back to normal:
>>>
>>> [image: Screenshot 2024-03-15 222353.png]
>>>
>>> Having done some more investigation into what could have happened, I can see that the 'snmp_scrape_duration_seconds' metric increased to around 20s at the time. So the Cisco switch was taking 20 seconds to respond to the SNMP request.
>>>
>>> [image: Screenshot 2024-03-15 222244.png]
>>>
>>> I'm a bit confused as to how this could cause the rate query to give completely false data. Could the delay in data have caused Prometheus to think there was more bandwidth on the interface? The switch certainly cannot do the speeds the graph is claiming!
>>>
>>> I'm on v0.25.0 of the SNMP exporter and it normally sits at around 2s for the scrapes. I'm not blaming the exporter for the high response times; that's probably the switch. I'm just wondering whether the high response time could in some way cause the rate query to give incorrect data. The fact that the graph went back to normal after the high response times makes me think it wasn't the switch giving duff data.
>>>
>>> Has anyone seen this before, and is there any way to mitigate it? Happy to provide more info if required :)
>>>
>>> Thanks
>>> Nick
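PS: thinking out loud about how a slow walk could skew rate(): Prometheus stamps each sample with the scrape start time, but a 20 s walk means the counter was actually read on the device well after that stamp. A toy sketch of the effect is below; `simple_rate` is my own illustrative function, not PromQL's real extrapolating implementation, and note this skew alone would only inflate the rate by a modest factor, nowhere near 1 Tbps on a gigabit link:

```python
# Toy illustration (not Prometheus source code) of how scrape-duration
# skew distorts a counter rate. Samples are (timestamp_seconds, value).

def simple_rate(samples):
    """Naive per-second rate between first and last sample, similar in
    spirit to PromQL's rate() but without extrapolation handling."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

LINE_RATE = 125_000_000  # bytes/s on a fully loaded 1 Gbit/s link

# Ideal case: both scrapes are fast, counters read at the stamped times.
ideal = [(0, 0), (60, 60 * LINE_RATE)]

# Skewed case: the second walk took 20 s, so the counter reflects 80 s
# of traffic but the sample is stamped with the scrape start, t = 60 s.
skewed = [(0, 0), (60, 80 * LINE_RATE)]

print(simple_rate(ideal) * 8 / 1e9)   # -> 1.0 (Gbit/s)
print(simple_rate(skewed) * 8 / 1e9)  # -> ~1.33 (Gbit/s), inflated
```

So a delayed read does push rate() above the real line rate, but only proportionally to the delay; a 1000x spike would need something more, e.g. a counter discontinuity being misread.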
--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/CAC4WY59eJqqGKKO19_YGXc8r4ixX5t2aWc%3Dmf79CxUVvRPrGWw%40mail.gmail.com.

