Package: iputils-ping
Version: 3:20221126-1
Severity: normal

Hi,

After upgrading our monitoring host from bullseye to bookworm, our
check_ping plugin suddenly reports that hosts that have been down for
months are up again for a few minutes.

I've looked into this, and it seems we are running so many ping checks
that the randomly selected id of a ping to host A sometimes matches the
id of a ping to host B. Since all ICMP responses are delivered to both
ping processes on the system, each process then also sees the replies
meant for the other. I've compared the ping code in bullseye and
bookworm, and the bookworm code appears to account for these wrong
packets: it shows the response but marks it as "DIFFERENT ADDRESS!".
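
To get a feel for how often such an id collision can happen: the ICMP
echo identifier field is 16 bits wide, so there are at most 65536 distinct
ids. The following is only a rough sketch, assuming each ping picks its id
uniformly at random and independently of the others (an assumption about
the id selection, not something taken from the iputils source). The
function name and the numbers of concurrent pings are hypothetical.

```python
ID_SPACE = 2 ** 16  # the ICMP echo identifier field is 16 bits wide


def collision_probability(concurrent_pings: int, id_space: int = ID_SPACE) -> float:
    """Probability that at least two concurrent pings share an identifier.

    Assumes ids are drawn uniformly and independently (birthday problem).
    """
    if concurrent_pings > id_space:
        return 1.0  # more pings than ids: a collision is certain
    p_all_unique = 1.0
    for already_taken in range(concurrent_pings):
        p_all_unique *= (id_space - already_taken) / id_space
    return 1.0 - p_all_unique


if __name__ == "__main__":
    # Hypothetical numbers of ping processes running at the same moment.
    for n in (10, 50, 100, 500):
        print(f"{n:4d} concurrent pings: {collision_probability(n):.4%}")
```

Even a small per-moment probability adds up on a monitoring host that
runs checks around the clock, which would fit with down hosts only
occasionally showing up as reachable.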

Marking them is correct, but these packets are still counted in the
number of responses received, which makes it seem that a host is up when
it isn't:

root@iridium:~# ping 10.89.22.23 -v
ping: sock4.fd: 3 (socktype: SOCK_RAW), sock6.fd: 4 (socktype: SOCK_RAW), hints.ai_family: AF_UNSPEC

ai->ai_family: AF_INET, ai->ai_canonname: '10.89.22.23'
PING 10.89.22.23 (10.89.22.23) 56(84) bytes of data.
64 bytes from 10.89.22.179: icmp_seq=1 ident=47195 ttl=252 time=1.04 ms (DIFFERENT ADDRESS!)
64 bytes from 10.89.22.179: icmp_seq=2 ident=47195 ttl=252 time=5.79 ms (DIFFERENT ADDRESS!)
64 bytes from 10.89.22.179: icmp_seq=3 ident=47195 ttl=252 time=69.8 ms (DIFFERENT ADDRESS!)
64 bytes from 10.89.22.179: icmp_seq=4 ident=47195 ttl=252 time=0.988 ms (DIFFERENT ADDRESS!)
64 bytes from 10.89.22.179: icmp_seq=5 ident=47195 ttl=252 time=0.975 ms (DIFFERENT ADDRESS!)
^C
--- 10.89.22.23 ping statistics ---
1022 packets transmitted, 5 received, 99.5108% packet loss, time 1045246ms
rtt min/avg/max/mdev = 0.975/15.720/69.808/27.107 ms, pipe 994

Since 10.89.22.23 (the one we are pinging) is down, there are no
responses from this system. But because the ident (47195 in the example
above) matches that of a different ping to 10.89.22.179, those responses
are also parsed by this ping. It correctly shows that the responses came
from a different address, but still counts the 5 packets as valid
received packets, which makes the packet loss 99.5% instead of 100%.

Those invalid packets should not count as valid responses, as they are
not from the correct host.
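
The accounting I would expect can be sketched as follows. This is a
minimal illustration and not the iputils code: the Reply type and the
functions count_valid_replies and packet_loss_percent are hypothetical
names, and the sketch assumes a plain unicast ping where a reply only
counts if both the ident and the source address match.

```python
from typing import Iterable, NamedTuple


class Reply(NamedTuple):
    """Hypothetical record of one received ICMP echo reply."""
    source: str  # address the reply came from
    ident: int   # ICMP echo identifier carried in the reply


def count_valid_replies(replies: Iterable[Reply], target: str, ident: int) -> int:
    """Count only replies that match our ident AND come from the pinged host."""
    return sum(1 for r in replies if r.ident == ident and r.source == target)


def packet_loss_percent(transmitted: int, received: int) -> float:
    """Packet loss as a percentage of transmitted packets."""
    return 100.0 * (transmitted - received) / transmitted


if __name__ == "__main__":
    # The five replies from the output above: right ident, wrong source.
    replies = [Reply("10.89.22.179", 47195) for _ in range(5)]
    valid = count_valid_replies(replies, target="10.89.22.23", ident=47195)
    print(f"valid replies: {valid}")
    print(f"loss as currently reported: {packet_loss_percent(1022, 5):.4f}%")
    print(f"loss with filtering:        {packet_loss_percent(1022, valid):.4f}%")
```

With the numbers from the run above, none of the five replies would pass
the source check, so the loss would be reported as 100% rather than
99.5108%.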

I've compared the source code of ping between the bullseye and bookworm
versions, and the bullseye version discarded the packets if this
occurred.

This change results in hosts that are actually down being reported as up
in our monitoring system.

Regards,
Rik


-- System Information:
Debian Release: 12.0
  APT prefers stable-security
  APT policy: (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.1.0-9-amd64 (SMP w/2 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages iputils-ping depends on:
ii  libc6        2.36-9
ii  libcap2      1:2.66-4
ii  libcap2-bin  1:2.66-4
ii  libidn2-0    2.3.3-1+b1

iputils-ping recommends no packages.

iputils-ping suggests no packages.

-- no debconf information
