Re: Bizarre arp entry corruption

Joe Holden Tue, 07 Mar 2017 11:38:42 -0800

On 12/12/2016 16:55, Joe Holden wrote:

On 12/12/2016 10:27, Martin Pieuchot wrote:

On 11/12/16(Sun) 00:50, Joe Holden wrote:

On 10/12/2016 08:43, Mihai Popescu wrote:

Seeing some bizarre behaviour on one box, on one specific interface:


Hello,

This looks like some stupid TV game, where contestants are given some
clues from time to time and they have to guess what the real shit is.

Do post your FULL dmesg and network configuration if you really
want someone to even think about your issue. Isn't that obvious?

Bye!


I appreciate the useless response (but it's still better than nothing!).
The affected box has since been reverted to an older snapshot, so no more
debugging can be done; someone else will have to do it.


I'd appreciate seeing the output of 'netstat -rnf inet' from when the
problem occurs.  Without that information it's hard to understand.

But there's a bug somewhere, it has to be fixed.

Not that dmesg is even relevant, since it is a userland bug and not a
kernel problem, but anyway:


It's a kernel problem.

I'll see if I can recreate it, but I'm not holding my breath: it only
breaks once BGP has loaded the table, which leads me to think it is actually
bgpd that is updating the llinfo with bogus info, and even though I have
a feed in my lab it doesn't do the same thing.

OK so, I inadvertently recreated this (pretty much exactly the same) issue on a lab/test setup:

For debugging purposes, ignore the fact that the interfaces are tap interfaces; they're still emulated Ethernet...


Wall of text incoming, various info...

box#1:

tap1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr fe:e1:ba:d1:be:f3
        index 7 priority 0 llprio 3
        groups: tap
        status: active
        inet 172.20.230.72 netmask 0xfffffffe

box#2:

tap1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr fe:e1:ba:d1:cf:92
        index 7 priority 0 llprio 3
        groups: tap
        status: active
        inet 172.20.230.73 netmask 0xfffffffe
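Both tap1 addresses sit on a single /31 (netmask 0xfffffffe, i.e. 255.255.255.254), so each box has exactly one neighbour on that link. A minimal sketch using Python's standard ipaddress module, with only the addresses and mask shown above, makes the arithmetic explicit:

```python
import ipaddress

# Values copied from the two ifconfig outputs above.
NETMASK_HEX = 0xfffffffe
BOX1 = ipaddress.ip_address("172.20.230.72")
BOX2 = ipaddress.ip_address("172.20.230.73")

# Turn the hex mask into dotted-quad form (255.255.255.254).
mask = ipaddress.IPv4Address(NETMASK_HEX)

# Build the connected network from box#1's address and that mask;
# strict=False would mask off any host bits if they were set.
net = ipaddress.ip_network(f"{BOX1}/{mask}", strict=False)

print(net.prefixlen)              # 31
print(net)                        # 172.20.230.72/31
print(BOX1 in net, BOX2 in net)   # True True
```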

All is fine after starting ospfd, but as soon as I start bgpd, box#2 shows the following:

Host                                 Ethernet Address    Netif Expire    Flags
172.20.230.72                        00:00:00:00:20:12       ? 12m30s

# route -n get 172.20.230.72
   route to: 172.20.230.72
destination: 172.20.230.72
       mask: 255.255.255.255
  interface: tap1
 if address: 172.20.230.73
   priority: 3 ()
      flags: <UP,HOST,DONE,LLINFO,CLONED,CACHED>
     use       mtu    expire
      20         0       702

flags destination          gateway          lpref   med aspath origin
IS*>  172.20.230.72/31     172.20.230.64      200     0 i

.64 is the loopback on one of its connected boxes, which doesn't have broken entries.
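So box#2 ends up with the same /31 twice: once as the connected network on tap1 and once as a BGP route whose nexthop is the .64 loopback. A quick sketch with Python's ipaddress module, using only the values quoted above, shows the two prefixes are identical and both cover the neighbour .72; it illustrates the overlap only and is not a diagnosis of why the ARP entry goes bad.

```python
import ipaddress

# Connected subnet on box#2's tap1 (172.20.230.73, netmask 0xfffffffe).
connected = ipaddress.ip_interface("172.20.230.73/31").network

# Prefix from the BGP RIB output above, learned via 172.20.230.64.
bgp_prefix = ipaddress.ip_network("172.20.230.72/31")

# The neighbour whose ARP entry gets corrupted.
neighbor = ipaddress.ip_address("172.20.230.72")

same_prefix = bgp_prefix == connected
covers_neighbor = neighbor in bgp_prefix and neighbor in connected
print(same_prefix, covers_neighbor)  # True True
```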


Afterwards, tcpdump looks OK:

19:14:23.723876 arp who-has 172.20.230.72 tell 172.20.230.73
19:14:23.901883 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3
19:14:24.022948 arp who-has 172.20.230.72 tell 172.20.230.73
19:14:24.201095 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3

but the correct entry is never installed, and after I delete the broken arp entry it never re-adds a new one.
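Put side by side, the cached lladdr and the one actually seen on the wire disagree. The hypothetical helper below (not part of any OpenBSD tool; the names replies_by_ip and cached_lladdr are made up for illustration) only parses the pasted arp row and tcpdump lines and compares them, assuming tcpdump prints replies in the "arp reply X is-at MAC" form shown above:

```python
import re

# Text pasted from the arp table row and the tcpdump capture above.
ARP_ENTRY = "172.20.230.72                        00:00:00:00:20:12       ? 12m30s"
TCPDUMP = """\
19:14:23.723876 arp who-has 172.20.230.72 tell 172.20.230.73
19:14:23.901883 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3
19:14:24.022948 arp who-has 172.20.230.72 tell 172.20.230.73
19:14:24.201095 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3
"""

# Assumed reply format: "arp reply <ip> is-at <mac>".
REPLY_RE = re.compile(r"arp reply (\S+) is-at ([0-9a-f:]+)")


def replies_by_ip(dump: str) -> dict[str, set[str]]:
    """Hypothetical helper: collect every lladdr each IP answered with."""
    seen: dict[str, set[str]] = {}
    for ip, mac in REPLY_RE.findall(dump):
        seen.setdefault(ip, set()).add(mac)
    return seen


def cached_lladdr(line: str) -> tuple[str, str]:
    """Hypothetical helper: return (ip, lladdr) from one arp table row."""
    ip, mac = line.split()[:2]
    return ip, mac


ip, cached = cached_lladdr(ARP_ENTRY)
on_wire = replies_by_ip(TCPDUMP).get(ip, set())
mismatch = cached not in on_wire
print(ip, cached, sorted(on_wire), "MISMATCH" if mismatch else "ok")
```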

This only happens with redist connected as far as I can tell, but bgpd probably shouldn't be able to mangle arp entries and prevent the correct one from being added.

If someone thinks they can diagnose/fix it, hit me up off-list and I can fire over ssh details.


Thanks
