On 12/12/2016 16:55, Joe Holden wrote:
On 12/12/2016 10:27, Martin Pieuchot wrote:
On 11/12/16(Sun) 00:50, Joe Holden wrote:
On 10/12/2016 08:43, Mihai Popescu wrote:
seeing some bizarre behaviour on one box, on one specific interface:

Hello,

This looks like some stupid TV game, where contestants are given some
clues from time to time and they have to guess what the real shit is.

Do post your FULL dmesg and network configuration if you really
want someone to even think about your issue. Isn't that obvious?

Bye!


Appreciate the useless response (but still better than nothing!), the
affected box has since been reverted to older snapshot and thus no more
debugging can be done - someone else will have to do it.

I'd appreciate seeing the output of 'netstat -rnf inet' when it is
relevant.  Without that information it's hard to understand.

But there's a bug somewhere, it has to be fixed.

Not that dmesg is even relevant since it is a userland bug not a kernel
problem but anyway:

It's a kernel problem.

I'll see if I can recreate it but I'm not holding my breath - it only
breaks once BGP has loaded the table, which leads me to think it is actually
bgpd that is updating the llinfo with bogus info and even though I have
a feed in my lab it doesn't do the same thing.

OK, so I inadvertently recreated this (pretty much exactly the same) issue on a lab/test setup:

For debugging purposes, ignore the fact that the interfaces are tap interfaces; they're still emulated Ethernet...

Wall of text incoming, various info...

box#1:

tap1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr fe:e1:ba:d1:be:f3
        index 7 priority 0 llprio 3
        groups: tap
        status: active
        inet 172.20.230.72 netmask 0xfffffffe

box#2:

tap1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr fe:e1:ba:d1:cf:92
        index 7 priority 0 llprio 3
        groups: tap
        status: active
        inet 172.20.230.73 netmask 0xfffffffe

All is fine after starting ospfd, but as soon as I start bgpd, box#2 shows the following:

Host                                 Ethernet Address        Netif Expire    Flags
172.20.230.72                        00:00:00:00:20:12       ?     12m30s

# route -n get 172.20.230.72
   route to: 172.20.230.72
destination: 172.20.230.72
       mask: 255.255.255.255
  interface: tap1
 if address: 172.20.230.73
   priority: 3 ()
      flags: <UP,HOST,DONE,LLINFO,CLONED,CACHED>
     use       mtu    expire
      20         0       702

flags destination          gateway          lpref   med aspath origin
IS*>  172.20.230.72/31     172.20.230.64      200     0 i

.64 is the loopback on one of its connected boxes that doesn't have broken entries
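
To make the overlap explicit, here is a rough sketch (Python, standard library only) that checks how the prefix in the bgpd output relates to the connected tap1 network. It assumes the RIB line above reads as prefix 172.20.230.72/31 with next hop 172.20.230.64; the constant and function names are made up for illustration and are not part of any OpenBSD tool.

```python
import ipaddress

# Values copied from the ifconfig and bgpd output in this mail.
NETMASK_HEX = 0xfffffffe
LOCAL_ADDR = "172.20.230.73"      # box#2 tap1
PEER_ADDR = "172.20.230.72"       # box#1 tap1
BGP_PREFIX = "172.20.230.72/31"   # prefix shown in the bgpd RIB (assumed reading)
BGP_NEXTHOP = "172.20.230.64"     # gateway shown for that prefix (assumed reading)


def prefixlen_from_hex(mask: int) -> int:
    """Count the set bits of a contiguous 32-bit netmask."""
    return bin(mask & 0xFFFFFFFF).count("1")


def connected_network(addr: str, mask: int) -> ipaddress.IPv4Network:
    """Return the network an interface address/netmask pair sits on."""
    iface = ipaddress.ip_interface(f"{addr}/{prefixlen_from_hex(mask)}")
    return iface.network


connected = connected_network(LOCAL_ADDR, NETMASK_HEX)
bgp_prefix = ipaddress.ip_network(BGP_PREFIX)

print("connected network:", connected)
print("same as BGP prefix:", connected == bgp_prefix)
print("peer inside prefix:", ipaddress.ip_address(PEER_ADDR) in bgp_prefix)
print("next hop on-link:", ipaddress.ip_address(BGP_NEXTHOP) in connected)
```

If that reading is right, the BGP-learned prefix is the very same /31 that is directly connected on tap1, while its next hop is not on that link. This only illustrates the overlap; it does not show where the bogus llinfo comes from.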

tcpdump looks ok, afterwards:

19:14:23.723876 arp who-has 172.20.230.72 tell 172.20.230.73
19:14:23.901883 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3
19:14:24.022948 arp who-has 172.20.230.72 tell 172.20.230.73
19:14:24.201095 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3

But the correct entry is never installed; after I delete the broken ARP entry, a new one is never re-added.

This only happens with redist connected as far as I can tell, but bgpd probably shouldn't be able to mangle ARP entries and prevent the correct one from being added.
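
For anyone wanting to spot this kind of mismatch mechanically, a minimal sketch follows (Python, standard library only). It only parses text in the shape of the tcpdump lines above and compares the replies with an ARP table given as a plain dict; the function names are hypothetical and nothing here talks to the kernel.

```python
import re

# Matches lines such as:
# 19:14:23.901883 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3
ARP_REPLY = re.compile(r"arp reply (\S+) is-at ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})")


def replies_from_tcpdump(lines):
    """Map each IP to the last MAC address announced for it in an ARP reply."""
    seen = {}
    for line in lines:
        match = ARP_REPLY.search(line)
        if match:
            seen[match.group(1)] = match.group(2)
    return seen


def mismatches(arp_table, replies):
    """Return {ip: (installed_mac, announced_mac)} where the two differ."""
    return {
        ip: (mac, replies[ip])
        for ip, mac in arp_table.items()
        if ip in replies and replies[ip] != mac
    }


# Data copied from the output in this mail.
capture = [
    "19:14:23.723876 arp who-has 172.20.230.72 tell 172.20.230.73",
    "19:14:23.901883 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3",
    "19:14:24.022948 arp who-has 172.20.230.72 tell 172.20.230.73",
    "19:14:24.201095 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3",
]
installed = {"172.20.230.72": "00:00:00:00:20:12"}

for ip, (have, want) in mismatches(installed, replies_from_tcpdump(capture)).items():
    print(f"{ip}: table has {have}, wire says {want}")
```

With the data from this mail it flags 172.20.230.72, since the installed entry differs from the address box#1 actually answers with.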

If someone thinks they can diag/fix it then hit me up off-list and I can fire over ssh details.

Thanks
