Public bug reported:

Ubuntu 20.04.5 LTS

ubuntu@compute-09:~$ uname -a
Linux compute-09 5.4.0-139-generic #156-Ubuntu SMP Fri Jan 20 17:27:18 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
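The report contrasts this stock 20.04 kernel (5.4, GA) with the 5.15 HWE kernel that later made the problem go away. When checking many compute nodes, a small helper can classify each host. This is a sketch that assumes uname -r strings shaped like the ones in this report; the function name kernel_at_least is hypothetical, not an existing tool.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a kernel release string (e.g. "5.15.0-60-generic") is at
# least MAJOR.MINOR. kernel_at_least is a hypothetical helper, not a system tool.
kernel_at_least() {
    local release=$1 want_major=$2 want_minor=$3
    local major minor
    # "5.15.0-60-generic" -> major=5, minor=15 (the rest is discarded)
    IFS=. read -r major minor _ <<< "$release"
    if (( major > want_major )); then
        return 0
    fi
    (( major == want_major && minor >= want_minor ))
}

# Classify the running kernel. On 20.04 the HWE kernel is normally installed with:
#   sudo apt install --install-recommends linux-generic-hwe-20.04
release=$(uname -r)
if kernel_at_least "$release" 5 15; then
    echo "$release: 5.15 or newer (HWE, where the issue went away in this report)"
else
    echo "$release: older than 5.15 (e.g. the 5.4 GA kernel, where the issue was seen)"
fi
```

Only the first two version fields are compared, which is enough to tell the 5.4 GA kernel from the 5.15 HWE kernel; it does not distinguish ABI numbers such as -60 versus -139.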
ubuntu@compute-09:~$ sudo update-pciids
ubuntu@compute-09:~$ lspci | grep Intel | grep -i Ether
31:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
31:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
ca:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
ca:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)

The test instance, with provider network floating IP 10.40.0.213, cannot reach the provider network gateway (10.40.0.254):

openstack server create --key-name ubuntu-keypair --image auto-sync/ubuntu-jammy-22.04-amd64-server-20230210-disk1.img --flavor m1.small --net analyse-private-net ubuntu-analyse

ubuntu@compute-05:~$ sudo -E ip netns exec ovnmeta-fcd1b354-6f41-42dc-ae73-87df28856ee5 ssh ubuntu@192.168.100.123
ubuntu@ubuntu-analyse:~$ ping 10.40.0.254
PING 10.40.0.254 (10.40.0.254) 56(84) bytes of data.
^C
--- 10.40.0.254 ping statistics ---
419 packets transmitted, 0 received, 100% packet loss, time 428035ms

I found the compute node through which the outbound traffic leaves, and on its bond I see ARP requests with no response:

compute-09:~$ sudo tcpdump -vteni bond1 '(vlan 300)'
tcpdump: listening on bond1, link-type EN10MB (Ethernet), capture size 262144 bytes
fa:16:3e:ab:87:ad > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 300, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.40.0.254 tell 10.40.0.88, length 28
fa:16:3e:ab:87:ad > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 300, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.40.0.254 tell 10.40.0.88, length 28
fa:16:3e:ab:87:ad > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 300, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.40.0.254 tell 10.40.0.88, length 28

For testing, you can ping .254 indefinitely.

The TX error count grows on bond1 and on the port ens2f0, which happens to carry this traffic:

ubuntu@compute-09:~$ sudo ethtool -S ens2f0 | grep error
tx_errors: 12
tx_errors.nic: 0
rx_length_errors.nic: 0
rx_crc_errors.nic: 0
ubuntu@compute-09:~$ ifconfig ens2f0
ens2f0: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 9000
        ether b4:83:51:00:83:d1  txqueuelen 1000  (Ethernet)
        RX packets 53784  bytes 22064970 (22.0 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 52163  bytes 18393142 (18.3 MB)
        TX errors 12  dropped 0  overruns 0  carrier 0  collisions 0

If I create a VLAN interface directly on bond1, I can ping the gateway without any problem. That gives us WORKAROUND 1: define the provider network as flat and use VLAN interfaces created on the compute nodes as the physnet devices.

Another thing I tried was installing the HWE kernel:

ubuntu@compute-09:~$ uname -a
Linux compute-09 5.15.0-60-generic #66~20.04.1-Ubuntu SMP Wed Jan 25 09:41:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Fortunately, the traffic was still going out through compute-09 after the reboot, and the new kernel fixed the issue, so we have WORKAROUND 2:

ubuntu@ubuntu-analyse:~$ ping 10.40.0.254
PING 10.40.0.254 (10.40.0.254) 56(84) bytes of data.
64 bytes from 10.40.0.254: icmp_seq=1 ttl=63 time=2.15 ms
64 bytes from 10.40.0.254: icmp_seq=2 ttl=63 time=0.896 ms
64 bytes from 10.40.0.254: icmp_seq=3 ttl=63 time=1.12 ms
^C

ubuntu@infra-1:~$ ping 10.40.0.213
PING 10.40.0.213 (10.40.0.213) 56(84) bytes of data.
64 bytes from 10.40.0.213: icmp_seq=1 ttl=62 time=5.12 ms
64 bytes from 10.40.0.213: icmp_seq=2 ttl=62 time=2.17 ms
64 bytes from 10.40.0.213: icmp_seq=3 ttl=62 time=0.948 ms
64 bytes from 10.40.0.213: icmp_seq=4 ttl=62 time=1.00 ms
64 bytes from 10.40.0.213: icmp_seq=5 ttl=62 time=0.891 ms
64 bytes from 10.40.0.213: icmp_seq=6 ttl=62 time=1.05 ms

Now I can ping in both directions. However, I am afraid we may hit the same boot-time problem with these cards that was seen on Jammy, since it happens randomly with a kernel of the same version, 5.15.0-60. Here is the bug I am referring to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2004262

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: kernel-bug openstack ovn vlan

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2008781

Title:
  OVN provider network type vlan packets cannot go outside the bond on Intel E810-XXV card

Status in linux package in Ubuntu:
  New

--
Mailing list: https://launchpad.net/~kernel-packages
Post to: kernel-packages@lists.launchpad.net
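The two symptoms that point at the transmit path on the 5.4 kernel, ARP requests for 10.40.0.254 leaving bond1 with no reply and a growing tx_errors counter on ens2f0, can be checked with a couple of text filters. This is a sketch: arp_summary and tx_errors are hypothetical helper names, and the reply pattern assumes tcpdump's usual "Reply <ip> is-at <mac>" wording for ARP replies.

```shell
#!/usr/bin/env bash
# Sketch: filters for the two symptoms in this report. Both read text on stdin,
# so they also work on saved tcpdump/ethtool output. Helper names are hypothetical.

# Count ARP requests for TARGET and replies from TARGET in tcpdump output.
arp_summary() {
    local target=$1
    awk -v t="$target" '
        index($0, "Request who-has " t " ") { req++ }
        index($0, "Reply " t " is-at ")     { rep++ }
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}

# Print the value of the exact "tx_errors" key from ethtool -S output
# (ignoring tx_errors.nic and the other *_errors counters).
tx_errors() {
    awk -F': *' '$1 ~ /^[[:space:]]*tx_errors$/ { print $2; exit }'
}

# Usage on compute-09 while the instance keeps pinging 10.40.0.254:
#   before=$(sudo ethtool -S ens2f0 | tx_errors)
#   sudo timeout 30 tcpdump -lteni bond1 'vlan 300 and arp' | arp_summary 10.40.0.254
#   after=$(sudo ethtool -S ens2f0 | tx_errors)
#   echo "tx_errors grew by $(( after - before ))"
```

On an affected host one would expect many requests, zero replies and a rising tx_errors value; on the HWE kernel the replies should appear.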
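The test that motivated WORKAROUND 1, a VLAN interface created directly on bond1 and used to reach the gateway, can be scripted so it is easy to repeat on other compute nodes. This is a sketch assuming iproute2; run and vlan_probe are hypothetical helper names, and the host address 10.40.0.250 with a /24 prefix in the example is hypothetical too, so pick a free address on the real provider subnet. With DRY_RUN=1 the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch of the WORKAROUND 1 test: VLAN sub-interface on the bond, then ping the
# provider gateway from the host. vlan_probe and run are hypothetical helpers.
# Needs root unless DRY_RUN=1, which only prints the commands.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# vlan_probe BOND VLAN_ID HOST_ADDR/PREFIX GATEWAY
vlan_probe() {
    local bond=$1 vid=$2 addr=$3 gw=$4
    local dev="$bond.$vid"
    run ip link add link "$bond" name "$dev" type vlan id "$vid"
    run ip addr add "$addr" dev "$dev"
    run ip link set dev "$dev" up
    run ping -c 3 -I "$dev" "$gw"
    # Remove the test interface afterwards so it does not linger on the host.
    run ip link del dev "$dev"
}

# Example (hypothetical free address and /24 prefix on the provider subnet):
#   DRY_RUN=1 vlan_probe bond1 300 10.40.0.250/24 10.40.0.254   # preview
#   then repeat the same call without DRY_RUN in a root shell
```

The interface is deleted at the end so the probe does not leave an extra VLAN device next to the OVN-managed one; the permanent WORKAROUND 1 (flat network on top of such interfaces) would instead be configured persistently, e.g. through netplan.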