Public bug reported:

Ubuntu 20.04.5 LTS
ubuntu@compute-09:~$ uname -a
Linux compute-09 5.4.0-139-generic #156-Ubuntu SMP Fri Jan 20 17:27:18 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

ubuntu@compute-09:~$ sudo update-pciids
ubuntu@compute-09:~$ lspci |grep Intel|grep -i Ether
31:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
31:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
ca:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
ca:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)


The test instance, which has floating IP 10.40.0.213 on the provider network,
cannot reach the provider network gateway. The instance was created with:
openstack server create --key-name ubuntu-keypair \
    --image auto-sync/ubuntu-jammy-22.04-amd64-server-20230210-disk1.img \
    --flavor m1.small --net analyse-private-net ubuntu-analyse

ubuntu@compute-05:~$ sudo -E ip netns exec ovnmeta-fcd1b354-6f41-42dc-ae73-87df28856ee5 ssh ubuntu@192.168.100.123
ubuntu@ubuntu-analyse:~$ ping 10.40.0.254
PING 10.40.0.254 (10.40.0.254) 56(84) bytes of data.
^C
--- 10.40.0.254 ping statistics ---
419 packets transmitted, 0 received, 100% packet loss, time 428035ms


I identified the compute node through which the outbound traffic leaves
(compute-09). There I see ARP requests going out with no reply:
compute-09:~$ sudo tcpdump -vteni bond1 '(vlan 300)'
tcpdump: listening on bond1, link-type EN10MB (Ethernet), capture size 262144 bytes
fa:16:3e:ab:87:ad > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 300, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.40.0.254 tell 10.40.0.88, length 28
fa:16:3e:ab:87:ad > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 300, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.40.0.254 tell 10.40.0.88, length 28
fa:16:3e:ab:87:ad > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 300, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.40.0.254 tell 10.40.0.88, length 28
For the test you can ping 10.40.0.254 indefinitely.
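
To make the "requests but no replies" observation easier to confirm over a longer capture, the tcpdump text can be piped through a small counter. This is only a sketch: the helper name count_arp is made up for illustration, and it assumes the one-frame-per-line text format that tcpdump prints with the options used above.

```shell
# Count ARP requests and replies in tcpdump text output read from stdin.
# Assumes each ARP frame is printed on a single line and contains either
# "Request who-has" or "Reply <ip> is-at <mac>".
count_arp() {
    awk '
        /Request who-has/ { req++ }
        /Reply .* is-at/  { rep++ }
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}
```

A possible invocation would be: sudo tcpdump -l -c 20 -vteni bond1 '(vlan 300)' | count_arp. A result such as requests=20 replies=0 would match the capture shown above.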


The TX error count keeps growing on bond1 and on ens2f0, the bond member that
happens to carry the traffic:
ubuntu@compute-09:~$ sudo ethtool -S ens2f0|grep error
     tx_errors: 12
     tx_errors.nic: 0
     rx_length_errors.nic: 0
     rx_crc_errors.nic: 0
ubuntu@compute-09:~$ ifconfig ens2f0
ens2f0: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 9000
        ether b4:83:51:00:83:d1  txqueuelen 1000  (Ethernet)
        RX packets 53784  bytes 22064970 (22.0 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 52163  bytes 18393142 (18.3 MB)
        TX errors 12  dropped 0 overruns 0  carrier 0  collisions 0
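
To check whether the counter really grows while the ping is running, the tx_errors value can be sampled twice and compared. The sketch below assumes the counter is printed as "tx_errors: N" as in the ethtool output above; the function names tx_errors and tx_errors_delta are hypothetical helpers, not part of ethtool.

```shell
# Extract the plain "tx_errors" counter from `ethtool -S` output on stdin.
# Counters with a suffix, such as "tx_errors.nic", are ignored on purpose.
tx_errors() {
    awk -F': *' '$1 ~ /^[[:space:]]*tx_errors$/ { print $2 }'
}

# Sample the counter twice, <secs> seconds apart, and print the growth.
# Not called here; run it as a user allowed to query the interface.
tx_errors_delta() {
    iface=$1; secs=$2
    before=$(ethtool -S "$iface" | tx_errors)
    sleep "$secs"
    after=$(ethtool -S "$iface" | tx_errors)
    echo "$iface tx_errors grew by $((after - before)) in ${secs}s"
}
```

For example, tx_errors_delta ens2f0 10 could be run on compute-09 while the instance keeps pinging the gateway.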

If I create a VLAN interface directly on bond1, I can ping the gateway without
any problem, which gives us
WORKAROUND 1: set the network type to flat and use VLAN interfaces on the
computes as the physnet device
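
As an illustration of the VLAN interface used for this test, the commands could look like the sketch below. The helper only prints the iproute2 commands so they can be reviewed first; the function name vlan_iface_cmds and the interface name bond1.300 are assumptions, since the report does not say how the interface was named or addressed.

```shell
# Print (do not run) the iproute2 commands that create a tagged
# sub-interface <parent>.<vid> on top of <parent> and bring it up.
vlan_iface_cmds() {
    parent=$1; vid=$2
    echo "ip link add link ${parent} name ${parent}.${vid} type vlan id ${vid}"
    echo "ip link set dev ${parent}.${vid} up"
}
```

Running vlan_iface_cmds bond1 300 prints the two commands for VLAN 300 on bond1; piping the output to sudo sh would apply them. How the resulting interface is then attached as the physnet device depends on the deployment and is not shown here.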


Another thing I tried was installing the HWE kernel:

ubuntu@compute-09:~$ uname -a
Linux compute-09 5.15.0-60-generic #66~20.04.1-Ubuntu SMP Wed Jan 25 09:41:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Fortunately, traffic was still leaving through compute-09 after the reboot,
and the new kernel fixed the issue,
so we have WORKAROUND 2:
ubuntu@ubuntu-analyse:~$ ping 10.40.0.254
PING 10.40.0.254 (10.40.0.254) 56(84) bytes of data.
64 bytes from 10.40.0.254: icmp_seq=1 ttl=63 time=2.15 ms
64 bytes from 10.40.0.254: icmp_seq=2 ttl=63 time=0.896 ms
64 bytes from 10.40.0.254: icmp_seq=3 ttl=63 time=1.12 ms
^C
ubuntu@infra-1:~$ ping 10.40.0.213
PING 10.40.0.213 (10.40.0.213) 56(84) bytes of data.
64 bytes from 10.40.0.213: icmp_seq=1 ttl=62 time=5.12 ms
64 bytes from 10.40.0.213: icmp_seq=2 ttl=62 time=2.17 ms
64 bytes from 10.40.0.213: icmp_seq=3 ttl=62 time=0.948 ms
64 bytes from 10.40.0.213: icmp_seq=4 ttl=62 time=1.00 ms
64 bytes from 10.40.0.213: icmp_seq=5 ttl=62 time=0.891 ms
64 bytes from 10.40.0.213: icmp_seq=6 ttl=62 time=1.05 ms

Now I can ping both ways

However, I am afraid we may hit the same issue that affects these cards at
boot on Jammy, which occurs randomly with the same kernel version, 5.15.0-60.
This is the bug I am referring to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2004262

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: kernel-bug openstack ovn vlan

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2008781

Title:
  OVN provider network type vlan packets cannot go outside the bond on
  Intel E810-XXV card

Status in linux package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2008781/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
