I've re-executed the test plan on Mellanox ConnectX-6 Dx (MT2892), as
there seems to be an issue with this hardware generation (see bug
#2020409, comment #11 and following). I can indeed reproduce that
failure. The devices are not set to "switchdev" mode and VF-LAG is
not activated:

ubuntu@romano:~$ sudo lshw -c network -businfo
Bus info          Device          Class          Description
============================================================
pci@0000:21:00.0  ens13f0np0      network        BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
pci@0000:21:00.1  ens13f1np1      network        BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
pci@0000:61:00.0  ens7f0          network        MT2892 Family [ConnectX-6 Dx]
pci@0000:61:00.1  ens7f1          network        MT2892 Family [ConnectX-6 Dx]
ubuntu@romano:~$ sudo devlink dev eswitch show pci/0000:61:00.0
kernel answers: Operation not supported
ubuntu@romano:~$ sudo devlink dev eswitch show pci/0000:61:00.1
kernel answers: Operation not supported
ubuntu@romano:~$ sudo apt-get install --install-recommends linux-generic-hwe-22.04
# reboot
ubuntu@romano:~$ uname -a
Linux romano 6.8.0-52-generic #53~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 15 19:18:46 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@romano:~$ sudo apt install -t jammy-proposed netplan.io
ubuntu@romano:~$ apt list *netplan*
Listing... Done
libnetplan-dev/jammy-proposed 0.107.1-3ubuntu0.22.04.2 amd64
libnetplan0/jammy-proposed,now 0.107.1-3ubuntu0.22.04.2 amd64 [installed,automatic]
netplan-generator/jammy-proposed,now 0.107.1-3ubuntu0.22.04.2 amd64 [installed,automatic]
netplan.io/jammy-proposed,now 0.107.1-3ubuntu0.22.04.2 amd64 [installed]
python3-netplan/jammy-proposed,now 0.107.1-3ubuntu0.22.04.2 amd64 [installed,automatic]
ubuntu@romano:~$ sudo cat /sys/kernel/debug/mlx5/0000:61:00.0/lag/state
disabled
ubuntu@romano:~$ sudo cat /sys/kernel/debug/mlx5/0000:61:00.1/lag/state
disabled
ubuntu@romano:~$ sudo netplan get
network:
  version: 2
  ethernets:
    ens13f0np0:
      match:
        macaddress: "84:16:0c:3d:63:ce"
      addresses:
      - "10.241.7.26/24"
      nameservers:
        addresses:
        - 10.239.8.12
        - 10.239.8.13
        - 10.239.8.11
        - 10.176.2.4
        - 10.176.2.2
        - 10.176.2.3
        search:
        - maas
        - dh1-j8-1.tor3-sqa-shared-maas.solutionsqa
        - dh1-j8-2.tor3-sqa-shared-maas.solutionsqa
        - dh1-j9-1.tor3-sqa-shared-maas.solutionsqa
        - dh1-j9-2.tor3-sqa-shared-maas.solutionsqa
      gateway4: 10.241.7.1
      set-name: "ens13f0np0"
      mtu: 1500
    ens13f1np1:
      match:
        macaddress: "84:16:0c:3d:63:cf"
      set-name: "ens13f1np1"
      mtu: 1500
    ens7f0:
      match:
        macaddress: "b8:3f:d2:2d:68:7e"
      optional: true
      set-name: "ens7f0"
      mtu: 1500
      virtual-function-count: 8
      embedded-switch-mode: "switchdev"
      delay-virtual-functions-rebind: true
    ens7f1:
      match:
        macaddress: "b8:3f:d2:2d:68:7f"
      set-name: "ens7f1"
      mtu: 1500
      virtual-function-count: 8
      embedded-switch-mode: "switchdev"
      delay-virtual-functions-rebind: true
  bonds:
    bond0:
      interfaces:
      - ens7f0
      - ens7f1
      parameters:
        mode: "active-backup"
# reboot


## FAILURE

ubuntu@romano:~$ sudo lshw -c network -businfo
Bus info          Device          Class          Description
============================================================
pci@0000:21:00.0  ens13f0np0      network        BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
pci@0000:21:00.1  ens13f1np1      network        BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
pci@0000:61:00.0  ens7f0          network        MT2892 Family [ConnectX-6 Dx]
pci@0000:61:00.1  ens7f1          network        MT2892 Family [ConnectX-6 Dx]
pci@0000:61:00.2  ens7f0v0        network        ConnectX Family mlx5Gen Virtual Function
pci@0000:61:00.3  ens7f0v1        network        ConnectX Family mlx5Gen Virtual Function
pci@0000:61:00.4  ens7f0v2        network        ConnectX Family mlx5Gen Virtual Function
pci@0000:61:00.5  ens7f0v3        network        ConnectX Family mlx5Gen Virtual Function
pci@0000:61:00.6  ens7f0v4        network        ConnectX Family mlx5Gen Virtual Function
pci@0000:61:00.7  ens7f0v5        network        ConnectX Family mlx5Gen Virtual Function
pci@0000:61:01.0  ens7f0v6        network        ConnectX Family mlx5Gen Virtual Function
pci@0000:61:01.1  ens7f0v7        network        ConnectX Family mlx5Gen Virtual Function

ubuntu@romano:~$ sudo cat /sys/kernel/debug/mlx5/0000:61:00.0/lag/state
disabled
ubuntu@romano:~$ sudo cat /sys/kernel/debug/mlx5/0000:61:00.1/lag/state
disabled

ubuntu@romano:~$ sudo devlink dev eswitch show pci/0000:61:00.0
pci/0000:61:00.0: mode legacy inline-mode none encap-mode basic
ubuntu@romano:~$ sudo devlink dev eswitch show pci/0000:61:00.1
pci/0000:61:00.1: mode legacy inline-mode none encap-mode basic
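
For re-running this check, the two conditions above (eswitch mode and
LAG state) can be folded into one pass/fail helper. This is only a
minimal sketch: the function names `eswitch_mode` and `vf_lag_ready`
are hypothetical, and it assumes the mlx5 debugfs state file reads
"active" once the LAG is up (only "disabled" was observed here).

```shell
#!/bin/sh
# Hypothetical helpers; they only parse text and run no privileged commands.

# Print the word following "mode" in one line of
# `devlink dev eswitch show` output; print nothing if there is none
# (e.g. "kernel answers: Operation not supported").
eswitch_mode() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i < NF; i++)
            if ($i == "mode") { print $(i + 1); exit }
    }'
}

# Succeed only if the eswitch is in switchdev mode AND the LAG state
# is "active" (assumed value, see the note above).
#   $1: output line of `devlink dev eswitch show`
#   $2: content of /sys/kernel/debug/mlx5/<pci>/lag/state
vf_lag_ready() {
    mode=$(eswitch_mode "$1")
    [ "$mode" = "switchdev" ] && [ "$2" = "active" ]
}

# Example on the host (not executed by this sketch):
#   vf_lag_ready "$(sudo devlink dev eswitch show pci/0000:61:00.0)" \
#                "$(sudo cat /sys/kernel/debug/mlx5/0000:61:00.0/lag/state)" \
#     && echo OK || echo FAILURE
```

With the output captured above ("mode legacy", state "disabled"), the
helper reports failure for both 0000:61:00.0 and 0000:61:00.1.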

** Tags added: block-proposed-jammy

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1988018

Title:
  [SRU][mlx5] Intermittent VF-LAG activation failure

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1988018/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
