This does look resolved on WS2016 and Azure.
However, SR-IOV is now broken in a different way on WS2019.

Affected proposed kernels:
- cosmic proposed
- bionic proposed edge (4.18 based)

I tested this with cosmic linux-azure 4.18.0.1006.7 from proposed, using the
same VHD:
- SR-IOV with Mellanox CX3 works fine on WS2016; all tests passed.
- SR-IOV with Mellanox CX3/CX4 is broken on WS2019.

These are the relevant log portions showing the issue when the kernel
attempts to load the driver:

dmesg:
[   21.059766] mlx4_core: Mellanox ConnectX core driver v4.0-0
[   21.059775] mlx4_core: Initializing 9488:00:02.0
[   21.191481] mlx4_core 9488:00:02.0: Detected virtual function - running in slave mode
[   21.191508] mlx4_core 9488:00:02.0: Sending reset
[   21.191602] mlx4_core 9488:00:02.0: Sending vhcr0
[   21.193338] mlx4_core 9488:00:02.0: HCA minimum page size:512
[   21.193804] mlx4_core 9488:00:02.0: Timestamping is not supported in slave mode
[   93.148028] mlx4_core 9488:00:02.0: communication channel command 0x5 (op=0x31) timed out
[   93.148031] mlx4_core 9488:00:02.0: device is going to be reset
[   93.171917] mlx4_core 9488:00:02.0: VF is sending reset request to Firmware
[   93.172584] mlx4_core 9488:00:02.0: VF Reset succeed
[   93.172585] mlx4_core 9488:00:02.0: device was reset successfully
[   93.195311] mlx4_core 9488:00:02.0: NOP command failed to generate MSI-X interrupt IRQ 24)
[   93.195312] mlx4_core 9488:00:02.0: Trying again without MSI-X
[   93.196258] mlx4_core 9488:00:02.0: Failed to close slave function
[   93.196866] mlx4_core: probe of 9488:00:02.0 failed with error -5
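
For anyone triaging similar logs, here is a minimal Python sketch (not part of
the original report) that scans dmesg text for the two symptoms visible above:
the communication channel timeout and the failed probe. The function name
summarize_mlx4_probe and the regular expressions are illustrative assumptions,
and the sketch assumes each kernel message sits on a single, unwrapped line.

```python
import re

# Hypothetical helper, not from the bug report: matches dmesg lines such as
# "[   93.196866] mlx4_core: probe of 9488:00:02.0 failed with error -5".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(mlx4_core\b.*)$")
PROBE_RE = re.compile(r"probe of \S+ failed with error (-?\d+)")


def summarize_mlx4_probe(log_text):
    """Return (timeout_seen, probe_errno, longest_gap_seconds) for mlx4_core lines."""
    timestamps = []
    timeout_seen = False
    probe_errno = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not mlx4_core kernel messages
        timestamps.append(float(match.group(1)))
        message = match.group(2)
        if "timed out" in message:
            timeout_seen = True
        probe = PROBE_RE.search(message)
        if probe:
            probe_errno = int(probe.group(1))
    # Largest pause between consecutive mlx4_core messages, in seconds.
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    longest_gap = max(gaps) if gaps else 0.0
    return timeout_seen, probe_errno, longest_gap
```

On the log above, the longest gap is the roughly 72 second pause before the
command timeout, and the probe error is -5, which corresponds to EIO on Linux.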

----

syslog:

Dec  4 14:35:18 ubuntu kernel: [   21.059766] mlx4_core: Mellanox ConnectX core driver v4.0-0
Dec  4 14:35:18 ubuntu kernel: [   21.059775] mlx4_core: Initializing 9488:00:02.0
Dec  4 14:35:18 ubuntu kernel: [   21.191481] mlx4_core 9488:00:02.0: Detected virtual function - running in slave mode
Dec  4 14:35:18 ubuntu kernel: [   21.191508] mlx4_core 9488:00:02.0: Sending reset
Dec  4 14:35:18 ubuntu kernel: [   21.191602] mlx4_core 9488:00:02.0: Sending vhcr0
Dec  4 14:35:18 ubuntu kernel: [   21.193338] mlx4_core 9488:00:02.0: HCA minimum page size:512
Dec  4 14:35:18 ubuntu kernel: [   21.193804] mlx4_core 9488:00:02.0: Timestamping is not supported in slave mode
Dec  4 14:35:18 ubuntu kernel: [   93.148028] mlx4_core 9488:00:02.0: communication channel command 0x5 (op=0x31) timed out
Dec  4 14:35:18 ubuntu kernel: [   93.148031] mlx4_core 9488:00:02.0: device is going to be reset
Dec  4 14:35:18 ubuntu kernel: [   93.171917] mlx4_core 9488:00:02.0: VF is sending reset request to Firmware
Dec  4 14:35:18 ubuntu kernel: [   93.172584] mlx4_core 9488:00:02.0: VF Reset succeed
Dec  4 14:35:18 ubuntu kernel: [   93.172585] mlx4_core 9488:00:02.0: device was reset successfully
Dec  4 14:35:18 ubuntu kernel: [   93.195311] mlx4_core 9488:00:02.0: NOP command failed to generate MSI-X interrupt IRQ 24)
Dec  4 14:35:18 ubuntu kernel: [   93.195312] mlx4_core 9488:00:02.0: Trying again without MSI-X
Dec  4 14:35:18 ubuntu kernel: [   93.196258] mlx4_core 9488:00:02.0: Failed to close slave function
Dec  4 14:35:18 ubuntu kernel: [   93.196866] mlx4_core: probe of 9488:00:02.0 failed with error -5

https://bugs.launchpad.net/bugs/1794477

Title:
  Accelerated networking (SR-IOV VF) broken in 18.10 daily

Status in linux package in Ubuntu:
  Fix Committed
Status in linux-azure package in Ubuntu:
  Fix Committed
Status in linux source package in Cosmic:
  New
Status in linux-azure source package in Cosmic:
  Fix Committed

Bug description:
  While testing the Ubuntu 18.10 daily image from the cloud-images repo on
  Azure, we discovered that accelerated networking wasn’t working inside the VM.
  No VF showed up inside the VM, and lspci didn’t show any Mellanox drivers in
  use.
  We also tested the daily build on Hyper-V, but there the Mellanox VF is
  functional with the same mlx4 drivers.

  To give more details about this:
  • No Mellanox logs show up in dmesg or syslog.
  • modinfo mlx4_core/mlx4_en finds the module, but lsmod doesn’t show it as
    loaded, although Accelerated Networking is enabled for the Azure VM, so this
    should happen transparently.
  • modprobe -r mlx4_core && modprobe mlx4_core returns exit code 0, but
    nothing really happens, and no Mellanox messages are logged in dmesg/syslog.
  • There are no entries in the logs about the drivers or netvsc/pci-hyperv
    that might relate to this issue.
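
  The lsmod check described above can be approximated with a short Python
  sketch that reads /proc/modules-style text, the same data lsmod reports.
  This is an illustration, not something from the original report: the
  function names are hypothetical, and the module names in the usage note
  (including hv_netvsc and pci_hyperv) are assumptions about which drivers
  are of interest.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text.

    Each non-empty line starts with the module name, followed by size,
    use count, dependents, state, and load address.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_modules(required, proc_modules_text):
    """Return the names from `required` that are not currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]
```

  On the affected VM one would pass in the contents of /proc/modules, for
  example missing_modules(["mlx4_core", "mlx4_en", "hv_netvsc",
  "pci_hyperv"], open("/proc/modules").read()), and expect the mlx4 modules
  to be reported as missing if the symptom described above is present.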

  Kernel: 4.18.0-7-generic
