------- Comment From mdr...@us.ibm.com 2016-10-04 18:34 EDT-------
Some observations:

1) QEMU appears to be sending the 'device-removed' event prematurely.
The output below shows that the device's VFIO group FD is still held
open by the QEMU process at the time it signals libvirt that the device
unplug/cleanup has completed:

root@ltc-fire1:~# virsh event ltc-fire1-vm3-ubuntu-16.10 --event device-removed && lsof /dev/vfio/7
event 'device-removed' for domain ltc-fire1-vm3-ubuntu-16.10: hostdev0
events received: 1

COMMAND     PID         USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
qemu-syst 31231 libvirt-qemu   26u   CHR  242,0      0t0  750 /dev/vfio/7

2) In response to this event, libvirt issues the following sequence to
rebind the VF:

echo $DEVID >/sys/bus/pci/drivers/vfio-pci/unbind
echo $DEVID >/sys/bus/pci/drivers_probe

3) On the VFIO side, this consistently leads to mlx5_core attempting to
bind to the device while VFIO is still running its cleanup routines:

[  120.099498] KVM guest htab at c000000f2b000000 (order 26), LPID 1
[  120.208235] pci 0001:01: 0.2: [PE# 005] Setting up window#0 0..3fffffff pg=1000
[  138.281730] pci 0001:01: 0.2: [PE# 005] Setting up window#1 800000000000000..8000001ffffffff pg=10000
[  396.873573] vfio-pci 0001:01:00.2: No device request channel registered, blocked until released by user
[  396.873791] pci 0001:01: 0.2: [PE# 005] Removing DMA window #0
[  396.873796] pci 0001:01: 0.2: [PE# 005] Removing DMA window #1
[  396.873908] mlx5_core 0001:01:00.2: enabling device (0000 -> 0002)
[  396.873940] mlx5_core 0001:01:00.2: Using 32-bit DMA via iommu
[  396.874034] mlx5_core 0001:01:00.2: firmware version: 12.17.1010

The full cleanup path should include something like:
[ 4762.425039] pci 0001:01: 0.2: [PE# 005] Removing DMA window #0
[ 4762.425043] pci 0001:01: 0.2: [PE# 005] Removing DMA window #1
[ 4762.432014] pci 0001:01: 0.2: [PE# 005] Setting up window#0 0..7fffffff pg=1000
[ 4762.432018] pci 0001:01: 0.2: [PE# 005] Enabling 64-bit DMA bypass

So the driver is attempting to enable the device before the default DMA
windows have been restored.
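
As a rough way to spot this ordering problem in a kernel log, the sketch
below reads log lines on stdin and reports whether a driver enabled the
device after the DMA windows were removed but before window#0 was set up
again. The function name check_probe_order is hypothetical, and the match
is deliberately crude: it keys only on the message text seen in the logs
above and does not track which PE the window messages belong to.

```shell
#!/bin/sh
# check_probe_order DEVICE
# Hypothetical helper: reads kernel log lines on stdin and prints
# "premature" if an "enabling device" line for DEVICE appears after a
# "Removing DMA window" line with no "Setting up window#0" in between,
# otherwise prints "ok".
# Assumption: message formats match the log excerpts in this report.
check_probe_order() {
    awk -v dev="$1" '
        /Removing DMA window/ { removed = 1 }   # windows torn down
        /Setting up window#0/ { removed = 0 }   # default window is back
        index($0, dev) && /enabling device/ && removed { bad = 1 }
        END { print (bad ? "premature" : "ok") }
    '
}
```

It could be fed from the live log, for example:
dmesg | check_probe_order 0001:01:00.2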

4) The sleep Carol inserted above in the VFIO cleanup path seems to avoid
the issue. This suggests that the reprobe doesn't run blindly but
instead waits for a signal of some sort, and that without the explicit
sleep that signal arrives prematurely.

This probably needs to be addressed at multiple levels: a fix in QEMU to
defer the device-deleted event until VFIO has cleaned up the device, and
a fix in the VFIO path to avoid crashing the host if someone were to issue
the reprobe manually while the device is still in use.
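
Until such fixes exist, anyone issuing the reprobe by hand could at least
wait for the VFIO group node to be closed first. The following is only a
sketch under that assumption: the function names are hypothetical, the
check relies on lsof as used in 1), and a closed FD is an approximation
of "VFIO cleanup finished", not a guarantee, so it may narrow the race
without eliminating it.

```shell
#!/bin/sh
# Hypothetical helpers: wait for the VFIO group node to be released, then
# run the same unbind/reprobe sequence shown in 2).

# group_in_use GROUP_NODE: succeeds if some process has the node open.
# lsof exits non-zero when it finds no matching open file; a missing lsof
# binary is therefore (wrongly) treated as "not in use".
group_in_use() {
    lsof "$1" >/dev/null 2>&1
}

# wait_group_released GROUP_NODE TIMEOUT_SECONDS
# Polls once per second; returns 1 if the node is still open at timeout.
wait_group_released() {
    node=$1
    remaining=$2
    while group_in_use "$node"; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# rebind_to_default DEVID [SYSFS_ROOT]
# SYSFS_ROOT defaults to /sys; the parameter exists only so the sequence
# can be exercised against a scratch directory.
rebind_to_default() {
    devid=$1
    sysfs=${2:-/sys}
    echo "$devid" > "$sysfs/bus/pci/drivers/vfio-pci/unbind" &&
    echo "$devid" > "$sysfs/bus/pci/drivers_probe"
}
```

For the device in this report, usage might look like:
wait_group_released /dev/vfio/7 30 && rebind_to_default 0001:01:00.2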

A possible workaround that might be worth trying in the meantime is
specifying managed='no' in the device XML, which according to the libvirt
documentation would prevent libvirt from automatically rebinding the
device to its default host driver after unplug. However, I saw a mention
that this might not be supported yet for KVM, so it's not a given.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1630304

Title:
  Ubuntu 16.10 KVM: Issue doing hotplug detach to SRIOV VF

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1630304/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
