** Description changed:

+ [Impact]
+
+ Attempts to hotplug devices shared to userspace (qemu) via vfio trigger
+ a deadlock in the kernel. A reboot is required to resolve this.
+
+ [Test Case]
+
+ Set up a KVM instance with attached devices, then attempt to hotplug
+ them using ipmitool.
+
+ [Regression Potential]
+
+ The change is to an uncommonly used driver. There are common code
+ changes, but these are a no-op in the normal case, and basic operation
+ should be easy to confirm.
+
+ [Other Info]
+
+ This fix has been verified by the reporter as fixing the deadlock.
+
+ ===
+
  We are seeing deadlocks during hotplug of devices under vfio.
-
  As per the Linux kernel source code, there is a deadlock situation
  between vfio_pci_remove() and vfio_pci_release() on PCIe hotplug
  events. This issue can be avoided either by skipping the PCIe reset
  functionality or by calling device_unlock() in vfio_pci_remove() before
  calling vfio_del_group_dev().

  Code flow on PCIe hotplug event:

  Execution flow 1:
- device_release_driver() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L935 )
- device_release_driver_internal() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L908 )
- device_lock(dev); ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L915 )
- vfio_pci_remove() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
- vfio_del_group_dev() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L923 )
- send event request to user and wait for VFIO_PCI_DEVICE release in vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L967 )
+ device_release_driver() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L935 )
+ device_release_driver_internal() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L908 )
+ device_lock(dev); ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L915 )
+ vfio_pci_remove() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
+ vfio_del_group_dev() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L923 )
+ send event request to user and wait for VFIO_PCI_DEVICE release in vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L967 )

  Execution flow 2, triggered by the above step "send event request to user":
- vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
- vfio_pci_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L302 )
- vfio_pci_try_bus_reset() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L1346 )
- pci_try_reset_bus() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4981 )
- pci_bus_save_and_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4760 )
- pci_dev_lock(dev); ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4765 )
+ vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
+ vfio_pci_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L302 )
+ vfio_pci_try_bus_reset() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L1346 )
+ pci_try_reset_bus() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4981 )
+ pci_bus_save_and_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4760 )
+ pci_dev_lock(dev); ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4765 )

- DEADLOCK here, since the device lock is already held by the PCI device
+ DEADLOCK here, since the device lock is already held by the PCI device
  remove code path in dd.c

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1792099

Title:
  device hotplug of vfio devices can lead to deadlock in
  vfio_pci_release

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  In Progress

Bug description:
  [Impact]

  Attempts to hotplug devices shared to userspace (qemu) via vfio trigger
  a deadlock in the kernel. A reboot is required to resolve this.

  [Test Case]

  Set up a KVM instance with attached devices, then attempt to hotplug
  them using ipmitool.

  [Regression Potential]

  The change is to an uncommonly used driver. There are common code
  changes, but these are a no-op in the normal case, and basic operation
  should be easy to confirm.

  [Other Info]

  This fix has been verified by the reporter as fixing the deadlock.

  ===

  We are seeing deadlocks during hotplug of devices under vfio.
  As per the Linux kernel source code, there is a deadlock situation
  between vfio_pci_remove() and vfio_pci_release() on PCIe hotplug
  events. This issue can be avoided either by skipping the PCIe reset
  functionality or by calling device_unlock() in vfio_pci_remove() before
  calling vfio_del_group_dev().

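The lock ordering behind this report can be modeled in userspace. The sketch below is only an illustration, not kernel code and not the fix that was applied: `simulate_hotplug`, `device_lock`, `released` and the timeout values are all hypothetical names chosen for this example. One thread stands in for the remove path, which holds the device lock while waiting for userspace to release the device, and a second thread stands in for the release path, which needs the same lock for the bus reset. A timeout stands in for what would be an indefinite hang in the kernel.

```python
import threading


def simulate_hotplug(unlock_before_wait: bool, timeout: float = 0.2) -> dict:
    """Model the remove/release interaction with one lock and one event.

    device_lock is a stand-in for the per-device lock that the report says
    is taken by device_lock(dev) on the remove path and by pci_dev_lock(dev)
    on the release path. Everything here is illustrative (assumed names).
    """
    device_lock = threading.Lock()
    released = threading.Event()  # set once the release path has finished
    result = {"release_got_lock": False, "remove_completed": False}

    def release_path() -> None:
        # Release path: userspace closes the device, and the bus reset
        # needs the device lock. A real kernel thread would block forever;
        # here we give up after `timeout` seconds so the demo terminates.
        if device_lock.acquire(timeout=timeout):
            result["release_got_lock"] = True
            device_lock.release()
            released.set()

    # Remove path: the driver core takes the device lock first.
    device_lock.acquire()
    worker = threading.Thread(target=release_path)
    worker.start()  # stands in for "send event request to user"

    if unlock_before_wait:
        # Workaround described in the report: drop the lock before waiting.
        device_lock.release()

    # Wait for the release path to finish, as the remove path does.
    result["remove_completed"] = released.wait(timeout=2 * timeout)

    if unlock_before_wait:
        device_lock.acquire()  # retake the lock before returning to the caller
    device_lock.release()
    worker.join()
    return result
```

Calling `simulate_hotplug(unlock_before_wait=False)` models the reported behavior: the release path never obtains the lock and the remove path never sees the release. Calling it with `unlock_before_wait=True` models the second workaround mentioned in the description, where both paths complete.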
  Code flow on PCIe hotplug event:

  Execution flow 1:
  device_release_driver() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L935 )
  device_release_driver_internal() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L908 )
  device_lock(dev); ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L915 )
  vfio_pci_remove() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
  vfio_del_group_dev() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L923 )
  send event request to user and wait for VFIO_PCI_DEVICE release in vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L967 )

  Execution flow 2, triggered by the above step "send event request to user":
  vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
  vfio_pci_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L302 )
  vfio_pci_try_bus_reset() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L1346 )
  pci_try_reset_bus() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4981 )
  pci_bus_save_and_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4760 )
  pci_dev_lock(dev); ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4765 )

  DEADLOCK here, since the device lock is already held by the PCI device
  remove code path in dd.c

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1792099/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp