Public bug reported:

We are seeing deadlocks during hotplug of devices under vfio.

As per the Linux kernel source code, there is a deadlock between vfio_pci_remove() and vfio_pci_release() on PCIe hotplug events. The issue can be avoided either by skipping the PCIe reset functionality or by calling device_unlock() in vfio_pci_remove() before calling vfio_del_group_dev().

Code flow on PCIe hotplug event:

Execution flow 1:
  device_release_driver() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L935 )
  device_release_driver_internal() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L908 )
  device_lock(dev); ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L915 )
  vfio_pci_remove() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
  vfio_del_group_dev() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L923 )
  send event request to user and wait for VFIO_PCI_DEVICE release in vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L967 )

Execution flow 2, triggered by the step "send event request to user" above:
  vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
  vfio_pci_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L302 )
  vfio_pci_try_bus_reset() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L1346 )
  pci_try_reset_bus() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4981 )
  pci_bus_save_and_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4760 )
  pci_dev_lock(dev); ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4765 )
  DEADLOCK here, since the PCI device lock is already held by the PCI device remove code path in dd.c (execution flow 1).

** Affects: linux (Ubuntu)
     Importance: Critical
     Assignee: Andy Whitcroft (apw)
         Status: Confirmed

** Affects: linux (Ubuntu Bionic)
     Importance: Critical
     Assignee: Andy Whitcroft (apw)
         Status: In Progress

** Changed in: linux (Ubuntu)
       Status: New => Confirmed

** Changed in: linux (Ubuntu)
   Importance: Undecided => Critical

** Changed in: linux (Ubuntu)
     Assignee: (unassigned) => Andy Whitcroft (apw)

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => Critical

** Changed in: linux (Ubuntu Bionic)
     Assignee: (unassigned) => Andy Whitcroft (apw)

** Changed in: linux (Ubuntu Bionic)
       Status: New => In Progress

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1792099

Title:
  vfio_pci_release hotplug deadlock

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1792099/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
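
As a reading aid for the two execution flows in the bug description, the lock cycle can be modelled in user space. The sketch below is not kernel code: `simulate`, `remove_path`, `release_path` and the `unlock_before_wait` flag are hypothetical names made up for illustration, a `threading.Lock` stands in for the per-device lock, and timeouts replace the indefinite blocking of the real code paths so that the model terminates. It only illustrates the ordering problem and the reporter's suggested workaround of unlocking before waiting; it says nothing about whether that workaround is safe in the kernel.

```python
import threading


def simulate(unlock_before_wait: bool, timeout: float = 0.2) -> dict:
    """Model the remove/release lock cycle with bounded waits (assumption:
    the real kernel paths block indefinitely instead of timing out)."""
    device_lock = threading.Lock()         # stands in for the per-device lock
    release_requested = threading.Event()  # "send event request to user"
    released = threading.Event()           # release path has completed
    result = {"release_got_lock": False, "remove_saw_release": False}

    def release_path() -> None:
        # Execution flow 2: runs once the request has been sent.
        release_requested.wait()
        # Models pci_dev_lock(dev); bounded here so the model can finish.
        if device_lock.acquire(timeout=timeout):
            result["release_got_lock"] = True
            device_lock.release()
            released.set()

    def remove_path() -> None:
        # Execution flow 1: the driver core takes the device lock first.
        device_lock.acquire()
        held = True
        try:
            if unlock_before_wait:
                # Hypothetical workaround from the report: drop the lock
                # before waiting for the release.
                device_lock.release()
                held = False
            release_requested.set()
            # Wait for the release path while (possibly) holding the lock.
            result["remove_saw_release"] = released.wait(timeout=2 * timeout)
        finally:
            if held:
                device_lock.release()

    threads = [threading.Thread(target=release_path),
               threading.Thread(target=remove_path)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print("lock held while waiting:", simulate(unlock_before_wait=False))
    print("lock dropped before waiting:", simulate(unlock_before_wait=True))
```

With the lock held across the wait, neither side makes progress in the model until the timeouts expire, which corresponds to the hang in the report; with the lock dropped first, the release path acquires it and the remove path observes the release.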