Public bug reported:

We are seeing deadlocks during hotplug of devices under vfio.


According to the Linux kernel source code, vfio_pci_remove() and
vfio_pci_release() can deadlock against each other on PCIe hotplug
events. The deadlock can be avoided either by skipping the PCIe reset
or by calling device_unlock() in vfio_pci_remove() before calling
vfio_del_group_dev().

Code flow on PCIe hotplug event:

Execution flow 1:
  device_release_driver() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L935 )
   device_release_driver_internal() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L908 )
   device_lock(dev); ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L915 )
   vfio_pci_remove() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
     vfio_del_group_dev() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L923 )
       send event request to user and wait for VFIO_PCI_DEVICE release in vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L967 )

Execution flow 2, triggered by the step "send event request to user" above:
  vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
    vfio_pci_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L302 )
      vfio_pci_try_bus_reset() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L1346 )
        pci_try_reset_bus() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4981 )
          pci_bus_save_and_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4760 )
            pci_dev_lock(dev); ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4765 )

            DEADLOCK here: pci_dev_lock() needs the device lock, which
            is already held by the device remove code path in
            drivers/base/dd.c (execution flow 1), and that path is
            itself waiting for this release to finish.
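
The lock ordering in the two flows above can be modelled in user space.
The following is only a rough sketch and not kernel code: the function
and variable names (simulate, remove_path, release_path, device_lock,
request, released) are hypothetical stand-ins, and timeouts are used so
that the model reports the deadlock instead of hanging. It says nothing
about whether dropping the lock early is actually safe in
vfio_pci_remove(); it only illustrates why the wait cannot complete
while the lock is held.

```python
import threading


def simulate(unlock_before_wait: bool, timeout: float = 0.2) -> str:
    """Model flows 1 and 2; return "deadlock" or "ok".

    All names here are illustrative stand-ins, not kernel symbols.
    """
    device_lock = threading.Lock()   # stands in for the device lock
    request = threading.Event()      # "send event request to user"
    released = threading.Event()     # user released the device
    result = {}

    def remove_path():
        # Flow 1: the remove path takes the device lock, then waits
        # for user space to release the device.
        device_lock.acquire()
        if unlock_before_wait:
            device_lock.release()    # proposed workaround (assumption)
        request.set()
        # Wait longer than the release path's lock attempt so the
        # outcome of the model is deterministic.
        result["remove"] = released.wait(2 * timeout)
        if not unlock_before_wait:
            device_lock.release()

    def release_path():
        # Flow 2: the release path needs the same device lock for the
        # bus reset before the release can complete.
        request.wait()
        got = device_lock.acquire(timeout=timeout)
        if got:
            device_lock.release()
            released.set()
        result["release"] = got

    threads = [threading.Thread(target=remove_path),
               threading.Thread(target=release_path)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return "ok" if result["remove"] and result["release"] else "deadlock"


if __name__ == "__main__":
    print("lock held across wait:", simulate(unlock_before_wait=False))
    print("lock dropped before wait:", simulate(unlock_before_wait=True))
```

With the lock held across the wait, neither side can make progress
until a timeout fires, which corresponds to the hang described above.
With the lock dropped first, the release path gets the lock and the
wait completes.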

** Affects: linux (Ubuntu)
     Importance: Critical
     Assignee: Andy Whitcroft (apw)
         Status: Confirmed

** Affects: linux (Ubuntu Bionic)
     Importance: Critical
     Assignee: Andy Whitcroft (apw)
         Status: In Progress

** Changed in: linux (Ubuntu)
       Status: New => Confirmed

** Changed in: linux (Ubuntu)
   Importance: Undecided => Critical

** Changed in: linux (Ubuntu)
     Assignee: (unassigned) => Andy Whitcroft (apw)

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => Critical

** Changed in: linux (Ubuntu Bionic)
     Assignee: (unassigned) => Andy Whitcroft (apw)

** Changed in: linux (Ubuntu Bionic)
       Status: New => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1792099

Title:
  vfio_pci_release hotplug deadlock

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1792099/+subscriptions
