** Tags removed: verification-needed-noble-linux verification-needed-plucky-linux
** Tags added: verification-done-noble-linux verification-done-plucky-linux

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2114174

Title:
  [UBUNTU 24.04] s390/pci: Fix immediate re-add of PCI function after
  remove

Status in Ubuntu on IBM z Systems:
  Fix Committed
Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Noble:
  Fix Committed
Status in linux source package in Plucky:
  Fix Committed

Bug description:
  [ Impact ]

  s390/pci: Fix immediate re-add of PCI function after remove

  A PCI function may be reserved directly after being
  deconfigured. If it subsequently returns in the standby
  state, Linux may not be able to use the new instance and
  emits a kernel warning about trying to create an already
  existing sysfs file for the IOMMU.

  The problem occurs because the new instance of the same
  underlying device is created before the prior instance is
  completely torn down. This happens because the lifetime of the
  PCI device representation in Linux is determined by reference
  counts. A driver, the network stack, or even user-space
  (including via vfio-pci) may be holding onto the device
  representation even after the underlying device is gone.
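
  The collision described above can be illustrated with a small
  user-space model. This is only a sketch, not kernel code: the classes
  SysfsModel and DeviceModel and the "iommu-<fid>" naming are
  hypothetical stand-ins, chosen to show how a reference-counted object
  keeps its name registered after the device itself is gone.

```python
class SysfsModel:
    """Toy namespace in which each name may exist only once."""

    def __init__(self):
        self.entries = set()

    def create(self, name):
        if name in self.entries:
            raise FileExistsError(name)
        self.entries.add(name)

    def remove(self, name):
        self.entries.discard(name)


class DeviceModel:
    """Toy reference-counted device representation (hypothetical)."""

    def __init__(self, sysfs, fid):
        self.sysfs = sysfs
        self.name = f"iommu-{fid:08x}"  # hypothetical naming scheme
        self.refs = 1                   # initial reference held by the bus
        sysfs.create(self.name)

    def get(self):
        self.refs += 1

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            # Teardown only happens when the last reference is dropped.
            self.sysfs.remove(self.name)


sysfs = SysfsModel()
old = DeviceModel(sysfs, 0x42)
old.get()   # a driver or user-space holder takes a reference
old.put()   # the device is removed; the object survives (refs == 1)

try:
    DeviceModel(sysfs, 0x42)   # same function returns immediately
    collided = False
except FileExistsError:
    collided = True            # old entry is still registered

old.put()                          # the holder finally lets go
new = DeviceModel(sysfs, 0x42)     # now the re-add succeeds
```

  In this model the second creation fails only because the holder has
  not yet dropped its reference, which mirrors the timing dependence
  described above.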

  The solution is twofold. First, allow re-using the
  pre-existing struct zpci_dev and/or struct pci_dev for the newly
  re-added instance of the underlying device up until the point
  where the struct zpci_dev is fully removed. Second, serialize
  the addition and removal of PCI functions such that re-adding
  a new instance after the old one is already being removed will
  wait for the removal to finish before adding the new instance.
  This fix also builds on prior upstream work on serializing state
  transitions for PCI devices, e.g. from configured to standby.
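
  The two parts of the solution can be sketched as a user-space model.
  This is an illustration under stated assumptions, not the actual
  patches: Registry, ZdevModel and all method names are hypothetical,
  and a single lock stands in for the kernel-side serialization.

```python
import threading


class ZdevModel:
    """Toy stand-in for a per-function device object (hypothetical)."""

    def __init__(self, fid):
        self.fid = fid
        self.refs = 1
        self.state = "standby"


class Registry:
    """Adds and removals take the same lock, so they cannot interleave."""

    def __init__(self):
        self.lock = threading.Lock()
        self.devices = {}

    def add(self, fid):
        """Return (device, reused). Reuse an object not yet fully removed."""
        with self.lock:
            zdev = self.devices.get(fid)
            if zdev is not None:
                zdev.refs += 1          # reference for the re-added instance
                zdev.state = "standby"
                return zdev, True
            zdev = ZdevModel(fid)
            self.devices[fid] = zdev
            return zdev, False

    def get(self, zdev):
        with self.lock:
            zdev.refs += 1

    def reserve(self, fid):
        """The function went away: mark it and drop the initial reference."""
        with self.lock:
            zdev = self.devices[fid]
            zdev.state = "reserved"
            self._put_locked(zdev)

    def put(self, zdev):
        with self.lock:
            self._put_locked(zdev)

    def _put_locked(self, zdev):
        zdev.refs -= 1
        if zdev.refs == 0:
            # Full removal completes while the lock is held, so a
            # concurrent add() waits and then creates a fresh object.
            del self.devices[zdev.fid]


reg = Registry()
first, reused_first = reg.add(0x42)
reg.get(first)                        # e.g. a driver holds a reference
reg.reserve(0x42)                     # function is reserved, object lingers
second, reused_second = reg.add(0x42) # immediate re-add reuses the object

reg.put(second)
reg.put(second)                       # last reference gone: fully removed
third, reused_third = reg.add(0x42)   # a later add creates a new object
```

  Reuse covers the window in which the old object is still referenced;
  the shared lock covers the window in which removal is already in
  progress.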

  [ Fix ]

  Backport from mainline:
  - 0d48566d4b58 s390/pci: rename lock member in struct zpci_dev
  - bcb5d6c76903 s390/pci: introduce lock to synchronize state of zpci_dev's
  - 6ee600bfbe0f s390/pci: remove hotplug slot when releasing the device
  - c4a585e952ca s390/pci: Fix potential double remove of hotplug slot
  - 42420c50c68f s390/pci: Fix missing check for zpci_create_device() error return
  - 05a2538f2b48 s390/pci: Fix duplicate pci_dev_put() in disable_slot() when PF has child VFs
  - d76f96332967 s390/pci: Remove redundant bus removal and disable from zpci_release_device()
  - 47c397844869 s390/pci: Prevent self deletion in disable_slot()
  - 4b1815a52d7e s390/pci: Allow re-add of a reserved but not yet removed device
  - 774a1fa880bc s390/pci: Serialize device addition and removal

  [ Test Plan ]

  The issue can be reproduced by observing how the kernel handles NETH
  PCI functions. IBM Z firmware temporarily reserves NETH PCI functions
  to check for pending service when the last FID of a PCHID is
  deconfigured. When nothing is pending, the PCI function is immediately
  returned in the standby state, which triggers this issue quite
  reliably.

  [ Where Problems Could Occur ]

  The fix affects PCI function lifecycle management in the s390 PCI
  hotplug infrastructure, specifically the serialization and reuse logic
  for the zpci_dev and pci_dev structures during rapid remove and re-add
  cycles. A defect in this fix could leave stale or incorrectly reused
  device state, leading to improper reinitialization of PCI functions.


  ---

  Description:   s390/pci: Fix immediate re-add of PCI function after remove

  Symptom:       A PCI function may be reserved directly after being
                 deconfigured. If it subsequently returns in the standby
                 state, Linux may not be able to use the new instance and
                 emits a kernel warning about trying to create an already
                 existing sysfs file for the IOMMU.

  Problem:       The problem occurs because the new instance of the same
                 underlying device is created before the prior instance is
                 completely torn down. This happens because the lifetime of the
                 PCI device representation in Linux is determined by reference
                 counts. A driver, the network stack, or even user-space
                 (including via vfio-pci) may be holding onto the device
                 representation even after the underlying device is gone.

  Solution:      The solution is twofold. First, allow re-using the
                 pre-existing struct zpci_dev and/or struct pci_dev for the
                 newly re-added instance of the underlying device up until
                 the point where the struct zpci_dev is fully removed.
                 Second, serialize the addition and removal of PCI functions
                 such that re-adding a new instance after the old one is
                 already being removed will wait for the removal to finish
                 before adding the new instance. This fix also builds on
                 prior upstream work on serializing state transitions for
                 PCI devices, e.g. from configured to standby.

  Reproduction:  This problem was originally found with firmware that
                 temporarily reserves NETH PCI functions to check for pending
                 service when the last FID of a PCHID is deconfigured. When
                 nothing is pending, the PCI function is immediately returned
                 in the standby state, which triggers this issue quite
                 reliably.

  Upstream-ID:   0d48566d4b58946c8e1b0baac0347616060a81c9
                 bcb5d6c769039c8358a2359e7c3ea5d97ce93108
                 6ee600bfbe0f818ffb7748d99e9b0c89d0d9f02a
                 c4a585e952ca403a370586d3f16e8331a7564901
                 42420c50c68f3e95e90de2479464f420602229fc
                 05a2538f2b48500cf4e8a0a0ce76623cc5bafcf1
                 d76f9633296785343d45f85199f4138cb724b6d2
                 47c397844869ad0e6738afb5879c7492f4691122
                 4b1815a52d7eb03b3e0e6742c6728bc16a4b2d1d
                 774a1fa880bc949d88b5ddec9494a13be733dfa8

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/2114174/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
