Verification has already been done for Noble and Plucky; please ignore any other verification requests.
** Tags removed: verification-needed-noble-linux-nvidia-tegra
** Tags added: verification-done-noble-linux-nvidia-tegra

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2114174

Title: [UBUNTU 24.04] s390/pci: Fix immediate re-add of PCI function after remove

Status in Ubuntu on IBM z Systems: Fix Committed
Status in linux package in Ubuntu: Invalid
Status in linux source package in Noble: Fix Committed
Status in linux source package in Plucky: Fix Committed

Bug description:

[ Impact ]

s390/pci: Fix immediate re-add of PCI function after remove

A PCI function may be reserved directly after being deconfigured. If it subsequently returns in the standby state, Linux may not be able to use the new instance, generating a kernel warning about trying to create an already existing sysfs file for the IOMMU.

The problem occurs because the new instance of the same underlying device is created before the prior instance is completely torn down. This happens because the lifetime of the PCI device representation in Linux is determined by reference counts. A driver, the network stack, or even user-space (including via vfio-pci) may be holding onto the device representation even after the underlying device is gone.

The solution to this is twofold. Firstly, allow re-using the pre-existing struct zpci_dev and/or struct pci_dev for the newly re-added instance of the underlying device up until the point where the struct zpci_dev is fully removed. Secondly, serialize the addition and removal of PCI functions such that re-adding a new instance, after the old one is already being removed, will wait for the removal to finish before adding the new instance. This fix also builds on prior upstream work of serializing state transitions for PCI devices, e.g. from configured to standby.
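The twofold solution described above can be illustrated with a small user-space toy model. This is only a sketch and not the kernel code: ToyFunction, ToyRegistry and all of their methods are hypothetical names invented for illustration, a single lock stands in for whatever locking the real patches use, and a plain integer stands in for the kernel's reference counting. It shows a reserved function whose object is still referenced being reused on re-add, and add and remove being serialized by one lock.

```python
import threading


class ToyFunction:
    """Hypothetical stand-in for a device object such as struct zpci_dev."""

    def __init__(self, fid):
        self.fid = fid
        self.state = "standby"
        self.refs = 1  # reference owned by the registry while the function is present


class ToyRegistry:
    """Hypothetical registry; one lock serializes addition and removal."""

    def __init__(self):
        self._lock = threading.Lock()
        self._by_fid = {}

    def add(self, fid):
        with self._lock:
            dev = self._by_fid.get(fid)
            if dev is None:
                # No earlier instance is left: create a fresh object.
                dev = ToyFunction(fid)
                self._by_fid[fid] = dev
            elif dev.state == "reserved":
                # Earlier instance is reserved but not yet released: reuse it.
                dev.state = "standby"
                dev.refs += 1
            return dev

    def reserve(self, fid):
        with self._lock:
            dev = self._by_fid.get(fid)
            if dev is not None and dev.state != "reserved":
                dev.state = "reserved"
                self._put_locked(dev)  # drop the registry's own reference

    def get(self, dev):
        with self._lock:
            dev.refs += 1

    def put(self, dev):
        with self._lock:
            self._put_locked(dev)

    def _put_locked(self, dev):
        dev.refs -= 1
        if dev.refs == 0:
            # Last reference gone: only now does the object leave the registry.
            del self._by_fid[dev.fid]


registry = ToyRegistry()
first = registry.add(5)
registry.get(first)        # e.g. a driver still holds a reference
registry.reserve(5)        # function goes away, object stays alive
second = registry.add(5)   # immediate re-add reuses the same object
registry.put(first)        # the driver lets go
registry.reserve(5)        # now the object is fully released
third = registry.add(5)    # a later re-add creates a new object
```

In this model, second is the same object as first because the re-add happened while a reference was still held, whereas third is a new object because the old one had been fully released by then.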
[ Fix ]

Backport from mainline:
- 0d48566d4b58 s390/pci: rename lock member in struct zpci_dev
- bcb5d6c76903 s390/pci: introduce lock to synchronize state of zpci_dev's
- 6ee600bfbe0f s390/pci: remove hotplug slot when releasing the device
- c4a585e952ca s390/pci: Fix potential double remove of hotplug slot
- 42420c50c68f s390/pci: Fix missing check for zpci_create_device() error return
- 05a2538f2b48 s390/pci: Fix duplicate pci_dev_put() in disable_slot() when PF has child VFs
- d76f96332967 s390/pci: Remove redundant bus removal and disable from zpci_release_device()
- 47c397844869 s390/pci: Prevent self deletion in disable_slot()
- 4b1815a52d7e s390/pci: Allow re-add of a reserved but not yet removed device
- 774a1fa880bc s390/pci: Serialize device addition and removal

[ Test Plan ]

The issue can be reproduced by observing the kernel's behavior with respect to NETH PCI functions. IBM Z firmware temporarily reserves NETH PCI functions to check for pending service when the last FID of a PCHID is deconfigured. When nothing is pending, the PCI function is immediately returned in the standby state, thus triggering this issue quite reliably.

[ Where Problems Could Occur ]

The fix affects the PCI function lifecycle management in the s390 PCI hotplug infrastructure, specifically the serialization and reuse logic of zpci_dev and pci_dev structures during rapid remove and re-add cycles. A regression in this fix could cause problems such as stale or incorrectly reused device state, leading to improper reinitialization of PCI functions.

---

Description: s390/pci: Fix immediate re-add of PCI function after remove

Symptom: A PCI function may be reserved directly after being deconfigured. If it subsequently returns in the standby state, Linux may not be able to use the new instance, generating a kernel warning about trying to create an already existing sysfs file for the IOMMU.

Problem: The problem occurs because the new instance of the same underlying device is created before the prior instance is completely torn down. This happens because the lifetime of the PCI device representation in Linux is determined by reference counts. A driver, the network stack, or even user-space (including via vfio-pci) may be holding onto the device representation even after the underlying device is gone.

Solution: The solution to this is twofold. Firstly, allow re-using the pre-existing struct zpci_dev and/or struct pci_dev for the newly re-added instance of the underlying device up until the point where the struct zpci_dev is fully removed. Secondly, serialize the addition and removal of PCI functions such that re-adding a new instance, after the old one is already being removed, will wait for the removal to finish before adding the new instance. This fix also builds on prior upstream work of serializing state transitions for PCI devices, e.g. from configured to standby.

Reproduction: This problem was originally found with firmware which temporarily reserves NETH PCI functions to check for pending service when the last FID of a PCHID is deconfigured. When nothing is pending, the PCI function is immediately returned in the standby state, thus triggering this issue quite reliably.

Upstream-ID:
0d48566d4b58946c8e1b0baac0347616060a81c9
bcb5d6c769039c8358a2359e7c3ea5d97ce93108
6ee600bfbe0f818ffb7748d99e9b0c89d0d9f02a
c4a585e952ca403a370586d3f16e8331a7564901
42420c50c68f3e95e90de2479464f420602229fc
05a2538f2b48500cf4e8a0a0ce76623cc5bafcf1
d76f9633296785343d45f85199f4138cb724b6d2
47c397844869ad0e6738afb5879c7492f4691122
4b1815a52d7eb03b3e0e6742c6728bc16a4b2d1d
774a1fa880bc949d88b5ddec9494a13be733dfa8

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/2114174/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

