** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Eoan)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Eoan)
       Status: New => Fix Committed

** Changed in: linux (Ubuntu Focal)
       Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1873537

Title:
  PCIe AER device recovery failed due to logic flaw

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Eoan:
  Fix Committed
Status in linux source package in Focal:
  Fix Committed

Bug description:
  SRU Justification

  Impact:

  During PCI Express Downstream Port Containment (DPC) recovery,
  certain types of failures do not recover due to a logic flaw in
  pcie_do_recovery().

  The upstream git commit log explains the change:

      PCI/ERR: Update error status after reset_link()

      Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") uses
      reset_link() to recover from fatal errors. But during fatal error
      recovery, if the initial value of error status is
      PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER then
      even after successful recovery (using reset_link())
      pcie_do_recovery() will report the recovery result as failure.
      Update the status of error after reset_link().

      You can reproduce this issue by triggering a SW DPC using "DPC
      Software Trigger" bit in "DPC Control Register". You should see
      recovery failed dmesg log as below:

      pcieport 0000:00:16.0: DPC: containment event, status:0x1f27 source:0x0000
      pcieport 0000:00:16.0: DPC: software trigger detected
      pci 0000:04:00.0: AER: can't recover (no error_detected callback)
      pcieport 0000:00:16.0: AER: device recovery failed

      Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery")
      Link: https://lore.kernel.org/r/a255fcb3a3fdebcd90f84e08b555f1786eb8eba2.1585000084.git.sathyanarayanan.kuppusw...@linux.intel.com
      [bhelgaas: split pci_channel_io_frozen simplification to separate patch]
      Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppusw...@linux.intel.com>
      Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
      Acked-by: Keith Busch <keith.bu...@intel.com>
      Cc: Ashok Raj <ashok....@intel.com>

  Note that a second prerequisite patch is necessary as well. This
  patch,

      commit b5dfbeacf74865a8d62a4f70f501cdc61510f8e0
      Author: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppusw...@linux.intel.com>
      Date:   Fri Mar 27 17:33:24 2020 -0500

          PCI/ERR: Combine pci_channel_io_frozen cases

  is a code readability change, and makes no functional changes.

  Testcase:

  On a system with DPC enabled, setpci may be used to set the DPC
  Software Trigger bit (bit 6, value 0x40) in the DPC Control register
  of a suitable PCIe device (a PCIe bridge, for example).

  On a system lacking the fix, the output will be as shown above (i.e.,
  culminating in the "device recovery failed" message).

  With the fix applied, the device successfully recovers, resulting in
  a message of the form

      pcieport 0000:d9:01.0: AER: Device recovery successful

  Regression Potential:

  The risk of regression is low, as (a) the path in question currently
  does not work, and (b) the changes are minimal, comprising only a
  housekeeping change and the logically correct updating of a status
  variable that did not previously occur.
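The logic flaw described under Impact can be illustrated with a small model. This is only a sketch in Python, not the kernel code: the two function names and the `reset_link` callable argument are hypothetical, and the later steps of pcie_do_recovery() (the driver callbacks that run after the reset) are omitted. It assumes, as the commit log states, that before the fix the result of reset_link() was checked but not stored in the status variable.

```python
from enum import Enum


class Result(Enum):
    # Names follow the kernel result codes quoted in the commit log.
    RECOVERED = "PCI_ERS_RESULT_RECOVERED"
    DISCONNECT = "PCI_ERS_RESULT_DISCONNECT"
    NO_AER_DRIVER = "PCI_ERS_RESULT_NO_AER_DRIVER"


def recover_without_fix(initial_status, reset_link):
    """Flawed flow (hypothetical model): reset result is checked, then dropped."""
    status = initial_status
    if reset_link() != Result.RECOVERED:
        return False
    # status still holds the value from before the reset, so an initial
    # DISCONNECT or NO_AER_DRIVER is reported as a failed recovery.
    return status == Result.RECOVERED


def recover_with_fix(initial_status, reset_link):
    """Fixed flow (hypothetical model): status is updated from reset_link()."""
    status = initial_status
    status = reset_link()
    return status == Result.RECOVERED
```

With an initial status of NO_AER_DRIVER and a link reset that succeeds, the first function reports failure and the second reports success, which corresponds to the "device recovery failed" and "Device recovery successful" messages in the test case.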