Hi all,

Thanks to issue reports from Simon Richter and Alex Bennée, I discovered that BAR resize rollback can corrupt the resource tree. As fixing the corruption requires avoiding overlapping resource assignments, the correct fix can unfortunately result in a worse user experience: what appeared to be "working" previously might no longer do so. Thus, I had to do a larger rework of pci_resize_resource() in order to properly restore the resource states to what they were prior to the BAR resize.
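
To illustrate the save-then-restore idea behind the rollback, here is a minimal userspace sketch in C. It is not kernel code and does not reflect the actual implementation in this series: struct model_res, struct model_dev, model_resize_bar() and the naive back-to-back packing in model_assign_all() are all hypothetical stand-ins, and BAR sizes are assumed to be powers of two. The only point it tries to show is that the state is snapshotted before anything is changed, so a failed resize leaves every BAR exactly as it was and never overlapping another one.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NUM_BARS 6

/* Hypothetical stand-in for a BAR resource (not struct resource). */
struct model_res {
	uint64_t start;
	uint64_t size;		/* 0 means "not assigned" */
};

/* Hypothetical device: its BARs plus the bridge window they must fit in. */
struct model_dev {
	struct model_res bar[MODEL_NUM_BARS];
	uint64_t window_start;
	uint64_t window_size;
};

/* Naive assignment: pack sized BARs in index order, naturally aligned. */
static int model_assign_all(struct model_dev *dev)
{
	uint64_t pos = dev->window_start;
	uint64_t end = dev->window_start + dev->window_size;

	for (int i = 0; i < MODEL_NUM_BARS; i++) {
		struct model_res *r = &dev->bar[i];

		if (!r->size)
			continue;
		/* Align up to the BAR size (assumed to be a power of two). */
		pos = (pos + r->size - 1) & ~(r->size - 1);
		if (pos + r->size > end)
			return -ENOSPC;
		r->start = pos;
		pos += r->size;
	}
	return 0;
}

/* Resize one BAR; on failure, restore all BARs to the saved state. */
int model_resize_bar(struct model_dev *dev, int resno, uint64_t new_size)
{
	struct model_res saved[MODEL_NUM_BARS];
	int ret;

	if (resno < 0 || resno >= MODEL_NUM_BARS)
		return -EINVAL;
	if (new_size & (new_size - 1))
		return -EINVAL;

	/* Snapshot before touching anything so rollback is exact. */
	memcpy(saved, dev->bar, sizeof(saved));

	dev->bar[resno].size = new_size;
	ret = model_assign_all(dev);
	if (ret)
		memcpy(dev->bar, saved, sizeof(saved));

	return ret;
}

/* Returns true if any two assigned BARs overlap. */
bool model_has_overlap(const struct model_dev *dev)
{
	for (int i = 0; i < MODEL_NUM_BARS; i++) {
		for (int j = i + 1; j < MODEL_NUM_BARS; j++) {
			const struct model_res *a = &dev->bar[i];
			const struct model_res *b = &dev->bar[j];

			if (a->size && b->size &&
			    a->start < b->start + b->size &&
			    b->start < a->start + a->size)
				return true;
		}
	}
	return false;
}
```

In this model the caller never releases anything itself; the resize helper owns both the snapshot and the restore, which is why the rollback can be exact.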

This rework has been on my TODO list anyway, but it wasn't the highest priority item until pci_resize_resource() started to cause regressions due to other resource assignment algorithm changes.

BAR resize rollback does not always restore BAR resources to what they were before the resize operation was started. Currently, when a driver calls pci_resize_resource(), the driver must release the device's resources prior to the call. This is a design flaw in the pci_resize_resource() API: the PCI core cannot save the state those resources had prior to the release, so it cannot restore them later if the BAR size change has to be rolled back.

Currently, the PCI core's BAR resize operation doesn't even attempt to restore the device resources when rolling back a BAR resize. If the normal resource assignment algorithm assigned those resources, the device resources might end up assigned after the pci_resize_resource() call, but that could also trigger the resource tree corruption issue, so what appeared to a user as "working" might be a corrupted state.

With the new pci_resize_resource() interface, the driver calling pci_resize_resource() should no longer release the device resources.

I've added WARN_ON_ONCE() to pick up similar bugs that cause resource tree corruption. At least in my tests, all looked clear on that front after this series.

It would still be nice if the reporters could test that these changes resolve the claim conflicts (while I've tested the series to some extent, I don't have such conflicts here).

This series will likely conflict with some drm changes from Lucas (it will make them partially obsolete by removing the need to release the device's resources on the driver side).

I'll soon submit a refresh of the pci/rebar series on top of this series as there are some conflicts with it.

v2:
- Add exclude_bars parameter to pci_resize_resource()
- Add Link tags
- Add kerneldoc patch
- Add patch to release pci_bus_sem earlier.
- Fix uninitialized var warnings.
- Don't use guard() as a goto from before it triggers an error with clang.

Ilpo Järvinen (11):
  PCI: Prevent resource tree corruption when BAR resize fails
  PCI/IOV: Adjust ->barsz[] when changing BAR size
  PCI: Change pci_dev variable from 'bridge' to 'dev'
  PCI: Try BAR resize even when no window was released
  PCI: Freeing saved list does not require holding pci_bus_sem
  PCI: Fix restoring BARs on BAR resize rollback path
  PCI: Add kerneldoc for pci_resize_resource()
  drm/xe: Remove driver side BAR release before resize
  drm/i915: Remove driver side BAR release before resize
  drm/amdgpu: Remove driver side BAR release before resize
  PCI: Prevent restoring assigned resources

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  10 +-
 drivers/gpu/drm/i915/gt/intel_region_lmem.c |  14 +--
 drivers/gpu/drm/xe/xe_vram.c                |   5 +-
 drivers/pci/iov.c                           |  15 +--
 drivers/pci/pci-sysfs.c                     |  17 +--
 drivers/pci/pci.c                           |   4 +
 drivers/pci/pci.h                           |   9 +-
 drivers/pci/setup-bus.c                     | 126 ++++++++++++++------
 drivers/pci/setup-res.c                     |  52 ++++----
 include/linux/pci.h                         |   3 +-
 10 files changed, 142 insertions(+), 113 deletions(-)

base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787
-- 
2.39.5
