Hi all,

Thanks to issue reports from Simon Richter and Alex Bennée, I
discovered that BAR resize rollback can corrupt the resource tree. As
fixing the corruption requires avoiding overlapping resource
assignments, the correct fix can unfortunately result in a worse user
experience: what appeared to be "working" previously might no longer do
so. Thus, I had to do a larger rework of pci_resize_resource() in order
to properly restore the resource states to what they were prior to the
BAR resize.

This rework has been on my TODO list anyway but it wasn't the highest
prio item until pci_resize_resource() started to cause regressions due
to other resource assignment algorithm changes.

BAR resize rollback does not always restore BAR resources to what they
were before the resize operation was started. Currently, when a driver
calls pci_resize_resource(), the driver must release the device
resources prior to the call. This is a design flaw in the
pci_resize_resource() API: by the time the PCI core is called, the
resources have already been released, so it cannot save their prior
state and restore it later if the BAR size change has to be rolled
back.

Currently, the PCI core's BAR resize operation doesn't even attempt to
restore the device resources when rolling back a BAR resize. If the
normal resource assignment algorithm assigned those resources, the
device resources might end up assigned after the pci_resize_resource()
call, but that could also trigger the resource tree corruption issue,
so what appeared to a user as "working" might be a corrupted state.

With the new pci_resize_resource() interface, the driver calling
pci_resize_resource() should no longer release the device resources.
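
To illustrate the idea (not the actual implementation), here is a
minimal userspace sketch of "save, release, try, restore on failure"
done in one place. Everything named toy_* is hypothetical and exists
only for this example: the structs, the fixed-size "window" check that
stands in for the real resource assignment, and the return values. It
does not model bridge windows, the resource tree or exclude_bars.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NUM_BARS 6

/* Toy stand-in for a BAR resource: [start, end] plus an assigned flag. */
struct toy_res {
	unsigned long long start;
	unsigned long long end;
	bool assigned;
};

struct toy_dev {
	struct toy_res bar[TOY_NUM_BARS];
};

/* Hypothetical assignment step: fails if the size exceeds the window. */
static bool toy_try_assign(struct toy_res *res, unsigned long long size,
			   unsigned long long window_size)
{
	if (size > window_size)
		return false;

	res->end = res->start + size - 1;
	res->assigned = true;
	return true;
}

/*
 * Toy resize: the callee snapshots every BAR before releasing anything,
 * so a failed resize can put all of them back exactly as they were.
 * Returns 0 on success and -1 on failure (after rollback).
 */
static int toy_resize(struct toy_dev *dev, int resno,
		      unsigned long long size,
		      unsigned long long window_size)
{
	struct toy_res saved[TOY_NUM_BARS];
	int i;

	assert(resno >= 0 && resno < TOY_NUM_BARS);

	/* Save the state while the resources are still assigned. */
	memcpy(saved, dev->bar, sizeof(saved));

	/* Release the BARs here rather than in the caller. */
	for (i = 0; i < TOY_NUM_BARS; i++)
		dev->bar[i].assigned = false;

	if (toy_try_assign(&dev->bar[resno], size, window_size)) {
		/* In this toy, the other BARs simply get their old state back. */
		for (i = 0; i < TOY_NUM_BARS; i++) {
			if (i != resno)
				dev->bar[i] = saved[i];
		}
		return 0;
	}

	/* Rollback: restore every BAR from the snapshot. */
	memcpy(dev->bar, saved, sizeof(saved));
	return -1;
}
```

If the caller had released the BARs before calling toy_resize(), the
snapshot would only capture the already released state and the rollback
would have nothing useful to restore, which is the flaw described above.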

I've added WARN_ON_ONCE() to pick up similar bugs that cause resource
tree corruption. At least in my tests all looked clear on that front
after this series.

It would still be nice if the reporters could test that these changes
resolve the claim conflicts (while I've tested the series to some
extent, I don't have such conflicts here).

This series will likely conflict with some drm changes from Lucas (it
will make them partially obsolete by removing the need to release the
dev's resources on the driver side).

I'll soon submit a refresh of the pci/rebar series on top of this
series as there are some conflicts between them.

v2:
- Add exclude_bars parameter to pci_resize_resource()
- Add Link tags
- Add kerneldoc patch
- Add patch to release pci_bus_sem earlier
- Fix uninitialized var warnings
- Don't use guard() as a goto from before it triggers an error with clang

Ilpo Järvinen (11):
  PCI: Prevent resource tree corruption when BAR resize fails
  PCI/IOV: Adjust ->barsz[] when changing BAR size
  PCI: Change pci_dev variable from 'bridge' to 'dev'
  PCI: Try BAR resize even when no window was released
  PCI: Freeing saved list does not require holding pci_bus_sem
  PCI: Fix restoring BARs on BAR resize rollback path
  PCI: Add kerneldoc for pci_resize_resource()
  drm/xe: Remove driver side BAR release before resize
  drm/i915: Remove driver side BAR release before resize
  drm/amdgpu: Remove driver side BAR release before resize
  PCI: Prevent restoring assigned resources

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  10 +-
 drivers/gpu/drm/i915/gt/intel_region_lmem.c |  14 +--
 drivers/gpu/drm/xe/xe_vram.c                |   5 +-
 drivers/pci/iov.c                           |  15 +--
 drivers/pci/pci-sysfs.c                     |  17 +--
 drivers/pci/pci.c                           |   4 +
 drivers/pci/pci.h                           |   9 +-
 drivers/pci/setup-bus.c                     | 126 ++++++++++++++------
 drivers/pci/setup-res.c                     |  52 ++++----
 include/linux/pci.h                         |   3 +-
 10 files changed, 142 insertions(+), 113 deletions(-)


base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787
-- 
2.39.5
