Simon and Alex, could you please test whether this series eliminates the
claim conflicts and makes the BAR resize either succeed or roll back the
resource changes without breaking things? Please test it without my other
fix patches (if you need some random unrelated fix, that's okay).
Hi all,

Thanks to issue reports from Simon Richter and Alex Bennée, I discovered
that BAR resize rollback can corrupt the resource tree. As fixing the
corruption requires avoiding overlapping resource assignments, the correct
fix can unfortunately result in a worse user experience: what appeared to
be "working" previously might no longer do so. Thus, I had to do a larger
rework of pci_resize_resource() in order to properly restore the resource
states to what they were prior to the BAR resize. This rework has been on
my TODO list anyway, but it wasn't the highest prio item until
pci_resize_resource() started to cause regressions due to other resource
assignment algorithm changes.

BAR resize rollback does not always restore BAR resources as they were
before the resize operation was started.

Currently, when a driver calls pci_resize_resource(), the driver must
release the device resources prior to the call. This is a design flaw in
the pci_resize_resource() API, as the PCI core then cannot save the state
those resources had prior to the release and therefore cannot restore them
later if the BAR size change has to be rolled back. The PCI core's BAR
resize operation currently doesn't even attempt to restore the device
resources when rolling back a BAR resize. If the normal resource
assignment algorithm assigned those resources, the device resources might
get assigned after the pci_resize_resource() call, but that could also
trigger the resource tree corruption issue, so what appeared to a user as
"working" might be a corrupted state.

With the new pci_resize_resource() interface, the driver calling
pci_resize_resource() should no longer release the device resources.

I've added WARN_ON_ONCE() to pick up similar bugs that cause resource tree
corruption. At least in my tests, all looked clear on that front after
this series.

I was a bit on the fence about how to split this series. Between patches 1
and 5-8, there might be cases where user experience is made worse if only
part of the series is applied.
But at the same time I was hesitant to merge all those changes together,
as the changes are way easier to understand when split properly.
Personally, I think the BAR resize rollback code has not really functioned
okay prior to this series at all, because touching an assigned resource on
the rollback path is a bug, plain and simple. If that got things "working",
it's still a bad bug (that one can get lucky and the corruption results in
non-corrupted numbers doesn't make it any better). If those patches need
to be merged into one, just let me know and I can rearrange the patch
order to make it easier.

This series will conflict with what's in pci/rebar and likely with some xe
changes from Lucas De Marchi, which might also be rendered in part
unnecessary by the pci_resize_resource() API change. My suggestion is that
this series takes precedence over what's in pci/rebar to make things
easier for stable people (I can rebase the pci/rebar patches on top of
these, so feel free to drop those other patches if needed).

Ilpo Järvinen (9):
  PCI: Prevent resource tree corruption when BAR resize fails
  PCI/IOV: Adjust ->barsz[] when changing BAR size
  PCI: Change pci_dev variable from 'bridge' to 'dev'
  PCI: Try BAR resize even when no window was released
  PCI: Fix restoring BARs on BAR resize rollback path
  drm/xe: Remove driver side BAR release before resize
  drm/i915: Remove driver side BAR release before resize
  drm/amdgpu: Remove driver side BAR release before resize
  PCI: Prevent restoring assigned resources

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |   8 +-
 drivers/gpu/drm/i915/gt/intel_region_lmem.c |  12 --
 drivers/gpu/drm/xe/xe_vram.c                |   3 -
 drivers/pci/iov.c                           |  15 +--
 drivers/pci/pci-sysfs.c                     |  15 +--
 drivers/pci/pci.c                           |   4 +
 drivers/pci/pci.h                           |   8 +-
 drivers/pci/setup-bus.c                     | 119 ++++++++++++++------
 drivers/pci/setup-res.c                     |  30 ++---
 9 files changed, 108 insertions(+), 106 deletions(-)


base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787
-- 
2.39.5
