------- Comment From niklas.schne...@ibm.com 2020-09-21 05:55 EDT-------
SRU Justification:

[Impact]

* As zpci_dma_exit_device() is never called on a zPCI device that
is removed via a PCI event which only informs Linux of the device removal
instead of requesting deconfiguration, the vmalloced memory for
DMA tables and vma tracking leaks in this case.

* This is because commit "s390/pci: adapt events for zbus" removed the
zpci_disable_device() call for a zPCI event with PEC 0x0304, since
the device is then removed via zpci_release_device(). However, this
did not free the DMA tables because the device already appeared
to be in Standby state.
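
To illustrate the reasoning above, here is a toy userspace model of the
leak. It is not kernel code: the function names mirror the ones mentioned
in this report, but their bodies, the State enum, the ZDev class and the
with_fix switch are simplified assumptions made up for the sketch.

```python
# Toy userspace model of the leak; NOT kernel code. Function names mirror
# those in the report, the bodies are simplified assumptions.
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    STANDBY = auto()
    ONLINE = auto()


@dataclass
class ZDev:
    state: State = State.ONLINE
    # Stands in for the vmalloc'ed DMA tables and tracking bitmaps.
    dma_tables_allocated: bool = True


def zpci_dma_exit_device(zdev):
    # Frees the (modelled) DMA tables.
    zdev.dma_tables_allocated = False


def zpci_disable_device(zdev):
    # Linux side of the disable; may run even though the platform has
    # already deconfigured the device.
    zpci_dma_exit_device(zdev)


def zpci_release_device(zdev):
    # Assumed behaviour: the disable is skipped for a device that already
    # appears to be in Standby state.
    if zdev.state is State.ONLINE:
        zpci_disable_device(zdev)


def handle_pec_0x0304(zdev, with_fix):
    """Model a hard unplug event; return True if the DMA tables leaked."""
    if with_fix:
        # The fix re-introduces this call in the event handler.
        zpci_disable_device(zdev)
    zdev.state = State.STANDBY  # the platform already deconfigured the device
    zpci_release_device(zdev)
    return zdev.dma_tables_allocated
```

In this model, handle_pec_0x0304(ZDev(), with_fix=False) leaves the tables
allocated, while with_fix=True frees them before the device is released.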

[Fix]

* afdf9550e54627fcf4dd609bdc1153059378cdf5 afdf9550e546 "s390/pci: fix
leak of DMA tables on hard unplug"

[Test Case]

* Have an IBM Z LPAR that has PCIe devices (like RoCE adapters)
assigned and Ubuntu Server 20.04 installed.

* Disable and re-enable one (or more) of the assigned PCIe cards using
the Reassign I/O Path functionality of the HMC/SE.

* Monitor /proc/meminfo; with the leak present, the vmalloc memory usage
does not go back to the value it had before the device was attached.

* The test and verification will be conducted by IBM.
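
As a rough aid for the monitoring step, the VmallocUsed line of
/proc/meminfo can be read with a small script. This is only a sketch, not
the procedure IBM uses; the helper names vmalloc_used_kb and
read_vmalloc_used_kb are made up for the example.

```python
import re


def vmalloc_used_kb(meminfo_text):
    """Return the VmallocUsed value in kB from /proc/meminfo-style text, or None."""
    match = re.search(r"^VmallocUsed:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_vmalloc_used_kb(path="/proc/meminfo"):
    # Reads the live value on the system under test.
    with open(path) as f:
        return vmalloc_used_kb(f.read())
```

Calling read_vmalloc_used_kb() before the device is attached and again
after each disable/re-enable cycle gives values to compare; a value that
keeps growing across cycles would be consistent with the leak.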

[Regression Potential]

* The regression risk can be considered as moderate, because:

* Only a call of zpci_disable_device(zdev) got reintroduced (and some comment 
lines). This call was done in the same place and with the same
functionality prior to commit "s390/pci: adapt events for zbus".

* Since __zpci_event_availability gets modified, the zPCI event handling could 
be broken. It is however only modified for the single PEC 0x0304 case
and all cases execute independently.

* Nevertheless, this could cause issues regarding the availability of
zPCI devices.

* In the worst case zPCI devices could become unusable.

* The code changes themselves are minimal, and the zPCI code is limited
to the s390x architecture.

* On top of that, test kernels were built and shared for further testing.

[Other]

* Since this commit needs to land in groovy too, but groovy is still in
development (hence the SRU process does not apply to groovy yet), a
separate patch request for groovy was made.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1896216

Title:
  [Ubuntu 20.10] zPCI DMA tables and bitmap leak on hard unplug (PCI
  Event 0x0304)

Status in Ubuntu on IBM z Systems:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Focal:
  New
Status in linux source package in Groovy:
  In Progress

Bug description:
  Commit "s390/pci: adapt events for zbus" removed the
  zpci_disable_device() call for a zPCI event with PEC 0x0304 (hot
  unplug) because the device is already deconfigured by the platform.

  This however skips the Linux side of the disable. In particular, it leads
  to leaking the DMA tables and bitmaps because zpci_dma_exit_device() is
  never called on the device.

  This has been fixed in the following commit (currently in linux-next):

  afdf9550e54627fcf4dd609bdc1153059378cdf5 s390/pci: fix leak of DMA
  tables on hard unplug

  The commit re-introduces the zpci_disable_device() call as it was before
  the zbus introduction. For good measure I also added a comment to
  zpci_disable_device() to call out the fact that it may be called with
  the device already disabled.

  As the commit was introduced with the multi-function support
  this of course should go into both 20.10 and 20.04.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1896216/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
