This bug was fixed in the package linux - 5.4.0-91.102

---------------
linux (5.4.0-91.102) focal; urgency=medium

  * focal/linux: 5.4.0-91.102 -proposed tracker (LP: #1949840)

  * Packaging resync (LP: #1786013)
    - [Packaging] update Ubuntu.md
    - debian/dkms-versions -- update from kernel-versions (main/2021.11.08)

  * KVM emulation failure when booting into  VM crash kernel with multiple CPUs
    (LP: #1948862)
    - KVM: x86: Properly reset MMU context at vCPU RESET/INIT

  * aufs: kernel bug with apparmor and fuseblk (LP: #1948470)
    - SAUCE: aufs: bugfix, stop omitting path->mnt

  * ebpf:  bpf_redirect fails with ip6 gre interfaces (LP: #1947164)
    - net: handle ARPHRD_IP6GRE in dev_is_mac_header_xmit()

  * require CAP_NET_ADMIN to attach N_HCI ldisc (LP: #1949516)
    - Bluetooth: hci_ldisc: require CAP_NET_ADMIN to attach N_HCI ldisc

  * ACL updates on OCFS2 are not revalidated (LP: #1947161)
    - ocfs2: fix remounting needed after setfacl command

  * ppc64 BPF JIT mod by 1 will not return 0 (LP: #1948351)
    - powerpc/bpf: Fix BPF_MOD when imm == 1

  * Drop "UBUNTU: SAUCE: cachefiles: Page leaking in
    cachefiles_read_backing_file while vmscan is active" (LP: #1947709)
    - Revert "UBUNTU: SAUCE: cachefiles: Page leaking in
      cachefiles_read_backing_file while vmscan is active"

  * Reassign I/O Path of ConnectX-5 Port 1 before Port 2 causes NULL dereference
    (LP: #1943464)
    - s390/pci: fix leak of PCI device structure
    - s390/pci: fix use after free of zpci_dev
    - s390/pci: fix zpci_zdev_put() on reserve

  * [SRU][F] USB: serial: pl2303: add support for PL2303HXN (LP: #1948377)
    - USB: serial: pl2303: add support for PL2303HXN
    - USB: serial: pl2303: fix line-speed handling on newer chips

  * Focal update: v5.4.151 upstream stable release (LP: #1947888)
    - tty: Fix out-of-bound vmalloc access in imageblit
    - cpufreq: schedutil: Use kobject release() method to free sugov_tunables
    - cpufreq: schedutil: Destroy mutex before kobject_put() frees the memory
    - usb: cdns3: fix race condition before setting doorbell
    - fs-verity: fix signed integer overflow with i_size near S64_MAX
    - hwmon: (w83793) Fix NULL pointer dereference by removing unnecessary
      structure field
    - hwmon: (w83792d) Fix NULL pointer dereference by removing unnecessary
      structure field
    - hwmon: (w83791d) Fix NULL pointer dereference by removing unnecessary
      structure field
    - scsi: ufs: Fix illegal offset in UPIU event trace
    - mac80211: fix use-after-free in CCMP/GCMP RX
    - x86/kvmclock: Move this_cpu_pvti into kvmclock.h
    - drm/amd/display: Pass PCI deviceid into DC
    - ipvs: check that ip_vs_conn_tab_bits is between 8 and 20
    - hwmon: (mlxreg-fan) Return non-zero value when fan current state is 
enforced
      from sysfs
    - mac80211: Fix ieee80211_amsdu_aggregate frag_tail bug
    - mac80211: limit injected vht mcs/nss in ieee80211_parse_tx_radiotap
    - mac80211: mesh: fix potentially unaligned access
    - mac80211-hwsim: fix late beacon hrtimer handling
    - sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb
    - hwmon: (tmp421) report /PVLD condition as fault
    - hwmon: (tmp421) fix rounding for negative values
    - net: ipv4: Fix rtnexthop len when RTA_FLOW is present
    - e100: fix length calculation in e100_get_regs_len
    - e100: fix buffer overrun in e100_get_regs
    - selftests, bpf: test_lwt_ip_encap: Really disable rp_filter
    - scsi: csiostor: Add module softdep on cxgb4
    - net: hns3: do not allow call hns3_nic_net_open repeatedly
    - net: sched: flower: protect fl_walk() with rcu
    - af_unix: fix races in sk_peer_pid and sk_peer_cred accesses
    - perf/x86/intel: Update event constraints for ICX
    - elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings
    - debugfs: debugfs_create_file_size(): use IS_ERR to check for error
    - ipack: ipoctal: fix stack information leak
    - ipack: ipoctal: fix tty registration race
    - ipack: ipoctal: fix tty-registration error handling
    - ipack: ipoctal: fix missing allocation-failure check
    - ipack: ipoctal: fix module reference leak
    - ext4: fix loff_t overflow in ext4_max_bitmap_size()
    - ext4: fix reserved space counter leakage
    - ext4: fix potential infinite loop in ext4_dx_readdir()
    - HID: u2fzero: ignore incomplete packets without data
    - net: udp: annotate data race around udp_sk(sk)->corkflag
    - net: stmmac: don't attach interface until resume finishes
    - PCI: Fix pci_host_bridge struct device release/free handling
    - libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbind
    - hso: fix bailout in error case of probe
    - usb: hso: fix error handling code of hso_create_net_device
    - usb: hso: remove the bailout parameter
    - crypto: ccp - fix resource leaks in ccp_run_aes_gcm_cmd()
    - HID: betop: fix slab-out-of-bounds Write in betop_probe
    - netfilter: ipset: Fix oversized kvmalloc() calls
    - HID: usbhid: free raw_report buffers in usbhid_stop
    - Linux 5.4.151

  * Focal update: v5.4.150 upstream stable release (LP: #1947886)
    - usb: gadget: r8a66597: fix a loop in set_feature()
    - usb: dwc2: gadget: Fix ISOC flow for BDMA and Slave
    - usb: dwc2: gadget: Fix ISOC transfer complete handling for DDMA
    - usb: musb: tusb6010: uninitialized data in tusb_fifo_write_unaligned()
    - cifs: fix incorrect check for null pointer in header_assemble
    - xen/x86: fix PV trap handling on secondary processors
    - usb-storage: Add quirk for ScanLogic SL11R-IDE older than 2.6c
    - USB: serial: cp210x: add ID for GW Instek GDM-834x Digital Multimeter
    - USB: cdc-acm: fix minor-number release
    - binder: make sure fd closes complete
    - staging: greybus: uart: fix tty use after free
    - Re-enable UAS for LaCie Rugged USB3-FW with fk quirk
    - USB: serial: mos7840: remove duplicated 0xac24 device ID
    - USB: serial: option: add Telit LN920 compositions
    - USB: serial: option: remove duplicate USB device ID
    - USB: serial: option: add device id for Foxconn T99W265
    - mcb: fix error handling in mcb_alloc_bus()
    - erofs: fix up erofs_lookup tracepoint
    - btrfs: prevent __btrfs_dump_space_info() to underflow its free space
    - serial: mvebu-uart: fix driver's tx_empty callback
    - net: hso: fix muxed tty registration
    - afs: Fix incorrect triggering of sillyrename on 3rd-party invalidation
    - platform/x86/intel: punit_ipc: Drop wrong use of ACPI_PTR()
    - enetc: Fix illegal access when reading affinity_hint
    - bnxt_en: Fix TX timeout when TX ring size is set to the smallest
    - net/smc: add missing error check in smc_clc_prfx_set()
    - gpio: uniphier: Fix void functions to remove return value
    - qed: rdma - don't wait for resources under hw error recovery flow
    - net/mlx4_en: Don't allow aRFS for encapsulated packets
    - scsi: iscsi: Adjust iface sysfs attr detection
    - tty: synclink_gt, drop unneeded forward declarations
    - tty: synclink_gt: rename a conflicting function name
    - fpga: machxo2-spi: Return an error on failure
    - fpga: machxo2-spi: Fix missing error code in machxo2_write_complete()
    - thermal/core: Potential buffer overflow in 
thermal_build_list_of_policies()
    - cifs: fix a sign extension bug
    - scsi: qla2xxx: Restore initiator in dual mode
    - scsi: lpfc: Use correct scnprintf() limit
    - irqchip/goldfish-pic: Select GENERIC_IRQ_CHIP to fix build
    - irqchip/gic-v3-its: Fix potential VPE leak on error
    - md: fix a lock order reversal in md_alloc
    - blktrace: Fix uaf in blk_trace access after removing by sysfs
    - net: macb: fix use after free on rmmod
    - net: stmmac: allow CSR clock of 300MHz
    - m68k: Double cast io functions to unsigned long
    - ipv6: delay fib6_sernum increase in fib6_add
    - bpf: Add oversize check before call kvcalloc()
    - xen/balloon: use a kernel thread instead a workqueue
    - nvme-multipath: fix ANA state updates when a namespace is not present
    - sparc32: page align size in arch_dma_alloc
    - blk-cgroup: fix UAF by grabbing blkcg lock before destroying blkg pd
    - compiler.h: Introduce absolute_pointer macro
    - net: i825xx: Use absolute_pointer for memcpy from fixed memory location
    - sparc: avoid stringop-overread errors
    - qnx4: avoid stringop-overread errors
    - parisc: Use absolute_pointer() to define PAGE0
    - arm64: Mark __stack_chk_guard as __ro_after_init
    - alpha: Declare virt_to_phys and virt_to_bus parameter as pointer to 
volatile
    - net: 6pack: Fix tx timeout and slot time
    - spi: Fix tegra20 build with CONFIG_PM=n
    - EDAC/synopsys: Fix wrong value type assignment for edac_mode
    - thermal/drivers/int340x: Do not set a wrong tcc offset on resume
    - arm64: dts: marvell: armada-37xx: Extend PCIe MEM space
    - xen/balloon: fix balloon kthread freezing
    - qnx4: work around gcc false positive warning bug
    - Linux 5.4.150

  * ACL updates on OCFS2 are not revalidated (LP: #1947161) // Focal update:
    v5.4.150 upstream stable release (LP: #1947886)
    - ocfs2: drop acl cache for directories too

  * Focal update: v5.4.149 upstream stable release (LP: #1947885)
    - PCI: pci-bridge-emul: Fix big-endian support
    - PCI: aardvark: Indicate error in 'val' when config read fails
    - PCI: pci-bridge-emul: Add PCIe Root Capabilities Register
    - PCI: aardvark: Fix reporting CRS value
    - PCI/ACPI: Add Ampere Altra SOC MCFG quirk
    - KVM: remember position in kvm->vcpus array
    - console: consume APC, DM, DCS
    - s390/pci_mmio: fully validate the VMA before calling follow_pte()
    - ARM: Qualify enabling of swiotlb_init()
    - apparmor: remove duplicate macro list_entry_is_head()
    - ARM: 9077/1: PLT: Move struct plt_entries definition to header
    - ARM: 9078/1: Add warn suppress parameter to arm_gen_branch_link()
    - ARM: 9079/1: ftrace: Add MODULE_PLTS support
    - ARM: 9098/1: ftrace: MODULE_PLT: Fix build problem without DYNAMIC_FTRACE
    - sctp: validate chunk size in __rcv_asconf_lookup
    - sctp: add param size validation for SCTP_PARAM_SET_PRIMARY
    - staging: rtl8192u: Fix bitwise vs logical operator in
      TranslateRxSignalStuff819xUsb()
    - um: virtio_uml: fix memory leak on init failures
    - dmaengine: acpi: Avoid comparison GSI with Linux vIRQ
    - thermal/drivers/exynos: Fix an error code in exynos_tmu_probe()
    - 9p/trans_virtio: Remove sysfs file on probe failure
    - prctl: allow to setup brk for et_dyn executables
    - nilfs2: use refcount_dec_and_lock() to fix potential UAF
    - profiling: fix shift-out-of-bounds bugs
    - pwm: lpc32xx: Don't modify HW state in .probe() after the PWM chip was
      registered
    - phy: avoid unnecessary link-up delay in polling mode
    - net: stmmac: reset Tx desc base address before restarting Tx
    - Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
    - thermal/core: Fix thermal_cooling_device_register() prototype
    - drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()
    - parisc: Move pci_dev_is_behind_card_dino to where it is used
    - dmaengine: sprd: Add missing MODULE_DEVICE_TABLE
    - dmaengine: ioat: depends on !UML
    - dmaengine: xilinx_dma: Set DMA mask for coherent APIs
    - ceph: request Fw caps before updating the mtime in ceph_write_iter
    - ceph: lockdep annotations for try_nonblocking_invalidate
    - btrfs: fix lockdep warning while mounting sprout fs
    - nilfs2: fix memory leak in nilfs_sysfs_create_device_group
    - nilfs2: fix NULL pointer in nilfs_##name##_attr_release
    - nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
    - nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
    - nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
    - nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
    - pwm: img: Don't modify HW state in .remove() callback
    - pwm: rockchip: Don't modify HW state in .remove() callback
    - pwm: stm32-lp: Don't modify HW state in .remove() callback
    - blk-throttle: fix UAF by deleteing timer in blk_throtl_exit()
    - rtc: rx8010: select REGMAP_I2C
    - drm/nouveau/nvkm: Replace -ENOSYS with -ENODEV
    - Linux 5.4.149

 -- Kleber Sacilotto de Souza <kleber.so...@canonical.com>  Fri, 05 Nov
2021 17:02:56 +0100

** Changed in: linux (Ubuntu Focal)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1943464

Title:
  Reassign I/O Path of ConnectX-5 Port 1 before Port 2 causes NULL
  dereference

Status in Ubuntu on IBM z Systems:
  Fix Committed
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Focal:
  Fix Released
Status in linux source package in Hirsute:
  Fix Committed
Status in linux source package in Impish:
  Fix Committed
Status in linux source package in Jammy:
  Fix Released

Bug description:
  SRU Justification:

  [Impact]

  * After reassigning a PCHID of a ConnectX-5 based RoCE Adapter
    from one physical LPAR to another,
    running Ubuntu 20.04 with kernel 5.4 (latest),
    a lifetime issue occurs.

  * Subsequent testing on newer kernels now shows that a
    NULL pointer dereference in the zPCI code happens (causing a hard crash)
    that was previously hidden by leaking the struct pci_dev.
    
  * For a more detailed root cause analysis, see the below original bug 
description.

  [Fix]

  The following three commits fix this issue in focal:

  * upstream (since v5.12-rc4):
    0b13525c20febcfecccf6fc1db5969727401317d 0b13525c20fe "s390/pci: fix leak 
of PCI device structure"
    backport: 
https://launchpadlibrarian.net/566161494/0001-s390-pci-fix-leak-of-PCI-device-structure.patch

  * upstream (since v5.14-rc7):
    2a671f77ee49f3e78997b77fdee139467ff6a598 2a671f77ee49 "s390/pci: fix use 
after free of zpci_dev"
    backport: 
https://launchpadlibrarian.net/566161496/0002-s390-pci-fix-use-after-free-of-zpci_dev.patch

  * upstream (since v5.15-rc5):
    a46044a92add6a400f4dada7b943b30221f7cc80 a46044a92add "s390/pci: fix 
zpci_zdev_put() on reserve"
    backport: 
https://launchpadlibrarian.net/566161497/0003-s390-pci-fix-zpci_zdev_put-on-reserve.patch

  * Commit 0b13525c20fe fixes a lifetime issue of the struct pci_dev  that was 
not released on removal,
    commit 2a671f77ee49 fixes the 'NULL pointer dereference' (causing the hard 
crash) itself.
    and commit a46044a92add fixes the handling of multiple events for a single 
reserve state transition of the device.
    Without this, the NULL dereference can still be triggered as Reassign I/O 
Path causes a redudant second removal event.

  * Since none of the three upstream commits does apply cleanly to focal 
master-next by just cherry-picking them
    (mainly due to changes in the context), the above backports are needed.

  [Test Case]

  * Two z15 or LinuxONE III LPARs, one with a Connect-X5 based RoCE
  adapter attached.

  * LPARs need to run Ubuntu 20.04 with kernel 5.4 to hit the lifetime issue
    (that hides the also potential existing 'NULL pointer dereference') - 
    with Hirsute and kernel 5.11 the 'NULL pointer dereference' crash occurs.

  * Now change the PCHID (physical channel identifier)
    to a different one from the 2nd LPAR (at the HMC?).

  * Verify if the reassignment worked properly (by checking the PCHID) and
    monitor the kernel ring buffer dmesg (diagnostic messages) for 
    "Krnl PSW" crash (caused by NULL pointer)
    (for more error details, please see below original bug description).

  * Due hardware availability reasons (the ConnectX-5 cards are only used in 
special cases),
    the testing needs to be done by IBM.

  [Regression Potential / What can go wrong]

  * What can go wrong with: 2a671f77ee49 "s390/pci: fix use after free
  of zpci_dev"

  * The reference count to the struct zpci_dev got increased
    while it is used by the PCI core.
    This could cause a leak if not properly released.

  * Hot-plug of there Connect-X5 devices could be broken on s390x entirely,
    in case the new pointer handing is erroneous.

  * This may even have an impact on "cold plug", too.

  * Fortunately the modifications are quite minimal and thereby
  traceable,

  * and affect /arch/s390/pci/pci.c and arch/s390/pci/pci_bus.h only,
    hence are specific to the s390x platform only
    and there again to "plugging" of zPCI devices.

  * What can go wrong with: 0b13525c20fe "s390/pci: fix leak of PCI
  device structure"

  * The function zpci_remove_device got expanded with an additional set_error 
argument,
    and the internal flow got significantly changed. 
    In case handled in a wrong way, this may harm the entire remove/release 
logic.

  * The calls of zpci_remove_device need to be adjusted (as part of the new 
arg),
    failures here will most likely be identified at compile time.

  * The initialization of the pci_dev struct got improved,

  * and the flow in __zpci_event_availability carefully changed
    to reflect the device slot/bus remove characteristics.
    However, issues here may lead again to general zpci hotplug removal issues.

  * Fortunately all modifications are limited to s390x only (/arch/s390/*
    and /drivers/pci/hotplug/s390*) obviously affect zpci devices only
    (and no ccw devices).

  [Other]

  * jammy, the current release in development, has all three commits
  included.

  * impish and hirsute already incl. "s390/pci: fix leak of PCI device 
structure"
    and "s390/pci: fix use after free of zpci_dev";
    "s390/pci: fix zpci_zdev_put() on reserve" is tagged for upstream stable 
v5.14.x / v5.10.x
    (see 
https://lore.kernel.org/stable/31dcc776244843aa76deebd49f4ba3fbe4819990.ca...@linux.ibm.com/)
    and since we pick up v5.14.x / v5.10.x for the Ubuntu hirsute and impish 
kernels,
    it will arrive there via upstream stable.
  __________

  After reassign RoCE ConnectX-5 Card Pchid to another LPAR dmesg show
  under Ubuntu the following Error message

  Ubuntu 20.04.01 with updates

  oot@t35lp02:~# uname -a
  Linux t35lp02.lnxne.boe 5.4.0-80-generic #90-Ubuntu SMP Fri Jul 9 17:41:33 
UTC 2021 s390x s390x s390x GNU/Linux
  root@t35lp02:~#

  DMESG Output

    761.778422] mlx5_core 0018:00:00.1: poll_health:715:(pid 0): Fatal error 1 
detected
  [  761.778432] mlx5_core 0018:00:00.1: print_health_info:381:(pid 0): 
assert_var[0] 0xffffffff
  [  761.778435] mlx5_core 0018:00:00.1: print_health_info:381:(pid 0): 
assert_var[1] 0xffffffff
  [  761.778437] mlx5_core 0018:00:00.1: print_health_info:381:(pid 0): 
assert_var[2] 0xffffffff
  [  761.778439] mlx5_core 0018:00:00.1: print_health_info:381:(pid 0): 
assert_var[3] 0xffffffff
  [  761.778442] mlx5_core 0018:00:00.1: print_health_info:381:(pid 0): 
assert_var[4] 0xffffffff
  [  761.778444] mlx5_core 0018:00:00.1: print_health_info:384:(pid 0): 
assert_exit_ptr 0xffffffff
  [  761.778447] mlx5_core 0018:00:00.1: print_health_info:386:(pid 0): 
assert_callra 0xffffffff
  [  761.778451] mlx5_core 0018:00:00.1: print_health_info:389:(pid 0): fw_ver 
65535.65535.65535
  [  761.778454] mlx5_core 0018:00:00.1: print_health_info:390:(pid 0): hw_id 
0xffffffff
  [  761.778456] mlx5_core 0018:00:00.1: print_health_info:391:(pid 0): 
irisc_index 255
  [  761.778460] mlx5_core 0018:00:00.1: print_health_info:392:(pid 0): synd 
0xff: unrecognized error
  [  761.778462] mlx5_core 0018:00:00.1: print_health_info:394:(pid 0): 
ext_synd 0xffff
  [  761.778465] mlx5_core 0018:00:00.1: print_health_info:396:(pid 0): raw 
fw_ver 0xffffffff
  [  761.778467] mlx5_core 0018:00:00.1: mlx5_trigger_health_work:696:(pid 0): 
new health works are not permitted at this stage
  [  763.179016] mlx5_core 0018:00:00.1: E-Switch: cleanup
  [  768.348431] mlx5_core 0018:00:00.1: mlx5_reclaim_startup_pages:562:(pid 
123): FW did not return all pages. giving up...
  [  768.348433] ------------[ cut here ]------------
  [  768.348434] FW pages counter is 43318 after reclaiming all pages
  [  768.348562] WARNING: CPU: 0 PID: 123 at 
drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:567 
mlx5_reclaim_startup_pages+0x12c/0x1c0 [mlx5_core]
  [  768.348563] Modules linked in: s390_trng chsc_sch eadm_sch vfio_ccw 
vfio_mdev mdev vfio_iommu_type1 vfio sch_fq_codel drm 
drm_panel_orientation_quirks i2c_core ip_tables x_tables btrfs zstd_compress 
zlib_deflate raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor 
async_tx xor raid6_pq libcrc32c raid1 raid0 linear dm_service_time mlx5_ib 
ib_uverbs ib_core pkey qeth_l2 zcrypt crc32_vx_s390 ghash_s390 prng aes_s390 
des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 mlx5_core sha256_s390 
sha1_s390 sha_common tls mlxfw ptp pps_core zfcp scsi_transport_fc 
dasd_eckd_mod dasd_mod qeth qdio ccwgroup scsi_dh_emc scsi_dh_rdac scsi_dh_alua 
dm_multipath
  [  768.348586] CPU: 0 PID: 123 Comm: kmcheck Tainted: G        W         
5.4.0-80-generic #90-Ubuntu
  [  768.348586] Hardware name: IBM 8561 T01 703 (LPAR)
  [  768.348587] Krnl PSW : 0704c00180000000 000003ff808d33ac 
(mlx5_reclaim_startup_pages+0x12c/0x1c0 [mlx5_core])
  [  768.348607]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 
RI:0 EA:3
  [  768.348608] Krnl GPRS: 0000000000000004 0000000000000006 0000000000000034 
0000000000000007
  [  768.348608]            0000000000000007 00000000fcb4fa00 000000000000007b 
000003e00458fafc
  [  768.348609]            00000000b7d406f0 000000004d849c00 00000000b7d00120 
000000010000b6c0
  [  768.348610]            00000000f4cb1100 000003e00458fe70 000003ff808d33a8 
000003e00458fa50
  [  768.348615] Krnl Code: 000003ff808d339c: c02000041043        larl    
%r2,000003ff80955422
                            000003ff808d33a2: c0e5ffff70c7        brasl   
%r14,000003ff808c1530
                           #000003ff808d33a8: a7f40001            brc     
15,000003ff808d33aa
                           >000003ff808d33ac: e330a5f04012        lt      
%r3,263664(%r10)
                            000003ff808d33b2: a784ffd7            brc     
8,000003ff808d3360
                            000003ff808d33b6: b9140033            lgfr    
%r3,%r3
                            000003ff808d33ba: c0200004104e        larl    
%r2,000003ff80955456
                            000003ff808d33c0: c0e5ffff70b8        brasl   
%r14,000003ff808c1530
  [  768.348622] Call Trace:
  [  768.348641] ([<000003ff808d33a8>] mlx5_reclaim_startup_pages+0x128/0x1c0 
[mlx5_core])
  [  768.348661]  [<000003ff808c8e14>] mlx5_function_teardown+0x44/0xa0 
[mlx5_core]
  [  768.348680]  [<000003ff808c95b0>] mlx5_unload_one+0x80/0x160 [mlx5_core]
  [  768.348699]  [<000003ff808c9720>] remove_one+0x50/0xd0 [mlx5_core]
  [  768.348702]  [<000000004d2704c0>] pci_device_remove+0x40/0xa0
  [  768.348706]  [<000000004d2f724e>] device_release_driver_internal+0xee/0x1c0
  [  768.348707]  [<000000004d267054>] pci_stop_bus_device+0x94/0xc0
  [  768.348708]  [<000000004d267210>] 
pci_stop_and_remove_bus_device_locked+0x30/0x50
  [  768.348710]  [<000000004cd36cbe>] __zpci_event_availability+0x26e/0x340
  [  768.348713]  [<000000004d382794>] chsc_process_crw+0x2e4/0x300
  [  768.348714]  [<000000004d389fd6>] crw_collect_info+0x276/0x340
  [  768.348716]  [<000000004cd681e6>] kthread+0x126/0x160
  [  768.348719]  [<000000004d5a568c>] ret_from_fork+0x28/0x30
  [  768.348720]  [<000000004d5a5694>] kernel_thread_starter+0x0/0x10
  [  768.348720] Last Breaking-Event-Address:
  [  768.348739]  [<000003ff808d33a8>] mlx5_reclaim_startup_pages+0x128/0x1c0 
[mlx5_core]
  [  768.348740] ---[ end trace 1056779ff3084977 ]---
  [  768.354255] pci 0018:00:00.1: Removing from iommu group 2
  [  768.359097] pci_bus 0018:00: busn_res: [bus 00] is released
  [  768.359122] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=B, anc=0, 
erc=0, rsid=0
  root@t35lp02:~#

  == Comment: #2 - hel...@de.ibm.com> - 2021-07-27 08:43:37 ==
  Make an Update to Ubuntu 21.04 as mentioned with Niklas:
  root@t35lp02:~# lsb_release -a
  No LSB modules are available.
  Distributor ID: Ubuntu
  Description:    Ubuntu 21.04
  Release:        21.04
  Codename:       hirsute
  root@t35lp02:~# ls -la
  total 36
  drwx------  5 root root 4096 Jul 27 13:06 .
  drwxr-xr-x 20 root root 4096 Jul 27 12:49 ..
  -rw-------  1 root root  174 Jul 27 13:15 .bash_history
  -rw-r--r--  1 root root 3106 Dec  5  2019 .bashrc
  drwx------  2 root root 4096 Jul 27 12:58 .cache
  -rw-r--r--  1 root root  161 Dec  5  2019 .profile
  drwxr-xr-x  3 root root 4096 Jul 27 12:58 snap
  drwx------  2 root root 4096 Jul 27 12:58 .ssh
  -rw-------  1 root root  979 Jul 27 13:06 .viminfo
  root@t35lp02:~# uname -a
  Linux t35lp02.lnxne.boe 5.11.0-25-generic #27-Ubuntu SMP Fri Jul 9 18:40:37 
UTC 2021 s390x s390x s390x GNU/Linux
  root@t35lp02:~#

  dmesg show the following Call Trace after reasign ConnectX-5 Ports to
  another LPAR

  [  232.218778] mlx5_core 0008:00:00.1: mlx5_wait_for_pages:735:(pid 140): 
Skipping wait for vf pages stage
  [  234.108700] mlx5_core 0008:00:00.1: E-Switch: cleanup
  [  234.281483] pci 0008:00:00.1: Removing from iommu group 1
  [  234.281510] ------------[ cut here ]------------
  [  234.281511] WARNING: CPU: 6 PID: 140 at arch/s390/pci/pci.c:374 
pcibios_release_device+0xfe/0x110
  [  234.281522] Modules linked in: s390_trng chsc_sch eadm_sch vfio_ccw 
vfio_mdev mdev vfio_iommu_type1 vfio sch_fq_codel drm i2c_core 
drm_panel_orientation_quirks ip_tables x_tables btrfs blake2b_generic 
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor 
async_tx xor raid6_pq libcrc32c raid1 raid0 linear dm_service_time mlx5_ib 
ib_uverbs ib_core qeth_l2 pkey zcrypt crc32_vx_s390 ghash_s390 mlx5_core prng 
aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 
sha1_s390 sha_common tls mlxfw ptp pps_core zfcp scsi_transport_fc 
dasd_eckd_mod dasd_mod qeth qdio ccwgroup scsi_dh_emc scsi_dh_rdac scsi_dh_alua 
dm_multipath
  [  234.281573] CPU: 6 PID: 140 Comm: kmcheck Not tainted 5.11.0-25-generic 
#27-Ubuntu
  [  234.281575] Hardware name: IBM 8561 T01 703 (LPAR)
  [  234.281576] Krnl PSW : 0704c00180000000 00000000d2af2e92 
(pcibios_release_device+0x102/0x110)
  [  234.281581]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 
RI:0 EA:3
  [  234.281582] Krnl GPRS: 000000000000001f 00000000ffffffff 00000000839dcc00 
0000000000000000
  [  234.281584]            0000000000000000 0038008000000000 0000000000000000 
0000038000000000
  [  234.281585]            00000000d43efef0 0000000000000006 0000000000000c00 
00000000839f5298
  [  234.281586]            0000000083563300 0000000000000000 000003800475fad8 
000003800475fa88
  [  234.281596] Krnl Code: 00000000d2af2e84: c0e5003c400e        brasl   
%r14,00000000d327aea0
                            00000000d2af2e8a: a7f4ffc3            brc     
15,00000000d2af2e10
                           #00000000d2af2e8e: af000000            mc      0,0
                           >00000000d2af2e92: e5442006ffff        mvhhi   
6(%r2),-1
                            00000000d2af2e98: a7f4ffe2            brc     
15,00000000d2af2e5c
                            00000000d2af2e9c: 0707                bcr     0,%r7
                            00000000d2af2e9e: 0707                bcr     0,%r7
                            00000000d2af2ea0: c00400000000        brcl    
0,00000000d2af2ea0
  [  234.281605] Call Trace:
  [  234.281607]  [<00000000d2af2e92>] pcibios_release_device+0x102/0x110
  [  234.281612]  [<00000000d328fa42>] pci_release_dev+0x62/0xa0
  [  234.281620]  [<00000000d335f918>] device_release+0x48/0xb0
  [  234.281626]  [<00000000d32604d4>] kobject_put+0x174/0x2e0
  [  234.281632]  [<00000000d32cfeee>] pci_iov_release+0x4e/0x70
  [  234.281637]  [<00000000d328fa2e>] pci_release_dev+0x4e/0xa0
  [  234.281638]  [<00000000d335f918>] device_release+0x48/0xb0
  [  234.281640]  [<00000000d32604d4>] kobject_put+0x174/0x2e0
  [  234.281642]  [<00000000d2af870a>] __zpci_event_availability+0x20a/0x350
  [  234.281644]  [<00000000d36c5d60>] chsc_process_crw+0x2d0/0x2f0
  [  234.281650]  [<00000000d36cff36>] crw_collect_info+0x276/0x340
  [  234.281653]  [<00000000d2b3c4fa>] kthread+0x14a/0x170
  [  234.281657]  [<00000000d3728028>] ret_from_fork+0x24/0x2c
  [  234.281662] Last Breaking-Event-Address:
  [  234.281662]  [<00000000d2af2e2e>] pcibios_release_device+0x9e/0x110
  [  234.281664] ---[ end trace c37123f53d0bbb72 ]---
  [  236.519017] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=B, anc=0, 
erc=0, rsid=0
  root@t35lp02:~#

  == Comment: #3 - nik...@ibm.com> - 2021-08-04 05:31:00 ==
  The first dmesg on Ubuntu 20.04 looks like a Mellanox internal driver problem 
which if my memory serves me correctly has since been fixed in the Mellanox 
driver. As far as I can tell in the worst case this leaks a few pages.

  The output on Ubuntu 21.04 turns out to be a NULL pointer dereference
  in zPCI code however that was previously hidden by us leaking the
  struct pci_dev.

  I have analyzed the issue and can reproduce it on current development
  kernels, here is what I believe happens:

  The backtrace shows a warning in

  pcibios_release_device()
     zpci_unmap_resources()
        pci_iounmap_fh()

  which is

  WARN_ON(!zpci_iomap_start[idx].count);

  That however is a red herring, on this z15 machine with a ConnectX-5 we have
  MIO support so should never even enter pci_iounmap_fh().

  Adding a debug print in pcibios_release_device() it turns out that the
  struct zpci_dev * we get to to_zpci(pdev) is NULL.

  Digging a bit the problem is that during the detach PCI availability event
  we call zpci_zdev_put() as the zdev went away. We already performed the
  pci_stop_and_remove_bus_device_locked(pdev) before that point and assumed that
  after that the struct pci_dev refcount reaches 0 and will not be accessed 
anymore.

  This is usually true, however here the problem is that we first removed
  the PF for Port 1 while keeping the PF for Port 2.

  Now the "struct pci_sriov" in pdev->sriov where pdev is the PF of the Port 2 
has a field sturct pci_sriov::dev with the comment "/* Lowest numbered PF */". 
This field holds a reference to the struct pci_dev of the PF for Port 1 thus 
preventing
  the refcount of that reaching 0 until the PF for Port 2 is released.

  When the PF for Port 2 is released the refcount for the PF of Port 1 reaches 0
  and only then do we get the call pci_release_dev() -> pcibios_release_device()
  but at this point the struct zpci_dev was already released and 
zbus->functions[devfn]
  pointer NULLed when it was unregistered from the zbus.

  Here is /sys/kernel/debug/s390dbf/pci_msg/sprintf output with the added debug 
print for my reproduction:
  root@t35lp47 ~ # cat /sys/kernel/debug/s390dbf/pci_msg/sprintf
  00 01627041292:161310 3 - 0007 0000000b3ffa6560  wb bit: 1
  ...
  00 01627041292:165772 3 - 0007 0000000b3ffa2952  add fid:280, fh:2f80, c:1
  00 01627041292:166083 3 - 0007 0000000b3ffa2952  add fid:2c0, fh:3002, c:1
  ...
  00 01627041292:181187 3 - 0007 0000000b3ffa66aa  ena fid:280, fh:a3002f80, 
rc:0
  00 01627041292:181194 3 - 0007 0000000b3ffa6728  ena mio fid:280, 
fh:a3002f80, rc:0 <-- MIO enabled for PF of Port 1
  00 01627041292:182176 3 - 0007 0000000b3ffa66aa  ena fid:2c0, fh:a7003002, 
rc:0
  00 01627041292:182181 3 - 0007 0000000b3ffa6728  ena mio fid:2c0, 
fh:a7003002, rc:0 <-- MIO enabled for PF of Port 2
  ....
  00 01627041423:815382 3 - 0014 0000000b3ffa2c74  rem fid:280 <-- sturct 
zpci_dev for Port 1 released but no zpci_unmap_resources() called for it!
  00 01627041566:352720 3 - 0012 0000000b3ffa2362  zunmap: 
zdev:0000000000000000 FID:0, mio:0 <-- zdev is NULL in pcibios_release_device() 
for Port 1 and we read the FH/MIO from somewhere in lowcore BAD!
  00 01627041566:353274 3 - 0012 0000000b3ffa2362  zunmap: 
zdev:000000008803c000 FID:2c0, mio:1
  00 01627041567:548896 3 - 0012 0000000b3ffa2c74  rem fid:2c0

  Thus we have a definite bug in the coordination between the lifetimes of
  struct zpci_dev and struct pci_dev where the former can outlive the latter.

  I think the problem is that for the struct zpci_dev we only keep exactly one
  reference owned by the zPCI core that gets released once the underlying zPCI
  device goes away from the view of the zPCI core.

  At the same time the struct pci_dev has its own reference counting and via
  pdev holds its own reference to the struct zpci_dev (indirect via
  pdev->sysdata which is a strict zpci_bus which holds a struct zpci_dev* for
  all functions on the bus via zbus->functions[devfn]).

  This reference is unaccounted for and can outlive the zPCI core's refrence
  as seen in the above scenario.

  == Comment: #4 - nik...@ibm.com> - 2021-08-24 09:00:03 ==
  A fix for this has now landed upstream with the following commit:

  2a671f77ee49 ("s390/pci: fix use after free of zpci_dev")

  Note that this references an earlier commit that previously hid the issue
  and has not yet been merged to Ubuntu 20.04's kernel but is included in 21.04
  only with both fixes does the freeing of the struct pci_dev for correctly
  in the tested case.

  0b13525c20fe ("s390/pci: fix leak of PCI device structure")

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1943464/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to