This bug is awaiting verification that the linux/5.15.0-156.166 kernel
in -proposed solves the problem. Please test the kernel and update this
bug with the results. If the problem is solved, change the tag
'verification-needed-jammy-linux' to 'verification-done-jammy-linux'. If
the problem still exists, change the tag
'verification-needed-jammy-linux' to 'verification-failed-jammy-linux'.


If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-jammy-linux-v2 verification-needed-jammy-linux

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia in Ubuntu.
https://bugs.launchpad.net/bugs/2107816

Title:
  warning at iommu_dma_unmap_page when running ibv_rc_pingpong

Status in linux package in Ubuntu:
  Invalid
Status in linux-nvidia package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Fix Committed
Status in linux source package in Noble:
  Fix Committed
Status in linux-nvidia source package in Noble:
  Fix Released
Status in linux source package in Oracular:
  Fix Released

Bug description:
  SRU Justification:

  [Impact]

  On systems with ConnectX devices using the mlx5_ib driver, the dereg_mr
  InfiniBand operation will produce a kernel WARNING message when
  deregistering device memory. The WARNING occurs only when an IOMMU is
  in use and not in passthrough/identity mode, but in this case the
  mlx5_ib driver is still behaving incorrectly.

  [ 343.588824] ------------[ cut here ]------------
  [ 343.588829] WARNING: CPU: 68 PID: 4076 at drivers/iommu/dma-iommu.c:1198 iommu_dma_unmap_page+0x12c/0x190
  ...
  [ 343.589101] Call trace:
  [ 343.589102] iommu_dma_unmap_page+0x12c/0x190
  [ 343.589104] dma_unmap_page_attrs+0x1f8/0x290
  [ 343.589107] mlx5_free_priv_descs+0x94/0xe0 [mlx5_ib]
  [ 343.589121] mlx5_ib_dereg_mr+0x330/0x4f8 [mlx5_ib]
  [ 343.589131] ib_dereg_mr_user+0x54/0x178 [ib_core]
  [ 343.589148] uverbs_free_mr+0x24/0x50 [ib_uverbs]
  [ 343.589155] destroy_hw_idr_uobject+0x38/0x98 [ib_uverbs]
  [ 343.589160] uverbs_destroy_uobject+0x4c/0x230 [ib_uverbs]
  [ 343.589165] uobj_destroy+0x60/0xe8 [ib_uverbs]
  [ 343.589170] ib_uverbs_run_method+0x194/0x310 [ib_uverbs]
  [ 343.589175] ib_uverbs_cmd_verbs+0x1ac/0x288 [ib_uverbs]
  [ 343.589180] ib_uverbs_ioctl+0xb0/0x150 [ib_uverbs]
  [ 343.589185] __arm64_sys_ioctl+0xd0/0x150
  [ 343.589189] invoke_syscall.constprop.0+0x84/0x100
  [ 343.589191] do_el0_svc+0x4c/0x100
  [ 343.589192] el0_svc+0x48/0x1c8
  [ 343.589195] el0t_64_sync_handler+0x148/0x158
  [ 343.589197] el0t_64_sync+0x1b0/0x1b8
  [ 343.589199] ---[ end trace 0000000000000000 ]---
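
  To tell whether a given machine falls into the "IOMMU in use and not in
  passthrough/identity mode" case described above, one option is to read the
  IOMMU group type of the ConnectX PCI device from sysfs. The following is
  only a sketch: the helper names are hypothetical, the PCI address must be
  supplied by the tester, and the set of type strings that count as
  "translated" is an assumption based on what recent kernels report.

```python
from pathlib import Path

# Domain types taken to mean that DMA goes through IOMMU translation.
# Assumption: "DMA" and "DMA-FQ" are translated, "identity" is passthrough.
TRANSLATED_TYPES = {"DMA", "DMA-FQ"}


def is_translated(group_type):
    """Return True if an iommu_group 'type' string indicates translation."""
    return group_type.strip() in TRANSLATED_TYPES


def device_group_type(pci_addr, sysfs_root="/sys"):
    """Read the IOMMU group type for a PCI device, or None if unavailable.

    pci_addr is a full PCI address such as "0000:01:00.0" (example value,
    not taken from this bug).
    """
    path = Path(sysfs_root) / "bus/pci/devices" / pci_addr / "iommu_group/type"
    try:
        return path.read_text()
    except OSError:
        # No IOMMU group for this device, or the kernel lacks the type file.
        return None
```

  If device_group_type() returns None or a value for which is_translated()
  is false, the absence of the WARNING on that system says little about
  whether the fix works.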

  Oracular obtained the fix via stable updates, and 6.14 kernels and newer
  already have this fix.

  Jammy and Noble are still affected.

  [Fix]

  This is resolved by backporting abc7b3f1f056 ("RDMA/mlx5: Fix a WARN
  during dereg_mr for DM type") from upstream. The patch submitted with
  this cover letter was originally submitted to noble:linux-nvidia, but
  benefits jammy:linux and noble:linux as well.

  [Test Plan]

  For systems with ConnectX devices configured for InfiniBand, this can be
  reproduced with:

  $ ibv_rc_pingpong -g 0 -j &
  $ ibv_rc_pingpong -g 0 -j 127.0.0.1
  Finally, check dmesg for a WARNING message.
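
  The final dmesg check can be automated. The sketch below is a hypothetical
  helper, not part of the SRU test plan: it scans kernel log text for a
  WARNING line that names iommu_dma_unmap_page, the function reported in the
  trace above. The regular expression is an assumption about how that line
  is formatted, and reading dmesg may require root on systems that restrict
  it.

```python
import re
import subprocess

# Matches a WARNING header line that names the function from this bug.
# Assumption: the function name appears on the same line as "WARNING:".
WARN_RE = re.compile(r"WARNING: .*iommu_dma_unmap_page")


def find_unmap_warnings(dmesg_text):
    """Return the log lines that report the iommu_dma_unmap_page WARNING."""
    return [line for line in dmesg_text.splitlines() if WARN_RE.search(line)]


def read_dmesg():
    """Return the current kernel log as text by running the dmesg command."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

  After running the two ibv_rc_pingpong commands, an empty list from
  find_unmap_warnings(read_dmesg()) suggests the WARNING did not trigger;
  a non-empty list contains the offending lines.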

  [Where issues could arise]

  These changes affect the mlx5_ib driver. Regressions would likely appear
  as misbehavior of this driver, particularly where it handles releasing
  RDMA/IB memory regions.

  ----------- above SRU justification added by ~jacobmartin -----------

  Running ibv_rc_pingpong like this:
  ibv_rc_pingpong -g 0 -j
  on kernel 6.8.0-1025-nvidia-64k

  produces this warning in dmesg:

  [  343.588824] ------------[ cut here ]------------
  [  343.588829] WARNING: CPU: 68 PID: 4076 at drivers/iommu/dma-iommu.c:1198 iommu_dma_unmap_page+0x12c/0x190
  [  343.588837] Modules linked in: rpcrdma rdma_ucm ib_iser libiscsi 
scsi_transport_iscsi rdma_cm ib_ipoib iw_cm ib_cm xt_conntrack xt_MASQUERADE 
bridge stp llc xt_set ip_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 
nf_defrag_ipv4 xt_addrtype nft_compat nf_tables xfrm_user xfrm_algo qrtr 
overlay cfg80211 sunrpc binfmt_misc nls_iso8859_1 dax_hmem nvidia_cspmu 
cxl_acpi ast ses cxl_core i2c_algo_bit arm_smmuv3_pmu ipmi_ssif enclosure 
arm_cspmu_module coresight_trbe arm_spe_pmu acpi_power_meter cppc_cpufreq 
acpi_ipmi ipmi_devintf spi_nor nvidia_uvm(OE) coresight_tmc coresight_funnel 
coresight_stm ipmi_msghandler stm_p_basic coresight stm_core nvidia_drm(OE) 
uio_pdrv_genirq uio nvidia_modeset(OE) video ib_umad nvidia_fs(O) nvidia(OE) 
ecc dm_multipath nvme_fabrics nvme_keyring efi_pstore nfnetlink dmi_sysfs 
ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon 
raid6_pq libcrc32c raid1 raid0 mlx5_ib ib_uverbs macsec ib_core crct10dif_ce
  [  343.588907]  polyval_ce mlx5_core polyval_generic ghash_ce sm4_ce_gcm 
sm4_ce_ccm sm4_ce sm4_ce_cipher sm4 sm3_ce sm3 mlxfw i2c_smbus mpt3sas nvme 
sha3_ce psample sha2_ce nvme_core raid_class tls xhci_pci sha256_arm64 sha1_ce 
xhci_pci_renesas scsi_transport_sas nvme_auth pci_hyperv_intf i2c_tegra 
aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
  [  343.588929] CPU: 68 PID: 4076 Comm: ibv_rc_pingpong Tainted: G        W  OE      6.8.0-1025-nvidia-64k #28-Ubuntu
  [  343.588931] Hardware name: Quanta Cloud Technology Inc. QuantaGrid S74G-2U 1S7GZ9Z0002/S7G MB (CG1), BIOS 3A21 07/10/2024
  [  343.588932] pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
  [  343.588933] pc : iommu_dma_unmap_page+0x12c/0x190
  [  343.588935] lr : iommu_dma_unmap_page+0x44/0x190
  [  343.588936] sp : ffff8000bd04f840
  [  343.588936] x29: ffff8000bd04f840 x28: ffff8000bd04fba0 x27: 0000000000000001
  [  343.588939] x26: 0000000000000000 x25: 0000000000000010 x24: 0000000000000001
  [  343.588941] x23: 0000000000000000 x22: 0000000000000000 x21: ffff000116904000
  [  343.588942] x20: 0000000000000000 x19: ffff00008e3530c8 x18: ffff8000bd980088
  [  343.588944] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffef1a0390
  [  343.588946] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
  [  343.588949] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800080df8d90
  [  343.588952] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
  [  343.588954] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
  [  343.588959] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
  [  343.589101] Call trace:
  [  343.589102]  iommu_dma_unmap_page+0x12c/0x190
  [  343.589104]  dma_unmap_page_attrs+0x1f8/0x290
  [  343.589107]  mlx5_free_priv_descs+0x94/0xe0 [mlx5_ib]
  [  343.589121]  mlx5_ib_dereg_mr+0x330/0x4f8 [mlx5_ib]
  [  343.589131]  ib_dereg_mr_user+0x54/0x178 [ib_core]
  [  343.589148]  uverbs_free_mr+0x24/0x50 [ib_uverbs]
  [  343.589155]  destroy_hw_idr_uobject+0x38/0x98 [ib_uverbs]
  [  343.589160]  uverbs_destroy_uobject+0x4c/0x230 [ib_uverbs]
  [  343.589165]  uobj_destroy+0x60/0xe8 [ib_uverbs]
  [  343.589170]  ib_uverbs_run_method+0x194/0x310 [ib_uverbs]
  [  343.589175]  ib_uverbs_cmd_verbs+0x1ac/0x288 [ib_uverbs]
  [  343.589180]  ib_uverbs_ioctl+0xb0/0x150 [ib_uverbs]
  [  343.589185]  __arm64_sys_ioctl+0xd0/0x150
  [  343.589189]  invoke_syscall.constprop.0+0x84/0x100
  [  343.589191]  do_el0_svc+0x4c/0x100
  [  343.589192]  el0_svc+0x48/0x1c8
  [  343.589195]  el0t_64_sync_handler+0x148/0x158
  [  343.589197]  el0t_64_sync+0x1b0/0x1b8
  [  343.589199] ---[ end trace 0000000000000000 ]---

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2107816/+subscriptions

