A fix for Jammy and Noble has been submitted to the kernel team mailing list:
https://lists.ubuntu.com/archives/kernel-team/2025-July/161181.html
** Description changed:
+ SRU Justification:
+
+ [Impact]
+
+ On systems with ConnectX devices using the mlx5_ib driver, the dereg_mr
+ InfiniBand operation produces a kernel WARNING message when
+ deregistering device memory. The WARNING appears only when an IOMMU is
+ in use and not in passthrough/identity mode, but in this case the
+ mlx5_ib driver is still behaving incorrectly.
+
+ [ 343.588824] ------------[ cut here ]------------
+ [ 343.588829] WARNING: CPU: 68 PID: 4076 at drivers/iommu/dma-iommu.c:1198 iommu_dma_unmap_page+0x12c/0x190
+ ...
+ [ 343.589101] Call trace:
+ [ 343.589102] iommu_dma_unmap_page+0x12c/0x190
+ [ 343.589104] dma_unmap_page_attrs+0x1f8/0x290
+ [ 343.589107] mlx5_free_priv_descs+0x94/0xe0 [mlx5_ib]
+ [ 343.589121] mlx5_ib_dereg_mr+0x330/0x4f8 [mlx5_ib]
+ [ 343.589131] ib_dereg_mr_user+0x54/0x178 [ib_core]
+ [ 343.589148] uverbs_free_mr+0x24/0x50 [ib_uverbs]
+ [ 343.589155] destroy_hw_idr_uobject+0x38/0x98 [ib_uverbs]
+ [ 343.589160] uverbs_destroy_uobject+0x4c/0x230 [ib_uverbs]
+ [ 343.589165] uobj_destroy+0x60/0xe8 [ib_uverbs]
+ [ 343.589170] ib_uverbs_run_method+0x194/0x310 [ib_uverbs]
+ [ 343.589175] ib_uverbs_cmd_verbs+0x1ac/0x288 [ib_uverbs]
+ [ 343.589180] ib_uverbs_ioctl+0xb0/0x150 [ib_uverbs]
+ [ 343.589185] __arm64_sys_ioctl+0xd0/0x150
+ [ 343.589189] invoke_syscall.constprop.0+0x84/0x100
+ [ 343.589191] do_el0_svc+0x4c/0x100
+ [ 343.589192] el0_svc+0x48/0x1c8
+ [ 343.589195] el0t_64_sync_handler+0x148/0x158
+ [ 343.589197] el0t_64_sync+0x1b0/0x1b8
+ [ 343.589199] ---[ end trace 0000000000000000 ]---
+
+ Oracular obtained the fix via stable updates, and 6.14 kernels and newer
+ already have this fix.
+
+ Jammy and Noble are still affected.
+
+ [Fix]
+
+ This is resolved by backporting abc7b3f1f056 ("RDMA/mlx5: Fix a WARN
+ during dereg_mr for DM type") from upstream. The patch submitted with
+ this cover letter was originally submitted to noble:linux-nvidia, but
+ benefits jammy:linux and noble:linux as well.
+
+ [Test Plan]
+
+ For systems with ConnectX devices configured for InfiniBand, this can be
+ reproduced with:
+
+ $ ibv_rc_pingpong -g 0 -j &
+ $ ibv_rc_pingpong -g 0 -j 127.0.0.1
+ Finally, check dmesg for a WARNING message.
+
+ [Where issues could arise]
+
+ These changes affect the mlx5_ib driver. Regressions would likely appear
+ as misbehavior of this driver, particularly where it handles releasing
+ RDMA/IB memory regions.
+
+ ----------- above SRU justification added by ~jacobmartin -----------
+
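The [Impact] section above ties the WARNING to an IOMMU that is in use and not in passthrough/identity mode. A minimal sketch for checking that precondition on a test machine follows. It assumes the kernel exposes the group's default domain type through the sysfs file iommu_group/type under the PCI device directory, and it assumes the values "DMA" and "DMA-FQ" correspond to the translated (non-passthrough) case described in the report. The function names and the example PCI address are hypothetical.

```python
from pathlib import Path


def iommu_domain_type(pci_addr, sysfs_root="/sys"):
    """Return the default IOMMU domain type string for a PCI device.

    Returns None when the device has no IOMMU group (no IOMMU in use)
    or when the 'type' attribute is not available on this kernel.
    """
    type_file = (
        Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr
        / "iommu_group" / "type"
    )
    try:
        return type_file.read_text().strip()
    except FileNotFoundError:
        return None


def uses_translated_dma(domain_type):
    """True when DMA goes through IOMMU translation.

    Assumption: "DMA" and "DMA-FQ" are the translated modes, while
    "identity" is the passthrough mode mentioned in the report.
    """
    return domain_type in ("DMA", "DMA-FQ")
```

For example, uses_translated_dma(iommu_domain_type("0000:01:00.0")) would indicate whether the device at that (hypothetical) address is in a configuration where the WARNING could appear.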
Running ibv_rc_pingpong like this:
ibv_rc_pingpong -g 0 -j
with kernel 6.8.0-1025-nvidia-64k
produces this warning in dmesg:
[ 343.588824] ------------[ cut here ]------------
[ 343.588829] WARNING: CPU: 68 PID: 4076 at drivers/iommu/dma-iommu.c:1198 iommu_dma_unmap_page+0x12c/0x190
[ 343.588837] Modules linked in: rpcrdma rdma_ucm ib_iser libiscsi
scsi_transport_iscsi rdma_cm ib_ipoib iw_cm ib_cm xt_conntrack xt_MASQUERADE
bridge stp llc xt_set ip_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 xt_addrtype nft_compat nf_tables xfrm_user xfrm_algo qrtr
overlay cfg80211 sunrpc binfmt_misc nls_iso8859_1 dax_hmem nvidia_cspmu
cxl_acpi ast ses cxl_core i2c_algo_bit arm_smmuv3_pmu ipmi_ssif enclosure
arm_cspmu_module coresight_trbe arm_spe_pmu acpi_power_meter cppc_cpufreq
acpi_ipmi ipmi_devintf spi_nor nvidia_uvm(OE) coresight_tmc coresight_funnel
coresight_stm ipmi_msghandler stm_p_basic coresight stm_core nvidia_drm(OE)
uio_pdrv_genirq uio nvidia_modeset(OE) video ib_umad nvidia_fs(O) nvidia(OE)
ecc dm_multipath nvme_fabrics nvme_keyring efi_pstore nfnetlink dmi_sysfs
ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon
raid6_pq libcrc32c raid1 raid0 mlx5_ib ib_uverbs macsec ib_core crct10dif_ce
[ 343.588907] polyval_ce mlx5_core polyval_generic ghash_ce sm4_ce_gcm
sm4_ce_ccm sm4_ce sm4_ce_cipher sm4 sm3_ce sm3 mlxfw i2c_smbus mpt3sas nvme
sha3_ce psample sha2_ce nvme_core raid_class tls xhci_pci sha256_arm64 sha1_ce
xhci_pci_renesas scsi_transport_sas nvme_auth pci_hyperv_intf i2c_tegra
aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
[ 343.588929] CPU: 68 PID: 4076 Comm: ibv_rc_pingpong Tainted: G W OE 6.8.0-1025-nvidia-64k #28-Ubuntu
[ 343.588931] Hardware name: Quanta Cloud Technology Inc. QuantaGrid S74G-2U 1S7GZ9Z0002/S7G MB (CG1), BIOS 3A21 07/10/2024
[ 343.588932] pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 343.588933] pc : iommu_dma_unmap_page+0x12c/0x190
[ 343.588935] lr : iommu_dma_unmap_page+0x44/0x190
[ 343.588936] sp : ffff8000bd04f840
[ 343.588936] x29: ffff8000bd04f840 x28: ffff8000bd04fba0 x27: 0000000000000001
[ 343.588939] x26: 0000000000000000 x25: 0000000000000010 x24: 0000000000000001
[ 343.588941] x23: 0000000000000000 x22: 0000000000000000 x21: ffff000116904000
[ 343.588942] x20: 0000000000000000 x19: ffff00008e3530c8 x18: ffff8000bd980088
[ 343.588944] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffef1a0390
[ 343.588946] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 343.588949] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800080df8d90
[ 343.588952] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[ 343.588954] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 343.588959] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[ 343.589101] Call trace:
[ 343.589102] iommu_dma_unmap_page+0x12c/0x190
[ 343.589104] dma_unmap_page_attrs+0x1f8/0x290
[ 343.589107] mlx5_free_priv_descs+0x94/0xe0 [mlx5_ib]
[ 343.589121] mlx5_ib_dereg_mr+0x330/0x4f8 [mlx5_ib]
[ 343.589131] ib_dereg_mr_user+0x54/0x178 [ib_core]
[ 343.589148] uverbs_free_mr+0x24/0x50 [ib_uverbs]
[ 343.589155] destroy_hw_idr_uobject+0x38/0x98 [ib_uverbs]
[ 343.589160] uverbs_destroy_uobject+0x4c/0x230 [ib_uverbs]
[ 343.589165] uobj_destroy+0x60/0xe8 [ib_uverbs]
[ 343.589170] ib_uverbs_run_method+0x194/0x310 [ib_uverbs]
[ 343.589175] ib_uverbs_cmd_verbs+0x1ac/0x288 [ib_uverbs]
[ 343.589180] ib_uverbs_ioctl+0xb0/0x150 [ib_uverbs]
[ 343.589185] __arm64_sys_ioctl+0xd0/0x150
[ 343.589189] invoke_syscall.constprop.0+0x84/0x100
[ 343.589191] do_el0_svc+0x4c/0x100
[ 343.589192] el0_svc+0x48/0x1c8
[ 343.589195] el0t_64_sync_handler+0x148/0x158
[ 343.589197] el0t_64_sync+0x1b0/0x1b8
[ 343.589199] ---[ end trace 0000000000000000 ]---
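The last step of the test plan (checking dmesg for a WARNING message) can be automated. The following is a minimal sketch, not part of the submitted patch: it scans kernel log text for a WARNING block whose call trace contains mlx5_ib_dereg_mr, as in the trace above. The function name find_dereg_mr_warnings is hypothetical, and the pattern assumes the WARNING header format shown in this report.

```python
import re

# Matches the WARNING header printed by the kernel, for example:
# "[ 343.588829] WARNING: CPU: 68 PID: 4076 at drivers/iommu/dma-iommu.c:1198"
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+):(\d+)")


def find_dereg_mr_warnings(dmesg_text):
    """Return (source_file, line_number) for each WARNING block whose
    call trace mentions mlx5_ib_dereg_mr."""
    hits = []
    current = None   # (file, line) of the WARNING block being scanned
    matched = False  # whether the current block was already recorded
    for line in dmesg_text.splitlines():
        header = WARNING_RE.search(line)
        if header:
            current = (header.group(1), int(header.group(2)))
            matched = False
            continue
        if current is None:
            continue
        if "mlx5_ib_dereg_mr" in line and not matched:
            hits.append(current)
            matched = True
        if "---[ end trace" in line:
            current = None  # end of this WARNING block
    return hits
```

Feeding the function the output of dmesg after running the two ibv_rc_pingpong commands should yield an empty list on a fixed kernel, and an entry pointing at drivers/iommu/dma-iommu.c on an affected one.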
** Also affects: linux (Ubuntu Jammy)
Importance: Undecided
Status: New
** Also affects: linux-nvidia (Ubuntu Jammy)
Importance: Undecided
Status: New
** No longer affects: linux-nvidia (Ubuntu Jammy)
** Changed in: linux (Ubuntu Jammy)
Status: New => In Progress
** Changed in: linux (Ubuntu Jammy)
Importance: Undecided => Low
** Changed in: linux (Ubuntu Jammy)
Assignee: (unassigned) => Jacob Martin (jacobmartin)
https://bugs.launchpad.net/bugs/2107816
Title:
warning at iommu_dma_unmap_page when running ibv_rc_pingpong