> How can we/I compile the nvidia-peermem driver with Mellanox ib_peer_mem symbols?
Probably not a problem of the nvidia-peermem module but of the kernel (or a third-party module) that needs to provide these symbols.

First, I upgraded to Debian 12.6. So I am running nvidia-driver 535.183.01. I then installed doca-ofed.

https://developer.nvidia.com/doca-downloads?deployment_platform=Host-Server&deployment_package=DOCA-Host&target_os=Linux&Architecture=x86_64&Profile=doca-ofed&Distribution=Debian&version=12.1&installer_type=deb_online

(I'll spare you the details. It did not install out of the box. If you wish, I can tell you exactly what I did to install it.)

doca-ofed (aka MLNX_OFED) provides the necessary symbols. Then I did this:

    dkms --force build -m nvidia-current -v 535.183.01
    dkms --force install -m nvidia-current -v 535.183.01

I did this to attempt to get nvidia-peermem to work. Indeed, now I get a different error message:

    root@vuku:~# modprobe nvidia-peermem
    modprobe: ERROR: could not insert 'nvidia_current_peermem': Unknown symbol in module, or unknown parameter (see dmesg)
    modprobe: ERROR: ../libkmod/libkmod-module.c:1047 command_do() Error running install command 'modprobe nvidia ; modprobe -i nvidia-current-peermem ' for module nvidia_peermem: retcode 1
    modprobe: ERROR: could not insert 'nvidia_peermem': Invalid argument
    root@vuku:~# dmesg|tail
    [161810.093263] Compat-mlnx-ofed backport release: 91fb8cd
    [161810.093272] Backport based on https://:@git-nbu.nvidia.com/r/a/mlnx_ofed/mlnx-ofa_kernel-4.0.git 91fb8cd
    [161810.093274] compat.git: https://:@git-nbu.nvidia.com/r/a/mlnx_ofed/mlnx-ofa_kernel-4.0.git
    [161810.342539] nvidia_peermem: Unknown symbol ib_register_peer_memory_client (err -2)
    [161810.342554] nvidia_peermem: Unknown symbol ib_unregister_peer_memory_client (err -2)
    root@vuku:~#

I think I am getting close to getting nvidia-peermem to work.
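One further check that may be relevant to the dmesg output above, sketched here under assumptions: "err -2" is -ENOENT, meaning the loader did not find the symbol among those provided by the running kernel and its loaded modules. The helper below (the name `find_symbol` is hypothetical, not part of any tool) looks a symbol up in `/proc/kallsyms`, or in any file in the same "address type name [module]" format. If the symbol is absent, no loaded module defines it at all. If it is present, that alone does not prove it is exported.

```shell
#!/bin/sh
# find_symbol SYMBOL [FILE]
# Print every line of FILE (default: /proc/kallsyms) whose third field,
# the symbol name, is exactly SYMBOL. Exit status is 0 if at least one
# line matched and 1 otherwise.
find_symbol() {
    awk -v sym="$1" '
        $3 == sym { print; found = 1 }
        END       { exit !found }
    ' "${2:-/proc/kallsyms}"
}

# Example use (not run here):
#   find_symbol ib_register_peer_memory_client \
#       || echo "not defined by the running kernel or any loaded module"
```

It may also be worth comparing the path printed by `modinfo -n ib_core` with the DKMS-built ib_core, in case the ib_core that is loaded is the in-tree one rather than the one from doca-ofed.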
/usr/src/nvidia-current-535.183.01/nvidia-peermem/nvidia-peermem.Kbuild has

    OFA_DIR := /usr/src/ofa_kernel
    OFA_CANDIDATES = $(OFA_DIR)/$(OFA_ARCH)/$(KERNELRELEASE) $(OFA_DIR)/$(KERNELRELEASE) $(OFA_DIR)/default /var/lib/dkms/mlnx-ofed-kernel

And three of the four candidate directories exist:

    qobi@vuku>ls /usr/src/ofa_kernel/x86_64/6.1.0-22-amd64/
    compat/           compat_base_tree_version  configure@           Module.symvers
    compat_base       compat.config             configure.mk.kernel  ofed_scripts/
    compat_base_tree  compat_version            include/
    qobi@vuku>ls /usr/src/ofa_kernel/6.1.0-22-amd64/
    ls: cannot access '/usr/src/ofa_kernel/6.1.0-22-amd64/': No such file or directory
    qobi@vuku>ls /usr/src/ofa_kernel/default
    /usr/src/ofa_kernel/default@
    qobi@vuku>ls /usr/src/ofa_kernel/default/
    compat/           compat_base_tree_version  configure@           Module.symvers
    compat_base       compat.config             configure.mk.kernel  ofed_scripts/
    compat_base_tree  compat_version            include/
    qobi@vuku>ls /var/lib/dkms/mlnx-ofed-kernel
    24.04.OFED.24.04.0.7.0.1/  kernel-6.1.0-22-amd64-x86_64@
    qobi@vuku>

There appear to be two different Module.symvers, but they appear to be identical:

    qobi@vuku>ls -l /usr/src/ofa_kernel/x86_64/6.1.0-22-amd64/Module.symvers
    -rw-r--r-- 1 root root 92655 Jul 3 15:49 /usr/src/ofa_kernel/x86_64/6.1.0-22-amd64/Module.symvers
    qobi@vuku>ls -l /usr/src/ofa_kernel/default/Module.symvers
    -rw-r--r-- 1 root root 92655 Jul 3 15:49 /usr/src/ofa_kernel/default/Module.symvers
    qobi@vuku>diff /usr/src/ofa_kernel/x86_64/6.1.0-22-amd64/Module.symvers /usr/src/ofa_kernel/default/Module.symvers
    qobi@vuku>

And they appear to have the requisite symbols:

    qobi@vuku>fgrep ib_register_peer_memory_client /usr/src/ofa_kernel/x86_64/6.1.0-22-amd64/Module.symvers
    0xaba78e45 ib_register_peer_memory_client /var/lib/dkms/mlnx-ofed-kernel/24.04.OFED.24.04.0.7.0.1/build/drivers/infiniband/core/ib_core EXPORT_SYMBOL
    qobi@vuku>fgrep ib_unregister_peer_memory_client /usr/src/ofa_kernel/x86_64/6.1.0-22-amd64/Module.symvers
    0xbde5c050 ib_unregister_peer_memory_client /var/lib/dkms/mlnx-ofed-kernel/24.04.OFED.24.04.0.7.0.1/build/drivers/infiniband/core/ib_core EXPORT_SYMBOL
    qobi@vuku>

And the module appears to contain those symbols:

    qobi@vuku>fgrep ib_register_peer_memory_client /usr/lib/modules/6.1.0-22-amd64/updates/dkms/nvidia-current-peermem.ko
    grep: /usr/lib/modules/6.1.0-22-amd64/updates/dkms/nvidia-current-peermem.ko: binary file matches
    qobi@vuku>strings /usr/lib/modules/6.1.0-22-amd64/updates/dkms/nvidia-current-peermem.ko|fgrep ib_register_peer_memory_client
    ib_register_peer_memory_client
    ib_register_peer_memory_client
    qobi@vuku>

So I don't know why the module doesn't load. Any ideas?

Thanks,
Jeff (http://engineering.purdue.edu/~qobi)