Package: wnpp
Severity: wishlist

* Package name    : nvidia-mofed
  Version         : 24.04-0.6.6.
  Upstream Contact: I don't know
* URL             : https://content.mellanox.com/ofed/MLNX_OFED-24.04-0.6.6.0/MLNX_OFED_SRC-debian-24.04-0.6.6.0.tgz
                    https://content.mellanox.com/ofed/MLNX_OFED-24.04-0.6.6.0/MLNX_OFED_LINUX-24.04-0.6.6.0-debian12.1-x86_64.tgz
* License         : I don't know
  Programming Lang: I don't know
  Description     : NVidia MLNX OFED software for Infiniband
Debian 12.5 contains packages for Infiniband based on OFED. However, these packages don't fully support the most recent ConnectX-7 hardware.

First, opensm does not support NDR. Infiniband needs a subnet manager, and if you use an unmanaged switch, you need to run opensm yourself.

Second, the nvidia-peermem kernel module does not load. You need this kernel module to do RDMA directly to GPUs. Without it, NVidia NCCL bandwidth is about one tenth of what is possible with it. In Debian 12.5, nvidia-peermem is provided by nvidia-kernel-dkms. However, for it to load, it must be compiled against the Mellanox ib_peer_mem symbols. Apparently, MOFED handles all of this.

NVidia provides MOFED (aka MLNX OFED) packages for Debian 12.1:

  https://content.mellanox.com/ofed/MLNX_OFED-24.04-0.6.6.0/MLNX_OFED_SRC-debian-24.04-0.6.6.0.tgz
  https://content.mellanox.com/ofed/MLNX_OFED-24.04-0.6.6.0/MLNX_OFED_LINUX-24.04-0.6.6.0-debian12.1-x86_64.tgz

But installing them removes many standard Debian 12.5 packages, along with many of their dependencies. Among them are libboost-all-dev, libopenmpi-dev, libopencv-dev, python3-opencv, python3-torch and ros-desktop-full-dev. People (like myself) who use GPUs and Infiniband are likely to also need Open MPI, OpenCV, PyTorch and ROS.

It would be great if Debian could officially package MOFED in non-free in a way that is compatible with the Debian ecosystem. Then one could just use apt for package management.

In the longer term (fall 2024), NVidia plans to replace MOFED with DOCA. It would be great if Debian could package DOCA as well. This should not be difficult, since MOFED works seamlessly with Ubuntu 24.04 LTS.

To get a subnet manager, I had to set up a whole other machine running Ubuntu 24.04. It would be much more convenient if I could just continue to be a pure Debian user.

I am not a Debian dev, and I am not familiar with Debian dev/maintenance practices.
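Here is a minimal sketch for checking whether nvidia-peermem is actually loaded, as discussed above. It is not part of MOFED or any Debian tooling, and the function names are hypothetical. It reads /proc/modules, the same table lsmod formats. It relies on the kernel listing hyphenated module names with underscores, so nvidia-peermem appears as nvidia_peermem. The script only reports load state and does not diagnose why a module failed to load.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Return the module names found in /proc/modules-style text.

    Each line of /proc/modules starts with the module name, followed by
    size, reference count, dependencies, state and load address.
    """
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def module_loaded(name: str, proc_modules_text: str) -> bool:
    # The kernel reports hyphens in module names as underscores,
    # so "nvidia-peermem" is listed as "nvidia_peermem".
    return name.replace("-", "_") in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    proc = Path("/proc/modules")
    if proc.exists():  # only meaningful on a Linux host
        text = proc.read_text()
        for mod in ("nvidia-peermem", "ib_core"):
            status = "loaded" if module_loaded(mod, text) else "NOT loaded"
            print(f"{mod}: {status}")
```

Passing the text in as a string, rather than reading the file inside the helpers, keeps the parsing testable on machines without Infiniband or NVidia hardware.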