Public bug reported:

Support for ConnectX hardware through doca-ofed-26.01 DKMS in Ubuntu Resolute
[ SRU Justification ]

This is an effort to repackage the DKMS portion of the upstream doca-ofed package, which NVIDIA distributes through a public HTTP source (e.g. https://linux.mellanox.com/public/repo/doca/). NVIDIA distributes it as a set of deb packages in a tarball available at https://developer.nvidia.com/doca-downloads; this package corresponds to version 3.3.0 of the upstream NVIDIA package.

Current Ubuntu packaging lacks support for the NVIDIA Bluefield/IGX/DGX platforms. This package is needed so that customers using NVIDIA hardware can get the custom, high-performance drivers directly through the Canonical apt system, without having to download an external tarball and install it manually on the target machines. It also helps automated installation flows: specifying this package as an apt dependency removes the need to find the HTTP address of the latest doca-ofed version distributed by NVIDIA and to script its manual installation.

This package adds a set of kernel modules for the NVIDIA mlx5 hardware family (e.g. IB-capable ConnectX devices).

This package is related to the https://launchpad.net/ubuntu/+source/mofed-modules-24.10/ package, which was accepted last year; this is the newer LTS version distributed by NVIDIA. No modifications have been applied in the repackaging, apart from a name that is more coherent with the NVIDIA naming scheme. Another difference from the 24.10 version is a smaller list of .ko files built on the target machine, matching what invoking the doca-ofed target on the original package installs.

[ Current Test Plan ]

 * Hardware tested on:
   * Noble GA kernel 7.0.0.7
     - Equivalent to Kernel 7.0rc4
     - Tested on DGX B200 (amd64) and IGX GH200 (arm64)

 * Before an updated version of the repackaged kernel modules is uploaded to Ubuntu, a local test against the affected kernels must be executed. At the moment, the main target kernel is noble:linux-nvidia-tegra, so the tests will be executed thoroughly on the specific hardware running that kernel. If the modules load correctly, the package is considered sane and can be uploaded to the wider public. The compilation test is also executed on the latest version of Resolute.

 * Another test suite, run on each new spin of the modules, is a smoke test running on DGX B200 (amd64) and GH200 (arm64) machines; those machines contain the hardware this set of kernel modules targets. The suite checks that the produced modules can run IB interactions, that all the built modules load fully, and that kernel-space modules and user-space applications interact correctly. A full test run can be found on the ARGOS-2019 Jira ticket. The test plan can be found at https://github.com/canonical/mofed-userspace-integration/tree/main/testcases
   Test machines are freely available in Testflinger.

 * The produced modules have been tested by an NVIDIA team, which has given us the green light for the release.

[ Where problems could occur ]

 * As this set of drivers replaces some in-tree modules with NVIDIA-modified ones, Ubuntu users installing this package may see their non-NVIDIA Bluefield InfiniBand devices stop working or regress in performance.

[ Other Info ]

 * The repackaging code is contained in the repository at the following address: https://kernel.ubuntu.com/forgejo/alessiofaina/nvidia-doca-ofed-dkms

 * Examples of the repackaged DKMS can be found at the following personal PPA: https://launchpad.net/~alessiofaina/+archive/ubuntu/mofed-autoupload-test/+packages

 * All the modules are GPL-2 or Dual BSD/GPL licensed; no closed source binaries are redistributed.
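
The "modules load correctly" check from the test plan could be automated roughly as follows. This is only a minimal sketch and not the actual test suite: the module list (mlx5_core, mlx5_ib, ib_core) is an assumed example set, and the helper names loaded_modules and missing_modules are hypothetical. It parses text in the /proc/modules format and reports which expected modules are absent.

```python
# Assumed example set; the real list is whatever the DKMS package builds.
EXPECTED_MODULES = ("mlx5_core", "mlx5_ib", "ib_core")


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text.

    Each non-empty line starts with the module name, followed by
    whitespace-separated fields (size, use count, dependants, state, ...).
    """
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def missing_modules(proc_modules_text, expected=EXPECTED_MODULES):
    """Return the expected modules that are not loaded, in the given order."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in expected if name not in loaded]
```

On a target machine the text would come from reading /proc/modules, for example missing_modules(open("/proc/modules").read()); an empty result would mean every module in the assumed list is loaded.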
** Affects: linux (Ubuntu)
     Importance: Critical
     Assignee: Alessio Faina (alessiofaina)
         Status: In Progress

** Affects: linux (Ubuntu Resolute)
     Importance: Critical
     Assignee: Alessio Faina (alessiofaina)
         Status: In Progress

** Also affects: linux (Ubuntu Resolute)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Resolute)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Resolute)
   Importance: Medium => Critical

** Changed in: linux (Ubuntu Resolute)
       Status: New => In Progress

** Changed in: linux (Ubuntu Resolute)
     Assignee: (unassigned) => Alessio Faina (alessiofaina)

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2144874

Title:
  [NEEDS PACKAGING] [mlnx5 hardware not supported] NVIDIA doca-ofed-26.01-dkms DKMS suite
