Public bug reported:

SRU Justification

[Impact]

On some x86 systems, PCIe device BAR addresses can lie beyond the range that
KASLR reserves for the direct mapping. Attempts to map an affected BAR region
with devm_memremap_pages() then fail. These memmap-backed mappings are required
by several use cases, including P2PDMA and CUDA with Heterogeneous Memory
Management (HMM) enabled.

[Fix]

This is resolved upstream by commit 7ffb791423c7 ("x86/kaslr: Reduce KASLR
entropy on most x86 systems"), which stops KASLR from shrinking the direct
mapping space when CONFIG_PCI_P2PDMA is enabled. As a consequence, KASLR has
less room to maneuver, so the randomized layout has less entropy. In the
discussion of the upstream patch submission [1], it is noted that on the
submitter's system this reduces entropy from 16 bits to 15 bits, i.e. it halves
the number of possible layouts.

Cherry-picking this commit allows P2PDMA, and CUDA with HMM enabled, to function
on the systems described above: with the commit applied (on kernels built with
CONFIG_PCI_P2PDMA=y), the direct mapping space is no longer shrunk, so all BAR
regions fall within its bounds and devm_memremap_pages() succeeds.
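As a rough illustration of the sizing change, here is a simplified user-space model, not the kernel code. It assumes 4 KiB pages, a 64 TiB maximum direct-map region (4-level paging) and the 10 TiB CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING value; the function name direct_map_tb() is hypothetical (upstream, the logic lives in kernel_randomize_memory() in arch/x86/mm/kaslr.c). A pre-fix kernel behaves as if p2pdma_enabled were always False.

```python
PAGE_SHIFT = 12      # assumption: 4 KiB pages
TIB = 1 << 40        # bytes per TiB


def direct_map_tb(max_pfn: int, padding_tb: int, max_region_tb: int,
                  p2pdma_enabled: bool) -> int:
    """Hypothetical model of the direct-mapping region size, in TiB."""
    # Round physical memory up to whole TiB (integer ceiling), then pad.
    memory_tb = -(-(max_pfn << PAGE_SHIFT) // TIB) + padding_tb
    # With the fix, the region is only shrunk when P2PDMA is disabled.
    if not p2pdma_enabled and memory_tb < max_region_tb:
        return memory_tb
    return max_region_tb


if __name__ == "__main__":
    max_pfn = 1 << 26  # hypothetical: 256 GiB of RAM in 4 KiB pages
    print(direct_map_tb(max_pfn, 10, 64, p2pdma_enabled=False))  # 11
    print(direct_map_tb(max_pfn, 10, 64, p2pdma_enabled=True))   # 64
```

In this model, a BAR located at, say, 20 TiB falls outside the 11 TiB region when the region is shrunk, but inside it once the full 64 TiB region is kept.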

Jammy's 5.15 kernel is built with CONFIG_PCI_P2PDMA=n, so a cherry-pick alone
will not resolve the issue there; CONFIG_PCI_P2PDMA=y must also be set.

Jammy: Cherry-pick and CONFIG_PCI_P2PDMA=y needed
Noble: Cherry-pick needed
Plucky: Cherry-pick needed
Questing: Not affected; the fix commit is already in tree.
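To confirm whether a running kernel was built with CONFIG_PCI_P2PDMA, its installed config can be inspected. A minimal sketch, assuming the config is installed as /boot/config-<release> (as Ubuntu kernel packages normally do); config_value() is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Optional


def config_value(text: str, option: str) -> Optional[str]:
    """Return the value of a Kconfig option in .config-style text, or None if unset."""
    prefix = option + "="
    for line in text.splitlines():
        # "# CONFIG_FOO is not set" lines never match, so they yield None.
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


if __name__ == "__main__":
    config = Path(f"/boot/config-{os.uname().release}").read_text()
    print("CONFIG_PCI_P2PDMA:", config_value(config, "CONFIG_PCI_P2PDMA"))
```

On an unpatched Jammy 5.15 kernel this would be expected to report None; on Noble and Plucky, y.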

[Test Case]

The issue only occurs on systems with PCIe BAR addresses located outside the
range KASLR currently reserves for the direct mapping, [0, N TiB), where
N = ceil(max_pfn * PAGE_SIZE / 1 TiB) + CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING
(10 TiB).
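The condition above can be written out as a small check. This is a sketch under stated assumptions: 4 KiB pages, the 10 TiB padding, and a BAR end address taken as inclusive (the last byte of the BAR); it ignores the architectural cap on the region size. The helper names are hypothetical.

```python
PAGE_SHIFT = 12   # assumption: 4 KiB pages
TIB = 1 << 40     # bytes per TiB


def direct_map_limit(max_pfn: int, padding_tb: int = 10) -> int:
    """Exclusive upper bound, in bytes, of the shrunken direct-mapping range."""
    memory_tb = -(-(max_pfn << PAGE_SHIFT) // TIB) + padding_tb
    return memory_tb * TIB


def bar_is_affected(bar_end: int, max_pfn: int, padding_tb: int = 10) -> bool:
    """True if a BAR whose last byte is bar_end extends past the limit."""
    return bar_end >= direct_map_limit(max_pfn, padding_tb)


if __name__ == "__main__":
    max_pfn = 1 << 26  # hypothetical: 256 GiB of RAM
    print(hex(direct_map_limit(max_pfn)))            # 0xb0000000000 (11 TiB)
    print(bar_is_affected(0x3FEFFFFFFFFF, max_pfn))  # True
    print(bar_is_affected(0xFBFFFFFF, max_pfn))      # False
```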

With the NVIDIA Container Toolkit installed and enabled for Docker, the
following command reproduces the issue on affected systems, i.e. those where one
or more NVIDIA GPUs have BAR addresses outside that range:

$ sudo docker run --runtime nvidia --rm -it nvcr.io/nvidia/pytorch:25.03-py3
ERROR: The NVIDIA Driver is present, but CUDA failed to initialize.  GPU
functionality will not be available.
   [[ Initialization error (error 3) ]]
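To check whether a system is likely affected before running the container, the memory BARs of each PCI device can be read from sysfs and compared against the limit. A minimal sketch, assuming the three-column hex format (start, end, flags) of /sys/bus/pci/devices/*/resource and the kernel's IORESOURCE_MEM flag value of 0x200; the limit is passed in by hand (for example, computed from the range above), and reading the files may require root on some systems. The helper names are hypothetical.

```python
import sys
from pathlib import Path
from typing import List, Tuple

IORESOURCE_MEM = 0x00000200  # kernel resource flag marking memory-space BARs


def parse_resource(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) for each populated memory BAR in a sysfs 'resource' file."""
    bars = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, flags = (int(f, 16) for f in fields)
        # Unused entries are all zeros; skip them and any I/O-port BARs.
        if end and flags & IORESOURCE_MEM:
            bars.append((start, end))
    return bars


def find_high_bars(limit: int, root: str = "/sys/bus/pci/devices"):
    """Yield (device, start, end) for memory BARs that end at or above limit."""
    for dev in sorted(Path(root).iterdir()):
        try:
            text = (dev / "resource").read_text()
        except OSError:
            continue
        for start, end in parse_resource(text):
            if end >= limit:
                yield dev.name, start, end


if __name__ == "__main__":
    limit = int(sys.argv[1], 0)  # e.g. 0xb0000000000 for an 11 TiB limit
    for name, start, end in find_high_bars(limit):
        print(f"{name}: {start:#x}-{end:#x}")
```

Saved as a hypothetical find_high_bars.py, it would be run as "python3 find_high_bars.py 0xb0000000000"; any device it lists has a BAR outside the range.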

[Where things could go wrong]

This reduces the entropy of the memory layouts KASLR generates on most x86
systems. A bug would most likely show up as misbehavior in KASLR's
randomization of the kernel memory regions.

On Jammy, this changeset also enables CONFIG_PCI_P2PDMA, which could have
additional side effects. There is an LP bug [2] noting the change to
CONFIG_PCI_P2PDMA in newer kernels.

[Other Notes]

[1] https://lore.kernel.org/lkml/202502061145.8AFAF053E4@keescook/
[2] https://bugs.launchpad.net/bugs/1987394

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Invalid

** Affects: linux (Ubuntu Jammy)
     Importance: Medium
     Assignee: Jacob Martin (jacobmartin)
         Status: In Progress

** Affects: linux (Ubuntu Noble)
     Importance: Medium
     Assignee: Jacob Martin (jacobmartin)
         Status: In Progress

** Affects: linux (Ubuntu Plucky)
     Importance: Medium
     Assignee: Jacob Martin (jacobmartin)
         Status: In Progress

** Also affects: linux (Ubuntu Noble)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Plucky)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu)
       Status: New => Invalid

** Changed in: linux (Ubuntu Jammy)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Noble)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Plucky)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Jammy)
     Assignee: (unassigned) => Jacob Martin (jacobmartin)

** Changed in: linux (Ubuntu Noble)
     Assignee: (unassigned) => Jacob Martin (jacobmartin)

** Changed in: linux (Ubuntu Plucky)
     Assignee: (unassigned) => Jacob Martin (jacobmartin)

** Changed in: linux (Ubuntu Jammy)
       Status: New => In Progress

** Changed in: linux (Ubuntu Noble)
       Status: New => In Progress

** Changed in: linux (Ubuntu Plucky)
       Status: New => In Progress

** Changed in: linux (Ubuntu Jammy)
   Importance: High => Medium

** Changed in: linux (Ubuntu Noble)
   Importance: High => Medium

** Changed in: linux (Ubuntu Plucky)
   Importance: High => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2120209

Title:
  x86 systems with PCIe BAR addresses located outside a certain range
  see P2PDMA allocation failures and CUDA initialization errors

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  In Progress
Status in linux source package in Noble:
  In Progress
Status in linux source package in Plucky:
  In Progress


-- 
Mailing list: https://launchpad.net/~kernel-packages