** Changed in: linux-oem-6.11 (Ubuntu Noble)
       Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-oem-6.11 in Ubuntu.
https://bugs.launchpad.net/bugs/2086668

Title:
  NVIDIA WARN_ON call trace right after power on or resumed on 6.11
  kernel

Status in HWE Next:
  New
Status in linux-oem-6.11 package in Ubuntu:
  Invalid
Status in linux-oem-6.11 source package in Noble:
  Fix Committed

Bug description:
  [Impact]
  A follow_pte() warning is shown with the NVIDIA driver on the 6.11 kernel.

  Aug 09 09:20:42 ubuntu-202407-34200 kernel: WARNING: CPU: 0 PID: 2918
  at include/linux/rwsem.h:80 follow_pte+0x220/0x230

  [Fix]
  This occurs during suspend, when the NVIDIA driver function
  'nv_revoke_gpu_mappings_locked()' calls the kernel function
  'unmap_mapping_range()', which eventually ends up calling 'follow_pte()'.
  'follow_pte()' calls the assertion 'mmap_assert_locked' to check whether
  'mmap_lock' has been taken.

  This assertion fails, and a warning call trace is printed (there is no
  functional issue, only extra output in dmesg). This happens in kernel
  versions v6.10 through v6.11.
  This is a kernel bug, not an NVIDIA driver bug, and has been discussed on
  the kernel mailing list:
  https://lore.kernel.org/linux-mm/20240712080414.ga47...@google.com/T/#u

  There is a patch series that addresses this issue and replaces
  follow_pte():
  https://lore.kernel.org/linux-mm/20240809160909.1023470-1-pet...@redhat.com/
  We cherry-pick the new functions while preserving follow_pte() for
  compatibility with old drivers.

  b1b46751671b mm: fix follow_pfnmap API lockdep assert
  75182022a043 mm/x86: support large pfn mappings
  cbea8536d933 mm/x86/pat: use the new follow_pfnmap API
  6da8e9634bb7 mm: new follow_pfnmap API
  6857be5fecae mm: introduce ARCH_SUPPORTS_HUGE_PFNMAP and special bits to pmd/pud

  [Test]
  1. Boot up the machine with 6.11 kernel + nvidia driver
  2. Do suspend/resume and check dmesg
  3. There should be no nvidia call trace
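
  The dmesg check in step 2 could be scripted roughly as below. This is
  only a sketch: the function names and the regular expression are
  assumptions modeled on the single warning line quoted in [Impact], and
  other kernel builds may format the line differently.

```python
import re
import subprocess

# Assumed pattern, modeled on the warning quoted in this report:
#   WARNING: CPU: 0 PID: 2918 at include/linux/rwsem.h:80 follow_pte+0x220/0x230
WARNING_RE = re.compile(r"WARNING: .*\bfollow_pte\+0x[0-9a-f]+/0x[0-9a-f]+")


def find_follow_pte_warnings(log_text):
    """Return the log lines that contain a follow_pte() WARNING."""
    return [line for line in log_text.splitlines() if WARNING_RE.search(line)]


def read_kernel_log():
    """Read the kernel ring buffer with dmesg (may need root on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

  After a suspend/resume cycle, find_follow_pte_warnings(read_kernel_log())
  is expected to return an empty list on a fixed kernel.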

  [Where problems could occur]
  Only this patch changes existing code, switching it to follow_pfnmap()
  in place of follow_pfn():
  cbea8536d933 ("mm/x86/pat: use the new follow_pfnmap API")
  The changes map one-to-one and should behave identically.

To manage notifications about this bug go to:
https://bugs.launchpad.net/hwe-next/+bug/2086668/+subscriptions

