This bug is awaiting verification that the linux-nvidia-tegra/6.8.0-1009.9
kernel in -proposed solves the problem. Please test the kernel and update
this bug with the results. If the problem is solved, change the tag
'verification-needed-noble-linux-nvidia-tegra' to
'verification-done-noble-linux-nvidia-tegra'. If the problem still exists,
change the tag 'verification-needed-noble-linux-nvidia-tegra' to
'verification-failed-noble-linux-nvidia-tegra'.


If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-noble-linux-nvidia-tegra-v2 
verification-needed-noble-linux-nvidia-tegra

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2115209

Title:
  NVMe namespace ID mismatch on repeated map/unmap

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Noble:
  Fix Committed
Status in linux source package in Oracular:
  Won't Fix
Status in linux source package in Plucky:
  Fix Released
Status in linux source package in Questing:
  Fix Released

Bug description:
  [Impact]
  During repeated namespace (NS) map/unmap operations in ONTAP, which
  trigger Namespace Attribute Changed AENs, new namespaces may get mapped
  reusing an old NSID. In this case the Ubuntu 24.04 NVMe/TCP host
  occasionally ends up with an inconsistent device state: the NVMe block
  device (e.g. /dev/nvmeXnY) is present, but the corresponding NVMe generic
  character device (e.g. /dev/ngXnY) is missing. The issue is not seen when
  the same namespace is remapped to the same NSID; it only occurs when a new
  namespace is mapped to an NSID that was previously used by a different
  namespace.

  The following error entries are seen in the messages file during this
  device inconsistency scenario:

  ...
  kernel: [267011.744167][ T2016] nvme nvme6: rescanning namespaces.
  kernel: [267011.744347][T46805] nvme nvme2: rescanning namespaces.
  kernel: [267011.750418][ T7876] nvme nvme1: rescanning namespaces.
  kernel: [267011.784466][ T2016] nvme nvme6: IDs don't match for shared namespace 1
  kernel: [267011.784791][T46805] nvme nvme2: IDs don't match for shared namespace 1
  kernel: [267011.790843][ T7876] nvme nvme1: IDs don't match for shared namespace 1
  kernel: [267011.804852][ T2016] nvme nvme6: IDs don't match for shared namespace 2
  kernel: [267011.804867][T46805] nvme nvme2: IDs don't match for shared namespace 2
  kernel: [267011.810788][ T7876] nvme nvme1: IDs don't match for shared namespace 2
  kernel: [267011.824600][ T2016] nvme nvme6: IDs don't match for shared namespace 3
  kernel: [267011.825114][T46805] nvme nvme2: IDs don't match for shared namespace 3
  kernel: [267011.830982][ T7876] nvme nvme1: IDs don't match for shared namespace 3
  kernel: [267011.844712][ T2016] nvme nvme6: duplicate IDs in subsystem for nsid 4
  kernel: [267011.845161][T46805] nvme nvme2: duplicate IDs in subsystem for nsid 4
  kernel: [267011.851060][ T7876] nvme nvme1: duplicate IDs in subsystem for nsid 4
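To check whether a host has hit this condition, the kernel log can be filtered for the two error messages quoted above. The following is a minimal POSIX shell sketch, not part of any Ubuntu or nvme-cli tooling; the function name `nvme_id_errors` is hypothetical, and it assumes the messages appear verbatim as in this report. It reads log text on stdin, so it can be fed from `dmesg` (which may require root), `journalctl -k`, or a saved messages file.

```shell
# Hypothetical helper: print only the kernel log lines that match the NVMe
# namespace ID errors quoted in this bug report. Reads log text on stdin.
nvme_id_errors() {
    # grep exits 1 when nothing matches; "|| true" keeps an empty result
    # from being treated as a failure by callers running under "set -e".
    grep -E "IDs don't match for shared namespace|duplicate IDs in subsystem for nsid" || true
}

# Example usage on a running system:
#   journalctl -k | nvme_id_errors
#   journalctl -k | nvme_id_errors | wc -l
```

An empty result only means the messages were not logged in the captured window; the device-node check is still needed to confirm a consistent state.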

  [Fix]
  The following upstream commits are required:

    9546ad1a9bda nvme: requeue namespace scan on missed AENs
    62baf70c3274 nvme: re-read ANA log page after ns scan completes
    26d7fb4fd4ca nvme: fixup scan failure for non-ANA multipath controllers

  $ git describe --contains 9546ad1a9bda 62baf70c3274 26d7fb4fd4ca
  v6.15-rc2~11^2~1^2~11
  v6.15-rc2~11^2~1^2~10
  v6.15-rc3~27^2^2~5

  These are already included in the Plucky tree, and the Questing kernel
  appears to be based on v6.15 already, so only Noble needs the
  cherry-picks.

  [Test Case]
  The ns-stress.sh script should be able to reproduce this. It repeatedly
  creates and deletes NVMe namespaces mapped to the same NSID. An example
  run on an affected system looks like this:

  # ./ns-stress.sh /dev/nvme2
  Starting test with parameters:
  Controller: /dev/nvme2
  NSID: 1
  Iterations: 100
  Size1: 0x200000
  Size2: 0x400000
  Iteration 1/100
  create-ns: Success, created nsid:1
  attach-ns: Success, nsid:1
  ❌ Char device missing after first attach
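The failing check in the run above ("Char device missing after first attach") can be approximated by comparing block device nodes against generic character device nodes. The ns-stress.sh script itself is not reproduced here; the following is a hypothetical POSIX shell sketch that assumes the /dev/nvmeXnY and /dev/ngXnY naming described under [Impact] and prints every namespace block device that has no matching ngXnY node. The function name `missing_ng_devices` and its optional directory argument are illustrative only; the argument exists so the function can be tried against a scratch directory.

```shell
# Hypothetical helper: list NVMe namespace block devices (nvmeXnY) that have
# no matching NVMe generic character device (ngXnY) in the same directory.
# $1: directory to scan (defaults to /dev).
missing_ng_devices() {
    dev_dir=${1:-/dev}
    for blk in "$dev_dir"/nvme*n*; do
        [ -e "$blk" ] || continue      # the glob matched nothing
        name=${blk##*/}                # e.g. nvme2n1
        rest=${name#nvme}              # e.g. 2n1
        # Keep only names of the exact form <digits>n<digits>; this skips
        # partitions (nvme2n1p1) and anything else the glob picked up.
        case $rest in
            *[!0-9n]* | *n*n* | n* | *n) continue ;;
        esac
        # Report the block device if its ngXnY counterpart is absent.
        [ -e "$dev_dir/ng$rest" ] || echo "$name"
    done
}
```

For example, calling `missing_ng_devices` after each attach step and treating any output as a failure mirrors the check reported above. Since the /dev nodes are created by udev, running `udevadm settle` before the check helps avoid false positives from nodes that simply have not appeared yet.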

  [Where Problems Could Occur]
  The fix requeues controller scans if there are any pending or missed AENs.
  This can introduce delays when managing NVMe namespaces, so we should look
  out for any delays or hangs in such operations.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2115209/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
