** Changed in: linux (Ubuntu Noble)
Status: In Progress => Fix Committed
** Changed in: linux (Ubuntu Plucky)
Status: In Progress => Fix Committed
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2111521
Title:
nvme no longer detected on boot after upgrade to 6.8.0-60
Status in linux package in Ubuntu:
In Progress
Status in linux source package in Noble:
Fix Committed
Status in linux source package in Plucky:
Fix Committed
Status in linux source package in Questing:
In Progress
Bug description:
[Impact]
An Intel NVMe drive stops working after upgrading to the noble 6.8.0-60
kernel. The regression was introduced by the upstream stable commit
d591f6804e7e ("PCI: Wait for device readiness with Configuration RRS");
besides noble, the Ubuntu plucky, questing and mainline kernels are all
affected. A formal fix is not ready yet (the PCI maintainer has been working
on it for almost 1.5 months), and the affected Ubuntu users want the bug
fixed as soon as possible, otherwise their servers cannot upgrade the Ubuntu
kernel.
As a temporary fix, I wrote a SAUCE patch that applies a quirk to this Intel
NVMe device. Once the mainline kernel has a formal fix, we can revert this
SAUCE patch and introduce the formal one.
Upstream mailing list discussion:
https://lore.kernel.org/linux-pci/[email protected]/T/
[Fix]
Apply a SAUCE patch that sets a device-specific quirk for this Intel NVMe
device (8086:0a54).
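For illustration only, here is a minimal sketch of the general shape such a
quirk could take. The PCI_DEV_FLAGS_NO_RRS_POLL flag name and the exact hook
point are my assumptions; the actual SAUCE patch may differ:

#include <linux/pci.h>

/* Quirk sketch: mark the affected Intel NVMe controller so the PCI core
 * can skip the Configuration RRS polling added by d591f6804e7e. */
static void quirk_intel_nvme_no_rrs(struct pci_dev *dev)
{
        dev->dev_flags |= PCI_DEV_FLAGS_NO_RRS_POLL;    /* hypothetical flag */
        pci_info(dev, "disabling Configuration RRS polling (quirk)\n");
}
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0a54,
                        quirk_intel_nvme_no_rrs);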
[Test]
I patched and built the noble and unstable kernels, then shared the debs with
the bug reporters. They tested the patched kernels and the NVMe worked as
before.
[Where problems could occur]
This quirk is specific to the Intel NVMe device 8086:0a54, so other devices
are not affected. If it does introduce a regression, it would make this Intel
NVMe device stop working when connected to a different VMD or PCI root port,
but the chance of that is very low, since the quirk only disables the RRS
polling and lets pci_dev_wait() behave as before.
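For context, a sketch (again using the hypothetical flag name above) of how
the reset wait path could honour such a quirk, so that quirked devices fall
back to the behaviour pci_dev_wait() had before d591f6804e7e:

/* Sketch: the PCI core could check the quirk flag before using the
 * RRS-based readiness wait and otherwise behave exactly as before. */
static bool pci_skip_rrs_poll(struct pci_dev *dev)
{
        return dev->dev_flags & PCI_DEV_FLAGS_NO_RRS_POLL; /* hypothetical */
}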
Short version: booting 6.8.0-59-generic or any earlier version from the grub
menu works; 6.8.0-60-generic dumps me at the initramfs prompt with no disks.
We have some servers running Ubuntu 24.04.2 LTS. They have NVME
solid-state disks which (in a working kernel) are detected as follows:
[ 3.537968] nvme nvme0: pci function 10000:01:00.0
[ 3.539285] nvme 10000:01:00.0: PCI INT A: no GSI
[ 5.897819] nvme nvme0: 32/0/0 default/read/poll queues
[ 5.905451] nvme nvme0: Ignoring bogus Namespace Identifiers
[ 5.909057] nvme0n1: p1 p2 p3
On the PCI bus they look like this:
10000:01:00.0 Non-Volatile memory controller [0108]: Intel Corporation NVMe
Datacenter SSD [3DNAND, Beta Rock Controller] [8086:0a54]
$ ls -l /sys/class/nvme/nvme0
lrwxrwxrwx 1 root root 0 May 22 16:56 /sys/class/nvme/nvme0 ->
../../devices/pci0000:d7/0000:d7:05.5/pci10000:00/10000:00:02.0/10000:01:00.0/nvme/nvme0
Four identical servers updated their kernel this morning to:
ii linux-image-6.8.0-60-generic 6.8.0-60.63 amd64 Signed kernel
image generic
...and rebooted. All four failed to come up and ended up at the
(initramfs) prompt. Rebooting and selecting 6.8.0-59-generic from the
grub menu allowed them to boot as normal.
There is no sign that the initramfs generation went wrong (on all four
servers) and the initramfs does contain all the same nvme modules for
-60 that the one for -59 does. I am at a loss to explain this, and
the initramfs environment is a bit limited for debugging.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/+subscriptions
--
Mailing list: https://launchpad.net/~kernel-packages
Post to : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp