Public bug reported:

My problem is similar to the one described in this old thread:
https://unix.stackexchange.com/questions/742360/

journalctl message (one of the many related logs):

Apr 09 15:37:40.096850 ****** kernel: Linux version 6.5.0-26-lowlatency (buildd@lcy02-amd64-109) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #26.1~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Mar 13 10:41:42 UTC (Ubuntu 6.5.0-26.26.1~22.04.1-lowlatency 6.5.13)
....................
Apr 09 15:43:46.238697 ****** kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Apr 09 15:43:46.239162 ****** kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Apr 09 15:43:46.239266 ****** kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Apr 09 15:43:46.690200 ****** kernel: nvme 0000:06:00.0: enabling device (0000 -> 0002)
Apr 09 15:43:46.690409 ****** kernel: nvme nvme0: Disabling device after reset failure: -19
Apr 09 15:43:46.698188 ****** kernel: I/O error, dev nvme0n1, sector 1216896 op 0x1:(WRITE) flags 0xc800 phys_seg 1 prio clas>

I was using 22.04.4 with the HWE kernel (6.5, as shown above). I
upgraded to the 24.04 development release hoping the problem would be
resolved, but it still exists (kernel 6.8).

The problem started after one of the kernel upgrades I installed after
2024-03-01, but I cannot pinpoint which one. The nvme_core kernel
parameter suggested in the message above does not help.

The problem does NOT exist with the regular 22.04 kernel: I have set up
a VM that runs my heavy write workload using PCI passthrough of the NVMe
drive, and it works okay.

I cannot downgrade the host to an older kernel because the ZFS pool has
been upgraded.

VM info (where my NVMe drive works okay)

uname -r
5.15.0-78-lowlatency

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy

(maybe) related hardware spec
CPU: AMD Ryzen 5750G (x8x4x4)
Chipset: AMD B450
NVMe: Samsung MZ1LB960HBJR-000FB (PM983a, f/w EDW73F2Q)

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Description changed:

- Apr 09 15:37:40.096850 awepet kernel: Linux version 6.5.0-26-lowlatency (buildd@lcy02-amd64-109) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #26.1~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Mar 13 10:41:42 UTC (Ubuntu 6.5.0-26.26.1~22.04.1-lowlatency 6.5.13)
+ Apr 09 15:37:40.096850 ****** kernel: Linux version 6.5.0-26-lowlatency (buildd@lcy02-amd64-109) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #26.1~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Mar 13 10:41:42 UTC (Ubuntu 6.5.0-26.26.1~22.04.1-lowlatency 6.5.13)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2060770

Title:
  NVMe drive fails at high write workload after kernel upgrades
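
Since the report says the parameters suggested by the kernel message did not help, it may be worth confirming that they were actually in effect on the failing boot. The following is only a rough sketch of such a check, not part of the original report: the function names (`parse_cmdline`, `missing_params`, `find_resets`) are made up for illustration, and the command-line parsing is deliberately simplified (it splits on whitespace and does not handle quoted values). It compares a kernel command line (as found in /proc/cmdline) against the two suggested settings, and scans log text for the "controller is down" message quoted above.

```python
import re
from pathlib import Path

# The two settings suggested by the kernel message quoted in the report.
SUGGESTED = {
    "nvme_core.default_ps_max_latency_us": "0",
    "pcie_aspm": "off",
}

# Matches the reset message from the journal excerpt, capturing the
# controller name and the CSTS value.
RESET_RE = re.compile(
    r"nvme (\S+): controller is down; will reset: CSTS=(0x[0-9a-fA-F]+)"
)


def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Simplification: tokens are split on whitespace, so quoted values
    containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def missing_params(cmdline):
    """Return the suggested settings that are absent or set differently."""
    params = parse_cmdline(cmdline)
    return [f"{k}={v}" for k, v in SUGGESTED.items() if params.get(k) != v]


def find_resets(log_text):
    """Return (controller, CSTS) pairs for each reset message in log_text."""
    return RESET_RE.findall(log_text)


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        missing = missing_params(cmdline_path.read_text())
        print("suggested settings not in effect:", missing or "none")
```

If `missing_params` returns an empty list and the resets still occur, that would be consistent with the report's statement that the workaround does not help; the log text for `find_resets` could come from saved `journalctl -k` output.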