On Tue, Jul 2, 2024 at 5:17 AM Dmitrii Odintcov
<cyprussocial...@gmail.com> wrote:
>
> Hi,
>
>
> Just had it happen to me again - on boot this time - with the
> recommended `nvme_core.default_ps_max_latency_us=0 pcie_aspm=off`.
>
> Worth noting that it's only happening with one of two SSDs I have
> installed - the other being Samsung SSD 980 PRO 2TB - and that they've
> been working fine for several years until this started happening
> sometime last month.

One thing I noticed: when I boot without any special kernel
parameters, I see the errors below. I've read that this kind of
error can be caused by a buggy BIOS:

$ lspci | grep -i 1b.4
00:1b.4 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port (rev 11)

Jul  2 14:09:00 x kernel: [ 1213.547609] pcieport 0000:00:1b.4: AER: Multiple Corrected error message received from 0000:00:1b.4
Jul  2 14:09:00 x kernel: [ 1213.547843] pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Jul  2 14:09:00 x kernel: [ 1213.548011] pcieport 0000:00:1b.4: device [8086:7ac4] error status/mask=00000040/00002000
Jul  2 14:09:00 x kernel: [ 1213.548182] pcieport 0000:00:1b.4:    [ 6] BadTLP
Jul  2 14:09:43 x kernel: [ 1256.906830] pcieport 0000:00:1b.4: AER: Multiple Corrected error message received from 0000:00:1b.4
Jul  2 14:09:43 x kernel: [ 1256.907169] pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Jul  2 14:09:43 x kernel: [ 1256.907395] pcieport 0000:00:1b.4: device [8086:7ac4] error status/mask=00000040/00002000
Jul  2 14:09:43 x kernel: [ 1256.907648] pcieport 0000:00:1b.4:    [ 6] BadTLP
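
As a rough illustration of how to read the status/mask pair in the
log above, the sketch below decodes the two hex values into the short
names the kernel prints for correctable AER errors. The function name
decode_aer_correctable and the table name are made up for this
example; the bit positions follow the PCIe Correctable Error Status
register layout as far as I know it, so treat this as a reading aid
rather than a reference.

```python
# Bit positions in the PCIe AER Correctable Error Status register,
# labelled with the short names the Linux kernel prints in its AER messages.
# (Table and function names are hypothetical, chosen for this example.)
AER_CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def decode_aer_correctable(value_hex):
    """Return the names of all bits set in a hex status or mask value."""
    value = int(value_hex, 16)
    return [
        name
        for bit, name in sorted(AER_CORRECTABLE_BITS.items())
        if value & (1 << bit)
    ]


# The pair taken from the "error status/mask=" field in the log.
status, mask = "00000040/00002000".split("/")
print("status:", decode_aer_correctable(status))
print("mask:  ", decode_aer_correctable(mask))
```

Read this way, the status value has only bit 6 set, which matches the
"[ 6] BadTLP" line, and the mask value has only bit 13 set.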

I am now testing with pci=nommconf. After 20 hours there has been no
crash and none of the Corrected errors shown above, but I need to give
it more time. I am curious whether you see any improvement on your
side when testing with only pci=nommconf?
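
To make a before/after comparison less dependent on eyeballing the
log, something like the following could count corrected AER reports
per device. This is only a sketch: count_corrected_errors is a
made-up helper, the pattern assumes the "pcieport ... PCIe Bus Error:
severity=Corrected" wording seen in the log above, and it expects
each log entry on a single line.

```python
import re
from collections import Counter

# Matches the "PCIe Bus Error" line of a corrected AER report and
# captures the PCI address of the reporting port.
AER_CORRECTED = re.compile(
    r"pcieport (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=Corrected"
)


def count_corrected_errors(lines):
    """Count corrected AER reports per PCI device in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = AER_CORRECTED.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


# Sample entries copied from the log above.
sample = [
    "Jul  2 14:09:00 x kernel: [ 1213.547843] pcieport 0000:00:1b.4: "
    "PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)",
    "Jul  2 14:09:00 x kernel: [ 1213.548182] pcieport 0000:00:1b.4:    [ 6] BadTLP",
    "Jul  2 14:09:43 x kernel: [ 1256.907169] pcieport 0000:00:1b.4: "
    "PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)",
]
print(dict(count_corrected_errors(sample)))
```

The function accepts any iterable of lines, so an open kernel log
file can be passed directly; where that log lives depends on the
distribution and is not something this sketch assumes.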

Justin
