Hi,

This bug is similar to bugs #1053750 and #1034718, which have been archived.

In the 6.1.x kernel branch, the problem has become worse:

 - Previously, the kernel would record an error in /var/lib/systemd/pstore/
   but would shut down anyway.

 - Now, with kernel 6.1.135-1, the shutdown is blocked, as with 6.12.x
   kernels (see the log below).

--
Laurent.

<30>[  961.098671] systemd-shutdown[1]: Rebooting.
<6>[  961.098743] kvm: exiting hardware virtualization
<6>[  961.361878] megaraid_sas 0000:17:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
<6>[  961.414526] ACPI: PM: Preparing to enter system sleep state S5
<0>[  963.828210] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
<0>[  963.828213] {1}[Hardware Error]: event severity: fatal
<0>[  963.828214] {1}[Hardware Error]:  Error 0, type: fatal
<0>[  963.828216] {1}[Hardware Error]:   section_type: PCIe error
<0>[  963.828216] {1}[Hardware Error]:   port_type: 0, PCIe end point
<0>[  963.828217] {1}[Hardware Error]:   version: 3.0
<0>[  963.828218] {1}[Hardware Error]:   command: 0x0002, status: 0x0010
<0>[  963.828220] {1}[Hardware Error]:   device_id: 0000:01:00.1
<0>[  963.828221] {1}[Hardware Error]:   slot: 6
<0>[  963.828222] {1}[Hardware Error]:   secondary_bus: 0x00
<0>[  963.828223] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x1563
<0>[  963.828224] {1}[Hardware Error]:   class_code: 020000
<0>[  963.828225] {1}[Hardware Error]:   aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00018000
<0>[  963.828226] {1}[Hardware Error]:   aer_uncor_severity: 0x000ef010
<0>[  963.828227] {1}[Hardware Error]:   TLP Header: 40000001 0000000f 90028090 00000000
<0>[  963.828229] GHES: Fatal hardware error but panic disabled
<0>[  963.828230] Kernel panic - not syncing: GHES: Fatal hardware error
<4>[  963.828231] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-34-amd64 #1  Debian 6.1.135-1
<4>[  963.828234] Hardware name: Dell Inc. PowerEdge R540/0PRWNC, BIOS 2.23.0 01/09/2025
<4>[  963.828235] Call Trace:
<4>[  963.828238]  <NMI>
<4>[  963.828240]  dump_stack_lvl+0x44/0x5c
<4>[  963.828247]  panic+0x118/0x2f4
<4>[  963.828253]  __ghes_panic.cold+0x28/0x28
<4>[  963.828258]  ghes_notify_nmi+0x1db/0x370
<4>[  963.828263]  nmi_handle+0x5a/0x120
<4>[  963.828269]  default_do_nmi+0x40/0x130
<4>[  963.828273]  exc_nmi+0x11e/0x150
<4>[  963.828276]  end_repeat_nmi+0x16/0x67
<4>[  963.828281] RIP: 0010:mwait_idle_with_hints.constprop.0+0x48/0x90
<4>[  963.828286] Code: 48 89 d1 65 48 8b 04 25 80 fb 01 00 0f 01 c8 48 8b 00 a8 08 75 14 66 90 0f 00 2d 8f ac b0 00 b9 01 00 00 00 48 89 f8 0f 01 c9 <65> 48 8b 04 25 80 fb 01 00 f0 80 60 02 df f0 83 44 24 fc 00 48 8b
<4>[  963.828287] RSP: 0018:ffffffffb2e03e18 EFLAGS: 00000046
<4>[  963.828290] RAX: 0000000000000020 RBX: 0000000000000003 RCX: 0000000000000001
<4>[  963.828292] RDX: 0000000000000000 RSI: ffffffffb2fa0160 RDI: 0000000000000020
<4>[  963.828293] RBP: 0000000000000003 R08: 0000000000000002 R09: 000000003a518aaa
<4>[  963.828295] R10: 0000000000000018 R11: 000000000000afc8 R12: ffffffffb2fa0160
<4>[  963.828296] R13: ffffffffb2fa02b0 R14: 0000000000000003 R15: 0000000000000000
<4>[  963.828300]  ? mwait_idle_with_hints.constprop.0+0x48/0x90
<4>[  963.828303]  ? mwait_idle_with_hints.constprop.0+0x48/0x90
<4>[  963.828305]  </NMI>
<4>[  963.828306]  <TASK>
<4>[  963.828307]  intel_idle_ibrs+0x75/0x90
<4>[  963.828309]  cpuidle_enter_state+0x89/0x420
<4>[  963.828315]  cpuidle_enter+0x29/0x40
<4>[  963.828317]  do_idle+0x202/0x2a0
<4>[  963.828323]  cpu_startup_entry+0x26/0x30
<4>[  963.828326]  rest_init+0xca/0xd0
<4>[  963.828328]  arch_call_rest_init+0xa/0x14
<4>[  963.828333]  start_kernel+0x70a/0x733
<4>[  963.828336]  secondary_startup_64_no_verify+0xe5/0xeb
<4>[  963.828343]  </TASK>
<0>[  963.828357] Kernel Offset: 0x30400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
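
For reference, the aer_uncor_status and aer_uncor_mask values in the log can be decoded with a short script. This is only a sketch: the bit names follow the PCIe AER Uncorrectable Error Status register layout as I understand it, the table is not exhaustive, and the names AER_UNCOR_BITS and decode_aer_uncor are made up for illustration. They are not taken from the kernel.

```python
# Assumed bit layout of the PCIe AER Uncorrectable Error Status register.
# The Mask and Severity registers use the same bit positions.
AER_UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
}


def decode_aer_uncor(value):
    """Return the names of the bits set in a 32-bit AER uncorrectable word."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            # Fall back to the raw bit number for bits not in the table.
            names.append(AER_UNCOR_BITS.get(bit, f"bit {bit}"))
    return names


# Values copied from the log above.
print("status:", decode_aer_uncor(0x00100000))
print("mask:  ", decode_aer_uncor(0x00018000))
```

Under those assumptions, the status word has only bit 20 (Unsupported Request) set, and that bit is not set in aer_uncor_severity (0x000ef010), even though the firmware reports the event as fatal.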
