[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0
I wouldn't mind testing, but I'm not sure how :) I'm on 18.04 LTS with 4.15.0-23-generic. If you can give me some commands to try (as well as a command to revert), I have no problem trying.

Thanks!

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1771467

Title: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

Status in Linux: Fix Released
Status in linux package in Ubuntu: In Progress
Status in linux source package in Bionic: In Progress

Bug description:

Verified on multiple DL360 Gen9 servers with up-to-date firmware. Just before reboot or shutdown, there is the following panic:

[ 289.093083] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 289.093085] {1}[Hardware Error]: event severity: fatal
[ 289.093087] {1}[Hardware Error]: Error 0, type: fatal
[ 289.093088] {1}[Hardware Error]: section_type: PCIe error
[ 289.093090] {1}[Hardware Error]: port_type: 4, root port
[ 289.093091] {1}[Hardware Error]: version: 1.16
[ 289.093093] {1}[Hardware Error]: command: 0x6010, status: 0x0143
[ 289.093094] {1}[Hardware Error]: device_id: :00:01.0
[ 289.093095] {1}[Hardware Error]: slot: 0
[ 289.093096] {1}[Hardware Error]: secondary_bus: 0x03
[ 289.093097] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x2f02
[ 289.093098] {1}[Hardware Error]: class_code: 040600
[ 289.093378] {1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003
[ 289.093380] {1}[Hardware Error]: Error 1, type: fatal
[ 289.093381] {1}[Hardware Error]: section_type: PCIe error
[ 289.093382] {1}[Hardware Error]: port_type: 4, root port
[ 289.093383] {1}[Hardware Error]: version: 1.16
[ 289.093384] {1}[Hardware Error]: command: 0x6010, status: 0x0143
[ 289.093386] {1}[Hardware Error]: device_id: :00:01.0
[ 289.093386] {1}[Hardware Error]: slot: 0
[ 289.093387] {1}[Hardware Error]: secondary_bus: 0x03
[ 289.093388] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x2f02
[ 289.093674] {1}[Hardware Error]: class_code: 040600
[ 289.093676] {1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003
[ 289.093678] Kernel panic - not syncing: Fatal hardware error!
[ 289.093745] Kernel Offset: 0x1cc0 from 0x8100 (relocation range: 0x8000-0xbfff)
[ 289.105835] ERST: [Firmware Warn]: Firmware does not respond in time.

It does eventually restart after this. Then, during the subsequent POST, the following warning appears:

Embedded RAID 1 : Smart Array P440ar Controller - (2048 MB, V6.30) 7 Logical Drive(s) - Operation Failed
- 1719-Slot 0 Drive Array - A controller failure event occurred prior to this power-up. (Previous lock up code = 0x13)
Action: Install the latest controller firmware. If the problem persists, replace the controller.

The POST warning's symptoms are described in https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04805565, but the storage controller is already running firmware much newer than the version that document gives as the resolution.

Neither of these problems occurs during shutdown/reboot on the xenial kernel.

FWIW, when running on the old P89 (1.50 (07/20/2015) vs 2.56 (01/22/2018)), the shutdown failure mode was a loop like this:

[529151.035267] NMI: IOCK error (debug interrupt?) for reason 75 on CPU 0.
[529153.222883] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.222884] Do you have a strange power saving mode enabled?
[529153.222884] Dazed and confused, but trying to continue
[529153.554447] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.554448] Do you have a strange power saving mode enabled?
[529153.554449] Dazed and confused, but trying to continue
[529153.554450] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.554451] Do you have a strange power saving mode enabled?
[529153.554452] Dazed and confused, but trying to continue
[529153.554452] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.554453] Do you have a strange power saving mode enabled?
[529153.554454] Dazed and confused, but trying to continue
[529153.554454] Uhhuh. NMI received for unknown reason 35 on CPU 0.
[529153.554455] Do you have a strange power saving mode enabled?
[529153.554456] Dazed and confused, but trying to continue
[529153.554457] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.554458] Do you have a strange power saving mode enabled?
[529153.554458] Dazed and confused, but trying to continue
[529153.554459] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.554460] Do you have a strange power saving mode enabled?
[529153.554460] Dazed and confused, but trying to continue
[529154.953916] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529154.953917
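The reason bytes in the older-firmware NMI loop can be decoded by hand. On x86 the kernel reads an NMI reason byte from I/O port 0x61 and acts on it as follows:

- If bit 0x80 (PCI SERR) is set, it reports a PCI system error.
- Otherwise, if bit 0x40 (IOCHK) is set, it prints the "IOCK error" line.
- If neither bit is set and no registered handler claims the NMI, it prints the "Uhhuh ... unknown reason" message.

So 0x75 has the IOCHK bit set, while 0x25 and 0x35 have neither. The Python sketch below applies that rule to dmesg text. It makes these assumptions:

- The bit values mirror the kernel's NMI_REASON_SERR and NMI_REASON_IOCHK definitions.
- The helper names (classify_nmi_reason, summarize_nmi_log) are hypothetical.
- It ignores the handler chain and the back-to-back NMI logic of the real default_do_nmi().

```python
import re
from collections import Counter

# Bit values mirroring NMI_REASON_SERR / NMI_REASON_IOCHK in the kernel's x86
# headers (the NMI reason byte is read from I/O port 0x61).
NMI_REASON_SERR = 0x80
NMI_REASON_IOCHK = 0x40

# Matches "IOCK error ... for reason 75 on CPU 0." and
# "NMI received for unknown reason 25 on CPU 0."; the kernel prints the byte in hex.
NMI_LINE = re.compile(r"for (?:unknown )?reason ([0-9a-fA-F]{2}) on CPU (\d+)")


def classify_nmi_reason(reason: int) -> str:
    """Hypothetical helper: classify a reason byte by the SERR/IOCHK bits.

    Simplification: the real kernel only prints "unknown reason" after no
    registered NMI handler has claimed the NMI.
    """
    if reason & NMI_REASON_SERR:   # SERR is checked before IOCHK
        return "SERR"
    if reason & NMI_REASON_IOCHK:
        return "IOCHK"
    return "unknown"               # neither bit set


def summarize_nmi_log(text: str) -> Counter:
    """Hypothetical helper: count (reason_hex, classification) pairs in a log excerpt."""
    counts: Counter = Counter()
    for match in NMI_LINE.finditer(text):
        reason_hex = match.group(1).lower()
        counts[(reason_hex, classify_nmi_reason(int(reason_hex, 16)))] += 1
    return counts


if __name__ == "__main__":
    sample = """
[529151.035267] NMI: IOCK error (debug interrupt?) for reason 75 on CPU 0.
[529153.222883] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.554454] Uhhuh. NMI received for unknown reason 35 on CPU 0.
"""
    for (reason_hex, kind), n in sorted(summarize_nmi_log(sample).items()):
        print(f"reason 0x{reason_hex}: {kind} x{n}")
```

Run against the loop above, it attributes the first NMI to IOCHK and the later ones to neither source. That matches the messages the kernel itself chose to print.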
[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0
hi, I just updated to 18.04 today and have started to see this message on reboot. I am also on an HP DL380 Gen9. It looks like everything has already been found :). Pardon me for asking, but how long does a fix like this (ballpark estimate) usually take to get into an OS update?

--
Status in Linux: Unknown
Status in linux package in Ubuntu: Triaged
Status in linux source package in Bionic: Triaged
[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0
Was this one forgotten, haha? If I can help in any way, please let me know.

Thanks!

--
Status in Linux: Fix Released
Status in linux package in Ubuntu: Confirmed
Status in linux source package in Bionic: Confirmed

Bug description:

== SRU Justification ==
A mainline commit introduced a regression in v4.15-rc1 that causes a kernel panic during system shutdown. The commit below fixes that regression. It was also cc'd to upstream stable, but it has not yet landed in Bionic.

== Fix ==
0d98ba8d70b0 ("scsi: hpsa: disable device during shutdown")

== Regression Potential ==
Low. This patch fixes a current regression. It has been cc'd to upstream stable, so it has had additional upstream review.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter, who states that the test kernel resolved the bug.