------- Comment From pavsu...@in.ibm.com 2019-11-06 01:01 EDT-------
I have installed both the Bionic *and* Disco kernels available in the PPA:
https://launchpad.net/~ubuntu-power-triage/+archive/ubuntu/lp1848127

I then ran the MCE UE tests again on the machine with both kernels.

root@ltc-wspoon4:~# apt-get install linux-image-unsigned-5.0.0-33-generic/disco
Reading package lists... Done
Building dependency tree
Reading state information... Done
Selected version '5.0.0-33.35~lp1848127+build.1' (lp1848127:19.04/disco [ppc64el]) for 'linux-image-unsigned-5.0.0-33-generic'
The following additional packages will be installed:
  linux-modules-5.0.0-33-generic
Suggested packages:
  fdutils linux-doc-5.0.0 | linux-source-5.0.0 linux-headers-5.0.0-33-generic
The following NEW packages will be installed:
  linux-image-unsigned-5.0.0-33-generic linux-modules-5.0.0-33-generic
0 upgraded, 2 newly installed, 0 to remove and 3 not upgraded.
Need to get 20.7 MB of archives.
After this operation, 106 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu disco/main ppc64el linux-modules-5.0.0-33-generic ppc64el 5.0.0-33.35~lp1848127+build.1 [14.0 MB]
Get:2 http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu disco/main ppc64el linux-image-unsigned-5.0.0-33-generic ppc64el 5.0.0-33.35~lp1848127+build.1 [6,748 kB]
Fetched 20.7 MB in 13s (1,546 kB/s)
Selecting previously unselected package linux-modules-5.0.0-33-generic.
(Reading database ... 71699 files and directories currently installed.)
Preparing to unpack .../linux-modules-5.0.0-33-generic_5.0.0-33.35~lp1848127+build.1_ppc64el.deb ...
Unpacking linux-modules-5.0.0-33-generic (5.0.0-33.35~lp1848127+build.1) ...
Selecting previously unselected package linux-image-unsigned-5.0.0-33-generic.
Preparing to unpack .../linux-image-unsigned-5.0.0-33-generic_5.0.0-33.35~lp1848127+build.1_ppc64el.deb ...
Unpacking linux-image-unsigned-5.0.0-33-generic (5.0.0-33.35~lp1848127+build.1) ...
Setting up linux-modules-5.0.0-33-generic (5.0.0-33.35~lp1848127+build.1) ...
Setting up linux-image-unsigned-5.0.0-33-generic (5.0.0-33.35~lp1848127+build.1) ...
I: /boot/vmlinux is now a symlink to vmlinux-5.0.0-33-generic
I: /boot/initrd.img is now a symlink to initrd.img-5.0.0-33-generic
Processing triggers for linux-image-unsigned-5.0.0-33-generic (5.0.0-33.35~lp1848127+build.1) ...
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-5.0.0-33-generic
cryptsetup: WARNING: The initramfs image may not contain cryptsetup binaries nor crypto modules. If that's on purpose, you may want to uninstall the 'cryptsetup-initramfs' package in order to disable the cryptsetup initramfs integration and avoid this warning.
W: Possible missing firmware /lib/firmware/ast_dp501_fw.bin for module ast
/etc/kernel/postinst.d/zz-update-grub:
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/init-select.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinux-5.0.0-33-generic
Found initrd image: /boot/initrd.img-5.0.0-33-generic
Found linux image: /boot/vmlinux-5.0.0-32-generic
Found initrd image: /boot/initrd.img-5.0.0-32-generic
done
root@ltc-wspoon4:~# uname -a
Linux ltc-wspoon4 5.0.0-33-generic #35~lp1848127+build.1-Ubuntu SMP Mon Oct 28 20:12:03 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
root@ltc-wspoon4:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="19.04 (Disco Dingo)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 19.04"
VERSION_ID="19.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=disco
UBUNTU_CODENAME=disco
root@ltc-wspoon4:~# ./statedisable.sh
./statedisable.sh: line 10: /sys/devices/system/cpu/cpu*/cpuidle/state7/disable: No such file or directory
./statedisable.sh: line 11: /sys/devices/system/cpu/cpu*/cpuidle/state8/disable: No such file or directory
root@ltc-wspoon4:~# ./run_workload.sh
root@ltc-wspoon4:~# ./scom_addr_p9.sh 0x1001080c 6
EQ[ 1]: 0x1101080c
EX[ 3]: 0x11010c0c
C[ 6]: 0x3601080c
root@ltc-wspoon4:~# getscom -c 0x8 0x11010c0c
0000000000000000
root@ltc-wspoon4:~# putscom -c 0x8 0x11010c0c 0c00000000000000
0c00000000000000

ltc-wspoon4 login: [ 442.228985] NIP [c00000000019ae5c]: osq_lock+0x15c/0x230
[ 442.228985] Initiator: CPU
[ 442.228986] Error type: UE [Load/Store]
[ 442.228987] Effective address: c000201cc76a9600
[ 442.228988] Physical address: 0000201cc76a0000
[ 442.228988] opal: Hardware platform error: Unrecoverable Machine Check exception
[ 442.228989] CPU: 109 PID: 9095 Comm: find Tainted: G M 5.0.0-33-generic #35~lp1848127+build.1-Ubuntu
[ 442.228990] NIP: c00000000019ae5c LR: c000000000e000a0 CTR: c000000000446e30
[ 442.228991] REGS: c000201fff24bd70 TRAP: 0200 Tainted: G M (5.0.0-33-generic)
[ 442.228992] MSR: 9000000000209033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 48002222 XER: 00000000
[ 442.228996] CFAR: c00000000019ae34 DAR: c000201cc76a9600 DSISR: 00008000 IRQMASK: 0
[ 442.228998] GPR00: c000000000e000a0 c000201c87babc30 c00000000184cb00 c000000001731abc
[ 442.229001] GPR04: 0000000000000000 0000000000000000 c000000001885c78 0000000000000000
[ 442.229003] GPR08: c000201cc76a9600 c000201cc7b69600 0000000000000004 ffffffffffffffea
[ 442.229005] GPR12: 0000000088002228 c000201fff686d80 00000ed7ab1e2b80 00000ed7ab1e2b80
[ 442.229008] GPR16: 00000ed7ab1f0e30 00000ed7ab1eec30 0000000000000101 00007fffc662d8b8
[ 442.229010] GPR20: 0000000000000000 0000000000030000 000000000001a9b7 0000000000000018
[ 442.229012] GPR24: c000001fc28a9dc8 c000201c7710c500 0000000000000000 c000000001731ab0
[ 442.229014] GPR28: 0000000000000002 c000000001731abc c000201c87babdb0 c000000001731ab0
[ 442.229017] NIP [c00000000019ae5c] osq_lock+0x15c/0x230
[ 442.229018] LR [c000000000e000a0] __mutex_lock.isra.1+0x90/0x710
[ 442.229018] Call Trace:
[ 442.229019] [c000201c87babc30] [c000000000e00054] __mutex_lock.isra.1+0x44/0x710 (unreliable)
[ 442.229020] [c000201c87babcd0] [c0000000004facd0] kernfs_fop_readdir+0x200/0x3b0
[ 577.498732581,0] OPAL: Reboot requested due to Platform error.
[ 577.498806187,3] OPAL: Reboot requested due to Platform error.
[ 442.229022] [c000201c87babd40] [c000000000446300] iterate_dir+0x200/0x280
[ 442.229023] [c000201c87babd90] [c0000000004472a0] ksys_getdents64+0xa0/0x1a0
[ 442.229024] [c000201c87babe00] [c0000000004473c8] sys_getdents64+0x28/0x110
[ 442.229025] [c000201c87babe20] [c00000000000b288] system_call+0x5c/0x70
[ 442.229026] Instruction dump:
[ 442.229027] 60000000 38e00000 48000028 60000000 60000000 81490010 7c2004ac 2faa0000
[ 442.229030] 409effd4 7c210b78 7c421378 e9090008 <e9480000> 7faa4800 409effdc 7c0004ac
[ 443.416541] Disabling lock debugging due to kernel taint
[ 443.416543] Severe Machine check interrupt [Not recovered]
[ 443.416544] NIP [c00000000019ad88]: osq_lock+0x88/0x230
[ 443.416544] Initiator: CPU
[ 443.416545] Error type: UE [Load/Store]
[ 443.416545] Effective address: c000201cc76a9610
[ 443.416546] Physical address: 0000201cc76a0000
[ 443.416547] opal: Hardware platform error: Unrecoverable Machine Check exception
[ 443.416548] CPU: 90 PID: 9020 Comm: find Tainted: G M 5.0.0-33-generic #35~lp1848127+build.1-Ubuntu
[ 443.416549] NIP: c00000000019ad88 LR: c000000000e000a0 CTR: c0000000004f8d60
[ 443.416550] REGS: c000201fff32fd70 TRAP: 0200 Tainted: G M (5.0.0-33-generic)
[ 443.416551] MSR: 9000000000209033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24002224 XER: 00000000
[ 443.416555] CFAR: c000000000e0009c DAR: 0000201cc76a9610 DSISR: 00008000 IRQMASK: 0
[ 443.416557] GPR00: c000000000e000a0 c000201c8f81fbc0 c00000000184cb00 c000000001731abc
[ 443.416559] GPR04: 0000000000000000 0000000000000000 0000201cc6370000 c000000001339600
[ 443.416561] GPR08: c000001ffec29600 c000201cc76a9600 0000001ffd8f0000 c000001f96936300
[ 443.416564] GPR12: 0000000084002228 c000201fff69ba00 00000334527e2b80 0000000000000000
[ 443.416566] GPR16: 0000000000000000 000003345280d440 0000000000000101 00007fffcaffe858
[ 443.416568] GPR20: 0000000000000000 00007fffcaffe7c8 0000000000000000 0000000000000006
[ 443.416570] GPR24: 000077194c155308 00000000000007ff c000201c8f81fd80 000003345280d548
[ 443.416572] GPR28: 0000000000000002 c000000001731abc c000000001731ab0 c000000001731ab0
[ 443.416575] NIP [c00000000019ad88] osq_lock+0x88/0x230
[ 443.416576] LR [c000000000e000a0] __mutex_lock.isra.1+0x90/0x710
[ 443.416576] Call Trace:
[ 443.416577] [c000201c8f81fbc0] [c000000000e00054] __mutex_lock.isra.1+0x44/0x710 (unreliable)
[ 443.416578] [c000201c8f81fc60] [c0000000004f8dac] kernfs_iop_getattr+0x4c/0xa0
[ 443.416579] [c000201c8f81fca0] [c00000000042eac0] vfs_getattr_nosec+0x90/0xf0
[ 443.416581] [c000201c8f81fce0] [c00000000042ed68] vfs_statx+0xc8/0x190
[ 443.416582] [c000201c8f81fd60] [c00000000042f128] sys_newfstatat+0x48/0x90
[ 443.416583] [c000201c8f81fe20] [c00000000000b288] system_call+0x5c/0x70
[ 443.416584] Instruction dump:
[ 443.416584] 2faa0000 419e00c4 394affff 3d020003 39085170 7d4a07b4 794a1f24 7d48502a
[ 443.416587] 7d075214 f9090008 7c2004ac 7d27512a <81490010> 2faa0000 409e0090 782a0464
[ 577.500377001,3] ___________________________________________________________
[ 577.500429242,3] < Dangerous NVRAM option: opal-sw-xstop=enable
[ 577.500480635,3] -----------------------------------------------------------
[ 577.500520165,3] \
[ 577.500562271,3] \ WW
[ 577.500614905,3] <^ \___/|
[ 577.500657283,3] \ /
[ 577.500704560,3] \_ _/
[ 577.500743890,3] }{

The Linux host did not hang, and it booted back up after the above injection.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1848127

Title:
[LTCTest][OPAL][OP930] Machine hangs after injecting the Machine Check Error

Status in The Ubuntu-power-systems project:
Incomplete
Status in linux package in Ubuntu:
In Progress
Status in linux source package in Bionic:
Incomplete
Status in linux source package in Disco:
Incomplete
Status in linux source package in Eoan:
In Progress

Bug description:

[IMPACT]
The MCE test renders the system unresponsive on P9 OpenPOWER hardware (Witherspoon).

[TEST]
A test kernel is available in ppa:ubuntu-power-triage/lp1848127. Please see the [OTHER] section for test details and comment #7 for results with the PPA kernel.

[FIX]
IBM has identified the following patch that fixes this issue:

commit 99ead78afd1128bfcebe7f88f3b102fb2da09aee
Author: Balbir Singh <bsinghar...@gmail.com>
Date: Tue Aug 20 13:43:47 2019 +0530

    powerpc/mce: Fix MCE handling for huge pages

[REGRESSION POTENTIAL]
The patch applies only to the powerpc architecture and is limited in scope to MCE handling for huge pages. It does not touch any generic code, so any regression would be limited to powerpc MCE handling.

[OTHER]
== Comment: #0 - PAVAMAN SUBRAMANIYAM <pavsu...@in.ibm.com> - 2019-05-07 23:31:20 ==
Install a P9 OpenPOWER machine with the latest OP930 firmware images built from the upstream op-build git tree.
root@witherspoon:~# cat /etc/os-release
ID="openbmc-phosphor"
NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro)"
VERSION="ibm-v2.3"
VERSION_ID="ibm-v2.3-476-g2d622cb-r32-0-g9973ab0"
PRETTY_NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) ibm-v2.3"
BUILD_ID="ibm-v2.3-476-g2d622cb-r32"
root@witherspoon:~# cat /var/lib/phosphor-software-manager/pnor/ro/VERSION
open-power-witherspoon-v2.3-rc2-58-g59fd0743
buildroot-2019.02.2-17-g93b841d204
skiboot-v6.3-rc2
hostboot-19a436e
occ-58e422d
linux-5.0.9-openpower1-p3a4d5a4
petitboot-v1.10.3
machine-xml-a6f4df3
hostboot-binaries-hw043019a.940
capp-ucode-p9-dd2-v4
sbe-249671d
hcode-hw040319a.940

Then manually enable sw xstop with the command below:

root@ltc-wspoon11:~# nvram -p ibm,skiboot --update-config opal-sw-xstop=enable
root@ltc-wspoon11:~# nvram -p ibm,skiboot --print-config
"ibm,skiboot" Partition
--------------------------
experimental-fast-reset=1
snarf-mode=noooooo
opal-sw-xstop=enable

Then, from the Linux host, inject the MCE UE error on the machine as follows:

root@ltc-wspoon11:~# ./probe_cpus.sh -L
CHIP ID: 0 CORE ID: 0 THREADS: 4 CPUs: 0 1 2 3
CHIP ID: 0 CORE ID: 1 THREADS: 4 CPUs: 4 5 6 7
CHIP ID: 0 CORE ID: 2 THREADS: 4 CPUs: 8 9 10 11
CHIP ID: 0 CORE ID: 3 THREADS: 4 CPUs: 12 13 14 15
CHIP ID: 0 CORE ID: 6 THREADS: 4 CPUs: 16 17 18 19
CHIP ID: 0 CORE ID: 7 THREADS: 4 CPUs: 20 21 22 23
CHIP ID: 0 CORE ID: 8 THREADS: 4 CPUs: 24 25 26 27
CHIP ID: 0 CORE ID: 9 THREADS: 4 CPUs: 28 29 30 31
CHIP ID: 0 CORE ID: 10 THREADS: 4 CPUs: 32 33 34 35
CHIP ID: 0 CORE ID: 11 THREADS: 4 CPUs: 36 37 38 39
CHIP ID: 0 CORE ID: 12 THREADS: 4 CPUs: 40 41 42 43
CHIP ID: 0 CORE ID: 13 THREADS: 4 CPUs: 44 45 46 47
CHIP ID: 0 CORE ID: 16 THREADS: 4 CPUs: 48 49 50 51
CHIP ID: 0 CORE ID: 17 THREADS: 4 CPUs: 52 53 54 55
CHIP ID: 0 CORE ID: 18 THREADS: 4 CPUs: 56 57 58 59
CHIP ID: 0 CORE ID: 19 THREADS: 4 CPUs: 60 61 62 63
CHIP ID: 0 CORE ID: 20 THREADS: 4 CPUs: 64 65 66 67
CHIP ID: 0 CORE ID: 21 THREADS: 4 CPUs: 68 69 70 71
CHIP ID: 8 CORE ID: 6 THREADS: 4 CPUs: 72 73 74 75
CHIP ID: 8 CORE ID: 7 THREADS: 4 CPUs: 76 77 78 79
CHIP ID: 8 CORE ID: 8 THREADS: 4 CPUs: 80 81 82 83
CHIP ID: 8 CORE ID: 9 THREADS: 4 CPUs: 84 85 86 87
CHIP ID: 8 CORE ID: 10 THREADS: 4 CPUs: 88 89 90 91
CHIP ID: 8 CORE ID: 11 THREADS: 4 CPUs: 92 93 94 95
CHIP ID: 8 CORE ID: 12 THREADS: 4 CPUs: 96 97 98 99
CHIP ID: 8 CORE ID: 13 THREADS: 4 CPUs: 100 101 102 103
CHIP ID: 8 CORE ID: 14 THREADS: 4 CPUs: 104 105 106 107
CHIP ID: 8 CORE ID: 15 THREADS: 4 CPUs: 108 109 110 111
CHIP ID: 8 CORE ID: 16 THREADS: 4 CPUs: 112 113 114 115
CHIP ID: 8 CORE ID: 17 THREADS: 4 CPUs: 116 117 118 119
CHIP ID: 8 CORE ID: 18 THREADS: 4 CPUs: 120 121 122 123
CHIP ID: 8 CORE ID: 19 THREADS: 4 CPUs: 124 125 126 127
CHIP ID: 8 CORE ID: 20 THREADS: 4 CPUs: 128 129 130 131
CHIP ID: 8 CORE ID: 21 THREADS: 4 CPUs: 132 133 134 135
CHIP ID: 8 CORE ID: 22 THREADS: 4 CPUs: 136 137 138 139
CHIP ID: 8 CORE ID: 23 THREADS: 4 CPUs: 140 141 142 143
-----------------------------
p[0] eq[0,1,2,3,4,5] ex[0,1,3,4,5,6,8,9,10] c[0,1,2,3,6,7,8,9,10,11,12,13,16,17,18,19,20,21]
p[8] eq[1,2,3,4,5] ex[3,4,5,6,7,8,9,10,11] c[6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
-----------------------------
----------Processor Layout-------------------
p[0]
+---EQ00----+ +---EQ02----+ +---EQ04----+
|EX-0 C0    | |EX-4 C8    | |EX-8 C16   |
+ - - - - - + + - - - - - + + - - - - - +
|EX-0 C1    | |EX-4 C9    | |EX-8 C17   |
+ - - - - - + + - - - - - + + - - - - - +
|EX-1 C2    | |EX-5 C10   | |EX-9 C18   |
+ - - - - - + + - - - - - + + - - - - - +
|EX-1 C3    | |EX-5 C11   | |EX-9 C19   |
+-----------+ +-----------+ +-----------+
+---EQ01----+ +---EQ03----+ +---EQ05----+
|           | |EX-6 C12   | |EX-10 C20  |
+ - - - - - + + - - - - - + + - - - - - +
|           | |EX-6 C13   | |EX-10 C21  |
+ - - - - - + + - - - - - + + - - - - - +
|EX-3 C6    | |           | |           |
+ - - - - - + + - - - - - + + - - - - - +
|EX-3 C7    | |           | |           |
+-----------+ +-----------+ +-----------+
p[8]
+---EQ00----+ +---EQ02----+ +---EQ04----+
|           | |EX-4 C8    | |EX-8 C16   |
+ - - - - - + + - - - - - + + - - - - - +
|           | |EX-4 C9    | |EX-8 C17   |
+ - - - - - + + - - - - - + + - - - - - +
|           | |EX-5 C10   | |EX-9 C18   |
+ - - - - - + + - - - - - + + - - - - - +
|           | |EX-5 C11   | |EX-9 C19   |
+-----------+ +-----------+ +-----------+
+---EQ01----+ +---EQ03----+ +---EQ05----+
|           | |EX-6 C12   | |EX-10 C20  |
+ - - - - - + + - - - - - + + - - - - - +
|           | |EX-6 C13   | |EX-10 C21  |
+ - - - - - + + - - - - - + + - - - - - +
|EX-3 C6    | |EX-7 C14   | |EX-11 C22  |
+ - - - - - + + - - - - - + + - - - - - +
|EX-3 C7    | |EX-7 C15   | |EX-11 C23  |
+-----------+ +-----------+ +-----------+
root@ltc-wspoon11:~# ./statedisable.sh
./statedisable.sh: line 10: /sys/devices/system/cpu/cpu*/cpuidle/state7/disable: No such file or directory
./statedisable.sh: line 11: /sys/devices/system/cpu/cpu*/cpuidle/state8/disable: No such file or directory
root@ltc-wspoon11:~# cpupower idle-info
CPUidle driver: powernv_idle
CPUidle governor: menu
analyzing CPU 0:
Number of idle states: 7
Available idle states: snooze stop0_lite stop0 stop1 stop2 stop4 stop5
snooze (DISABLED) :
Flags/Description: snooze
Latency: 0
Usage: 81861
Duration: 29748269
stop0_lite (DISABLED) :
Flags/Description: stop0_lite
Latency: 1
Usage: 70
Duration: 1982345
stop0 (DISABLED) :
Flags/Description: stop0
Latency: 2
Usage: 274
Duration: 125896
stop1 (DISABLED) :
Flags/Description: stop1
Latency: 5
Usage: 36
Duration: 4922
stop2 (DISABLED) :
Flags/Description: stop2
Latency: 10
Usage: 3745
Duration: 88300041
stop4 (DISABLED) :
Flags/Description: stop4
Latency: 100
Usage: 65
Duration: 1048951
stop5 (DISABLED) :
Flags/Description: stop5
Latency: 200
Usage: 30377
Duration: 61977191643
root@ltc-wspoon11:~# ./run_workload.sh
root@ltc-wspoon11:~# ./scom_addr_p9.sh 0x1001080c 15
EQ[ 3]: 0x1301080c
EX[ 7]: 0x13010c0c
C[15]: 0x3f01080c
root@ltc-wspoon11:~# ./skiboot/external/xscom-utils/getscom -c 0x8 0x13010c0c
0000000000000000
root@ltc-wspoon11:~# ./skiboot/external/xscom-utils/putscom -c 0x8 0x13010c0c 0c00000000000000
0c00000000000000
root@ltc-wspoon11:~# ./skiboot/external/xscom-utils/putscom -c 0x8 0x13010c0c 0c00000000000000
0c00000000000000

After the machine check error is injected, the host Linux stops responding to ping and console access to the machine is lost. However, the OpenBMC shell and GUI still show the host in the Running state.

== Comment: #1 - PAVAMAN SUBRAMANIYAM <pavsu...@in.ibm.com> - 2019-05-07 23:33:31 ==
The machine is installed with Ubuntu 18.04.

root@ltc-wspoon11:~# uname -a
Linux ltc-wspoon11 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:26:19 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
root@ltc-wspoon11:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
root@ltc-wspoon11:~# cat /proc/cpuinfo | tail
cpu : POWER9, altivec supported
clock : 2300.000000MHz
revision : 2.3 (pvr 004e 1203)
timebase : 512000000
platform : PowerNV
model : 8335-GTH
machine : PowerNV 8335-GTH
firmware : OPAL
MMU : Radix
root@ltc-wspoon11:~# lsmcode
Version of System Firmware :
Product Name : OpenPOWER Firmware
Product Version : witherspoon-v2.3-rc2-58-g59fd0743
Product Extra : skiboot-v6.3-rc2
Product Extra : bmc-firmware-version-2.03
Product Extra : occ-58e422d
Product Extra : hostboot-19a436e
Product Extra : buildroot-2019.02.2-17-g93b841d204
Product Extra : capp-ucode-p9-dd2-v4
Product Extra : machine-xml-a6f4df3
Product Extra : hostboot-binaries-hw043019a.940
Product Extra : sbe-249671d
Product Extra : hcode-hw040319a.940
Product Extra : petitboot-v1.10.3
Product Extra : linux-5.0.9-openpower1-p3a4d5a4

== Comment: #3 - PAVAMAN SUBRAMANIYAM <pavsu...@in.ibm.com> - 2019-05-07 23:42:35 ==
I quickly tested MCE on an OP930 build (IBM-witherspoon-ibm- OP9-v2.2-3.5) with 4.15.0-47-generic and found no hang. On further investigation, I see that the hang occurs from kernel version 4.15.0-48-generic onward, so it looks like changes that went into 4.15.0-48-generic are causing the hang. Still investigating...

== Comment: #9 - Application Cdeadmin <cdead...@us.ibm.com> - 2019-05-22 06:45:07 ==
==== State: Working by: jayeshp on 22 May 2019 06:37:27 ====
Any update?

== Comment: #11 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> - 2019-09-19 04:44:01 ==
The hang should go away with the patch below.

commit 99ead78afd1128bfcebe7f88f3b102fb2da09aee
Author: Balbir Singh <bsinghar...@gmail.com>
Date: Tue Aug 20 13:43:47 2019 +0530

    powerpc/mce: Fix MCE handling for huge pages

    The current code would fail on huge pages addresses, since the shift would
    be incorrect. Use the correct page shift value returned by
    __find_linux_pte() to get the correct physical address. The code is more
    generic and can handle both regular and compound pages.

    Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")
    Signed-off-by: Balbir Singh <bsinghar...@gmail.com>
    [ar...@linux.ibm.com: Fixup pseries_do_memory_failure()]
    Signed-off-by: Reza Arbab <ar...@linux.ibm.com>
    Tested-by: Mahesh Salgaonkar <mah...@linux.vnet.ibm.com>
    Signed-off-by: Santosh Sivaraj <sant...@fossix.org>
    Cc: sta...@vger.kernel.org # v4.15+
    Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
    Link: https://lore.kernel.org/r/20190820081352.8641-3-sant...@fossix.org
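The commit message in comment #11 says the pre-fix code used the wrong page shift for huge-page mappings, so the reported physical address was wrong. What follows is a simplified user-space model of that arithmetic, not the kernel code. It assumes 64 KiB base pages (PAGE_SHIFT = 16) and, for the worked example, that the faulting kernel linear-map address is backed by a 1 GiB radix page (shift 30). Both assumptions are illustrative and not confirmed by the bug. The function names pfn_ignoring_shift, pfn_with_shift and phys_addr are hypothetical. The effective address comes from the PPA-kernel log in the 2019-11-06 comment (c000201cc76a9600).

```python
# Simplified user-space model of the MCE physical-address arithmetic discussed
# in "powerpc/mce: Fix MCE handling for huge pages". Not kernel code; the
# function names are hypothetical.

PAGE_SHIFT = 16  # assumption: 64 KiB base pages (Ubuntu ppc64el kernels)


def pfn_ignoring_shift(pte_pfn: int, addr: int, shift: int) -> int:
    """Pre-fix model: use the PFN stored in the PTE as-is.

    For a huge-page mapping that PFN names the first base page of the huge
    page, so the offset of `addr` inside the huge page is lost.
    """
    return pte_pfn


def pfn_with_shift(pte_pfn: int, addr: int, shift: int) -> int:
    """Post-fix model: honour the page shift of the mapping.

    If the mapping is larger than a base page, OR in the index of the base
    page that contains `addr` within the huge page.
    """
    if shift <= PAGE_SHIFT:
        return pte_pfn
    # Offset bits inside the huge page that lie above the base-page offset.
    rpnmask = (1 << shift) - (1 << PAGE_SHIFT)
    return pte_pfn | ((addr & rpnmask) >> PAGE_SHIFT)


def phys_addr(pfn: int) -> int:
    """Base-page-aligned physical address, as printed in the MCE log."""
    return pfn << PAGE_SHIFT


if __name__ == "__main__":
    ea = 0xC000201CC76A9600        # "Effective address" from the PPA-kernel log
    huge_shift = 30                 # assumption: 1 GiB radix linear-map page
    huge_base = 0x0000201CC0000000  # linear-map PA of ea, rounded down to 1 GiB
    pte_pfn = huge_base >> PAGE_SHIFT

    print(f"ignoring shift: {phys_addr(pfn_ignoring_shift(pte_pfn, ea, huge_shift)):016x}")
    print(f"with shift:     {phys_addr(pfn_with_shift(pte_pfn, ea, huge_shift)):016x}")
```

Under these assumptions, the script prints 0000201cc0000000 for the shift-ignoring model, which is the start of the 1 GiB page. The shift-aware model prints 0000201cc76a0000, the same physical address the PPA kernel reported. The model uses OR rather than addition because the huge-page base PFN has its low bits clear, so the two operations give the same result.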