------- Comment From pavra...@in.ibm.com 2018-03-30 01:14 EDT-------
Tested again with the given kernel; dump capture is successful with smt=2 and smt=off.
Sorry for the wrong update in the previous comment; I am not sure what I missed yesterday.

root@ltc-wspoon4:~# uname -a
Linux ltc-wspoon4 4.15.0-12-generic #13~lp1758206 SMP Tue Mar 27 15:20:59 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
root@ltc-wspoon4:~# ppc64_cpu --smt=off
root@ltc-wspoon4:~#
root@ltc-wspoon4:~# echo 1 > /proc/sys/kernel/sysrq
root@ltc-wspoon4:~# echo "c" > /proc/sysrq-trigger
[ 1424.806117] sysrq: SysRq : Trigger a crash
[ 1424.806163] Unable to handle kernel paging request for data at address 0x00000000
[ 1424.806267] Faulting instruction address: 0xc0000000007ec768
[ 1424.806352] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1424.806424] LE SMP NR_CPUS=2048 NUMA PowerNV
[ 1424.806483] Modules linked in: idt_89hpesx(E) at24 ofpart uio_pdrv_genirq cmdlinepart powernv_flash uio mtd opal_prd ipmi_powernv ipmi_devintf ibmpowernv vmx_crypto ipmi_msghandler crct10dif_vpmsum sch_fq_codel ip_tables x_tables autofs4 ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci crc32c_vpmsum drm tg3 libahci
[ 1424.806828] CPU: 0 PID: 3110 Comm: bash Tainted: G E 4.15.0-12-generic #13~lp1758206
[ 1424.806963] NIP: c0000000007ec768 LR: c0000000007ed6a8 CTR: c0000000007ec740
[ 1424.807075] REGS: c000001fce3d39f0 TRAP: 0300 Tainted: G E (4.15.0-12-generic)
[ 1424.807211] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28222222 XER: 20040000
[ 1424.807325] CFAR: c0000000007ed6a4 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
[ 1424.807325] GPR00: c0000000007ed6a8 c000001fce3d3c70 c0000000016eaf00 0000000000000063
[ 1424.807325] GPR04: c000001ff6fbce18 c000001ff6fd4368 9000000000009033 000000000000000a
[ 1424.807325] GPR08: 0000000000000007 0000000000000001 0000000000000000 9000000000001003
[ 1424.807325] GPR12: c0000000007ec740 c000000007a20000 000006127f00ae48 0000000000000000
[ 1424.807325] GPR16: 000006124f78e9f0 000006124f821998 000006124f8219d0 000006124f858204
[ 1424.807325] GPR20: 0000000000000000 0000000000000001 0000000000000000 00007fffd6e57524
[ 1424.807325] GPR24: 00007fffd6e57520 000006124f85afc4 c0000000015e9968 0000000000000002
[ 1424.807325] GPR28: 0000000000000063 0000000000000004 c000000001572a9c c0000000015e9d08
[ 1424.808272] NIP [c0000000007ec768] sysrq_handle_crash+0x28/0x30
[ 1424.808364] LR [c0000000007ed6a8] __handle_sysrq+0xf8/0x2c0
[ 1424.808417] Call Trace:
[ 1424.808468] [c000001fce3d3c70] [c0000000007ed688] __handle_sysrq+0xd8/0x2c0 (unreliable)
[ 1424.808582] [c000001fce3d3d10] [c0000000007edeb4] write_sysrq_trigger+0x64/0x90
[ 1424.808690] [c000001fce3d3d40] [c00000000047dfe8] proc_reg_write+0x88/0xd0
[ 1424.808782] [c000001fce3d3d70] [c0000000003d131c] __vfs_write+0x3c/0x70
[ 1424.808875] [c000001fce3d3d90] [c0000000003d1578] vfs_write+0xd8/0x220
[ 1424.808957] [c000001fce3d3de0] [c0000000003d1898] SyS_write+0x68/0x110
[ 1424.809038] [c000001fce3d3e30] [c00000000000b184] system_call+0x58/0x6c
[ 1424.809139] Instruction dump:
[ 1424.809191] 4bfff9f1 4bfffe50 3c4c00f0 3842e7c0 7c0802a6 60000000 39200001 3d42001c
[ 1424.809294] 394a6db0 912a0000 7c0004ac 39400000 <992a0000> 4e800020 3c4c00f0 3842e790
[ 1424.809399] ---[ end trace a6b92894072107e0 ]---
[ 1425.814557]
[ 1425.814659] Sending IPI to other CPUs
[ 1427[ 1827.188061287,5] OPAL: Switch to big-endian OS
.111853] IPI complete
[ 1428[ 1830.496187306,5] OPAL: Switch to little-endian OS
[ 1832.313865861,3] PHB#0000[0:0]: CRESET: Unexpected slot state 00000102, resetting...
[ 1840.498727171,3] PHB#0003[0:3]: CRESET: Unexpected slot state 00000102, resetting...
[ 1849.245109062,3] PHB#0030[8:0]: CRESET: Unexpected slot state 00000102, resetting...
[ 1851.209060452,3] PHB#0033[8:3]: CRESET: Unexpected slot state 00000102, resetting...
[ 1853.170614858,3] PHB#0034[8:4]: CRESET: Unexpected slot state 00000102, resetting...
.808156] kexec: Starting switchover sequence.
[ 1.199857] integrity: Unable to open file: /etc/keys/x509_ima.der (-2)
[ 1.199861] integrity: Unable to open file: /etc/keys/x509_evm.der (-2)
[ 1.286500] vio vio: uevent: failed to send synthetic uevent
/dev/sdb2: recovering journal
/dev/sdb2: clean, 163655/61054976 files, 17123931/244188416 blocks
[ 6.018312] vio vio: uevent: failed to send synthetic uevent
[ OK ] Started Show Plymouth Boot Screen. plymouth-start.service
[ OK ] Started Forward Password Requests to Plymouth Directory Watch.
[ OK ] Reached target Local Encrypted Volumes.
[ OK ] Started Network Service. systemd-networkd.service
Starting Wait for Network to be Configured...
[ OK ] Reached target Network.
[ 7.934300] PKCS#7 signature not signed with a trusted key
[ 7.934373] PKCS#7 signature not signed with a trusted key
[ 7.935026] PKCS#7 signature not signed with a trusted key
[ 7.935470] PKCS#7 signature not signed with a trusted key
[ 7.936307] PKCS#7 signature not signed with a trusted key
[ 7.937347] PKCS#7 signature not signed with a trusted key
[ 7.938346] PKCS#7 signature not signed with a trusted key
[ 7.939391] PKCS#7 signature not signed with a trusted key
[ 7.939831] PKCS#7 signature not signed with a trusted key
[ OK ] Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
[ OK ] Created slice system-lvm2\x2dpvscan.slice.
Starting LVM2 PV scan on device 8:3...
[ OK ] Started AppArmor initialization. apparmor.service
[ OK ] Started LVM2 PV scan on device 8:3. lvm2-pvscan@8:3.service
[ OK ] Started Flush Journal to Persistent Storage. systemd-journal-flush.service
Starting Create Volatile Files and Directories...
[ OK ] Started Create Volatile Files and Directories. systemd-tmpfiles-setup.service
Starting Update UTMP about System Boot/Shutdown...
Starting Network Time Synchronization...
[ OK ] Started Update UTMP about System Boot/Shutdown. systemd-update-utmp.service
[ OK ] Started Network Time Synchronization. systemd-timesyncd.service
[ OK ] Reached target System Time Synchronized.
[ OK ] Reached target System Initialization.
[ OK ] Started Wait for Network to be Configured. systemd-networkd-wait-online.service
[ OK ] Reached target Network is Online.
Starting Kernel crash dump capture service...
[ 12.548307] kdump-tools[961]: Starting kdump-tools: * running makedumpfile -c -d 31 /proc/vmcore /var/crash/201803300002/dump-incomplete
Copying data : [100.0 %] | eta: 0s
[ 25.779768] kdump-tools[961]: The kernel version is not supported.
[ 25.779891] kdump-tools[961]: The makedumpfile operation may be incomplete.
[ 25.779997] kdump-tools[961]: The dumpfile is saved to /var/crash/201803300002/dump-incomplete.
[ 25.780101] kdump-tools[961]: makedumpfile Completed.
[ 25.786228] kdump-tools[961]: * kdump-tools: saved vmcore in /var/crash/201803300002
[ 29.025567] kdump-tools[961]: * running makedumpfile --dump-dmesg /proc/vmcore /var/crash/201803300002/dmesg.201803300002
[ 29.067040] kdump-tools[961]: The kernel version is not supported.
[ 29.067179] kdump-tools[961]: The makedumpfile operation may be incomplete.
[ 29.067281] kdump-tools[961]: The dmesg log is saved to /var/crash/201803300002/dmesg.201803300002.
[ 29.067401] kdump-tools[961]: makedumpfile Completed.
[ 29.067524] kdump-tools[961]: * kdump-tools: saved dmesg content in /var/crash/201803300002
[ 29.179499] kdump-tools[961]: Fri, 30 Mar 2018 00:02:37 -0500
[ 29.268561] kdump-tools[961]: Rebooting.
[ 29.360898] reboot: Restarting system
[ 1891.310447037,5] OPAL: Reboot request...
root@ltc-wspoon4:~# ppc64_cpu --smt=2
root@ltc-wspoon4:~# ppc64_cpu --smt
SMT=2
root@ltc-wspoon4:~# echo 1 > /proc/sys/kernel/sysrq
root@ltc-wspoon4:~# echo "c" > /proc/sysrq-trigger
[ 304.763364] sysrq: SysRq : Trigger a crash
[ 304.763397] Unable to handle kernel paging request for data at address 0x00000000
[ 304.763441] Faulting instruction address: 0xc0000000007ec768
[ 304.763480] Oops: Kernel access of bad area, sig: 11 [#1]
[ 304.763510] LE SMP NR_CPUS=2048 NUMA PowerNV
[ 304.763543] Modules linked in: idt_89hpesx(E) ipmi_powernv ipmi_devintf vmx_crypto ipmi_msghandler crct10dif_vpmsum opal_prd at24 ofpart uio_pdrv_genirq uio cmdlinepart powernv_flash ibmpowernv mtd sch_fq_codel ip_tables x_tables autofs4 ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci crc32c_vpmsum drm tg3 libahci
[ 304.763732] CPU: 40 PID: 2980 Comm: bash Tainted: G E 4.15.0-12-generic #13~lp1758206
[ 304.763784] NIP: c0000000007ec768 LR: c0000000007ed6a8 CTR: c0000000007ec740
[ 304.763828] REGS: c000000005d539f0 TRAP: 0300 Tainted: G E (4.15.0-12-generic)
[ 304.763879] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28222222 XER: 20040000
[ 304.763929] CFAR: c0000000007ed6a4 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
[ 304.763929] GPR00: c0000000007ed6a8 c000000005d53c70 c0000000016eaf00 0000000000000063
[ 304.763929] GPR04: c000001ff79bce18 c000001ff79d4368 9000000000009033 000000000000000a
[ 304.763929] GPR08: 0000000000000007 0000000000000001 0000000000000000 9000000000001003
[ 304.763929] GPR12: c0000000007ec740 c000000007a3b800 00000d7bc5ccb048 0000000000000000
[ 304.763929] GPR16: 00000d7b8865e9f0 00000d7b886f1998 00000d7b886f19d0 00000d7b88728204
[ 304.763929] GPR20: 0000000000000000 0000000000000001 0000000000000000 00007ffff2a55254
[ 304.763929] GPR24: 00007ffff2a55250 00000d7b8872afc4 c0000000015e9968 0000000000000002
[ 304.763929] GPR28: 0000000000000063 0000000000000004 c000000001572a9c c0000000015e9d08
[ 304.764328] NIP [c0000000007ec768] sysrq_handle_crash+0x28/0x30
[ 304.764367] LR [c0000000007ed6a8] __handle_sysrq+0xf8/0x2c0
[ 304.764396] Call Trace:
[ 304.764414] [c000000005d53c70] [c0000000007ed688] __handle_sysrq+0xd8/0x2c0 (unreliable)
[ 304.764460] [c000000005d53d10] [c0000000007edeb4] write_sysrq_trigger+0x64/0x90
[ 304.764507] [c000000005d53d40] [c00000000047dfe8] proc_reg_write+0x88/0xd0
[ 304.764547] [c000000005d53d70] [c0000000003d131c] __vfs_write+0x3c/0x70
[ 304.764586] [c000000005d53d90] [c0000000003d1578] vfs_write+0xd8/0x220
[ 304.764626] [c000000005d53de0] [c0000000003d1898] SyS_write+0x68/0x110
[ 304.764675] [c000000005d53e30] [c00000000000b184] system_call+0x58/0x6c
[ 304.764749] Instruction dump:
[ 304.764801] 4bfff9f1 4bfffe50 3c4c00f0 3842e7c0 7c0802a6 60000000 39200001 3d42001c
[ 304.764914] 394a6db0 912a0000 7c0004ac 39400000 <992a0000> 4e800020 3c4c00f0 3842e790
[ 304.764973] ---[ end trace f398ca9c3def6fee ]---
[ 305.769801]
[ 305.769885] Sending IPI to other CPUs
[ 307.066750][ 426.346633670,5] OPAL: Switch to big-endian OS
IPI complete
[ 308.752258][ 430.106080901,5] OPAL: Switch to little-endian OS
[ 431.432751400,3] PHB#0000[0:0]: CRESET: Unexpected slot state 00000102, resetting...
[ 440.104674740,3] PHB#0003[0:3]: CRESET: Unexpected slot state 00000102, resetting...
[ 448.361087598,3] PHB#0030[8:0]: CRESET: Unexpected slot state 00000102, resetting...
[ 450.323714101,3] PHB#0033[8:3]: CRESET: Unexpected slot state 00000102, resetting...
[ 452.285139375,3] PHB#0034[8:4]: CRESET: Unexpected slot state 00000102, resetting...
kexec: Starting switchover sequence.
[ 1.209519] integrity: Unable to open file: /etc/keys/x509_ima.der (-2)
[ 1.209524] integrity: Unable to open file: /etc/keys/x509_evm.der (-2)
[ 1.294768] vio vio: uevent: failed to send synthetic uevent
/dev/sdb2: recovering journal
/dev/sdb2: clean, 163660/61054976 files, 17174134/244188416 blocks
[ 5.874570] vio vio: uevent: failed to send synthetic uevent
[ OK ] Started Show Plymouth Boot Screen. plymouth-start.service
[ OK ] Reached target Local Encrypted Volumes.
[ OK ] Started Forward Password Requests to Plymouth Directory Watch.
[ 7.502179] PKCS#7 signature not signed with a trusted key
[ 7.502369] PKCS#7 signature not signed with a trusted key
[ 7.502389] PKCS#7 signature not signed with a trusted key
[ 7.503448] PKCS#7 signature not signed with a trusted key
[ 7.503482] PKCS#7 signature not signed with a trusted key
[ 7.504590] PKCS#7 signature not signed with a trusted key
[ 7.504731] PKCS#7 signature not signed with a trusted key
[ 7.505761] PKCS#7 signature not signed with a trusted key
[ 7.506811] PKCS#7 signature not signed with a trusted key
[ OK ] Started Network Service. systemd-networkd.service
[ OK ] Reached target Network.
Starting Wait for Network to be Configured...
[ OK ] Created slice system-lvm2\x2dpvscan.slice.
Starting LVM2 PV scan on device 8:3...
[ OK ] Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
[ OK ] Started LVM2 PV scan on device 8:3. lvm2-pvscan@8:3.service
[ OK ] Started AppArmor initialization. apparmor.service
[ OK ] Started Flush Journal to Persistent Storage. systemd-journal-flush.service
Starting Create Volatile Files and Directories...
[ OK ] Started Create Volatile Files and Directories. systemd-tmpfiles-setup.service
Starting Network Time Synchronization...
Starting Update UTMP about System Boot/Shutdown...
[ OK ] Started Update UTMP about System Boot/Shutdown. systemd-update-utmp.service
[ OK ] Started Network Time Synchronization. systemd-timesyncd.service
[ OK ] Reached target System Time Synchronized.
[ OK ] Reached target System Initialization.
[ OK ] Started Wait for Network to be Configured. systemd-networkd-wait-online.service
[ OK ] Reached target Network is Online.
Starting Kernel crash dump capture service...
[ 10.834005] kdump-tools[957]: Starting kdump-tools: * running makedumpfile -c -d 31 /proc/vmcore /var/crash/201803300011/dump-incomplete
Copying data : [100.0 %] | eta: 0s
[ 25.067109] kdump-tools[957]: The kernel version is not supported.
[ 25.067229] kdump-tools[957]: The makedumpfile operation may be incomplete.
[ 25.067327] kdump-tools[957]: The dumpfile is saved to /var/crash/201803300011/dump-incomplete.
[ 25.067417] kdump-tools[957]: makedumpfile Completed.
[ 25.076476] kdump-tools[957]: * kdump-tools: saved vmcore in /var/crash/201803300011
[ 27.513300] kdump-tools[957]: * running makedumpfile --dump-dmesg /proc/vmcore /var/crash/201803300011/dmesg.201803300011
[ 27.557637] kdump-tools[957]: The kernel version is not supported.
[ 27.557774] kdump-tools[957]: The makedumpfile operation may be incomplete.
[ 27.557866] kdump-tools[957]: The dmesg log is saved to /var/crash/201803300011/dmesg.201803300011.
[ 27.557975] kdump-tools[957]: makedumpfile Completed.
[ 27.558090] kdump-tools[957]: * kdump-tools: saved dmesg content in /var/crash/201803300011
[ 27.655814] kdump-tools[957]: Fri, 30 Mar 2018 00:11:18 -0500
[ 27.753246] kdump-tools[957]: Rebooting.
[ 27.847115] reboot: Restarting system
[ 489.155244249,5] OPAL: Reboot request...

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1758206

Title:
Ubuntu 18.04 [ WSP DD2.2 with stop4 and stop5 enabled ]: kdump fails to capture dump when smt=2 or off.

Status in The Ubuntu-power-systems project: In Progress
Status in linux package in Ubuntu: In Progress
Status in linux source package in Bionic: In Progress

Bug description:
---Problem Description---
Ubuntu 18.04 [ WSP DD2.2 with stop4 and stop5 enabled ]: kdump fails to capture dump when smt=2 or off.

---Environment--
Kernel Build: 4.15.0-13-generic
System Name : ltc-wspoon4
Model/Type : P9
Platform : BML

---Steps to reproduce--
1. Configure kdump.
2. Set smt=off
# ppc64_cpu --smt=off
3. Trigger crash.
echo 1 > /proc/sys/kernel/sysrq
echo "c" > /proc/sysrq-trigger

---Logs----
root@ltc-wspoon4:~# dpkg -l|grep kexec
ii kexec-tools 1:2.0.16-1ubuntu1 ppc64el tools to support fast kexec reboots
root@ltc-wspoon4:~# makedumpfile -v
makedumpfile: version 1.6.3 (released on 29 Jan 2018)
lzo enabled
snappy disabled

[ 285.519832] [c000001fe2d83de0] [c0000000003d1898] SyS_write+0x68/0x110
[ 285.519926] [c000001fe2d83e30] [c00000000000b184] system_call+0x58/0x6c
[ 285.520007] Instruction dump:
[ 285.520053] 4bfff9f1 4bfffe50 3c4c00f0 3842e800 7c0802a6 60000000 39200001 3d42001c
[ 285.520158] 394a6db0 912a0000 7c0004ac 39400000 <992a0000> 4e800020 3c4c00f0 3842e7d0
[ 285.520261] ---[ end trace 90a666dc7ca6f0ec ]---
[ 286.525787]
[ 286.525883] Sending IPI to other CPUs
[ 28[ 401.296284048,5] OPAL: Switch to big-endian OS
[ 402.297026662,3] OPAL: CPU 0x1 not in OPAL !
6.851284] IPI complete
[ 403.455520784,3] OPAL: CPU 0x1 not in OPAL !nce.
[ 403.455569636,5] OPAL: Switch to little-endian OS
[ 404.455711332,3] OPAL: CPU 0x1 not in OPAL !
[ 404.470276386,3] PHB#0000[0:0]: CRESET: Unexpected slot state 00000102, resetting...
[ 413.140065625,3] PHB#0003[0:3]: CRESET: Unexpected slot state 00000102, resetting...
[ 421.393193605,3] PHB#0030[8:0]: CRESET: Unexpected slot state 00000102, resetting...
[ 423.353977316,3] PHB#0033[8:3]: CRESET: Unexpected slot state 00000102, resetting...
[ 425.314547966,3] PHB#0034[8:4]: CRESET: Unexpected slot state 00000102, resetting...
[ 5.004718] Processor 1 is stuck.
[ 10.007584] Processor 2 is stuck.
[ 15.010425] Processor 3 is stuck.
[ 16.135550] integrity: Unable to open file: /etc/keys/x509_ima.der (-2)
[ 16.135554] integrity: Unable to open file: /etc/keys/x509_evm.der (-2)
[ 16.250952] vio vio: uevent: failed to send synthetic uevent

--== Welcome to Hostboot hostboot-5fc3b52/hbicore.bin ==--

4.52180|secure|SecureROM valid - enabling functionality
4.53193|secure|Booting in non-secure mode.
6.00924|Booting from SBE side 0 on master proc=00050000

There could be a firmware issue here, but the kernel patches below still need to be included to ensure that the kdump kernel captures the dump successfully when SMT is set to 2 or off:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=04b9c96eae72d862726f2f4bfcec2078240c33c5
("powerpc/crash: Remove the test for cpu_online in the IPI callback")

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4145f358644b970fcff293c09fdcc7939e8527d2
("powernv/kdump: Fix cases where the kdump kernel can get HMI's")

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=910961754572a2f4c83ad7e610d180
("powerpc/kdump: Fix powernv build break when KEXEC_CORE=n")

Thanks
Hari

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1758206/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp