** Package changed: ubuntu => linux (Ubuntu)

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1370421

Title: BUG: soft lockup - CPU#15 stuck for 59737s! [genload:22734]

Status in “linux” package in Ubuntu: Incomplete

Bug description:

== Comment: #0 - ABDUL HALEEM <abdha...@in.ibm.com> - 2014-09-01 05:24:37 ==

---Problem Description---
CPU stalls and a soft lockup occur on a CPU while running the ltpstress.sh test of the LTP suite. The detailed syslog and the test logs are attached.

Contact Information = abdha...@in.ibm.com

---uname output---
Linux ubuntu 3.16.0-10-generic #15-Ubuntu SMP Thu Aug 21 16:32:31 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = POWER8

---Debugger---
A debugger is not configured

---Steps to Reproduce---
- Ubuntu 14.10 LE guest running on a Power 8 machine with Power KVM build 2_1_1.8
- Download and build the LTP suite on the guest, then run
  /opt/ltp/testscripts/ltpstress.sh -d /tmp/sardata -l /tmp/ltplog.12028 -m 128 -t 24 -S
- After 2 hours of test run, dmesg starts showing the trace messages below.

syslog:
---------
Aug 31 09:31:59 ubuntu kernel: [83796.274731] Adding 576k swap on swapfile29. Priority:-29 extents:1 across:576k FS
Aug 31 09:32:00 ubuntu in.rshd[8457]: connect from 127.0.0.1 (127.0.0.1)
Aug 31 09:32:01 ubuntu in.rshd[8459]: connect from 127.0.0.1 (127.0.0.1)
Aug 31 09:32:02 ubuntu in.rshd[8461]: connect from 127.0.0.1 (127.0.0.1)
Sep 1 04:42:36 ubuntu kernel: [147953.248523] INFO: rcu_sched detected stalls on CPUs/tasks: { 15} (detected by 2, t=92214 jiffies, g=440674, c=440673, q=304)
Sep 1 04:42:36 ubuntu kernel: [147953.248720] Task dump for CPU 15:
Sep 1 04:42:36 ubuntu kernel: [147953.248725] genload R running task 0 22734 22733 0x00040000
Sep 1 04:42:36 ubuntu kernel: [147953.248730] Call Trace:
Sep 1 04:42:36 ubuntu kernel: [147953.248740] [c0000000033239b0] [c000000000056fe4] ht64_call_hpte_insert1+0x4/0x3c (unreliable)
Sep 1 04:42:36 ubuntu kernel: [147953.248745] [c000000003323ab0] [c0000000000532c8] hash_preload+0x2f8/0x300
Sep 1 04:42:36 ubuntu kernel: [147953.248748] [c000000003323b30] [c00000000004eaf0] update_mmu_cache+0xf0/0x110
Sep 1 04:42:36 ubuntu kernel: [147953.248753] [c000000003323b70] [c00000000023559c] handle_mm_fault+0xa0c/0x11b0
Sep 1 04:42:36 ubuntu kernel: [147953.248758] [c000000003323c10] [c0000000009e58dc] do_page_fault+0x71c/0x990
Sep 1 04:42:36 ubuntu kernel: [147953.248762] [c000000003323e30] [c000000000009568] handle_page_fault+0x10/0x30
Sep 1 04:42:36 ubuntu kernel: [147953.250365] INFO: rcu_sched detected stalls on CPUs/tasks: { 15} (detected by 2, t=16035133 jiffies, g=440674, c=440673, q=304)
Sep 1 04:42:36 ubuntu kernel: [147953.250519] Task dump for CPU 15:
Sep 1 04:42:36 ubuntu kernel: [147953.250522] genload R running task 0 22734 22733 0x00040000
Sep 1 04:42:36 ubuntu kernel: [147953.250525] Call Trace:
Sep 1 04:42:36 ubuntu kernel: [147953.250528] [c0000000033239b0] [c000000000056fe4] ht64_call_hpte_insert1+0x4/0x3c (unreliable)
Sep 1 04:42:36 ubuntu kernel: [147953.250532] [c000000003323ab0] [c0000000000532c8] hash_preload+0x2f8/0x300
Sep 1 04:42:36 ubuntu kernel: [147953.250535] [c000000003323b30] [c00000000004eaf0] update_mmu_cache+0xf0/0x110
Sep 1 04:42:36 ubuntu kernel: [147953.250538] [c000000003323b70] [c00000000023559c] handle_mm_fault+0xa0c/0x11b0
Sep 1 04:42:36 ubuntu kernel: [147953.250541] [c000000003323c10] [c0000000009e58dc] do_page_fault+0x71c/0x990
Sep 1 04:42:36 ubuntu kernel: [147953.250544] [c000000003323e30] [c000000000009568] handle_page_fault+0x10/0x30
Sep 1 04:42:36 ubuntu kernel: [147953.257562] BUG: soft lockup - CPU#15 stuck for 59737s! [genload:22734]
Sep 1 04:42:36 ubuntu kernel: [147953.257647] Modules linked in: nfsv2 nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache pseries_rng rtc_generic e1000 ohci_pci

Other details:
------------------
@ubuntu:/tmp$ lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             16
NUMA node(s):          1
Model:                 IBM pSeries (emulated by qemu)
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0-15

@ubuntu:/tmp$ free
             total       used       free     shared    buffers     cached
Mem:       2072704     892480    1180224        448     274240     132480
-/+ buffers/cache:     485760    1586944
Swap:      3460160      35392    3424768

@ubuntu:/tmp$ uptime
 05:22:02 up 1 day, 19:06, 2 users, load average: 10.67, 9.10, 9.32

Thanks

== Comment: #1 - ABDUL HALEEM <abdha...@in.ibm.com> - 2014-09-01 05:31:58 ==

== Comment: #2 - ABDUL HALEEM <abdha...@in.ibm.com> - 2014-09-01 05:36:48 ==

== Comment: #5 - MAMATHA INAMDAR <mainam...@in.ibm.com> - 2014-09-05 05:03:56 ==

Hi Abdul,
Are you able to recreate this issue?
Please update the bug with your latest test results.

== Comment: #6 - ABDUL HALEEM <abdha...@in.ibm.com> - 2014-09-10 05:55:47 ==

(In reply to comment #5)
> Hi Abdul,
> Are you able to recreate this issue?
> Please update the bug with your latest test results.

Hi Mamatha,
I have started the test again with xmon enabled. I will keep you updated on the status.
Thanks

== Comment: #7 - ABDUL HALEEM <abdha...@in.ibm.com> - 2014-09-10 05:59:17 ==

I have started the test on 3.16.0-14-generic and I still see these messages in syslog:

[ 8075.169576] Unable to find swap-space signature
[ 7452.105450] Unable to find swap-space signature

Should we worry about this? The original problem has not reproduced yet. I will update the bug soon.

== Comment: #8 - Dan Streetman <ddstr...@us.ibm.com> - 2014-09-10 08:44:21 ==

(In reply to comment #7)
> I have started the test on 3.16.0-14-generic and I still see these messages
> in syslog
>
> [ 8075.169576] Unable to find swap-space signature
> [ 7452.105450] Unable to find swap-space signature
>
> should we worry about this.

It looks like you have some kind of tests creating/adding swap files, and I have no idea what those tests look like, so I don't know if this is an expected result of the tests or not. Generally that error means you are trying to swapon a swap file that isn't correctly initialized with mkswap, or its header is corrupted. Assuming your test isn't expecting a failure, you should just run mkswap again on whatever swap file is failing. It looks like "./swapfile01", but since you're using relative paths, I can't tell you where it's located.

== Comment: #9 - ABDUL HALEEM <abdha...@in.ibm.com> - 2014-09-11 04:09:13 ==

Hi,

I recreated the bug on the latest kernel, 3.16.0-14-generic.

If I recall correctly, the scenario in which the kernel triggered the "soft lockup - CPU#15" traces was as follows. During my first test run, the next day I saw that the guest was in the 'paused' state, because the host disk partition on which /var/lib/libvirt/images is mounted was out of space. I freed up the disk space and resumed the guest. My tests were still running, but dmesg showed the trace messages.

So in my last run I recreated a similar scenario with xmon=on and found that the traces are triggered when I suspend and resume the guest while the tests are running, and not by the test itself.

--- Actual steps to reproduce --
- Enable xmon in /etc/default/grub, then run 'update-grub' and 'reboot'
- Run the ltpstress test
- Suspend the guest: 'virsh suspend <guest>'
- After a few seconds, resume the guest. My test keeps running fine
- dmesg shows the original trace messages, as below

When the traces were triggered, the console did not drop into xmon; I guess this might be a different problem. I have kept the system in the same state.

Trace messages:
[84735.190787] Adding 576k swap on swapfile27. Priority:-27 extents:1 across:576k FS
[84735.740298] Adding 576k swap on swapfile28. Priority:-28 extents:1 across:576k FS
[84736.062528] Adding 576k swap on swapfile29. Priority:-29 extents:1 across:576k FS
[84924.032436] BUG: soft lockup - CPU#0 stuck for 104s! [float_bessel:10251]
[84924.032507] Modules linked in: nfsv2 nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache pseries_rng rtc_generic shpchp ohci_pci e1000
[84924.032525] CPU: 0 PID: 10251 Comm: float_bessel Not tainted 3.16.0-14-generic #20-Ubuntu
[84924.032527] task: c000000003100000 ti: c00000003250c000 task.ti: c00000003250c000
[84924.032529] NIP: c0000000000110b4 LR: c0000000000110b4 CTR: 00003fffb4644120
[84924.032531] REGS: c00000003250fb90 TRAP: 0901 Not tainted (3.16.0-14-generic)
[84924.032532] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22002444 XER: 00000000
[84924.032538] CFAR: 00003fffb4645888 SOFTE: 1
GPR00: c00000000000a704 c00000003250fe10 c0000000013d49e0 0000000000000900
GPR04: 0000000000040004 0000000000000000 00000000009c0000 00000000ff001009
GPR08: 000182dee8f4d56f 000000007fefffff 0000000040cc8595 0000000000000000
GPR12: 0000000000002200 00003fffab8658f0
[84924.032552] NIP [c0000000000110b4] arch_local_irq_restore+0x74/0x90
[84924.032554] LR [c0000000000110b4] arch_local_irq_restore+0x74/0x90
[84924.032556] Call Trace:
[84924.032557] [c00000003250fe10] [0000000000002856] 0x2856 (unreliable)
[84924.032561] [c00000003250fe30] [c00000000000a704] ret_from_except_lite+0x30/0x60
[84924.032562] Instruction dump:
[84924.032563] 994d02ba 2fa30000 409e0024 e92d0020 61298000 7d210164 38210020 e8010010
[84924.032566] 7c0803a6 4e800020 60420000 4bff1315 <60000000> 4bffffe4 60420000 e92d0020
[84926.062119] Adding 576k swap on ./swapfile01. Priority:-2 extents:1 across:576k FS
[84936.733247] Adding 65472k swap on ./swapfile01. Priority:-2 extents:2 across:114624k

Thanks

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1370421/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
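
For triaging logs like the ones in this report, here is a minimal sketch of how the soft lockup lines could be pulled out of saved dmesg or syslog text. It assumes the message format seen in the traces in this bug ("BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]"). The names SoftLockup and find_soft_lockups are made up for this illustration and are not part of LTP or any kernel tooling.

```python
import re
from typing import List, NamedTuple


class SoftLockup(NamedTuple):
    """One parsed 'BUG: soft lockup' report (hypothetical helper type)."""
    cpu: int
    seconds: int
    comm: str
    pid: int


# Assumed format, taken from the traces in this bug, e.g.
# "BUG: soft lockup - CPU#15 stuck for 59737s! [genload:22734]"
_SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]*):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text: str) -> List[SoftLockup]:
    """Return every soft lockup report found in log_text, in order."""
    return [
        SoftLockup(
            cpu=int(m.group("cpu")),
            seconds=int(m.group("secs")),
            comm=m.group("comm"),
            pid=int(m.group("pid")),
        )
        for m in _SOFT_LOCKUP.finditer(log_text)
    ]


if __name__ == "__main__":
    # The two lockup lines quoted in this bug report.
    sample = (
        "Sep 1 04:42:36 ubuntu kernel: [147953.257562] BUG: soft lockup - "
        "CPU#15 stuck for 59737s! [genload:22734]\n"
        "[84924.032436] BUG: soft lockup - CPU#0 stuck for 104s! "
        "[float_bessel:10251]\n"
    )
    for hit in find_soft_lockups(sample):
        hours = hit.seconds / 3600
        print(f"CPU#{hit.cpu}: {hit.comm} (pid {hit.pid}) "
              f"stuck for {hit.seconds}s (~{hours:.1f}h)")
```

Run against the two excerpts in this bug, the sketch reports genload on CPU#15 (59737s, roughly 16.6 hours) and float_bessel on CPU#0 (104s). Durations that large probably reflect the time the guest spent paused rather than time actually spent spinning, which would fit the suspend/resume trigger described in comment #9.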