------- Comment From pavsu...@in.ibm.com 2015-12-11 08:36 EDT-------

I have downloaded the test kernel as suggested and executed the scenario again.

root@powerkvmpok002:~/lp1483343# dpkg -i linux-image-3.19.0-41-generic_3.19.0-41.46~14.04.2_ppc64el.deb
Selecting previously unselected package linux-image-3.19.0-41-generic.
(Reading database ... 142221 files and directories currently installed.)
Preparing to unpack linux-image-3.19.0-41-generic_3.19.0-41.46~14.04.2_ppc64el.deb ...
Done.
Unpacking linux-image-3.19.0-41-generic (3.19.0-41.46~14.04.2) ...
Setting up linux-image-3.19.0-41-generic (3.19.0-41.46~14.04.2) ...
Running depmod.
update-initramfs: deferring update (hook will be called later)
Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 3.19.0-41-generic /boot/vmlinux-3.19.0-41-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 3.19.0-41-generic /boot/vmlinux-3.19.0-41-generic
update-initramfs: Generating /boot/initrd.img-3.19.0-41-generic
run-parts: executing /etc/kernel/postinst.d/update-notifier 3.19.0-41-generic /boot/vmlinux-3.19.0-41-generic
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 3.19.0-41-generic /boot/vmlinux-3.19.0-41-generic
Generating grub configuration file ...
Found linux image: /boot/vmlinux-4.4.0-040400rc4-generic
Found initrd image: /boot/initrd.img-4.4.0-040400rc4-generic
Found linux image: /boot/vmlinux-3.19.0-41-generic
Found initrd image: /boot/initrd.img-3.19.0-41-generic
Found linux image: /boot/vmlinux-3.19.0-39-generic
Found initrd image: /boot/initrd.img-3.19.0-39-generic
Found Debian GNU/Linux (8.2) on /dev/sdb2
Found unknown Linux distribution on /dev/mapper/ibmpkvm_vg_root01-ibmpkvm_lv_system
done
root@powerkvmpok002:~/lp1483343# dpkg -i linux-image-extra-3.19.0-41-generic_3.19.0-41.46~14.04.2_ppc64el.deb
Selecting previously unselected package linux-image-extra-3.19.0-41-generic.
(Reading database ... 143108 files and directories currently installed.)
Preparing to unpack linux-image-extra-3.19.0-41-generic_3.19.0-41.46~14.04.2_ppc64el.deb ...
Unpacking linux-image-extra-3.19.0-41-generic (3.19.0-41.46~14.04.2) ...
Setting up linux-image-extra-3.19.0-41-generic (3.19.0-41.46~14.04.2) ...
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 3.19.0-41-generic /boot/vmlinux-3.19.0-41-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 3.19.0-41-generic /boot/vmlinux-3.19.0-41-generic
update-initramfs: Generating /boot/initrd.img-3.19.0-41-generic
run-parts: executing /etc/kernel/postinst.d/update-notifier 3.19.0-41-generic /boot/vmlinux-3.19.0-41-generic
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 3.19.0-41-generic /boot/vmlinux-3.19.0-41-generic
Generating grub configuration file ...
Found linux image: /boot/vmlinux-4.4.0-040400rc4-generic
Found initrd image: /boot/initrd.img-4.4.0-040400rc4-generic
Found linux image: /boot/vmlinux-3.19.0-41-generic
Found initrd image: /boot/initrd.img-3.19.0-41-generic
Found linux image: /boot/vmlinux-3.19.0-39-generic
Found initrd image: /boot/initrd.img-3.19.0-39-generic
Found Debian GNU/Linux (8.2) on /dev/sdb2
Found unknown Linux distribution on /dev/mapper/ibmpkvm_vg_root01-ibmpkvm_lv_system
done

root@powerkvmpok002:~# uname -a
Linux powerkvmpok002 3.19.0-41-generic #46~14.04.2 SMP Thu Dec 10 15:55:32 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
root@powerkvmpok002:~# cd /opt/ltp/testcases/bin/
root@powerkvmpok002:/opt/ltp/testcases/bin# ./lock_torture.sh
./lock_torture.sh: 55: ./lock_torture.sh: tst_kvercmp: not found
lock_torture 1 TINFO : estimate time 6.00 min
lock_torture 1 TINFO : spin_lock: running 60 sec...
lock_torture 1 TPASS : spin_lock: completed
lock_torture 2 TINFO : spin_lock_irq: running 60 sec...

root@powerkvmpok002:~# [ 6816.082218] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [lock_torture_wr:3486]
[ 6828.106219] NMI watchdog: BUG: soft lockup - CPU#57 stuck for 21s! [lock_torture_wr:3403]
[ 6828.214216] NMI watchdog: BUG: soft lockup - CPU#141 stuck for 21s! [lock_torture_wr:3606]
[ 6844.222215] NMI watchdog: BUG: soft lockup - CPU#144 stuck for 23s! [lock_torture_wr:3507]
[ 6844.234220] NMI watchdog: BUG: soft lockup - CPU#149 stuck for 21s! [lock_torture_wr:3475]
[ 6856.082221] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 23s! [lock_torture_wr:3345]
[ 6856.094227] NMI watchdog: BUG: soft lockup - CPU#35 stuck for 21s! [lock_torture_wr:3329]
[ 6856.142223] NMI watchdog: BUG: soft lockup - CPU#95 stuck for 21s! [lock_torture_wr:3283]
[ 6856.162221] NMI watchdog: BUG: soft lockup - CPU#109 stuck for 23s! [lock_torture_wr:3265]
[ 6856.210221] NMI watchdog: BUG: soft lockup - CPU#137 stuck for 21s! [lock_torture_wr:3458]
[ 6860.118223] NMI watchdog: BUG: soft lockup - CPU#68 stuck for 22s! [lock_torture_wr:3409]
[ 6860.182227] NMI watchdog: BUG: soft lockup - CPU#121 stuck for 22s! [lock_torture_wr:3341]
[ 6862.110245] INFO: rcu_sched self-detected stall on CPU
[ 6862.110253] INFO: rcu_sched self-detected stall on CPU
[ 6862.110257] INFO: rcu_sched self-detected stall on CPU

The issue still occurs even with the V2 test kernel provided at this link:
http://kernel.ubuntu.com/~jsalisbury/lp1483343/

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1483343

Title:
  NMI watchdog: BUG: soft lockup errors when we execute lock_torture_wr tests

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Vivid:
  In Progress

Bug description:
  ---Problem Description---
  NMI watchdog: BUG: soft lockup errors when we execute lock_torture_wr tests

  ---uname output---
  Linux alp15 3.19.0-18-generic #18~14.04.1-Ubuntu SMP Wed May 20 09:40:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

  Machine Type = P8

  ---Steps to Reproduce---
  Install a P8 Power VM LPAR with the Ubuntu 14.04.2 ISO. Then install the Ubuntu 14.04.3 kernel on it and reboot. Then compile and build the latest LTP test suite on it.

  root@alp15:~# tar -xvf ltp-full-20150420.tar.bz2
  root@alp15:~# cd ltp-full-20150420/
  root@alp15:~/ltp-full-20150420# ls
  aclocal.m4 configure execltp.in install-sh Makefile README runltplite.sh testcases utils
  autom4te.cache configure.ac IDcheck.sh lib Makefile.release README.kernel_config runtest testscripts ver_linux
  config.guess COPYING include ltpmenu missing runalltests.sh scenario_groups TODO VERSION
  config.sub doc INSTALL m4 pan runltp scripts tools
  root@alp15:~/ltp-full-20150420# ./configure
  root@alp15:~/ltp-full-20150420# make
  root@alp15:~/ltp-full-20150420# make install

  root@alp15:/opt/ltp/testcases/bin# ./lock_torture.sh
  lock_torture 1 TINFO : estimate time 6.00 min
  lock_torture 1 TINFO : spin_lock: running 60 sec...

  Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
  alp15 vmunix: [ 308.034386] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 21s! [lock_torture_wr:2337]

  Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
  alp15 vmunix: [ 308.034389] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [lock_torture_wr:2331]

  Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
  alp15 vmunix: [ 308.034394] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [lock_torture_wr:2339]

  Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
  alp15 vmunix: [ 308.034396] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [lock_torture_wr:2346]

  Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
  alp15 vmunix: [ 308.034398] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 21s! [lock_torture_wr:2334]

  Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
  alp15 vmunix: [ 308.034410] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [lock_torture_wr:2321]

  Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
  alp15 vmunix: [ 308.034412] NMI watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [lock_torture_wr:2333]

  Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
  alp15 vmunix: [ 308.038386] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [lock_torture_wr:2327]

  Stack trace output:

  root@alp15:~# dmesg | more
  [ 1717.146881] lock_torture_wr R running task
  [ 1717.146881]
  [ 1717.146885] 0 2555 2 0x00000804
  [ 1717.146887] Call Trace:
  [ 1717.146894] [c000000c7551b820] [c000000c7551b860] 0xc000000c7551b860 (unreliable)
  [ 1717.146899] [c000000c7551b860] [c0000000000b4fb0] __do_softirq+0x220/0x3b0
  [ 1717.146904] [c000000c7551b960] [c0000000000b5478] irq_exit+0x98/0x100
  [ 1717.146909] [c000000c7551b980] [c00000000001fa54] timer_interrupt+0xa4/0xe0
  [ 1717.146913] [c000000c7551b9b0] [c000000000002758] decrementer_common+0x158/0x180
  [ 1717.146922] --- interrupt: 901 at _raw_write_lock+0x68/0xc0
  [ 1717.146922] LR = torture_rwlock_write_lock+0x28/0x40 [locktorture]
  [ 1717.146927] [c000000c7551bca0] [c000000c7551bcd0] 0xc000000c7551bcd0 (unreliable)
  [ 1717.146934] [c000000c7551bcd0] [d00000000d4810b8] torture_rwlock_write_lock+0x28/0x40 [locktorture]
  [ 1717.146939] [c000000c7551bcf0] [d00000000d480578] lock_torture_writer+0x98/0x210 [locktorture]
  [ 1717.146944] [c000000c7551bd80] [c0000000000da4d4] kthread+0x114/0x140
  [ 1717.146948] [c000000c7551be30] [c00000000000956c] ret_from_kernel_thread+0x5c/0x70
  [ 1717.146951] Task dump for CPU 10:
  [ 1717.146953] lock_torture_wr R running task 0 2537 2 0x00000804
  [ 1717.146957] Call Trace:
  [ 1717.146961] [c000000c7557b820] [c000000c7557b860] 0xc000000c7557b860 (unreliable)
  [ 1717.146966] [c000000c7557b860] [c0000000000b4fb0] __do_softirq+0x220/0x3b0
  [ 1717.146970] [c000000c7557b960] [c0000000000b5478] irq_exit+0x98/0x100
  [ 1717.146975] [c000000c7557b980] [c00000000001fa54] timer_interrupt+0xa4/0xe0
  [ 1717.146979] [c000000c7557b9b0] [c000000000002758] decrementer_common+0x158/0x180
  [ 1717.146988] --- interrupt: 901 at _raw_write_lock+0x68/0xc0
  [ 1717.146988] LR = torture_rwlock_write_lock+0x28/0x40 [locktorture]
  [ 1717.146993] [c000000c7557bca0] [c000000c7557bcd0] 0xc000000c7557bcd0 (unreliable)
  [ 1717.147000] [c000000c7557bcd0] [d00000000d4810b8] torture_rwlock_write_lock+0x28/0x40 [locktorture]
  [ 1717.147006] [c000000c7557bcf0] [d00000000d480578] lock_torture_writer+0x98/0x210 [locktorture]
  [ 1717.147013] [c000000c7557bd80] [c0000000000da4d4] kthread+0x114/0x140
  [ 1717.147017] [c000000c7557be30] [c00000000000956c] ret_from_kernel_thread+0x5c/0x70
  [ 1717.147020] Task dump for CPU 17:
  [ 1717.147021] Task dump for CPU 2:
  [ 1717.147028] lock_torture_wr R
  [ 1717.147028] lock_torture_wr R running task
  [ 1717.147033] running task 0 2547 2 0x00000804
  [ 1717.147042] 0 2533 2 0x00000804
  [ 1717.147044] Call Trace:
  [ 1717.147045] Call Trace:
  [ 1717.147053] [c000000c732a3820] [c000000c7f688448] 0xc000000c7f688448
  [ 1717.147056] [c000000c7555f820] [c000000c7fa48448] 0xc000000c7fa48448
  [ 1717.147059] (unreliable)
  [ 1717.147063] (unreliable)
  [ 1717.147063]
  [ 1717.147067]
  [ 1717.147072] Task dump for CPU 18:
  [ 1717.147073] Task dump for CPU 7:
  [ 1717.147077] lock_torture_wr R running task
  [ 1717.147082] lock_torture_wr R 0 2555 2 0x00000804
  [ 1717.147088] running task
  [ 1717.147088] Call Trace:
  [ 1717.147096] [c000000c7551b820] [c000000c7551b860] 0xc000000c7551b860
  [ 1717.147096] 0 2559 2 0x00000804
  [ 1717.147102] Call Trace:
  [ 1717.147105] (unreliable)

  It is possible that we are missing this commit, which fixes a deadlock during these tests:

  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=f548d99ef4f5ec8f7080e88ad07c44d16d058ddc

  I will check the Ubuntu source shortly to see if this is the case; if so, we can suggest building a kernel to see if it helps.

  Running apt-get source linux-image- on the test system didn't pull down the sources, but the kernel being used is close to the one used for vivid (3.19.0-25.26), so I pulled down the git source tree for it with

  git clone git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git

  and the resulting source shows that the patch for the commit mentioned is not applied.

  As I understand it, the problem that was fixed is this: torture_rwlock_read_lock_irq() acquires a read lock on the lock called torture_rwlock, but its counterpart torture_rwlock_read_unlock_irq(), which is supposed to relinquish that read lock, instead does a write_unlock_irqrestore() on torture_rwlock, in essence leaving the read lock held. So when the locktorture module calls something like torture_rwlock_write_lock(), as we see in the bug description, it blocks indefinitely because there is still at least one lock reader.

  I'll go ahead and mirror this since I am pretty confident this is the issue (it should also affect Vivid). We'll have to figure out how to get the sources for the LTS kernel to build a test kernel as well.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1483343/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
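
For illustration only, the unlock mismatch described in the analysis above can be modelled with a toy reader/writer lock in Python. This is not the locktorture code: ToyRWLock, reader_cycle and writer_can_proceed are hypothetical names invented for this sketch, and the model returns False where a real kernel rwlock would spin. It only shows why a read lock released through the write-unlock path leaves the reader count non-zero, so a later writer can never get in.

```python
class ToyRWLock:
    """Toy reader/writer lock state: a reader count plus a writer flag.

    Hypothetical model for illustration; a real rwlock spins or sleeps
    instead of returning False.
    """

    def __init__(self):
        self.readers = 0
        self.writer = False

    def read_lock(self):
        # Readers may enter only while no writer holds the lock.
        if self.writer:
            return False
        self.readers += 1
        return True

    def read_unlock(self):
        # Correct release of a read lock: drop the reader count.
        self.readers -= 1

    def write_unlock(self):
        # Release of a write lock: clears the writer flag only.
        self.writer = False

    def try_write_lock(self):
        # A writer needs exclusive access: no writer and no readers.
        if self.writer or self.readers > 0:
            return False
        self.writer = True
        return True


def reader_cycle(lock, buggy):
    """Take a read lock, then release it correctly or via the wrong path."""
    lock.read_lock()
    if buggy:
        lock.write_unlock()  # mismatch: reader count is left untouched
    else:
        lock.read_unlock()


def writer_can_proceed(buggy):
    """Return True if a writer can take the lock after one reader cycle."""
    lock = ToyRWLock()
    reader_cycle(lock, buggy)
    return lock.try_write_lock()
```

With the correct unlock, writer_can_proceed(False) succeeds; with the mismatched unlock, writer_can_proceed(True) fails because the stale reader is never released, which is the toy-model analogue of the writer threads spinning until the watchdog reports a soft lockup.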