You have been subscribed to a public bug:

== Comment: #0 - PAVITHRA R. PRAKASH <pavra...@in.ibm.com> - 2017-05-17 05:55:38 ==
--- Problem description ----

Ubuntu 16.04.03: "NMI watchdog: BUG: soft lockup" occurs while running
stress-ng on a PowerNV machine.

--- Steps to recreate------

1. Install Ubuntu 16.04.03.
2. Run "stress-ng -a 0".

Logs:
====

[ 2660.437087] INFO: rcu_sched self-detected stall on CPU
[ 2660.437111]  22-...: (5247 ticks this GP) idle=e19/140000000000001/0 softirq=905/905 fqs=2380 
[ 2660.437114]   (t=5251 jiffies g=95606 c=95605 q=2545946)
[ 2660.437750]  24-...: (5250 ticks this GP) idle=0b7/140000000000001/0 softirq=5805/5805 fqs=2380 
[ 2660.437859]  
[ 2664.172796] NMI watchdog: BUG: soft lockup - CPU#22 stuck for 23s! [stress-ng-mmap:3509]
[ 2664.172808] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 23s! [stress-ng-mrema:3536]
[ 2674.848037] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 33s! [stress-ng-fork:3381]
[ 2676.172894] NMI watchdog: BUG: soft lockup - CPU#30 stuck for 22s! [kswapd0:992]
[ 2680.336844] NMI watchdog: BUG: soft lockup - CPU#98 stuck for 23s! [stress-ng-clock:5099]
[ 2686.140931] NMI watchdog: BUG: soft lockup - CPU#16 stuck for 39s! [stress-ng-clone:3366]
[ 2686.987192] xhci_hcd 0003:09:00.0: HC died; cleaning up
[ 2686.987212] usb 1-3-port3: cannot reset (err = -108)
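To see at a glance which CPUs and tasks are reported as stuck, the soft
lockup lines can be grouped by CPU. The following is only a minimal sketch:
the function name summarize_lockups is hypothetical, and the pattern assumes
the message format shown in the logs above (it also tolerates the
"[task:pid]" part being wrapped onto the next line).

```python
import re
from collections import defaultdict

# Pattern for the "soft lockup" messages shown in the logs above.
# \s+ before "[task:pid]" also matches a newline, so wrapped lines still parse.
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NMI watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!\s+"
    r"\[(?P<task>[^:\]]+):(?P<pid>\d+)\]"
)


def summarize_lockups(log_text):
    """Return {cpu: [(timestamp, seconds_stuck, task, pid), ...]}."""
    by_cpu = defaultdict(list)
    for m in LOCKUP_RE.finditer(log_text):
        by_cpu[int(m["cpu"])].append(
            (float(m["ts"]), int(m["secs"]), m["task"], int(m["pid"]))
        )
    return dict(by_cpu)
```

The function takes the log as a single string, for example the saved output
of dmesg, and returns one list of reports per CPU number.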

After a few hours, the machine becomes completely unresponsive:

[pavithra@localhost ~]$ ping 9.47.69.255
PING 9.47.69.255 (9.47.69.255) 56(84) bytes of data.
^C
--- 9.47.69.255 ping statistics ---
12 packets transmitted, 0 received, 100% packet loss, time 11000ms


Thanks,
Pavithra

== Comment: #6 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2017-05-25 03:32:26 ==
ubuntu@ltc-firep2:~$ hostname -i
9.47.69.255
ubuntu@ltc-firep2:~$ uname -a
Linux ltc-firep2 4.10.0-21-generic #23~16.04.1-Ubuntu SMP Tue May 2 12:54:57 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
ubuntu@ltc-firep2:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.2 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.2 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
ubuntu@ltc-firep2:~$ tail /proc/cpuinfo
processor       : 159
cpu             : POWER8 (raw), altivec supported
clock           : 2061.000000MHz
revision        : 2.0 (pvr 004d 0200)

timebase        : 512000000
platform        : PowerNV
model           : 8335-GTA        
machine         : PowerNV 8335-GTA        
firmware        : OPAL
ubuntu@ltc-firep2:~$


== Comment: #11 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2017-05-25 06:27:12 ==
System Memory stats
==============
ubuntu@ltc-firep2:~$ numactl -H
available: 2 nodes (0,8)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
node 0 size: 61321 MB
node 0 free: 60297 MB
node 8 cpus: 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
node 8 size: 65303 MB
node 8 free: 64923 MB
node distances:
node   0   8 
  0:  10  40 
  8:  40  10 
ubuntu@ltc-firep2:~$ free -h 
              total        used        free      shared  buff/cache   available
Mem:           123G        534M        122G         20M        868M        121G
Swap:           37G          0B         37G
ubuntu@ltc-firep2:~$ sudo sysctl vm | grep free
vm.min_free_kbytes = 360448
ubuntu@ltc-firep2:~$ 

The host has 123 GB of memory spread across two NUMA nodes.
Swap is configured at 37 GB and vm.min_free_kbytes is set to 360448
(roughly 360 MB).
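As a quick check of the figures above, the per-node sizes and the
min_free_kbytes value can be put side by side. This is only a rough sketch:
the function name min_free_summary is hypothetical, and it assumes that the
"MB" reported by numactl and the kB used by sysctl are binary units.

```python
# Values copied from the numactl -H and sysctl output above.
NODE_SIZES_MB = {0: 61321, 8: 65303}
MIN_FREE_KBYTES = 360448


def min_free_summary(node_sizes_mb, min_free_kbytes):
    """Relate vm.min_free_kbytes to the total memory across NUMA nodes."""
    total_mb = sum(node_sizes_mb.values())
    min_free_mb = min_free_kbytes / 1024  # kB -> MB, assuming binary units
    return {
        "total_mb": total_mb,
        "total_gb": total_mb / 1024,
        "min_free_mb": min_free_mb,
        "min_free_pct": 100 * min_free_mb / total_mb,
    }


if __name__ == "__main__":
    for key, value in min_free_summary(NODE_SIZES_MB, MIN_FREE_KBYTES).items():
        print(f"{key}: {value:.3f}")
```

Under those assumptions the two nodes add up to 126624 MB, in line with the
123G total shown by free -h, and the min free reserve is a small fraction of
one percent of that total.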

== Comment: #13 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2017-05-25 07:25:35 ==

[  280.494345] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [stress-ng-mmap:4172]
[  280.495250] CPU: 5 PID: 4172 Comm: stress-ng-mmap Not tainted 4.10.0-21-generic #23~16.04.1-Ubuntu
[  280.495262] task: c000000fe318c600 task.stack: c000000fc0d7c000
[  280.495271] NIP: c0000000001a3248 LR: c0000000001a3204 CTR: c0000000000871f0
[  280.495285] REGS: c000000fc0d7f7d0 TRAP: 0901   Not tainted  (4.10.0-21-generic)
[  280.495299] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[  280.495408]   CR: 44424444  XER: 20000000
[  280.495416] CFAR: c0000000001a3250 SOFTE: 1 
[  280.495624] NIP [c0000000001a3248] smp_call_function_many+0x358/0x3f0
[  280.495636] LR [c0000000001a3204] smp_call_function_many+0x314/0x3f0
[  280.495645] Call Trace:
[  280.495660] [c000000fc0d7fa50] [c0000000001a31e4] smp_call_function_many+0x2f4/0x3f0 (unreliable)
[  280.495697] [c000000fc0d7fac0] [c0000000001a3430] kick_all_cpus_sync+0x40/0x50
[  280.495726] [c000000fc0d7fae0] [c000000000069728] hash__pmdp_huge_get_and_clear+0xa8/0xf0
[  280.495742] [c000000fc0d7fb10] [c00000000032b600] change_huge_pmd+0x210/0x2d0
[  280.495762] [c000000fc0d7fb80] [c0000000002df638] change_protection_range+0xb38/0xe60
[  280.495789] [c000000fc0d7fcc0] [c00000000030994c] change_prot_numa+0x3c/0xc0
[  280.495815] [c000000fc0d7fcf0] [c00000000012e854] task_numa_work+0x2d4/0x3f0
[  280.495844] [c000000fc0d7fdb0] [c00000000010f330] task_work_run+0x140/0x1a0
[  280.495868] [c000000fc0d7fe00] [c00000000001db04] do_notify_resume+0xe4/0xf0
[  280.495885] [c000000fc0d7fe30] [c00000000000b744] ret_from_except_lite+0x70/0x74
[  280.495909] Instruction dump:
[  280.495925] 3d020003 78691f24 39480fe0 7d2a482a e95d0000 7d4a4a14 812a0018 792707e1 
[  280.496022] 4182001c 60420000 7c210b78 7c421378 <812a0018> 792807e1 4082fff0 7c2004ac 


[  636.509312] NMI watchdog: BUG: soft lockup - CPU#29 stuck for 22s! [stress-ng-mrema:4205]
[  636.510076] CPU: 29 PID: 4205 Comm: stress-ng-mrema Tainted: G             L  4.10.0-21-generic #23~16.04.1-Ubuntu
[  636.510090] task: c000000fdef86e00 task.stack: c000000fdd074000
[  636.510104] NIP: c0000000001a3244 LR: c0000000001a3204 CTR: c0000000000871f0
[  636.510136] REGS: c000000fdd077760 TRAP: 0901   Tainted: G             L   (4.10.0-21-generic)
[  636.510146] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[  636.510302]   CR: 44484824  XER: 20000000
[  636.510319] CFAR: c0000000001a3250 SOFTE: 1 
[  636.510620] NIP [c0000000001a3244] smp_call_function_many+0x354/0x3f0
[  636.510647] LR [c0000000001a3204] smp_call_function_many+0x314/0x3f0
[  636.510658] Call Trace:
[  636.510676] [c000000fdd0779e0] [c0000000001a31e4] smp_call_function_many+0x2f4/0x3f0 (unreliable)
[  636.510759] [c000000fdd077a50] [c0000000001a3430] kick_all_cpus_sync+0x40/0x50
[  636.510791] [c000000fdd077a70] [c00000000006f350] pmdp_invalidate+0x80/0xc0
[  636.510820] [c000000fdd077aa0] [c000000000327d7c] __split_huge_pmd_locked+0x5bc/0xaa0
[  636.510842] [c000000fdd077b60] [c00000000032b834] __split_huge_pmd+0x174/0x280
[  636.510876] [c000000fdd077bc0] [c00000000032bc04] vma_adjust_trans_huge+0x134/0x1a0
[  636.510909] [c000000fdd077c10] [c0000000002da1e4] __vma_adjust+0x114/0x8e0
[  636.510932] [c000000fdd077cf0] [c0000000002dac2c] __split_vma.isra.5+0x27c/0x2a0
[  636.510969] [c000000fdd077d40] [c0000000002dbb34] do_munmap+0x134/0x480
[  636.510991] [c000000fdd077db0] [c0000000002e1550] SyS_mremap+0x1f0/0x550
[  636.511029] [c000000fdd077e30] [c00000000000b184] system_call+0x38/0xe0
[  636.511048] Instruction dump:
[  636.511065] 409dfda4 3d020003 78691f24 39480fe0 7d2a482a e95d0000 7d4a4a14 812a0018 
[  636.511184] 792707e1 4182001c 60420000 7c210b78 <7c421378> 812a0018 792807e1 4082fff0 

Even after increasing vm.min_free_kbytes to 2 GB, soft lockups and hangs are
still seen after running stress-ng. This appears to be a kernel issue.
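Both traces above are stuck in smp_call_function_many, reached through
kick_all_cpus_sync from a huge-page PMD update. To compare traces like these
mechanically, the frame names can be extracted and intersected. The sketch
below is only illustrative: the helper names frames and common_frames are
hypothetical, and the pattern assumes the "[stack] [address] symbol+0x..."
frame layout printed above. It points at shared code paths and does not by
itself identify a root cause.

```python
import re

# One call-trace frame: "[<16 hex>] [<16 hex>] symbol+0xoff/0xlen".
# \s+ also matches a newline, so frames wrapped by a mail client still parse.
FRAME_RE = re.compile(
    r"\[[0-9a-f]{16}\]\s+\[[0-9a-f]{16}\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def frames(trace_text):
    """Return the symbol names of all call-trace frames, innermost first."""
    return FRAME_RE.findall(trace_text)


def common_frames(trace_a, trace_b):
    """Return the symbols of trace_a that also appear in trace_b, in order."""
    in_b = set(frames(trace_b))
    return [name for name in frames(trace_a) if name in in_b]
```

Passing the two call traces from this comment as strings returns the frames
they share, which narrows down where both CPUs were spinning.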

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: architecture-ppc64le bot-comment bugnameltc-154759 severity-high targetmilestone-inin16043
-- 
Ubuntu 16.04.03: "NMI watchdog: BUG: soft lockup" occurs while running 
stress-ng on PowerNV machine.
https://bugs.launchpad.net/bugs/1693566
You received this bug notification because you are a member of Kernel Packages, 
which is subscribed to linux in Ubuntu.