------- Comment From jos...@br.ibm.com 2017-11-30 08:18 EDT-------
Hello!

I'm also trying to reproduce the problem on the QEMU/KVM side, but I
haven't hit it so far.

My setup:

[1]
host: 8247-42L
kernel: 4.13.0-16-generic
guest: vanilla ubuntu-16.04.3-server-ppc64el.iso

[2]
host: 8335-GCA
kernel: 4.13.0-17-generic
guest: vanilla ubuntu-16.04.3-server-ppc64el.iso

I tried several command-line combinations, with both the KVM PR and KVM
HV modules, and everything works flawlessly.

After reading the logs (comment #10), the following line caught my
attention:

interrupt: 901 at plpar_hcall_norets+0x1c/0x28
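To check whether that same interrupted location shows up repeatedly, one
option is to count such lines across the guest logs. This is a minimal
Python sketch, assuming every relevant line has the exact
"interrupt: <trap> at <symbol>+<offset>/<size>" shape quoted above; the
function name interrupted_symbols is hypothetical, not part of any
existing tool.

```python
import re
from collections import Counter

# Matches lines such as "interrupt: 901 at plpar_hcall_norets+0x1c/0x28".
# Assumption: the trap number is hex and the location is symbol+offset/size.
INTERRUPT_RE = re.compile(
    r"interrupt:\s+(?P<trap>[0-9a-fA-F]+)\s+at\s+"
    r"(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)"
)


def interrupted_symbols(lines):
    """Count (trap, symbol) pairs found in 'interrupt: ... at ...' log lines."""
    counts = Counter()
    for line in lines:
        match = INTERRUPT_RE.search(line)
        if match:
            counts[(match.group("trap"), match.group("symbol"))] += 1
    return counts
```

For example, passing it an open file handle for a saved guest console log
and printing counts.most_common() would show whether plpar_hcall_norets
dominates the interrupted locations.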

Searching the code, I found pieces like:

for_each_online_cpu(cpu)
        plpar_hcall_norets(...);

So I'm thinking that *maybe*, if one of your hardware threads died and
KVM allocated that core, it could trigger the issue. If that's the case,
setting the processor affinity may make the error reproducible
consistently.
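If the hypothesis holds, pinning the guest to a fixed set of host cores
should turn an intermittent hang into a consistent one (or make it go
away). Below is a minimal Python sketch, assuming the host numbering shown
by ppc64_cpu --info (at SMT=8, core N owns logical CPUs N*8 through
N*8+7). threads_of_core and pin_process_to_cores are hypothetical
helpers; os.sched_setaffinity is the standard Linux-only call. Affinity
is per thread, so the sketch walks /proc/<pid>/task to cover all of
QEMU's vCPU threads.

```python
import os

SMT = 8  # threads per core, as set with "ppc64_cpu --smt=8" on the hosts above


def threads_of_core(core, smt=SMT):
    """Logical CPU ids of one host core.

    Assumes the numbering printed by "ppc64_cpu --info":
    core N owns CPUs N*smt .. N*smt + smt - 1.
    """
    if core < 0 or smt <= 0:
        raise ValueError("core must be >= 0 and smt must be > 0")
    return set(range(core * smt, (core + 1) * smt))


def pin_process_to_cores(pid, cores, smt=SMT):
    """Pin every existing thread of `pid` (e.g. a QEMU process) to `cores`.

    Linux-only. Affinity is per thread, so each task under
    /proc/<pid>/task is updated individually. Raises OSError if none
    of the requested CPUs are usable.
    """
    cpus = set()
    for core in cores:
        cpus |= threads_of_core(core, smt)
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            os.sched_setaffinity(int(tid), cpus)
        except ProcessLookupError:
            pass  # thread exited between listdir() and the call
    return cpus
```

For a hypothetical qemu_pid, pin_process_to_cores(qemu_pid, [3]) would
confine the guest to host CPUs 24-31; from a shell,
taskset -a -c -p 24-31 <qemu-pid> is roughly equivalent.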

In my case, all cores and threads look good:

$ sudo ppc64_cpu --smt=8
$ sudo ppc64_cpu --info
Core   0:    0*    1*    2*    3*    4*    5*    6*    7*
Core   1:    8*    9*   10*   11*   12*   13*   14*   15*
Core   2:   16*   17*   18*   19*   20*   21*   22*   23*
Core   3:   24*   25*   26*   27*   28*   29*   30*   31*
Core   4:   32*   33*   34*   35*   36*   37*   38*   39*
Core   5:   40*   41*   42*   43*   44*   45*   46*   47*
Core   6:   48*   49*   50*   51*   52*   53*   54*   55*
Core   7:   56*   57*   58*   59*   60*   61*   62*   63*
Core   8:   64*   65*   66*   67*   68*   69*   70*   71*
Core   9:   72*   73*   74*   75*   76*   77*   78*   79*
Core  10:   80*   81*   82*   83*   84*   85*   86*   87*
Core  11:   88*   89*   90*   91*   92*   93*   94*   95*
Core  12:   96*   97*   98*   99*  100*  101*  102*  103*
Core  13:  104*  105*  106*  107*  108*  109*  110*  111*
Core  14:  112*  113*  114*  115*  116*  117*  118*  119*
Core  15:  120*  121*  122*  123*  124*  125*  126*  127*
Core  16:  128*  129*  130*  131*  132*  133*  134*  135*
Core  17:  136*  137*  138*  139*  140*  141*  142*  143*
Core  18:  144*  145*  146*  147*  148*  149*  150*  151*
Core  19:  152*  153*  154*  155*  156*  157*  158*  159*
Core  20:  160*  161*  162*  163*  164*  165*  166*  167*
Core  21:  168*  169*  170*  171*  172*  173*  174*  175*
Core  22:  176*  177*  178*  179*  180*  181*  182*  183*
Core  23:  184*  185*  186*  187*  188*  189*  190*  191*
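Every thread in the listing above carries a '*'. When comparing output
from several hosts, a small parser can flag the threads that lack it.
This is a minimal Python sketch, assuming (as in the listing above) that
a trailing '*' marks an online thread and its absence an offline one;
offline_threads is a hypothetical helper, not part of powerpc-utils.

```python
import re

# One line of "ppc64_cpu --info", e.g. "Core   0:    0*    1*  ...".
CORE_RE = re.compile(r"^\s*Core\s+(\d+):(.*)$")
# A thread id optionally followed by '*' (assumed to mean "online").
THREAD_RE = re.compile(r"(\d+)(\*?)")


def offline_threads(info_text):
    """Return {core: [offline thread ids]} for cores with any offline thread."""
    result = {}
    for line in info_text.splitlines():
        match = CORE_RE.match(line)
        if not match:
            continue  # skip anything that is not a "Core N:" line
        core = int(match.group(1))
        offline = [int(cpu) for cpu, star in THREAD_RE.findall(match.group(2))
                   if not star]
        if offline:
            result[core] = offline
    return result
```

For example, pass it the stdout of
subprocess.run(["ppc64_cpu", "--info"], capture_output=True, text=True)
run with the same privileges as the sudo commands above; an empty dict
means every thread is online.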

Could you guys turn on all your core threads as well and send me the output?

Thank you

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1733864

Title:
  kernel 4.10.0-40 is hanging with a CPU soft lock

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  In Progress

Bug description:
  Kernel 4.10.0-40-generic is causing CPU hangs on POWER machines. I hit
  this problem on a POWER8 KVM virtual machine.

  [ 1912.003255] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 24s! [dpkg-deb:31284]
  [ 1912.004496] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs 
xfs ipt_REJECT nf_reject_ipv4 xfrm_user xfrm_algo xt_addrtype xt_conntrack 
br_netfilter ebtable_filter ebtables ip6table_filter ip6_tables ib_srpt 
dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio tcm_qla2xxx qla2xxx 
vhost_scsi vhost usb_f_tcm tcm_usb_gadget libcomposite udc_core tcm_fc libfc 
scsi_transport_fc tcm_loop iscsi_target_mod target_core_file target_core_iblock 
target_core_pscsi target_core_mod ipmi_devintf ipmi_msghandler xt_CHECKSUM 
iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat xt_tcpudp 
iptable_filter ip_tables x_tables openvswitch nf_conntrack_ipv6 nf_nat_ipv6 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack 
binfmt_misc zfs(PO) zunicode(PO) zavl(PO) zcommon(PO)
  [ 1912.004575]  znvpair(PO) spl(O) bridge 8021q garp mrp stp llc vmx_crypto 
kvm ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp 
libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 
multipath linear ibmvscsi ibmveth crc32c_vpmsum virtio_blk
  [ 1912.004624] CPU: 12 PID: 31284 Comm: dpkg-deb Tainted: P           O    4.10.0-40-generic #44~16.04.1-Ubuntu
  [ 1912.004626] task: c000000775551e00 task.stack: c0000007755ac000
  [ 1912.004627] NIP: 00003fff86b71960 LR: 00003fff86b7319c CTR: 000000000000002d
  [ 1912.004628] REGS: c0000007755afea0 TRAP: 0901   Tainted: P           O     (4.10.0-40-generic)
  [ 1912.004629] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>
  [ 1912.004635]   CR: 42004442  XER: 20000000
  [ 1912.004636] CFAR: 00003fff86b719b4 SOFTE: 1 
                 GPR00: 00000000000000a4 00003fffd53f7d70 00003fff86ba5008 0000000000000040
                 GPR04: 00000000038a20fc 00003fff86467d4b 00000000036c0ed8 000000000000002a
                 GPR08: 00003fff81c41010 00000000000a20f5 0000000000800001 ffffffffffec0ed1
                 GPR12: 00000000000000a6 00003fff86c8db30
  [ 1912.004646] NIP [00003fff86b71960] 0x3fff86b71960
  [ 1912.004647] LR [00003fff86b7319c] 0x3fff86b7319c
  [ 1912.004647] Call Trace:
