------- Comment From jos...@br.ibm.com 2017-11-30 08:18 EDT-------
Hello!

I'm also trying to reproduce the problem on the QEMU/KVM side, but I haven't hit it so far.
My setup:

  [1] host: 8247-42L, kernel: 4.13.0-16-generic
      guest: vanilla ubuntu-16.04.3-server-ppc64el.iso
  [2] host: 8335-GCA, kernel: 4.13.0-17-generic
      guest: vanilla ubuntu-16.04.3-server-ppc64el.iso

I tried several command-line combinations with both the KVM PR and KVM HV modules, and everything works flawlessly.

After reading the logs (comment #10), the following line caught my attention:

  interrupt: 901 at plpar_hcall_norets+0x1c/0x28

Searching the code, I found pieces like:

  for_each_online_cpu(cpu)
      plpar_hcall_norets(...);

So I'm thinking that *maybe*, if one of your hardware threads died and KVM allocated that core, it could trigger the issue. If that's the case, setting the processor affinity may make the error reproduce consistently.

In my case, all cores and threads look good:

  $ sudo ppc64_cpu --smt=8
  $ sudo ppc64_cpu --info
  Core 0: 0* 1* 2* 3* 4* 5* 6* 7*
  Core 1: 8* 9* 10* 11* 12* 13* 14* 15*
  Core 2: 16* 17* 18* 19* 20* 21* 22* 23*
  Core 3: 24* 25* 26* 27* 28* 29* 30* 31*
  Core 4: 32* 33* 34* 35* 36* 37* 38* 39*
  Core 5: 40* 41* 42* 43* 44* 45* 46* 47*
  Core 6: 48* 49* 50* 51* 52* 53* 54* 55*
  Core 7: 56* 57* 58* 59* 60* 61* 62* 63*
  Core 8: 64* 65* 66* 67* 68* 69* 70* 71*
  Core 9: 72* 73* 74* 75* 76* 77* 78* 79*
  Core 10: 80* 81* 82* 83* 84* 85* 86* 87*
  Core 11: 88* 89* 90* 91* 92* 93* 94* 95*
  Core 12: 96* 97* 98* 99* 100* 101* 102* 103*
  Core 13: 104* 105* 106* 107* 108* 109* 110* 111*
  Core 14: 112* 113* 114* 115* 116* 117* 118* 119*
  Core 15: 120* 121* 122* 123* 124* 125* 126* 127*
  Core 16: 128* 129* 130* 131* 132* 133* 134* 135*
  Core 17: 136* 137* 138* 139* 140* 141* 142* 143*
  Core 18: 144* 145* 146* 147* 148* 149* 150* 151*
  Core 19: 152* 153* 154* 155* 156* 157* 158* 159*
  Core 20: 160* 161* 162* 163* 164* 165* 166* 167*
  Core 21: 168* 169* 170* 171* 172* 173* 174* 175*
  Core 22: 176* 177* 178* 179* 180* 181* 182* 183*
  Core 23: 184* 185* 186* 187* 188* 189* 190* 191*

Could you also turn on all the threads on your cores (as above) and send me the output?
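If the listing is long, a dead hardware thread can be spotted by looking for thread IDs printed without the `*` marker. The Python sketch below does this check. It assumes, as in the listing above, that `ppc64_cpu --info` appends `*` to each online thread. The function name `find_offline_threads` is hypothetical and is not part of powerpc-utils.

```python
import re
from typing import Dict, List

# Matches one "Core N: ..." line of `ppc64_cpu --info` output.
CORE_RE = re.compile(r"^\s*Core\s+(\d+):\s*(.*)$")


def find_offline_threads(info_text: str) -> Dict[int, List[int]]:
    """Return {core: [thread ids]} for threads printed without a trailing '*'.

    Assumption: '*' marks an online hardware thread, as in the listing
    pasted in this comment. Cores whose threads are all online are left
    out of the result, so an empty dict means every thread is online.
    """
    offline: Dict[int, List[int]] = {}
    for line in info_text.splitlines():
        m = CORE_RE.match(line)
        if not m:
            continue  # skip shell prompts, blank lines and other noise
        core = int(m.group(1))
        for token in m.group(2).split():
            if not token.endswith("*"):
                offline.setdefault(core, []).append(int(token))
    return offline


if __name__ == "__main__":
    # Hypothetical sample in which thread 10 on core 1 is offline.
    sample = """\
Core 0: 0* 1* 2* 3* 4* 5* 6* 7*
Core 1: 8* 9* 10 11* 12* 13* 14* 15*
"""
    print(find_offline_threads(sample))  # {1: [10]}
```

For the host output above, the function returns an empty dict. Any non-empty result would point to the cores worth checking against the CPUs that the guest's vCPUs were pinned to.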
Thank you

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1733864

Title:
  kernel 4.10.0-40 is hanging with a CPU soft lock

Status in The Ubuntu-power-systems project:
  In Progress

Status in linux package in Ubuntu:
  In Progress

Bug description:
  Kernel 4.10.0-40-generic is causing CPU hangs on POWER machines.

  I got this problem on a POWER8 KVM virtual machine:

  [ 1912.003255] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 24s! [dpkg-deb:31284]
  [ 1912.004496] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_reject_ipv4 xfrm_user xfrm_algo xt_addrtype xt_conntrack br_netfilter ebtable_filter ebtables ip6table_filter ip6_tables ib_srpt dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio tcm_qla2xxx qla2xxx vhost_scsi vhost usb_f_tcm tcm_usb_gadget libcomposite udc_core tcm_fc libfc scsi_transport_fc tcm_loop iscsi_target_mod target_core_file target_core_iblock target_core_pscsi target_core_mod ipmi_devintf ipmi_msghandler xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat xt_tcpudp iptable_filter ip_tables x_tables openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack binfmt_misc zfs(PO) zunicode(PO) zavl(PO) zcommon(PO)
  [ 1912.004575]  znvpair(PO) spl(O) bridge 8021q garp mrp stp llc vmx_crypto kvm ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ibmvscsi ibmveth crc32c_vpmsum virtio_blk
  [ 1912.004624] CPU: 12 PID: 31284 Comm: dpkg-deb Tainted: P           O    4.10.0-40-generic #44~16.04.1-Ubuntu
  [ 1912.004626] task: c000000775551e00 task.stack: c0000007755ac000
  [ 1912.004627] NIP: 00003fff86b71960 LR: 00003fff86b7319c CTR: 000000000000002d
  [ 1912.004628] REGS: c0000007755afea0 TRAP: 0901   Tainted: P           O     (4.10.0-40-generic)
  [ 1912.004629] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>
  [ 1912.004635]   CR: 42004442  XER: 20000000
  [ 1912.004636] CFAR: 00003fff86b719b4 SOFTE: 1
                 GPR00: 00000000000000a4 00003fffd53f7d70 00003fff86ba5008 0000000000000040
                 GPR04: 00000000038a20fc 00003fff86467d4b 00000000036c0ed8 000000000000002a
                 GPR08: 00003fff81c41010 00000000000a20f5 0000000000800001 ffffffffffec0ed1
                 GPR12: 00000000000000a6 00003fff86c8db30
  [ 1912.004646] NIP [00003fff86b71960] 0x3fff86b71960
  [ 1912.004647] LR [00003fff86b7319c] 0x3fff86b7319c
  [ 1912.004647] Call Trace: