Public bug reported:

== Comment: #0 - NAGESWARA R. SASTRY <nasas...@in.ibm.com> - 2017-07-04 07:20:40 ==
---Problem Description---
Kernel stack traces and soft lockups are seen even with little load on the system.

Contact Information = nasas...@in.ibm.com

---uname output---
Linux ltc-boston25 4.12.0-041200rc6-generic #201706191233 SMP Mon Jun 19 17:38:35 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = Boston Power9

---Debugger---
A debugger is not configured

---Steps to Reproduce---
The stack traces appear even under light load. Start 2-3 guests or run some workload.

Stack trace output:
[56954.376790] NMI watchdog: BUG: soft lockup - CPU#44 stuck for 22s! [CPU 3/KVM:48428]
[56954.376949] Modules linked in: vhost_net vhost tap xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc kvm_hv kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter input_leds joydev mac_hid idt_89hpesx at24 nvmem_core ofpart binfmt_misc cmdlinepart powernv_flash mtd uio_pdrv_genirq uio ibmpowernv opal_prd ipmi_powernv ipmi_devintf vmx_crypto ipmi_msghandler ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure scsi_transport_sas
[56954.377200] hid_generic usbhid hid ast i2c_algo_bit ttm drm_kms_helper syscopyarea crct10dif_vpmsum sysfillrect crc32c_vpmsum sysimgblt fb_sys_fops drm i40e aacraid
[56954.377263] CPU: 44 PID: 48428 Comm: CPU 3/KVM Not tainted 4.12.0-041200rc6-generic #201706191233
[56954.377269] task: c000200a74621000 task.stack: c000200a811cc000
[56954.377277] NIP: c0000000001a4464 LR: c0000000001a4424 CTR: c00000000008a970
[56954.377283] REGS: c000200a811cf7c0 TRAP: 0901 Not tainted (4.12.0-041200rc6-generic)
[56954.377287] MSR: 900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
[56954.377325] CR: 48244224 XER: 20000000
[56954.377328] CFAR: c0000000001a4470 SOFTE: 1
GPR00: c0000000001a4404 c000200a811cfa40 c000000001503800 0000000000000022
GPR04: 0000000000000022 0000000000000022 0000000000000000 0000000000000001
GPR08: 0000000000000001 0000000000000003 c000200a95599518 c000000001541ec0
GPR12: c00000000008a970 c00000000fadce00
[56954.377403] NIP [c0000000001a4464] smp_call_function_many+0x374/0x400
[56954.377414] LR [c0000000001a4424] smp_call_function_many+0x334/0x400
[56954.377417] Call Trace:
[56954.377429] [c000200a811cfa40] [c0000000001a4404] smp_call_function_many+0x314/0x400 (unreliable)
[56954.377444] [c000200a811cfac0] [c0000000001a462c] kick_all_cpus_sync+0x3c/0x50
[56954.377459] [c000200a811cfae0] [c00000000006e01c] pmdp_invalidate+0x7c/0xc0
[56954.377473] [c000200a811cfb10] [c0000000003315f4] change_huge_pmd+0x1d4/0x290
[56954.377486] [c000200a811cfb80] [c0000000002e59c8] change_protection_range+0xa38/0xcf0
[56954.377500] [c000200a811cfcc0] [c00000000030ff88] change_prot_numa+0x38/0xb0
[56954.377514] [c000200a811cfcf0] [c000000000130584] task_numa_work+0x2c4/0x3e0
[56954.377527] [c000200a811cfdb0] [c000000000115060] task_work_run+0x140/0x1a0
[56954.377541] [c000200a811cfe00] [c00000000001cd14] do_notify_resume+0xf4/0x100
[56954.377556] [c000200a811cfe30] [c00000000000b544] ret_from_except_lite+0x70/0x74
[56954.377562] Instruction dump:
[56954.377571] 409dfd80 3d420004 394aa580 78691f24 7d2a482a e95e0000 7d4a4a14 812a0018
[56954.377597] 71280001 4182001c 60420000 7c210b78 <7c421378> 812a0018 71270001 4082fff0
[56998.366340] NMI watchdog: BUG: soft lockup - CPU#37 stuck for 22s! [qemu-system-ppc:47850]
[56998.366461] Modules linked in: vhost_net vhost tap xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc kvm_hv kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter input_leds joydev mac_hid idt_89hpesx at24 nvmem_core ofpart binfmt_misc cmdlinepart powernv_flash mtd uio_pdrv_genirq uio ibmpowernv opal_prd ipmi_powernv ipmi_devintf vmx_crypto ipmi_msghandler ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure scsi_transport_sas
[56998.366630] hid_generic usbhid hid ast i2c_algo_bit ttm drm_kms_helper syscopyarea crct10dif_vpmsum sysfillrect crc32c_vpmsum sysimgblt fb_sys_fops drm i40e aacraid
[56998.366670] CPU: 37 PID: 47850 Comm: qemu-system-ppc Tainted: G L 4.12.0-041200rc6-generic #201706191233
[56998.366674] task: c000200a19be7e00 task.stack: c000200a1f4ec000
[56998.366678] NIP: c0000000001a4464 LR: c0000000001a4424 CTR: c00000000008a970
[56998.366682] REGS: c000200a1f4ef730 TRAP: 0901 Tainted: G L (4.12.0-041200rc6-generic)
[56998.366684] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[56998.366704] CR: 48484224 XER: 20000000
[56998.366706] CFAR: c0000000001a4470 SOFTE: 1
GPR00: c0000000001a4404 c000200a1f4ef9b0 c000000001503800 0000000000000022
GPR04: 0000000000000022 0000000000000022 0000000000000000 0000000000000001
GPR08: 0000000000000001 0000000000000003 c000200a95599438 c000000001541ec0
GPR12: c00000000008a970 c00000000fad8480
[56998.366753] NIP [c0000000001a4464] smp_call_function_many+0x374/0x400
[56998.366758] LR [c0000000001a4424] smp_call_function_many+0x334/0x400
[56998.366761] Call Trace:
[56998.366767] [c000200a1f4ef9b0] [c0000000001a4404] smp_call_function_many+0x314/0x400 (unreliable)
[56998.366777] [c000200a1f4efa30] [c0000000001a462c] kick_all_cpus_sync+0x3c/0x50
[56998.366785] [c000200a1f4efa50] [c00000000006e01c] pmdp_invalidate+0x7c/0xc0
[56998.366793] [c000200a1f4efa80] [c00000000032e1ec] __split_huge_pmd_locked+0x5dc/0xae0
[56998.366799] [c000200a1f4efb40] [c0000000003318a4] __split_huge_pmd+0x164/0x270
[56998.366806] [c000200a1f4efba0] [c000000000331c44] vma_adjust_trans_huge+0x134/0x1a0
[56998.366815] [c000200a1f4efbf0] [c0000000002e0754] __vma_adjust+0x104/0x880
[56998.366821] [c000200a1f4efcd0] [c0000000002e1e34] __split_vma+0x174/0x290
[56998.366828] [c000200a1f4efd20] [c0000000002e2100] do_munmap+0x170/0x4f0
[56998.366835] [c000200a1f4efd90] [c0000000002e24f8] vm_munmap+0x78/0xd0
[56998.366842] [c000200a1f4efdf0] [c0000000002e258c] SyS_munmap+0x3c/0x50
[56998.366850] [c000200a1f4efe30] [c00000000000af84] system_call+0x38/0xe0
[56998.366854] Instruction dump:
[56998.366860] 409dfd80 3d420004 394aa580 78691f24 7d2a482a e95e0000 7d4a4a14 812a0018
[56998.366877] 71280001 4182001c 60420000 7c210b78 <7c421378> 812a0018 71270001 4082fff0

Oops output:
no

System Dump Info:
The system is not configured to capture a system dump.

*Additional Instructions for nasas...@in.ibm.com:
-Attach sysctl -a output to the bug.

== Comment: #1 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2017-07-14 03:52:11 ==
$ git log 3d3efb68c19e -1
commit 3d3efb68c19e539f0535c93a5258c1299270215f
Author: Paul Mackerras <pau...@ozlabs.org>
Date:   Tue Jun 6 14:35:30 2017 +1000

    KVM: PPC: Book3S HV: Ignore timebase offset on POWER9 DD1

    POWER9 DD1 has an erratum where writing to the TBU40 register, which is used to apply an offset to the timebase, can cause the timebase to lose counts. This results in the timebase on some CPUs getting out of sync with other CPUs, which then results in misbehaviour of the timekeeping code.

    To work around the problem, we make KVM ignore the timebase offset for all guests on POWER9 DD1 machines. This means that live migration cannot be supported on POWER9 DD1 machines.

    Cc: sta...@vger.kernel.org # v4.10+
    Signed-off-by: Paul Mackerras <pau...@ozlabs.org>
$ git tag --contains 3d3efb68c19e
v4.12
v4.12-rc7
$

Commit 3d3efb68c19e fixes this issue; it is available from 4.12-rc7 onwards.

** Affects: ubuntu-power-systems
   Importance: Undecided
   Assignee: Canonical Kernel Team (canonical-kernel-team)
   Status: New

** Affects: linux (Ubuntu)
   Importance: Undecided
   Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
   Status: New

** Tags: architecture-ppc64le bugnameltc-156333 severity-high targetmilestone-inin1710

** Tags added: architecture-ppc64le bugnameltc-156333 severity-high targetmilestone-inin1710

** Changed in: ubuntu
   Assignee: (unassigned) => Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)

** Package changed: ubuntu => linux (Ubuntu)

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1704343

Title:
  Ubuntu17.10: Changing guest timebase offset breaks host

Status in The Ubuntu-power-systems project:
  New
Status in linux package in Ubuntu:
  New