This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1568729

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1568729

Title:
  divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  While running qemu 2.5 on a trusty host running 4.4.0-15.31~14.04.1, the host system has crashed (load > 200) 3 times in the last 3 days, always with this stack trace:

  Apr 9 19:01:09 cnode9.0 kernel: [197071.195577] divide error: 0000 [#1] SMP
  Apr 9 19:01:09 cnode9.0 kernel: [197071.195633] Modules linked in: vhost_net vhost macvtap macvlan arc4 md4 nls_utf8 cifs nfnetlink_queue nfnetlink xt_CHECKSUM xt_nat iptable_nat nf_nat_ipv4 xt_NFQUEUE xt_CLASSIFY ip6table_mangle sch_sfq sch_htb veth dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag ebtable_filter ebtables nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables iptable_mangle xt_CT iptable_raw xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter ip_tables x_tables dummy bridge stp llc ipmi_ssif ipmi_devintf intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm dcdbas irqbypass crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev input_leds nf_nat_ftp sb_edac nf_conntrack_ftp edac_core cdc_ether nf_nat_pptp usbnet nf_conntrack_pptp mii nf_nat_proto_gre lpc_ich nf_nat_sip ioatdma nf_nat nf_conntrack_sip nfsd ipmi_si 8250_fintek nf_conntrack_proto_gre ipmi_msghandler acpi_pad wmi shpchp nf_conntrack acpi_power_meter mac_hid auth_rpcgss nfs_acl bonding nfs lp lockd parport grace sunrpc fscache tcp_htcp xfs btrfs hid_generic usbhid hid raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor ixgbe raid6_pq libcrc32c igb vxlan raid1 i2c_algo_bit ip6_udp_tunnel dca udp_tunnel ahci raid0 ptp libahci megaraid_sas multipath pps_core mdio linear fjes
  Apr 9 19:01:09 cnode9.0 kernel: [197071.197014] CPU: 13 PID: 3147726 Comm: ceph-osd Not tainted 4.4.0-15-generic #31~14.04.1-Ubuntu
  Apr 9 19:01:09 cnode9.0 kernel: [197071.197085] Hardware name: Dell Inc. PowerEdge R720/0XH7F2, BIOS 2.5.2 01/28/2015
  Apr 9 19:01:09 cnode9.0 kernel: [197071.197154] task: ffff88252be1ee00 ti: ffff8824fc0d4000 task.ti: ffff8824fc0d4000
  Apr 9 19:01:09 cnode9.0 kernel: [197071.197221] RIP: 0010:[<ffffffff810afec8>] [<ffffffff810afec8>] task_numa_find_cpu+0x238/0x700
  Apr 9 19:01:09 cnode9.0 kernel: [197071.197300] RSP: 0000:ffff8824fc0d7ba8 EFLAGS: 00010257
  Apr 9 19:01:09 cnode9.0 kernel: [197071.197340] RAX: 0000000000000000 RBX: ffff8824fc0d7c48 RCX: 0000000000000000
  Apr 9 19:01:09 cnode9.0 kernel: [197071.197406] RDX: 0000000000000000 RSI: ffff88479f180000 RDI: ffff884782a47600
  Apr 9 19:01:09 cnode9.0 kernel: [197071.197473] RBP: ffff8824fc0d7c10 R08: 0000000102eea157 R09: 00000000000001a8
  Apr 9 19:01:09 cnode9.0 kernel: [197071.197540] R10: 000000000002404b R11: 000000000000023f R12: ffff882380930000
  Apr 9 19:01:09 cnode9.0 kernel: [197071.197606] R13: 0000000000000008 R14: 000000000000008c R15: 0000000000000124
  Apr 9 19:01:09 cnode9.0 kernel: [197071.197673] FS: 00007f19aab5b700(0000) GS:ffff88479f180000(0000) knlGS:0000000000000000
  Apr 9 19:01:09 cnode9.0 kernel: [197071.197741] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  Apr 9 19:01:09 cnode9.0 kernel: [197071.197782] CR2: 0000000025469600 CR3: 00000023846bc000 CR4: 00000000000426e0
  Apr 9 19:01:09 cnode9.0 kernel: [197071.197848] Stack:
  Apr 9 19:01:09 cnode9.0 kernel: [197071.197880] ffffffff817425fb ffff8829af3e9e00 00000000000000f6 ffff88252be1ee00
  Apr 9 19:01:09 cnode9.0 kernel: [197071.197965] 000000000000008d 0000000000000225 0000000000016d40 000000000000008d
  Apr 9 19:01:09 cnode9.0 kernel: [197071.198047] ffff88252be1ee00 00000000000001ad ffff8824fc0d7c48 00000000000000e1
  Apr 9 19:01:09 cnode9.0 kernel: [197071.198132] Call Trace:
  Apr 9 19:01:09 cnode9.0 kernel: [197071.198172] [<ffffffff817425fb>] ? tcp_schedule_loss_probe+0x12b/0x1b0
  Apr 9 19:01:09 cnode9.0 kernel: [197071.198219] [<ffffffff810b0830>] task_numa_migrate+0x4a0/0x930
  Apr 9 19:01:09 cnode9.0 kernel: [197071.198264] [<ffffffff816d2957>] ? release_sock+0x117/0x160
  Apr 9 19:01:09 cnode9.0 kernel: [197071.198306] [<ffffffff810b0d39>] numa_migrate_preferred+0x79/0x80
  Apr 9 19:01:09 cnode9.0 kernel: [197071.198350] [<ffffffff810b557d>] task_numa_fault+0x91d/0xcc0
  Apr 9 19:01:09 cnode9.0 kernel: [197071.198395] [<ffffffff811d35ae>] ? mpol_misplaced+0x14e/0x190
  Apr 9 19:01:09 cnode9.0 kernel: [197071.198439] [<ffffffff811b06b8>] handle_pte_fault+0x5a8/0x14c0
  Apr 9 19:01:09 cnode9.0 kernel: [197071.198485] [<ffffffff810f8531>] ? futex_wake+0x81/0x150
  Apr 9 19:01:09 cnode9.0 kernel: [197071.198526] [<ffffffff810b0de4>] ? set_next_entity+0xa4/0x700
  Apr 9 19:01:09 cnode9.0 kernel: [197071.198569] [<ffffffff810fab44>] ? do_futex+0xf4/0x4d0
  Apr 9 19:01:09 cnode9.0 kernel: [197071.198610] [<ffffffff811b2440>] handle_mm_fault+0x250/0x540
  Apr 9 19:01:09 cnode9.0 kernel: [197071.198654] [<ffffffff81067d19>] __do_page_fault+0x199/0x430
  Apr 9 19:01:09 cnode9.0 kernel: [197071.198696] [<ffffffff81067fd2>] do_page_fault+0x22/0x30
  Apr 9 19:01:09 cnode9.0 kernel: [197071.198740] [<ffffffff817ef878>] page_fault+0x28/0x30
  Apr 9 19:01:09 cnode9.0 kernel: [197071.198775] Code: 4d b0 4c 89 f7 e8 29 d5 ff ff 48 8b 4d b0 49 8b 86 b0 00 00 00 31 d2 48 0f af 81 d8 01 00 00 49 8b 4e 78 4c 8b 73 78 48 83 c1 01 <48> f7 f1 48 8b 4b 20 49 89 c1 48 29 c1 4c 03 4b 48 4c 39 7d d0
  Apr 9 19:01:09 cnode9.0 kernel: [197071.199217] RIP [<ffffffff810afec8>] task_numa_find_cpu+0x238/0x700
  Apr 9 19:01:09 cnode9.0 kernel: [197071.199264] RSP <ffff8824fc0d7ba8>
  Apr 9 19:01:09 cnode9.0 kernel: [197071.199900] ---[ end trace e938a840610a79f7 ]---

  This appears to be the same bug as reported upstream in
  http://lkml.iu.edu/hypermail/linux/kernel/1603.2/01659.html

  According to this thread the issue is:

    27:   48 83 c1 01   add $0x1,%rcx
    2b:*  48 f7 f1      div %rcx   <-- trapping instruction

  This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load:

    update_cfs_rq_h_load(cfs_rq);
    return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
                    cfs_rq_load_avg(cfs_rq) + 1);

  So the load avg is -1, and after adding 1 the divisor is 0, which triggers the divide error. This matches RCX: 0000000000000000 in the register dump above.

  The LKML reporter's fix was to include the patches to kernel/sched/fair.c up to 4.5. A specific patch was not identified.

  Please backport these patches for Xenial and the lts-xenial kernel in trusty.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1568729/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
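
For anyone who wants to see the arithmetic from the bug description in isolation, here is a minimal user-space sketch. It is not kernel code and not the upstream fix. The names div64_ul_sketch, divisor_wraps_to_zero and task_h_load_sketch are hypothetical, and the sketch assumes the load average is a 64-bit unsigned value, so that "-1" is stored as all ones and adding 1 wraps around to 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's div64_ul().
 * Assumption: a plain 64-bit unsigned division. The caller must
 * never pass a zero divisor. */
static uint64_t div64_ul_sketch(uint64_t dividend, uint64_t divisor)
{
    return dividend / divisor;
}

/* True when "load_avg + 1" wraps around to zero, i.e. when the
 * unsigned load average holds the bit pattern of -1. */
static bool divisor_wraps_to_zero(uint64_t cfs_rq_load_avg)
{
    return (uint64_t)(cfs_rq_load_avg + 1) == 0;
}

/* Same expression shape as the task_h_load() snippet quoted in the
 * bug description, with a guard added purely for illustration.
 * Assumption: returning 0 on wraparound is an arbitrary choice made
 * here so the sketch does not trap. */
static uint64_t task_h_load_sketch(uint64_t se_load_avg, uint64_t h_load,
                                   uint64_t cfs_rq_load_avg)
{
    uint64_t divisor = cfs_rq_load_avg + 1;

    if (divisor == 0)
        return 0;
    return div64_ul_sketch(se_load_avg * h_load, divisor);
}
```

Calling task_h_load_sketch(10, 20, 9) divides 200 by 10, while passing UINT64_MAX as the last argument takes the guarded path. Without the guard, the division would have a zero divisor, which is the condition the trace reports at the div %rcx instruction.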