** Description changed:

+ SRU Justification:
+
+ Impact: arch_trigger_all_cpu_backtrace() tries to notify all other
+ CPUs via IPI. To do so, it looks up an IPI hook in the apic structure
+ without checking whether that pointer is NULL.
+
+ Fix: Upstream fixed this by implementing the apic IPI hooks interface.
+ Although some pieces seem unclear, this has not changed in upstream
+ kernels since then, so either it does not matter or those pieces are
+ not used. For now, backport the upstream patch introducing the apic
+ interface (dropping only one unnecessary declaration). This only
+ affects PVM, as HVM emulates the flat apic completely.
+
+ Testcase: Cause a call to arch_trigger_all_cpu_backtrace() (Munehisa,
+ can you provide a simple trigger?).
+
+ ---
+
  arch_trigger_all_cpu_backtrace() tries to send an NMI to all CPUs via
  IPI to get stack traces from them. However, the NMI vector is not
  implemented in a virtualized environment (Xen PV), so the function
  results in an Oops.

  [4746854.099062] INFO: rcu_sched detected stall on CPU 3 (t=15001 jiffies)
  [4746854.099091] BUG: unable to handle kernel paging request at ffffffffff5fb310
  [4746854.099100] IP: [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0
  [4746854.099116] PGD 1c07067 PUD 1c08067 PMD 1dd4067 PTE 0
  [4746854.099126] Oops: 0002 [#1] SMP
  [4746854.099134] CPU 3
  [4746854.099137] Modules linked in: stallmod(O+) isofs acpiphp
  [4746854.099150]
  [4746854.099157] Pid: 4752, comm: insmod Tainted: G O 3.2.0-40-virtual #64-Ubuntu
  [4746854.099174] RIP: e030:[<ffffffff81037cf8>] [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0
  [4746854.099189] RSP: e02b:ffff8803bfd83c68 EFLAGS: 00010046
  [4746854.099198] RAX: 0000000000000000 RBX: ffffffff81cd0060 RCX: 000000000003ffff
  [4746854.099208] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002
  [4746854.099219] RBP: ffff8803bfd83c88 R08: 000000000003ffff R09: 0000000000000000
  [4746854.099229] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000800
  [4746854.099240] R13: 000000000f000000 R14: ffff8803bfd8e700 R15: 0000000000000000
  [4746854.099256] FS: 00007f456d441700(0000) GS:ffff8803bfd80000(0000) knlGS:0000000000000000
  [4746854.099270] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
  [4746854.099279] CR2: ffffffffff5fb310 CR3: 00000003a4180000 CR4: 0000000000002660
  [4746854.099290] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [4746854.099301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  [4746854.099312] Process insmod (pid: 4752, threadinfo ffff8803a48d4000, task ffff8803a6b5c4a0)
  [4746854.099323] Stack:
  [4746854.099328] 0000000000000000 0000000000002710 ffffffff81c31000 ffffffff81c31100
  [4746854.099346] ffff8803bfd83ca8 ffffffff8103333a ffff8803a6e17b00 ffffffff81c31000
  [4746854.099363] ffff8803bfd83cc8 ffffffff810df347 ffff8803bfd8e250 ffff8803bfd8eb80
  [4746854.099382] Call Trace:
  [4746854.099387] <IRQ>
  [4746854.099401] [<ffffffff8103333a>] arch_trigger_all_cpu_backtrace+0x5a/0x90
  [4746854.099416] [<ffffffff810df347>] check_cpu_stall.isra.35+0x97/0xf0
  [4746854.099429] [<ffffffff810df3d8>] __rcu_pending+0x38/0x1d0
  [4746854.099439] [<ffffffff810df869>] rcu_check_callbacks+0x79/0x1e0
  [4746854.099453] [<ffffffff81078098>] update_process_times+0x48/0x90
  [4746854.099466] [<ffffffff8109b864>] tick_sched_timer+0x64/0xc0
  [4746854.099480] [<ffffffff8108dfe8>] __run_hrtimer+0x78/0x1f0
  [4746854.099491] [<ffffffff8109b800>] ? tick_nohz_handler+0x100/0x100
  [4746854.099506] [<ffffffff8105e748>] ? load_balance+0x78/0x370
  [4746854.099520] [<ffffffff8108e917>] hrtimer_interrupt+0xf7/0x230
  [4746854.099535] [<ffffffff8100a817>] xen_timer_interrupt+0x27/0x40
  [4746854.099547] [<ffffffff810d7bb5>] handle_irq_event_percpu+0x55/0x210
  [4746854.099561] [<ffffffff813a6f7e>] ? info_for_irq+0xe/0x30
  [4746854.099572] [<ffffffff810dae67>] handle_percpu_irq+0x47/0x60
  [4746854.099583] [<ffffffff813a6de9>] __xen_evtchn_do_upcall+0x199/0x250
  [4746854.099596] [<ffffffff813a8ecf>] xen_evtchn_do_upcall+0x2f/0x50
  [4746854.099610] [<ffffffff81661b7e>] xen_do_hypervisor_callback+0x1e/0x30
  [4746854.099619] <EOI>
  [4746854.099632] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
  [4746854.099645] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
  [4746854.099659] [<ffffffff813a757e>] ? xen_poll_irq_timeout+0x3e/0x50
  [4746854.099671] [<ffffffff813a9060>] ? xen_poll_irq+0x10/0x20
  [4746854.099683] [<ffffffff8163c200>] ? xen_spin_lock_slow+0x97/0xf2
  [4746854.099695] [<ffffffffa000c000>] ? 0xffffffffa000bfff
  [4746854.099709] [<ffffffff810121da>] ? xen_spin_lock+0x4a/0x50
  [4746854.099722] [<ffffffff816572ce>] ? _raw_spin_lock+0xe/0x20
  [4746854.099734] [<ffffffffa000702b>] ? stall+0x2b/0x44 [stallmod]
  [4746854.099746] [<ffffffffa000c009>] ? init_module+0x9/0x1000 [stallmod]
  [4746854.099758] [<ffffffff81002040>] ? do_one_initcall+0x40/0x180
  [4746854.099771] [<ffffffff810a7abe>] ? sys_init_module+0xbe/0x230
  [4746854.099783] [<ffffffff8165f8c2>] ? system_call_fastpath+0x16/0x1b

  In this case, the function is invoked by the RCU-based stall detector
  when it detects a stalled CPU (i.e. a lockup) in interrupt context. An
  Oops in interrupt context always causes a kernel panic, so this bug
  sometimes makes it difficult to debug a kernel lockup.

  The function is also invoked from sysrq_handle_showallcpus(), which
  collects traces from all active CPUs on demand.

  # echo l > /proc/sysrq-trigger

  This is the easiest way to reproduce the problem.

- [How to fix]
  As far as I can see, one possible solution is to backport the
  following patch. This patch is already included in Quantal's kernel.

  http://lists.xen.org/archives/html/xen-devel/2012-04/msg01023.html

  Another solution is to disable arch_trigger_all_cpu_backtrace() at
  compile time, but I am still investigating which config option
  controls that.

- If you need any other information, please feel free to ask me.
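As a reading aid for the Oops above, the following minimal Python sketch decodes two values from the log: the page-fault error code ("Oops: 0002") and the page offset of the faulting address in CR2. The function and table names are hypothetical helpers, not kernel APIs. The bit meanings follow the x86 page-fault error code layout, and the offsets 0x300/0x310 are the local APIC ICR low/high registers; that the faulting page is the guest's local APIC mapping is an assumption inferred from the flat_send_IPI_all frame, not something the log states directly.

```python
# Hypothetical helpers for reading the Oops above; not kernel code.

PAGE_SIZE = 4096

# Local APIC register offsets within its 4 KiB page (x86 architecture).
APIC_REGS = {
    0x300: "ICR (low half)",
    0x310: "ICR2 (high half, IPI destination)",
}


def decode_pf_error_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),  # 0 means the page was not mapped
        "write": bool(code & 0x2),         # 1 means the access was a write
        "user_mode": bool(code & 0x4),     # 0 means the fault came from kernel mode
    }


def page_offset(addr):
    """Return the offset of an address within its 4 KiB page."""
    return addr & (PAGE_SIZE - 1)


if __name__ == "__main__":
    # Values copied from the log: "Oops: 0002" and "CR2: ffffffffff5fb310".
    print(decode_pf_error_code(0x0002))
    offset = page_offset(0xFFFFFFFFFF5FB310)
    print(hex(offset), APIC_REGS.get(offset, "unknown register"))
```

Read this way, the log describes a kernel-mode write to an unmapped page at an offset that matches the APIC ICR2 register, which is consistent with the "PTE 0" line and with the IPI being sent through the flat apic path on a PV guest.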
-- 
https://bugs.launchpad.net/bugs/1168350

Title:
  arch_trigger_all_cpu_backtrace() results in Oops on virtualized guest