I've updated the description with the simplest testcase.

** Description changed:

  SRU Justification:

  Impact: arch_trigger_all_cpu_backtrace tries to notify all other CPUs via IPI. To do so, it looks up an IPI hook in the apic structure without checking whether that pointer is NULL.

  Fix: Upstream fixed this by implementing the apic IPI hooks interface. Some pieces of that patch seem unclear, but upstream kernels have not changed them since, so either they do not matter or they are not used. For now, backport the upstream patch that introduces the apic interface (dropping only one unnecessary declaration). This affects only PVM, because HVM emulates the flat apic completely.

- Testcase: Cause a call to arch_trigger_all_cpu_backtrace (Munehisa, can
- you provide a simple trigger?).
+ Testcase: Trigger a call to arch_trigger_all_cpu_backtrace with:
+
+ # echo l > /proc/sysrq-trigger

  ---

  arch_trigger_all_cpu_backtrace() tries to send an NMI to all CPUs via IPI to get stack traces from them. But the NMI vector is not implemented in a virtualized environment (Xen PV), and the function results in an Oops.

[4746854.099062] INFO: rcu_sched detected stall on CPU 3 (t=15001 jiffies)
[4746854.099091] BUG: unable to handle kernel paging request at ffffffffff5fb310
[4746854.099100] IP: [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0
[4746854.099116] PGD 1c07067 PUD 1c08067 PMD 1dd4067 PTE 0
[4746854.099126] Oops: 0002 [#1] SMP
[4746854.099134] CPU 3
[4746854.099137] Modules linked in: stallmod(O+) isofs acpiphp
[4746854.099150]
[4746854.099157] Pid: 4752, comm: insmod Tainted: G O 3.2.0-40-virtual #64-Ubuntu
[4746854.099174] RIP: e030:[<ffffffff81037cf8>] [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0
[4746854.099189] RSP: e02b:ffff8803bfd83c68 EFLAGS: 00010046
[4746854.099198] RAX: 0000000000000000 RBX: ffffffff81cd0060 RCX: 000000000003ffff
[4746854.099208] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002
[4746854.099219] RBP: ffff8803bfd83c88 R08: 000000000003ffff R09: 0000000000000000
[4746854.099229] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000800
[4746854.099240] R13: 000000000f000000 R14: ffff8803bfd8e700 R15: 0000000000000000
[4746854.099256] FS: 00007f456d441700(0000) GS:ffff8803bfd80000(0000) knlGS:0000000000000000
[4746854.099270] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[4746854.099279] CR2: ffffffffff5fb310 CR3: 00000003a4180000 CR4: 0000000000002660
[4746854.099290] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[4746854.099301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[4746854.099312] Process insmod (pid: 4752, threadinfo ffff8803a48d4000, task ffff8803a6b5c4a0)
[4746854.099323] Stack:
[4746854.099328] 0000000000000000 0000000000002710 ffffffff81c31000 ffffffff81c31100
[4746854.099346] ffff8803bfd83ca8 ffffffff8103333a ffff8803a6e17b00 ffffffff81c31000
[4746854.099363] ffff8803bfd83cc8 ffffffff810df347 ffff8803bfd8e250 ffff8803bfd8eb80
[4746854.099382] Call Trace:
[4746854.099387] <IRQ>
[4746854.099401] [<ffffffff8103333a>] arch_trigger_all_cpu_backtrace+0x5a/0x90
[4746854.099416] [<ffffffff810df347>] check_cpu_stall.isra.35+0x97/0xf0
[4746854.099429] [<ffffffff810df3d8>] __rcu_pending+0x38/0x1d0
[4746854.099439] [<ffffffff810df869>] rcu_check_callbacks+0x79/0x1e0
[4746854.099453] [<ffffffff81078098>] update_process_times+0x48/0x90
[4746854.099466] [<ffffffff8109b864>] tick_sched_timer+0x64/0xc0
[4746854.099480] [<ffffffff8108dfe8>] __run_hrtimer+0x78/0x1f0
[4746854.099491] [<ffffffff8109b800>] ? tick_nohz_handler+0x100/0x100
[4746854.099506] [<ffffffff8105e748>] ? load_balance+0x78/0x370
[4746854.099520] [<ffffffff8108e917>] hrtimer_interrupt+0xf7/0x230
[4746854.099535] [<ffffffff8100a817>] xen_timer_interrupt+0x27/0x40
[4746854.099547] [<ffffffff810d7bb5>] handle_irq_event_percpu+0x55/0x210
[4746854.099561] [<ffffffff813a6f7e>] ? info_for_irq+0xe/0x30
[4746854.099572] [<ffffffff810dae67>] handle_percpu_irq+0x47/0x60
[4746854.099583] [<ffffffff813a6de9>] __xen_evtchn_do_upcall+0x199/0x250
[4746854.099596] [<ffffffff813a8ecf>] xen_evtchn_do_upcall+0x2f/0x50
[4746854.099610] [<ffffffff81661b7e>] xen_do_hypervisor_callback+0x1e/0x30
[4746854.099619] <EOI>
[4746854.099632] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[4746854.099645] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[4746854.099659] [<ffffffff813a757e>] ? xen_poll_irq_timeout+0x3e/0x50
[4746854.099671] [<ffffffff813a9060>] ? xen_poll_irq+0x10/0x20
[4746854.099683] [<ffffffff8163c200>] ? xen_spin_lock_slow+0x97/0xf2
[4746854.099695] [<ffffffffa000c000>] ? 0xffffffffa000bfff
[4746854.099709] [<ffffffff810121da>] ? xen_spin_lock+0x4a/0x50
[4746854.099722] [<ffffffff816572ce>] ? _raw_spin_lock+0xe/0x20
[4746854.099734] [<ffffffffa000702b>] ? stall+0x2b/0x44 [stallmod]
[4746854.099746] [<ffffffffa000c009>] ? init_module+0x9/0x1000 [stallmod]
[4746854.099758] [<ffffffff81002040>] ? do_one_initcall+0x40/0x180
[4746854.099771] [<ffffffff810a7abe>] ? sys_init_module+0xbe/0x230
[4746854.099783] [<ffffffff8165f8c2>] ? system_call_fastpath+0x16/0x1b

In this case, the function is invoked by the RCU-based stall detector when it detects a stalled CPU (i.e. a lockup) in interrupt context. An Oops in interrupt context always causes a kernel panic, so this bug sometimes makes debugging a kernel lockup difficult.

The function is also invoked from sysrq_handle_showallcpus(), which lets us get traces from all active CPUs at any time.

# echo l > /proc/sysrq-trigger

This is the easiest way to reproduce the problem.

[How to fix]

As far as I can see, one possible solution is to backport the following patch. It is already included in Quantal's kernel.

http://lists.xen.org/archives/html/xen-devel/2012-04/msg01023.html

Another solution is to disable arch_trigger_all_cpu_backtrace() at compile time, but I'm still investigating which config option controls that.

If you need any other information, please feel free to ask me.

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1168350

Title:
  arch_trigger_all_cpu_backtrace() results in Oops on virtualized guest

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1168350/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs