[Bug 1168350] Re: arch_trigger_all_cpu_backtrace() results in Oops on virtualized guest

Stefan Bader Wed, 08 May 2013 07:18:12 -0700

** Description changed:

+ SRU Justification:
+ 
+ Impact: arch_trigger_all_cpu_backtrace() tries to notify all other
+ CPUs via IPI. To do so, it looks up an IPI hook in the apic structure
+ without checking whether that pointer is NULL.
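A minimal user-space C sketch of the missing check, assuming a simplified hook table: `struct ipi_ops` and `send_ipi_all_checked()` are hypothetical names, not the kernel's `struct apic` API. It only illustrates the pattern the Impact paragraph describes, which is testing a hook pointer before calling through it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified hook table (not the kernel's struct apic). */
struct ipi_ops {
	void (*send_IPI_all)(int vector);
};

/* Call the hook only if it is installed; report -ENOSYS otherwise
 * instead of jumping through a NULL pointer. */
static int send_ipi_all_checked(const struct ipi_ops *ops, int vector)
{
	if (ops == NULL || ops->send_IPI_all == NULL)
		return -ENOSYS;
	ops->send_IPI_all(vector);
	return 0;
}
```

In this sketch a caller that gets -ENOSYS back could skip the backtrace request rather than fault.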
+ 
+ Fix: Upstream fixed this by implementing the APIC IPI hooks interface.
+ Although some pieces seem unclear, they have not changed in upstream
+ kernels since then, so either they do not matter or they are not used.
+ For now, backport the upstream patch that introduces the APIC interface
+ (dropping only one unnecessary declaration). This only affects PV guests
+ (PVM), as HVM emulates the flat APIC completely.
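A rough user-space C model of the fix, assuming it works as described above: the PV code provides its own IPI hooks, so the native flat-APIC hook is never called in a PV guest. All names here are hypothetical, and the return values only stand in for "would fault" (-EFAULT) and "delivered" (0).

```c
#include <errno.h>

/* Hypothetical hook table, modelling the idea of the APIC IPI hooks. */
struct ipi_ops {
	const char *name;
	int (*send_IPI_all)(int vector);
};

/* Stand-in for the native flat-APIC path: it programs APIC registers
 * directly, which a PV guest cannot do, so model it as a fault. */
static int flat_send_all(int vector)
{
	(void)vector;
	return -EFAULT;
}

/* Stand-in for a hypervisor-aware path that delivers the IPI without
 * touching APIC registers; modelled as always succeeding. */
static int pv_send_all(int vector)
{
	(void)vector;
	return 0;
}

/* Default table, as on native hardware. */
static struct ipi_ops ipi = { "flat", flat_send_all };

/* Run once at early boot when a PV guest is detected: replace the
 * native hook with the hypervisor-aware one. */
static void pv_install_ipi_hooks(struct ipi_ops *ops)
{
	ops->name = "pv";
	ops->send_IPI_all = pv_send_all;
}
```

The point of overriding the table entry, rather than patching every caller, is that callers such as arch_trigger_all_cpu_backtrace() keep going through the same hook.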
+ 
+ Testcase: Cause a call to arch_trigger_all_cpu_backtrace (Munehisa, can
+ you provide a simple trigger?).
+ 
+ ---
+ 
  arch_trigger_all_cpu_backtrace() tries to send an NMI to all CPUs via
  IPI to collect stack traces from them. However, the NMI vector is not
  implemented in a virtualized environment (Xen PV), so the function
  results in an Oops.
  
  [4746854.099062] INFO: rcu_sched detected stall on CPU 3 (t=15001 jiffies)
  [4746854.099091] BUG: unable to handle kernel paging request at ffffffffff5fb310
  [4746854.099100] IP: [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0
  [4746854.099116] PGD 1c07067 PUD 1c08067 PMD 1dd4067 PTE 0
  [4746854.099126] Oops: 0002 [#1] SMP
  [4746854.099134] CPU 3
  [4746854.099137] Modules linked in: stallmod(O+) isofs acpiphp
  [4746854.099150]
  [4746854.099157] Pid: 4752, comm: insmod Tainted: G           O 3.2.0-40-virtual #64-Ubuntu
  [4746854.099174] RIP: e030:[<ffffffff81037cf8>]  [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0
  [4746854.099189] RSP: e02b:ffff8803bfd83c68  EFLAGS: 00010046
  [4746854.099198] RAX: 0000000000000000 RBX: ffffffff81cd0060 RCX: 000000000003ffff
  [4746854.099208] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002
  [4746854.099219] RBP: ffff8803bfd83c88 R08: 000000000003ffff R09: 0000000000000000
  [4746854.099229] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000800
  [4746854.099240] R13: 000000000f000000 R14: ffff8803bfd8e700 R15: 0000000000000000
  [4746854.099256] FS:  00007f456d441700(0000) GS:ffff8803bfd80000(0000) knlGS:0000000000000000
  [4746854.099270] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
  [4746854.099279] CR2: ffffffffff5fb310 CR3: 00000003a4180000 CR4: 0000000000002660
  [4746854.099290] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [4746854.099301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  [4746854.099312] Process insmod (pid: 4752, threadinfo ffff8803a48d4000, task ffff8803a6b5c4a0)
  [4746854.099323] Stack:
  [4746854.099328]  0000000000000000 0000000000002710 ffffffff81c31000 ffffffff81c31100
  [4746854.099346]  ffff8803bfd83ca8 ffffffff8103333a ffff8803a6e17b00 ffffffff81c31000
  [4746854.099363]  ffff8803bfd83cc8 ffffffff810df347 ffff8803bfd8e250 ffff8803bfd8eb80
  [4746854.099382] Call Trace:
  [4746854.099387]  <IRQ>
  [4746854.099401]  [<ffffffff8103333a>] arch_trigger_all_cpu_backtrace+0x5a/0x90
  [4746854.099416]  [<ffffffff810df347>] check_cpu_stall.isra.35+0x97/0xf0
  [4746854.099429]  [<ffffffff810df3d8>] __rcu_pending+0x38/0x1d0
  [4746854.099439]  [<ffffffff810df869>] rcu_check_callbacks+0x79/0x1e0
  [4746854.099453]  [<ffffffff81078098>] update_process_times+0x48/0x90
  [4746854.099466]  [<ffffffff8109b864>] tick_sched_timer+0x64/0xc0
  [4746854.099480]  [<ffffffff8108dfe8>] __run_hrtimer+0x78/0x1f0
  [4746854.099491]  [<ffffffff8109b800>] ? tick_nohz_handler+0x100/0x100
  [4746854.099506]  [<ffffffff8105e748>] ? load_balance+0x78/0x370
  [4746854.099520]  [<ffffffff8108e917>] hrtimer_interrupt+0xf7/0x230
  [4746854.099535]  [<ffffffff8100a817>] xen_timer_interrupt+0x27/0x40
  [4746854.099547]  [<ffffffff810d7bb5>] handle_irq_event_percpu+0x55/0x210
  [4746854.099561]  [<ffffffff813a6f7e>] ? info_for_irq+0xe/0x30
  [4746854.099572]  [<ffffffff810dae67>] handle_percpu_irq+0x47/0x60
  [4746854.099583]  [<ffffffff813a6de9>] __xen_evtchn_do_upcall+0x199/0x250
  [4746854.099596]  [<ffffffff813a8ecf>] xen_evtchn_do_upcall+0x2f/0x50
  [4746854.099610]  [<ffffffff81661b7e>] xen_do_hypervisor_callback+0x1e/0x30
  [4746854.099619]  <EOI>
  [4746854.099632]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
  [4746854.099645]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
  [4746854.099659]  [<ffffffff813a757e>] ? xen_poll_irq_timeout+0x3e/0x50
  [4746854.099671]  [<ffffffff813a9060>] ? xen_poll_irq+0x10/0x20
  [4746854.099683]  [<ffffffff8163c200>] ? xen_spin_lock_slow+0x97/0xf2
  [4746854.099695]  [<ffffffffa000c000>] ? 0xffffffffa000bfff
  [4746854.099709]  [<ffffffff810121da>] ? xen_spin_lock+0x4a/0x50
  [4746854.099722]  [<ffffffff816572ce>] ? _raw_spin_lock+0xe/0x20
  [4746854.099734]  [<ffffffffa000702b>] ? stall+0x2b/0x44 [stallmod]
  [4746854.099746]  [<ffffffffa000c009>] ? init_module+0x9/0x1000 [stallmod]
  [4746854.099758]  [<ffffffff81002040>] ? do_one_initcall+0x40/0x180
  [4746854.099771]  [<ffffffff810a7abe>] ? sys_init_module+0xbe/0x230
  [4746854.099783]  [<ffffffff8165f8c2>] ? system_call_fastpath+0x16/0x1b
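The Oops header can be decoded with a small C sketch (`struct pf_info` and `decode_pf()` are hypothetical helpers; the bit meanings are the standard x86 page-fault error-code bits). Error code 0002 decodes to a kernel-mode write to a page that is not present, consistent with `PTE 0` above, and CR2 ffffffffff5fb310 sits at page offset 0x310, the xAPIC ICR2 register offset. This is one reading of the log: it suggests flat_send_IPI_all() faulted while writing an APIC register that is not mapped in the PV guest.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard x86 page-fault error-code bits (bits 0-4). */
#define PF_PROT  (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE (1u << 1) /* 0: read access, 1: write access */
#define PF_USER  (1u << 2) /* 0: kernel mode, 1: user mode */
#define PF_RSVD  (1u << 3) /* reserved bit set in a paging entry */
#define PF_INSTR (1u << 4) /* fault caused by an instruction fetch */

/* Offset of the xAPIC Interrupt Command Register, high half (ICR2). */
#define XAPIC_ICR2_OFFSET 0x310u

/* Hypothetical decoded view of a page fault. */
struct pf_info {
	bool not_present;     /* PF_PROT clear */
	bool write;           /* PF_WRITE set */
	bool user;            /* PF_USER set */
	bool reserved_bit;    /* PF_RSVD set */
	bool instr_fetch;     /* PF_INSTR set */
	unsigned page_offset; /* CR2 within its 4 KiB page */
};

static struct pf_info decode_pf(unsigned err, uint64_t cr2)
{
	struct pf_info info;

	info.not_present  = (err & PF_PROT) == 0;
	info.write        = (err & PF_WRITE) != 0;
	info.user         = (err & PF_USER) != 0;
	info.reserved_bit = (err & PF_RSVD) != 0;
	info.instr_fetch  = (err & PF_INSTR) != 0;
	info.page_offset  = (unsigned)(cr2 & 0xfffu);
	return info;
}
```

For this Oops, `decode_pf(0x0002, 0xffffffffff5fb310ULL)` yields not_present and write set, user clear, and page_offset equal to XAPIC_ICR2_OFFSET.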
  
  In this case, the function is invoked by the RCU-based stall detector
  when it detects a stalled CPU (i.e. a lockup) in interrupt context.
  An Oops in interrupt context always causes a kernel panic, so this bug
  sometimes makes debugging a kernel lockup difficult.
  
  The function is also invoked from sysrq_handle_showallcpus(), which
  collects traces from all active CPUs at any time:
  
   # echo l > /proc/sysrq-trigger
  
  This is the easiest way to reproduce this.
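For scripting the reproduction from C rather than a shell, here is a minimal sketch that writes a single SysRq key to a given path (`write_sysrq_key()` is a hypothetical helper). Pointed at /proc/sysrq-trigger and run as root, it should have the same effect as the echo above; any other path is treated as an ordinary file, which makes it easy to try out safely.

```c
#define _POSIX_C_SOURCE 200809L /* for open(), write(), close() */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write one SysRq command key to `path`.
 * With "/proc/sysrq-trigger" (as root) this asks the kernel to run
 * that SysRq command; for any other path it just writes one byte. */
static int write_sysrq_key(const char *path, char key)
{
	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	ssize_t n = write(fd, &key, 1);
	int close_rc = close(fd);

	return (n == 1 && close_rc == 0) ? 0 : -1;
}
```

Usage on an affected guest: `write_sysrq_key("/proc/sysrq-trigger", 'l');`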
  
  [How to fix]
  As far as I can see, one possible solution is to backport the following
  patch, which is already included in Quantal's kernel:
  
   http://lists.xen.org/archives/html/xen-devel/2012-04/msg01023.html
  
  Another solution is to disable arch_trigger_all_cpu_backtrace() at
  compile time, but I am still investigating which config option controls
  that.
  
  If you need any other information, please feel free to ask me.


-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1168350

Title:
   arch_trigger_all_cpu_backtrace() results in Oops on virtualized guest

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1168350/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
