[Bug 1168350] Re: arch_trigger_all_cpu_backtrace() results in Oops on virtualized guest

Munehisa Kamata Wed, 08 May 2013 09:48:25 -0700

I've updated the description with the simplest testcase.

** Description changed:


  SRU Justification:
  
  Impact: arch_trigger_all_cpu_backtrace() tries to notify all other
  CPUs via IPI. For that it looks up an IPI hook from the apic structure
  without verifying whether that pointer is NULL.
  
  Fix: Upstream fixed this by implementing the apic IPI hooks interface.
  Although some pieces seem unclear, this has not changed in upstream
  kernels since then, so either it does not matter or those pieces are not
  used. For now, backport the patch introducing the apic interface from
  upstream (dropping only one unnecessary declaration). This only affects
  PVM, as HVM emulates the flat apic completely.
  
- Testcase: Cause a call to arch_trigger_all_cpu_backtrace (Munehisa, can
- you provide a simple trigger?).
+ Testcase: Cause a call to arch_trigger_all_cpu_backtrace with:
+ 
+   # echo l > /proc/sysrq-trigger
  
  ---
  
  arch_trigger_all_cpu_backtrace() tries to send an NMI to all CPUs via
  IPI to collect stack traces from them. But the NMI vector is not implemented
  in a virtualized environment (Xen PV), so the function results in an Oops.
  
  [4746854.099062] INFO: rcu_sched detected stall on CPU 3 (t=15001 jiffies)
  [4746854.099091] BUG: unable to handle kernel paging request at ffffffffff5fb310
  [4746854.099100] IP: [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0
  [4746854.099116] PGD 1c07067 PUD 1c08067 PMD 1dd4067 PTE 0
  [4746854.099126] Oops: 0002 [#1] SMP
  [4746854.099134] CPU 3
  [4746854.099137] Modules linked in: stallmod(O+) isofs acpiphp
  [4746854.099150]
  [4746854.099157] Pid: 4752, comm: insmod Tainted: G           O 3.2.0-40-virtual #64-Ubuntu
  [4746854.099174] RIP: e030:[<ffffffff81037cf8>]  [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0
  [4746854.099189] RSP: e02b:ffff8803bfd83c68  EFLAGS: 00010046
  [4746854.099198] RAX: 0000000000000000 RBX: ffffffff81cd0060 RCX: 000000000003ffff
  [4746854.099208] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002
  [4746854.099219] RBP: ffff8803bfd83c88 R08: 000000000003ffff R09: 0000000000000000
  [4746854.099229] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000800
  [4746854.099240] R13: 000000000f000000 R14: ffff8803bfd8e700 R15: 0000000000000000
  [4746854.099256] FS:  00007f456d441700(0000) GS:ffff8803bfd80000(0000) knlGS:0000000000000000
  [4746854.099270] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
  [4746854.099279] CR2: ffffffffff5fb310 CR3: 00000003a4180000 CR4: 0000000000002660
  [4746854.099290] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [4746854.099301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  [4746854.099312] Process insmod (pid: 4752, threadinfo ffff8803a48d4000, task ffff8803a6b5c4a0)
  [4746854.099323] Stack:
  [4746854.099328]  0000000000000000 0000000000002710 ffffffff81c31000 ffffffff81c31100
  [4746854.099346]  ffff8803bfd83ca8 ffffffff8103333a ffff8803a6e17b00 ffffffff81c31000
  [4746854.099363]  ffff8803bfd83cc8 ffffffff810df347 ffff8803bfd8e250 ffff8803bfd8eb80
  [4746854.099382] Call Trace:
  [4746854.099387]  <IRQ>
  [4746854.099401]  [<ffffffff8103333a>] arch_trigger_all_cpu_backtrace+0x5a/0x90
  [4746854.099416]  [<ffffffff810df347>] check_cpu_stall.isra.35+0x97/0xf0
  [4746854.099429]  [<ffffffff810df3d8>] __rcu_pending+0x38/0x1d0
  [4746854.099439]  [<ffffffff810df869>] rcu_check_callbacks+0x79/0x1e0
  [4746854.099453]  [<ffffffff81078098>] update_process_times+0x48/0x90
  [4746854.099466]  [<ffffffff8109b864>] tick_sched_timer+0x64/0xc0
  [4746854.099480]  [<ffffffff8108dfe8>] __run_hrtimer+0x78/0x1f0
  [4746854.099491]  [<ffffffff8109b800>] ? tick_nohz_handler+0x100/0x100
  [4746854.099506]  [<ffffffff8105e748>] ? load_balance+0x78/0x370
  [4746854.099520]  [<ffffffff8108e917>] hrtimer_interrupt+0xf7/0x230
  [4746854.099535]  [<ffffffff8100a817>] xen_timer_interrupt+0x27/0x40
  [4746854.099547]  [<ffffffff810d7bb5>] handle_irq_event_percpu+0x55/0x210
  [4746854.099561]  [<ffffffff813a6f7e>] ? info_for_irq+0xe/0x30
  [4746854.099572]  [<ffffffff810dae67>] handle_percpu_irq+0x47/0x60
  [4746854.099583]  [<ffffffff813a6de9>] __xen_evtchn_do_upcall+0x199/0x250
  [4746854.099596]  [<ffffffff813a8ecf>] xen_evtchn_do_upcall+0x2f/0x50
  [4746854.099610]  [<ffffffff81661b7e>] xen_do_hypervisor_callback+0x1e/0x30
  [4746854.099619]  <EOI>
  [4746854.099632]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
  [4746854.099645]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
  [4746854.099659]  [<ffffffff813a757e>] ? xen_poll_irq_timeout+0x3e/0x50
  [4746854.099671]  [<ffffffff813a9060>] ? xen_poll_irq+0x10/0x20
  [4746854.099683]  [<ffffffff8163c200>] ? xen_spin_lock_slow+0x97/0xf2
  [4746854.099695]  [<ffffffffa000c000>] ? 0xffffffffa000bfff
  [4746854.099709]  [<ffffffff810121da>] ? xen_spin_lock+0x4a/0x50
  [4746854.099722]  [<ffffffff816572ce>] ? _raw_spin_lock+0xe/0x20
  [4746854.099734]  [<ffffffffa000702b>] ? stall+0x2b/0x44 [stallmod]
  [4746854.099746]  [<ffffffffa000c009>] ? init_module+0x9/0x1000 [stallmod]
  [4746854.099758]  [<ffffffff81002040>] ? do_one_initcall+0x40/0x180
  [4746854.099771]  [<ffffffff810a7abe>] ? sys_init_module+0xbe/0x230
  [4746854.099783]  [<ffffffff8165f8c2>] ? system_call_fastpath+0x16/0x1b
  
  In this case, the function is invoked by the RCU-based stall detector when it
  detects a stalled CPU (i.e. a lockup) in interrupt context.
  An Oops in interrupt context always causes a kernel panic, so this bug
  sometimes makes debugging a kernel lockup difficult.
  
  The function is also invoked from sysrq_handle_showallcpus(), which
  dumps traces from all active CPUs on demand.
  
   # echo l > /proc/sysrq-trigger
  
  This is the easiest way to reproduce this.
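
The sysrq reproduction above could be wrapped in a small helper for repeated testing. This is only a sketch: the function name trigger_backtrace and the TRIGGER and DRY_RUN variables are made up for this example, and writing to /proc/sysrq-trigger needs root. On an affected Xen PV guest the real write is expected to Oops, so try it with DRY_RUN=1 first.

```shell
#!/bin/sh
# Sketch only: TRIGGER and DRY_RUN are hypothetical knobs for this example.
# TRIGGER defaults to the real sysrq trigger file.
TRIGGER="${TRIGGER:-/proc/sysrq-trigger}"

# Write 'l' (show backtrace of all active CPUs) to the trigger file.
# With DRY_RUN=1, only print what would be done.
trigger_backtrace() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would write 'l' to $TRIGGER"
        return 0
    fi
    if [ ! -w "$TRIGGER" ]; then
        echo "cannot write to $TRIGGER (need root?)" >&2
        return 1
    fi
    echo l > "$TRIGGER"
}
```

After sourcing the file, running `trigger_backtrace && dmesg | tail -n 50` as root should show either the per-CPU backtraces (fixed kernel) or, if the guest survives, the Oops shown above.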
  
  [How to fix]
  As far as I can see, one possible solution is to backport the following patch,
  which is already included in Quantal's kernel.
  
   http://lists.xen.org/archives/html/xen-devel/2012-04/msg01023.html
  
  Another solution is to disable arch_trigger_all_cpu_backtrace() at
  compile time, but I'm still investigating which config option controls that.
  
  If you need any other information, please feel free to ask me.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1168350

Title:
   arch_trigger_all_cpu_backtrace() results in Oops on virtualized guest

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1168350/+subscriptions

