The following upstream patch:

  From 25fb213d873977290caf374234df496ad158ec1e Mon Sep 17 00:00:00 2001
  From: Rik van Riel <r...@redhat.com>
  Date: Mon, 21 Mar 2016 15:13:27 +0100
  Subject: [PATCH 2/2] kvm, rt: change async pagefault code locking for PREEMPT_RT

  The async pagefault wake code can run from the idle task in exception
  context, so everything here needs to be made non-preemptible.

  Conversion to a simple wait queue and raw spinlock does the trick.

  Signed-off-by: Rik van Riel <r...@redhat.com>
  Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>

fixes the issue by preventing the async pagefault code from being preempted because of waitqueues.

The backport for Trusty needs:

  From 25fb213d873977290caf374234df496ad158ec1e Mon Sep 17 00:00:00 2001
  From: Rik van Riel <r...@redhat.com>
  Date: Mon, 21 Mar 2016 15:13:27 +0100
  Subject: [PATCH 2/2] kvm, rt: change async pagefault code locking for PREEMPT_RT

  From 6b9cf536987c69825f91af9478109aa7bcbebc94 Mon Sep 17 00:00:00 2001
  From: "Peter Zijlstra (Intel)" <pet...@infradead.org>
  Date: Fri, 19 Feb 2016 09:46:37 +0100
  Subject: [PATCH 1/2] wait.[ch]: Introduce the simple waitqueue (swait) implementation

If adding the simple waitqueue interface to Trusty is not acceptable as an SRU, I'll have to come up with something else. I'm sure that the problem goes away when using these 2 patches.

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1596941

Title:
  KVM deadlock on KVM guest migration with latest QEMU (mitaka) from Xenial (or Mitaka Ubuntu Cloud Archive)

Status in linux package in Ubuntu:
  In Progress

Bug description:
  It was brought to my attention that qemu-kvm live migration (with full storage copy) on Trusty + Mitaka Ubuntu Cloud Archive was broken.
  While investigating, I ran into the following situation:

  crash> sys
        KERNEL: /usr/lib/debug/boot/vmlinux-3.13.0-86-generic
      DUMPFILE: ./201606241546/dump.201606241546  [PARTIAL DUMP]
          CPUS: 4
          DATE: Fri Jun 24 15:46:39 2016
        UPTIME: 00:06:00
  LOAD AVERAGE: 1.00, 0.60, 0.26
         TASKS: 146
      NODENAME: vmqemulivefail1
       RELEASE: 3.13.0-86-generic
       VERSION: #131-Ubuntu SMP Thu May 12 23:33:13 UTC 2016
       MACHINE: x86_64  (2494 Mhz)
        MEMORY: 8 GB
         PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"

  The full backtrace doesn't have anything useful since I've configured kernel.softlockup_panic. From the scheduled-out tasks (and from kern.log) I was able to see that, on more than one occasion, the qemu process was possibly deadlocked when dealing with asynchronous page faults:

  ## kernel 3.13

  # dump 1

  PID: 1604   TASK: ffff8800374be000  CPU: 3   COMMAND: "qemu-system-x86"
   #0 [ffff8800ba115e28] __schedule at ffffffff8172e379
   #1 [ffff8800ba115e90] schedule at ffffffff8172e859
   #2 [ffff8800ba115ea0] kvm_async_pf_task_wait at ffffffff8105060f
   #3 [ffff8800ba115f38] do_async_page_fault at ffffffff81736090
   #4 [ffff8800ba115f50] async_page_fault at ffffffff81732cd8
      RIP: 00007fb4eff0a4b3  RSP: 00007fb4713facb0  RFLAGS: 00010206
      RAX: 00007fb4cb9cf000  RBX: 00007fb4f166d8f0  RCX: 0000000000000010
      RDX: 0000000000001fff  RSI: 00007fb4cb9deff8  RDI: 4000000000000000
      RBP: 0000000000000000   R8: 0000000000000000   R9: 00000002601b0000
      R10: 00fffffffffffe00  R11: 0000000000001fff  R12: 0000000000000008
      R13: 00007fb4713fad84  R14: 00007fb4f1665290  R15: 00007fb4713fad88
      ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

  # dump 2

  PID: 1735   TASK: ffff8800b9bcb000  CPU: 2   COMMAND: "qemu-system-x86"
   #0 [ffff8802333c9e28] __schedule at ffffffff8172e379
   #1 [ffff8802333c9e90] schedule at ffffffff8172e859
   #2 [ffff8802333c9ea0] kvm_async_pf_task_wait at ffffffff8105060f
   #3 [ffff8802333c9f38] do_async_page_fault at ffffffff81736090
   #4 [ffff8802333c9f50] async_page_fault at ffffffff81732cd8
      RIP: 00007f631399d3b0  RSP: 00007f62912c7990  RFLAGS: 00010206
      RAX: 0000000000000000  RBX: 00007f6315f9e370  RCX: 00007f62ca714000
      RDX: 0000000032914020  RSI: 0000000000001000  RDI: 00007f62ca714000
      RBP: 00007f6315c66e40   R8: 00007f62912c7a40   R9: 00007f6315f9e3e0
      R10: 0000000000000000  R11: 0000000032914020  R12: 0000000032914020
      R13: 0000000000032914  R14: 00000000ffffffff  R15: 0000000000000000
      ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

  # dump 3

  PID: 1617   TASK: ffff880232834800  CPU: 3   COMMAND: "qemu-system-x86"
   #0 [ffff880232a6de28] __schedule at ffffffff8172e379
   #1 [ffff880232a6de90] schedule at ffffffff8172e859
   #2 [ffff880232a6dea0] kvm_async_pf_task_wait at ffffffff8105060f
   #3 [ffff880232a6df38] do_async_page_fault at ffffffff81736090
   #4 [ffff880232a6df50] async_page_fault at ffffffff81732cd8
      RIP: 00007f8c39e8b3b0  RSP: 00007f8bb80c9990  RFLAGS: 00010206
      RAX: 0000000000000000  RBX: 00007f8c3aeba370  RCX: 00007f8bdea18000
      RDX: 0000000022c18020  RSI: 0000000000001000  RDI: 00007f8bdea18000
      RBP: 00007f8c3ab82e40   R8: 00007f8bb80c9a40   R9: 00007f8c3aeba498
      R10: 0000000000000000  R11: 0000000022c18020  R12: 0000000022c18020
      R13: 0000000000022c18  R14: 00000000ffffffff  R15: 0000000000000000
      ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

  ## kernel 4.4

  # kern.log

  544 [ 360.282132] INFO: task qemu-system-x86:1592 blocked for more than 120 seconds.
  545 [ 360.282984] Not tainted 4.4.0-27-generic #46~14.04.1-Ubuntu
  546 [ 360.283581] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  547 [ 360.284439] qemu-system-x86 D ffff8800bb833e90 0 1592 1 0x00000000
  548 [ 360.284443] ffff8800bb833e90 ffff88023151c4c0 ffff8802345eb700 ffff8800bb834000
  549 [ 360.284444] 0000000000000010 ffffffff81efe6d0 000055ac8fa05520 00007f88fc7f7d88
  550 [ 360.284445] ffff8800bb833ea8 ffffffff817ed5f5 ffff8800bb833ef0 ffff8800bb833f38
  551 [ 360.284447] Call Trace:
  552 [ 360.284472] [<ffffffff817ed5f5>] schedule+0x35/0x80
  553 [ 360.284481] [<ffffffff81060a93>] kvm_async_pf_task_wait+0x1a3/0x1f0
  554 [ 360.284487] [<ffffffff810bdc60>] ? prepare_to_wait_event+0xf0/0xf0
  555 [ 360.284494] [<ffffffff811fe600>] ? do_sendfile+0x360/0x380
  556 [ 360.284495] [<ffffffff81060c55>] do_async_page_fault+0x75/0x80
  557 [ 360.284498] [<ffffffff817f2fe8>] async_page_fault+0x28/0x30
  558 [ 360.284500] Sending NMI to all CPUs: