Public bug reported:

In the thread at http://thread.gmane.org/gmane.comp.emulators.kvm.devel/127042/focus=129294, three commits were identified that fix live migration for qemu 2.0 (at least), which I am using on trusty. I would like the package maintainer to pull these in.

I have cherry-picked those three commits and built packages locally. The first needed considerable fix-up, which may or may not be correct; the other two apply cleanly. Installing the patched packages on the migration receiver seems to fix my guest lockups after live migration. I can attach the patches I'm using if someone is able to review my fix-ups to the first one.

My original problem description was:

Somewhere between kernel 3.2 and 3.11 on my VM hosts (yes, I know that narrows it down a /whole lot/ ...), live migration started killing my Ubuntu precise (kernel 3.2.x) guests, causing all of their vcpus to go into a busy loop. Once (and only once) I've observed the guest eventually becoming responsive again, with a clock nearly 600 years in the future and a negative uptime.

I haven't been able to dig up any previous threads about this problem, so my gut instinct is that I've configured something wonky. Any pointers toward /what/ I may have done wrong are appreciated.

It only seems to happen if I've given the guests Nehalem-class CPU features. My longest-running VMs, from before I started passing the CPU capabilities through into the guest, seem to migrate without issue.

It also seems to happen reliably when the guest has been running for a while; it's easily reproducible with guests that have been up ~1 day, and I've reproduced it in VMs with an uptime of ~20 hours. I haven't yet figured out a lower bound, which makes the testing cycle a little longer for me.

The guests that I reliably reproduce this on are Ubuntu 12.04 guests running the current 3.2 kernel that Canonical distributes. Recent Fedora kernels (3.14+, IIRC) don't seem to busy-spin this way, though I haven't tested this case exhaustively, and I haven't kept very good notes on the tests I have done with Fedora.

The hosts are dual-socket Nehalem Xeons (L5520), currently running Ubuntu 14.04 and the associated 3.13 kernel.
I had previously reproduced this with 12.04 hosts running a raring-backport 3.11 kernel as well, but I assumed (seemingly erroneously) that it was a qemu userspace discrepancy.

** Affects: qemu (Ubuntu)
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1398718

Title:
  Live migration locks up Linux 3.2-based guests