On Fri, 2014-06-20 at 14:23 +0200, Igor Mammedov wrote:
> Hang is observed on virtual machines during CPU hotplug,
> especially in big guests with many CPUs. (It is reproducible
> more often if the host is over-committed).
>
> It happens because master CPU gives up waiting on
> secondary CPU and allows it to run wild. As a result,
> the AP locks up or crashes the system. For example
> as described here: https://lkml.org/lkml/2014/3/6/257
>
> If master CPU has sent STARTUP IPI successfully,
> and AP signalled to master CPU that it's ready
> to start initialization, make master CPU wait
> indefinitely till AP is onlined.
> To ensure that AP won't ever run wild, make it
> wait at early startup till master CPU confirms its
> intention to wait for AP. If AP doesn't respond in 10
> seconds, the master CPU will time out and cancel
> AP onlining.
>
> Signed-off-by: Igor Mammedov <[email protected]>
> ---
> v7:
>  - fix stuck boot with non SMP config
>  - fix stuck paravirtual Xen SMP boot with more than 1 VCPU
>    and CPU hotplug
> v6:
>  - no changes
> v5:
>  - add smp_mb() after clearing cpu_initialized_mask in do_boot_cpu()
>  - add 10 sec timeout description into commit message.
> v4:
>  - move common code in cpu_init() for x32/x64 in shared
>    helper function wait_formaster_cpu()
>  - add WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask))
>    to wait_formaster_cpu()
> v3:
>  - leave timeouts in do_boot_cpu(), so that master CPU
>    won't hang if AP doesn't respond, use cpu_initialized_mask
>    as a way for AP to signal to master CPU that it's ready
>    to start initialization.
> v2:
>  - amend comment in cpu_init()
> ---
>  arch/x86/kernel/cpu/common.c | 29 ++++++++-----
>  arch/x86/kernel/smpboot.c    | 99 +++++++++++++-----------------------------
>  arch/x86/xen/smp.c           |  2 +
>  3 files changed, 51 insertions(+), 79 deletions(-)
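
For readers following the thread, the handshake described in the quoted
commit message can be sketched in userspace roughly as below. This is only
an illustration, not the patch itself: threads stand in for CPUs, and the
names (struct bringup, the ack/online/cancelled flags, simulate_bringup())
are hypothetical. Only the idea of the AP signalling readiness (done via
cpu_initialized_mask in the patch) and the bounded first wait come from
the description above. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

enum bringup_result { BRINGUP_ONLINE, BRINGUP_TIMEOUT };

/* Hypothetical stand-ins for the per-CPU bits the kernel keeps in cpumasks. */
struct bringup {
    atomic_bool initialized; /* AP -> master: reached early startup */
    atomic_bool ack;         /* master -> AP: "I will wait for you" */
    atomic_bool online;      /* AP -> master: initialization finished */
    atomic_bool cancelled;   /* simulation only: lets a parked AP thread exit */
    int ap_delay_ms;         /* how long the AP takes to show up */
};

static void sleep_ms(int ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

static void *ap_thread(void *arg)
{
    struct bringup *b = arg;

    sleep_ms(b->ap_delay_ms); /* e.g. vCPU not scheduled on a busy host */

    /* Announce readiness; the flag must not have been set already. */
    bool was_set = atomic_exchange(&b->initialized, true);
    assert(!was_set);
    (void)was_set;

    /* Park until the master confirms that it intends to wait. */
    while (!atomic_load(&b->ack)) {
        if (atomic_load(&b->cancelled))
            return NULL; /* master gave up: never enter the init path */
        sleep_ms(1);
    }

    atomic_store(&b->online, true); /* stands in for the real init work */
    return NULL;
}

static enum bringup_result simulate_bringup(int ap_delay_ms, int timeout_ms)
{
    struct bringup b;
    pthread_t ap;
    enum bringup_result res = BRINGUP_TIMEOUT;

    atomic_init(&b.initialized, false);
    atomic_init(&b.ack, false);
    atomic_init(&b.online, false);
    atomic_init(&b.cancelled, false);
    b.ap_delay_ms = ap_delay_ms;

    /* Starting the thread stands in for sending the STARTUP IPI. */
    if (pthread_create(&ap, NULL, ap_thread, &b) != 0)
        return BRINGUP_TIMEOUT;

    /* Phase 1: bounded wait (roughly timeout_ms) for the readiness signal. */
    for (int waited = 0; waited < timeout_ms; waited++) {
        if (atomic_load(&b.initialized))
            break;
        sleep_ms(1);
    }

    if (atomic_load(&b.initialized)) {
        /* Phase 2: confirm, then wait with no timeout until the AP is up. */
        atomic_store(&b.ack, true);
        while (!atomic_load(&b.online))
            sleep_ms(1);
        res = BRINGUP_ONLINE;
    } else {
        atomic_store(&b.cancelled, true); /* cancel onlining */
    }

    pthread_join(ap, NULL);
    return res;
}
```

The point of the two phases is that the AP can never proceed on its own:
either the master has committed to waiting for it, or the AP stays parked.
In this sketch, simulate_bringup(0, 1000) brings the AP online, while a
500 ms AP delay against a 10 ms timeout is cancelled.
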

For the changes under arch/x86/kernel (I'm not familiar with Xen):

Acked-by: Toshi Kani <[email protected]>

Thanks,
-Toshi

