At work we have some custom watchdog hardware that sends an NMI upon
expiry. We've modified the kernel to panic when it receives the watchdog
NMI. I've been trying the "stop_scheduler_on_panic" mode, and I've
discovered that when my watchdog expires, the system gets completely
wedged. After some digging, I've discovered that I have multiple CPUs
getting the watchdog NMI and trying to panic concurrently. One of the CPUs
wins, and the rest spin forever in this code:
        /*
         * We don't want multiple CPU's to panic at the same time, so we
         * use panic_cpu as a simple spinlock. We have to keep checking
         * panic_cpu if we are spinning in case the panic on the first
         * CPU is canceled.
         */
        if (panic_cpu != PCPU_GET(cpuid))
                while (atomic_cmpset_int(&panic_cpu, NOCPU,
                    PCPU_GET(cpuid)) == 0)
                        while (panic_cpu != NOCPU)
                                ; /* nothing */
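For anyone who wants to poke at the election logic outside the kernel, here is a minimal user-space sketch of the loop above using C11 atomics. It is only a model, not kernel code: try_claim_panic() and release_panic() are hypothetical names, the cpuid is passed in explicitly instead of coming from PCPU_GET(), and the NOCPU value is a stand-in.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NOCPU 0xffu     /* stand-in for the kernel's "no CPU" value */

/* Models the kernel's panic_cpu variable. */
static atomic_uint panic_cpu = NOCPU;

/*
 * One attempt at the election (hypothetical helper). It succeeds if this
 * CPU already owns the panic, or if it swaps NOCPU for its own id. The
 * kernel code retries this in a loop; a loser spins until panic_cpu goes
 * back to NOCPU.
 */
static bool
try_claim_panic(unsigned int cpuid)
{
        unsigned int expected = NOCPU;

        assert(cpuid != NOCPU);
        if (atomic_load(&panic_cpu) == cpuid)
                return (true);
        return (atomic_compare_exchange_strong(&panic_cpu, &expected, cpuid));
}

/* Models the first CPU's panic being canceled (hypothetical helper). */
static void
release_panic(void)
{
        atomic_store(&panic_cpu, NOCPU);
}
```

In this model, the first CPU to call try_claim_panic() wins, every other CPU gets false until release_panic() is called, and a losing CPU has nothing to do in the meantime except spin.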
The system wedges when stop_cpus_hard() is called, which sends NMIs to all
of the other CPUs and waits for them to acknowledge that they are stopped
before returning. However, the hardware will not deliver an NMI to a CPU that is
already handling an NMI, so the other CPUs that got a watchdog NMI and are
spinning will never go into the NMI handler and acknowledge that they are
stopped.
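The wedge condition can be written down as a small bitmask model. This is a sketch under simplifying assumptions, not what stop_cpus_hard() actually does internally: cpu_bit(), acked_cpus() and stop_completes() are hypothetical names, CPUs are bits in a 32-bit mask, and a CPU that is already inside an NMI handler is assumed never to acknowledge the stop request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit for one CPU in a 32-bit mask (hypothetical helper). */
static uint32_t
cpu_bit(unsigned int cpu)
{
        assert(cpu < 32);
        return (UINT32_C(1) << cpu);
}

/*
 * CPUs that acknowledge the stop NMI. Any target that is already handling
 * an NMI (for example, spinning on panic_cpu after a watchdog NMI) never
 * takes the stop NMI, so it never acknowledges.
 */
static uint32_t
acked_cpus(uint32_t targets, uint32_t in_nmi)
{
        return (targets & ~in_nmi);
}

/* The stopping CPU only returns once every target has acknowledged. */
static bool
stop_completes(uint32_t targets, uint32_t in_nmi)
{
        return (acked_cpus(targets, in_nmi) == targets);
}
```

With targets covering CPUs 1 to 3 and CPU 2 stuck in the watchdog NMI, stop_completes() is false: CPUs 1 and 3 acknowledge, CPU 2 never does, and the panicking CPU waits forever.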
I've been able to work around this with the following hideous hack:
--- kern_shutdown.c 2012-08-17 10:25:02.000000000 -0400
+++ kern_shutdown.c 2012-11-15 17:04:10.000000000 -0500
@@ -658,11 +658,15 @@
          * panic_cpu if we are spinning in case the panic on the first
          * CPU is canceled.
          */
-        if (panic_cpu != PCPU_GET(cpuid))
+        if (panic_cpu != PCPU_GET(cpuid)) {
                 while (atomic_cmpset_int(&panic_cpu, NOCPU,
-                    PCPU_GET(cpuid)) == 0)
+                    PCPU_GET(cpuid)) == 0) {
+                        atomic_set_int(&stopped_cpus, PCPU_GET(cpumask));
                         while (panic_cpu != NOCPU)
                                 ; /* nothing */
+                }
+                atomic_clear_int(&stopped_cpus, PCPU_GET(cpumask));
+        }
         if (stop_scheduler_on_panic) {
                 if (panicstr == NULL && !kdb_active)
But I'm hoping that somebody has some ideas on a better way to fix this
kind of problem.