Re: Stop scheduler on panic

Alexander Motin Wed, 16 Nov 2011 15:08:25 -0800

On 17.11.2011 00:21, Andriy Gapon wrote:

on 16/11/2011 21:27 Fabian Keil said the following:

Kostik Belousov<kostik...@gmail.com>  wrote:

I was tricked into finishing the work by Andrey Gapon, who developed
the patch to reliably stop other processors on panic.  The patch
greatly improves the chances of getting a dump on panic on an SMP host.


I tested the patch trying to get a dump (from the debugger) for
kern/162036, which currently results in the double fault reported in:
http://lists.freebsd.org/pipermail/freebsd-current/2011-September/027766.html

It didn't help, but also didn't make anything worse.

Fabian


The mi_switch recursion looks very familiar to me:
mi_switch() at mi_switch+0x270
critical_exit() at critical_exit+0x9b
spinlock_exit() at spinlock_exit+0x17
mi_switch() at mi_switch+0x275
critical_exit() at critical_exit+0x9b
spinlock_exit() at spinlock_exit+0x17
[several pages of the previous three lines skipped]
mi_switch() at mi_switch+0x275
critical_exit() at critical_exit+0x9b
spinlock_exit() at spinlock_exit+0x17
intr_event_schedule_thread() at intr_event_schedule_thread+0xbb
ahci_end_transaction() at ahci_end_transaction+0x398
ahci_ch_intr() at ahci_ch_intr+0x2b5
ahcipoll() at ahcipoll+0x15
xpt_polled_action() at xpt_polled_action+0xf7

In fact I once discussed with jhb this recursion triggered from a different
place.  To quote myself:
<avg>    spinlock_exit -> critical_exit -> mi_switch -> kdb_switch -> thread_unlock -> spinlock_exit -> critical_exit -> mi_switch -> ...
<avg>    in the kdb context
<avg>    this issue seems to be triggered by td_owepreempt being true at the time kdb is entered
<avg>    and there of course has to be an initial spinlock_exit call somewhere
<avg>    in my case it's because of usb keyboard
<avg>    I wonder if it would make sense to clear td_owepreempt right before calling kdb_switch in mi_switch
<avg>    instead of in sched_switch()
<avg>    clearing td_owepreempt seems like a scheduler-independent operation to me
<avg>    or is it better to just skip locking in usb when kdb_active is set?
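
The recursion in the IRC excerpt can be reproduced in a small userland model. This is only a sketch: every toy_ name, the MAX_DEPTH cap (a stand-in for running out of kernel stack) and the clear_before_kdb switch are assumptions made for illustration, and the real mi_switch(), critical_exit() and kdb_switch() do far more than this. It shows only why a td_owepreempt that stays set keeps re-entering mi_switch(), and how clearing it before the kdb path would break the cycle.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 64            /* stand-in for exhausting the kernel stack */

struct toy_thread {
	bool td_owepreempt;     /* a preemption is owed to this thread */
	int  td_critnest;       /* critical section nesting level */
};

static bool clear_before_kdb;   /* hypothetical: the change proposed above */
static int  depth;              /* current toy_mi_switch nesting */
static int  max_depth;          /* deepest nesting seen so far */

static void toy_critical_exit(struct toy_thread *td);

/* spinlock_exit() ends the critical section opened when the lock was taken. */
static void
toy_spinlock_exit(struct toy_thread *td)
{
	toy_critical_exit(td);
}

/* Models only the "kdb is active" branch of mi_switch(). */
static void
toy_mi_switch(struct toy_thread *td)
{
	if (clear_before_kdb)
		td->td_owepreempt = false;
	/* kdb_switch -> thread_unlock: model the thread lock as one nesting level. */
	td->td_critnest++;
	toy_spinlock_exit(td);
}

static void
toy_critical_exit(struct toy_thread *td)
{
	assert(td->td_critnest > 0);
	td->td_critnest--;
	if (td->td_critnest != 0 || !td->td_owepreempt)
		return;
	if (depth >= MAX_DEPTH)         /* the model gives up here */
		return;
	depth++;
	if (depth > max_depth)
		max_depth = depth;
	toy_mi_switch(td);
	depth--;
}

/* Returns how deep toy_mi_switch nested after one spinlock_exit under kdb. */
int
toy_run(bool clear)
{
	struct toy_thread td = { .td_owepreempt = true, .td_critnest = 1 };

	clear_before_kdb = clear;
	depth = max_depth = 0;
	toy_spinlock_exit(&td);
	return (max_depth);
}
```

In this model toy_run(false) nests until it hits MAX_DEPTH, while toy_run(true) stops after a single level because the second critical_exit no longer sees td_owepreempt set.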

The workaround described above should work in this case.
Another possibility is to pessimize mtx_unlock_spin() implementations to check
SCHEDULER_STOPPED() and to bypass any further actions in that case.  But that
would add unnecessary overhead to the sunny day code paths.

Going further up the stack one can come up with the following proposals:
- check SCHEDULER_STOPPED() in swi_sched() and return early
- do not call swi_sched() from xpt_done() if we somehow know that we are in a
polling mode
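
The first proposal can be sketched as a guard at the top of swi_sched(). This is a userland model, not a patch: toy_scheduler_stopped stands in for SCHEDULER_STOPPED(), the toy_ functions and the wakeup counter are hypothetical, and the real swi_sched() takes different arguments and does more work that is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for SCHEDULER_STOPPED(); set once panic has stopped the scheduler. */
static bool toy_scheduler_stopped;

struct toy_swi {
	int wakeups;    /* times the interrupt thread would have been scheduled */
};

/* Stand-in for intr_event_schedule_thread(), where the trace enters the loop. */
static void
toy_schedule_thread(struct toy_swi *swi)
{
	swi->wakeups++;
}

void
toy_swi_sched(struct toy_swi *swi)
{
	assert(swi != NULL);
	if (toy_scheduler_stopped)
		return;         /* no thread can be scheduled, so do not try */
	toy_schedule_thread(swi);
}
```

The guard sits on a path that is much colder than every mtx_unlock_spin() call, which is the appeal of checking here rather than in the lock code. The model does not show who then processes the completed request; that part is left open.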

There is no flag in CAM now to indicate polling mode, but if needed, it
should not be difficult to add one and not call swi_sched().
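
A minimal sketch of such a flag, written as a userland model rather than CAM code. The polling field, the toy_ types and functions, and the counters are all hypothetical; the only point is that xpt_done() would skip swi_sched() while a polled request is in progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_sim {
	bool polling;           /* hypothetical flag: a polled action is running */
	int  done_queue_len;    /* completed requests waiting to be processed */
	int  swi_requests;      /* times swi_sched() would have been called */
};

/* Stand-in for swi_sched(): only counts the request. */
static void
toy_swi_sched(struct toy_sim *sim)
{
	sim->swi_requests++;
}

/* Completion path: always queue the request, wake the SWI only if not polling. */
void
toy_xpt_done(struct toy_sim *sim)
{
	assert(sim != NULL);
	sim->done_queue_len++;
	if (!sim->polling)
		toy_swi_sched(sim);
}

/* Polled path: in this model the poller drains the queue itself. */
void
toy_xpt_polled_action(struct toy_sim *sim)
{
	assert(sim != NULL);
	sim->polling = true;
	toy_xpt_done(sim);              /* the driver's poll routine completes the request */
	sim->done_queue_len = 0;        /* processed inline by the poller */
	sim->polling = false;
}
```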


--
Alexander Motin
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
