Kqueue races causing crashes

Eric Badger Tue, 14 Jun 2016 20:30:07 -0700

Hi there,

There seems to be some racy code in kern_event.c which is causing me to run into some crashes. I’ve attached the test program used to generate these crashes (build it and run the “go” script). They were produced in a VM with 4 cores on 11 Alpha 3 (and originally on 10.3). The crashes I’ve seen come in a few varieties:

1. “userret: returning with the following locks held”. This one is the easiest to hit (assuming witness is enabled).


userret: returning with the following locks held:
exclusive sleep mutex process lock (process lock) r = 0 (0xfffff80006956120) locked @ /usr/src/sys/kern/kern_event.c:2125
panic: witness_warn
cpuid = 2
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe000039d8e0
vpanic() at vpanic+0x182/frame 0xfffffe000039d960
kassert_panic() at kassert_panic+0x126/frame 0xfffffe000039d9d0
witness_warn() at witness_warn+0x3c6/frame 0xfffffe000039daa0
userret() at userret+0x9d/frame 0xfffffe000039dae0
amd64_syscall() at amd64_syscall+0x406/frame 0xfffffe000039dbf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe000039dbf0
--- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x800b8a0ba, rsp = 0x7fffffffea98, rbp = 0x7fffffffeae0 ---

KDB: enter: panic
[ thread pid 64855 tid 100106 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> show all locks
Process 64855 (watch) thread 0xfffff800066c3000 (100106)
exclusive sleep mutex process lock (process lock) r = 0 (0xfffff80006956120) locked @ /usr/src/sys/kern/kern_event.c:2125
Process 64855 (watch) thread 0xfffff8000696a500 (100244)
exclusive sleep mutex pmap (pmap) r = 0 (0xfffff800068c3138) locked @ /usr/src/sys/amd64/amd64/pmap.c:4067
exclusive sx vm map (user) (vm map (user)) r = 0 (0xfffff800068f6080) locked @ /usr/src/sys/vm/vm_map.c:3315
exclusive sx vm map (user) (vm map (user)) r = 0 (0xfffff800068c3080) locked @ /usr/src/sys/vm/vm_map.c:3311

db> ps
  pid  ppid  pgrp   uid   state   wmesg         wchan        cmd
64855   690   690     0  R+      (threaded)                  watch
100106                   Run     CPU 2                       main
100244                   Run     CPU 1                       procmaker
100245                   Run     CPU 3                       reaper

2. “Sleeping thread owns a non-sleepable lock”. This one first drew my attention by showing up in a real-world application at work.


Sleeping thread (tid 100101, pid 76857) owns a non-sleepable lock
KDB: stack backtrace of thread 100101:
sched_switch() at sched_switch+0x2a5/frame 0xfffffe0000257690
mi_switch() at mi_switch+0xe1/frame 0xfffffe00002576d0
sleepq_catch_signals() at sleepq_catch_signals+0x16c/frame 0xfffffe0000257730
sleepq_timedwait_sig() at sleepq_timedwait_sig+0xf/frame 0xfffffe0000257760
_sleep() at _sleep+0x234/frame 0xfffffe00002577e0
kern_kevent_fp() at kern_kevent_fp+0x38a/frame 0xfffffe00002579d0
kern_kevent() at kern_kevent+0x9f/frame 0xfffffe0000257a30
sys_kevent() at sys_kevent+0x12a/frame 0xfffffe0000257ae0
amd64_syscall() at amd64_syscall+0x2d4/frame 0xfffffe0000257bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0000257bf0
--- syscall (363, FreeBSD ELF64, sys_kevent), rip = 0x800b6afea, rsp = 0x7fffffffea88, rbp = 0x7fffffffead0 ---

panic: sleeping thread
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0000225590
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe0000225640
vpanic() at vpanic+0x126/frame 0xfffffe0000225680
panic() at panic+0x43/frame 0xfffffe00002256e0
propagate_priority() at propagate_priority+0x166/frame 0xfffffe0000225710
turnstile_wait() at turnstile_wait+0x282/frame 0xfffffe0000225750
__mtx_lock_sleep() at __mtx_lock_sleep+0x26b/frame 0xfffffe00002257d0
__mtx_lock_flags() at __mtx_lock_flags+0x5e/frame 0xfffffe00002257f0
proc_to_reap() at proc_to_reap+0x46/frame 0xfffffe0000225840
kern_wait6() at kern_wait6+0x202/frame 0xfffffe00002258f0
sys_wait4() at sys_wait4+0x72/frame 0xfffffe0000225ae0
amd64_syscall() at amd64_syscall+0x2d4/frame 0xfffffe0000225bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0000225bf0
--- syscall (7, FreeBSD ELF64, sys_wait4), rip = 0x800b209ba, rsp = 0x7fffdfdfcf48, rbp = 0x7fffdfdfcf80 ---

KDB: enter: panic
[ thread pid 76857 tid 100225 ]
Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
db> show allchains
chain 1:
thread 100225 (pid 76857, reaper) blocked on lock 0xfffff800413105f0 (sleep mutex) "process lock"
 thread 100101 (pid 76857, main) inhibited

(3./4.) There are a few others that I hit less frequently (“page fault while in kernel mode” and “Kernel page fault with the following non-sleepable locks held”). I don’t have a backtrace handy for these.

I believe they all have more or less the same cause. The crashes occur because we acquire a knlist lock via the KN_LIST_LOCK macro, but when we call KN_LIST_UNLOCK, the knote’s knlist reference (kn->kn_knlist) has been cleared by another thread. Thus we are unable to unlock the previously acquired lock and hold it until something causes us to crash (such as the witness code noticing that we’re returning to userland with the lock still held).
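To make the mechanism concrete, here is a minimal user-space model, not kernel code. It assumes only what is described above: the lock side locks through kn->kn_knlist, and the unlock side does nothing when kn->kn_knlist is NULL. struct model_knlist, struct model_knote, the MODEL_KN_LIST_* macros and model_leaked_lock() are hypothetical names, and a pthread mutex stands in for the knlist (process) lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct knlist / struct knote. */
struct model_knlist {
	pthread_mutex_t kl_lock;	/* plays the borrowed process lock */
};

struct model_knote {
	struct model_knlist *kn_knlist;	/* cleared when the knote is removed */
};

/* Lock/unlock only while the knote still points at a knlist. */
#define MODEL_KN_LIST_LOCK(kn) do {					\
	if ((kn)->kn_knlist != NULL)					\
		pthread_mutex_lock(&(kn)->kn_knlist->kl_lock);		\
} while (0)

#define MODEL_KN_LIST_UNLOCK(kn) do {					\
	if ((kn)->kn_knlist != NULL)					\
		pthread_mutex_unlock(&(kn)->kn_knlist->kl_lock);	\
} while (0)

/*
 * Replays the bad ordering in a single thread: lock through the knote,
 * clear kn_knlist (as another thread's removal would), then "unlock".
 * Returns the result of a later trylock; EBUSY means the lock leaked.
 */
int
model_leaked_lock(void)
{
	struct model_knlist knl;
	struct model_knote kn = { .kn_knlist = &knl };
	int rc;

	rc = pthread_mutex_init(&knl.kl_lock, NULL);
	assert(rc == 0);

	MODEL_KN_LIST_LOCK(&kn);	/* acquires knl.kl_lock */
	kn.kn_knlist = NULL;		/* simulated removal by another thread */
	MODEL_KN_LIST_UNLOCK(&kn);	/* sees NULL and does nothing */

	rc = pthread_mutex_trylock(&knl.kl_lock);	/* EBUSY: still held */

	/* Either way this thread owns the lock once; release it via knl. */
	pthread_mutex_unlock(&knl.kl_lock);
	pthread_mutex_destroy(&knl.kl_lock);
	return rc;
}
```

model_leaked_lock() returns EBUSY: after the "unlock" the mutex is still held, which in the kernel corresponds to the process lock that witness reports at userret in crash 1.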


A walkthrough of what happens in the test program:

There are 3 threads: thread 1 forks off short-lived child processes, thread 2 reaps the child processes, and thread 3 tracks the child processes via a kqueue (NOTE_EXIT | NOTE_EXEC | NOTE_FORK | NOTE_TRACK). I believe a crash generally looks like this:

1. The forker thread creates a short-lived child. That child dies and triggers a NOTE_EXIT event.
2. The kqueue thread is somewhere in kqueue_scan(), probably blocked at a KN_LIST_LOCK call.
3. The dying process calls into filt_proc() and notices that the KN_DETACHED flag is not set. It therefore decides to call knlist_remove_inevent() to take the knote out of the knlist. Importantly, this sets kn->kn_knlist to NULL, meaning we can no longer access the knlist lock from the knote.
4. The kqueue thread, still in kqueue_scan(), is able to acquire the lock via KN_LIST_LOCK. It does some work and then calls the KN_LIST_UNLOCK macro. This macro checks and finds that the knote does not have a reference to a knlist, and thus takes no action, leaving the lock in the locked state.
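The four steps can be replayed deterministically with two threads. In this sketch (user-space pthreads, all names hypothetical), the calling thread plays the dying process holding the process lock, and a second thread plays the kqueue thread. The semaphore only guarantees that the kqueue thread has already loaded kn_knlist before the removal clears it, so steps 2 to 4 always happen in the order listed.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

struct model_knlist {
	pthread_mutex_t kl_lock;	/* stands in for the process lock */
};

struct model_knote {
	struct model_knlist *kn_knlist;
};

struct race_ctx {
	struct model_knlist knl;
	struct model_knote kn;
	sem_t loaded;	/* posted once the kqueue thread has read kn_knlist */
};

/* The kqueue thread inside a kqueue_scan()-like loop. */
static void *
kqueue_thread(void *arg)
{
	struct race_ctx *ctx = arg;

	/* Step 2: the lock side reads kn_knlist while it is still set... */
	struct model_knlist *list = ctx->kn.kn_knlist;
	sem_post(&ctx->loaded);
	/* ...and blocks until the dying process drops the lock. */
	if (list != NULL)
		pthread_mutex_lock(&list->kl_lock);

	/* Step 4: after some work, the unlock side re-reads kn_knlist. */
	if (ctx->kn.kn_knlist != NULL)
		pthread_mutex_unlock(&ctx->kn.kn_knlist->kl_lock);
	return NULL;	/* exits with kl_lock still held */
}

/*
 * Returns the result of a trylock after the kqueue thread has exited:
 * EBUSY means the lock was leaked (it is then deliberately left locked).
 */
int
model_exit_race(void)
{
	struct race_ctx ctx;
	pthread_t td;

	pthread_mutex_init(&ctx.knl.kl_lock, NULL);
	sem_init(&ctx.loaded, 0, 0);
	ctx.kn.kn_knlist = &ctx.knl;

	/* Steps 1 and 3: the dying child holds its process lock in filt_proc(). */
	pthread_mutex_lock(&ctx.knl.kl_lock);
	pthread_create(&td, NULL, kqueue_thread, &ctx);
	sem_wait(&ctx.loaded);
	ctx.kn.kn_knlist = NULL;	/* knlist_remove_inevent()-like removal */
	pthread_mutex_unlock(&ctx.knl.kl_lock);

	pthread_join(td, NULL);
	int rc = pthread_mutex_trylock(&ctx.knl.kl_lock);
	if (rc == 0)
		pthread_mutex_unlock(&ctx.knl.kl_lock);
	sem_destroy(&ctx.loaded);
	return rc;
}
```

model_exit_race() should return EBUSY on every run: the kqueue thread acquired the lock through a pointer it loaded before the removal, but its unlock consulted the cleared field.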

I believe there’s also a small window where the KN_LIST_LOCK macro checks kn->kn_knlist and finds it to be non-NULL, but by the time it actually dereferences it, it has become NULL. This would produce the “page fault while in kernel mode” crash.
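The window described here is a check-then-use on kn->kn_knlist. The sketch below, again with hypothetical names, makes the two loads explicit and uses a hook to run the "other thread" between them. Whether the real macro compiles to two separate loads is up to the compiler, so this only models the possibility.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct model_knlist {
	pthread_mutex_t kl_lock;
};

struct model_knote {
	struct model_knlist *kn_knlist;
};

enum lock_outcome { SKIPPED, LOCKED, WOULD_FAULT };

/* Hypothetical hook: whatever another CPU does between the two loads. */
typedef void (*interleave_fn)(struct model_knote *);

/* What knlist_remove_inevent() does to the knote, per the walkthrough. */
void
remove_from_list(struct model_knote *kn)
{
	kn->kn_knlist = NULL;
}

/*
 * Check-then-use on kn->kn_knlist: one load for the NULL check and a
 * second load for the pointer that is actually dereferenced.
 */
enum lock_outcome
model_list_lock(struct model_knote *kn, interleave_fn between)
{
	if (kn->kn_knlist == NULL)	/* load 1: the check */
		return SKIPPED;
	if (between != NULL)
		between(kn);		/* the race window */
	struct model_knlist *list = kn->kn_knlist;	/* load 2 */
	if (list == NULL)
		return WOULD_FAULT;	/* the kernel would dereference NULL here */
	pthread_mutex_lock(&list->kl_lock);
	return LOCKED;
}
```

With no hook the call locks normally; with remove_from_list() as the hook it passes the NULL check and then finds NULL, which is the page-fault case.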

If someone familiar with this code sees an obvious fix, I’ll be happy to test it. Otherwise, I’d appreciate any advice on fixing this. My first thought is that a ‘struct knote’ ought to have its own mutex for controlling access to the flag fields and, ideally, the “kn_knlist” field. That is, you would first acquire a knote’s lock and then the knlist lock, thus ensuring that no one could clear the kn_knlist variable while you hold the knlist lock. The knlist lock, however, usually comes from whichever event-producing entity the knote tracks, so getting the lock ordering right between the per-knote mutex and this other lock seems potentially hard. (Sometimes we call into functions in kern_event.c with the knlist lock already held, having been acquired in code outside of kern_event.c. Consider, for example, calling KNOTE_LOCKED from kern_exit.c: the PROC_LOCK macro has already been used to acquire the process lock, which also serves as the knlist lock.)
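A rough user-space sketch of the per-knote mutex idea, assuming every writer of kn_knlist takes the knote lock first and the knlist lock second (kn_lock, knote_list_lock(), knote_set_list() and the other names are hypothetical). A scanner thread and a detach/re-attach thread hammer the same knote, and afterwards both locks are free. Note what it leaves out: the kern_exit.c path mentioned above arrives already holding the process lock, i.e. the knlist lock, so it would take the two locks in the opposite order, which is exactly the ordering problem that makes this approach hard.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 100000

struct model_knlist {
	pthread_mutex_t kl_lock;	/* lock owned by the event producer */
};

struct model_knote {
	pthread_mutex_t kn_lock;	/* hypothetical per-knote lock */
	struct model_knlist *kn_knlist;	/* changed only with kn_lock held */
};

struct model_ctx {
	struct model_knlist knl;
	struct model_knote kn;
};

/* Lock order: kn_lock first, then the knlist lock it points to. */
static struct model_knlist *
knote_list_lock(struct model_knote *kn)
{
	pthread_mutex_lock(&kn->kn_lock);
	struct model_knlist *list = kn->kn_knlist;	/* stable under kn_lock */
	if (list != NULL)
		pthread_mutex_lock(&list->kl_lock);
	return list;
}

/* Unlock through the pointer that was actually locked. */
static void
knote_list_unlock(struct model_knote *kn, struct model_knlist *list)
{
	if (list != NULL)
		pthread_mutex_unlock(&list->kl_lock);
	pthread_mutex_unlock(&kn->kn_lock);
}

/* Attach (list != NULL) or detach (list == NULL) in the same order. */
static void
knote_set_list(struct model_knote *kn, struct model_knlist *list)
{
	pthread_mutex_lock(&kn->kn_lock);
	struct model_knlist *held = (list != NULL) ? list : kn->kn_knlist;
	if (held != NULL)
		pthread_mutex_lock(&held->kl_lock);
	kn->kn_knlist = list;
	if (held != NULL)
		pthread_mutex_unlock(&held->kl_lock);
	pthread_mutex_unlock(&kn->kn_lock);
}

static void *
scanner(void *arg)	/* plays the kqueue_scan() side */
{
	struct model_ctx *ctx = arg;
	for (int i = 0; i < ITERATIONS; i++)
		knote_list_unlock(&ctx->kn, knote_list_lock(&ctx->kn));
	return NULL;
}

static void *
remover(void *arg)	/* plays the knlist_remove_inevent() side */
{
	struct model_ctx *ctx = arg;
	for (int i = 0; i < ITERATIONS; i++) {
		knote_set_list(&ctx->kn, NULL);
		knote_set_list(&ctx->kn, &ctx->knl);	/* re-attach for next round */
	}
	return NULL;
}

/* Returns 1 if neither lock is left held once both threads finish. */
int
model_per_knote_lock(void)
{
	struct model_ctx ctx;
	pthread_t a, b;

	pthread_mutex_init(&ctx.knl.kl_lock, NULL);
	pthread_mutex_init(&ctx.kn.kn_lock, NULL);
	ctx.kn.kn_knlist = &ctx.knl;
	pthread_create(&a, NULL, scanner, &ctx);
	pthread_create(&b, NULL, remover, &ctx);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	int rc1 = pthread_mutex_trylock(&ctx.knl.kl_lock);
	int rc2 = pthread_mutex_trylock(&ctx.kn.kn_lock);
	if (rc1 == 0)
		pthread_mutex_unlock(&ctx.knl.kl_lock);
	if (rc2 == 0)
		pthread_mutex_unlock(&ctx.kn.kn_lock);
	return rc1 == 0 && rc2 == 0;
}
```

Because both threads always take kn_lock before kl_lock, there is a single lock order and no deadlock, and the unlock side never depends on a field that someone else may have cleared.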

Apropos of the knlist lock and its provenance: why is a lock from the event-producing entity used to control access to the knlist and knote? Is it generally desirable to, for example, hold the process lock while operating on a knlist attached to that process? It’s not obvious to me that this is required or even desirable. This might suggest that a knlist should have its own lock rather than using a lock from the event-producing entity, which might make addressing this problem more straightforward.

Many thanks for any help, and please let me know if I’ve failed to make anything clear.


Cheers,
Eric



_______________________________________________
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
