On Wed, Apr 21, 2021 at 10:21:32AM +1000, David Gwynne wrote: > if you have a program that uses kq (or libevent) to wait for bytes to > read off an idle network interface via /dev/bpf and that interface > goes away, the program doesnt get woken up. this is because the kq > read filter in bpf only checks if there ares bytes available. because a > detached interface never gets packets (how very zen), this condition > never changes and the program will never know something happened. > > this has the bpf filter check if the interface is detached too. with > this change my test program wakes up, tries to read, and gets EIO. which > is great. > > note that in the middle of this is the vdevgone machinery. when an > interface is detached, bpfdetach gets called, which ends up calling > vdevgone. vdevgone sort of swaps out bpf on the currently open vdev with > some dead operations, part of which involves calling bpfclose() to try > and clean up the existing state associated with the vdev. bpfclose tries > to wake up any waiting listeners, which includes kq handlers. that's how > the kernel goes from an interface being detached to the bpf kq filter > being run. the bpf kq filter just has to check that the interface is > still attached.
I thought tun(4) had this same problem, but I wrote a test and couldn't reproduce it. tun works because it addresses the problem in a different way. Instead of having its own kq filter check if the device is dead or not, it calls klist_invalidate, which switches things around like the vdevgone/vop_revoke stuff does with the vdev. So an alternative way to solve this problem in bpf(4) would be the following: Index: bpf.c =================================================================== RCS file: /cvs/src/sys/net/bpf.c,v retrieving revision 1.203 diff -u -p -r1.203 bpf.c --- bpf.c 21 Jan 2021 12:33:14 -0000 1.203 +++ bpf.c 21 Apr 2021 00:54:30 -0000 @@ -401,6 +401,7 @@ bpfclose(dev_t dev, int flag, int mode, bpf_wakeup(d); LIST_REMOVE(d, bd_list); mtx_leave(&d->bd_mtx); + klist_invalidate(&d->bd_sel.si_note); bpf_put(d); return (0);