On Wed, Apr 21, 2021 at 10:21:32AM +1000, David Gwynne wrote:
> if you have a program that uses kq (or libevent) to wait for bytes to
> read off an idle network interface via /dev/bpf and that interface
> goes away, the program doesn't get woken up. this is because the kq
> read filter in bpf only checks if there are bytes available. because a
> detached interface never gets packets (how very zen), this condition
> never changes and the program will never know something happened.
> 
> this has the bpf filter check if the interface is detached too. with
> this change my test program wakes up, tries to read, and gets EIO. which
> is great.
> 
> note that in the middle of this is the vdevgone machinery. when an
> interface is detached, bpfdetach gets called, which ends up calling
> vdevgone. vdevgone sort of swaps out bpf on the currently open vdev with
> some dead operations, part of which involves calling bpfclose() to try
> and clean up the existing state associated with the vdev. bpfclose tries
> to wake up any waiting listeners, which includes kq handlers. that's how
> the kernel goes from an interface being detached to the bpf kq filter
> being run. the bpf kq filter just has to check that the interface is
> still attached.

I thought tun(4) had this same problem, but I wrote a test and couldn't
reproduce it. tun works because it addresses the problem in a different
way. Instead of having its own kq filter check if the device is dead or
not, it calls klist_invalidate(), which swaps the attached knotes over
to dead filter ops, much like the vdevgone/vop_revoke machinery does
with the vdev.

So an alternative way to solve this problem in bpf(4) would be the
following:

Index: bpf.c
===================================================================
RCS file: /cvs/src/sys/net/bpf.c,v
retrieving revision 1.203
diff -u -p -r1.203 bpf.c
--- bpf.c       21 Jan 2021 12:33:14 -0000      1.203
+++ bpf.c       21 Apr 2021 00:54:30 -0000
@@ -401,6 +401,7 @@ bpfclose(dev_t dev, int flag, int mode, 
        bpf_wakeup(d);
        LIST_REMOVE(d, bd_list);
        mtx_leave(&d->bd_mtx);
+       klist_invalidate(&d->bd_sel.si_note);
        bpf_put(d);
 
        return (0);
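
To make the difference between the two approaches concrete, here is a
toy userland model. It is only a sketch, not kernel code: every name in
it (model_dev, model_knote, filt_dead and so on) is made up for
illustration, and it assumes that invalidating a klist amounts to
pointing each knote at a filter that always fires, which is my reading
of what klist_invalidate does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are OpenBSD kernel APIs. */

struct model_dev {
	size_t	buffered;	/* bytes waiting to be read */
	bool	detached;	/* the underlying interface has gone away */
};

struct model_knote;
typedef bool (*model_filter)(const struct model_knote *);

struct model_knote {
	model_filter		 filter;	/* decides if the waiter wakes */
	const struct model_dev	*dev;
};

/* Behaviour before either fix: only buffered bytes fire the knote. */
static bool
filt_bytes_only(const struct model_knote *kn)
{
	return (kn->dev->buffered > 0);
}

/* First approach: the device's own filter also reports detachment. */
static bool
filt_bytes_or_detached(const struct model_knote *kn)
{
	return (kn->dev->buffered > 0 || kn->dev->detached);
}

/* Filter installed by invalidation: fires without touching the device. */
static bool
filt_dead(const struct model_knote *kn)
{
	(void)kn;
	return (true);
}

/* Second approach: on close, swap the filter instead of patching it. */
static void
model_invalidate(struct model_knote *kn)
{
	kn->filter = filt_dead;
	kn->dev = NULL;		/* the device state is no longer referenced */
}

/* A wakeup reruns whichever filter is currently installed. */
static bool
model_wakeup(const struct model_knote *kn)
{
	assert(kn->filter != NULL);
	return (kn->filter(kn));
}
```

With an idle, detached device, model_wakeup() returns false under
filt_bytes_only, and true under either filt_bytes_or_detached or after
model_invalidate(). The second approach leaves the device's own filter
untouched, which is the appeal of the one-line diff above.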
