On Wed, Apr 21, 2021 at 01:15:53PM +0000, Visa Hankala wrote: > On Wed, Apr 21, 2021 at 11:04:20AM +1000, David Gwynne wrote: > > On Wed, Apr 21, 2021 at 10:21:32AM +1000, David Gwynne wrote: > > > if you have a program that uses kq (or libevent) to wait for bytes to > > > read off an idle network interface via /dev/bpf and that interface > > > goes away, the program doesnt get woken up. this is because the kq > > > read filter in bpf only checks if there ares bytes available. because a > > > detached interface never gets packets (how very zen), this condition > > > never changes and the program will never know something happened. > > > > > > this has the bpf filter check if the interface is detached too. with > > > this change my test program wakes up, tries to read, and gets EIO. which > > > is great. > > > > > > note that in the middle of this is the vdevgone machinery. when an > > > interface is detached, bpfdetach gets called, which ends up calling > > > vdevgone. vdevgone sort of swaps out bpf on the currently open vdev with > > > some dead operations, part of which involves calling bpfclose() to try > > > and clean up the existing state associated with the vdev. bpfclose tries > > > to wake up any waiting listeners, which includes kq handlers. that's how > > > the kernel goes from an interface being detached to the bpf kq filter > > > being run. the bpf kq filter just has to check that the interface is > > > still attached. > > > > I thought tun(4) had this same problem, but I wrote a test and couldn't > > reproduce it. tun works because it addresses the problem in a different > > way. Instead of having its own kq filter check if the device is dead or > > not, it calls klist_invalidate, which switches things around like the > > vdevgone/vop_revoke stuff does with the vdev. > > > > So an alternative way to solve this problem in bpf(4) would be the > > following: > > > > Index: bpf.c > > =================================================================== > > RCS file: /cvs/src/sys/net/bpf.c,v > > retrieving revision 1.203 > > diff -u -p -r1.203 bpf.c > > --- bpf.c 21 Jan 2021 12:33:14 -0000 1.203 > > +++ bpf.c 21 Apr 2021 00:54:30 -0000 > > @@ -401,6 +401,7 @@ bpfclose(dev_t dev, int flag, int mode, > > bpf_wakeup(d); > > LIST_REMOVE(d, bd_list); > > mtx_leave(&d->bd_mtx); > > + klist_invalidate(&d->bd_sel.si_note); > > bpf_put(d); > > > > return (0); > > I think bpf should call klist_invalidate() from the detach path. > bpfsdetach() might be the right place. This would make the code pattern > similar to the existing uses of klist_invalidate(). > > Calling klist_invalidate() from the close function twists the logic, > at least in my opinion. When a file descriptor is closed, the file > descriptor layer will remove the knotes automatically. This is why close > functions usually do not have to manage with knotes. However, the > automatic removal does not happen when a device is revoked, which is > mended by klist_invalidate().
yep, makes sense to me. how's this look? it works as well as my previous diff did in my tests. Index: bpf.c =================================================================== RCS file: /cvs/src/sys/net/bpf.c,v retrieving revision 1.203 diff -u -p -r1.203 bpf.c --- bpf.c 21 Jan 2021 12:33:14 -0000 1.203 +++ bpf.c 22 Apr 2021 03:09:27 -0000 @@ -1690,8 +1690,10 @@ bpfsdetach(void *p) if (cdevsw[maj].d_open == bpfopen) break; - while ((bd = SMR_SLIST_FIRST_LOCKED(&bp->bif_dlist))) + while ((bd = SMR_SLIST_FIRST_LOCKED(&bp->bif_dlist))) { vdevgone(maj, bd->bd_unit, bd->bd_unit, VCHR); + klist_invalidate(&bd->bd_sel.si_note); + } for (tbp = bpf_iflist; tbp; tbp = tbp->bif_next) { if (tbp->bif_next == bp) {