On Mon, Feb 18, 2019 at 11:30 PM Cong Wang <xiyou.wangc...@gmail.com> wrote: > > On Fri, Feb 15, 2019 at 2:47 PM Willem de Bruijn > <willemdebruijn.ker...@gmail.com> wrote: > > > > > > > 2. Why there is no socket option for sysctl.net.busy_poll? Clearly > > > > > sysctl_net_busy_poll is global and SO_BUSY_POLL only works for > > > > > sysctl.net.busy_read. > > > > > > > > I guess because of how sock_poll works. In that case it is not needed. > > > > The poll duration applies more to the pollset than any of the > > > > individual sockets, too. > > > > > > > > > Good point, it's probably like struct eventpoll vs. struct epitem. > > > > > > The reason why I am looking for a per-socket tuning is to minimize > > > the impact of setting busy_poll. I don't know if it is possible to somehow > > > make this per-socket via epoll interfaces, perhaps fundamentally > > > it is impossible? > > > > I think it may be possible. The way busy_read and busy_poll work > > in sock_poll is that the sum of all (per socket tunable) busy_read > > durations on the sockets in the pollset is ~bound by (global) busy_poll. > > > Good idea!! > > I was actually thinking about checking sk->sk_ll_usec > ep_set_busy_poll_napi_id(). Your idea sounds better than mine. > Maybe we can do both together. :) > > > > > > The epoll implementation is restricted in the sense that it polls only > > on one napi_id at a time. Alongside setting ep->napi_id in > > ep_set_busy_poll_napi_id, we could also set a new ep field takes > > the min of the global busy_poll and sk->sk_ll_usec. > > > > Though I guess you want to be able to poll on a given pollset > > without setting the global sysctl_net_busy_poll at all? That > > would be a useful feature both for epoll and poll/select. But > > definitely requires refining net_busy_loop_on() to optionally > > take some state derived from the sockets in the (e)pollset. > > Yes, or at least give some control to each application. > I was thinking about this: > > diff --git a/fs/eventpoll.c b/fs/eventpoll.c > index a5d219d920e7..1b104c8bda5e 100644 > --- a/fs/eventpoll.c > +++ b/fs/eventpoll.c > @@ -432,7 +432,7 @@ static inline void ep_set_busy_poll_napi_id(struct > epitem *epi) > return; > > sk = sock->sk; > - if (!sk) > + if (!sk || !sk->sk_ll_usec) > return; > > napi_id = READ_ONCE(sk->sk_napi_id); > > With this change, busy_poll would rely on either busy_read or > SO_BUSY_POLL. > > Does this make any sense?
The main issue I see would be in doing it unconditionally, changing existing behavior. Perhaps we can add a socket flag and replace the e()pollset busy poll budget with sk->sk_ll_usec if set? The pollsets would need private fields initialized from sysctl_net_busy_poll. This also allows selectively enabling busy polling while leaving global net_core_busy_poll zero otherwise. This flag could for instance be passed through SO_BUSY_POLL by accepting an optlen of 2*sizeof(int). Just a thought. This is quickly getting a lot more complex than a nice one-line patch.