On Wed, Sep 4, 2019 at 8:23 AM Eric Dumazet <eric.duma...@gmail.com> wrote:
>
>
>
> On 9/4/19 2:00 PM, Mark KEATON wrote:
> > Hi Willem,
> >
> > I am the person who commented on the original bug report in bugzilla.
> >
> > In communicating with Steve just now about possible solutions that maintain
> > the efficiency that you are after, what would you think of the following:
> > keep two lists of UDP sockets, those connected and those not connected, and
> > always searching the connected list first.
>
> This was my suggestion.
>
> Note that this requires adding yet another hash table, and yet another lookup
> (another cache line miss per incoming packet)
>
> This lookup will slow down DNS and QUIC servers, or any application solely
> using not connected sockets.

Exactly. The only way around it that I see is to keep the single list
and optionally mark a struct sock_reuseport as having no connected
members, in which case the search can break on the first reuseport
match, as it does today.

"
On top of the main patch it requires something like

@@ -22,6 +22,7 @@ struct sock_reuseport {
 	/* ID stays the same even after the size of socks[] grows. */
 	unsigned int		reuseport_id;
 	bool			bind_inany;
+	unsigned int		connected;
 	struct bpf_prog __rcu	*prog;		/* optional BPF sock selector */
 	struct sock		*socks[0];	/* array of sock pointers */
 };
@@ -73,6 +74,15 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 	sk_set_txhash(sk);
 	inet->inet_id = jiffies;

+	if (rcu_access_pointer(sk->sk_reuseport_cb)) {
+		struct sock_reuseport *reuse;
+
+		rcu_read_lock();
+		reuse = rcu_dereference(sk->sk_reuseport_cb);
+		reuse->connected = 1;
+		rcu_read_unlock();
+	}
+
 	sk_dst_set(sk, &rt->dst);
 	err = 0;
"

plus a way for reuseport_select_sock to communicate that to its caller.
Probably a variant __reuseport_select_sock with an extra argument.

As for BPF: the example I pointed out does read IP addresses and uses
a BPF map for socket selection. But as that feature is new with 4.19
it is probably moot for this purpose, as we are targeting a fix that
can be backported to 4.19 stable.
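
To make the single-list idea above concrete, here is a rough userspace
model of the lookup. It is not kernel code and not the actual patch:
every name in it (model_sock, model_group, model_select_sock,
model_lookup, the has_connected flag, the visited counter) is
hypothetical, and the scoring is reduced to "unconnected match = 1,
exact connected match = 2". Skipping connected members inside the group
selection is also an assumption of this model. The extra may_stop
argument stands in for the "variant with an extra argument" mentioned
above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_sock;

/* Hypothetical stand-in for a reuseport group. */
struct model_group {
	bool has_connected;		/* mirrors the proposed 'connected' field */
	size_t num_socks;
	struct model_sock *socks[8];
};

/* Hypothetical stand-in for a UDP socket. */
struct model_sock {
	uint32_t peer_addr;		/* 0 means not connected */
	uint16_t peer_port;
	struct model_group *group;	/* NULL if not in a reuseport group */
};

/* -1: no match, 1: unconnected match, 2: exact match on connected peer. */
static int model_score(const struct model_sock *sk, uint32_t saddr,
		       uint16_t sport)
{
	if (sk->peer_addr == 0)
		return 1;
	if (sk->peer_addr == saddr && sk->peer_port == sport)
		return 2;
	return -1;
}

/* connect(): record the peer and flag the whole group. */
static void model_connect(struct model_sock *sk, uint32_t addr, uint16_t port)
{
	sk->peer_addr = addr;
	sk->peer_port = port;
	if (sk->group)
		sk->group->has_connected = true;
}

/*
 * Pick an unconnected member by hash. *may_stop tells the caller whether
 * the group has no connected members, i.e. whether the walk may end here.
 */
static struct model_sock *model_select_sock(struct model_group *g,
					    uint32_t hash, bool *may_stop)
{
	*may_stop = !g->has_connected;
	for (size_t k = 0; k < g->num_socks; k++) {
		struct model_sock *sk = g->socks[(hash + k) % g->num_socks];

		if (sk->peer_addr == 0)
			return sk;
	}
	return NULL;
}

/* Walk the single list; *visited counts how many sockets were scored. */
static struct model_sock *model_lookup(struct model_sock **list, size_t n,
				       uint32_t saddr, uint16_t sport,
				       uint32_t hash, size_t *visited)
{
	struct model_sock *best = NULL;
	int best_score = 0;

	*visited = 0;
	for (size_t i = 0; i < n; i++) {
		struct model_sock *sk = list[i];
		struct model_sock *cand = sk;
		int score = model_score(sk, saddr, sport);

		(*visited)++;
		if (score <= best_score)
			continue;
		if (sk->group && sk->peer_addr == 0) {
			bool may_stop;
			struct model_sock *sel =
				model_select_sock(sk->group, hash, &may_stop);

			if (sel && may_stop)
				return sel;	/* fast path, as today */
			if (sel)
				cand = sel;
		}
		best = cand;
		best_score = score;
	}
	return best;
}
```

In this model, a group with only unconnected sockets returns after
scoring a single entry, so the DNS/QUIC style workload pays nothing
extra. Only once some member has called connect() does the walk
continue to the end of the list looking for a better-scoring connected
socket.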