On Wed, Sep 4, 2019 at 8:23 AM Eric Dumazet <[email protected]> wrote:
>
> On 9/4/19 2:00 PM, Mark KEATON wrote:
> > Hi Willem,
> >
> > I am the person who commented on the original bug report in bugzilla.
> >
> > In communicating with Steve just now about possible solutions that maintain
> > the efficiency that you are after, what would you think of the following:
> > keep two lists of UDP sockets, those connected and those not connected, and
> > always search the connected list first.
>
> This was my suggestion.
>
> Note that this requires adding yet another hash table, and yet another lookup
> (another cache line miss per incoming packet).
>
> This lookup will slow down DNS and QUIC servers, or any application solely
> using not connected sockets.

Exactly.

The only way around it that I see is to keep the single list and
optionally mark a struct sock_reuseport as having no connected
members, in which case the search can break on the first reuseport
match, as it does today.
"
On top of the main patch it requires something like
@@ -22,6 +22,7 @@ struct sock_reuseport {
 	/* ID stays the same even after the size of socks[] grows. */
 	unsigned int reuseport_id;
 	bool bind_inany;
+	unsigned int connected;
 	struct bpf_prog __rcu *prog; /* optional BPF sock selector */
 	struct sock *socks[0]; /* array of sock pointers */
 };
@@ -73,6 +74,15 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 	sk_set_txhash(sk);
 	inet->inet_id = jiffies;

+	if (rcu_access_pointer(sk->sk_reuseport_cb)) {
+		struct sock_reuseport *reuse;
+
+		rcu_read_lock();
+		reuse = rcu_dereference(sk->sk_reuseport_cb);
+		reuse->connected = 1;
+		rcu_read_unlock();
+	}
+
 	sk_dst_set(sk, &rt->dst);
 	err = 0;
"
plus a way for reuseport_select_sock to communicate that. Probably a
variant __reuseport_select_sock with an extra argument.
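
To make the intended search behavior concrete, here is a rough user-space
sketch, not kernel code. All toy_* names, the has_conns flag and the
precomputed score field are hypothetical stand-ins for illustration; the
real lookup computes a score per socket and picks within a reuseport group
by hash or BPF program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reuseport group; not the kernel struct. */
struct toy_reuseport {
	bool has_conns;	/* set once any member socket has been connected */
};

/* Hypothetical stand-in for a socket on the single lookup list. */
struct toy_sock {
	int score;			/* > 0 means match; higher is more specific */
	struct toy_reuseport *reuse;	/* NULL if not in a reuseport group */
};

/*
 * Walk the single list. On a reuseport match, stop at once if the group
 * has no connected members (the behavior the lookup has today). Otherwise
 * keep scanning so that a better-scoring connected socket can still win.
 */
static const struct toy_sock *toy_lookup(const struct toy_sock *socks, size_t n)
{
	const struct toy_sock *best = NULL;
	int best_score = 0;
	size_t i;

	assert(socks != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct toy_sock *sk = &socks[i];

		if (sk->score <= best_score)
			continue;

		/* Fast path: no connected members, first match is final. */
		if (sk->reuse && !sk->reuse->has_conns)
			return sk;

		best = sk;
		best_score = sk->score;
	}

	return best;
}
```

With has_conns clear, the walk ends at the first matching group member, so
servers that only use unconnected sockets pay nothing extra. Once the flag
is set, the full scan runs and the highest-scoring socket is returned.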

As for BPF: the example I pointed out does read IP addresses and uses
a BPF map for socket selection. But as that feature is new with 4.19
it is probably moot for this purpose, as we are targeting a fix that
can be backported to 4.19 stable.