On Wed, Sep 4, 2019 at 8:23 AM Eric Dumazet <eric.duma...@gmail.com> wrote:
>
>
>
> On 9/4/19 2:00 PM, Mark KEATON wrote:
> > Hi Willem,
> >
> > I am the person who commented on the original bug report in bugzilla.
> >
> > In communicating with Steve just now about possible solutions that maintain 
> > the efficiency that you are after, what would you think of the following:  
> > keep two lists of UDP sockets, those connected and those not connected, and 
> > always searching the connected list first.
>
> This was my suggestion.
>
> Note that this requires adding yet another hash table, and yet another lookup
> (another cache line miss per incoming packet)
>
> This lookup will slow down DNS and QUIC servers, or any application solely 
> using not connected sockets.

Exactly.

The only way around it that I see is to keep the single list and
optionally mark a struct reuseport_sock as having no connected
members, in which case the search can break on the first reuseport
match, as it does today.

"
On top of the main patch it requires something like

@@ -22,6 +22,7 @@ struct sock_reuseport {
        /* ID stays the same even after the size of socks[] grows. */
        unsigned int            reuseport_id;
        bool                    bind_inany;
+       unsigned int             connected;
        struct bpf_prog __rcu   *prog;          /* optional BPF sock selector */
        struct sock             *socks[0];      /* array of sock pointers */
 };

@@ -73,6 +74,15 @@ int __ip4_datagram_connect(struct sock *sk, struct
sockaddr *uaddr, int addr_len
        sk_set_txhash(sk);
        inet->inet_id = jiffies;

+       if (rcu_access_pointer(sk->sk_reuseport_cb)) {
+               struct sock_reuseport *reuse;
+
+               rcu_read_lock();
+               reuse = rcu_dereference(sk->sk_reuseport_cb);
+               reuse->connected = 1;
+               rcu_read_unlock();
+       }
+
        sk_dst_set(sk, &rt->dst);
        err = 0;
"

plus a way for reuseport_select_sock to communicate that. Probably a
variant __reuseport_select_sock with an extra argument.

As for BPF: the example I pointed out does read ip addresses and uses
a BPF map for socket selection. But as that feature is new with 4.19
it is probably moot for this purpose, as we are targeting a fix that
can be backported to 4.19 stable.

Reply via email to