On Tue, Mar 2, 2021 at 8:22 AM Lorenz Bauer <[email protected]> wrote:
>
> On Tue, 2 Mar 2021 at 02:37, Cong Wang <[email protected]> wrote:
>
> ...
> > static inline void sk_psock_restore_proto(struct sock *sk,
> > struct sk_psock *psock)
> > {
> > sk->sk_prot->unhash = psock->saved_unhash;
>
> Not related to your patch set, but why do an extra restore of
> sk_prot->unhash here? At this point sk->sk_prot is one of our tcp_bpf
> / udp_bpf protos, so overwriting that seems wrong?
Good catch. It seems you are right, but I need a double check. And
yes, it is completely unrelated to my patch, as the current code has
the same problem.
>
> > - if (inet_csk_has_ulp(sk)) {
> > - tcp_update_ulp(sk, psock->sk_proto,
> > psock->saved_write_space);
> > - } else {
> > - sk->sk_write_space = psock->saved_write_space;
> > - /* Pairs with lockless read in sk_clone_lock() */
> > - WRITE_ONCE(sk->sk_prot, psock->sk_proto);
> > - }
> > + if (psock->saved_update_proto)
> > + psock->saved_update_proto(sk, true);
> > }
> >
> > static inline void sk_psock_set_state(struct sk_psock *psock,
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 636810ddcd9b..0e8577c917e8 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -1184,6 +1184,9 @@ struct proto {
> > void (*unhash)(struct sock *sk);
> > void (*rehash)(struct sock *sk);
> > int (*get_port)(struct sock *sk, unsigned short
> > snum);
> > +#ifdef CONFIG_BPF_SYSCALL
> > + int (*update_proto)(struct sock *sk, bool
> > restore);
>
> Kind of a nit, but this name suggests that the callback is a lot more
> generic than it really is. The only thing you can use it for is to
> prep the socket to be sockmap ready since we hardwire sockmap_unhash,
> etc. It's also not at all clear that this only works if sk has an
> sk_psock associated with it. Calling it without one would crash the
> kernel since the update_proto functions don't check for !sk_psock.
>
> Might as well call it install_sockmap_hooks or something and have a
> valid sk_psock be passed in to the callback. Additionally, I'd prefer
For the name, sure, I am always open to better names. Not sure if
'install_sockmap_hooks' is a good name, I also want to express we
are overriding sk_prot. How about 'psock_update_sk_prot'?
> if the function returned a struct proto * like it does at the moment.
> That way we keep sk->sk_prot manipulation confined to the sockmap code
> and don't have to copy paste it into every proto.
Well, TCP seems too special to do this, as it could call tcp_update_ulp()
to update the proto.
>
> > diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> > index 3bddd9dd2da2..13d2af5bb81c 100644
> > --- a/net/core/sock_map.c
> > +++ b/net/core/sock_map.c
> > @@ -184,26 +184,10 @@ static void sock_map_unref(struct sock *sk, void
> > *link_raw)
> >
> > static int sock_map_init_proto(struct sock *sk, struct sk_psock *psock)
> > {
> > - struct proto *prot;
> > -
> > - switch (sk->sk_type) {
> > - case SOCK_STREAM:
> > - prot = tcp_bpf_get_proto(sk, psock);
> > - break;
> > -
> > - case SOCK_DGRAM:
> > - prot = udp_bpf_get_proto(sk, psock);
> > - break;
> > -
> > - default:
> > + if (!sk->sk_prot->update_proto)
> > return -EINVAL;
> > - }
> > -
> > - if (IS_ERR(prot))
> > - return PTR_ERR(prot);
> > -
> > - sk_psock_update_proto(sk, psock, prot);
> > - return 0;
> > + psock->saved_update_proto = sk->sk_prot->update_proto;
> > + return sk->sk_prot->update_proto(sk, false);
>
> I think reads / writes from sk_prot need READ_ONCE / WRITE_ONCE. We've
> not been diligent about this so far, but I think it makes sense to be
> careful in new code.
Hmm, there are many places not using READ_ONCE/WRITE_ONCE,
for a quick example:
void sock_map_unhash(struct sock *sk)
{
void (*saved_unhash)(struct sock *sk);
struct sk_psock *psock;
rcu_read_lock();
psock = sk_psock(sk);
if (unlikely(!psock)) {
rcu_read_unlock();
if (sk->sk_prot->unhash)
sk->sk_prot->unhash(sk);
return;
}
saved_unhash = psock->saved_unhash;
sock_map_remove_links(sk, psock);
rcu_read_unlock();
saved_unhash(sk);
}
Thanks.