Stanislav Fomichev <s...@fomichev.me> [Fri, 2019-01-18 08:50 -0800]: > On 01/18, Andrey Ignatov wrote: > > Stanislav Fomichev <s...@google.com> [Thu, 2019-01-17 16:41 -0800]: > > > Currently, we have BPF_CGROUP_INET_SOCK_CREATE hook that triggers on > > > socket creation and there is no way to know when the socket is being > > > closed. Add new set of hooks BPF_CGROUP_INET{4,6}_SOCK_RELEASE > > > that trigger when the socket is closed. > > > > > > Initial intended usecase is to cleanup statistics after POST{4,6}_BIND. > > > Hooks have read-only access to all fields of struct bpf_sock. > > > > Do you need it for both TCP and UDP? > Yes, we need both TCP and UDP. Although, UDP is tricky in general with > the connected/unconnected cases. > > > I was thinking about this hook earlier but since in my case only TCP was > > needed I ended up using TCP-BPF. E.g. be BPF_SOCK_OPS_TCP_LISTEN_CB or > > BPF_SOCK_OPS_TCP_CONNECT_CB can be used instead of POST{4,6}_BIND to > > enable something, and then BPF_SOCK_OPS_STATE_CB can be used instead of > > SOCK_RELEASE to disable that something when socket transisions to > > BPF_TCP_CLOSE (e.g. BPF_TCP_LISTEN -> BPF_TCP_CLOSE). > > > > That turned out to be much cleaner than POST{4,6}_BIND and also works > > fine when socket is disconnected with AF_UNSPEC and then connected again > > (what Eric mentioned). > > What if we do something like the patch below? Add pre_release hook (like we > currently have for pre_connect) and call it from connect(AF_UNSPEC) and > from inet_release? Any concerns here?
That's hard to say w/o fully understanding the use-case. Could you provide more details on it please? Is my understanding correct that some statistics is needed only for _client_ sockets (since we're talking about connect) and these client sockets are always bound to local IP:port (or maybe IP only with IP_BIND_ADDRESS_NO_PORT?) before sending data and that's why POST{4,6}_BIND is used to start gathering statistics? And then later, there should be some event to cleanup statistics and you chose ops->release for this? If so what's kind of statistics is needed? Is it per-destination statistics (since AF_UNSPEC matters?)? If so then it should be cleaned up whenever destination is changed or when socket is disconnected the last time. For UDP socket connect(2) can be called many times with different destinations even w/o AF_UNSPEC in between (in contrast to TCP). So your patch with pre_release below won't handle it. Furthermore even if connect(2) was used to set destination, sendmsg(2) can override it for individual datagram. If per-destination statistics is needed for UDP client socket then IMO both "start events" should be handled: BPF_CGROUP_INET{4,6}_CONNECT (connected UDP) and BPF_CGROUP_UDP{4,6}_SENDMSG (unconnected UDP). The former can be used to "close" previous statistics "window" (if it's not the first connect) as well. The latter can be used to created "separate" statistics "window" for destination of current datagram only. And the only case left, from what I see, is when socket is stopped being used completely. That's, again, hard to say how to handle it better w/o understanding use-case (e.g. since it's only UDP problem it may make sense to implement in UDP stack). > (I agree that TCP is probably better handled via BPF_SOCK_OPS_TCP_XYZ hooks, > but we need something for UDP as well) Do you think you may handle TCP with TCP-BPF and this patch set may focus on UDP only if there is no good attach point that covers both TCP and UDP? That may lead to different BPF programs for TCP and UDP but that may be hard to avoid anyway (BPF_CGROUP_UDP{4,6}_SENDMSG is a good example here). > > -- > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > index b703ad242365..ee3dc181df8f 100644 > --- a/net/ipv4/af_inet.c > +++ b/net/ipv4/af_inet.c > @@ -568,8 +568,11 @@ int inet_dgram_connect(struct socket *sock, struct > sockaddr *uaddr, > > if (addr_len < sizeof(uaddr->sa_family)) > return -EINVAL; > - if (uaddr->sa_family == AF_UNSPEC) > + if (uaddr->sa_family == AF_UNSPEC) { > + if (BPF_CGROUP_PRE_RELEASE_ENABLED(sk)) > + sk->sk_prot->pre_release(sk); > return sk->sk_prot->disconnect(sk, flags); > + } > > if (BPF_CGROUP_PRE_CONNECT_ENABLED(sk)) { > err = sk->sk_prot->pre_connect(sk, uaddr, addr_len); > @@ -632,6 +635,8 @@ int __inet_stream_connect(struct socket *sock, struct > sockaddr *uaddr, > return -EINVAL; > > if (uaddr->sa_family == AF_UNSPEC) { > + if (BPF_CGROUP_PRE_RELEASE_ENABLED(sk)) > + sk->sk_prot->pre_release(sk); > err = sk->sk_prot->disconnect(sk, flags); > sock->state = err ? SS_DISCONNECTING : SS_UNCONNECTED; > goto out; > > > > First patch adds hooks, the rest of the patches add uapi and tests to make > > > sure these hooks work. > > > > > > Stanislav Fomichev (5): > > > bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE hooks > > > tools: bpf: support BPF_CGROUP_INET{4,6}_SOCK_RELEASE in > > > libbpf/bpftool > > > selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to > > > test_section_names.c > > > selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to test_sock.c > > > selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to > > > test_sock_addr.c > > > > > > include/linux/bpf-cgroup.h | 6 + > > > include/net/inet_common.h | 1 + > > > include/uapi/linux/bpf.h | 2 + > > > kernel/bpf/syscall.c | 8 ++ > > > net/core/filter.c | 7 + > > > net/ipv4/af_inet.c | 13 +- > > > net/ipv6/af_inet6.c | 5 +- > > > tools/bpf/bpftool/cgroup.c | 2 + > > > tools/include/uapi/linux/bpf.h | 2 + > > > tools/lib/bpf/libbpf.c | 4 + > > > .../selftests/bpf/test_section_names.c | 10 ++ > > > tools/testing/selftests/bpf/test_sock.c | 119 ++++++++++++++++ > > > tools/testing/selftests/bpf/test_sock_addr.c | 131 +++++++++++++++++- > > > 13 files changed, 307 insertions(+), 3 deletions(-) > > > > > > -- > > > 2.20.1.321.g9e740568ce-goog > > > > > > > -- > > Andrey Ignatov -- Andrey Ignatov