On Mon, May 14, 2018 at 08:16:59PM -0700, Alexei Starovoitov wrote: > On Fri, May 11, 2018 at 05:54:33PM -0700, Joe Stringer wrote: > > On 11 May 2018 at 14:41, Martin KaFai Lau <ka...@fb.com> wrote: > > > On Fri, May 11, 2018 at 02:08:01PM -0700, Joe Stringer wrote: > > >> On 10 May 2018 at 22:00, Martin KaFai Lau <ka...@fb.com> wrote: > > >> > On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote: > > >> >> This patch adds a new BPF helper function, sk_lookup() which allows > > >> >> BPF > > >> >> programs to find out if there is a socket listening on this host, and > > >> >> returns a socket pointer which the BPF program can then access to > > >> >> determine, for instance, whether to forward or drop traffic. > > >> >> sk_lookup() > > >> >> takes a reference on the socket, so when a BPF program makes use of > > >> >> this > > >> >> function, it must subsequently pass the returned pointer into the > > >> >> newly > > >> >> added sk_release() to return the reference. > > >> >> > > >> >> By way of example, the following pseudocode would filter inbound > > >> >> connections at XDP if there is no corresponding service listening for > > >> >> the traffic: > > >> >> > > >> >> struct bpf_sock_tuple tuple; > > >> >> struct bpf_sock_ops *sk; > > >> >> > > >> >> populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet > > >> >> sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0); > > >> >> if (!sk) { > > >> >> // Couldn't find a socket listening for this traffic. Drop. > > >> >> return TC_ACT_SHOT; > > >> >> } > > >> >> bpf_sk_release(sk, 0); > > >> >> return TC_ACT_OK; > > >> >> > > >> >> Signed-off-by: Joe Stringer <j...@wand.net.nz> > > >> >> --- > > >> > > >> ... > > >> > > >> >> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto > > >> >> bpf_skb_get_xfrm_state_proto = { > > >> >> }; > > >> >> #endif > > >> >> > > >> >> +struct sock * > > >> >> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) { > > >> > Would it be possible to have another version that > > >> > returns a sk without taking its refcnt? > > >> > It may have performance benefit. > > >> > > >> Not really. The sockets are not RCU-protected, and established sockets > > >> may be torn down without notice. If we don't take a reference, there's > > >> no guarantee that the socket will continue to exist for the duration > > >> of running the BPF program. > > >> > > >> From what I follow, the comment below has a hidden implication which > > >> is that sockets without SOCK_RCU_FREE, eg established sockets, may be > > >> directly freed regardless of RCU. > > > Right, SOCK_RCU_FREE sk is the one I am concern about. > > > For example, TCP_LISTEN socket does not require taking a refcnt > > > now. Doing a bpf_sk_lookup() may have a rather big > > > impact on handling TCP syn flood. or the usual intention > > > is to redirect instead of passing it up to the stack? > > > > I see, if you're only interested in listen sockets then probably this > > series could be extended with a new flag, eg something like > > BPF_F_SK_FIND_LISTENERS which restricts the set of possible sockets > > found to only listen sockets, then the implementation would call into > > __inet_lookup_listener() instead of inet_lookup(). The presence of > > that flag in the relevant register during CALL instruction would show > > that the verifier should not reference-track the result, then there'd > > need to be a check on the release to ensure that this unreferenced > > socket is never released. Just a thought, completely untested and I > > could still be missing some detail.. > > > > That said, I don't completely follow how you would expect to handle > > the traffic for sockets that are already established - the helper > > would no longer find those sockets, so you wouldn't know whether to > > pass the traffic up the stack for established traffic or not. > > I think Martin has a valid concern here that if somebody starts using > this helper on the rx traffic the bpf program (via these two new > helpers) will be doing refcnt++ and refcnt-- even for listener > sockets which will cause synflood to suffer. > One can argue that this is bpf author mistake, but without fixes > (and api changes) to the helper the programmer doesn't really have a way > of avoiding this situation. > Also udp sockets don't need refcnt at all. > How about we split this single helper into three: > - bpf_sk_lookup_tcp_established() that will return refcnt-ed socket > and has to be bpf_sk_release()d by the program. > - bpf_sk_lookup_tcp_listener() that doesn't refcnt, since progs > run in rcu. > - bpf_sk_lookup_udp() that also doesn't refcnt. > The logic you want to put into this helper can be easily > replicated with these three helpers and the whole thing will > be much faster. > Thoughts? Just came to my mind.
or can we explore something like: On the bpf_sk_lookup() side, use __inet[6]_lookup() and __udp[46]_lib_lookup() instead. That should only take refcnt if it has to. On the bpf_sk_release() side, it skips refcnt-- if sk is SOCK_RCU_FREE.