On Thu, 13 Sep 2018 at 12:06, Alexei Starovoitov <alexei.starovoi...@gmail.com> wrote:
>
> On Wed, Sep 12, 2018 at 5:06 PM, Alexei Starovoitov
> <alexei.starovoi...@gmail.com> wrote:
> > On Tue, Sep 11, 2018 at 05:36:36PM -0700, Joe Stringer wrote:
> >> This patch adds new BPF helper functions, bpf_sk_lookup_tcp() and
> >> bpf_sk_lookup_udp() which allows BPF programs to find out if there is a
> >> socket listening on this host, and returns a socket pointer which the
> >> BPF program can then access to determine, for instance, whether to
> >> forward or drop traffic. bpf_sk_lookup_xxx() may take a reference on the
> >> socket, so when a BPF program makes use of this function, it must
> >> subsequently pass the returned pointer into the newly added sk_release()
> >> to return the reference.
> >>
> >> By way of example, the following pseudocode would filter inbound
> >> connections at XDP if there is no corresponding service listening for
> >> the traffic:
> >>
> >>   struct bpf_sock_tuple tuple;
> >>   struct bpf_sock_ops *sk;
> >>
> >>   populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
> >>   sk = bpf_sk_lookup_tcp(ctx, &tuple, sizeof tuple, netns, 0);
> > ...
> >> +struct bpf_sock_tuple {
> >> +       union {
> >> +               __be32 ipv6[4];
> >> +               __be32 ipv4;
> >> +       } saddr;
> >> +       union {
> >> +               __be32 ipv6[4];
> >> +               __be32 ipv4;
> >> +       } daddr;
> >> +       __be16 sport;
> >> +       __be16 dport;
> >> +       __u8 family;
> >> +};
> >
> > since we can pass ptr_to_packet into map lookup and other helpers now,
> > can you move 'family' out of bpf_sock_tuple and combine with netns_id arg?
> > then progs wouldn't need to copy bytes from the packet into tuple
> > to do a lookup.

If I follow, you're proposing that users should be able to pass a
pointer to the source address field of the L3 header. Assuming that the
L3 header ends with saddr+daddr (no options/extheaders) and is
immediately followed by the sport/dport, a packet pointer should then
work for performing the socket lookup. It is up to the BPF program
writer to ensure that this is the case, or otherwise to fall back to
populating a copy of the sock tuple on the stack.

> have been thinking more about it.
> since only ipv4 and ipv6 supported may be use size of bpf_sock_tuple
> to infer family inside the helper, so it doesn't need to be passed explicitly?

Let me make sure I understand the proposal here. The current structure
and function prototypes are:

  struct bpf_sock_tuple {
          union {
                  __be32 ipv6[4];
                  __be32 ipv4;
          } saddr;
          union {
                  __be32 ipv6[4];
                  __be32 ipv4;
          } daddr;
          __be16 sport;
          __be16 dport;
          __u8 family;
  };

  static struct bpf_sock *(*bpf_sk_lookup_tcp)(void *ctx,
                                               struct bpf_sock_tuple *tuple,
                                               int size,
                                               unsigned int netns_id,
                                               unsigned long long flags);
  static struct bpf_sock *(*bpf_sk_lookup_udp)(void *ctx,
                                               struct bpf_sock_tuple *tuple,
                                               int size,
                                               unsigned int netns_id,
                                               unsigned long long flags);
  static int (*bpf_sk_release)(struct bpf_sock *sk,
                               unsigned long long flags);

You're proposing something like:

  struct bpf_sock_tuple4 {
          __be32 saddr;
          __be32 daddr;
          __be16 sport;
          __be16 dport;
          __u8 family;
  };

  struct bpf_sock_tuple6 {
          __be32 saddr[4];
          __be32 daddr[4];
          __be16 sport;
          __be16 dport;
          __u8 family;
  };

  static struct bpf_sock *(*bpf_sk_lookup_tcp)(void *ctx,
                                               void *tuple,
                                               int size,
                                               unsigned int netns_id,
                                               unsigned long long flags);
  static struct bpf_sock *(*bpf_sk_lookup_udp)(void *ctx,
                                               void *tuple,
                                               int size,
                                               unsigned int netns_id,
                                               unsigned long long flags);
  static int (*bpf_sk_release)(struct bpf_sock *sk,
                               unsigned long long flags);

The implementation would then check the size against either
"sizeof(struct bpf_sock_tuple4)" or "sizeof(struct bpf_sock_tuple6)"
and dispatch to the v4 or v6 handler accordingly.

Sure, I can try this out.
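
To make the size check concrete, here is a rough userspace sketch of how
the family could be inferred from the tuple size. Everything in it is an
assumption for illustration: the names sock_tuple4, sock_tuple6 and
tuple_family_from_size() are hypothetical, plain uint32_t/uint16_t stand
in for __be32/__be16, and the explicit family field is dropped from the
structs since the size is what carries that information. It is not the
helper implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical v4 layout: matches the tail of an IPv4 header without
 * options (saddr, daddr) followed directly by the L4 ports. */
struct sock_tuple4 {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
};

/* Hypothetical v6 layout: matches the tail of the fixed IPv6 header
 * (saddr, daddr) followed directly by the L4 ports. */
struct sock_tuple6 {
	uint32_t saddr[4];
	uint32_t daddr[4];
	uint16_t sport;
	uint16_t dport;
};

/* The two sizes must differ, otherwise the size cannot identify the family. */
static_assert(sizeof(struct sock_tuple4) == 12, "unexpected v4 tuple size");
static_assert(sizeof(struct sock_tuple6) == 36, "unexpected v6 tuple size");

enum tuple_family {
	TUPLE_INVALID = 0,
	TUPLE_V4 = 4,
	TUPLE_V6 = 6,
};

/* Map the caller-supplied size onto an address family. Any size other
 * than the two known layouts is rejected. */
static enum tuple_family tuple_family_from_size(size_t size)
{
	switch (size) {
	case sizeof(struct sock_tuple4):
		return TUPLE_V4;
	case sizeof(struct sock_tuple6):
		return TUPLE_V6;
	default:
		return TUPLE_INVALID;
	}
}
```

With layouts like these, a program could pass either a stack copy or a
pointer into the packet, as long as the size argument is exactly one of
the two struct sizes; anything else would be treated as an error.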