On Fri, 19 Oct 2018 22:32:28 +0200 Daniel Borkmann <dan...@iogearbox.net> wrote:
> On 10/19/2018 06:47 PM, Joe Stringer wrote: > > On Thu, 18 Oct 2018 at 22:07, Martin Lau <ka...@fb.com> wrote: > >> On Thu, Oct 18, 2018 at 04:52:40PM -0700, Joe Stringer wrote: > >>> On Thu, 18 Oct 2018 at 14:20, Daniel Borkmann > >>> <dan...@iogearbox.net> wrote: > >>>> On 10/18/2018 11:06 PM, Joe Stringer wrote: > >>>>> On Thu, 18 Oct 2018 at 11:54, Nitin Hande > >>>>> <nitin.ha...@gmail.com> wrote: > >>>> [...] > >>>>>> Open Issue > >>>>>> * The underlying code relies on presence of an skb to find out > >>>>>> the right sk for the case of REUSEPORT socket option. Since > >>>>>> there is no skb available at XDP hookpoint, the helper > >>>>>> function will return the first available sk based off the 5 > >>>>>> tuple hash. If the desire is to return a particular sk > >>>>>> matching reuseport_cb function, please suggest way to tackle > >>>>>> it, which can be addressed in a future commit. > >>>> > >>>>>> Signed-off-by: Nitin Hande <nitin.ha...@gmail.com> > >>>>> > >>>>> Thanks Nitin, LGTM overall. > >>>>> > >>>>> The REUSEPORT thing suggests that the usage of this helper from > >>>>> XDP layer may lead to a different socket being selected vs. the > >>>>> equivalent call at TC hook, or other places where the selection > >>>>> may occur. This could be a bit counter-intuitive. > >>>>> > >>>>> One thought I had to work around this was to introduce a flag, > >>>>> something like BPF_F_FIND_REUSEPORT_SK_BY_HASH. This flag would > >>>>> effectively communicate in the API that the bpf_sk_lookup_xxx() > >>>>> functions will only select a REUSEPORT socket based on the hash > >>>>> and not by, for example BPF_PROG_TYPE_SK_REUSEPORT programs. > >>>>> The absence of the flag would support finding REUSEPORT sockets > >>>>> by other mechanisms (which would be allowed for now from TC > >>>>> hooks but would be disallowed from XDP, since there's no > >>>>> specific plan to support this). > >>>> > >>>> Hmm, given skb is NULL here the only way to lookup the socket in > >>>> such scenario is based on hash, that is, inet_ehashfn() / > >>>> inet6_ehashfn(), perhaps alternative is to pass this hash in > >>>> from XDP itself to the helper so it could be custom selector. Do > >>>> you have a specific use case on this for XDP (just curious)? > >>> > >>> I don't have a use case for SO_REUSEPORT introspection from XDP, > >>> so I'm primarily thinking from the perspective of making the > >>> behaviour clear in the API in a way that leaves open the > >>> possibility for a reasonable implementation in future. From that > >>> perspective, my main concern is that it may surprise some BPF > >>> writers that the same "bpf_sk_lookup_tcp()" call (with identical > >>> parameters) may have different behaviour at TC vs. XDP layers, as > >>> the BPF selection of sockets is respected at TC but not at XDP. > >>> > >>> FWIW we're already out of parameters for the actual call, so if we > >>> wanted to allow passing a hash in, we'd need to either dedicate > >>> half the 'flags' field for this configurable hash, or consider > >>> adding the new hash parameter to 'struct bpf_sock_tuple'. > >>> > >>> +Martin for any thoughts on SO_REUSEPORT and XDP here. > >> The XDP/TC prog has read access to the sk fields through > >> 'struct bpf_sock'? > >> > >> A quick thought... > >> Considering all sk in the same reuse->socks[] share > >> many things (e.g. family,type,protocol,ip,port..etc are the same), > >> I wonder returning which particular sk from reuse->socks[] will > >> matter too much since most of the fields from 'struct bpf_sock' > >> will be the same. Some of fields in 'struct bpf_sock' could be > >> different though, like priority? Hence, another possibility is to > >> limit the accessible fields for the XDP prog. Only allow > >> accessing the fields that must be the same among the sk in the > >> same reuse->socks[]. > > > > This sounds pretty reasonable to me. > > Agree, and in any case this difference in returned sk selection should > probably also be documented in the uapi helper description. Okay, will do in a v2. Thanks Nitin