On 10/19/2018 06:47 PM, Joe Stringer wrote: > On Thu, 18 Oct 2018 at 22:07, Martin Lau <ka...@fb.com> wrote: >> On Thu, Oct 18, 2018 at 04:52:40PM -0700, Joe Stringer wrote: >>> On Thu, 18 Oct 2018 at 14:20, Daniel Borkmann <dan...@iogearbox.net> wrote: >>>> On 10/18/2018 11:06 PM, Joe Stringer wrote: >>>>> On Thu, 18 Oct 2018 at 11:54, Nitin Hande <nitin.ha...@gmail.com> wrote: >>>> [...] >>>>>> Open Issue >>>>>> * The underlying code relies on presence of an skb to find out the >>>>>> right sk for the case of REUSEPORT socket option. Since there is >>>>>> no skb available at XDP hookpoint, the helper function will return >>>>>> the first available sk based off the 5 tuple hash. If the desire >>>>>> is to return a particular sk matching reuseport_cb function, please >>>>>> suggest way to tackle it, which can be addressed in a future commit. >>>> >>>>>> Signed-off-by: Nitin Hande <nitin.ha...@gmail.com> >>>>> >>>>> Thanks Nitin, LGTM overall. >>>>> >>>>> The REUSEPORT thing suggests that the usage of this helper from XDP >>>>> layer may lead to a different socket being selected vs. the equivalent >>>>> call at TC hook, or other places where the selection may occur. This >>>>> could be a bit counter-intuitive. >>>>> >>>>> One thought I had to work around this was to introduce a flag, >>>>> something like BPF_F_FIND_REUSEPORT_SK_BY_HASH. This flag would >>>>> effectively communicate in the API that the bpf_sk_lookup_xxx() >>>>> functions will only select a REUSEPORT socket based on the hash and >>>>> not by, for example BPF_PROG_TYPE_SK_REUSEPORT programs. The absence >>>>> of the flag would support finding REUSEPORT sockets by other >>>>> mechanisms (which would be allowed for now from TC hooks but would be >>>>> disallowed from XDP, since there's no specific plan to support this). >>>> >>>> Hmm, given skb is NULL here the only way to lookup the socket in such >>>> scenario is based on hash, that is, inet_ehashfn() / inet6_ehashfn(), >>>> perhaps alternative is to pass this hash in from XDP itself to the >>>> helper so it could be custom selector. Do you have a specific use case >>>> on this for XDP (just curious)? >>> >>> I don't have a use case for SO_REUSEPORT introspection from XDP, so >>> I'm primarily thinking from the perspective of making the behaviour >>> clear in the API in a way that leaves open the possibility for a >>> reasonable implementation in future. From that perspective, my main >>> concern is that it may surprise some BPF writers that the same >>> "bpf_sk_lookup_tcp()" call (with identical parameters) may have >>> different behaviour at TC vs. XDP layers, as the BPF selection of >>> sockets is respected at TC but not at XDP. >>> >>> FWIW we're already out of parameters for the actual call, so if we >>> wanted to allow passing a hash in, we'd need to either dedicate half >>> the 'flags' field for this configurable hash, or consider adding the >>> new hash parameter to 'struct bpf_sock_tuple'. >>> >>> +Martin for any thoughts on SO_REUSEPORT and XDP here. >> The XDP/TC prog has read access to the sk fields through >> 'struct bpf_sock'? >> >> A quick thought... >> Considering all sk in the same reuse->socks[] share >> many things (e.g. family,type,protocol,ip,port..etc are the same), >> I wonder returning which particular sk from reuse->socks[] will >> matter too much since most of the fields from 'struct bpf_sock' will >> be the same. Some of fields in 'struct bpf_sock' could be different >> though, like priority? Hence, another possibility is to limit the >> accessible fields for the XDP prog. Only allow accessing the fields >> that must be the same among the sk in the same reuse->socks[]. > > This sounds pretty reasonable to me.
Agree, and in any case this difference in returned sk selection should probably also be documented in the uapi helper description.