Re: [net-next PATCH 06/10] bpf: sockmap with sk redirect support

John Fastabend Fri, 18 Aug 2017 20:32:59 -0700

[...] (trimmed email leaving proposal - 1 summary)

>>
>>  syscall:
>>
>>   bpf_create_map(BPF_MAP_TYPE_SOCKMAP, .... )
>>   bpf_prog_attach(verdict_prog, map_fd, BPF_SMAP_STREAM_VERDICT, 0);
>>   bpf_prog_attach(parse_prog, map_fd, BPF_SMAP_STREAM_PARSER, 0);
>>   bpf_map_update_elem(map_fd, key, sock_fd, BPF_ANY)
>>   bpf_map_delete_elem(map_fd, key)
>>
>>  helpers:
>>   to insert sock from sock ops progrm
>>       bpf_sock_map_update(skops, map, key, flags);
>>   to redirect skb to a sock in a sockmap
>>       bpf_sk_redirect_map(map, key, flags)
>>
>>  future work:
>>   bpf_prog_attach(verdict_prog, map_fd, BPF_SOCK_STREAM_VERDICT, 0)
>>   bpf_prog_attach(parse_prog, map_fd, BPF_SOCK_STREAM_PARSER, 0)
>>
>> How does this look? I think it will be both extensible and very usable
>> now.
> 
> Above sounds much better than the present situation.
> Can we take it even further and split psock from sockmap?


So psock data structure itself is almost entirely split at this
point anyways. The remaining two pieces are back pointers to the
map and a key so we can remove the entry when needed. The reason
is we need to remove the map entry when the socket is closed.

We use the sock insertion into the map as the trigger to create
the psock.

> My understanding that psock->key is there only because you tied
> psock with the map and using map as a storage for the rx socket

Almost. To clarify the psock only ever uses the key/map entries
to remove itself on a TCP state change that would close the sock.
.
> imo separating rx and tx sockets will make it cleaner.
> Like we can have new syscall cmd that creates psock that holds
> strpaser, verdict and potentially other programs.
> Later sock ops program will use a helper:
> bpf_psock_update(skops, psock_obj_handle, flags);
> to assign single skops socket into this psock object.
> The programs (strparser, verdict) will be applied to this skops socket,
> so your inheritance requirement is satisfied.
> And use sockmap only for TX sockets. Either user space via syscall
> will store them in there or sockops program will store them into the map
> via bpf_sock_map_update(skops, sockmap, key, flags); helper.
> Later the verdict program will use
> bpf_sk_redirect_map(sockmap, key, flags);
> and for the program author no need to worry about 'type' of socket
> in the sockmap. All sockets in there are TX sockets to redirect to.
> And the same verdict program can use multiple sockmaps.
> Similarly user space can create multiple psock objects with
> same strparser+verdict programs or different and sockops prog
> can pick and choose which psock to use to assign RX socket into.
> 

I prefer the alternative below it seems a bit cleaner to me. I think
a very similar programming flow to the above can be achieved using
the primitives below.

> Another alternative:
> Instead of new psock object to store single socket (like current
> implementation does), we can do two types of sockmap.
> One for a set of RX sockets. All of them will have the same
> strparser+verdict progs and psock with skbuff queue will be part
> of this sockmap type.
> And another sockmap type for TX sockets that don't have skbuff queues

Clarification the skbuff queue in the psock data structure,

        struct sk_buff_head rxqueue;

is attached to the TX socket actually and run through the workqueue.
The naming is perhaps unfortunate, it made sense to me because the
sending sock is receiving skbs from multiple socks.

> at all and can only be used to redirect the RX socket into.
> So bpf_rx_sock_map_update() helper will be used only on RX_SOCKMAP map
> and bpf_tx_sock_map_update() helper will be used only on TX_SOCKMAP,
> while bpf_sk_redirect_map() can only be used on TX_SOCKMAP.
> 

So this is really close to what I proposed above. For a TX_SOCKMAP
simply do not attach any programs,

   bpf_create_map(BPF_MAP_TYPE_SOCKMAP, .... )
   [...]

For an RX_SOCKMAP,

   bpf_create_map(BPF_MAP_TYPE_SOCKMAP, .... )
   bpf_prog_attach(verdict_prog, map_fd, BPF_SMAP_STREAM_VERDICT, 0);
   bpf_prog_attach(parse_prog, map_fd, BPF_SMAP_STREAM_PARSER, 0);   

With the new attach type (compared to the fd2 thing before) we can easily
extend maps to contain other program types as needed. So in the future
we might have TX_SOCKMAP, RX_SOCKMAP, FOO_SOCKMAP, ...

I don't see the need to have the API enforce the map type via update
specifiers bpf_{rx|tx}_sock_map_update. The programmer should "know"
the type by virtue of the programs attached. This is more flexible
as well because it allows a map to be TX only, RX only or TX/RX.

With this proposal we can relax the restriction where a sock can only
be in a single map and even allow a sock to be in the same map multiple
times. The limitation we do have to enforce is allowing a sock in the
a map with different BPF_SMAP_STREAM_* programs. But I think this
should be clear to the programmer (with good tracing functions and
error codes).

Slight aside: but by creating map size of 1 we have an object that
contains programs and later we can attach a sock to it, looks like
the following,

      create_map(BPF_MAP_TYPE_SOCKMAP,...)
      bpf_prog_attach(...) 
      [...]
      bpf_update_map_elem(fd, map, key, flags)

I think this is very close to your first approach where you suggested
a program container object.

> Or you have cases when two RX sockets need to redirect into each
> other and in both cases strparser+verdict need to run?

If we don't do rx, tx restrictions and use my suggestion here we
don't have this limitation. OR because we allow socks in multiple
maps now the user can simply put the sockets in different maps.

> In such case we need to allow bpf_sk_redirect_map() to use on
> RX_SOCKMAP map as well,
> but looking at current implementation you only allow one psock per map,
> so two sockets forwarding to each other cannot work due to only one queue.
> Am I missing anything from what you want to achieve?

I don't think so. But lets get rid of the one psock per map, I took a shot
at relaxing that today and was able to get it with a refcount on the psock
which seems to work OK.

Also reorganizing the psock structure into clear sections tx_psock, rx_psock,
general_psock will probably help readers.

> Thoughts?
> 

What do you think of my counter proposal I started coding it up and it
actually (other than pushing code snippets around) seems to work out
nicely with the existing code base. I think it is really a nice improvement.

Thanks,
John

Re: [net-next PATCH 06/10] bpf: sockmap with sk redirect support

Reply via email to