Hi,

On Sat, 10 Feb 2018 14:08:58 +0100 Daniel Borkmann <dan...@iogearbox.net> wrote:
> Hi Shmulik,
>
> On 02/10/2018 08:46 AM, Shmulik Ladkani wrote:
> > Hi,
> >
> > Apparently one cannot use TC cls_bpf/act_bpf if running from a user ns
> > other than the init_user_ns, as bpf_prog_load does not permit loading
> > these type of progs, snip:
> >
> >     if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
> >         type != BPF_PROG_TYPE_CGROUP_SKB &&
> >         !capable(CAP_SYS_ADMIN))
> >             return -EPERM;
> >
> > although the user performing BPF_PROG_LOAD has both CAP_SYS_ADMIN (and
> > CAP_NET_ADMIN, as required by RTM_NEWTFILTER) in his current_user_ns.
> >
> > This prevents using tc cls_bpf/act_bpf in containerized software
> > stacks (where in contrast other tc cls/act are permitted).
>
> Not really, it's correct that it's initns root-only, but for containers
> control plane can attach BPF progs out of initns into the host-facing
> veth on ingress/egress clsact side to enforce policy, mangle packets etc.
> The other option you would have is that controller would load and pin
> the prog as a node into BPF fs and you can then get the fd and attach
> it to to the veth inside the netns if this is what you're after (the
> attach itself in the second step does not require anything extra compared
> to rest of tc) provided the mount is shared at setup time (but could
> later be removed in the container for example).

Thanks Daniel for the suggestions.

Unfortunately these won't do for our application. Assume, for example, a
multi-tenant network service, where each container holds the application
stack servicing a tenant. The host in this case is rather dumb.

Moreover, the software stack in each container may create various
network devices dynamically (e.g. tunnels, dummies) and needs to apply
some cls/act on these dynamic virtual devices (and not on the initial
veth itself).
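
To make the restriction concrete, here is a small userspace model of the
check quoted above. It is only a sketch, not kernel code: the enum and
the function name model_prog_load_check() are made up for illustration,
and the boolean parameter stands in for capable(CAP_SYS_ADMIN), which
tests the capability against init_user_ns and not against the caller's
own user ns.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's enum bpf_prog_type values.
 * Only the names mirror the quoted snippet; the numeric values are
 * arbitrary and do not match the kernel's. */
enum model_prog_type {
	MODEL_SOCKET_FILTER,
	MODEL_CGROUP_SKB,
	MODEL_SCHED_CLS,	/* what cls_bpf loads */
	MODEL_SCHED_ACT,	/* what act_bpf loads */
};

/* has_init_ns_cap_sys_admin models the result of capable(CAP_SYS_ADMIN),
 * i.e. whether the caller holds CAP_SYS_ADMIN in init_user_ns.
 * Holding it only in a child user ns corresponds to passing false. */
static int model_prog_load_check(enum model_prog_type type,
				 bool has_init_ns_cap_sys_admin)
{
	if (type != MODEL_SOCKET_FILTER &&
	    type != MODEL_CGROUP_SKB &&
	    !has_init_ns_cap_sys_admin)
		return -EPERM;

	return 0;
}
```

With this model, a container root (CAP_SYS_ADMIN only in its own user
ns) gets -EPERM for MODEL_SCHED_CLS and MODEL_SCHED_ACT, while the two
unprivileged types pass regardless of capabilities.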

> In future it might be
> subject to change to also enable it for userns under the constraint that
> verifier puts more restrictions in place in roughly similar fashion to
> current unpriv program types, that work just hasn't been tackled yet.

How far are we from achieving this?
Can you point to what's missing? Perhaps we can assist on the matter.

Thanks,
Shmulik