Hi Quentin,

On 04/20/2016 09:25 AM, Quentin Monnet wrote:
2016-04-15 (11:44 UTC-0700) ~ Alexei Starovoitov:
On Fri, Apr 15, 2016 at 12:41:05PM +0200, Daniel Borkmann wrote:
On 04/15/2016 12:07 PM, Quentin Monnet wrote:
When a new BPF traffic control filter or action is set up with tc, the
bytecode is sent back to userspace through a netlink socket for cBPF, but
not for eBPF (the file descriptor pointing to the object file containing
the bytecode is sent instead).

This patch makes the cls_bpf and act_bpf modules send the bytecode for
eBPF as well (in addition to the file descriptor).

[…]

Thanks for working on this, but it's unfortunately not that easy. Let
me ask, what would be the intended use-case to dump the insns?

+1

I'm asking because if you dump them as-is, then a reinject at a later
time of that bytecode back into the kernel will most likely be rejected
by the verifier.

This is because at load time, the verifier rewrites/expands some of the
insns (e.g. map pointers, helper functions, ctx access etc., see also the
appendix in [1]), so the code as seen in the kernel would need to be
sanitized first.
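To make the rewrite point a bit more concrete, here is a small sketch. In the insns a loader passes to bpf(2), a map reference is a two-slot ld_imm64 (BPF_LD | BPF_IMM | BPF_DW) whose src_reg is BPF_PSEUDO_MAP_FD and whose imm carries a map fd; at load time the verifier swaps that fd for a kernel map pointer, so a post-load dump holds a kernel address instead of the fd. The helper below only counts such slots in a user-supplied insn array. The struct and constants are local copies assumed to match the uapi values in <linux/bpf.h>, and count_map_fd_loads() is a hypothetical name, not an existing kernel or tc function.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the uapi struct bpf_insn (assumed identical layout),
 * so the sketch builds without kernel headers. */
struct sketch_bpf_insn {
	uint8_t code;        /* opcode */
	uint8_t dst_reg : 4; /* destination register */
	uint8_t src_reg : 4; /* source register */
	int16_t off;         /* signed offset */
	int32_t imm;         /* signed immediate constant */
};

/* Assumed to match BPF_LD, BPF_IMM, BPF_DW and BPF_PSEUDO_MAP_FD. */
#define SKETCH_BPF_LD        0x00
#define SKETCH_BPF_IMM       0x00
#define SKETCH_BPF_DW        0x18
#define SKETCH_PSEUDO_MAP_FD 1

/* Hypothetical helper: count ld_imm64 insns that reference a map by fd.
 * These are the slots the verifier rewrites into kernel map pointers. */
size_t count_map_fd_loads(const struct sketch_bpf_insn *insns, size_t cnt)
{
	size_t i, found = 0;

	for (i = 0; i < cnt; i++) {
		if (insns[i].code != (SKETCH_BPF_LD | SKETCH_BPF_IMM | SKETCH_BPF_DW))
			continue;
		if (insns[i].src_reg == SKETCH_PSEUDO_MAP_FD)
			found++;
		i++; /* ld_imm64 occupies two slots: skip the second half */
	}
	return found;
}
```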

+1
we had a similar discussion about this in the seccomp context and decided
that the only sensible way is to keep the original instructions, but it's
wasteful to do that unconditionally, and snapshotting of maps is not
possible, so there was no use for such a dumping facility other than
debugging. Is that what the patch is after?
We need to discuss it in the proper context.

I am experimenting with BPF, and so far I have just been trying to dump
the bytecode sent from tc to the kernel. I had not realized that the
verifier would make some changes to the instructions. And I agree that a
more comprehensive debugging solution could be obtained if I could find
some way to get a snapshot of the maps.

Also, how would you make sense of the maps, or transform them into a
meaningful representation (probably possible to find a scheme when they
are pinned)?

Another possibility is that such programs get pinned (tc can easily do
that in the background), and we then implement a CRIU facility in the
bpf(2) syscall to retrieve them. tc could make use of this w/o too much
effort, and at the same time it would help CRIU folks, too. It also seems
cleaner to have only one central API (bpf(2)) to dump them, but it needs
a bit of thought.

+1
any debugging or criu needs to be done in a centralized way via syscall
and/or bpffs.

Maintaining a central API around bpf() makes sense to me. I have been
looking at the BPF filesystem to see what information I can obtain from
it, but I did not understand it well. I read the logs of Daniel's commit
b2197755b263 (“bpf: add support for persistent maps/progs”), but I am
unsure how I could use it in order to gather data about the maps and
programs (if this is possible at all). I tried to set up some BPF

Currently, there's not yet much information to extract. For example, in
the tc source code, bpf_map_selfcheck_pinned() reads the map's fdinfo to
check whether the map fd we got from the pinned map matches the map spec
from the object file. But obviously more work is needed to extract the
bytecode, as in your case.
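For reference, the kind of check described above can be sketched roughly as follows. This is not the iproute2 code; it assumes the map fdinfo exposes map_type, key_size, value_size and max_entries as "name:\tvalue" lines, and struct map_spec, parse_map_fdinfo(), read_map_fdinfo() and map_spec_matches() are hypothetical names used only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the map attributes shown in fdinfo. */
struct map_spec {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* Find "name:" in the fdinfo text and parse the number following it. */
static int fdinfo_field(const char *text, const char *name, unsigned int *val)
{
	const char *p = strstr(text, name);

	if (!p)
		return -1;
	return sscanf(p + strlen(name), " %u", val) == 1 ? 0 : -1;
}

/* Parse the map-related fields; -1 if any of them is missing. */
int parse_map_fdinfo(const char *text, struct map_spec *out)
{
	if (fdinfo_field(text, "map_type:", &out->type) ||
	    fdinfo_field(text, "key_size:", &out->key_size) ||
	    fdinfo_field(text, "value_size:", &out->value_size) ||
	    fdinfo_field(text, "max_entries:", &out->max_entries))
		return -1;
	return 0;
}

/* Read /proc/self/fdinfo/<fd> for a map fd and parse it. */
int read_map_fdinfo(int fd, struct map_spec *out)
{
	char path[64], buf[1024];
	size_t len;
	FILE *fp;

	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
	fp = fopen(path, "r");
	if (!fp)
		return -1;
	len = fread(buf, 1, sizeof(buf) - 1, fp);
	fclose(fp);
	buf[len] = '\0';
	return parse_map_fdinfo(buf, out);
}

/* Does the pinned map match the spec requested by the object file? */
int map_spec_matches(const struct map_spec *pinned, const struct map_spec *obj)
{
	return pinned->type == obj->type &&
	       pinned->key_size == obj->key_size &&
	       pinned->value_size == obj->value_size &&
	       pinned->max_entries == obj->max_entries;
}
```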

Haven't thought much about it yet, but one idea could be that tc also pins
programs, then sends some kind of annotation down to cls_bpf, so that on
filter dump tc could retrieve the path to the pinned program again, then
use bpf(2) with BPF_OBJ_GET to get the fd, and a new command, e.g.
BPF_PROG_DUMP, to extract bytecode/map info from the running program and
dump it to the user in a way that makes sense from an admin/user
perspective (in other words, not just raw opcodes).
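A minimal sketch of the pin/retrieve half of that flow, assuming a kernel with BPF_OBJ_PIN and BPF_OBJ_GET (the bpffs support from b2197755b263) and the uapi union bpf_attr from <linux/bpf.h>. obj_pin() and obj_get() are hypothetical wrapper names around the raw syscall; BPF_PROG_DUMP does not exist, so the dump step itself is not shown.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static uint64_t ptr_to_u64(const void *ptr)
{
	return (uint64_t)(unsigned long)ptr;
}

/* Pin an existing map or program fd at a path on a mounted bpffs. */
int obj_pin(int fd, const char *pathname)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.pathname = ptr_to_u64(pathname);
	attr.bpf_fd = fd;
	return syscall(__NR_bpf, BPF_OBJ_PIN, &attr, sizeof(attr));
}

/* Get a new fd for an object previously pinned at pathname. */
int obj_get(const char *pathname)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.pathname = ptr_to_u64(pathname);
	return syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
}
```

A loader could call obj_pin(prog_fd, path) after loading, and a later dump could call obj_get(path) on the same path to obtain a fresh fd for the still-running program; both return -1 with errno set on failure.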

BPF_PROG_DUMP could carry auxiliary information with map specs, similar
to how the loader retrieves them as relo entries from the object file,
plus some information on where to retrieve the maps in case they were
pinned. This still doesn't give you an entire snapshot of the map, but it
would at least allow you to iterate over the pinned ones via bpf(2) with
BPF_MAP_GET_NEXT_KEY, and in general it would allow you to reload the
program.
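Iterating a pinned map once you hold an fd for it could look roughly like this. It is a sketch under a few assumptions: the caller already knows the key size (for instance from fdinfo), only keys are walked (a real dumper would add BPF_MAP_LOOKUP_ELEM for the values), and passing a NULL key to get the first key needs kernel 4.12 or later. count_map_keys() is a hypothetical name.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static long bpf_sys(int cmd, union bpf_attr *attr)
{
	return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* Walk all keys of the map behind map_fd; return the number of keys
 * visited, or -1 on error. key_size must match the map's key size. */
long count_map_keys(int map_fd, unsigned int key_size)
{
	unsigned char *cur = malloc(key_size), *next = malloc(key_size);
	union bpf_attr attr;
	long n = 0;
	int first = 1;

	if (!cur || !next) {
		free(cur);
		free(next);
		return -1;
	}
	for (;;) {
		memset(&attr, 0, sizeof(attr));
		attr.map_fd = map_fd;
		/* NULL key on the first call asks for the first key (>= 4.12) */
		attr.key = first ? 0 : (uint64_t)(unsigned long)cur;
		attr.next_key = (uint64_t)(unsigned long)next;
		if (bpf_sys(BPF_MAP_GET_NEXT_KEY, &attr) < 0) {
			if (errno != ENOENT) /* ENOENT means no more keys */
				n = -1;
			break;
		}
		/* A real dumper would issue BPF_MAP_LOOKUP_ELEM on next here. */
		memcpy(cur, next, key_size);
		first = 0;
		n++;
	}
	free(cur);
	free(next);
	return n;
}
```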

There's still the issue of the additional memory overhead of keeping the
original insns around, as Alexei mentioned. Two things come to mind. One
is that when JITing was successful, we could actually try to shrink struct
bpf_prog again since we work on a different image, but that doesn't
address the case where the JIT is not used. The other is to perhaps only
keep a 'diff' around in orig_prog from which we can patch insns back to
the original; probably possible, but it needs a bit of work.
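One way to picture the 'diff' idea is below. It is purely a hypothetical user-space model, not the kernel's struct bpf_prog handling: it records only the 8-byte slots that changed between the original and the rewritten program, which is enough to undo in-place rewrites such as map fd to map pointer. It assumes both images have the same length, so it does not cover rewrites where the verifier expands one insn into several; that case would need offset tracking on top. All type and function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One 8-byte insn slot (same size as struct bpf_insn), treated as opaque. */
struct insn_slot {
	uint8_t bytes[8];
};

/* Diff entry: index of a rewritten slot and its original contents. */
struct insn_patch {
	uint32_t idx;
	struct insn_slot orig;
};

struct insn_diff {
	struct insn_patch *patches;
	size_t cnt;
};

/* Record only the slots that differ; assumes equal lengths (no expansion). */
int diff_build(struct insn_diff *d, const struct insn_slot *orig,
	       const struct insn_slot *cur, size_t len)
{
	size_t i, n = 0;

	d->patches = NULL;
	d->cnt = 0;
	for (i = 0; i < len; i++)
		if (memcmp(&orig[i], &cur[i], sizeof(orig[i])))
			n++;
	if (!n)
		return 0;
	d->patches = malloc(n * sizeof(*d->patches));
	if (!d->patches)
		return -1;
	for (i = 0; i < len; i++) {
		if (!memcmp(&orig[i], &cur[i], sizeof(orig[i])))
			continue;
		d->patches[d->cnt].idx = (uint32_t)i;
		d->patches[d->cnt].orig = orig[i];
		d->cnt++;
	}
	return 0;
}

/* Rebuild the original program into out from the running image cur. */
void diff_restore(const struct insn_diff *d, const struct insn_slot *cur,
		  struct insn_slot *out, size_t len)
{
	size_t i;

	memcpy(out, cur, len * sizeof(*cur));
	for (i = 0; i < d->cnt; i++)
		out[d->patches[i].idx] = d->patches[i].orig;
}
```

Each entry costs an index plus one 8-byte slot, so when only a handful of insns are rewritten this stays much smaller than a full copy of the original program.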

filters working with maps, but I could not find any file under
/sys/fs/bpf/tc.

There are some getting started examples under examples/bpf/ in the iproute2
repo, f.e. bpf_shared.c is one.

Would you have a pointer to some documentation about this filesystem? Or
is there only the kernel code?

Yeah, b2197755b263 and 42984d7c1e56. In my netdev1.1 paper I tried to put
more extensive information, but it seems the proceedings haven't been
published yet. I can send you a private copy until they are officially
released, I guess.

Thanks,
Daniel
