On 05/22/2019 01:52 AM, Matthew Cover wrote:
> > __sk_buff has a member tc_classid which I'm interested in accessing from
> > the skb bpf context.
> >
> > A bpf program which accesses skb->tc_classid compiles, but fails
> > verification; the specific failure is "invalid bpf_context access".
> >
> >     if (skb->tc_classid != 0)
> >         return 1;
> >     return 0;
> >
> > Some of the tests in tools/testing/selftests/bpf/verifier/ (those on
> > tc_classid) further confirm that this is, in all likelihood, intentional
> > behavior.
> >
> > The very similar bpf program which instead accesses skb->mark works as
> > desired.
> >
> >     if (skb->mark != 0)
> >         return 1;
> >     return 0;
>
> You should be able to access skb->tc_classid, perhaps you're using the wrong
> program type? BPF_PROG_TYPE_SCHED_CLS is supposed to work (if not we'd have a
> regression).
>
I am in fact using BPF_PROG_TYPE_SOCKET_FILTER, and I attach the program via
PACKET_FANOUT_DATA with PACKET_FANOUT_EBPF.

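For reference, a minimal sketch of that attach sequence, assuming an AF_PACKET
socket and an already-loaded socket-filter program fd. The helper names
fanout_arg() and join_ebpf_fanout() are hypothetical, and the fallback define
only covers headers older than the PACKET_FANOUT_EBPF addition:

```c
#include <stdint.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

#ifndef PACKET_FANOUT_EBPF
#define PACKET_FANOUT_EBPF 7   /* value from linux/if_packet.h, for older headers */
#endif

/* Low 16 bits: fanout group id; high 16 bits: fanout mode (plus any flags). */
static int fanout_arg(uint16_t group_id, uint16_t mode)
{
    return (int)(group_id | ((uint32_t)mode << 16));
}

/*
 * Hypothetical helper: join sock_fd (an AF_PACKET socket) to group_id in
 * eBPF mode, then attach the already-loaded program prog_fd.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int join_ebpf_fanout(int sock_fd, uint16_t group_id, int prog_fd)
{
    int arg = fanout_arg(group_id, PACKET_FANOUT_EBPF);

    if (setsockopt(sock_fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg)) < 0)
        return -1;
    return setsockopt(sock_fd, SOL_PACKET, PACKET_FANOUT_DATA,
                      &prog_fd, sizeof(prog_fd));
}
```

The program's return value then selects which socket in the group receives
the packet.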
I have been working on a series of utils which leverage PACKET_FANOUT to
provide various per-socket-fd (per-cpu, per-queue,
per-rx-flow-hash-indirection-table-idx) statistics and pcap files. While
playing with PACKET_FANOUT_EBPF, I realized that I could use the bpf program to
categorize packets in ways packet-filter(7) does not provide.

As a concrete example, I plan to build a util `rxtxmark` which could be passed
something like `--mark-list 42,88`. This would be translated to a bpf program
where the return code is the ordinality of the mark in the list.

    if (skb->mark == 42)
        return 1;
    if (skb->mark == 88)
        return 2;
    return 0;

Packets enqueued to fd0 are simply ignored. Packets enqueued to the other fds
are processed into pcaps and statistics.

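The mapping from a mark list to a fanout index can be sketched in plain
userspace C. This is only an illustration of the intended return values, not
the generated bpf program; the helper name mark_to_fanout_idx() is
hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the 1-based position of `mark` in `marks`,
 * or 0 when the mark is not listed. The result mirrors the value the bpf
 * program would return, with 0 selecting the socket whose packets are
 * ignored.
 */
static unsigned int mark_to_fanout_idx(uint32_t mark, const uint32_t *marks,
                                       size_t n)
{
    assert(marks != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (marks[i] == mark)
            return (unsigned int)(i + 1);
    }
    return 0;
}
```

With `--mark-list 42,88` this gives 1 for mark 42, 2 for mark 88, and 0 for
anything else, matching the hand-written program above.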
While I may build a util for tc_classid which does per-user-requested-classid
pcaps and statistics like `rxtxmark` does for marks, I'm also interested in
using tc_classid as a simple way to capture tx packets from a long running
program on the fly.

The program under inspection would simply be added to a net_cls cgroup which
has a unique classid defined. A bpf program would be attached to map packets
with that classid to fd1. While I can do this already by using iptables to
translate the tc_classid to a mark, that complicates the implementation greatly
since the firewall has to be touched (which is probably overreaching for a
packet capture util and would most likely be left to the user to configure).

> > I built a kernel (v5.1) with 4 instances of the following line removed from
> > net/core/filter.c to test the behavior when the instructions pass
> > verification.
> >
> > switch (off) {
> > - case bpf_ctx_range(struct __sk_buff, tc_classid):
> > ...
> > return false;
> >
> > It appears skb->tc_classid is always zero within my bpf program, even when
> > I verify by other means (e.g. netfilter) that the value is set non-zero.
> >
> > I gather that sk_buff proper sometimes (i.e. at some layers) has
> > qdisc_skb_cb stored in skb->cb, but not always.
> >
> > I suspect that the tc_classid is available at l3 (and therefore to utils
> > like netfilter, ip route, tc), but not at l2 (and not to AF_PACKET).
>
> From tc/BPF context you can use it; it's been long time, but I think back then
> we mapped it into cb[] so it can be used within the BPF context to pass skb
> data around e.g. between tail calls, and cls_bpf_classify() when in
> direct-action mode which likely everyone is/should-be using then maps that
> skb->tc_classid u16 cb[] value to res->classid on program return which then in
> either sch_handle_ingress() or sch_handle_egress() is transferred into the
> skb->tc_index.
>
It sounds like tc_classid is placed in skb->cb just before a
BPF_PROG_TYPE_SCHED_CLS bpf program starts. The missing plumbing to support my
use case is probably the same thing, but for BPF_PROG_TYPE_SOCKET_FILTER.

I'll see about familiarizing myself with both as time permits and perhaps I can
get tc_classid working for a BPF_PROG_TYPE_SOCKET_FILTER program; it certainly
sounds like it's doable.

> > Is it impractical to make skb->tc_classid available in this bpf context or
> > is there just some plumbing which hasn't been connected yet?
> >
> > Is my suspicion that skb->cb no longer contains qdisc_skb_cb due to
> > crossing a layer boundary well founded?
> >
> > I'm willing to look into hooking things together as time permits if it's a
> > feasible task.
> >
> > It's trivial to have iptables match on tc_classid and set a mark which is
> > available to bpf at l2, but I'd like to better understand this.
> >
> > Thanks,
> > Matt C.
> >