Avinash Duduskar <[email protected]> writes:

> bpf_fib_lookup() returns the FIB-resolved egress ifindex straight
> from the fib result. When the egress is a VLAN device, the returned
> ifindex is the VLAN netdev's, which has no XDP xmit handler; XDP
> programs that want to forward the frame (e.g. xdp-forward) must
> instead target the underlying physical device and push the VLAN tag
> themselves. Today the program has no way to learn either the
> underlying ifindex or the VLAN tag without maintaining its own
> VLAN-to-ifindex map in userspace and refreshing it on netlink
> events.
>
> Add BPF_FIB_LOOKUP_VLAN. When the caller sets this flag and the fib
> result is a VLAN device whose immediate parent is a real (non-VLAN)
> device in the same network namespace, populate the existing output
> fields params->h_vlan_proto and params->h_vlan_TCI from the VLAN
> device and replace params->ifindex with the parent's ifindex.
> params->h_vlan_TCI carries the VID only, with PCP and DEI bits zero; a
> consumer wanting to set egress priority writes PCP itself.
> params->smac is the VLAN device's own address, which can differ from
> the parent's.
>
> Only the immediate parent is resolved, via vlan_dev_priv(dev)->real_dev
> and not vlan_dev_real_dev(), which walks to the bottom of a stack. When
> the immediate parent is not a real device in the same namespace, the
> lookup returns BPF_FIB_LKUP_RET_VLAN_FAILURE and leaves params->ifindex
> at the input. This covers a stacked VLAN (QinQ), where the immediate
> parent is itself a VLAN device and one h_vlan_proto/h_vlan_TCI pair
> cannot describe two tags, and a parent in another network namespace (a
> VLAN device can be moved while its parent stays), whose ifindex would
> be meaningless in the caller's namespace. A program that wants the VLAN
> device's own ifindex re-issues the lookup without BPF_FIB_LOOKUP_VLAN,
> so the unreducible case stays distinct from a physical egress. That
> distinction matters for XDP: a program cannot xmit on a VLAN device, so
> a success carrying the VLAN ifindex would make it redirect to a device
> with no ndo_xdp_xmit and drop the frame at xdp_do_flush(). The swap and
> the vlan fields are written only on the reduce path; other output
> fields keep their existing behaviour, so a frag-needed result still
> reports the route mtu in params->mtu_result.
>
> BPF_FIB_LOOKUP_VLAN is only useful to XDP, which cannot redirect to a
> VLAN device. A tc program can redirect to the VLAN device directly, so
> bpf_skb_fib_lookup() rejects the flag with -EINVAL; bpf_xdp_fib_lookup()
> accepts it. When the flag is not set, behaviour is unchanged:
> h_vlan_proto and h_vlan_TCI are zeroed and ifindex is left at the FIB
> result.
>
> The new block is compiled only under CONFIG_VLAN_8021Q since
> vlan_dev_priv() is not defined otherwise; without that config
> is_vlan_dev() is constant false and the flag is accepted but never
> acts. That is safe because no VLAN device can exist there, so every
> egress is already physical.
>
> This lets an XDP redirect target the physical device and learn the
> tag to push in a single lookup, which xdp-forward's optional VLAN
> mode (xdp-project/xdp-tools#504) wants from the kernel side.
>
> The helper's input semantics are unchanged; the reverse direction
> (supplying a tag as lookup input) is added in the following patch.
>
> Suggested-by: Toke Høiland-Jørgensen <[email protected]>
> Signed-off-by: Avinash Duduskar <[email protected]>

Yes, this is way nicer - thanks! One nit below, otherwise LGTM:

Reviewed-by: Toke Høiland-Jørgensen <[email protected]>

[..]

> +	if (flags & BPF_FIB_LOOKUP_VLAN)
> +		return -EINVAL;
> +

This is fine, but we should probably reject the input flag as well in
the next patch (for symmetry).

-Toke
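
For anyone following the thread, the reduce-path rules in the quoted
commit message can be modelled in plain userspace C. This is only a
rough sketch and not the patch's code: struct model_dev, struct
model_params, model_reduce() and the MODEL_RET_* values are
hypothetical stand-ins, the namespace is reduced to an integer id, and
fields are kept in host byte order for readability. It also assumes the
failure path writes nothing at all, which the commit message implies
but does not spell out for the vlan fields.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes; the real helper uses BPF_FIB_LKUP_RET_*. */
enum model_ret {
	MODEL_RET_SUCCESS = 0,
	MODEL_RET_VLAN_FAILURE = 1,
};

/* Hypothetical, heavily reduced view of a net device. */
struct model_dev {
	int ifindex;
	int netns;                        /* stand-in for the network namespace */
	bool is_vlan;
	uint16_t vlan_proto;              /* e.g. 0x8100, host order in this model */
	uint16_t vlan_id;                 /* 12-bit VID */
	const struct model_dev *real_dev; /* immediate parent, NULL if not a VLAN */
};

/* Hypothetical subset of the lookup parameters. */
struct model_params {
	int ifindex;                      /* in: caller's value, out: egress */
	uint16_t h_vlan_proto;
	uint16_t h_vlan_TCI;
};

/*
 * Apply the described output rules to an already-resolved egress device.
 * want_vlan models the caller having set BPF_FIB_LOOKUP_VLAN.
 */
static enum model_ret model_reduce(struct model_params *p,
				   const struct model_dev *egress,
				   bool want_vlan)
{
	const struct model_dev *parent;

	if (!want_vlan || !egress->is_vlan) {
		/* Unchanged behaviour: vlan fields zeroed, FIB ifindex returned. */
		p->h_vlan_proto = 0;
		p->h_vlan_TCI = 0;
		p->ifindex = egress->ifindex;
		return MODEL_RET_SUCCESS;
	}

	/* Only the immediate parent is considered, never the bottom of a stack. */
	parent = egress->real_dev;
	if (parent == NULL || parent->is_vlan || parent->netns != egress->netns)
		return MODEL_RET_VLAN_FAILURE; /* params left as supplied */

	/* Reduce path: report the tag and swap in the parent's ifindex. */
	p->h_vlan_proto = egress->vlan_proto;
	p->h_vlan_TCI = egress->vlan_id & 0x0fff; /* VID only, PCP/DEI zero */
	p->ifindex = parent->ifindex;
	return MODEL_RET_SUCCESS;
}
```

With a VLAN device (VID 100) on top of a physical device, the model
returns the physical ifindex plus the tag; with a second VLAN stacked
on the first, it returns the failure code and leaves the caller's
ifindex alone, which is the QinQ case the commit message describes.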

