On 15-12-02 04:15 PM, Tom Herbert wrote:
> On Wed, Dec 2, 2015 at 3:35 PM, John Fastabend <john.fastab...@gmail.com>
> wrote:
>> [...]
>>
>>>>
>>>> I wonder why we need protocol generic offloads? I know there are
>>>> currently a lot of overlay encapsulation protocols. Are there many more
>>>> coming?
>>>>
>>> Yes, and assume that there are more coming with an unbounded limit
>>> (for instance I just noticed today that there is a netdev1.1 talk on
>>> supporting GTP in the kernel). Besides, this problem space not just
>>> limited to offload of encapsulation protocols, but how to generalize
>>> offload of any transport, IPv[46], application protocols, protocol
>>> implemented in user space, security protocols, etc.
>>>
>>>> Besides, this offload is about TSO and RSS and they do need to parse the
>>>> packet to get the information where the inner header starts. It is not
>>>> only about checksum offloading.
>>>>
>>> RSS does not require the device to parse the inner header. All the UDP
>>> encapsulations protocols being defined set the source port to entropy
>>> flow value and most devices already support RSS+UDP (just needs to be
>>> enabled) so this works just fine with dumb NICs. In fact, this is one
>>> of the main motivations of encapsulating UDP in the first place, to
>>> leverage existing RSS and ECMP mechanisms. The more general solution
>>> is to use IPv6 flow label (RFC6438). We need HW support to include the
>>> flow label into the hash for ECMP and RSS, but once we have that much
>>> of the motivation for using UDP goes away and we can get back to just
>>> doing GRE/IP, IPIP, MPLS/IP, etc. (hence eliminate overhead and
>>> complexity of UDP encap).
>>>
>>>> Please provide a sketch up for a protocol generic api that can tell
>>>> hardware where a inner protocol header starts that supports vxlan,
>>>> vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
>>>> starting at that point.
>>>>
>>> BPF. Implementing protocol generic offloads are not just a HW concern
>>> either, adding kernel GRO code for every possible protocol that comes
>>> along doesn't scale well. This becomes especially obvious when we
>>> consider how to provide offloads for applications protocols. If the
>>> kernel provides a programmable framework for the offloads then
>>> application protocols, such as QUIC, could use use that without
>>> needing to hack the kernel to support the specific protocol (which no
>>> one wants!). Application protocol parsing in KCM and some other use
>>> cases of BPF have already foreshadowed this, and we are working on a
>>> prototype for a BPF programmable engine in the kernel. Presumably,
>>> this same model could eventually be applied as the HW API to
>>> programmable offload.
>>
>> Just keying off the last statement there...
>>
>> I think BPF programs are going to be hard to translate into hardware
>> for most devices. The problem is the BPF programs in general lack
>> structure. A parse graph would be much more friendly for hardware or
>> at minimum the BPF program would need to be a some sort of
>> well-structured program so a driver could turn that into a parse graph.
>>
> This might be relevant:
> http://richard.systems/research/pdf/IEEE_HPSR_BPF_OPENFLOW.pdf
>

Thanks Tom, interesting read, but they seem to argue for a BPF engine in
hardware, which I'm still not convinced is necessary, and the numbers
provided are for a 1Gbps link, where 10Gbps/100Gbps+ would be more
valuable.

I am still leaning towards a fully programmable parse graph and a set of
basic actions (push/pop/set/fwd/etc.). This would be useful for other
features, not just checksum offloads. I guess it doesn't necessarily
exclude also having 1s complement logic though.

.John
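
To make the parse-graph idea a bit more concrete, here is a rough sketch in C of what a table-driven parse graph could look like. Every name in it (pg_node, pg_edge, pg_node_desc, pg_graph, pg_find_offset) is hypothetical and not an existing kernel or driver API. It also assumes fixed header lengths (IPv4 without options, no VLAN tags) and only knows VXLAN on UDP port 4789; the push/pop/set/fwd actions are not modeled at all. The point is only that a device or driver can answer "where does the inner header start, and what is it?" by walking a small table rather than by interpreting an arbitrary program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node ids; not an existing kernel or driver API. */
enum pg_node { PG_ETH, PG_IPV4, PG_UDP, PG_VXLAN, PG_INNER_ETH, PG_NODE_COUNT };

/* One outgoing edge: if the node's key field equals 'value', go to 'next'. */
struct pg_edge {
	uint32_t value;
	enum pg_node next;
};

/* One header type. Assumption: every header has a fixed length. */
struct pg_node_desc {
	size_t hdr_len;               /* header length in bytes */
	size_t key_off;               /* offset of the next-protocol field */
	size_t key_len;               /* key width in bytes; 0 means key == 0 */
	const struct pg_edge *edges;
	size_t n_edges;
};

static const struct pg_edge eth_edges[]   = { { 0x0800, PG_IPV4 } };   /* EtherType IPv4 */
static const struct pg_edge ipv4_edges[]  = { { 17, PG_UDP } };        /* IP protocol UDP */
static const struct pg_edge udp_edges[]   = { { 4789, PG_VXLAN } };    /* VXLAN dst port */
static const struct pg_edge vxlan_edges[] = { { 0, PG_INNER_ETH } };   /* unconditional */

static const struct pg_node_desc pg_graph[PG_NODE_COUNT] = {
	[PG_ETH]       = { 14, 12, 2, eth_edges,   1 },
	[PG_IPV4]      = { 20,  9, 1, ipv4_edges,  1 },  /* assumes no IP options */
	[PG_UDP]       = {  8,  2, 2, udp_edges,   1 },
	[PG_VXLAN]     = {  8,  0, 0, vxlan_edges, 1 },
	[PG_INNER_ETH] = { 14,  0, 0, NULL,        0 },  /* leaf in this sketch */
};

/*
 * Walk the graph from the outer Ethernet header. On reaching 'target',
 * store its byte offset in *off and return 0. Return -1 if the packet is
 * too short or no edge matches the key read from the packet.
 */
int pg_find_offset(const uint8_t *pkt, size_t len,
		   enum pg_node target, size_t *off)
{
	enum pg_node node = PG_ETH;
	size_t pos = 0;

	assert(pkt != NULL && off != NULL);

	/* The table is acyclic, so PG_NODE_COUNT steps are always enough. */
	for (int depth = 0; depth < PG_NODE_COUNT; depth++) {
		const struct pg_node_desc *d = &pg_graph[node];
		uint32_t key = 0;
		size_t i;

		if (node == target) {
			*off = pos;
			return 0;
		}
		if (len - pos < d->hdr_len)     /* pos <= len holds here */
			return -1;
		for (i = 0; i < d->key_len; i++)    /* big-endian key */
			key = (key << 8) | pkt[pos + d->key_off + i];
		for (i = 0; i < d->n_edges; i++)
			if (d->edges[i].value == key)
				break;
		if (i == d->n_edges)
			return -1;
		pos += d->hdr_len;
		node = d->edges[i].next;
	}
	return -1;
}
```

For an Ethernet/IPv4/UDP/VXLAN frame, pg_find_offset(pkt, len, PG_INNER_ETH, &off) walks four nodes and leaves off at 14 + 20 + 8 + 8 = 50. Supporting vxlan-gpe, geneve or IPv6 extension headers in this style would mean adding nodes and edges to the table, plus some way to express variable-length headers, which this sketch deliberately leaves out.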