On Fri, Mar 11, 2016 at 2:31 PM, Alexander Duyck <alexander.du...@gmail.com> wrote:
> On Fri, Mar 11, 2016 at 1:29 PM, Edward Cree <ec...@solarflare.com> wrote:
>> On 11/03/16 21:09, Alexander Duyck wrote:
>>> The only real issue with the "generic" TSO is that it isn't going to
>>> be so generic. We have different devices that will support different
>>> levels of stuff. For example the ixgbe drivers will need to treat the
>>> outer tunnel header as one giant L2 header. As a result we will need
>>> to populate all the fields in the outer header including the outer IP
>>> ID, checksum, udp->len, and UDP or GRE checksum if requested. For
>>> i40e I think this gets a bit simpler as they already handle the outer
>>> IPv4 ID and checksum. I think there we may need to only populate the
>>> checksum for it to work out correctly. As such I may look at coming
>>> up with a number of functions so that we can mix and match based on
>>> what is needed in order to assemble a partially segmented frame.
>> AIUI, the point of the design is that we _can_ populate everything,
>> because we're keeping lengths and outer IP ID fixed, so outer checksums
>> stay the same and the outer tunnel header _is_ just one giant L2 header
>> which is bit-for-bit identical for each generated segment. So every
>> device gets to just be dumb and treat it as opaque.
>
> This works so long as the device isn't trying to do anything like
> insert VLAN tags. Then I think we might have an issue since we don't
> want to confuse the device and have it trying to insert the tag on the
> inner frame's Ethernet header.
>
In Edward's giant L2 header mode, couldn't VLAN tags just be part of that?

> I suspect we may have differing levels of "dumb" that we have to deal
> with. That is all I am saying. By default we could just populate all
> of the length and checksum fields in the outer header, we would just
> have to be consistent about what is provided then. In addition there
> will be the matter of sorting out the IP ID bits. For example some of
> the i40e parts support tunnel offloads, but not tunnel offloads with
> checksums enabled. I suspect those parts will end up wanting to
> handle the outer IP header and UDP length values. As a result, trying
> to do a "dumb" send may result in us really messing up the IP ID
> values if we don't take steps to make it a bit smarter.
>
>>> The other issue I am working on at the moment to enable all this is to
>>> fix the differences between csum_tcpudp_magic and csum_ipv6_magic in
>>> terms of handling packet lengths greater than 65535. Currently we are
>>> messing up the checksum in relation to IPv6 since we are using the
>>> truncated uh->len value. I'll be submitting some patches later today
>>> that will hopefully get that fixed and that in turn should make the
>>> rest of the segmentation work easier.
>> Again, in the superpacket we want to calculate the checksum based on the
>> subsegment length, rather than the length of the superpacket. The idea
>> is to create the header for an MSS-sized segment, then follow it with an
>> inner IP & TCP header, and n*MSS bytes of payload. (This of course
>> produces a superpacket that's not what you'd send over a link with a 64k
>> MTU, unlike how we do it today.)
>
> The question is at what point do we do the chopping. Should we be
> doing this in the drivers or somewhere higher in the stack like we do
> for standard GSO segmentation? I would think we would need to add
> another bit that says we can do GSO with custom outer headers since I
> can see VLANs being a possible issue otherwise.
>
>> Then hw just chops up the payload, fixes up the inner headers, and glues
>> the "L2" header on each packet.
>
> Yea, it sounds really straightforward and easy. It isn't till you
> start digging into the actual code that it gets a bit hairy.
>
> What this effectively is is another form of TSO where each driver will
> want to do things a little differently. A lot of it has to do with the
> fact that this is kind of a nasty hack that we are trying to add since
> many devices won't like the fact that we are lying about the size of
> our actual L2 header so things like VLAN tag insertion are going to
> end up blowing back on us.
>
Right, the point is that we're trying to get out of the model where
every driver/device implements TSO differently, supports ad hoc
protocols, etc. Do you see any other common invasive technique that we
need to deal with other than VLAN insertion and IP ID?

> Really my preference in the case of ixgbe would have been to let the
> hardware update the outer IP header and the inner TCP header and then
> do the UDP and inner IP header as the static headers. That way we
> could still theoretically support fragmentation on the outer headers
> which last I knew is a very real possibility since the DF bit is not
> set on the outer headers for VXLAN I believe.
>
> - Alex
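
For reference, a minimal sketch of the "chop up the payload, fix up the
inner headers, glue the L2 header on" step from Edward's description,
written in Python for brevity rather than as driver code. The function
name segment_superpacket and the seq_offset parameter are hypothetical,
made up for this illustration. The only inner fix-up shown is the TCP
sequence number; the inner IP ID, inner lengths and inner checksums are
deliberately left out, and the outer header is copied verbatim as opaque
bytes.

```python
import struct


def segment_superpacket(outer_hdr, inner_hdr, payload, mss, seq_offset):
    """Split a superpacket into MSS-sized frames (illustrative only).

    outer_hdr  -- opaque bytes, copied unchanged onto every segment
    inner_hdr  -- inner IP + TCP header bytes
    seq_offset -- hypothetical: byte offset of the 32-bit TCP sequence
                  number within inner_hdr
    """
    # The superpacket is assumed to carry exactly n*MSS bytes of payload.
    if mss <= 0 or len(payload) % mss != 0:
        raise ValueError("payload must be a whole number of MSS-sized chunks")

    (base_seq,) = struct.unpack_from("!I", inner_hdr, seq_offset)
    segments = []
    for off in range(0, len(payload), mss):
        hdr = bytearray(inner_hdr)
        # Fix up the inner header: advance the TCP sequence number (mod 2^32).
        struct.pack_into("!I", hdr, seq_offset, (base_seq + off) & 0xFFFFFFFF)
        # Glue the unchanged outer ("L2") header in front of each segment.
        segments.append(bytes(outer_hdr) + bytes(hdr) + payload[off:off + mss])
    return segments
```

Because the outer bytes are never touched, every segment carries a
bit-for-bit identical outer header, which is the property the "giant L2
header" approach relies on. It also shows where VLAN insertion or a
per-segment outer IP ID would break that assumption.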