On Fri, Aug 31, 2018 at 5:09 AM Paolo Abeni <pab...@redhat.com> wrote:
>
> Hi,
>
> On Tue, 2018-04-17 at 17:07 -0400, Willem de Bruijn wrote:
> > That said, for negotiated flows an inverse GRO feature could
> > conceivably be implemented to reduce rx stack traversal, too.
> > Though due to interleaving of packets on the wire, it aggregation
> > would be best effort, similar to TCP TSO and GRO using the
> > PSH bit as packetization signal.
>
> Reviving this old thread, before I forgot again. I have some local
> patches implementing UDP GRO in a dual way to current GSO_UDP_L4
> implementation: several datagram with the same length are aggregated
> into a single one, and the user space receive a single larger packet
> instead of multiple ones. I hope quic can leverage such scenario, but I
> really know nothing about the protocol.
>
> I measure roughly a 50% performance improvement with udpgso_bench in
> respect to UDP GSO, and ~100% using a pktgen sender, and a reduced CPU
> usage on the receiver[1]. Some additional hacking to the general GRO
> bits is required to avoid useless socket lookups for ingress UDP
> packets when UDP_GSO is not enabled.
>
> If there is interest on this topic, I can share some RFC patches
> (hopefully somewhat next week).

As Eric pointed out, QUIC reception on mobile clients over the WAN
may not see much gain. But apparently there is a non-trivial amount
of traffic the other way, to servers. Again, the WAN might limit
whatever gain we get, but I do want to look into that. And there are
other high-throughput UDP workloads (with or without QUIC) between
servers.

If you have patches, please do share them.

I actually also have a rough patch that I did not consider ready to
share yet. It builds on Tom's existing socket lookup in
udp_gro_receive to detect whether a local destination exists and
whether it has set an option to support receiving coalesced payloads
(along with a cmsg to share the segment size). Converting udp_recvmsg
to split apart gso packets to make this transparent seems to me to be
too complex and not worth the effort.

If a local socket is not found in udp_gro_receive, this could also be
tentatively interpreted as a non-local path (with false positives),
enabling transparent use of GRO + GSO batching on the forwarding path.
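
To make the receive-side contract concrete: if the socket opts in and
gets one coalesced payload plus the segment size through a cmsg, the
application has to recover the datagram boundaries itself. A minimal
userspace sketch of that step is below. The names split_coalesced and
segment_fn are hypothetical and not part of any patch, and how the
buffer and segment size actually reach the application is left out.
It only assumes that every datagram except possibly the last has the
same length.

```c
#include <stddef.h>

/* Hypothetical callback type: invoked once per recovered datagram. */
typedef void (*segment_fn)(const unsigned char *data, size_t len, void *ctx);

/*
 * Walk a coalesced receive buffer of `len` bytes and hand each original
 * datagram to `cb` (which may be NULL). `seg_size` is the per-datagram
 * length, assumed to be reported alongside the payload; only the final
 * datagram may be shorter. Returns the number of datagrams found, or 0
 * if seg_size is 0.
 */
static size_t split_coalesced(const unsigned char *buf, size_t len,
                              size_t seg_size, segment_fn cb, void *ctx)
{
    size_t count = 0;

    if (seg_size == 0)
        return 0;

    while (len > 0) {
        /* Full segment, or whatever is left for the trailing one. */
        size_t n = len < seg_size ? len : seg_size;

        if (cb)
            cb(buf, n, ctx);

        buf += n;
        len -= n;
        count++;
    }
    return count;
}
```

For example, a 3500-byte payload with a 1400-byte segment size yields
three datagrams of 1400, 1400 and 700 bytes. Doing this split in the
application rather than in udp_recvmsg is the design choice argued for
above.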