On Sat, Mar 23, 2019 at 02:12:39AM -0700, Eric Dumazet wrote:
> 
> 
> On 03/23/2019 01:05 AM, brakmo wrote:
> > This patchset adds support for propagating congestion notifications (cn)
> > to TCP from cgroup inet skb egress BPF programs.
> >
> > Current cgroup skb BPF programs cannot trigger TCP congestion window
> > reductions, even when they drop a packet. This patch-set adds support
> > for cgroup skb BPF programs to send congestion notifications in the
> > return value when the packets are TCP packets. Rather than the
> > current 1 for keeping the packet and 0 for dropping it, they can
> > now return:
> >     NET_XMIT_SUCCESS    (0)    - continue with packet output
> >     NET_XMIT_DROP       (1)    - drop packet and do cn
> >     NET_XMIT_CN         (2)    - continue with packet output and do cn
> >     -EPERM                     - drop packet
> >
> 
> I believe I already mentioned this model is broken, if you have any virtual
> device before the cgroup BPF program.
> 
> Please think about offloading the pacing/throttling in the NIC,
> there is no way we will report back to tcp stack instant notifications.
I don't think 'offload to Google proprietary NIC' is a suggestion
that folks can practically follow.
Very few NICs can offload pacing to hw, and there are plenty of limitations.
This patch set is a pure sw solution that works and scales to millions of flows.

> This patch series is going way too far for my taste.

I would really appreciate it if you could do a technical review of the patches.
Our previous approach didn't quite work due to the complexity around
locked vs. non-locked sockets. This is a cleaner approach.
Either we go with this one, or we will add a bpf hook into __tcp_transmit_skb.
This approach is better since it works for other protocols and can be
used by qdiscs without any bpf.

> This idea is not new, you were at Google when it was experimented by Nandita
> and others, and we know it is not worth the pain.

Google's networking needs are different from those of the rest of the world.

Thank you.