I'm forwarding this sort of thing because I keep hoping to find more optimizations for cake, and cheap multicores really do seem to have become very common.
On Thu, Dec 7, 2017 at 10:53 AM, Dave Taht <[email protected]> wrote:
> ---------- Forwarded message ----------
> From: John Fastabend <[email protected]>
> Date: Thu, Dec 7, 2017 at 9:53 AM
> Subject: [net-next PATCH 00/14] lockless qdisc series
> To: [email protected], [email protected],
>     [email protected], [email protected]
> Cc: [email protected], [email protected], [email protected]
>
> This series adds support for building lockless qdiscs. It is the
> result of noticing that the qdisc lock is a common hot spot in perf
> analysis of the Linux network stack, especially when testing at high
> packet-per-second rates. However, nothing is free, and most qdiscs
> rely on the qdisc lock to protect their data structures, so each qdisc
> must be converted on a case-by-case basis. In this series, to kick
> things off, we make pfifo_fast, mq, and mqprio lockless. Follow-up
> series can address additional qdiscs as needed; sch_tbf, for example,
> might be useful. To allow this, the lockless design is an opt-in flag.
> In some future utopia we convert all qdiscs and get to drop this case
> analysis, but in order to make progress we live in the real world.
>
> There are also a handful of optimizations and a few code cleanups
> behind this series that I couldn't figure out how to fit neatly into
> it without increasing the patch count. Once this is in, additional
> patches can address them. The most notable: in skb_dequeue we can push
> the consumer lock out a bit and consume multiple skbs off the
> skb_array in pfifo_fast per iteration (see the sketch after the
> benchmark numbers below). Ideally we could push arrays of packets at
> drivers as well, but we would need the infrastructure for that. The
> other notable improvement is to do less locking in the overrun cases
> where the bad tx queue list and gso_skb are being hit. Although nice
> in theory, in practice this is the error case and I haven't found a
> benchmark where it matters yet.
>
> For testing...
>
> My first test case uses multiple containers (via cilium) where
> multiple client containers use 'wrk' to benchmark connections against
> a server container running lighttpd, with lighttpd configured to use
> multiple threads, one per core. Additionally, this test has a proxy
> agent running, so all traffic takes an extra hop through a proxy
> container. In these cases each TCP packet traverses the egress qdisc
> layer at least four times and the ingress qdisc layer an additional
> four times. This makes for a good stress test IMO; perf details below.
>
> The other micro-benchmark I run injects packets directly into the
> qdisc layer using pktgen, via the benchmark script
>
>   ./pktgen_bench_xmit_mode_queue_xmit.sh
>
> Benchmarks were taken in two cases: "base", running latest net-next
> with no changes to the qdisc layer, and "qdisc", run with the lockless
> qdisc updates. Numbers are reported in req/sec. All virtual 'veth'
> devices run with pfifo_fast in the qdisc test case.
>
>   `wrk -t16 -c $conns -d30 "http://[$SERVER_IP4]:80"`
>
>   conns    16     32     64     1024
>   -----------------------------------
>   base:    18831  20201  21393  29151
>   qdisc:   19309  21063  23899  29265
>
> Notice that in all cases we see a performance improvement when running
> with the qdisc case.
>
> Micro-benchmark results using pktgen are as follows:
>
>   `pktgen_bench_xmit_mode_queue_xmit.sh -t 1 -i eth2 -c 20000000`
>
>   base(mq):           2.1 Mpps
>   base(pfifo_fast):   2.1 Mpps
>   qdisc(mq):          2.6 Mpps
>   qdisc(pfifo_fast):  2.6 Mpps
>
> Notice the numbers are the same for mq and pfifo_fast because only a
> single thread is being tested here.
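To make the skb_array point above concrete, here is a rough sketch of what a lockless, skb_array-backed enqueue/dequeue pair can look like. This is illustrative only, not the patch code: the names lockless_fifo_* and the single-band layout are made up for brevity (the real pfifo_fast keeps one skb_array per priority band and uses per-cpu stats), while skb_array_produce(), skb_array_consume(), qdisc_priv(), and qdisc_drop() are the actual kernel APIs.

    /* Simplified sketch of a lockless, skb_array-backed band queue.
     * The real pfifo_fast conversion in this series keeps one skb_array
     * per priority band and opts in via the series' new TCQ_F_NOLOCK
     * flag so the core skips the qdisc spinlock.
     */
    #include <linux/skbuff.h>
    #include <linux/skb_array.h>
    #include <net/sch_generic.h>

    struct lockless_fifo_priv {
            struct skb_array q;     /* ring of skb pointers */
    };

    static int lockless_fifo_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                                     struct sk_buff **to_free)
    {
            struct lockless_fifo_priv *priv = qdisc_priv(sch);

            /* skb_array_produce() is safe against concurrent producers,
             * so no qdisc root lock is taken here.
             */
            if (unlikely(skb_array_produce(&priv->q, skb)))
                    return qdisc_drop(skb, sch, to_free); /* ring full */

            return NET_XMIT_SUCCESS;
    }

    static struct sk_buff *lockless_fifo_dequeue(struct Qdisc *sch)
    {
            struct lockless_fifo_priv *priv = qdisc_priv(sch);

            /* One skb per call; the cover letter notes the consumer
             * lock could be pushed out further to batch several skbs
             * per iteration.
             */
            return skb_array_consume(&priv->q);
    }

Because skb_array (built on ptr_ring) serializes its producer and consumer sides internally with fine-grained locks, concurrent CPUs can enqueue without contending on the single per-qdisc spinlock, which is exactly the hot spot the series targets.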
> In both tests we see a nice bump in performance. The key with 'mq' is
> that it is already per txq ring, so contention is minimal in the above
> cases. Qdiscs such as tbf or htb, which have more contention, will
> likely show larger gains when/if lockless versions are implemented.
>
> Thanks to everyone who helped with this work, especially Daniel
> Borkmann, Eric Dumazet, and Willem de Bruijn, for discussing the
> design and reviewing versions of the code.
>
> Changes from the RFC: dropped a couple of patches off the end; fixed a
> bug where skb_queue_walk_safe did not unlink the skb in all cases;
> fixed a lockdep splat caused by pfifo_fast_destroy not calling the
> *_bh lock variant; addressed _most_ of Willem's comments. There was a
> bug in the bulk locking (final patch) of the RFC series.
>
> @Willem, I left out the lockdep annotation for a follow-on series that
> adds lockdep more completely, rather than just in the code I touched.
>
> Comments and feedback welcome.
>
> Thanks,
> John
>
> ---
>
> John Fastabend (14):
>       net: sched: cleanup qdisc_run and __qdisc_run semantics
>       net: sched: allow qdiscs to handle locking
>       net: sched: remove remaining uses for qdisc_qlen in xmit path
>       net: sched: provide per cpu qstat helpers
>       net: sched: a dflt qdisc may be used with per cpu stats
>       net: sched: explicit locking in gso_cpu fallback
>       net: sched: drop qdisc_reset from dev_graft_qdisc
>       net: sched: use skb list for skb_bad_tx
>       net: sched: check for frozen queue before skb_bad_txq check
>       net: sched: helpers to sum qlen and qlen for per cpu logic
>       net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq
>       net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio
>       net: skb_array: expose peek API
>       net: sched: pfifo_fast use skb_array
>
>  include/linux/skb_array.h |    5 +
>  include/net/gen_stats.h   |    3
>  include/net/pkt_sched.h   |   10 +
>  include/net/sch_generic.h |   79 +++++++-
>  net/core/dev.c            |   31 +++
>  net/core/gen_stats.c      |    9 +
>  net/sched/sch_api.c       |    8 +
>  net/sched/sch_generic.c   |  440 ++++++++++++++++++++++++++++---------
>  net/sched/sch_mq.c        |   34 +++
>  net/sched/sch_mqprio.c    |   69 +++++--
>  10 files changed, 512 insertions(+), 176 deletions(-)

--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
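For readers who want a feel for how the opt-in flag in this series is consumed by the core, the transmit path can conceptually branch on TCQ_F_NOLOCK as sketched below. Again, this is a simplification, not the literal net/core/dev.c change: the real __dev_xmit_skb() also deals with qdisc deactivation, the "busylock" contention reducer, and stats, and sketch_dev_xmit_skb() is a made-up name for illustration.

    /* Conceptual sketch of the TCQ_F_NOLOCK fast path in the xmit layer.
     * Simplified: the real code also handles qdisc deactivation, the
     * "busylock" contention reducer, and byte/packet accounting.
     */
    #include <linux/netdevice.h>
    #include <linux/skbuff.h>
    #include <net/sch_generic.h>
    #include <net/pkt_sched.h>

    static int sketch_dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
                                   struct net_device *dev,
                                   struct netdev_queue *txq)
    {
            struct sk_buff *to_free = NULL;
            int rc;

            if (q->flags & TCQ_F_NOLOCK) {
                    /* Lockless qdisc: enqueue without taking
                     * qdisc_lock(q); the qdisc's own enqueue handles
                     * concurrency internally (e.g. via skb_array).
                     */
                    rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
                    qdisc_run(q);
                    if (unlikely(to_free))
                            kfree_skb_list(to_free);
                    return rc;
            }

            /* Legacy path: serialize on the per-qdisc spinlock. */
            spin_lock(qdisc_lock(q));
            rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
            qdisc_run(q);
            spin_unlock(qdisc_lock(q));
            if (unlikely(to_free))
                    kfree_skb_list(to_free);
            return rc;
    }

The design point worth noting is that the flag check keeps the conversion opt-in: unconverted qdiscs continue to take the root lock exactly as before, which is what lets the series convert pfifo_fast, mq, and mqprio now and leave the rest for follow-ups.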
