On Thu, Oct 5, 2017 at 9:26 PM, David Miller <da...@davemloft.net> wrote:
>
> From: Yuchung Cheng <ych...@google.com>
> Date: Wed, 4 Oct 2017 12:59:57 -0700
>
> > This patch set improves the CPU consumption of the RACK TCP loss
> > recovery algorithm, in particular for high-speed networks. Currently,
> > for every ACK in recovery, RACK can potentially iterate over all sent
> > packets in the write queue. On large BDP networks with non-trivial
> > losses, the RACK write queue walk CPU usage becomes unreasonably high.
> >
> > This patch introduces a new queue in TCP that keeps only skbs sent and
> > not yet (s)acked or marked lost, in time order instead of sequence
> > order. With that, RACK can examine this time-sorted list and only
> > check packets that were sent recently, within the reordering window,
> > per ACK. This is the fastest way without any write queue walks. The
> > number of skbs examined per ACK is reduced by orders of magnitude.
>
> That's a pretty risky way to implement the second SKB list.... but
> you avoided making sk_buff larger so what can I say :-)
>
> Series applied, thanks.

Agreed. I really appreciate you accepting this change. We tried a few
alternatives, but this is the cleanest and fastest approach. BBR can reach
8-10 Gbps on a 300ms RTT link, so both RACK and SACK need to scale in the
era of hundreds-of-MB BDPs :-)
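To make the idea in the quoted description concrete, here is a rough,
simplified userspace sketch of the time-ordered walk. It is not the kernel
code (the kernel keeps this list inside the TCP socket and walks real
sk_buffs); the names and types here (struct pkt, rack_detect_loss_sketch,
the microsecond timestamps) are purely illustrative assumptions.

/*
 * Simplified userspace sketch (NOT the actual kernel code) of the idea in
 * the patch description: keep not-yet-(s)acked, not-yet-lost packets on a
 * list sorted by send time, so loss detection can stop walking as soon as
 * it reaches packets sent within the reordering window.
 */
#include <stdio.h>
#include <stdint.h>

struct pkt {
	uint32_t seq;          /* starting sequence number (for printing only) */
	uint64_t xmit_time_us; /* time this packet was (re)transmitted */
	struct pkt *next;      /* next packet in time-sorted order */
};

/*
 * Walk the time-sorted list of outstanding packets, oldest first.  A packet
 * is declared lost if it was sent more than the reordering window before
 * the most recently delivered packet.  Because the list is sorted by send
 * time, the walk stops at the first packet inside the window instead of
 * scanning the whole (sequence-ordered) write queue.
 */
static void rack_detect_loss_sketch(struct pkt **tsorted_head,
				    uint64_t rack_xmit_time_us,
				    uint64_t reo_wnd_us)
{
	struct pkt *p;

	while ((p = *tsorted_head) != NULL) {
		if (p->xmit_time_us + reo_wnd_us >= rack_xmit_time_us)
			break;	/* everything after this was sent even later: stop */

		printf("seq %u marked lost (sent %llu us before RACK packet)\n",
		       p->seq,
		       (unsigned long long)(rack_xmit_time_us - p->xmit_time_us));
		*tsorted_head = p->next;	/* lost packets leave the sorted list */
	}
}

int main(void)
{
	/* Three outstanding packets sent at t=0us, t=1000us and t=40000us. */
	struct pkt c = { 3000, 40000, NULL };
	struct pkt b = { 2000, 1000,  &c };
	struct pkt a = { 1000, 0,     &b };
	struct pkt *tsorted = &a;

	/*
	 * Suppose a packet sent at t=41000us was just SACKed and the
	 * reordering window is 10000us: a and b are declared lost, c is
	 * still within the window, and the walk examined only three
	 * packets regardless of how many were ever sent.
	 */
	rack_detect_loss_sketch(&tsorted, 41000, 10000);
	return 0;
}

The real code of course also has to move packets back on retransmission,
unlink them when they are sacked, etc., but the early exit on a time-sorted
list is what turns the per-ACK cost from "all outstanding packets" into
"packets sent within the reordering window", as the cover letter describes.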