Re: [RFC] TCP_NOTSENT_LOWAT behavior

Eric Dumazet Thu, 16 Feb 2017 23:08:03 -0800

On Fri, 2017-02-17 at 01:20 -0500, Josh Hunt wrote:
> Eric
> 
> A team here was using the TCP_NOTSENT_LOWAT socket option and noticed that
> more unsent data than they were expecting was sitting in the write queue. I
> took a look and noticed that while we don't allow allocation of new skbs once
> we exceed this value, we still allow adding data to the skb at the tail of the
> write queue. In this context that means we could add up to size_goal to the
> skb, which could be up to 64kb.
> 
> The patch below attempts to put a cap on the amount we allow to write over
> the TCP_NOTSENT_LOWAT value at 50%. In cases where the setting is smaller this
> will allow the # of unsent bytes to more closely reflect the value. In cases
> where the setting is 128kb or higher this will have no impact compared to the
> current behavior. This should have two benefits: 1) finer-grain control of the
> amount of unsent data, 2) reduction of TCP memory for values of
> TCP_NOTSENT_LOWAT < 128k.
> 
> I reran the netperf results from your original commit with and without my patch:
> 
> 4.10.0-rc8:
> # echo $(( 128 * 1024 )) > /proc/sys/net/ipv4/tcp_notsent_lowat
> # (./super_netperf 200 -H remote -t TCP_STREAM -l 90 &); sleep 60; grep TCP /proc/net/protocols
> TCPv6     2064      2   21735   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
> TCP       1912    465   21735   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
> 
> # echo $(( 64 * 1024 )) > /proc/sys/net/ipv4/tcp_notsent_lowat
> # (./super_netperf 200 -H remote -t TCP_STREAM -l 90 &); sleep 60; grep TCP /proc/net/protocols
> TCPv6     2064      2   19859   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
> TCP       1912    465   19859   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
> 
> 4.10.0-rc8 + patch:
> # echo $(( 128 * 1024 )) > /proc/sys/net/ipv4/tcp_notsent_lowat
> # (./super_netperf 200 -H remote -t TCP_STREAM -l 90 &); sleep 60; grep TCP /proc/net/protocols
> TCPv6     2064      2   21570   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
> TCP       1912    465   21570   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
> 
> # echo $(( 64 * 1024 )) > /proc/sys/net/ipv4/tcp_notsent_lowat
> # (./super_netperf 200 -H remote -t TCP_STREAM -l 90 &); sleep 60; grep TCP /proc/net/protocols
> TCPv6     2064      2   18257   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
> TCP       1912    465   18257   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
> 
> I still need to do more testing, but wanted to get feedback on the idea.
> 
> Josh
>


This adds a cost to the fast path. tcp_sendmsg() is insane.

We have one skb granularity (64KB) already for SO_SNDBUF, regardless of
TCP_NOTSENT_LOWAT being used or not.

It makes no sense really to try so hard to add all these checks.
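
To put rough numbers on the granularity being discussed, here is a small
sketch of the worst-case arithmetic. It is a simplified model and is not taken
from the patch: SIZE_GOAL_MAX and both helper names are hypothetical, and the
64KB bound on size_goal is the figure quoted above.

```c
#include <stddef.h>

/* Assumed upper bound on size_goal (one skb), per the 64KB figure above. */
#define SIZE_GOAL_MAX (64u * 1024u)

/* Simplified model of current behavior: once the unsent bytes reach lowat,
 * the tail skb can still be filled, so the overshoot is at most one skb. */
size_t worst_unsent_current(size_t lowat)
{
    return lowat + SIZE_GOAL_MAX;
}

/* Simplified model of the proposed cap: overshoot limited to 50% of lowat,
 * and never more than one skb. */
size_t worst_unsent_capped(size_t lowat)
{
    size_t over = lowat / 2;

    if (over > SIZE_GOAL_MAX)
        over = SIZE_GOAL_MAX;
    return lowat + over;
}
```

Under this model a 64KB setting goes from a 128KB worst case to 96KB, while
128KB and above see no change, which matches the claim in the proposal.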

I would prefer we fix the underrun problem of TCP_NOTSENT_LOWAT.

Namely: SACKs can arrive, but we do not send EPOLLOUT, and we can starve
the output or TLP.
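
For context, this is roughly how an application depends on that writability
notification. A minimal user-space sketch, assuming a Linux TCP socket; the
helper names set_notsent_lowat() and wait_writable() are made up for
illustration, and poll() with POLLOUT stands in for an epoll loop waiting on
EPOLLOUT.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <poll.h>
#include <sys/socket.h>

#ifndef TCP_NOTSENT_LOWAT
#define TCP_NOTSENT_LOWAT 25 /* Linux value, for older libc headers */
#endif

/* Set the per-socket unsent-data low-water mark, in bytes.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int set_notsent_lowat(int fd, unsigned int bytes)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT,
                      &bytes, sizeof(bytes));
}

/* Wait until the socket is reported writable.
 * Returns 1 if writable, 0 on timeout, -1 on error or hangup. */
int wait_writable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc <= 0)
        return rc;
    return (pfd.revents & POLLOUT) ? 1 : -1;
}
```

A sender that only writes after wait_writable() returns 1 makes no progress if
the wakeup is never delivered, which is the starvation described above.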

Thanks
