On Tue, Jan 05, 2021 at 10:16:04AM +0100, Jan Klemkow wrote:
> On Wed, Dec 23, 2020 at 11:59:13AM +0000, Stuart Henderson wrote:
> > On 2020/12/17 20:50, Jan Klemkow wrote:
> > > ping
> > > 
> > > On Fri, Nov 06, 2020 at 01:10:52AM +0100, Jan Klemkow wrote:
> > > > bluhm and I made some network performance measurements and kernel
> > > > profiling.
> > 
> > I've been running this on my workstation since you sent it out - lots
> > of long-running ssh connections, hourly reposync, daily rsync of base
> > snapshots.
> > 
> > I don't know enough about TCP stack behaviour to give a really meaningful
> > OK, but I'm certainly not seeing any problems with it.
> 
> Thanks, Stuart.  Has anyone else tested this diff?  Or are there any
> opinions or objections about it?  Even bike-shedding is welcome :-)

From my memory, TCP uses the incoming ACKs during startup to grow the send
window, so your diff could slow down the initial ramp-up. Not sure if that
actually matters. It can also have some impact if userland reads in big
blocks at infrequent intervals, since then the ACK clock slows down.

I guess to get coverage it would be best to commit this and then monitor
the lists for possible slowdowns.
 
> Thanks,
> Jan
> 
> > > > Setup:  Linux (iperf) -10gbit-> OpenBSD (relayd) -10gbit-> Linux (iperf)
> > > > 
> > > > We figured out that the kernel uses a huge amount of processing time
> > > > for sending ACKs back to the sender on the receiving interface.  After
> > > > receiving a data segment, we send out two ACKs.  The first one is sent
> > > > by tcp_input() directly after receiving the segment.  The second ACK is
> > > > sent out after userland or the sosplice task has read some data out of
> > > > the socket buffer.
> > > > 
> > > > The first ACK in tcp_input() is sent after receiving every other data
> > > > segment, as described in RFC 1122:
> > > > 
> > > >         4.2.3.2  When to Send an ACK Segment
> > > >                 A TCP SHOULD implement a delayed ACK, but an ACK should
> > > >                 not be excessively delayed; in particular, the delay
> > > >                 MUST be less than 0.5 seconds, and in a stream of
> > > >                 full-sized segments there SHOULD be an ACK for at least
> > > >                 every second segment.
> > > > 
> > > > This advice is based on the paper "Congestion Avoidance and Control":
> > > > 
> > > >         4 THE GATEWAY SIDE OF CONGESTION CONTROL
> > > >                 The 8 KBps senders were talking to 4.3+BSD receivers
> > > >                 which would delay an ack for at most one packet (because
> > > >                 of an ack's 'clock' role, the authors believe that the
> > > >                 minimum ack frequency should be every other packet).
> > > > 
> > > > Sending the first ACK (on every other packet) costs us too much
> > > > processing time.  Thus, we run into a full socket buffer earlier.  The
> > > > first ACK just acknowledges the received data, but does not update the
> > > > window.  The second ACK, caused by the socket buffer reader, also
> > > > acknowledges the data and additionally updates the window.  So, the
> > > > second ACK is much more valuable for fast packet processing than the
> > > > first one.
> > > > 
> > > > The performance improvement is 33% with splicing and 20% without
> > > > splicing:
> > > > 
> > > >                         splicing        relaying
> > > > 
> > > >         current         3.1 GBit/s      2.6 GBit/s
> > > >         w/o first ack   4.1 GBit/s      3.1 GBit/s
> > > > 
> > > > As far as I understand the implementations in other operating systems:
> > > > Linux has implemented a custom TCP_QUICKACK socket option to turn this
> > > > kind of behavior on and off.  FreeBSD and NetBSD still depend on it when
> > > > using the New Reno implementation.
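> > > > 
> > > > For reference, on Linux an application can toggle that per socket,
> > > > roughly like this (illustrative only, unrelated to the diff below):
> > > > 
> > > > /* Illustrative: enable Linux's TCP_QUICKACK on a connected socket. */
> > > > #include <sys/socket.h>
> > > > #include <netinet/in.h>
> > > > #include <netinet/tcp.h>
> > > > 
> > > > int
> > > > set_quickack(int fd, int on)
> > > > {
> > > > 	/* Per tcp(7), the flag is only a hint and is not permanent. */
> > > > 	return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
> > > > }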
> > > > 
> > > > The following diff turns off the direct ACK on every other segment.  We
> > > > have been running this diff in production on our own machines at genua
> > > > and on our products for several months now.  We haven't noticed any
> > > > problems, neither with interactive network sessions (ssh) nor with bulk
> > > > traffic.
> > > > 
> > > > Another solution could be a sysctl(2) or an additional socket option,
> > > > similar to Linux, to control this behavior per socket or system wide.
> > > > Alternatively, a counter that ACKs only every 3rd, 4th... data segment
> > > > could also address the problem.
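> > > > 
> > > > Such a counter could look roughly like this inside the macro (untested
> > > > sketch, not part of the diff below; t_segs_since_ack and tcp_ack_every
> > > > are made-up names, not existing kernel symbols):
> > > > 
> > > > 	/* Force an ACK only on every tcp_ack_every'th data segment. */
> > > > 	if (++tp->t_segs_since_ack >= tcp_ack_every ||
> > > > 	    (tcp_ack_on_push && (tiflags) & TH_PUSH) ||
> > > > 	    (ifp && (ifp->if_flags & IFF_LOOPBACK))) {
> > > > 		tp->t_segs_since_ack = 0;
> > > > 		tp->t_flags |= TF_ACKNOW;
> > > > 	} else {
> > > > 		/* arm the delayed ACK timer, as the macro does today */
> > > > 	}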
> > > > 
> > > > bye,
> > > > Jan
> > > > 
> > > > Index: netinet/tcp_input.c
> > > > ===================================================================
> > > > RCS file: /cvs/src/sys/netinet/tcp_input.c,v
> > > > retrieving revision 1.365
> > > > diff -u -p -r1.365 tcp_input.c
> > > > --- netinet/tcp_input.c 19 Jun 2020 22:47:22 -0000      1.365
> > > > +++ netinet/tcp_input.c 5 Nov 2020 23:00:34 -0000
> > > > @@ -165,8 +165,8 @@ do { \
> > > >  #endif
> > > >  
> > > >  /*
> > > > - * Macro to compute ACK transmission behavior.  Delay the ACK unless
> > > > - * we have already delayed an ACK (must send an ACK every two segments).
> > > > + * Macro to compute ACK transmission behavior.  Delay the ACK until
> > > > + * a read from the socket buffer or the delayed ACK timer causes one.
> > > >   * We also ACK immediately if we received a PUSH and the ACK-on-PUSH
> > > >   * option is enabled or when the packet is coming from a loopback
> > > >   * interface.
> > > > @@ -176,8 +176,7 @@ do { \
> > > >         struct ifnet *ifp = NULL; \
> > > >         if (m && (m->m_flags & M_PKTHDR)) \
> > > >                 ifp = if_get(m->m_pkthdr.ph_ifidx); \
> > > > -       if (TCP_TIMER_ISARMED(tp, TCPT_DELACK) || \
> > > > -           (tcp_ack_on_push && (tiflags) & TH_PUSH) || \
> > > > +       if ((tcp_ack_on_push && (tiflags) & TH_PUSH) || \
> > > >             (ifp && (ifp->if_flags & IFF_LOOPBACK))) \
> > > >                 tp->t_flags |= TF_ACKNOW; \
> > > >         else \
> > > > 
> > > 
> > 
> 

-- 
:wq Claudio
