On Tue, Jul 3, 2018 at 11:10 AM Lawrence Brakmo <[email protected]> wrote:
>
> On 7/2/18, 5:52 PM, "[email protected] on behalf of Neal Cardwell"
> <[email protected] on behalf of [email protected]> wrote:
>
>    On Mon, Jul 2, 2018 at 5:39 PM Lawrence Brakmo <[email protected]> wrote:
>    >
>    > We have observed high tail latencies when using DCTCP for RPCs as
>    > compared to using Cubic. For example, in one setup there are 2 hosts
>    > sending to a 3rd one, with each sender having 3 flows (1 stream,
>    > 1 1MB back-to-back RPCs and 1 10KB back-to-back RPCs). The following
>    > table shows the 99% and 99.9% latencies for both Cubic and DCTCP:
>    >
>    >             Cubic 99%  Cubic 99.9%  DCTCP 99%  DCTCP 99.9%
>    >   1MB RPCs    2.6ms       5.5ms       43ms       208ms
>    >   10KB RPCs   1.1ms       1.3ms       53ms       212ms
>    >
>    > Looking at tcpdump traces showed that there are two causes for the
>    > latency.
>    >
>    > 1) RTOs caused by the receiver sending a dup ACK and not ACKing
>    >    the last (and only) packet sent.
>    > 2) Delaying ACKs when the sender has a cwnd of 1, so everything
>    >    pauses for the duration of the delayed ACK.
>    >
>    > The first patch fixes the cause of the dup ACKs: DCTCP state was not
>    > being updated when an ACK that was initially delayed was later sent
>    > with a data packet.
>    >
>    > The second patch ensures that an ACK is sent immediately when a
>    > CWR-marked packet arrives.
>    >
>    > With the patches the latencies for DCTCP now look like:
>    >
>    >             DCTCP 99%  DCTCP 99.9%
>    >   1MB RPCs    5.8ms       6.9ms
>    >   10KB RPCs   146us       203us
>    >
>    > Note that while the 1MB RPCs tail latencies are higher than Cubic's,
>    > the 10KB latencies are much smaller than Cubic's. These patches fix
>    > issues on the receiver, but tcpdump traces indicate there is an
>    > opportunity to also fix an issue at the sender that adds about 3ms
>    > to the tail latencies.
>    >
>    > The following trace shows the issue that triggers an RTO (fixed by
>    > these patches):
>    >
>    > Host A sends the last packets of the request
>    > Host B receives them, and the last packet is marked with congestion (CE)
>    > Host B sends ACKs for packets not marked with congestion
>    > Host B sends data packet with reply and ACK for packet marked with
>    >    congestion (TCP flag ECE)
>    > Host A receives ACKs with no ECE flag
>    > Host A receives data packet with ACK for the last packet of request
>    >    and which has TCP ECE bit set
>    > Host A sends 1st data packet of the next request with TCP flag CWR
>    > Host B receives the packet (as seen in tcpdump at B), no CE flag
>    > Host B sends a dup ACK that also has the TCP ECE flag
>    > Host A RTO timer fires!
>    > Host A resends the packet
>    > Host A receives an ACK for everything it has sent (i.e. Host B
>    >    did receive the 1st packet of the request)
>    > Host A sends more packets…
>    >
>    > [PATCH net-next v2 1/2] tcp: notify when a delayed ack is sent
>    > [PATCH net-next v2 2/2] tcp: ack immediately when a cwr packet
>    >
>    >  net/ipv4/tcp_input.c  | 16 +++++++++++-----
>    >  net/ipv4/tcp_output.c |  4 ++--
>    >  2 files changed, 13 insertions(+), 7 deletions(-)
>
>    Thanks, Larry. Just for context, can you please let us know whether
>    your tests included zero, one, or both of Eric's recent commits
>    (listed below) that tuned the number of ACKs after ECN events? (Or
>    maybe the tests were literally using a net-next kernel?) Just wanted
>    to get a better handle on any possible interactions there.
>
>    Thanks!
>
>    neal
>
> Yes, my test kernel includes both patches listed below.
OK, great.

> BTW, I will send a new
> patch where I move the call to tcp_incr_quickack from tcp_ecn_check_ce to
> tcp_ecn_accept_cwr.

OK, sounds good to me.

thanks,
neal
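[Editorial note: to make the behavior described for patch 2 concrete, here is a minimal sketch, not the patch actually posted in this series, of how tcp_ecn_accept_cwr() in net/ipv4/tcp_input.c could force a prompt ACK when a CWR-marked segment arrives. It assumes a mid-2018 net-next tree where tcp_enter_quickack_mode() already takes a max_quickacks argument (per Eric's commits referenced above), and it widens the function to take the socket so that the quick-ACK helpers are reachable; the patches on the list are authoritative.]

/* Sketch only -- not the v2 patch posted in this thread.
 * Context: net/ipv4/tcp_input.c, net-next circa mid-2018.
 */
static void tcp_ecn_accept_cwr(struct sock *sk, const struct sk_buff *skb)
{
	struct tcp_sock *tp = tcp_sk(sk);

	if (tcp_hdr(skb)->cwr) {
		tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;

		/* The sender has reacted to ECE and entered CWR, so its
		 * cwnd may be as small as 1 packet.  Holding this ACK in
		 * the delayed-ACK engine would stall the connection for
		 * the whole delayed-ACK timeout, so bump the quick-ACK
		 * credit and leave delayed-ACK (pingpong) mode.  This is
		 * also a natural home for the tcp_incr_quickack() call
		 * being moved out of tcp_ecn_check_ce().
		 */
		tcp_enter_quickack_mode(sk, 2);
	}
}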

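[Editorial note: as a back-of-envelope reading of the latency table above, the DCTCP 99.9% tails of ~208-212 ms line up with the kernel's minimum RTO (cause 1, the spurious RTO), and the 99% tails of ~43-53 ms with the minimum delayed-ACK timer (cause 2, delayed ACKs at cwnd 1). The mapping is an inference, not stated in the thread; the constants below mirror include/net/tcp.h.]

/* Stand-alone illustration of the two timers implicated above. */
#include <stdio.h>

int main(void)
{
	const int tcp_rto_min_ms = 200;   /* TCP_RTO_MIN    = HZ/5  -> 200 ms */
	const int tcp_delack_min_ms = 40; /* TCP_DELACK_MIN = HZ/25 ->  40 ms */

	printf("99.9%% tails of ~208-212 ms ~= min RTO of %d ms plus a few ms of RTT\n",
	       tcp_rto_min_ms);
	printf("99%% tails of ~43-53 ms ~= delayed-ACK timer of %d ms plus queueing/RTT\n",
	       tcp_delack_min_ms);
	return 0;
}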