On 6/28/18, 1:48 PM, "netdev-ow...@vger.kernel.org on behalf of Neal Cardwell" 
<netdev-ow...@vger.kernel.org on behalf of ncardw...@google.com> wrote:

    On Thu, Jun 28, 2018 at 4:20 PM Lawrence Brakmo <bra...@fb.com> wrote:
    >
    > I just looked at 4.18 traces and the behavior is as follows:
    >
    >    Host A sends the last packets of the request
    >
    >    Host B receives them, and the last packet is marked with congestion 
(CE)
    >
    >    Host B sends ACKs for packets not marked with congestion
    >
    >    Host B sends data packet with reply and ACK for packet marked with 
congestion (TCP flag ECE)
    >
    >    Host A receives ACKs with no ECE flag
    >
    >    Host A receives data packet with ACK for the last packet of request 
and has TCP ECE bit set
    >
    >    Host A sends 1st data packet of the next request with TCP flag CWR
    >
    >    Host B receives the packet (as seen in tcpdump at B), no CE flag
    >
    >    Host B sends a dup ACK that also has the TCP ECE flag
    >
    >    Host A RTO timer fires!
    >
    >    Host A to send the next packet
    >
    >    Host A receives an ACK for everything it has sent (i.e. Host B did 
receive 1st packet of request)
    >
    >    Host A send more packets…
    
    Thanks, Larry! This is very interesting. I don't know the cause, but
    this reminds me of an issue  Steve Ibanez raised on the netdev list
    last December, where he was seeing cases with DCTCP where a CWR packet
    would be received and buffered by Host B but not ACKed by Host B. This
    was the thread "Re: Linux ECN Handling", starting around December 5. I
    have cc-ed Steve.
    
    I wonder if this may somehow be related to the DCTCP logic to rewind
    tp->rcv_nxt and call tcp_send_ack(), and then restore tp->rcv_nxt, if
    DCTCP notices that the incoming CE bits have been changed while the
    receiver thinks it is holding on to a delayed ACK (in
    dctcp_ce_state_0_to_1() and dctcp_ce_state_1_to_0()). I wonder if the
    "synthetic" call to tcp_send_ack() somehow has side effects in the
    delayed ACK state machine that can cause the connection to forget that
    it still needs to fire a delayed ACK, even though it just sent an ACK
    just now.
    
    neal
    
Just to reiterate that there are 2 distinct issues:

1) When the last data packet of a request is received with the CE flag, it 
causes a dup ack with ECE flag to be sent after the first data packet of the 
next request is received.

2) When one of the last (but not the last) data packets of a request is 
received with the CE flag, it causes a delayed ack to be sent after the first 
data packet of the next request is received. This delayed ack does acknowledge 
the data in 1st packet of the reply.

Reply via email to