Hi Neal,

I managed to track down the code path that the unACKed CWR packet is
taking. The tcp_rcv_established() function calls tcp_ack_snd_check()
at the end of step5 and then the return statement indicated below is
invoked, which prevents the __tcp_ack_snd_check() function from
running.

static inline void tcp_ack_snd_check(struct sock *sk)
{
        if (!inet_csk_ack_scheduled(sk)) {
                /* We sent a data segment already. */
                return;   /* <=== here */
        }
        __tcp_ack_snd_check(sk, 1);
}

So somehow tcp_ack_snd_check() thinks that a data segment was already
sent when in fact it wasn't. Do you see a way around this issue?

Thanks,
-Steve

On Tue, Dec 19, 2017 at 7:28 AM, Neal Cardwell <ncardw...@google.com> wrote:
> On Tue, Dec 19, 2017 at 12:16 AM, Steve Ibanez <siba...@stanford.edu> wrote:
>>
>> Hi Neal,
>>
>> I started looking into this receiver ACKing issue today. Strangely,
>> when I tried adding printk statements at the top of the
>> tcp_v4_do_rcv(), tcp_rcv_established(), __tcp_ack_snd_check() and
>> tcp_send_delayed_ack() functions they were never executed on the
>> machine running the iperf3 server (i.e. the destination of the flows).
>> Maybe the iperf3 server is using its own TCP stack?
>>
>> In any case, the ACKing problem is reproducible using just normal
>> iperf for which I do see my printk statements being executed. I can
>> now confirm that when the CWR marked packet (for which no ACK is sent)
>> arrives at the receiver, the __tcp_ack_snd_check() function is never
>> invoked; and hence neither is the tcp_send_delayed_ack() function.
>> Hopefully this helps narrow down where the issue might be? I started
>> adding some printk statements into the tcp_rcv_established() function,
>> but I'm not sure where the best places to look would be so I wanted to
>> ask your advice on this.
>
> Thanks for the detailed report!
>
> As a next step to narrow down why the CWR-marked packet is not acked,
> I would suggest adding printk statements at the bottom of
> tcp_rcv_established() in all the spots where we have a goto or return
> that would cause us to short-circuit and not reach the
> tcp_ack_snd_check() at the bottom of the function. This could be much
> like your existing nice debugging printks, that log any time we get to
> that spot and if (sk->sk_family == AF_INET && th->cwr). And these
> could be in the following spots (marked "here"):
>
> slow_path:
>         if (len < (th->doff << 2) || tcp_checksum_complete(skb))
>                 goto csum_error;                /* <=== here */
>
>         if (!th->ack && !th->rst && !th->syn)
>                 goto discard;                /* <=== here */
>
>         /*
>          *      Standard slow path.
>          */
>
>         if (!tcp_validate_incoming(sk, skb, th, 1))
>                 return;                /* <=== here */
>
> step5:
>         if (tcp_ack(sk, skb, FLAG_SLOWPATH | FLAG_UPDATE_TS_RECENT) < 0)
>                 goto discard;                /* <=== here */
>
>
> thanks,
> neal

Reply via email to