FIN-WAIT-1 just means the local application has called close() or
shutdown() to shut down the sending direction of the socket, the
local TCP stack has sent a FIN, and it is now waiting to receive a FIN
and an ACK from the other side (in either order, or simultaneously). The
ASCII art state transition diagram on page 22 of RFC 793 (e.g.
https://tools.ietf.org/html/rfc793#section-3.2 ) is one source for
this, though the W. Richard Stevens books have a much more readable
diagram.
There may still be unacked and SACKed data in the retransmit queue at
this point.
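
For reference, here is a rough user-space sketch of the transitions out of
FIN-WAIT-1 as I read the RFC 793 diagram and its segment-arrival text. The
enum and function names are made up for illustration; this is not how the
kernel represents connection state.

```c
#include <stdbool.h>

/* Subset of the RFC 793 connection states relevant to an active close. */
enum tcp_state {
    ESTABLISHED,
    FIN_WAIT_1,
    FIN_WAIT_2,
    CLOSING,
    TIME_WAIT,
};

/* Local application calls close()/shutdown(): the stack sends a FIN. */
enum tcp_state on_local_close(enum tcp_state s)
{
    return s == ESTABLISHED ? FIN_WAIT_1 : s;
}

/*
 * A segment arrives from the peer.
 * acks_our_fin: the segment ACKs the FIN we sent.
 * has_fin:      the segment carries the peer's own FIN.
 */
enum tcp_state on_segment(enum tcp_state s, bool acks_our_fin, bool has_fin)
{
    switch (s) {
    case FIN_WAIT_1:
        if (acks_our_fin && has_fin)
            return TIME_WAIT;   /* FIN and ACK arrive together */
        if (acks_our_fin)
            return FIN_WAIT_2;  /* ACK first, still waiting for the FIN */
        if (has_fin)
            return CLOSING;     /* FIN first, still waiting for the ACK */
        return s;
    case FIN_WAIT_2:
        return has_fin ? TIME_WAIT : s;
    case CLOSING:
        return acks_our_fin ? TIME_WAIT : s;
    default:
        return s;
    }
}
```

Note that nothing in these transitions depends on the retransmit queue being
empty, which is why unacked and SACKed data can still be present.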
Thanks for the clarification.
Thanks, that is a useful data point. Do you know what particular value
tp->sacked_out has? Would you be able to capture/log the value of
tp->packets_out, tp->lost_out, and tp->retrans_out as well?
tp->sacked_out varies per crash instance (55, 180, etc.).
However the other values are always the same - tp->packets_out is 0,
tp->lost_out is 1 and tp->retrans_out is 1.
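
To show why these values look inconsistent, here is a small user-space
sketch of the accounting. The two arithmetic helpers follow
tcp_left_out() and tcp_packets_in_flight() as I understand them from
include/net/tcp.h; the struct and the consistency check are hypothetical
stand-ins, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the four tcp_sock counters discussed here. */
struct tp_counters {
    uint32_t packets_out;
    uint32_t sacked_out;
    uint32_t lost_out;
    uint32_t retrans_out;
};

/* Follows tcp_left_out(): packets SACKed or marked lost. */
uint32_t left_out(const struct tp_counters *tp)
{
    return tp->sacked_out + tp->lost_out;
}

/* Follows tcp_packets_in_flight(); the arithmetic is unsigned and can wrap. */
uint32_t packets_in_flight(const struct tp_counters *tp)
{
    return tp->packets_out - left_out(tp) + tp->retrans_out;
}

/* True when SACKed plus lost packets do not exceed packets sent. */
bool counters_consistent(const struct tp_counters *tp)
{
    return left_out(tp) <= tp->packets_out;
}
```

With packets_out = 0, sacked_out = 55, lost_out = 1 and retrans_out = 1,
left_out() is 56 while nothing is recorded as sent, so the check fails and
the in-flight computation wraps around as an unsigned value.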
Yes, one guess would be that somehow the skbs in the retransmit queue
have been freed, but tp->sacked_out is still non-zero and
tp->highest_sack is still a dangling pointer into one of those freed
skbs. The tcp_write_queue_purge() function is one function that frees
the skbs in the retransmit queue and leaves tp->sacked_out as non-zero
and tp->highest_sack as a dangling pointer to a freed skb, AFAICT, so
that's why I'm wondering about that function. I can't think of a
specific sequence of events that would involve tcp_write_queue_purge()
and then a socket that's still in FIN-WAIT-1. Maybe I'm not being
creative enough, or maybe that guess is on the wrong track. Would you
be able to set a new bit in the tcp_sock in tcp_write_queue_purge()
and log it in your instrumentation point, to see if
tcp_write_queue_purge() was called for these connections that cause
this crash?
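
To make the idea concrete, here is a toy user-space model of the
hypothesis and of the proposed debug bit. Everything in it is
hypothetical: the struct, the field names and the purge_seen flag are
illustrative only, and skbs are modelled as array slots rather than real
allocations. It only mirrors the guess above, not the actual kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

#define RTX_QUEUE_MAX 8

/* Hypothetical model; this is not the kernel's tcp_sock layout. */
struct model_sock {
    bool skb_live[RTX_QUEUE_MAX]; /* true while the slot's skb is allocated */
    unsigned int sacked_out;      /* count of SACKed skbs */
    int highest_sack;             /* slot of the highest SACKed skb, -1 if none */
    bool purge_seen;              /* hypothetical debug bit set by the purge */
};

/* Models the suspected behaviour: frees every skb, leaves SACK state alone. */
void model_write_queue_purge(struct model_sock *sk)
{
    for (size_t i = 0; i < RTX_QUEUE_MAX; i++)
        sk->skb_live[i] = false;
    sk->purge_seen = true; /* the new instrumentation bit */
}

/* Instrumentation point: true if highest_sack refers to a freed skb. */
bool model_highest_sack_dangling(const struct model_sock *sk)
{
    return sk->highest_sack >= 0 && !sk->skb_live[sk->highest_sack];
}
```

If the real instrumentation shows the equivalent of purge_seen set on the
crashing sockets, that would support the guess; if it is never set, the
stale state must be coming from somewhere else.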
Sure, I can try this out.
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project