Eric Dumazet wrote on Fri, Apr 13, 2018:
> That might be caused by some TS val/ecr breakage :
>
> Many acks were received by the server tcpdump,
> but none of them was accepted by TCP stack, for some reason.
>
> Try to disable TCP timestamps, it will give some hint if bug does not
> reproduce.
This was spot on, after disabling tcp timestamps I cannot reproduce the
hang anymore.
I've had another look at the original sequence (as seen by the server)
and I don't see much wrong; tell me what I missed:
- the replayed packet has seq 32004:33378, so the first ignored ack
would be the one with ack 33378, is that right? (meaning the server did
accept the one for 32004 and none after that)
Assuming it is, excerpt from around then (first emission of that packet then
client replies):
16:49:26.700531 IP <server>.13317 > <client>.31872: Flags [.], seq 32004:33378,
ack 4190, win 307, options [nop,nop,TS val 1313937607 ecr 1617129440], length
1374
...
16:49:26.728084 IP <client>.31872 > <server>.13317: Flags [.], ack 32004, win
759, options [nop,nop,TS val 1617129473 ecr 1313937602], length 0
...
16:49:26.729531 IP <client>.31872 > <server>.13317: Flags [.], ack 33378, win
782, options [nop,nop,TS val 1617129475 ecr 1313937607], length 0
16:49:26.730002 IP <client>.31872 > <server>.13317: Flags [.], ack 34752, win
805, options [nop,nop,TS val 1617129475 ecr 1313937607], length 0
...
16:49:26.731634 IP <client>.31872 > <server>.13317: Flags [.], ack 36126, win
827, options [nop,nop,TS val 1617129476 ecr 1313937607], length 0
- the ecr value matches the val of the packet it acks
- the val is >= that of previous packet (won't be considered
reorder/should pass paws check?)
- even if the packets are processed in parallel and some kind of race
occurs, a "bigger" ack should ack all the previous packets, right?
- Just to make sure, I checked /proc/net/netstat for PAWSEstab but that
is 0:
TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed EmbryonicRsts
PruneCalled RcvPruned OfoPruned OutOfWindowIcmps LockDroppedIcmps ArpFilter TW
TWRecycled TWKilled PAWSActive PAWSEstab DelayedACKs DelayedACKLocked
DelayedACKLost ListenOverflows ListenDrops TCPHPHits TCPPureAcks TCPHPAcks
TCPRenoRecovery TCPSackRecovery TCPSACKReneging TCPFACKReorder TCPSACKReorder
TCPRenoReorder TCPTSReorder TCPFullUndo TCPPartialUndo TCPDSACKUndo TCPLossUndo
TCPLostRetransmit TCPRenoFailures TCPSackFailures TCPLossFailures
TCPFastRetrans TCPSlowStartRetrans TCPTimeouts TCPLossProbes
TCPLossProbeRecovery TCPRenoRecoveryFail TCPSackRecoveryFail TCPRcvCollapsed
TCPDSACKOldSent TCPDSACKOfoSent TCPDSACKRecv TCPDSACKOfoRecv TCPAbortOnData
TCPAbortOnClose TCPAbortOnMemory TCPAbortOnTimeout TCPAbortOnLinger
TCPAbortFailed TCPMemoryPressures TCPMemoryPressuresChrono TCPSACKDiscard
TCPDSACKIgnoredOld TCPDSACKIgnoredNoUndo TCPSpuriousRTOs TCPMD5NotFound
TCPMD5Unexpected TCPMD5Failure TCPSackShifted TCPSackMerged
TCPSackShiftFallback TCPBacklogDrop PFMemallocDrop TCPMinTTLDrop
TCPDeferAcceptDrop IPReversePathFilter TCPTimeWaitOverflow TCPReqQFullDoCookies
TCPReqQFullDrop TCPRetransFail TCPRcvCoalesce TCPOFOQueue TCPOFODrop
TCPOFOMerge TCPChallengeACK TCPSYNChallenge TCPFastOpenActive
TCPFastOpenActiveFail TCPFastOpenPassive TCPFastOpenPassiveFail
TCPFastOpenListenOverflow TCPFastOpenCookieReqd TCPFastOpenBlackhole
TCPSpuriousRtxHostQueues BusyPollRxPackets TCPAutoCorking TCPFromZeroWindowAdv
TCPToZeroWindowAdv TCPWantZeroWindowAdv TCPSynRetrans TCPOrigDataSent
TCPHystartTrainDetect TCPHystartTrainCwnd TCPHystartDelayDetect
TCPHystartDelayCwnd TCPACKSkippedSynRecv TCPACKSkippedPAWS TCPACKSkippedSeq
TCPACKSkippedFinWait2 TCPACKSkippedTimeWait TCPACKSkippedChallenge TCPWinProbe
TCPKeepAlive TCPMTUPFail TCPMTUPSuccess
TcpExt: 0 0 0 0 58 0 0 26 0 0 50 0 0 0 0 75402 17 201 0 0 6876848 59804 2258387
0 33 0 0 3 0 0 0 0 1 102 15 0 0 0 1306 60 386 292 10 0 0 108750 201 1 228 1 8 4
0 63 0 3 0 0 0 0 107 1 0 0 0 2834 1962 622 0 0 0 0 0 0 0 0 0 1065022 54160 0 1
3 3 0 0 0 0 0 0 0 475 0 9578 6 8 71 257 5116325 0 0 0 0 0 0 6 0 0 0 61 85 0 0
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts
OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets
OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts
IpExt: 0 0 0 0 206 0 16405602866 8921427728 0 0 57928 0 0 16121410 0 5388 0
Also, here are the per-socket stats I could find (ss -i after having
reproduced hang):
reno wscale:7,7 rto:7456 backoff:5 rtt:32.924/1.41 ato:40 mss:1374
pmtu:1500 rcvmss:1248 advmss:1448 cwnd:1 ssthresh:16
bytes_acked:32004 bytes_received:4189 segs_out:85 segs_in:54
data_segs_out:78 data_segs_in:18 send 333.9Kbps lastsnd:3912
lastrcv:11464 lastack:11387 pacing_rate 21.4Mbps delivery_rate
3.5Mbps busy:12188ms unacked:33 retrans:1/5 lost:33 rcv_rtt:37
rcv_space:29200 rcv_ssthresh:39184 notsent:28796 minrtt:24.986
Here are the same stats with tcp timestamp disabled (after running my
reproducer, e.g. outputing a big chunk of text quickly):
reno wscale:7,7 rto:228 rtt:27.267/1.423 ato:40 mss:1386 pmtu:1500
rcvmss:1248 advmss:1460 cwnd:10 ssthresh:12 bytes_acked:17311070
bytes_received:40445 segs_out:13331 segs_in:7523
data_segs_out:13279 data_segs_in:947 send 4.1Mbps lastsnd:5
lastrcv:5 lastack:5 pacing_rate 4.9Mbps delivery_rate 431.4Kbps
app_limited busy:36064ms unacked:1 retrans:0/6 rcv_rtt:9112.95
rcv_space:29233 rcv_ssthresh:39184 minrtt:25.566
So I guess lost:33 matching unacked:33 might be another hint?
I'll need a bit more time reading the code to understand what this all
implies ; feel free to beat me to it.
Thanks,
--
Dominique Martinet | Asmadeus