Eric Dumazet wrote on Fri, Apr 13, 2018: > That might be caused by some TS val/ecr breakage : > > Many acks were received by the server tcpdump, > but none of them was accepted by TCP stack, for some reason. > > Try to disable TCP timestamps, it will give some hint if bug does not > reproduce.
This was spot on, after disabling tcp timestamps I cannot reproduce the hang anymore. I've had another look at the original sequence (as seen by the server) and I don't see much wrong; tell me what I missed: - the replayed packet has seq 32004:33378, so the first ignored ack would be the one with ack 33378, is that right? (meaning the server did accept the one for 32004 and none after that) Assuming it is, excerpt from around then (first emission of that packet then client replies): 16:49:26.700531 IP <server>.13317 > <client>.31872: Flags [.], seq 32004:33378, ack 4190, win 307, options [nop,nop,TS val 1313937607 ecr 1617129440], length 1374 ... 16:49:26.728084 IP <client>.31872 > <server>.13317: Flags [.], ack 32004, win 759, options [nop,nop,TS val 1617129473 ecr 1313937602], length 0 ... 16:49:26.729531 IP <client>.31872 > <server>.13317: Flags [.], ack 33378, win 782, options [nop,nop,TS val 1617129475 ecr 1313937607], length 0 16:49:26.730002 IP <client>.31872 > <server>.13317: Flags [.], ack 34752, win 805, options [nop,nop,TS val 1617129475 ecr 1313937607], length 0 ... 16:49:26.731634 IP <client>.31872 > <server>.13317: Flags [.], ack 36126, win 827, options [nop,nop,TS val 1617129476 ecr 1313937607], length 0 - the ecr value matches the val of the packet it acks - the val is >= that of previous packet (won't be considered reorder/should pass paws check?) - even if the packets are processed in parallel and some kind of race occurs, a "bigger" ack should ack all the previous packets, right? - Just to make sure, I checked /proc/net/netstat for PAWSEstab but that is 0: TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed EmbryonicRsts PruneCalled RcvPruned OfoPruned OutOfWindowIcmps LockDroppedIcmps ArpFilter TW TWRecycled TWKilled PAWSActive PAWSEstab DelayedACKs DelayedACKLocked DelayedACKLost ListenOverflows ListenDrops TCPHPHits TCPPureAcks TCPHPAcks TCPRenoRecovery TCPSackRecovery TCPSACKReneging TCPFACKReorder TCPSACKReorder TCPRenoReorder TCPTSReorder TCPFullUndo TCPPartialUndo TCPDSACKUndo TCPLossUndo TCPLostRetransmit TCPRenoFailures TCPSackFailures TCPLossFailures TCPFastRetrans TCPSlowStartRetrans TCPTimeouts TCPLossProbes TCPLossProbeRecovery TCPRenoRecoveryFail TCPSackRecoveryFail TCPRcvCollapsed TCPDSACKOldSent TCPDSACKOfoSent TCPDSACKRecv TCPDSACKOfoRecv TCPAbortOnData TCPAbortOnClose TCPAbortOnMemory TCPAbortOnTimeout TCPAbortOnLinger TCPAbortFailed TCPMemoryPressures TCPMemoryPressuresChrono TCPSACKDiscard TCPDSACKIgnoredOld TCPDSACKIgnoredNoUndo TCPSpuriousRTOs TCPMD5NotFound TCPMD5Unexpected TCPMD5Failure TCPSackShifted TCPSackMerged TCPSackShiftFallback TCPBacklogDrop PFMemallocDrop TCPMinTTLDrop TCPDeferAcceptDrop IPReversePathFilter TCPTimeWaitOverflow TCPReqQFullDoCookies TCPReqQFullDrop TCPRetransFail TCPRcvCoalesce TCPOFOQueue TCPOFODrop TCPOFOMerge TCPChallengeACK TCPSYNChallenge TCPFastOpenActive TCPFastOpenActiveFail TCPFastOpenPassive TCPFastOpenPassiveFail TCPFastOpenListenOverflow TCPFastOpenCookieReqd TCPFastOpenBlackhole TCPSpuriousRtxHostQueues BusyPollRxPackets TCPAutoCorking TCPFromZeroWindowAdv TCPToZeroWindowAdv TCPWantZeroWindowAdv TCPSynRetrans TCPOrigDataSent TCPHystartTrainDetect TCPHystartTrainCwnd TCPHystartDelayDetect TCPHystartDelayCwnd TCPACKSkippedSynRecv TCPACKSkippedPAWS TCPACKSkippedSeq TCPACKSkippedFinWait2 TCPACKSkippedTimeWait TCPACKSkippedChallenge TCPWinProbe TCPKeepAlive TCPMTUPFail TCPMTUPSuccess TcpExt: 0 0 0 0 58 0 0 26 0 0 50 0 0 0 0 75402 17 201 0 0 6876848 59804 2258387 0 33 0 0 3 0 0 0 0 1 102 15 0 0 0 1306 60 386 292 10 0 0 108750 201 1 228 1 8 4 0 63 0 3 0 0 0 0 107 1 0 0 0 2834 1962 622 0 0 0 0 0 0 0 0 0 1065022 54160 0 1 3 3 0 0 0 0 0 0 0 475 0 9578 6 8 71 257 5116325 0 0 0 0 0 0 6 0 0 0 61 85 0 0 IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts IpExt: 0 0 0 0 206 0 16405602866 8921427728 0 0 57928 0 0 16121410 0 5388 0 Also, here are the per-socket stats I could find (ss -i after having reproduced hang): reno wscale:7,7 rto:7456 backoff:5 rtt:32.924/1.41 ato:40 mss:1374 pmtu:1500 rcvmss:1248 advmss:1448 cwnd:1 ssthresh:16 bytes_acked:32004 bytes_received:4189 segs_out:85 segs_in:54 data_segs_out:78 data_segs_in:18 send 333.9Kbps lastsnd:3912 lastrcv:11464 lastack:11387 pacing_rate 21.4Mbps delivery_rate 3.5Mbps busy:12188ms unacked:33 retrans:1/5 lost:33 rcv_rtt:37 rcv_space:29200 rcv_ssthresh:39184 notsent:28796 minrtt:24.986 Here are the same stats with tcp timestamp disabled (after running my reproducer, e.g. outputing a big chunk of text quickly): reno wscale:7,7 rto:228 rtt:27.267/1.423 ato:40 mss:1386 pmtu:1500 rcvmss:1248 advmss:1460 cwnd:10 ssthresh:12 bytes_acked:17311070 bytes_received:40445 segs_out:13331 segs_in:7523 data_segs_out:13279 data_segs_in:947 send 4.1Mbps lastsnd:5 lastrcv:5 lastack:5 pacing_rate 4.9Mbps delivery_rate 431.4Kbps app_limited busy:36064ms unacked:1 retrans:0/6 rcv_rtt:9112.95 rcv_space:29233 rcv_ssthresh:39184 minrtt:25.566 So I guess lost:33 matching unacked:33 might be another hint? I'll need a bit more time reading the code to understand what this all implies ; feel free to beat me to it. Thanks, -- Dominique Martinet | Asmadeus