On Mon, 2015-05-25 at 13:41 -0700, Eric Dumazet wrote:
> On Mon, 2015-05-25 at 15:21 -0400, John A. Sullivan III wrote:
> >
> > Thanks, Eric. I really appreciate the help. This is a problem holding up
> > a very high profile, major project and, for the life of me, I can't
> > figure out why my TCP window size is reduced inside the GRE tunnel.
> >
> > Here is the netem setup although we are using this merely to reproduce
> > what we are seeing in production. We see the same results bare metal to
> > bare metal across the Internet.
> >
> > qdisc prio 10: root refcnt 17 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> >  Sent 32578077286 bytes 56349187 pkt (dropped 15361, overlimits 0 requeues 61323)
> >  backlog 0b 1p requeues 61323
> > qdisc netem 101: parent 10:1 limit 1000 delay 40.0ms
> >  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> >  backlog 0b 0p requeues 0
> > qdisc netem 102: parent 10:2 limit 1000 delay 40.0ms
> >  Sent 32434562015 bytes 54180984 pkt (dropped 15361, overlimits 0 requeues 0)
> >  backlog 0b 1p requeues 0
> > qdisc netem 103: parent 10:3 limit 1000 delay 40.0ms
> >  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> >  backlog 0b 0p requeues 0
> >
> > root@router-001:~# tc -s qdisc show dev eth2
> > qdisc prio 2: root refcnt 17 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> >  Sent 296515482689 bytes 217794609 pkt (dropped 11719, overlimits 0 requeues 5307)
> >  backlog 0b 2p requeues 5307
> > qdisc netem 21: parent 2:1 limit 1000 delay 40.0ms
> >  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> >  backlog 0b 0p requeues 0
> > qdisc netem 22: parent 2:2 limit 1000 delay 40.0ms
> >  Sent 289364020190 bytes 212892539 pkt (dropped 11719, overlimits 0 requeues 0)
> >  backlog 0b 2p requeues 0
> > qdisc netem 23: parent 2:3 limit 1000 delay 40.0ms
> >  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> >  backlog 0b 0p requeues 0
> >
> > I'm not sure how helpful these stats are as we did set this router up
> > for packet loss at one point. We did suspect netem at some point and
> > did things like change the limit but that had no effect.
>
> 80 ms at 1Gbps -> you need to hold about 6666 packets in your netem
> qdisc, not 1000.
>
> tc qdisc ... netem ... limit 8000 ...
>
> (I see you added 40ms both ways, so you need 3333 packets in forward,
> and 1666 packets for the ACK packets)
>
> I tried a netem 80ms here and got following with default settings (no
> change in send/receive windows)
>
> lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 10.7.8.152 -Cc -t OMNI -l 20
> OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.7.8.152 () port 0 AF_INET
> tcpi_rto 281000 tcpi_ato 0 tcpi_pmtu 1476 tcpi_rcv_ssthresh 28720
> tcpi_rtt 80431 tcpi_rttvar 304 tcpi_snd_ssthresh 2147483647 tpci_snd_cwnd 2215
> tcpi_reordering 3 tcpi_total_retrans 0
> Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
> Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
> Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
> Final       Final                                             %     Method %      Method
> 4194304     6291456     16384  20.17   149.54     10^6bits/s  0.40  S      0.78   S      10.467  20.554  usec/KB
>
> Now with 16MB I got :
>

Hmm . . .
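Just to make sure I follow the arithmetic, here is my own back-of-the-envelope check (assuming 1 Gbit/s, 1500-byte full-size segments, and the 40ms we add in each direction; these numbers are mine, not from Eric's mail):

RATE=1000000000            # bits per second
RTT_MS=80                  # 40ms out + 40ms back
MTU=1500                   # bytes in a full-size data segment
echo $(( RATE * RTT_MS / 1000 / (MTU * 8) ))   # ~6666 packets in flight over the full RTT
echo $(( RATE * 40 / 1000 / (MTU * 8) ))       # ~3333 data packets held by the forward-path netem
# The return path carries roughly one delayed ACK per two segments, which is
# where the ~1666 figure comes from; "limit 8000" just adds headroom on top.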
I did:

tc qdisc replace dev eth0 parent 10:1 handle 101: netem delay 40ms limit 8000
tc qdisc replace dev eth0 parent 10:2 handle 102: netem delay 40ms limit 8000
tc qdisc replace dev eth0 parent 10:3 handle 103: netem delay 40ms limit 8000
tc qdisc replace dev eth2 parent 2:1 handle 21: netem delay 40ms limit 8000
tc qdisc replace dev eth2 parent 2:2 handle 22: netem delay 40ms limit 8000
tc qdisc replace dev eth2 parent 2:3 handle 23: netem delay 40ms limit 8000
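(And, as a quick sanity check that the replace took, I simply re-read the qdiscs on the same devices and handles as above; each netem line should now show limit 8000:)

tc qdisc show dev eth0 | grep netem    # expect "limit 8000 delay 40.0ms" on 101:, 102:, 103:
tc qdisc show dev eth2 | grep netem    # likewise for 21:, 22:, 23: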
The gateway to gateway performance was still abysmal:

root@gwhq-1:~# nuttcp -T 60 -i 10 192.168.126.1
   19.8750 MB /  10.00 sec =   16.6722 Mbps     0 retrans
   23.2500 MB /  10.00 sec =   19.5035 Mbps     0 retrans
   23.3125 MB /  10.00 sec =   19.5559 Mbps     0 retrans
   23.3750 MB /  10.00 sec =   19.6084 Mbps     0 retrans
   23.2500 MB /  10.00 sec =   19.5035 Mbps     0 retrans
   23.3125 MB /  10.00 sec =   19.5560 Mbps     0 retrans
  136.4375 MB /  60.13 sec =   19.0353 Mbps 0 %TX 0 %RX 0 retrans 80.25 msRTT

But the end to end was near wire speed!

rita@vserver-002:~$ nuttcp -T 60 -i 10 192.168.8.20
  518.9375 MB /  10.00 sec =  435.3154 Mbps     0 retrans
  979.6875 MB /  10.00 sec =  821.8186 Mbps     0 retrans
  979.2500 MB /  10.00 sec =  821.4541 Mbps     0 retrans
  979.7500 MB /  10.00 sec =  821.8782 Mbps     0 retrans
  979.7500 MB /  10.00 sec =  821.8735 Mbps     0 retrans
  979.8750 MB /  10.00 sec =  821.9784 Mbps     0 retrans
 5419.8750 MB /  60.11 sec =  756.3881 Mbps 7 %TX 10 %RX 0 retrans 80.58 msRTT

I'm still downloading the trace to see what the window size is, but this raises the interesting question of what would reproduce this in a non-netem environment. I'm guessing the netem limit being too small would simply drop packets, so we would be seeing the symptoms of upper-layer retransmissions.

Hmm . . . but an even more interesting question - why did this only affect GRE traffic? If the netem buffer was being overrun, shouldn't this have affected both results, tunneled and untunneled?

Thanks - John
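P.S. For what it's worth, this is roughly how I intend to read the trace once it finishes downloading (gw-trace.pcap is just a placeholder name for the capture file):

# Window scale is only negotiated in the SYN/SYN-ACK, so pull the handshakes
# first and compare win/wscale on the tunneled flow against an untunneled one.
tcpdump -nr gw-trace.pcap 'tcp[tcpflags] & tcp-syn != 0'
# If the capture was taken on the physical NIC rather than the tunnel device,
# the GRE-encapsulated leg can be isolated with IP protocol 47:
tcpdump -nr gw-trace.pcap 'ip proto 47' | head -50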