Hello.
I have been able to track the poor forwarding latency down to the TCP
window scale options. The Netgem device uses rather large window scale
options (x256), and I have been able to reproduce the router's poor
forwarding latency with a Linux box in the internal network as well, by
setting net.ipv4.tcp_rmem to a large value and thus making it advertise
larger TCP window scale options. I still have no clue why this causes
forwarding in the Linux kernel to block. Maybe something in connection
tracking...?
Using ICMP timestamp messages, I have also been able to pinpoint that
the latency occurs on the eth1 sending side (the following hping3
example is run on the router towards the internal network):
xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp-ts -V
using eth1, addr: 192.168.0.1, MTU: 1500
HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 headers + 0 data bytes
len=46 ip=192.168.0.112 ttl=64 id=49464 tos=0 iplen=40
icmp_seq=0 rtt=7.9 ms
ICMP timestamp: Originate=52294891 Receive=52294895 Transmit=52294895
ICMP timestamp RTT tsrtt=7
len=46 ip=192.168.0.112 ttl=64 id=49795 tos=0 iplen=40
icmp_seq=1 rtt=235.9 ms
ICMP timestamp: Originate=52295891 Receive=52296128 Transmit=52296128
ICMP timestamp RTT tsrtt=235
len=46 ip=192.168.0.112 ttl=64 id=49941 tos=0 iplen=40
icmp_seq=2 rtt=3.8 ms
ICMP timestamp: Originate=52296891 Receive=52296895 Transmit=52296895
ICMP timestamp RTT tsrtt=3
len=46 ip=192.168.0.112 ttl=64 id=50685 tos=0 iplen=40
icmp_seq=3 rtt=47.8 ms
ICMP timestamp: Originate=52297891 Receive=52297940 Transmit=52297940
ICMP timestamp RTT tsrtt=47
len=46 ip=192.168.0.112 ttl=64 id=51266 tos=0 iplen=40
icmp_seq=4 rtt=7.7 ms
ICMP timestamp: Originate=52298891 Receive=52298895 Transmit=52298895
ICMP timestamp RTT tsrtt=7
len=46 ip=192.168.0.112 ttl=64 id=52245 tos=0 iplen=40
icmp_seq=5 rtt=3.7 ms
ICMP timestamp: Originate=52299891 Receive=52299895 Transmit=52299895
ICMP timestamp RTT tsrtt=3
^C
--- 192.168.0.112 hping statistic ---
6 packets tramitted, 6 packets received, 0% packet loss
round-trip min/avg/max = 3.7/51.1/235.9 ms
BR.
Risto
On 2.12.2018 23:32, Risto Pajula wrote:
Hello.
You can most likely ignore the "DF Bit, mtu bug when forwarding" case.
There aren't actually big IP packets on the wire; instead there are
bursts of packets on the wire, which are combined by GRO (generic
receive offload). So dropping them should not happen. Sorry about the
invalid bug report.
However, the poor latency from the internal network to the internet
still remains, with GRO both enabled and disabled. I will try to study
it further...
BR.
Risto
On 2.12.2018 14:01, Risto Pajula wrote:
Hello.
I have encountered a weird performance problem in Linux IP
fragmentation when using video streaming services behind NAT.
I have also studied a possible bug in the handling of the DF (Don't
Fragment) bit when forwarding IP packets.
First, a description of the system setup:
[host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre router]-internet
where:
host1: a Netgem N7800 "cable box" for online video streaming
services provided by the local telco (can access Netflix, HBO Nordic,
"live TV", etc.)
linux router: Linux computer with a dual-core Intel Celeron G1840,
currently running Linux kernel 4.20.0-rc2 and openSUSE Leap 15.0
eth1: the Linux router's internal (NAT) interface, 192.168.0.1/24
network, MTU set to 1500, RTL8169sb/8110sb
eth0: the Linux router's internet-facing interface, public IP address,
MTU set to 1500, RTL8168evl/8111evl
fibre router: Alcatel Lucent fibre router (I-241G-Q), directly
connected to eth0 of the Linux router.
When the Netgem N7800 is used with online video services (Netflix,
HBO Nordic, etc.), the Linux router receives very BIG IP packets on
eth0, up to ~20 kB. This seems to lead to the following problems in
the Linux IP stack.
IP fragmentation performance:
When the Linux router receives these large IP packets on eth0,
everything works. However, they seem to cause a very large latency
degradation from the internal network to the internet when IP
fragmentation is performed. The ping latency from the internal network
to the internet increases from a stable 15-20 ms up to 700-800 ms, AND
so does the ping from the internal network to the Linux router's eth1
(192.168.0.). The uplink, on the other hand, works perfectly: the ping
from the Linux router to the internet stays stable while the online
services are streaming. It seems that IP fragmentation is somehow
blocking eth1 reception or transmission for a very long time (which it
shouldn't). I'm able to test and debug the issue further, but advice on
where to look would be appreciated.
DF Bit, mtu bug when forwarding:
I started studying the above-mentioned problem and found a possible
bug in the DF bit and MTU handling in IP forwarding. The BIG packets
received from the streaming services all have the DF bit set, so the
question is: should we be forwarding them at all, given that this
would result in them being fragmented? Apparently we currently are...
I have traced this down to the function ip_exceeds_mtu() in
ip_forward.c, and the following patch seems to fix it.
--- net/ipv4/ip_forward.c.orig 2018-12-02 11:09:32.764320780 +0200
+++ net/ipv4/ip_forward.c 2018-12-02 12:53:25.031232347 +0200
@@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct
 		return false;
 
 	/* original fragment exceeds mtu and DF is set */
-	if (unlikely(IPCB(skb)->frag_max_size > mtu))
+	if (unlikely(skb->len > mtu))
 		return true;
 
 	if (skb->ignore_df)
This seems to work (in some ways): after the change, IP packets that
are too large for the internal network get dropped, and we send
"ICMP Destination unreachable, The datagram is too big" messages to
the originator (as we should?). However, it seems that not all
services really like this... Netflix behaves as expected and the ping
from the internal network to the internet stays stable, but HBO Nordic,
for example, does not work anymore (too little buffering?
Retransmissions not working?). So it seems the original issue should
also be fixed (and fragmentation should be allowed?).
Any advice would be appreciated. Thanks!
PS. Watching TV was not this intensive 20 years ago :)