On Thu, 13 Dec 2018 01:20:48 +0200 Risto Pajula <or.paj...@gmail.com> wrote:
> Hello. > > I'm not able reproduce the actual problem anymore which was the high > ping latency from the internal network. > > This starts to sound like some sort of voodoo, but... > > I tested replacing the switch to another brand where internal network > trtl8169 is connected. This did not have any effect on the behavior, the > latency remained high. > > Then I disconnected every device and all the network cables from the > internal network. The I connected everything again. After that > reproducing the problem seems very difficult. Likelihood for this to > happen dramatically decreased. > > Looking at pings 30min I once got this. > > Reply from 192.168.0.1: bytes=32 time<1ms TTL=64 > Reply from 192.168.0.1: bytes=32 time<1ms TTL=64 > Reply from 192.168.0.1: bytes=32 time<1ms TTL=64 > Reply from 192.168.0.1: bytes=32 time=18ms TTL=64 > Reply from 192.168.0.1: bytes=32 time=24ms TTL=64 > Reply from 192.168.0.1: bytes=32 time<1ms TTL=64 > Reply from 192.168.0.1: bytes=32 time<1ms TTL=64 > > I did also look at the kernel traces, it seems the rtl8169 send FIFO > still stalls, but it does not happen as often and in practice is hard to > notice for the user. > > For example I did find this kind sequence: > > t0: rtl8169_start_xmit first 1506 byte frame is written > ...9 more frames is transmitted > t0+350us: 2 frames is still not sent. (transmission should have taken > 120us + interframe time) > > BR. > Risto > > On 12.12.2018 8:23, Heiner Kallweit wrote: > > According to your description of the issue it doesn't need a very exotic > > scenario to trigger it. > > And due to the fact that Realtek network chips are used on a lot of > > consumer mainboards, I would > > assume quite some people are using such a mainboard for a use case like > > yours. > > This makes it somewhat suspicious that at least I'm not aware of more such > > reports. > > Therefore I think it's a good idea to check the other components of the > > internal network. > > > > On 12.12.2018 02:28, Risto Pajula wrote: > >> Hello. > >> > >> I added some debug prints to diagnose the bug properly. I can send the > >> patches if you are willing to debug/try... for example this output is > >> produced: > >> > >> 96096: <idle>-0 [000] ..s. 232.466703: rtl8169_start_xmit: > >> RTLDBG221 eth1 rtl8169_start_xmit len: 1506 opts1: B0000000 txpol: 0 > >> entry: 24 cur_tx: 5656 frags: 0 dirty_tx: 5648 tx_left: 8 > >> 96097: <idle>-0 [000] ..s. 232.466704: rtl8169_start_xmit: > >> RTLDBG111 eth1 rtl_tx_slots_avail > >> 96099: <idle>-0 [000] d.h. 232.468827: rtl8169_interrupt: > >> RTLDBG111 eth1 rtl8169_interrupt > >> 96103: <idle>-0 [000] ..s. 232.468833: rtl8169_poll: > >> RTLDBG111 eth1 rtl8169_poll > >> 96104: <idle>-0 [000] ..s. 232.468834: rtl8169_poll: > >> RTLDBG111 eth1 rtl_rx > >> 96105: <idle>-0 [000] ..s. 232.468835: rtl8169_poll: > >> RTLDBG111 eth1 rtl8169_try_rx_copy > >> 96184: <idle>-0 [000] ..s. 232.468914: rtl_tx: RTLDBG222 > >> eth1 rtl_tx len: 1506 opts1: B00005E2 txpol: 0 entry: 16 cur_tx: 5657 > >> frags: 0 dirty_tx: 5648 tx_left: 9 > >> 96185: <idle>-0 [000] ..s. 232.468924: rtl_tx: RTLDBG223 > >> eth1 rtl_tx entry opts1: 16, B00005E2 17, B00005E2 18, B00005E2 19, > >> B00005E2 20, B00005E2 21, B00005E2 22, B00005E2 23, B00005E2 24, B00005E2 > >> 96189: <idle>-0 [000] d.h. 232.469006: rtl8169_interrupt: > >> RTLDBG111 eth1 rtl8169_interrupt > >> 96193: <idle>-0 [000] ..s. 232.469012: rtl8169_poll: > >> RTLDBG111 eth1 rtl8169_poll > >> 96194: <idle>-0 [000] ..s. 232.469014: rtl8169_poll: > >> RTLDBG111 eth1 rtl_rx > >> 96196: <idle>-0 [000] ..s. 232.469020: rtl_tx: RTLDBG222 > >> eth1 rtl_tx len: 1506 opts1: 30000000 txpol: 0 entry: 16 cur_tx: 5657 > >> frags: 0 dirty_tx: 5648 tx_left: 9 > >> 96204: <idle>-0 [000] ..s. 232.469032: rtl_tx: RTLDBG222 > >> eth1 rtl_tx len: 1506 opts1: 30000000 txpol: 0 entry: 17 cur_tx: 5657 > >> frags: 0 dirty_tx: 5649 tx_left: 8 > >> 96212: <idle>-0 [000] ..s. 232.469041: rtl_tx: RTLDBG222 > >> eth1 rtl_tx len: 1506 opts1: B00005E2 txpol: 0 entry: 18 cur_tx: 5657 > >> frags: 0 dirty_tx: 5650 tx_left: 7 > >> 96213: <idle>-0 [000] ..s. 232.469050: rtl_tx: RTLDBG223 > >> eth1 rtl_tx entry opts1: 18, B00005E2 19, B00005E2 20, B00005E2 21, > >> B00005E2 22, B00005E2 23, B00005E2 24, B00005E2 > >> 96235: <idle>-0 [000] d.h. 232.471253: rtl8169_interrupt: > >> RTLDBG111 eth1 rtl8169_interrupt > >> > >> Here we can clearly see that TX Fifo descriptors are properly setup and > >> send is initiated, but then the rtl8169 stops sending after few frames. I > >> think the bug is in the HW side (or in the rtl8169 firmware). Some PCI > >> settings might affect this, but then again shouldn't the rtl8169 then > >> report PCI error interrupt... > >> > >> Also this TX Fifo stalling happens with any traffic and no > >> netflix/netgem/large tcp windows are required to trigger it. What the > >> netgem/netflix does is that it causes this bug to be visible by probably > >> due to timing of the traffic pattern, and thus causes also upper layers of > >> the network stack to take actions.. > >> > >> RTL8168evl indeed is a onboard Ethernet on a reasonable new mothorboard > >> and the RTL8169sb is and old PCI card. But, the same issue happens with > >> both ethernet adapters (I switched the interfaces). > >> > >> But then again - this got me thinking... If the issue (send fifo stalling) > >> does not happen in the internet facing interface even with heavy upload > >> traffic, then the problem must be in the internal network itself. Tomorrow > >> I will change the Ethernet switch. There could be some sort of > >> compatibility problem? > >> > >> BR. > >> Risto > >> > >> > >> On 11.12.2018 21:51, Heiner Kallweit wrote: > >>> OK, then another idea .. At the very beginning of the mail thread it > >>> was stated that the router has to network ports: > >>> linux router: Linux computer with Dualcore Intel Celeron G1840, running > >>> currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0 > >>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, mtu > >>> set to 1500, RTL8169sb/8110sb > >>> eth0: Linux Routers internet facing interface, public ip address, mtu set > >>> to 1500, RTL8168evl/8111evl > >>> > >>> r8169 supports about 50 members of the RTL8169/RTL8168/RTL8101 family > >>> and more or less every single member needs its own quirks. > >>> RTL8168evl (PCIe) is somewhat recent, RTL8169sb (PCI) is ancient. > >>> To rule out you triggered some hardware issue: Could you switch both > >>> interfaces and check whether you see a change in system behavior? > >>> > >>> On 11.12.2018 18:01, Risto Pajula wrote: > >>>> Hello. > >>>> > >>>> A freshly built 4.20.0-rc6-next-20181210-lp150.12.25-default waited me > >>>> when I got back from work, but unfortunately it did not help at all, it > >>>> behaved exactly in same manner. > >>>> > >>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64 > >>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64 > >>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64 > >>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64 > >>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64 > >>>> Reply from 192.168.0.1: bytes=32 time=73ms TTL=64 > >>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64 > >>>> Reply from 192.168.0.1: bytes=32 time=83ms TTL=64 > >>>> Reply from 192.168.0.1: bytes=32 time=307ms TTL=64 > >>>> Reply from 192.168.0.1: bytes=32 time=115ms TTL=64 > >>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64 > >>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64 > >>>> > >>>> BR. > >>>> Risto > >>>> > >>>> > >>>> On 11.12.2018 0:20, Risto Pajula wrote: > >>>>> Hello. > >>>>> > >>>>> I have not yet tested with linux-next but I will, thanks for pointing > >>>>> that out. > >>>>> > >>>>> ...But I have studied the problem a bit more, indeed it seems that the > >>>>> rtl8169 transmission queue gets stuck. > >>>>> > >>>>> Below is some trace log. Starting from 802026 a burst of frames is > >>>>> forwarded from eth0 to eth1 and when we get to the rtl_tx only two > >>>>> frames have been transmitted.. (Should have had 1,25 ms to transmit > >>>>> them).. Also then the sequence seems to repeat for very long time that > >>>>> only two massages get transmitted. > >>>>> > >>>>> 800541: <idle>-0 [001] ..s. 7237.698143: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 34 > >>>>> cur_tx: 4406946 frags: 0 dirty_tx: 4406946 tx_left: 0 > >>>>> 800671: <idle>-0 [001] ..s. 7237.698189: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 35 > >>>>> cur_tx: 4406947 frags: 0 dirty_tx: 4406946 tx_left: 1 > >>>>> 800835: <idle>-0 [001] ..s. 7237.698235: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 36 > >>>>> cur_tx: 4406948 frags: 0 dirty_tx: 4406946 tx_left: 2 > >>>>> 800905: <idle>-0 [000] ..s. 7237.698255: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 > >>>>> dirty_tx: 4406946 tx_left: 3 > >>>>> 800932: <idle>-0 [000] ..s. 7237.698263: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 > >>>>> dirty_tx: 4406947 tx_left: 2 > >>>>> 800960: <idle>-0 [000] ..s. 7237.698270: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 > >>>>> dirty_tx: 4406948 tx_left: 1 > >>>>> 800998: <idle>-0 [001] ..s. 7237.698282: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 37 > >>>>> cur_tx: 4406949 frags: 0 dirty_tx: 4406949 tx_left: 0 > >>>>> 801085: <idle>-0 [000] ..s. 7237.698312: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 38 cur_tx: 4406950 frags: 0 > >>>>> dirty_tx: 4406949 tx_left: 1 > >>>>> 801137: <idle>-0 [001] ..s. 7237.698329: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 38 > >>>>> cur_tx: 4406950 frags: 0 dirty_tx: 4406950 tx_left: 0 > >>>>> 801229: <idle>-0 [000] ..s. 7237.698363: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 39 cur_tx: 4406951 frags: 0 > >>>>> dirty_tx: 4406950 tx_left: 1 > >>>>> 801264: <idle>-0 [001] ..s. 7237.698375: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 39 > >>>>> cur_tx: 4406951 frags: 0 dirty_tx: 4406951 tx_left: 0 > >>>>> 801356: <idle>-0 [000] ..s. 7237.698409: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 40 cur_tx: 4406952 frags: 0 > >>>>> dirty_tx: 4406951 tx_left: 1 > >>>>> 801391: <idle>-0 [001] ..s. 7237.698421: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 40 > >>>>> cur_tx: 4406952 frags: 0 dirty_tx: 4406952 tx_left: 0 > >>>>> 801482: <idle>-0 [000] ..s. 7237.698455: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 41 cur_tx: 4406953 frags: 0 > >>>>> dirty_tx: 4406952 tx_left: 1 > >>>>> 801518: <idle>-0 [001] ..s. 7237.698468: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 41 > >>>>> cur_tx: 4406953 frags: 0 dirty_tx: 4406953 tx_left: 0 > >>>>> 801610: <idle>-0 [000] ..s. 7237.698502: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 42 cur_tx: 4406954 frags: 0 > >>>>> dirty_tx: 4406953 tx_left: 1 > >>>>> 801645: <idle>-0 [001] ..s. 7237.698514: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 42 > >>>>> cur_tx: 4406954 frags: 0 dirty_tx: 4406954 tx_left: 0 > >>>>> 801737: <idle>-0 [000] ..s. 7237.698548: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 43 cur_tx: 4406955 frags: 0 > >>>>> dirty_tx: 4406954 tx_left: 1 > >>>>> 801772: <idle>-0 [001] ..s. 7237.698560: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 43 > >>>>> cur_tx: 4406955 frags: 0 dirty_tx: 4406955 tx_left: 0 > >>>>> 801864: <idle>-0 [000] ..s. 7237.698594: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 44 cur_tx: 4406956 frags: 0 > >>>>> dirty_tx: 4406955 tx_left: 1 > >>>>> 801899: <idle>-0 [001] ..s. 7237.698607: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 44 > >>>>> cur_tx: 4406956 frags: 0 dirty_tx: 4406956 tx_left: 0 > >>>>> 801984: <idle>-0 [000] ..s. 7237.698640: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 45 cur_tx: 4406957 frags: 0 > >>>>> dirty_tx: 4406956 tx_left: 1 > >>>>> 802026: <idle>-0 [001] ..s. 7237.698655: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 45 > >>>>> cur_tx: 4406957 frags: 0 dirty_tx: 4406957 tx_left: 0 > >>>>> 802129: <idle>-0 [001] ..s. 7237.698701: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 46 > >>>>> cur_tx: 4406958 frags: 0 dirty_tx: 4406957 tx_left: 1 > >>>>> 802232: <idle>-0 [001] ..s. 7237.698747: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 47 > >>>>> cur_tx: 4406959 frags: 0 dirty_tx: 4406957 tx_left: 2 > >>>>> 802335: <idle>-0 [001] ..s. 7237.698793: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 48 > >>>>> cur_tx: 4406960 frags: 0 dirty_tx: 4406957 tx_left: 3 > >>>>> 802438: <idle>-0 [001] ..s. 7237.698840: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 49 > >>>>> cur_tx: 4406961 frags: 0 dirty_tx: 4406957 tx_left: 4 > >>>>> 802541: <idle>-0 [001] ..s. 7237.698885: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 50 > >>>>> cur_tx: 4406962 frags: 0 dirty_tx: 4406957 tx_left: 5 > >>>>> 802644: <idle>-0 [001] ..s. 7237.698932: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 51 > >>>>> cur_tx: 4406963 frags: 0 dirty_tx: 4406957 tx_left: 6 > >>>>> 802747: <idle>-0 [001] ..s. 7237.698978: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 52 > >>>>> cur_tx: 4406964 frags: 0 dirty_tx: 4406957 tx_left: 7 > >>>>> 802851: <idle>-0 [001] ..s. 7237.699025: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 53 > >>>>> cur_tx: 4406965 frags: 0 dirty_tx: 4406957 tx_left: 8 > >>>>> 805094: <idle>-0 [000] ..s. 7237.699871: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 > >>>>> dirty_tx: 4406957 tx_left: 9 > >>>>> 805126: <idle>-0 [000] ..s. 7237.699878: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 > >>>>> dirty_tx: 4406958 tx_left: 8 > >>>>> 805197: <idle>-0 [001] ..s. 7237.699905: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 54 > >>>>> cur_tx: 4406966 frags: 0 dirty_tx: 4406959 tx_left: 7 > >>>>> 805234: <idle>-0 [001] ..s. 7237.699925: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 55 > >>>>> cur_tx: 4406967 frags: 0 dirty_tx: 4406959 tx_left: 8 > >>>>> 811446: <idle>-0 [000] ..s. 7237.702344: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 > >>>>> dirty_tx: 4406959 tx_left: 9 > >>>>> 811479: <idle>-0 [000] ..s. 7237.702354: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 > >>>>> dirty_tx: 4406960 tx_left: 8 > >>>>> 811641: <idle>-0 [000] ..s. 7237.702395: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 56 > >>>>> cur_tx: 4406968 frags: 0 dirty_tx: 4406961 tx_left: 7 > >>>>> 811733: <idle>-0 [000] ..s. 7237.702416: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 57 > >>>>> cur_tx: 4406969 frags: 0 dirty_tx: 4406961 tx_left: 8 > >>>>> 812050: hping3-2871 [000] ..s. 7237.704770: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 > >>>>> dirty_tx: 4406961 tx_left: 9 > >>>>> 812061: hping3-2871 [000] ..s. 7237.704777: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 > >>>>> dirty_tx: 4406962 tx_left: 8 > >>>>> 812124: hping3-2871 [000] ..s. 7237.704811: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 58 > >>>>> cur_tx: 4406970 frags: 0 dirty_tx: 4406963 tx_left: 7 > >>>>> 812176: hping3-2871 [000] ..s. 7237.704831: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 59 > >>>>> cur_tx: 4406971 frags: 0 dirty_tx: 4406963 tx_left: 8 > >>>>> 812300: <idle>-0 [000] ..s. 7237.707247: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 > >>>>> dirty_tx: 4406963 tx_left: 9 > >>>>> 812312: <idle>-0 [000] ..s. 7237.707257: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 > >>>>> dirty_tx: 4406964 tx_left: 8 > >>>>> 812389: <idle>-0 [000] ..s. 7237.707297: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 60 > >>>>> cur_tx: 4406972 frags: 0 dirty_tx: 4406965 tx_left: 7 > >>>>> 812426: <idle>-0 [000] ..s. 7237.707318: > >>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 61 > >>>>> cur_tx: 4406973 frags: 0 dirty_tx: 4406965 tx_left: 8 > >>>>> 812827: <idle>-0 [000] ..s. 7237.709690: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 > >>>>> dirty_tx: 4406965 tx_left: 9 > >>>>> 812838: <idle>-0 [000] ..s. 7237.709700: rtl8169_poll: > >>>>> ORP33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 > >>>>> dirty_tx: 4406966 tx_left: 8 > >>>>> ..... > >>>>> > >>>>> > >>>>> BR. > >>>>> Risto > >>>>> > >>>>> On 10.12.2018 23:26, Heiner Kallweit wrote: > >>>>>> Did you test also with the latest linux-next kernel? Some recent > >>>>>> changes like 2e6eedb4813e > >>>>>> "r8169: make use of xmit_more and __netdev_sent_queue" may have a > >>>>>> positive impact. > >>>>>> > >>>>>> On 10.12.2018 00:28, Risto Pajula wrote: > >>>>>>> Hello. > >>>>>>> > >>>>>>> Old subject: "Re: IP fragmentation performance and don't fragment bug > >>>>>>> when forwarding > >>>>>>> > >>>>>>> > >>>>>>> I have now been tracing the kernel and finding the bug seems > >>>>>>> difficult. I think the bug is combination of several things, likely > >>>>>>> cause is that it only occurs with rtl8169 and how it is using the > >>>>>>> netdevapi/DQL api. > >>>>>>> > >>>>>>> From my investigations seems that following happens: > >>>>>>> > >>>>>>> *Burst of frames is received from internet. (eth0) > >>>>>>> *These are accepted and forwarded to the (eth1) > >>>>>>> *DQL hits the limit, this causes the scheduling for the device to be > >>>>>>> stopped (__QUEUE_STATE_STACK_XOFF) > >>>>>>> *This and combination of some timing causes the performance > >>>>>>> degradation, messages are only scheduled for transmission in soft Irq > >>>>>>> context. > >>>>>>> > >>>>>>> I still do now know, why the DQL hits the limits.. I think it should > >>>>>>> not, maybe something undesirable first happens with rtl8169. > >>>>>>> > >>>>>>> I inserted following trace printk functions to the code.. > >>>>>>> > >>>>>>> rtl8169_start_xmit: > >>>>>>> trace_printk("ORP33 %s rtl8169_start_xmit len: %d entry: %u cur_tx: > >>>>>>> %u frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, skb->len, > >>>>>>> entry, tp->cur_tx, frags, tp->dirty_tx, (tp->cur_tx - tp->dirty_tx) ); > >>>>>>> > >>>>>>> > >>>>>>> rtl_tx: > >>>>>>> trace_printk("ORP33 %s rtl_tx len: %d entry: %u cur_tx: %u frags: %d > >>>>>>> dirty_tx: %u tx_left: %u\n" , tp->dev->name, tx_skb->skb->len, > >>>>>>> tp->cur_tx % NUM_TX_DESC, tp->cur_tx, 0, dirty_tx, tx_left ); > >>>>>>> > >>>>>>> > >>>>>>> Here is some grepped output only filtering the ping packet in two > >>>>>>> different situations: > >>>>>>> > >>>>>>> //trace when downloading ubuntu (~20MB/s), only 3 secs fit to buffer. > >>>>>>> ping is all the time normal.... > >>>>>>> //orig tracing.out3754 > >>>>>>> 117493: <idle>-0 [000] ..s. 5235.407116: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 57 cur_tx: 3747641 frags: 0 > >>>>>>> dirty_tx: 3747640 tx_left: 1 > >>>>>>> 118704: <idle>-0 [000] ..s. 5235.416057: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 20 cur_tx: 3747668 frags: 0 > >>>>>>> dirty_tx: 3747667 tx_left: 1 > >>>>>>> 346319: hping3-2871 [000] .... 5236.002456: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 8 > >>>>>>> cur_tx: 3754312 frags: 0 dirty_tx: 3754312 tx_left: 0 > >>>>>>> 346337: hping3-2871 [000] ..s. 5236.002475: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 9 cur_tx: 3754313 frags: 0 dirty_tx: > >>>>>>> 3754312 tx_left: 1 > >>>>>>> 1277155: hping3-2871 [001] .... 5237.002627: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 38 > >>>>>>> cur_tx: 3766502 frags: 0 dirty_tx: 3766502 tx_left: 0 > >>>>>>> 1277173: <idle>-0 [000] ..s. 5237.002646: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 39 cur_tx: 3766503 frags: 0 > >>>>>>> dirty_tx: 3766502 tx_left: 1 > >>>>>>> > >>>>>>> > >>>>>>> //imcp messages from route to intenal network when TCP stream is on. > >>>>>>> tracing.out3236/eth1_ring.txt is the original > >>>>>>> //no idea why the first frame is tripled... > >>>>>>> 200792: hping3-2871 [001] .... 3239.593384: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 34 > >>>>>>> cur_tx: 2132770 frags: 0 dirty_tx: 2132770 tx_left: 0 > >>>>>>> 201578: hping3-2871 [001] .... 3240.593625: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 40 > >>>>>>> cur_tx: 2132776 frags: 0 dirty_tx: 2132776 tx_left: 0 > >>>>>>> 341061: <idle>-0 [000] ..s. 3241.738011: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 52 > >>>>>>> cur_tx: 2133684 frags: 0 dirty_tx: 2133680 tx_left: 4 > >>>>>>> 342112: <idle>-0 [000] ..s. 3241.745297: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 59 cur_tx: 2133691 frags: 0 > >>>>>>> dirty_tx: 2133684 tx_left: 7 > >>>>>>> 500126: <idle>-0 [000] ..s. 3242.739500: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 49 > >>>>>>> cur_tx: 2134513 frags: 0 dirty_tx: 2134505 tx_left: 8 > >>>>>>> 501739: <idle>-0 [000] ..s. 3242.751654: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 58 cur_tx: 2134522 frags: 0 > >>>>>>> dirty_tx: 2134513 tx_left: 9 > >>>>>>> 646220: <idle>-0 [000] ..s. 3243.745761: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 41 > >>>>>>> cur_tx: 2135273 frags: 0 dirty_tx: 2135267 tx_left: 6 > >>>>>>> 647633: <idle>-0 [000] ..s. 3243.755486: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 49 cur_tx: 2135281 frags: 0 > >>>>>>> dirty_tx: 2135273 tx_left: 8 > >>>>>>> 802878: <idle>-0 [000] ..s. 3244.739947: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 > >>>>>>> cur_tx: 2136087 frags: 0 dirty_tx: 2136081 tx_left: 6 > >>>>>>> 804298: <idle>-0 [000] ..s. 3244.749677: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 31 cur_tx: 2136095 frags: 0 > >>>>>>> dirty_tx: 2136087 tx_left: 8 > >>>>>>> 961190: <idle>-0 [000] ..s. 3245.746217: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 16 > >>>>>>> cur_tx: 2136912 frags: 0 dirty_tx: 2136906 tx_left: 6 > >>>>>>> 962610: <idle>-0 [000] ..s. 3245.755946: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2136920 frags: 0 > >>>>>>> dirty_tx: 2136912 tx_left: 8 > >>>>>>> 1118044: <idle>-0 [000] ..s. 3246.740336: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 62 > >>>>>>> cur_tx: 2137726 frags: 0 dirty_tx: 2137720 tx_left: 6 > >>>>>>> 1119443: <idle>-0 [000] ..s. 3246.750047: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 5 cur_tx: 2137733 frags: 0 dirty_tx: > >>>>>>> 2137726 tx_left: 7 > >>>>>>> 1264047: <idle>-0 [000] ..s. 3247.709202: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 29 > >>>>>>> cur_tx: 2138525 frags: 0 dirty_tx: 2138518 tx_left: 7 > >>>>>>> 1264740: <idle>-0 [000] ..s. 3247.718918: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 33 cur_tx: 2138529 frags: 0 > >>>>>>> dirty_tx: 2138525 tx_left: 4 > >>>>>>> 1419958: <idle>-0 [000] ..s. 3248.736436: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 48 > >>>>>>> cur_tx: 2139312 frags: 0 dirty_tx: 2139305 tx_left: 7 > >>>>>>> 1421551: <idle>-0 [000] ..s. 3248.746189: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 56 cur_tx: 2139320 frags: 0 > >>>>>>> dirty_tx: 2139312 tx_left: 8 > >>>>>>> 1578746: <idle>-0 [000] ..s. 3249.742702: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 46 > >>>>>>> cur_tx: 2140142 frags: 0 dirty_tx: 2140135 tx_left: 7 > >>>>>>> 1580194: <idle>-0 [000] ..s. 3249.752457: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 54 cur_tx: 2140150 frags: 0 > >>>>>>> dirty_tx: 2140142 tx_left: 8 > >>>>>>> 1729597: <idle>-0 [000] ..s. 3250.746839: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 36 > >>>>>>> cur_tx: 2140964 frags: 0 dirty_tx: 2140957 tx_left: 7 > >>>>>>> 1731015: <idle>-0 [000] ..s. 3250.756594: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 43 cur_tx: 2140971 frags: 0 > >>>>>>> dirty_tx: 2140964 tx_left: 7 > >>>>>>> 1881028: <idle>-0 [000] ..s. 3251.740991: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 19 > >>>>>>> cur_tx: 2141779 frags: 0 dirty_tx: 2141772 tx_left: 7 > >>>>>>> 1881381: <idle>-0 [000] ..s. 3251.750718: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 26 cur_tx: 2141786 frags: 0 > >>>>>>> dirty_tx: 2141779 tx_left: 7 > >>>>>>> //something happens, the output queue empties.. now it is fast, and > >>>>>>> rtl8169_start_xmit is not called from soft_irq > >>>>>>> 1896178: hping3-2871 [001] .... 3252.595994: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 > >>>>>>> cur_tx: 2141911 frags: 0 dirty_tx: 2141911 tx_left: 0 > >>>>>>> 1896196: <idle>-0 [000] ..s. 3252.596055: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2141912 frags: 0 > >>>>>>> dirty_tx: 2141911 tx_left: 1 > >>>>>>> 1923136: hping3-2871 [001] .... 3253.596227: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 50 > >>>>>>> cur_tx: 2142066 frags: 0 dirty_tx: 2142066 tx_left: 0 > >>>>>>> 1923154: <idle>-0 [000] ..s. 3253.596289: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 51 cur_tx: 2142067 frags: 0 > >>>>>>> dirty_tx: 2142066 tx_left: 1 > >>>>>>> 1951633: hping3-2871 [001] .... 3254.596459: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 5 > >>>>>>> cur_tx: 2142213 frags: 0 dirty_tx: 2142213 tx_left: 0 > >>>>>>> 1951651: <idle>-0 [000] ..s. 3254.596521: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 6 cur_tx: 2142214 frags: 0 dirty_tx: > >>>>>>> 2142213 tx_left: 1 > >>>>>>> 1954472: hping3-2871 [001] .... 3255.596700: > >>>>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 15 > >>>>>>> cur_tx: 2142223 frags: 0 dirty_tx: 2142223 tx_left: 0 > >>>>>>> 1954490: <idle>-0 [000] ..s. 3255.596763: rtl8169_poll: > >>>>>>> ORP33 eth1 rtl_tx len: 54 entry: 16 cur_tx: 2142224 frags: 0 > >>>>>>> dirty_tx: 2142223 tx_left: 1 > >>>>>>> //until it soon, again is slow.. > >>>>>>> > >>>>>>> > >>>>>>> BR. > >>>>>>> Risto > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On 7.12.2018 16:46, Risto Pajula wrote: > >>>>>>>> Hello. > >>>>>>>> > >>>>>>>> I have been to track the poor forwarding latency to the TCP Window > >>>>>>>> scale options. The Netgem device uses rather large windows scale > >>>>>>>> options (x256) and I have been able to reproduce the routers poor > >>>>>>>> forwarding latency also with linux box running in the internal > >>>>>>>> network and changing the net.ipv4.tcp_rmem to a large value and thus > >>>>>>>> changing the TCP window scaling options to larger ones. I still do > >>>>>>>> not have clue why this causes the forwarfing in the linux kernel to > >>>>>>>> block? Maybe something in the connection tracking....? > >>>>>>>> > >>>>>>>> > >>>>>>>> With the ICMP timestamp messages I have been able to also pinpoint > >>>>>>>> that the latency is caused in the eth1 sending side (the following > >>>>>>>> hping3 example is run in the router toward the internal network... > >>>>>>>> > >>>>>>>> > >>>>>>>> xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp-ts > >>>>>>>> -V > >>>>>>>> using eth1, addr: 192.168.0.1, MTU: 1500 > >>>>>>>> HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 headers > >>>>>>>> + 0 data bytes > >>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=49464 tos=0 iplen=40 > >>>>>>>> icmp_seq=0 rtt=7.9 ms > >>>>>>>> ICMP timestamp: Originate=52294891 Receive=52294895 Transmit=52294895 > >>>>>>>> ICMP timestamp RTT tsrtt=7 > >>>>>>>> > >>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=49795 tos=0 iplen=40 > >>>>>>>> icmp_seq=1 rtt=235.9 ms > >>>>>>>> ICMP timestamp: Originate=52295891 Receive=52296128 Transmit=52296128 > >>>>>>>> ICMP timestamp RTT tsrtt=235 > >>>>>>>> > >>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=49941 tos=0 iplen=40 > >>>>>>>> icmp_seq=2 rtt=3.8 ms > >>>>>>>> ICMP timestamp: Originate=52296891 Receive=52296895 Transmit=52296895 > >>>>>>>> ICMP timestamp RTT tsrtt=3 > >>>>>>>> > >>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=50685 tos=0 iplen=40 > >>>>>>>> icmp_seq=3 rtt=47.8 ms > >>>>>>>> ICMP timestamp: Originate=52297891 Receive=52297940 Transmit=52297940 > >>>>>>>> ICMP timestamp RTT tsrtt=47 > >>>>>>>> > >>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=51266 tos=0 iplen=40 > >>>>>>>> icmp_seq=4 rtt=7.7 ms > >>>>>>>> ICMP timestamp: Originate=52298891 Receive=52298895 Transmit=52298895 > >>>>>>>> ICMP timestamp RTT tsrtt=7 > >>>>>>>> > >>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=52245 tos=0 iplen=40 > >>>>>>>> icmp_seq=5 rtt=3.7 ms > >>>>>>>> ICMP timestamp: Originate=52299891 Receive=52299895 Transmit=52299895 > >>>>>>>> ICMP timestamp RTT tsrtt=3 > >>>>>>>> > >>>>>>>> ^C > >>>>>>>> --- 192.168.0.112 hping statistic --- > >>>>>>>> 6 packets tramitted, 6 packets received, 0% packet loss > >>>>>>>> round-trip min/avg/max = 3.7/51.1/235.9 ms > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> BR. > >>>>>>>> Risto > >>>>>>>> > >>>>>>>> On 2.12.2018 23:32, Risto Pajula wrote: > >>>>>>>>> Hello. > >>>>>>>>> > >>>>>>>>> You can most likely ignore the "DF Bit, mtu bug when forwarding" > >>>>>>>>> case. There isn't actually big IP packets on the wire, instead > >>>>>>>>> there is burst of packets on the wire, which are combined by the > >>>>>>>>> GRO... And thus dropping them should not happen. Sorry about the > >>>>>>>>> invalid bug report. > >>>>>>>>> > >>>>>>>>> However the poor latency from intenal network to the internet still > >>>>>>>>> remain, both GRO enabled and disabled. I will try to study > >>>>>>>>> further... > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> BR. > >>>>>>>>> Risto > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 2.12.2018 14:01, Risto Pajula wrote: > >>>>>>>>>> Hello. > >>>>>>>>>> > >>>>>>>>>> I have encountered a weird performance problem in Linux IP > >>>>>>>>>> fragmentation when using video streaming services behind the NAT. > >>>>>>>>>> Also I have studied a possible bug in the DF bit (don't fragment) > >>>>>>>>>> handling when forwarding the IP packets. > >>>>>>>>>> > >>>>>>>>>> First the system setup description: > >>>>>>>>>> > >>>>>>>>>> [host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre > >>>>>>>>>> router]-internet > >>>>>>>>>> > >>>>>>>>>> where: > >>>>>>>>>> host1: is a Netgem N7800 "cable box" for online video streaming > >>>>>>>>>> services provided by local telco (Can access Netflix, HBO nordic, > >>>>>>>>>> "live TV", etc.) > >>>>>>>>>> linux router: Linux computer with Dualcore Intel Celeron G1840, > >>>>>>>>>> running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0 > >>>>>>>>>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 > >>>>>>>>>> network, mtu set to 1500, RTL8169sb/8110sb > >>>>>>>>>> eth0: Linux Routers internet facing interface, public ip address, > >>>>>>>>>> mtu set to 1500, RTL8168evl/8111evl > >>>>>>>>>> fibre router: Alcatel Lucent fibre router (I-241G-Q), directly > >>>>>>>>>> connected to the eth0 of the Linux router. > >>>>>>>>>> > >>>>>>>>>> And now when using the Netgem N7800 with online video services > >>>>>>>>>> (Netflix, HBO nordic, etc) the Linux router will receive very BIG > >>>>>>>>>> IP packets in the eth0 upto ~20kB, this seems to lead to the > >>>>>>>>>> following problems in the Linux IP stack. > >>>>>>>>>> > >>>>>>>>>> IP fragmentation performance: > >>>>>>>>>> When the Linux router receives these large IP packets in the eth0 > >>>>>>>>>> everything works, but it seems that them cause very large > >>>>>>>>>> performance degradation from internal network to the internet > >>>>>>>>>> regarding the latency when the IP fragmentation is performed. The > >>>>>>>>>> ping latency from internal network to the internel network > >>>>>>>>>> increases from stable 15ms-20ms up to 700-800ms AND also the ping > >>>>>>>>>> from the internal network to the linux router eth1 (192.168.0.). > >>>>>>>>>> However up link works perfectly, the ping is still stable when > >>>>>>>>>> streaming the online services (From linux router to the internet). > >>>>>>>>>> It seems that the IP fragmentation is somehow blocking the eth1 > >>>>>>>>>> reception or transmission for very long time (which it shouldn't). > >>>>>>>>>> I'm able to test and debug the issue further, but advice regarding > >>>>>>>>>> where to look would be appreciated. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> DF Bit, mtu bug when forwarding: > >>>>>>>>>> I have started to study the above mentioned problem and have found > >>>>>>>>>> a possible bug in the DF bit and mtu handling in IP forwarding. > >>>>>>>>>> The BIG packets received from streaming services all have the "DF > >>>>>>>>>> bit" set and the question is that should we be forwarding them at > >>>>>>>>>> all as that would result them being fragmented? Apparently we > >>>>>>>>>> currently are... I have traced this down to the ip_forward.c > >>>>>>>>>> function ip_exceeds_mtu(), and the following patch seems to fix > >>>>>>>>>> that. > >>>>>>>>>> > >>>>>>>>>> --- net/ipv4/ip_forward.c.orig 2018-12-02 11:09:32.764320780 +0200 > >>>>>>>>>> +++ net/ipv4/ip_forward.c 2018-12-02 12:53:25.031232347 +0200 > >>>>>>>>>> @@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct > >>>>>>>>>> return false; > >>>>>>>>>> > >>>>>>>>>> /* original fragment exceeds mtu and DF is set */ > >>>>>>>>>> - if (unlikely(IPCB(skb)->frag_max_size > mtu)) > >>>>>>>>>> + if (unlikely(skb->len > mtu)) > >>>>>>>>>> return true; > >>>>>>>>>> > >>>>>>>>>> if (skb->ignore_df) > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> This seems to work (in some ways) - after the change IP packets > >>>>>>>>>> that are too large to the internal network get dropped and we are > >>>>>>>>>> sending "ICMP Destination unreachable, The datagram is too big" > >>>>>>>>>> messages to the originator (as we should?). However it seems that > >>>>>>>>>> not all services really like this... Netflix behaves as expected > >>>>>>>>>> and ping is stable from internal network to the internet, but for > >>>>>>>>>> example HBO nordic will not work anymore (too little buffering? > >>>>>>>>>> Retransimissions not working?). So it seems the original issue > >>>>>>>>>> should be also fixed (And the fragmention should be allowed?). > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Any advice would be appreciated. Thanks! > >>>>>>>>>> > >>>>>>>>>> PS. Watching TV was not this intensive 20 years ago :) > >>>>>>>>>> Did you disable ethernet flow control? Ethernet flow control is usually a bad idea, it can cause head of line blocking. Unfortunately, most devices default to on.