OK, then another idea .. At the very beginning of the mail thread it
was stated that the router has to network ports:
linux router: Linux computer with Dualcore Intel Celeron G1840, running 
currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, mtu set 
to 1500, RTL8169sb/8110sb
eth0: Linux Routers internet facing interface, public ip address, mtu set to 
1500,  RTL8168evl/8111evl

r8169 supports about 50 members of the RTL8169/RTL8168/RTL8101 family
and more or less every single member needs its own quirks.
RTL8168evl (PCIe) is somewhat recent, RTL8169sb (PCI) is ancient.
To rule out you triggered some hardware issue: Could you switch both
interfaces and check whether you see a change in system behavior?

On 11.12.2018 18:01, Risto Pajula wrote:
> Hello.
> 
> A freshly built 4.20.0-rc6-next-20181210-lp150.12.25-default waited me when I 
> got back from work, but unfortunately it did not help at all, it behaved 
> exactly in same manner.
> 
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time=73ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time=83ms TTL=64
> Reply from 192.168.0.1: bytes=32 time=307ms TTL=64
> Reply from 192.168.0.1: bytes=32 time=115ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> 
> BR.
> Risto
> 
> 
> On 11.12.2018 0:20, Risto Pajula wrote:
>> Hello.
>>
>> I have not yet tested with linux-next but I will, thanks for pointing that 
>> out.
>>
>> ...But I have studied the problem a bit more, indeed it seems that the 
>> rtl8169 transmission queue gets stuck.
>>
>> Below is some trace log. Starting from 802026 a burst of frames is forwarded 
>> from eth0 to eth1 and when we get to the rtl_tx only two frames have been 
>> transmitted.. (Should have had 1,25 ms to transmit them).. Also then the 
>> sequence seems to repeat for very long time that only two massages get 
>> transmitted.
>>
>> 800541:          <idle>-0     [001] ..s.  7237.698143: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 34 cur_tx: 4406946 frags: 0 
>> dirty_tx: 4406946 tx_left: 0
>> 800671:          <idle>-0     [001] ..s.  7237.698189: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 35 cur_tx: 4406947 frags: 0 
>> dirty_tx: 4406946 tx_left: 1
>> 800835:          <idle>-0     [001] ..s.  7237.698235: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 36 cur_tx: 4406948 frags: 0 
>> dirty_tx: 4406946 tx_left: 2
>> 800905:          <idle>-0     [000] ..s.  7237.698255: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406946 
>> tx_left: 3
>> 800932:          <idle>-0     [000] ..s.  7237.698263: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406947 
>> tx_left: 2
>> 800960:          <idle>-0     [000] ..s.  7237.698270: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406948 
>> tx_left: 1
>> 800998:          <idle>-0     [001] ..s.  7237.698282: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 37 cur_tx: 4406949 frags: 0 
>> dirty_tx: 4406949 tx_left: 0
>> 801085:          <idle>-0     [000] ..s.  7237.698312: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 38 cur_tx: 4406950 frags: 0 dirty_tx: 4406949 
>> tx_left: 1
>> 801137:          <idle>-0     [001] ..s.  7237.698329: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 38 cur_tx: 4406950 frags: 0 
>> dirty_tx: 4406950 tx_left: 0
>> 801229:          <idle>-0     [000] ..s.  7237.698363: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 39 cur_tx: 4406951 frags: 0 dirty_tx: 4406950 
>> tx_left: 1
>> 801264:          <idle>-0     [001] ..s.  7237.698375: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 39 cur_tx: 4406951 frags: 0 
>> dirty_tx: 4406951 tx_left: 0
>> 801356:          <idle>-0     [000] ..s.  7237.698409: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 40 cur_tx: 4406952 frags: 0 dirty_tx: 4406951 
>> tx_left: 1
>> 801391:          <idle>-0     [001] ..s.  7237.698421: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 40 cur_tx: 4406952 frags: 0 
>> dirty_tx: 4406952 tx_left: 0
>> 801482:          <idle>-0     [000] ..s.  7237.698455: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 41 cur_tx: 4406953 frags: 0 dirty_tx: 4406952 
>> tx_left: 1
>> 801518:          <idle>-0     [001] ..s.  7237.698468: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 41 cur_tx: 4406953 frags: 0 
>> dirty_tx: 4406953 tx_left: 0
>> 801610:          <idle>-0     [000] ..s.  7237.698502: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 42 cur_tx: 4406954 frags: 0 dirty_tx: 4406953 
>> tx_left: 1
>> 801645:          <idle>-0     [001] ..s.  7237.698514: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 42 cur_tx: 4406954 frags: 0 
>> dirty_tx: 4406954 tx_left: 0
>> 801737:          <idle>-0     [000] ..s.  7237.698548: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 43 cur_tx: 4406955 frags: 0 dirty_tx: 4406954 
>> tx_left: 1
>> 801772:          <idle>-0     [001] ..s.  7237.698560: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 43 cur_tx: 4406955 frags: 0 
>> dirty_tx: 4406955 tx_left: 0
>> 801864:          <idle>-0     [000] ..s.  7237.698594: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 44 cur_tx: 4406956 frags: 0 dirty_tx: 4406955 
>> tx_left: 1
>> 801899:          <idle>-0     [001] ..s.  7237.698607: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 44 cur_tx: 4406956 frags: 0 
>> dirty_tx: 4406956 tx_left: 0
>> 801984:          <idle>-0     [000] ..s.  7237.698640: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 45 cur_tx: 4406957 frags: 0 dirty_tx: 4406956 
>> tx_left: 1
>> 802026:          <idle>-0     [001] ..s.  7237.698655: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 45 cur_tx: 4406957 frags: 0 
>> dirty_tx: 4406957 tx_left: 0
>> 802129:          <idle>-0     [001] ..s.  7237.698701: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 46 cur_tx: 4406958 frags: 0 
>> dirty_tx: 4406957 tx_left: 1
>> 802232:          <idle>-0     [001] ..s.  7237.698747: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 47 cur_tx: 4406959 frags: 0 
>> dirty_tx: 4406957 tx_left: 2
>> 802335:          <idle>-0     [001] ..s.  7237.698793: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 48 cur_tx: 4406960 frags: 0 
>> dirty_tx: 4406957 tx_left: 3
>> 802438:          <idle>-0     [001] ..s.  7237.698840: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 49 cur_tx: 4406961 frags: 0 
>> dirty_tx: 4406957 tx_left: 4
>> 802541:          <idle>-0     [001] ..s.  7237.698885: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 50 cur_tx: 4406962 frags: 0 
>> dirty_tx: 4406957 tx_left: 5
>> 802644:          <idle>-0     [001] ..s.  7237.698932: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 51 cur_tx: 4406963 frags: 0 
>> dirty_tx: 4406957 tx_left: 6
>> 802747:          <idle>-0     [001] ..s.  7237.698978: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 52 cur_tx: 4406964 frags: 0 
>> dirty_tx: 4406957 tx_left: 7
>> 802851:          <idle>-0     [001] ..s.  7237.699025: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 53 cur_tx: 4406965 frags: 0 
>> dirty_tx: 4406957 tx_left: 8
>> 805094:          <idle>-0     [000] ..s.  7237.699871: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 4406957 
>> tx_left: 9
>> 805126:          <idle>-0     [000] ..s.  7237.699878: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 4406958 
>> tx_left: 8
>> 805197:          <idle>-0     [001] ..s.  7237.699905: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 54 cur_tx: 4406966 frags: 0 
>> dirty_tx: 4406959 tx_left: 7
>> 805234:          <idle>-0     [001] ..s.  7237.699925: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 55 cur_tx: 4406967 frags: 0 
>> dirty_tx: 4406959 tx_left: 8
>> 811446:          <idle>-0     [000] ..s.  7237.702344: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 4406959 
>> tx_left: 9
>> 811479:          <idle>-0     [000] ..s.  7237.702354: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 4406960 
>> tx_left: 8
>> 811641:          <idle>-0     [000] ..s.  7237.702395: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 56 cur_tx: 4406968 frags: 0 
>> dirty_tx: 4406961 tx_left: 7
>> 811733:          <idle>-0     [000] ..s.  7237.702416: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 57 cur_tx: 4406969 frags: 0 
>> dirty_tx: 4406961 tx_left: 8
>> 812050:          hping3-2871  [000] ..s.  7237.704770: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406961 
>> tx_left: 9
>> 812061:          hping3-2871  [000] ..s.  7237.704777: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406962 
>> tx_left: 8
>> 812124:          hping3-2871  [000] ..s.  7237.704811: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 58 cur_tx: 4406970 frags: 0 
>> dirty_tx: 4406963 tx_left: 7
>> 812176:          hping3-2871  [000] ..s.  7237.704831: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 59 cur_tx: 4406971 frags: 0 
>> dirty_tx: 4406963 tx_left: 8
>> 812300:          <idle>-0     [000] ..s.  7237.707247: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 4406963 
>> tx_left: 9
>> 812312:          <idle>-0     [000] ..s.  7237.707257: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 4406964 
>> tx_left: 8
>> 812389:          <idle>-0     [000] ..s.  7237.707297: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 60 cur_tx: 4406972 frags: 0 
>> dirty_tx: 4406965 tx_left: 7
>> 812426:          <idle>-0     [000] ..s.  7237.707318: rtl8169_start_xmit: 
>> ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 61 cur_tx: 4406973 frags: 0 
>> dirty_tx: 4406965 tx_left: 8
>> 812827:          <idle>-0     [000] ..s.  7237.709690: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 dirty_tx: 4406965 
>> tx_left: 9
>> 812838:          <idle>-0     [000] ..s.  7237.709700: rtl8169_poll: ORP33 
>> eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 dirty_tx: 4406966 
>> tx_left: 8
>> .....
>>
>>
>> BR.
>> Risto
>>
>> On 10.12.2018 23:26, Heiner Kallweit wrote:
>>> Did you test also with the latest linux-next kernel? Some recent changes 
>>> like 2e6eedb4813e
>>> "r8169: make use of xmit_more and __netdev_sent_queue" may have a positive 
>>> impact.
>>>
>>> On 10.12.2018 00:28, Risto Pajula wrote:
>>>> Hello.
>>>>
>>>> Old subject: "Re: IP fragmentation performance and don't fragment bug when 
>>>> forwarding
>>>>
>>>>
>>>> I have now been tracing the kernel and finding the bug seems difficult. I 
>>>> think the bug is combination of several things, likely cause is that it 
>>>> only occurs with rtl8169 and how it is using the netdevapi/DQL api.
>>>>
>>>>  From my investigations seems that following happens:
>>>>
>>>> *Burst of frames is received from internet. (eth0)
>>>> *These are accepted and forwarded to the (eth1)
>>>> *DQL hits the limit, this causes the scheduling for the device to be 
>>>> stopped (__QUEUE_STATE_STACK_XOFF)
>>>> *This and combination of some timing causes the  performance degradation, 
>>>> messages are only scheduled for transmission in soft Irq context.
>>>>
>>>> I still do now know, why the DQL hits the limits.. I think it should not,  
>>>> maybe something undesirable first happens with rtl8169.
>>>>
>>>> I inserted following trace printk functions to the code..
>>>>
>>>> rtl8169_start_xmit:
>>>> trace_printk("ORP33 %s rtl8169_start_xmit len: %d entry: %u cur_tx: %u 
>>>> frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, skb->len, entry, 
>>>> tp->cur_tx, frags, tp->dirty_tx, (tp->cur_tx - tp->dirty_tx) );
>>>>
>>>>
>>>> rtl_tx:
>>>> trace_printk("ORP33 %s rtl_tx len: %d entry: %u cur_tx: %u frags: %d 
>>>> dirty_tx: %u tx_left: %u\n" , tp->dev->name, tx_skb->skb->len, tp->cur_tx 
>>>> % NUM_TX_DESC, tp->cur_tx, 0, dirty_tx, tx_left );
>>>>
>>>>
>>>> Here is some grepped output only filtering the ping packet in two 
>>>> different situations:
>>>>
>>>> //trace when downloading ubuntu (~20MB/s), only 3 secs fit to buffer. ping 
>>>> is all the time normal....
>>>> //orig tracing.out3754
>>>> 117493:          <idle>-0     [000] ..s.  5235.407116: rtl8169_poll: ORP33 
>>>> eth1 rtl_tx len: 54 entry: 57 cur_tx: 3747641 frags: 0 dirty_tx: 3747640 
>>>> tx_left: 1
>>>> 118704:          <idle>-0     [000] ..s.  5235.416057: rtl8169_poll: ORP33 
>>>> eth1 rtl_tx len: 54 entry: 20 cur_tx: 3747668 frags: 0 dirty_tx: 3747667 
>>>> tx_left: 1
>>>> 346319:          hping3-2871  [000] ....  5236.002456: rtl8169_start_xmit: 
>>>> ORP33 eth1 rtl8169_start_xmit len: 54 entry: 8 cur_tx: 3754312 frags: 0 
>>>> dirty_tx: 3754312 tx_left: 0
>>>> 346337:          hping3-2871  [000] ..s.  5236.002475: rtl8169_poll: ORP33 
>>>> eth1 rtl_tx len: 54 entry: 9 cur_tx: 3754313 frags: 0 dirty_tx: 3754312 
>>>> tx_left: 1
>>>> 1277155:          hping3-2871  [001] ....  5237.002627: 
>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 38 
>>>> cur_tx: 3766502 frags: 0 dirty_tx: 3766502 tx_left: 0
>>>> 1277173:          <idle>-0     [000] ..s.  5237.002646: rtl8169_poll: 
>>>> ORP33 eth1 rtl_tx len: 54 entry: 39 cur_tx: 3766503 frags: 0 dirty_tx: 
>>>> 3766502 tx_left: 1
>>>>
>>>>
>>>> //imcp messages from route to intenal network when TCP stream is on. 
>>>> tracing.out3236/eth1_ring.txt is the original
>>>> //no idea why the first frame is tripled...
>>>> 200792:          hping3-2871  [001] ....  3239.593384: rtl8169_start_xmit: 
>>>> ORP33 eth1 rtl8169_start_xmit len: 54 entry: 34 cur_tx: 2132770 frags: 0 
>>>> dirty_tx: 2132770 tx_left: 0
>>>> 201578:          hping3-2871  [001] ....  3240.593625: rtl8169_start_xmit: 
>>>> ORP33 eth1 rtl8169_start_xmit len: 54 entry: 40 cur_tx: 2132776 frags: 0 
>>>> dirty_tx: 2132776 tx_left: 0
>>>> 341061:          <idle>-0     [000] ..s.  3241.738011: rtl8169_start_xmit: 
>>>> ORP33 eth1 rtl8169_start_xmit len: 54 entry: 52 cur_tx: 2133684 frags: 0 
>>>> dirty_tx: 2133680 tx_left: 4
>>>> 342112:          <idle>-0     [000] ..s.  3241.745297: rtl8169_poll: ORP33 
>>>> eth1 rtl_tx len: 54 entry: 59 cur_tx: 2133691 frags: 0 dirty_tx: 2133684 
>>>> tx_left: 7
>>>> 500126:          <idle>-0     [000] ..s.  3242.739500: rtl8169_start_xmit: 
>>>> ORP33 eth1 rtl8169_start_xmit len: 54 entry: 49 cur_tx: 2134513 frags: 0 
>>>> dirty_tx: 2134505 tx_left: 8
>>>> 501739:          <idle>-0     [000] ..s.  3242.751654: rtl8169_poll: ORP33 
>>>> eth1 rtl_tx len: 54 entry: 58 cur_tx: 2134522 frags: 0 dirty_tx: 2134513 
>>>> tx_left: 9
>>>> 646220:          <idle>-0     [000] ..s.  3243.745761: rtl8169_start_xmit: 
>>>> ORP33 eth1 rtl8169_start_xmit len: 54 entry: 41 cur_tx: 2135273 frags: 0 
>>>> dirty_tx: 2135267 tx_left: 6
>>>> 647633:          <idle>-0     [000] ..s.  3243.755486: rtl8169_poll: ORP33 
>>>> eth1 rtl_tx len: 54 entry: 49 cur_tx: 2135281 frags: 0 dirty_tx: 2135273 
>>>> tx_left: 8
>>>> 802878:          <idle>-0     [000] ..s.  3244.739947: rtl8169_start_xmit: 
>>>> ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 cur_tx: 2136087 frags: 0 
>>>> dirty_tx: 2136081 tx_left: 6
>>>> 804298:          <idle>-0     [000] ..s.  3244.749677: rtl8169_poll: ORP33 
>>>> eth1 rtl_tx len: 54 entry: 31 cur_tx: 2136095 frags: 0 dirty_tx: 2136087 
>>>> tx_left: 8
>>>> 961190:          <idle>-0     [000] ..s.  3245.746217: rtl8169_start_xmit: 
>>>> ORP33 eth1 rtl8169_start_xmit len: 54 entry: 16 cur_tx: 2136912 frags: 0 
>>>> dirty_tx: 2136906 tx_left: 6
>>>> 962610:          <idle>-0     [000] ..s.  3245.755946: rtl8169_poll: ORP33 
>>>> eth1 rtl_tx len: 54 entry: 24 cur_tx: 2136920 frags: 0 dirty_tx: 2136912 
>>>> tx_left: 8
>>>> 1118044:          <idle>-0     [000] ..s.  3246.740336: 
>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 62 
>>>> cur_tx: 2137726 frags: 0 dirty_tx: 2137720 tx_left: 6
>>>> 1119443:          <idle>-0     [000] ..s.  3246.750047: rtl8169_poll: 
>>>> ORP33 eth1 rtl_tx len: 54 entry: 5 cur_tx: 2137733 frags: 0 dirty_tx: 
>>>> 2137726 tx_left: 7
>>>> 1264047:          <idle>-0     [000] ..s.  3247.709202: 
>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 29 
>>>> cur_tx: 2138525 frags: 0 dirty_tx: 2138518 tx_left: 7
>>>> 1264740:          <idle>-0     [000] ..s.  3247.718918: rtl8169_poll: 
>>>> ORP33 eth1 rtl_tx len: 54 entry: 33 cur_tx: 2138529 frags: 0 dirty_tx: 
>>>> 2138525 tx_left: 4
>>>> 1419958:          <idle>-0     [000] ..s.  3248.736436: 
>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 48 
>>>> cur_tx: 2139312 frags: 0 dirty_tx: 2139305 tx_left: 7
>>>> 1421551:          <idle>-0     [000] ..s.  3248.746189: rtl8169_poll: 
>>>> ORP33 eth1 rtl_tx len: 54 entry: 56 cur_tx: 2139320 frags: 0 dirty_tx: 
>>>> 2139312 tx_left: 8
>>>> 1578746:          <idle>-0     [000] ..s.  3249.742702: 
>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 46 
>>>> cur_tx: 2140142 frags: 0 dirty_tx: 2140135 tx_left: 7
>>>> 1580194:          <idle>-0     [000] ..s.  3249.752457: rtl8169_poll: 
>>>> ORP33 eth1 rtl_tx len: 54 entry: 54 cur_tx: 2140150 frags: 0 dirty_tx: 
>>>> 2140142 tx_left: 8
>>>> 1729597:          <idle>-0     [000] ..s.  3250.746839: 
>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 36 
>>>> cur_tx: 2140964 frags: 0 dirty_tx: 2140957 tx_left: 7
>>>> 1731015:          <idle>-0     [000] ..s.  3250.756594: rtl8169_poll: 
>>>> ORP33 eth1 rtl_tx len: 54 entry: 43 cur_tx: 2140971 frags: 0 dirty_tx: 
>>>> 2140964 tx_left: 7
>>>> 1881028:          <idle>-0     [000] ..s.  3251.740991: 
>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 19 
>>>> cur_tx: 2141779 frags: 0 dirty_tx: 2141772 tx_left: 7
>>>> 1881381:          <idle>-0     [000] ..s.  3251.750718: rtl8169_poll: 
>>>> ORP33 eth1 rtl_tx len: 54 entry: 26 cur_tx: 2141786 frags: 0 dirty_tx: 
>>>> 2141779 tx_left: 7
>>>> //something happens, the output queue empties.. now it is fast, and 
>>>> rtl8169_start_xmit is not called from soft_irq
>>>> 1896178:          hping3-2871  [001] ....  3252.595994: 
>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 
>>>> cur_tx: 2141911 frags: 0 dirty_tx: 2141911 tx_left: 0
>>>> 1896196:          <idle>-0     [000] ..s.  3252.596055: rtl8169_poll: 
>>>> ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2141912 frags: 0 dirty_tx: 
>>>> 2141911 tx_left: 1
>>>> 1923136:          hping3-2871  [001] ....  3253.596227: 
>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 50 
>>>> cur_tx: 2142066 frags: 0 dirty_tx: 2142066 tx_left: 0
>>>> 1923154:          <idle>-0     [000] ..s.  3253.596289: rtl8169_poll: 
>>>> ORP33 eth1 rtl_tx len: 54 entry: 51 cur_tx: 2142067 frags: 0 dirty_tx: 
>>>> 2142066 tx_left: 1
>>>> 1951633:          hping3-2871  [001] ....  3254.596459: 
>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 5 cur_tx: 
>>>> 2142213 frags: 0 dirty_tx: 2142213 tx_left: 0
>>>> 1951651:          <idle>-0     [000] ..s.  3254.596521: rtl8169_poll: 
>>>> ORP33 eth1 rtl_tx len: 54 entry: 6 cur_tx: 2142214 frags: 0 dirty_tx: 
>>>> 2142213 tx_left: 1
>>>> 1954472:          hping3-2871  [001] ....  3255.596700: 
>>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 15 
>>>> cur_tx: 2142223 frags: 0 dirty_tx: 2142223 tx_left: 0
>>>> 1954490:          <idle>-0     [000] ..s.  3255.596763: rtl8169_poll: 
>>>> ORP33 eth1 rtl_tx len: 54 entry: 16 cur_tx: 2142224 frags: 0 dirty_tx: 
>>>> 2142223 tx_left: 1
>>>> //until it soon, again is slow..
>>>>
>>>>
>>>> BR.
>>>> Risto
>>>>
>>>>
>>>>
>>>> On 7.12.2018 16:46, Risto Pajula wrote:
>>>>> Hello.
>>>>>
>>>>> I have been to track the poor forwarding latency to the TCP Window scale 
>>>>> options. The Netgem device uses rather large windows scale options (x256) 
>>>>> and I have been able to reproduce the routers poor forwarding latency 
>>>>> also with linux box running in the internal network and changing the 
>>>>> net.ipv4.tcp_rmem to a large value and thus changing the TCP window 
>>>>> scaling options to larger ones. I still do not have clue why this causes 
>>>>> the forwarfing in the linux kernel to block? Maybe something in the 
>>>>> connection tracking....?
>>>>>
>>>>>
>>>>> With the ICMP timestamp messages I have been able to also pinpoint that 
>>>>> the latency is caused in the eth1 sending side (the following hping3 
>>>>> example is run in the router toward the internal network...
>>>>>
>>>>>
>>>>> xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp-ts -V
>>>>> using eth1, addr: 192.168.0.1, MTU: 1500
>>>>> HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 headers + 0 
>>>>> data bytes
>>>>> len=46 ip=192.168.0.112 ttl=64 id=49464 tos=0 iplen=40
>>>>> icmp_seq=0 rtt=7.9 ms
>>>>> ICMP timestamp: Originate=52294891 Receive=52294895 Transmit=52294895
>>>>> ICMP timestamp RTT tsrtt=7
>>>>>
>>>>> len=46 ip=192.168.0.112 ttl=64 id=49795 tos=0 iplen=40
>>>>> icmp_seq=1 rtt=235.9 ms
>>>>> ICMP timestamp: Originate=52295891 Receive=52296128 Transmit=52296128
>>>>> ICMP timestamp RTT tsrtt=235
>>>>>
>>>>> len=46 ip=192.168.0.112 ttl=64 id=49941 tos=0 iplen=40
>>>>> icmp_seq=2 rtt=3.8 ms
>>>>> ICMP timestamp: Originate=52296891 Receive=52296895 Transmit=52296895
>>>>> ICMP timestamp RTT tsrtt=3
>>>>>
>>>>> len=46 ip=192.168.0.112 ttl=64 id=50685 tos=0 iplen=40
>>>>> icmp_seq=3 rtt=47.8 ms
>>>>> ICMP timestamp: Originate=52297891 Receive=52297940 Transmit=52297940
>>>>> ICMP timestamp RTT tsrtt=47
>>>>>
>>>>> len=46 ip=192.168.0.112 ttl=64 id=51266 tos=0 iplen=40
>>>>> icmp_seq=4 rtt=7.7 ms
>>>>> ICMP timestamp: Originate=52298891 Receive=52298895 Transmit=52298895
>>>>> ICMP timestamp RTT tsrtt=7
>>>>>
>>>>> len=46 ip=192.168.0.112 ttl=64 id=52245 tos=0 iplen=40
>>>>> icmp_seq=5 rtt=3.7 ms
>>>>> ICMP timestamp: Originate=52299891 Receive=52299895 Transmit=52299895
>>>>> ICMP timestamp RTT tsrtt=3
>>>>>
>>>>> ^C
>>>>> --- 192.168.0.112 hping statistic ---
>>>>> 6 packets tramitted, 6 packets received, 0% packet loss
>>>>> round-trip min/avg/max = 3.7/51.1/235.9 ms
>>>>>
>>>>>
>>>>>
>>>>> BR.
>>>>> Risto
>>>>>
>>>>> On 2.12.2018 23:32, Risto Pajula wrote:
>>>>>> Hello.
>>>>>>
>>>>>> You can most likely ignore the "DF Bit, mtu bug when forwarding" case. 
>>>>>> There isn't actually big IP packets on the wire, instead there is burst 
>>>>>> of packets on the wire, which are combined by the GRO... And thus 
>>>>>> dropping them should not happen. Sorry about the invalid bug report.
>>>>>>
>>>>>> However the poor latency from intenal network to the internet still 
>>>>>> remain, both GRO enabled and disabled. I will try to study further...
>>>>>>
>>>>>>
>>>>>> BR.
>>>>>> Risto
>>>>>>
>>>>>>
>>>>>> On 2.12.2018 14:01, Risto Pajula wrote:
>>>>>>> Hello.
>>>>>>>
>>>>>>> I have encountered a weird performance problem in Linux IP 
>>>>>>> fragmentation when using video streaming services behind the NAT. Also 
>>>>>>> I have studied a possible bug in the DF bit (don't fragment) handling 
>>>>>>> when forwarding the IP packets.
>>>>>>>
>>>>>>> First the system setup description:
>>>>>>>
>>>>>>> [host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre 
>>>>>>> router]-internet
>>>>>>>
>>>>>>> where:
>>>>>>> host1: is a Netgem N7800 "cable box" for online video streaming 
>>>>>>> services provided by local telco (Can access Netflix, HBO nordic, "live 
>>>>>>> TV", etc.)
>>>>>>> linux router: Linux computer with Dualcore Intel Celeron G1840, running 
>>>>>>> currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
>>>>>>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, 
>>>>>>> mtu set to 1500, RTL8169sb/8110sb
>>>>>>> eth0: Linux Routers internet facing interface, public ip address, mtu 
>>>>>>> set to 1500,  RTL8168evl/8111evl
>>>>>>> fibre router: Alcatel Lucent fibre router (I-241G-Q), directly 
>>>>>>> connected to the eth0 of the Linux router.
>>>>>>>
>>>>>>> And now when using the Netgem N7800 with online video services 
>>>>>>> (Netflix, HBO nordic, etc) the Linux router will receive very BIG IP 
>>>>>>> packets in the eth0 upto ~20kB, this seems to lead to the following 
>>>>>>> problems in the Linux IP stack.
>>>>>>>
>>>>>>> IP fragmentation performance:
>>>>>>> When the Linux router receives these large IP packets in the eth0 
>>>>>>> everything works, but it seems that them cause very large performance 
>>>>>>> degradation from internal network to the internet regarding the latency 
>>>>>>> when the IP fragmentation is performed. The ping latency from internal 
>>>>>>> network to the internel network increases from stable 15ms-20ms up to 
>>>>>>> 700-800ms AND also the ping from the internal network to the linux 
>>>>>>> router eth1 (192.168.0.). However up link works perfectly, the ping is 
>>>>>>> still stable when streaming the online services (From linux router to 
>>>>>>> the internet). It seems that the IP fragmentation is somehow blocking 
>>>>>>> the eth1 reception or transmission for very long time (which it 
>>>>>>> shouldn't). I'm able to test and debug the issue further, but advice 
>>>>>>> regarding where to look would be appreciated.
>>>>>>>
>>>>>>>
>>>>>>> DF Bit, mtu bug when forwarding:
>>>>>>> I have started to study the above mentioned problem and have found a 
>>>>>>> possible bug in the DF bit and mtu handling in IP forwarding. The BIG 
>>>>>>> packets received from streaming services all have the "DF bit" set and 
>>>>>>> the question is that should we be forwarding them at all as that would 
>>>>>>> result them being fragmented? Apparently we currently are... I have 
>>>>>>> traced this down to the ip_forward.c function ip_exceeds_mtu(), and the 
>>>>>>> following patch seems to fix that.
>>>>>>>
>>>>>>> --- net/ipv4/ip_forward.c.orig  2018-12-02 11:09:32.764320780 +0200
>>>>>>> +++ net/ipv4/ip_forward.c       2018-12-02 12:53:25.031232347 +0200
>>>>>>> @@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct
>>>>>>>                  return false;
>>>>>>>
>>>>>>>          /* original fragment exceeds mtu and DF is set */
>>>>>>> -       if (unlikely(IPCB(skb)->frag_max_size > mtu))
>>>>>>> +        if (unlikely(skb->len > mtu))
>>>>>>>                  return true;
>>>>>>>
>>>>>>>          if (skb->ignore_df)
>>>>>>>
>>>>>>>
>>>>>>> This seems to work (in some ways) - after the change IP packets that 
>>>>>>> are too large to the internal network get dropped and we are sending 
>>>>>>> "ICMP Destination unreachable, The datagram is too big" messages to the 
>>>>>>> originator (as we should?). However it seems that not all services 
>>>>>>> really like this... Netflix behaves as expected and ping is stable from 
>>>>>>> internal network to the internet, but for example HBO nordic will not 
>>>>>>> work anymore (too little buffering? Retransimissions not working?). So 
>>>>>>> it seems the original issue should be also fixed (And the fragmention 
>>>>>>> should be allowed?).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Any advice would be appreciated. Thanks!
>>>>>>>
>>>>>>> PS. Watching TV was not this intensive 20 years ago :)
>>>>>>>
> 

Reply via email to