On 8/8/19 10:08 PM, Heiner Kallweit wrote:
(..snip..)

I was about to ask exactly that, whether you have TSO enabled. I don't know
what can trigger the HW issue; Realtek just confirmed that this chip version
has a problem with TSO. So the logical conclusion is: test w/o TSO, ideally
the linux-next version.

So disabling TSO alone didn't work: it leads to reduced throughput (~70 MB/s
in iperf). Instead I decided to backport 93681cd7d94f ("r8169: enable HW csum
and TSO"), which wasn't easy due to cleanups/renamings of dependencies, but I
managed to backport it and .. got the same problem of reduced throughput. wat?!

After lots of trial & error I started disabling all offloads and finally found
that sg (Scatter-Gather) enabled alone, without TSO, leads to the throughput
drop. So the culprit seems to be 93681cd7d94f, which disabled TSO on my NIC but
left sg on by default. This was repeatable: switch on sg, throughput drops;
turn it off, smooth sailing, now with reduced buffers.

I modified the relevant bits to disable tso & sg like this:

     /* RTL8168e-vl has a HW issue with TSO */
     if (tp->mac_version == RTL_GIGA_MAC_VER_34) {
+        dev->vlan_features &= ~(NETIF_F_ALL_TSO|NETIF_F_SG);
+        dev->hw_features &= ~(NETIF_F_ALL_TSO|NETIF_F_SG);
+        dev->features &= ~(NETIF_F_ALL_TSO|NETIF_F_SG);
     }

This seems to work: it restores performance with sg/tso off by default and
without any additional offloads, yet with xmit_more in the mix. We'll see
whether that is stable over the next few days, but I strongly suspect it will
be good and that the hiccups were due to an xmit_more/TSO interaction.

So that didn't take long - got another timeout this morning during some
random light usage, despite sg/tso being disabled this time.
Again the only common element is the xmit_more patch. :(
Not sure whether you want to revert this right away or wait for 5.4-rc1
feedback. Maybe this too is chipset-specific?

Thanks a lot for the analysis and testing. Then I'll submit the disabling
of SG on RTL8168evl (on your behalf), independent of whether it fixes
the timeout issue.

Got it, thanks!

Holger
