On 3/27/2018 1:29 AM, Florian Fainelli wrote:
On 03/26/2018 03:04 PM, Florian Fainelli wrote:
On 03/26/2018 02:16 PM, Tal Gilboa wrote:
On 3/23/2018 4:19 AM, Florian Fainelli wrote:
Hi all,
This patch series adds adaptive interrupt coalescing for the Gigabit Ethernet drivers SYSTEMPORT and GENET.
This really helps lower the interrupt count and system load, as measured by vmstat for a Gigabit TCP RX session:
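For context, the RX side of this kind of series follows the usual net_dim pattern: the NAPI poll routine feeds event/packet/byte counters into the library, net_dim() picks a moderation profile, and a deferred work item applies it to the hardware. Below is a minimal sketch of that pattern, not the actual SYSTEMPORT/GENET code; my_priv, my_rx_dim_update() and my_set_rx_coalesce() are made-up names, and the net_dim calls are based on the include/linux/net_dim.h API from this time frame as I recall it:

#include <linux/net_dim.h>
#include <linux/workqueue.h>

/* Hypothetical driver private data; only the fields relevant to dim. */
struct my_priv {
	struct net_dim dim;	/* adaptive moderation state machine */
	u16 event_ctr;		/* bumped in the RX interrupt handler */
	unsigned long packets;	/* accumulated in the RX NAPI poll */
	unsigned long bytes;
};

/* Stand-in for programming the ring's interrupt coalescing registers. */
static void my_set_rx_coalesce(struct my_priv *priv, u16 usecs, u16 pkts)
{
	/* write usecs/pkts into the hardware-specific threshold registers */
}

/* Called at the end of the RX NAPI poll, once per poll invocation. */
static void my_rx_dim_update(struct my_priv *priv)
{
	struct net_dim_sample dim_sample;

	net_dim_sample(priv->event_ctr, priv->packets, priv->bytes,
		       &dim_sample);
	/* net_dim() schedules priv->dim.work when it wants a new profile */
	net_dim(&priv->dim, dim_sample);
}

/* Work item; the driver is expected to INIT_WORK(&priv->dim.work,
 * my_dim_work) at probe/open time.
 */
static void my_dim_work(struct work_struct *work)
{
	struct net_dim *dim = container_of(work, struct net_dim, work);
	struct my_priv *priv = container_of(dim, struct my_priv, dim);
	struct net_dim_cq_moder moder =
		net_dim_get_profile(dim->mode, dim->profile_ix);

	my_set_rx_coalesce(priv, moder.usec, moder.pkts);
	dim->state = NET_DIM_START_MEASURE;
}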
I don't see an improvement in system load, the opposite - 42% vs. 100%
for SYSTEMPORT and 85% vs. 100% for GENET. Both with the same bandwidth.
Looks like I did not extract the correct data; the load could spike in both cases (with and without net_dim) up to 100%, but averaged over the transmission I see the following:
GENET without:
1 0 0 1169568 0 25556 0 0 0 0 130079 62795 2 86 13 0 0
GENET with:
1 0 0 1169536 0 25556 0 0 0 0 10566 10869 1 21 78 0 0
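(Reading those as the standard vmstat columns, the interesting part is the system columns and the CPU split: the interrupt rate drops from roughly 130k/s to 10.5k/s and the context switch rate from ~63k/s to ~11k/s, while us/sy/id goes from 2/86/13 to 1/21/78.)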
Am I missing something? Talking about bandwidth, I would expect 941Mb/s
(assuming this is TCP over IPv4). Do you know why the reduced interrupt
rate doesn't improve bandwidth?
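(For reference on the 941Mb/s figure: assuming a standard 1500-byte MTU with TCP timestamps, each 1448-byte payload segment costs 1538 bytes on the wire once the TCP/IP and Ethernet headers, FCS, preamble and inter-frame gap are counted, and 1448/1538 * 1000Mb/s is about 941.5Mb/s of goodput.)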
I am assuming that this comes down to latency; I am still capturing some pcap files to analyze the TCP session with wireshark and see if that is indeed what is going on. The test machine is actually not that great.
I would expect 1GbE full wire speed on almost any setup. I'll try
applying your code on my setup and see what I get.
Also, any effect on the client side (you
mentioned enabling TX moderation for SYSTEMPORT)?
Yes, on SYSTEMPORT, being the TCP IPv4 client, I have the following:
SYSTEMPORT without:
2 0 0 191428 0 25748 0 0 0 0 86254 264 0 41 59 0 0
SYSTEMPORT with:
3 0 0 190176 0 25748 0 0 0 0 45485 31332 0 100 0 0 0
I can't get top to agree with these load results, though; it looks like we just have the CPU spinning more, which does not look like a win.
The problem appears to be the timeout selection on TX; ignoring it completely allows us to keep the load average down while maintaining the bandwidth. It looks like NAPI on TX already does a good job, so interrupt mitigation on TX is not such a great idea actually...
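Concretely, "ignoring the timeout" can be done in the ethtool hook by honoring only tx-frames and never programming a TX usecs value. A rough sketch of that idea, not the real driver code (my_tx_priv and my_ring_set_tx_thresh() are placeholder names):

#include <linux/ethtool.h>
#include <linux/netdevice.h>

/* Placeholder private data, not the actual SYSTEMPORT structures. */
struct my_tx_priv {
	u32 tx_max_coalesced_frames;
};

/* Stand-in for writing the per-ring "frames done" interrupt threshold. */
static void my_ring_set_tx_thresh(struct my_tx_priv *priv, u32 frames)
{
	/* program the hardware-specific threshold register here */
}

static int my_set_coalesce(struct net_device *dev,
			   struct ethtool_coalesce *ec)
{
	struct my_tx_priv *priv = netdev_priv(dev);

	if (!ec->tx_max_coalesced_frames)
		return -EINVAL;

	/* Only the frame threshold is programmed; tx_coalesce_usecs is
	 * deliberately not applied, since the TX timer is what hurt the
	 * load average in these tests.
	 */
	priv->tx_max_coalesced_frames = ec->tx_max_coalesced_frames;
	my_ring_set_tx_thresh(priv, ec->tx_max_coalesced_frames);

	return 0;
}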
I saw a similar behavior for TX. For me the issue was too many
outstanding bytes without a completion (defined to be 256KB by sysctl
net.ipv4.tcp_limit_output_bytes). I tested on a 100GbE connection, so with reasonable timeout values I was already waiting too long (4 TSO sessions). For the 1GbE case this might have no effect, since you would need a
very long timeout. I'm currently working on adding TX support for dim.
If you don't see a good benefit currently you might want to wait a
little with TX adaptive interrupt moderation. Maybe only adjust static
moderation for now?
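To put rough numbers on that: with ~256KB allowed in flight, the stack has at most about 2.1ms worth of data outstanding at 1Gb/s (256KB * 8 / 1Gb/s), but only about 21us worth at 100Gb/s, so at 100GbE even a modest TX coalescing timeout leaves the sender stalled waiting for completions, while at 1GbE the timeout would have to reach the millisecond range before this limit becomes the bottleneck.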
Also, doing UDP TX tests shows that we can lower the interrupt count by setting an appropriate tx-frames (as expected), but we won't be lowering the CPU load since that is inherently CPU-intensive work. Past tx-frames=64, the bandwidth completely drops because that would be 1/2 of the ring size.
Do you see higher TX UDP bandwidth? If you are bounded by CPU in both cases I would at least expect higher bandwidth with fewer interrupts, since you reduce work from the CPU.