On Thu, 2016-07-07 at 21:16 -0700, Alexei Starovoitov wrote:
> I've tried this style of prefetching in the past for normal stack
> and it didn't help at all.
This is very nice, but my experience showed the opposite numbers, so I guess you did not choose the proper prefetch strategy.

Prefetching in mlx4 gave me good results, once I made sure our compiler was not moving the actual prefetch operations on x86_64 (i.e. forcing use of asm volatile, as on x86_32, instead of the builtin prefetch). You might check that your compiler does the proper thing, because this really hurt me in the past.

In my case, I was using a 40Gbit NIC, and prefetching 128 bytes instead of 64 bytes allowed me to remove one stall in the GRO engine when using TCP with timestamps (total header size: 66 bytes), or tunnels.

The problem with prefetch is that it works well only assuming a given rate (in pps) and given CPUs, since prefetch behavior varies among CPU flavors. Brenden chose to prefetch N+3 based on some experiments on some hardware; prefetch N+3 can actually slow things down under a moderate load, which is the case 99% of the time in typical workloads on modern servers with multi-queue NICs.

This is why it was hard to upstream such changes: they focus on max throughput instead of low latencies.