Re: Optimizing instruction-cache, more packets at each stage

Eric Dumazet Thu, 21 Jan 2016 08:39:10 -0800

On Thu, 2016-01-21 at 12:27 +0100, Jesper Dangaard Brouer wrote:

> In my experiments, I extract several packets before calling
> napi_gro_receive(), and I also delay calling eth_type_trans().  Most of
> my speedup comes from this trick, as the prefetch() now has enough
> time.


It really depends on the cpu.

Many cpus have very poor prefetch performance.
Prefetch instructions are only loosely defined by Intel/AMD.

Ivy Bridge prefetcher for example is known to be not that good.

http://www.agner.org/optimize/blog/read.php?i=415

http://www.agner.org/optimize/blog/read.php?i=285

https://groups.google.com/forum/#!topic/comp.arch/71wnqr_F9sw

Really, refrain from adding stuff that might look good on one cpu.
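
For readers following the thread, the batching pattern quoted above can be sketched in userspace C. This is only an illustration of the idea, not the driver code under discussion: struct pkt, BATCH, process_one() and process_batched() are hypothetical names, and __builtin_prefetch() (a GCC/Clang builtin) stands in for the kernel's prefetch(). Whether the first loop buys anything depends on the cpu, so measure on more than one.

```c
#include <stddef.h>
#include <stdint.h>

#define BATCH 8 /* hypothetical batch size, not a value from the thread */

/* Hypothetical stand-in for a received packet; not a kernel structure. */
struct pkt {
    const uint8_t *data;
    size_t len;
};

/* Stage 2 work: touches the packet bytes (stand-in for header parsing). */
static uint32_t process_one(const struct pkt *p)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < p->len; i++)
        sum += p->data[i];
    return sum;
}

/* Handle n packets in batches: issue all prefetches, then touch the data. */
uint32_t process_batched(const struct pkt *pkts, size_t n)
{
    uint32_t total = 0;

    for (size_t base = 0; base < n; base += BATCH) {
        size_t end = (n - base < BATCH) ? n : base + BATCH;

        /* Stage 1: prefetch hints for the whole batch (read, keep in cache). */
        for (size_t i = base; i < end; i++)
            __builtin_prefetch(pkts[i].data, 0, 3);

        /* Stage 2: process; earlier hints have had time to complete. */
        for (size_t i = base; i < end; i++)
            total += process_one(&pkts[i]);
    }
    return total;
}
```

The result is the same as a plain one-packet-at-a-time loop; only the ordering of memory accesses changes, so a cpu that ignores or mishandles the hint simply pays for the extra loop.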
