On Thu, 2016-01-21 at 12:27 +0100, Jesper Dangaard Brouer wrote:
> In my experiments, where I extract several packet before calling
> napi_gro_receive(), and I also delay calling eth_type_trans(). Most of
> my speedup comes from this trick, as the prefetch() now that enough
> time.
It really depends on the cpu. Many cpus have very poor prefetch
performance. Prefetch instructions are only loosely defined by
Intel/AMD, and the Ivy Bridge prefetcher, for example, is known to be
not that good:

http://www.agner.org/optimize/blog/read.php?i=415
http://www.agner.org/optimize/blog/read.php?i=285
https://groups.google.com/forum/#!topic/comp.arch/71wnqr_F9sw

Really, refrain from adding stuff that might only look good on one cpu.
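The batching trick described in the quoted text can be sketched in userspace C as a two-pass loop. The first pass issues a prefetch for every packet header in the batch, and the second pass does the per-packet work, so each prefetch has the rest of the first pass to complete. This is a minimal sketch under stated assumptions, not driver code: `struct rx_pkt`, `handle_pkt()` and `process_batch()` are hypothetical names, `handle_pkt()` only reads the EtherType as a stand-in for eth_type_trans() plus napi_gro_receive(), and `__builtin_prefetch()` is the GCC/Clang builtin rather than the kernel's prefetch(). As argued above, whether the first pass helps at all depends on the cpu's prefetch implementation, so it has to be measured on each target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a received frame; not a kernel structure. */
struct rx_pkt {
    const uint8_t *data; /* start of the Ethernet frame */
    size_t len;          /* frame length in bytes */
};

/* Hypothetical per-packet work, standing in for eth_type_trans() and
 * napi_gro_receive(): it reads the EtherType (bytes 12..13 of the
 * Ethernet header), or returns 0 for a frame too short to have one. */
static uint16_t handle_pkt(const struct rx_pkt *p)
{
    if (p->len < 14)
        return 0;
    return (uint16_t)((p->data[12] << 8) | p->data[13]);
}

/* Two-pass batch processing:
 *   pass 1 only issues prefetch hints for every header in the batch,
 *   pass 2 touches the headers, which may be in cache by then.
 * Returns the sum of EtherTypes so the work is observable. */
static unsigned long process_batch(const struct rx_pkt *pkts, size_t n)
{
    unsigned long sum = 0;
    size_t i;

    /* rw = 0 (read), locality = 3 (keep in all cache levels) */
    for (i = 0; i < n; i++)
        __builtin_prefetch(pkts[i].data, 0, 3);

    for (i = 0; i < n; i++)
        sum += handle_pkt(&pkts[i]);

    return sum;
}
```

A prefetch is only a hint, so the result is identical with or without the first pass; the only thing that can change is timing, which is exactly why it must be benchmarked per cpu.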