On Thu, 12 Jul 2018 23:10:28 +0300 Or Gerlitz <gerlitz...@gmail.com> wrote:
> On Wed, Jul 11, 2018 at 11:06 PM, Jesper Dangaard Brouer
> <bro...@redhat.com> wrote:
> > Well, I would prefer you to implement those. I just did a quick
> > implementation (it's trivially easy) so I have something to benchmark
> > with. The performance boost is quite impressive!
>
> sounds good, but wait
>
> > One reason I didn't "just" send a patch is that Edward so far only
> > implemented netif_receive_skb_list() and not napi_gro_receive_list().
>
> sfc doesn't support GRO?! doesn't make sense.. Edward?
>
> > And your driver uses napi_gro_receive(). This sort-of disables GRO for
> > your driver, which is not a choice I can make. Interestingly, I get
> > around the same netperf TCP_STREAM performance.
>
> Same TCP performance

I said *around* the same... I'll redo the benchmarks and verify...
(did it.. see later).

> with GRO and no rx-batching
>
> or
>
> without GRO and yes rx-batching

Yes, obviously without GRO and yes rx-batching.

> is by far not an intuitive result to me, unless both these techniques
> mostly serve to eliminate lots of instruction cache misses, and the
> TCP stack is so optimized that, if the code is in the cache, going
> through it once with a 64K-byte GRO-ed packet is like going through it
> ~40 (64K/1500) times with non-GRO-ed packets.

Actually, the GRO code path is rather expensive, and it uses a lot of
indirect calls. If you have a UDP workload, then disabling GRO will
give you a 10-15% performance boost. Edward's changes are basically a
generalized version of GRO, up to the IP layer (ip_rcv). So, for me it
makes perfect sense.

> What's the baseline (with GRO and no rx-batching) number on your setup?

Okay, redoing the benchmarks... I implemented a code hack so I can
control at runtime whether the mlx5 driver uses napi_gro_receive() or
netif_receive_skb_list() (abusing a netdev ethtool-controlled feature
flag that is not in use). A rough sketch of the hack is appended below
my signature.

To get a quick test going, with feedback every 3 sec, I use:

 $ netperf -t TCP_STREAM -H 198.18.1.1 -D3 -l 60000 -T 4,4

Default: using napi_gro_receive() with GRO enabled:
 Interim result: 25995.28 10^6bits/s over 3.000 seconds

Disable GRO, but still use napi_gro_receive():
 Interim result: 21980.45 10^6bits/s over 3.001 seconds

Make driver use netif_receive_skb_list():
 Interim result: 25490.67 10^6bits/s over 3.002 seconds

As you can see, using netif_receive_skb_list() gives a huge performance
boost over disabled GRO, and it comes very close to the performance of
enabled GRO. Which is rather impressive! :-)

Notice, even more impressively: these tests are without
CONFIG_RETPOLINE. We primarily merged netif_receive_skb_list() due to
the overhead of RETPOLINEs, but we see a benefit even when not using
RETPOLINEs.

> > I assume we can get even better perf if we "listify" napi_gro_receive.
>
> yeah, that would be very interesting to get there

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
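
P.S. The sketch referenced above, for the curious. This is NOT the
actual mlx5 patch, just a minimal illustration of the idea:
mydrv_napi_poll(), mydrv_next_rx_skb() and rx_use_skb_list are made-up
placeholders (the real hack reads an ethtool feature flag, and the
real poll loop is driver-specific); only napi_gro_receive(),
netif_receive_skb_list(), skb->list and napi_complete_done() are the
real kernel APIs in play.

 #include <linux/list.h>
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>

 /* Stand-ins for the ethtool feature flag and the driver's RX fetch */
 static bool rx_use_skb_list;
 static struct sk_buff *mydrv_next_rx_skb(struct napi_struct *napi);

 static int mydrv_napi_poll(struct napi_struct *napi, int budget)
 {
	struct sk_buff *skb;
	LIST_HEAD(rx_list);	/* batch for netif_receive_skb_list() */
	int work_done = 0;

	while (work_done < budget && (skb = mydrv_next_rx_skb(napi))) {
		work_done++;

		if (rx_use_skb_list)
			/* rx-batching: queue skb, deliver all of them below */
			list_add_tail(&skb->list, &rx_list);
		else
			/* default path: per-packet GRO delivery */
			napi_gro_receive(napi, skb);
	}

	/* Hand the whole batch to the stack in one call */
	if (!list_empty(&rx_list))
		netif_receive_skb_list(&rx_list);

	if (work_done < budget)
		napi_complete_done(napi, work_done);

	return work_done;
 }

The point of the flag is just to flip between the two delivery paths
without recompiling between netperf runs; the second benchmark row is
then simply "ethtool -K <dev> gro off" on the unmodified path.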