On Tue, Apr 26, 2016 at 3:30 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> From: Eric Dumazet <eduma...@google.com>
>
> sd->input_queue_head is incremented for each processed packet
> in process_backlog(), and read from other cpus performing
> Out Of Order avoidance in get_rps_cpu()
>
> Moving this field in a separate cache line keeps it mostly
> hot for the cpu in process_backlog(), as other cpus will
> only read it.
>
> In a stress test, process_backlog() was consuming 6.80 % of cpu cycles,
> and the patch reduced the cost to 0.65 %
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>

Very nice!

Acked-by: Tom Herbert <t...@herbertland.com>

> ---
>  include/linux/netdevice.h | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 18d8394f2e5d..934ca866562d 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -2747,11 +2747,15 @@ struct softnet_data {
>  	struct sk_buff		*completion_queue;
>
>  #ifdef CONFIG_RPS
> -	/* Elements below can be accessed between CPUs for RPS */
> +	/* input_queue_head should be written by cpu owning this struct,
> +	 * and only read by other cpus. Worth using a cache line.
> +	 */
> +	unsigned int		input_queue_head ____cacheline_aligned_in_smp;
> +
> +	/* Elements below can be accessed between CPUs for RPS/RFS */
>  	struct call_single_data	csd ____cacheline_aligned_in_smp;
>  	struct softnet_data	*rps_ipi_next;
>  	unsigned int		cpu;
> -	unsigned int		input_queue_head;
>  	unsigned int		input_queue_tail;
>  #endif
>  	unsigned int		dropped;
>
>
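
For anyone who wants to check the layout effect outside the kernel, here is a rough userspace sketch of the idea, not the kernel code itself. It assumes a 64-byte cache line (the kernel uses an arch-dependent value), uses C11 alignas in place of ____cacheline_aligned_in_smp, and replaces struct call_single_data with a placeholder array. All demo_* names are made up for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines; the real value is arch-dependent. */
#define DEMO_CACHE_LINE 64

/* Hypothetical stand-in for the patched part of struct softnet_data. */
struct demo_softnet {
	void *completion_queue;

	/* Written by the owning cpu, only read by other cpus:
	 * starts a cache line of its own.
	 */
	alignas(DEMO_CACHE_LINE) unsigned int input_queue_head;

	/* Fields that other cpus may write start on the next line.
	 * csd_placeholder stands in for struct call_single_data.
	 */
	alignas(DEMO_CACHE_LINE) unsigned long csd_placeholder[4];
	struct demo_softnet *rps_ipi_next;
	unsigned int cpu;
	unsigned int input_queue_tail;

	unsigned int dropped;
};

/* Index of the cache line (relative to the struct start) holding an offset. */
static size_t demo_cache_line_of(size_t offset)
{
	return offset / DEMO_CACHE_LINE;
}

/* Returns 1 if input_queue_head shares its cache line with none of the
 * neighbouring fields checked here, 0 otherwise.
 */
static int demo_head_is_isolated(void)
{
	size_t head = demo_cache_line_of(offsetof(struct demo_softnet, input_queue_head));

	return head != demo_cache_line_of(offsetof(struct demo_softnet, completion_queue)) &&
	       head != demo_cache_line_of(offsetof(struct demo_softnet, csd_placeholder)) &&
	       head != demo_cache_line_of(offsetof(struct demo_softnet, input_queue_tail));
}
```

Calling demo_head_is_isolated() from a small main() should return 1 with this layout, since the two aligned members each begin a fresh line. This only shows where the fields land; it does not reproduce the cycle counts quoted above.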