tc filter insertion rate degradation

Vlad Buslov Mon, 21 Jan 2019 03:25:34 -0800

Hi Eric,

I've been investigating a significant degradation in the tc filter
insertion rate, and it seems to be caused by your commit 001c96db0181
("net: align gnet_stats_basic_cpu struct"). With this commit, the
insertion rate drops from ~65k rules/sec to ~43k rules/sec when
inserting 1M rules from a file in tc batch mode on my machine.
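
To put the two rates in perspective, here is a minimal sketch of the
arithmetic. It uses only the approximate figures quoted above; the
function names are made up for illustration, and the result is only as
precise as the rounded "~65k" and "~43k" values.

```python
# Approximate figures quoted in the message above.
RULES = 1_000_000      # rules inserted from the batch file
RATE_BEFORE = 65_000   # rules/sec without commit 001c96db0181
RATE_AFTER = 43_000    # rules/sec with commit 001c96db0181


def batch_seconds(rules: int, rate: float) -> float:
    """Wall-clock time needed to insert `rules` rules at `rate` rules/sec."""
    return rules / rate


def relative_drop(before: float, after: float) -> float:
    """Fraction of the original insertion rate that was lost."""
    return (before - after) / before


if __name__ == "__main__":
    t_before = batch_seconds(RULES, RATE_BEFORE)
    t_after = batch_seconds(RULES, RATE_AFTER)
    print(f"batch time before: {t_before:.1f} s")
    print(f"batch time after:  {t_after:.1f} s")
    print(f"rate drop: {relative_drop(RATE_BEFORE, RATE_AFTER):.1%}")
```

In other words, the rate drops by roughly a third, and the same batch
takes about 8 seconds longer.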


A perf profile of tc indicates that the pcpu allocator now consumes
roughly twice the share of CPU time:

1) Before:

Samples: 63K of event 'cycles:ppp', Event count (approx.): 48796480071
  Children      Self  Co  Shared Object     Symbol
+   21.19%     3.38%  tc  [kernel.vmlinux]  [k] pcpu_alloc
+    3.45%     0.25%  tc  [kernel.vmlinux]  [k] pcpu_alloc_area

2) After:

Samples: 92K of event 'cycles:ppp', Event count (approx.): 71446806550
  Children      Self  Co  Shared Object     Symbol
+   44.67%     3.99%  tc  [kernel.vmlinux]  [k] pcpu_alloc
+   19.25%     0.22%  tc  [kernel.vmlinux]  [k] pcpu_alloc_area
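
The percentages above are relative to each run's own total, so a rough
way to compare the two profiles is to convert them to approximate cycle
counts. The sketch below assumes that the "Children" percentage can
simply be multiplied by the approximate event count of the same run,
and that both runs inserted the same set of rules; the dictionary
layout and function names are made up for illustration.

```python
# Numbers copied from the two perf report excerpts above.
BEFORE = {"event_count": 48_796_480_071, "pcpu_alloc": 21.19, "pcpu_alloc_area": 3.45}
AFTER = {"event_count": 71_446_806_550, "pcpu_alloc": 44.67, "pcpu_alloc_area": 19.25}


def cycles(profile: dict, symbol: str) -> float:
    """Approximate cycles attributed to `symbol`, including its callees."""
    return profile["event_count"] * profile[symbol] / 100.0


def outside_pcpu_alloc(profile: dict) -> float:
    """Approximate cycles spent anywhere except under pcpu_alloc."""
    return profile["event_count"] - cycles(profile, "pcpu_alloc")


def growth(symbol: str) -> float:
    """How many times more cycles `symbol` takes after the commit."""
    return cycles(AFTER, symbol) / cycles(BEFORE, symbol)


if __name__ == "__main__":
    for sym in ("pcpu_alloc", "pcpu_alloc_area"):
        print(f"{sym}: {growth(sym):.2f}x cycles")
    ratio = outside_pcpu_alloc(AFTER) / outside_pcpu_alloc(BEFORE)
    print(f"everything else: {ratio:.2f}x cycles")
```

Under these assumptions, the cycles under pcpu_alloc grow by about 3x
and those under pcpu_alloc_area by about 8x, while the cycles spent
outside pcpu_alloc stay nearly constant. That would suggest the extra
time comes almost entirely from the allocator.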

It seems that the pcpu allocator has to do much more work to satisfy
allocations with the new, stricter alignment requirement. I'm not sure
whether this is expected behavior in this case.

Regards,
Vlad
