On Thu 24 Jan 2019 at 17:21, Dennis Zhou <den...@kernel.org> wrote:
> Hi Vlad and Eric,
>
> On Tue, Jan 22, 2019 at 09:33:10AM -0800, Eric Dumazet wrote:
>> On Mon, Jan 21, 2019 at 3:24 AM Vlad Buslov <vla...@mellanox.com> wrote:
>> >
>> > Hi Eric,
>> >
>> > I've been investigating significant tc filter insertion rate degradation
>> > and it seems it is caused by your commit 001c96db0181 ("net: align
>> > gnet_stats_basic_cpu struct"). With this commit insertion rate is
>> > reduced from ~65k rules/sec to ~43k rules/sec when inserting 1m rules
>> > from file in tc batch mode on my machine.
>> >
>> > Tc perf profile indicates that pcpu allocator now consumes 2x CPU:
>> >
>> > 1) Before:
>> >
>> > Samples: 63K of event 'cycles:ppp', Event count (approx.): 48796480071
>> >   Children      Self  Co  Shared Object     Symbol
>> > +   21.19%     3.38%  tc  [kernel.vmlinux]  [k] pcpu_alloc
>> > +    3.45%     0.25%  tc  [kernel.vmlinux]  [k] pcpu_alloc_area
>> >
>> > 2) After:
>> >
>> > Samples: 92K of event 'cycles:ppp', Event count (approx.): 71446806550
>> >   Children      Self  Co  Shared Object     Symbol
>> > +   44.67%     3.99%  tc  [kernel.vmlinux]  [k] pcpu_alloc
>> > +   19.25%     0.22%  tc  [kernel.vmlinux]  [k] pcpu_alloc_area
>> >
>> > It seems that it takes much more work for pcpu allocator to perform
>> > allocation with new stricter alignment requirements. Not sure if it is
>> > expected behavior or not in this case.
>> >
>> > Regards,
>> > Vlad
>
> Would you mind sharing a little more information with me:
> 1) output before and after a run of /sys/kernel/debug/percpu_stats

Hi Dennis,

Some of these files are quite large, so I put them on my Dropbox.

Output before:

Percpu Memory Statistics
Allocation Info:
----------------------------------------
  unit_size           :       262144
  static_size         :       139160
  reserved_size       :         8192
  dyn_size            :        28776
  atom_size           :      2097152
  alloc_size          :      2097152

Global Stats:
----------------------------------------
  nr_alloc            :         3343
  nr_dealloc          :          752
  nr_cur_alloc        :         2591
  nr_max_alloc        :         2598
  nr_chunks           :            3
  nr_max_chunks       :            3
  min_alloc_size      :            4
  max_alloc_size      :         8208
  empty_pop_pages     :            3

Per Chunk Stats:
----------------------------------------
Chunk: <- Reserved Chunk
  nr_alloc            :            5
  max_alloc_size      :          320
  empty_pop_pages     :            0
  first_bit           :         1002
  free_bytes          :         7448
  contig_bytes        :         7424
  sum_frag            :           24
  max_frag            :           24
  cur_min_alloc       :           16
  cur_med_alloc       :           64
  cur_max_alloc       :          320

Chunk: <- First Chunk
  nr_alloc            :          479
  max_alloc_size      :         8208
  empty_pop_pages     :            0
  first_bit           :         8192
  free_bytes          :            0
  contig_bytes        :            0
  sum_frag            :            0
  max_frag            :            0
  cur_min_alloc       :            4
  cur_med_alloc       :           24
  cur_max_alloc       :         8208

Chunk:
  nr_alloc            :         1925
  max_alloc_size      :         8208
  empty_pop_pages     :            0
  first_bit           :        63102
  free_bytes          :          852
  contig_bytes        :           12
  sum_frag            :          852
  max_frag            :           12
  cur_min_alloc       :            4
  cur_med_alloc       :            8
  cur_max_alloc       :         8208

Chunk:
  nr_alloc            :          182
  max_alloc_size      :          936
  empty_pop_pages     :            3
  first_bit           :           21
  free_bytes          :       256452
  contig_bytes        :       255120
  sum_frag            :         1332
  max_frag            :          368
  cur_min_alloc       :            8
  cur_med_alloc       :           20
  cur_max_alloc       :          320


After: https://www.dropbox.com/s/unyzhx4vgo2x30e/stats_after?dl=0
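
For comparing the two dumps side by side, a minimal Python sketch along
these lines could parse the percpu_stats text and total up free space and
fragmentation per run. The function names (parse_percpu_stats, summarize)
are hypothetical, and the parsing assumes the "key : value" layout shown in
the "before" output above; it is not a tool from the kernel tree.

```python
import re


def parse_percpu_stats(text):
    """Parse a percpu_stats dump into (sections, chunks).

    Assumes the layout shown above: section headers ending in ':',
    dashed separator lines, and 'key : integer' entries.
    """
    sections = {}   # e.g. "Global Stats" -> {"nr_alloc": 3343, ...}
    chunks = []     # one dict per "Chunk:" block, in file order
    current = None  # dict that receives the next key/value lines
    for line in text.splitlines():
        line = line.strip()
        if not line or set(line) == {"-"}:
            continue  # skip blank lines and dashed separators
        if line.startswith("Chunk:"):
            # "Chunk: <- Reserved Chunk" -> "Reserved Chunk"; bare "Chunk:" -> "dynamic"
            label = line[len("Chunk:"):].strip(" <-") or "dynamic"
            current = {"label": label}
            chunks.append(current)
            continue
        m = re.match(r"(\w+)\s*:\s*(\d+)$", line)
        if m and current is not None:
            current[m.group(1)] = int(m.group(2))
        elif line.endswith(":"):
            current = sections.setdefault(line[:-1], {})
    return sections, chunks


def summarize(chunks):
    """Total free bytes and fragmentation across all chunks."""
    return {
        "nr_chunks": len(chunks),
        "free_bytes": sum(c.get("free_bytes", 0) for c in chunks),
        "sum_frag": sum(c.get("sum_frag", 0) for c in chunks),
    }
```

With each dump saved to a local file, calling
summarize(parse_percpu_stats(open(path).read())[1]) on both and comparing
the results would show whether free_bytes and sum_frag grow between runs.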

> 2) a full perf output

https://www.dropbox.com/s/isfcxca3npn5slx/perf.data?dl=0

> 3) a reproducer

$ sudo tc -b add.0

Example batch file: https://www.dropbox.com/s/ey7cbl5nwu5p0tg/add.0?dl=0

Thanks,
Vlad
