Re: tc filter insertion rate degradation

Eric Dumazet Tue, 22 Jan 2019 14:40:50 -0800

On Tue, Jan 22, 2019 at 1:18 PM Tejun Heo <[email protected]> wrote:
>
> Hello,
>


> Percpu storage is expensive and cache line sharing tends to be less of
> a problem (cuz they're per-cpu), so it is useful to support custom
> alignments for tighter packing.
>


We have BPF percpu maps whose values are a pair of 8-byte counters
(packets and bytes), with millions of slots.

We update the pair for every packet sent on the hosts.

BPF uses an alignment of 8 (which cannot be changed or tuned, at least
not at the call sites in kernel/bpf/hashtab.c).

If we are lucky, every pair sits within a single cache line.
But when we are not lucky, 25% of the pairs cross a cache line
boundary, reducing performance under DDoS.
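
To make the 25% figure concrete, here is a small user-space sketch of the
arithmetic. It assumes 64-byte cache lines and 16-byte pairs packed back
to back, which is my simplification and not necessarily how the percpu
allocator lays things out. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if an object of `size` bytes starting at byte offset `off`
 * spans two cache lines of `line` bytes, 0 otherwise. */
static int crosses_line(size_t off, size_t size, size_t line)
{
	assert(line > 0 && size > 0);
	return off / line != (off + size - 1) / line;
}

/* Counts how many of `n` back-to-back objects of `size` bytes, the
 * first one starting at offset `base`, cross a cache line boundary. */
static size_t count_crossing(size_t base, size_t size, size_t line, size_t n)
{
	size_t i, crossing = 0;

	for (i = 0; i < n; i++)
		crossing += crosses_line(base + i * size, size, line);
	return crossing;
}
```

With a base offset that is a multiple of 16, no pair crosses a line. With
a base offset that is only 8-byte aligned (8 modulo 16), the pairs land at
offsets 8, 24, 40 and 56 within each line, and the one at 56 straddles the
boundary, so one pair in four crosses.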

Using a nicer alignment in our case does not consume more RAM, and we
did not notice any extra cost from the per-cpu allocations because we
keep them in the slow path (control path).
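
As a rough user-space illustration of why the nicer alignment costs no
extra memory for this layout: a pair of 8-byte counters is already 16
bytes, so asking for 16-byte alignment adds no padding. The struct name,
the helper, and the 64-byte CACHE_LINE value are assumptions for the
sketch, not kernel code.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64	/* assumed cache line size */

/* Hypothetical counter pair, aligned to its own size. */
struct pkt_stats {
	alignas(16) uint64_t packets;
	uint64_t bytes;
};

/* The stronger alignment does not grow the object. */
static_assert(sizeof(struct pkt_stats) == 16, "no padding added");
static_assert(alignof(struct pkt_stats) == 16, "16-byte aligned");

/* Returns 1 if `size` bytes starting at `p` stay within one cache line. */
static int fits_one_line(const void *p, size_t size)
{
	uintptr_t off = (uintptr_t)p % CACHE_LINE;

	return off + size <= CACHE_LINE;
}
```

Since 64 is a multiple of 16, any 16-byte aligned pair starts at offset
0, 16, 32 or 48 within a line and never straddles a boundary.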
