On Tue, Jan 22, 2019 at 1:18 PM Tejun Heo <t...@kernel.org> wrote:
>
> Hello,
>
> Percpu storage is expensive and cache line sharing tends to be less of
> a problem (cuz they're per-cpu), so it is useful to support custom
> alignments for tighter packing.
>

We have BPF percpu maps of two 8-byte counters (a packet counter and a
byte counter), with millions of slots. We update the pair for every
packet sent on the hosts.

BPF uses an alignment of 8 (which cannot be changed or tuned, at least
not at any of the call sites in kernel/bpf/hashtab.c).

If we are lucky, every pair sits within a single cache line. But when we
are not lucky, 25% of the pairs cross a cache line boundary, reducing
performance under DDoS.

Using a nicer alignment in our case does not consume more RAM, and we
did not notice any extra cost for per-cpu allocations, because we keep
them in the slow path (control path).
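
To make the 25% figure concrete, here is a minimal sketch (plain C, not
kernel code) that counts how many 16-byte pairs straddle a cache line
when they are packed back to back. It assumes 64-byte cache lines;
CACHE_LINE, crosses_line() and count_crossing() are names made up for
this illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, as on most current x86-64 parts. */
#define CACHE_LINE 64

/*
 * Returns 1 if an object of 'size' bytes starting at byte offset 'off'
 * touches two different cache lines, 0 otherwise.
 */
static int crosses_line(size_t off, size_t size)
{
	assert(size > 0);
	return off / CACHE_LINE != (off + size - 1) / CACHE_LINE;
}

/*
 * Counts how many of 'n' objects of 'size' bytes, packed back to back
 * starting at byte offset 'start', straddle a cache line boundary.
 */
static size_t count_crossing(size_t start, size_t size, size_t n)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		hits += crosses_line(start + i * size, size);
	return hits;
}
```

Under these assumptions, count_crossing(8, 16, 1024) gives 256, i.e. one
pair in four: the offsets cycle through 8, 24, 40 and 56 modulo 64, and
only the pair at 56 spills into the next line. count_crossing(0, 16, 1024)
gives 0, which is the layout a 16-byte alignment would guarantee.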