We've started to use bpf to trace every packet, and the atomic add instruction (even when JITed) started to show up in perf profiles. The solution is per-cpu counters.

For PERCPU_(HASH|ARRAY) maps the existing bpf_map_lookup_elem() helper returns this cpu's per-cpu area, which bpf programs can use to store and increment counters. The BPF_MAP_LOOKUP_ELEM syscall command returns the values from all cpus, and the user process aggregates the counters. The usage example is in patch 6. The api turned out to be very easy to use both from bpf programs and from user space.

Long term we have been discussing adding a 'bounded loop' instruction, so that bpf programs could do the aggregation within the program itself, which may help some use cases. For now, user space aggregation of per-cpu counters fits best.
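In case it helps review, here is a rough sketch of both halves. This is not the code from patch 6; the map name 'my_map', the kfree_skb attach point and the sum_percpu_counter() helper are made up for illustration, using the samples/bpf wrappers:

/* bpf program half: lookup returns a pointer into this cpu's copy
 * of the value, so a plain, non-atomic increment is enough
 */
#include <linux/ptrace.h>
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

struct bpf_map_def SEC("maps") my_map = {
	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
	.key_size = sizeof(__u32),
	.value_size = sizeof(__u64),
	.max_entries = 1,
};

SEC("kprobe/kfree_skb")
int bpf_prog1(struct pt_regs *ctx)
{
	__u32 key = 0;
	__u64 *val;

	val = bpf_map_lookup_elem(&my_map, &key);
	if (val)
		*val += 1;
	return 0;
}

/* user space half: one BPF_MAP_LOOKUP_ELEM call fills one value
 * per cpu into the buffer, the process sums them up
 */
#include <unistd.h>
#include <linux/types.h>
#include "libbpf.h"

static __u64 sum_percpu_counter(int map_fd, __u32 key)
{
	unsigned int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
	__u64 values[nr_cpus], sum = 0;
	unsigned int i;

	if (bpf_lookup_elem(map_fd, &key, values) < 0)
		return 0;
	for (i = 0; i < nr_cpus; i++)
		sum += values[i];
	return sum;
}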
This patch set is a new approach to per-cpu hash and array maps. I've reused the map tests written by Martin and Ming, but the implementation and api are new. Old discussion is here:
http://thread.gmane.org/gmane.linux.kernel/2123800/focus=2126435

Alexei Starovoitov (4):
  bpf: introduce BPF_MAP_TYPE_PERCPU_HASH map
  bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map
  bpf: add lookup/update support for per-cpu hash and array maps
  samples/bpf: update tracex[23] examples to use per-cpu maps

Martin KaFai Lau (1):
  samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_HASH

tom.leim...@gmail.com (1):
  samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY

 include/linux/bpf.h        |  24 ++++
 include/uapi/linux/bpf.h   |   2 +
 kernel/bpf/arraymap.c      | 166 ++++++++++++++++++++--
 kernel/bpf/hashtab.c       | 340 ++++++++++++++++++++++++++++++++++++++-------
 kernel/bpf/syscall.c       |  57 +++++---
 samples/bpf/test_maps.c    | 188 +++++++++++++++++++++++++
 samples/bpf/tracex2_kern.c |   2 +-
 samples/bpf/tracex2_user.c |   7 +-
 samples/bpf/tracex3_kern.c |   8 +-
 samples/bpf/tracex3_user.c |  21 ++-
 10 files changed, 727 insertions(+), 88 deletions(-)

-- 
2.4.6