We've started to use bpf to trace every packet, and the atomic add
instruction (even when JITed) started to show up in perf profiles.
The solution is to use per-cpu counters.
For PERCPU_(HASH|ARRAY) maps the existing bpf_map_lookup() helper
returns the per-cpu area for the current cpu, which bpf programs can
use to store and increment the counters. The BPF_MAP_LOOKUP_ELEM
syscall command returns the areas from all cpus, and the user process
aggregates the counters. The usage example is in patch 6. The API
turned out to be very easy to use both from bpf programs and from
user space.
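
Roughly, the program side looks like this (a minimal sketch in the
samples/bpf style; the map name, key and kprobe attach point are
illustrative, not taken from these patches):

  #include <linux/ptrace.h>
  #include <linux/version.h>
  #include <uapi/linux/bpf.h>
  #include "bpf_helpers.h"

  struct bpf_map_def SEC("maps") my_map = {
          .type = BPF_MAP_TYPE_PERCPU_HASH,
          .key_size = sizeof(__u32),
          .value_size = sizeof(long),
          .max_entries = 1024,
  };

  SEC("kprobe/kfree_skb")
  int bpf_prog1(struct pt_regs *ctx)
  {
          __u32 key = 0;          /* illustrative key */
          long init_val = 1;
          long *value;

          /* lookup returns a pointer to this cpu's copy of the value */
          value = bpf_map_lookup_elem(&my_map, &key);
          if (value)
                  *value += 1;    /* plain add: no other cpu touches this copy */
          else
                  bpf_map_update_elem(&my_map, &key, &init_val, BPF_ANY);
          return 0;
  }

  char _license[] SEC("license") = "GPL";
  u32 _version SEC("version") = LINUX_VERSION_CODE;
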
Long term we have been discussing adding a 'bounded loop' instruction,
so that bpf programs can do the aggregation within the program itself,
which may help some use cases. For now, user-space aggregation of
per-cpu counters fits best.
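
The user-space side of that aggregation is a loop over the per-cpu
values (a minimal sketch, assuming the bpf_lookup_elem() wrapper from
samples/bpf/libbpf.h; the kernel sizes the copy by the number of
possible cpus, which _SC_NPROCESSORS_CONF approximates here):

  #include <unistd.h>
  #include <linux/types.h>
  #include "libbpf.h"     /* samples/bpf syscall wrappers (assumed) */

  static long sum_percpu_counter(int map_fd, __u32 key)
  {
          unsigned int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
          long values[nr_cpus];
          long sum = 0;
          unsigned int i;

          /* one BPF_MAP_LOOKUP_ELEM syscall copies the value from every cpu */
          if (bpf_lookup_elem(map_fd, &key, values))
                  return 0;       /* key not present yet */
          for (i = 0; i < nr_cpus; i++)
                  sum += values[i];
          return sum;
  }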

This patch set is a new approach to per-cpu hash and array maps.
I've reused the map tests written by Martin and Ming, but the
implementation and API are new. Old discussion here:
http://thread.gmane.org/gmane.linux.kernel/2123800/focus=2126435

Alexei Starovoitov (4):
  bpf: introduce BPF_MAP_TYPE_PERCPU_HASH map
  bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map
  bpf: add lookup/update support for per-cpu hash and array maps
  samples/bpf: update tracex[23] examples to use per-cpu maps

Martin KaFai Lau (1):
  samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_HASH

tom.leim...@gmail.com (1):
  samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY

 include/linux/bpf.h        |  24 ++++
 include/uapi/linux/bpf.h   |   2 +
 kernel/bpf/arraymap.c      | 166 ++++++++++++++++++++--
 kernel/bpf/hashtab.c       | 340 ++++++++++++++++++++++++++++++++++++++-------
 kernel/bpf/syscall.c       |  57 +++++---
 samples/bpf/test_maps.c    | 188 +++++++++++++++++++++++++
 samples/bpf/tracex2_kern.c |   2 +-
 samples/bpf/tracex2_user.c |   7 +-
 samples/bpf/tracex3_kern.c |   8 +-
 samples/bpf/tracex3_user.c |  21 ++-
 10 files changed, 727 insertions(+), 88 deletions(-)

-- 
2.4.6
