On Sat, Feb 23, 2019 at 12:34:38AM +0000, Roman Gushchin wrote:
> On Fri, Feb 22, 2019 at 03:36:41PM -0800, Alexei Starovoitov wrote:
> > JITed BPF programs are indistinguishable from kernel functions, but unlike
> > kernel code BPF code can be changed often.
> > The typical approach of "perf record" + "perf report" profiling and tuning of
> > kernel code works just as well for BPF programs, but kernel code doesn't
> > need to be monitored whereas BPF programs do.
> > Users load and run a large number of BPF programs.
> > These BPF stats allow tools to monitor the usage of BPF on the server.
> > The monitoring tools will turn sysctl kernel.bpf_stats_enabled
> > on and off for a few seconds to sample the average cost of the programs.
> > Data aggregated over hours and days will provide insight into the cost of BPF,
> > and alarms can trigger in case a given program suddenly gets more expensive.
> >
> > The cost of two sched_clock() calls per program invocation adds ~20 nsec.
> > Fast BPF progs (like selftests/bpf/progs/test_pkt_access.c) will slow down
> > from ~40 nsec to ~60 nsec.
> > A static_key minimizes the cost of the stats collection.
> > There is no measurable difference before/after this patch
> > with kernel.bpf_stats_enabled=0.
> >
> > Signed-off-by: Alexei Starovoitov <a...@kernel.org>
> > ---
> >  include/linux/bpf.h    |  7 +++++++
> >  include/linux/filter.h | 16 +++++++++++++++-
> >  kernel/bpf/core.c      | 13 +++++++++++++
> >  kernel/bpf/syscall.c   | 24 ++++++++++++++++++++++--
> >  kernel/bpf/verifier.c  |  5 +++++
> >  kernel/sysctl.c        | 34 ++++++++++++++++++++++++++++++++++
> >  6 files changed, 96 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index de18227b3d95..14260674bc57 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -340,6 +340,11 @@ enum bpf_cgroup_storage_type {
> >
> >  #define MAX_BPF_CGROUP_STORAGE_TYPE __BPF_CGROUP_STORAGE_MAX
> >
> > +struct bpf_prog_stats {
> > +	u64 cnt;
> > +	u64 nsecs;
> > +};
> > +
> >  struct bpf_prog_aux {
> >  	atomic_t refcnt;
> >  	u32 used_map_cnt;
> > @@ -389,6 +394,7 @@ struct bpf_prog_aux {
> >  	 * main prog always has linfo_idx == 0
> >  	 */
> >  	u32 linfo_idx;
> > +	struct bpf_prog_stats __percpu *stats;
> >  	union {
> >  		struct work_struct work;
> >  		struct rcu_head rcu;
> > @@ -559,6 +565,7 @@ void bpf_map_area_free(void *base);
> >  void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
> >
> >  extern int sysctl_unprivileged_bpf_disabled;
> > +extern int sysctl_bpf_stats_enabled;
> >
> >  int bpf_map_new_fd(struct bpf_map *map, int flags);
> >  int bpf_prog_new_fd(struct bpf_prog *prog);
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index f32b3eca5a04..7b14235b6f7d 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -533,7 +533,21 @@ struct sk_filter {
> >  	struct bpf_prog	*prog;
> >  };
> >
> > -#define BPF_PROG_RUN(filter, ctx)  ({ cant_sleep(); (*(filter)->bpf_func)(ctx, (filter)->insnsi); })
> > +DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
> > +
> > +#define BPF_PROG_RUN(prog, ctx)	({				\
> > +	u32 ret;						\
> > +	cant_sleep();						\
> > +	if (static_branch_unlikely(&bpf_stats_enabled_key)) {	\
> > +		u64 start = sched_clock();			\
> > +		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
> > +		this_cpu_inc(prog->aux->stats->cnt);		\
> > +		this_cpu_add(prog->aux->stats->nsecs,		\
> > +			     sched_clock() - start);		\
> > +	} else {						\
> > +		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
> > +	}							\
> > +	ret; })
>
> Can we bump "cnt" unconditionally and gate only the "nsecs" calculation?
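
To make the question concrete, the variant being asked about would presumably look
something like the sketch below (illustration only, not part of the patch): the count
is bumped on every run and only the timestamping stays behind the static key.

#define BPF_PROG_RUN(prog, ctx)	({				\
	u32 ret;						\
	cant_sleep();						\
	/* unconditional: counts every invocation */		\
	this_cpu_inc(prog->aux->stats->cnt);			\
	if (static_branch_unlikely(&bpf_stats_enabled_key)) {	\
		u64 start = sched_clock();			\
		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
		this_cpu_add(prog->aux->stats->nsecs,		\
			     sched_clock() - start);		\
	} else {						\
		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
	}							\
	ret; })
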
XDP cannot afford to pay this penalty unconditionally. It's not only the this_cpu_inc; it's the prog->aux->stats loads as well.
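
For completeness, the on/off sampling flow described in the commit message can be driven
from userspace with nothing more than the new sysctl. A rough sketch (assuming the
standard procfs path /proc/sys/kernel/bpf_stats_enabled for kernel.bpf_stats_enabled;
not part of the patch):

/* Toggle kernel.bpf_stats_enabled for a short sampling window.
 * Illustration only; error handling kept minimal.
 */
#include <stdio.h>
#include <unistd.h>

static int write_sysctl(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fputs(val, f);
	fclose(f);
	return 0;
}

int main(void)
{
	const char *path = "/proc/sys/kernel/bpf_stats_enabled";

	if (write_sysctl(path, "1"))	/* start collecting run stats */
		return 1;
	sleep(5);			/* sample for a few seconds */
	write_sysctl(path, "0");	/* stop; per-prog cnt/nsecs now hold the sample */
	return 0;
}

Leaving the key off outside the sampling window is what keeps the common case at the
plain indirect call in the macro above.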