On Sat, Feb 23, 2019 at 12:34:38AM +0000, Roman Gushchin wrote:
> On Fri, Feb 22, 2019 at 03:36:41PM -0800, Alexei Starovoitov wrote:
> > JITed BPF programs are indistinguishable from kernel functions, but unlike
> > kernel code BPF code can be changed often.
> > The typical "perf record" + "perf report" approach to profiling and tuning
> > kernel code works just as well for BPF programs, but kernel code doesn't
> > need to be monitored whereas BPF programs do.
> > Users load and run a large number of BPF programs.
> > These BPF stats allow tools to monitor the usage of BPF on the server.
> > The monitoring tools will turn the sysctl kernel.bpf_stats_enabled
> > on and off for a few seconds to sample the average cost of the programs.
> > Data aggregated over hours and days will provide insight into the cost of BPF,
> > and alarms can trigger if a given program suddenly gets more expensive.
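
For illustration, the sampling side of such a monitoring tool could be as
simple as the sketch below. It assumes the sysctl shows up at the usual
/proc/sys/kernel/bpf_stats_enabled path; the helper name is made up.

	/* Minimal sampling-window sketch (userspace, hypothetical helper). */
	#include <fcntl.h>
	#include <unistd.h>

	static int bpf_stats_set(int on)
	{
		int fd = open("/proc/sys/kernel/bpf_stats_enabled", O_WRONLY);

		if (fd < 0)
			return -1;
		if (write(fd, on ? "1" : "0", 1) != 1) {
			close(fd);
			return -1;
		}
		close(fd);
		return 0;
	}

	/* sample for a few seconds:
	 * bpf_stats_set(1); sleep(5); bpf_stats_set(0);
	 */
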
> > 
> > The cost of the two sched_clock() calls per program invocation adds ~20 nsec.
> > Fast BPF progs (like selftests/bpf/progs/test_pkt_access.c) will slow down
> > from ~40 nsec to ~60 nsec.
> > A static_key minimizes the cost of the stats collection.
> > There is no measurable difference before/after this patch
> > with kernel.bpf_stats_enabled=0.
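
The kernel/sysctl.c hunk isn't quoted here, but presumably the proc handler
flips the static_key when the sysctl value changes, roughly along these lines
(a sketch only; the handler name is made up and the actual patch may be
structured differently):

	/* Sketch: mirror the sysctl value into the static_key. */
	static int bpf_stats_sysctl_handler(struct ctl_table *table, int write,
					    void __user *buffer, size_t *lenp,
					    loff_t *ppos)
	{
		int ret, prev = *(int *)table->data;

		ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
		if (ret || !write || *(int *)table->data == prev)
			return ret;
		if (*(int *)table->data)
			static_branch_enable(&bpf_stats_enabled_key);
		else
			static_branch_disable(&bpf_stats_enabled_key);
		return ret;
	}
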
> > 
> > Signed-off-by: Alexei Starovoitov <a...@kernel.org>
> > ---
> >  include/linux/bpf.h    |  7 +++++++
> >  include/linux/filter.h | 16 +++++++++++++++-
> >  kernel/bpf/core.c      | 13 +++++++++++++
> >  kernel/bpf/syscall.c   | 24 ++++++++++++++++++++++--
> >  kernel/bpf/verifier.c  |  5 +++++
> >  kernel/sysctl.c        | 34 ++++++++++++++++++++++++++++++++++
> >  6 files changed, 96 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index de18227b3d95..14260674bc57 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -340,6 +340,11 @@ enum bpf_cgroup_storage_type {
> >  
> >  #define MAX_BPF_CGROUP_STORAGE_TYPE __BPF_CGROUP_STORAGE_MAX
> >  
> > +struct bpf_prog_stats {
> > +   u64 cnt;
> > +   u64 nsecs;
> > +};
> > +
> >  struct bpf_prog_aux {
> >     atomic_t refcnt;
> >     u32 used_map_cnt;
> > @@ -389,6 +394,7 @@ struct bpf_prog_aux {
> >      * main prog always has linfo_idx == 0
> >      */
> >     u32 linfo_idx;
> > +   struct bpf_prog_stats __percpu *stats;
> >     union {
> >             struct work_struct work;
> >             struct rcu_head rcu;
> > @@ -559,6 +565,7 @@ void bpf_map_area_free(void *base);
> >  void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
> >  
> >  extern int sysctl_unprivileged_bpf_disabled;
> > +extern int sysctl_bpf_stats_enabled;
> >  
> >  int bpf_map_new_fd(struct bpf_map *map, int flags);
> >  int bpf_prog_new_fd(struct bpf_prog *prog);
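
Since the counters are per-cpu, whatever eventually reports them (in this
patch's kernel/bpf/syscall.c changes or a follow-up) presumably has to sum
the per-cpu copies; a rough sketch (function name made up):

	/* Sketch: aggregate per-cpu counters for reporting. */
	static void bpf_prog_sum_stats(const struct bpf_prog *prog,
				       u64 *nsecs, u64 *cnt)
	{
		int cpu;

		*nsecs = 0;
		*cnt = 0;
		for_each_possible_cpu(cpu) {
			const struct bpf_prog_stats *st =
				per_cpu_ptr(prog->aux->stats, cpu);

			*nsecs += st->nsecs;
			*cnt += st->cnt;
		}
	}
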
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index f32b3eca5a04..7b14235b6f7d 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -533,7 +533,21 @@ struct sk_filter {
> >     struct bpf_prog *prog;
> >  };
> >  
> > -#define BPF_PROG_RUN(filter, ctx)  ({ cant_sleep(); (*(filter)->bpf_func)(ctx, (filter)->insnsi); })
> > +DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
> > +
> > +#define BPF_PROG_RUN(prog, ctx)    ({                              \
> > +   u32 ret;                                                \
> > +   cant_sleep();                                           \
> > +   if (static_branch_unlikely(&bpf_stats_enabled_key)) {   \
> > +           u64 start = sched_clock();                      \
> > +           ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi); \
> > +           this_cpu_inc(prog->aux->stats->cnt);            \
> > +           this_cpu_add(prog->aux->stats->nsecs,           \
> > +                        sched_clock() - start);            \
> > +   } else {                                                \
> > +           ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi); \
> > +   }                                                       \
> > +   ret; })
> 
> Can we bump "cnt" unconditionally and gate only the "nsecs" calculation?

XDP cannot afford to pay this penalty unconditionally.
It's not only the this_cpu_inc; it's also the loads through prog->aux->stats.
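
To make the trade-off concrete, the suggested variant would look roughly like
the sketch below (macro name made up). The prog->aux->stats dereference and
the this_cpu_inc() then run on every invocation, including the XDP fast path,
even with kernel.bpf_stats_enabled=0:

	/* Sketch of the "count always, time conditionally" alternative. */
	#define BPF_PROG_RUN_COUNT_ALWAYS(prog, ctx)	({			\
		u32 ret;							\
		cant_sleep();							\
		this_cpu_inc((prog)->aux->stats->cnt);				\
		if (static_branch_unlikely(&bpf_stats_enabled_key)) {		\
			u64 start = sched_clock();				\
			ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);		\
			this_cpu_add((prog)->aux->stats->nsecs,			\
				     sched_clock() - start);			\
		} else {							\
			ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);		\
		}								\
		ret; })
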
