On Mon, Aug 05, 2024 at 01:28:03PM -0700, Andrii Nakryiko wrote:
> trace_uprobe->nhit counter is not incremented atomically, so its value
> is bogus in practice. On the other hand, it's actually a pretty big
> uprobe scalability problem due to heavy cache line bouncing between CPUs
> triggering the same uprobe.

So you're seeing that in the benchmark, right? I'm curious how bad
the numbers are.

> 
> Drop it and emit an obviously unrealistic value in its stead in
> the uprobe_profiler seq file.
> 
> The alternative would be allocating per-CPU counter, but I'm not sure
> it's justified.
> 
> Signed-off-by: Andrii Nakryiko <[email protected]>
> ---
>  kernel/trace/trace_uprobe.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 52e76a73fa7c..5d38207db479 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -62,7 +62,6 @@ struct trace_uprobe {
>       struct uprobe                   *uprobe;
>       unsigned long                   offset;
>       unsigned long                   ref_ctr_offset;
> -     unsigned long                   nhit;
>       struct trace_probe              tp;
>  };
>  
> @@ -821,7 +820,7 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
>  
>       tu = to_trace_uprobe(ev);
>       seq_printf(m, "  %s %-44s %15lu\n", tu->filename,
> -                     trace_probe_name(&tu->tp), tu->nhit);
> +                trace_probe_name(&tu->tp), ULONG_MAX);

Seems harsh... would it be that bad to create a per-CPU counter for that?

jirka

>       return 0;
>  }
>  
> @@ -1507,7 +1506,6 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
>       int ret = 0;
>  
>       tu = container_of(con, struct trace_uprobe, consumer);
> -     tu->nhit++;
>  
>       udd.tu = tu;
>       udd.bp_addr = instruction_pointer(regs);
> -- 
> 2.43.5
> 
