On 2024-10-09 01:23, Steven Rostedt wrote:
On Fri,  4 Oct 2024 10:58:15 -0400
Mathieu Desnoyers <[email protected]> wrote:

Use Tasks Trace RCU to protect iteration of system call enter/exit
tracepoint probes to allow those probes to handle page faults.

In preparation for this change, all tracers registering to system call
enter/exit tracepoints should expect those to be called with preemption
enabled.

This allows tracers to fault-in userspace system call arguments such as
path strings within their probe callbacks.

Signed-off-by: Mathieu Desnoyers <[email protected]>
Cc: Michael Jeanson <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: [email protected]
Cc: Joel Fernandes <[email protected]>
---
  include/linux/tracepoint.h | 12 ++++++++++--
  init/Kconfig               |  1 +
  2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 014790495ad8..cefd44b7c91f 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -17,6 +17,7 @@
  #include <linux/errno.h>
  #include <linux/types.h>
  #include <linux/rcupdate.h>
+#include <linux/rcupdate_trace.h>
  #include <linux/tracepoint-defs.h>
  #include <linux/static_call.h>
@@ -107,6 +108,7 @@ void for_each_tracepoint_in_module(struct module *mod,
  #ifdef CONFIG_TRACEPOINTS
  static inline void tracepoint_synchronize_unregister(void)
  {
+       synchronize_rcu_tasks_trace();
        synchronize_rcu();
  }
  #else
@@ -204,11 +206,17 @@ static inline struct tracepoint 
*tracepoint_ptr_deref(tracepoint_ptr_t *p)
                if (!(cond))                                            \
                        return;                                         \
                                                                        \
-               preempt_disable_notrace();                              \

Should add a comment somewhere stating that the syscall version is to allow 
faults.

I plan to add this comment on top of __TO_TRACE:

+ *
+ * With @syscall=0, the tracepoint callback array dereference is
+ * protected by disabling preemption.
+ * With @syscall=1, the tracepoint callback array dereference is
+ * protected by Tasks Trace RCU, which allows probes to handle page
+ * faults.

Thanks,

Mathieu



-- Steve

+               if (syscall)                                            \
+                       rcu_read_lock_trace();                          \
+               else                                                    \
+                       preempt_disable_notrace();                      \
                                                                        \
                __DO_TRACE_CALL(name, TP_ARGS(args));                   \
                                                                        \
-               preempt_enable_notrace();                               \
+               if (syscall)                                            \
+                       rcu_read_unlock_trace();                        \
+               else                                                    \
+                       preempt_enable_notrace();                       \
        } while (0)
/*
diff --git a/init/Kconfig b/init/Kconfig
index fbd0cb06a50a..eedd0064fb36 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1984,6 +1984,7 @@ config BINDGEN_VERSION_TEXT
  #
  config TRACEPOINTS
        bool
+       select TASKS_TRACE_RCU
source "kernel/Kconfig.kexec"


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


Reply via email to