[PATCH 00/14] entry: preempt_schedule_irq() callers scrub

2019-03-11 Thread Valentin Schneider
Hi,

This is the continuation of [1] where I'm hunting down
preempt_schedule_irq() callers because of [2].

I figured the best way to get this moving forward wouldn't be to write
documentation about it, but to write some fixes and get some discussion
going, which is what this patch set is about.

I've looked at the users of preempt_schedule_irq() and checked whether they
have one of those useless loops. The list of callers is:

$ grep -r -I "preempt_schedule_irq" arch/ | cut -d/ -f2 | sort | uniq

  arc
  arm
  arm64
  c6x
  csky
  h8300
  ia64
  m68k
  microblaze
  mips
  nds32
  nios2
  parisc
  powerpc
  riscv
  s390
  sh
  sparc
  x86
  xtensa

Regarding that loop, archs seem to fall in 3 categories:
A) Those that don't have the loop
B) Those that have a small need_resched() loop around the
   preempt_schedule_irq() callsite
C) Those that branch to some more generic code further up the entry code
   and eventually branch back to preempt_schedule_irq()

arc, m68k, nios2 fall in A)
sparc, ia64, s390 fall in C)
all the others fall in B)
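
For reference, the need_resched() loop already lives in the core function
itself, which is why any extra loop around the callsite is redundant. A
simplified sketch of preempt_schedule_irq() from kernel/sched/core.c
(context-tracking details elided):

  asmlinkage __visible void __sched preempt_schedule_irq(void)
  {
          /* Catch callers which need to be fixed */
          BUG_ON(preempt_count() || !irqs_disabled());

          do {
                  preempt_disable();
                  local_irq_enable();
                  __schedule(true);       /* preemption point */
                  local_irq_disable();
                  sched_preempt_enable_no_resched();
          } while (need_resched());
  }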

I've written patches for B) and C) EXCEPT for ia64 and s390 because I
haven't been able to tell if it's actually fine to kill that "long jump"
(and maybe I'm wrong on sparc). Hopefully folks who understand what goes on
in there might be able to shed some light.

Also, since I sent patches for arm & arm64 in [1] I'm not including them
here.

Boot-tested on:
- x86

Build-tested on:
- h8300
- c6x
- powerpc
- mips
- nds32
- microblaze
- sparc
- xtensa

Thanks,
Valentin

[1]: 
https://lore.kernel.org/lkml/20190131182339.9835-1-valentin.schnei...@arm.com/
[2]: https://lore.kernel.org/lkml/cc989920-a13b-d53b-db83-1584a7f53...@arm.com/

Valentin Schneider (14):
  sched/core: Fix preempt_schedule() interrupt return comment
  c6x: entry: Remove unneeded need_resched() loop
  csky: entry: Remove unneeded need_resched() loop
  h8300: entry: Remove unneeded need_resched() loop
  microblaze: entry: Remove unneeded need_resched() loop
  MIPS: entry: Remove unneeded need_resched() loop
  nds32: ex-exit: Remove unneeded need_resched() loop
  powerpc: entry: Remove unneeded need_resched() loop
  RISC-V: entry: Remove unneeded need_resched() loop
  sh: entry: Remove unneeded need_resched() loop
  sh64: entry: Remove unneeded need_resched() loop
  sparc64: rtrap: Remove unneeded need_resched() loop
  x86/entry: Remove unneeded need_resched() loop
  xtensa: entry: Remove unneeded need_resched() loop

 arch/c6x/kernel/entry.S| 3 +--
 arch/csky/kernel/entry.S   | 4 
 arch/h8300/kernel/entry.S  | 3 +--
 arch/microblaze/kernel/entry.S | 5 -
 arch/mips/kernel/entry.S   | 3 +--
 arch/nds32/kernel/ex-exit.S| 4 ++--
 arch/powerpc/kernel/entry_32.S | 6 +-
 arch/powerpc/kernel/entry_64.S | 8 +---
 arch/riscv/kernel/entry.S  | 3 +--
 arch/sh/kernel/cpu/sh5/entry.S | 5 +
 arch/sh/kernel/entry-common.S  | 4 +---
 arch/sparc/kernel/rtrap_64.S   | 1 -
 arch/x86/entry/entry_32.S  | 3 +--
 arch/x86/entry/entry_64.S  | 3 +--
 arch/xtensa/kernel/entry.S | 2 +-
 kernel/sched/core.c| 7 +++
 16 files changed, 16 insertions(+), 48 deletions(-)

--
2.20.1




Re: [PATCH 00/14] entry: preempt_schedule_irq() callers scrub

2019-03-12 Thread Valentin Schneider
On 12/03/2019 18:03, Vineet Gupta wrote:
[...]
>> Regarding that loop, archs seem to fall in 3 categories:
>> A) Those that don't have the loop
> 
> Please clarify that this is the right thing to do (since core code already
> has the loop) hence no fixing is required for this "category"
> 

Right, those don't need any change. I had a brief look at them to double
check they had the proper need_resched() gate before calling
preempt_schedule_irq() (with no loop) and they all seem fine. Also...

>> B) Those that have a small need_resched() loop around the
>>preempt_schedule_irq() callsite
>> C) Those that branch to some more generic code further up the entry code
>>and eventually branch back to preempt_schedule_irq()
>>
>> arc, m68k, nios2 fall in A)
> 

I forgot to include parisc in here.

[...]



[PATCH] sched: Initialize the idle task with preemption disabled

2021-05-12 Thread Valentin Schneider
As pointed out by commit

  de9b8f5dcbd9 ("sched: Fix crash trying to dequeue/enqueue the idle thread")

init_idle() can and will be invoked more than once on the same idle
task. At boot time, it is invoked for the boot CPU thread by
sched_init(). Then smp_init() creates the threads for all the secondary
CPUs and invokes init_idle() on them.

As the hotplug machinery brings the secondaries to life, it will issue
calls to idle_thread_get(), which itself invokes init_idle() yet again.
In this case it's invoked twice more per secondary: at _cpu_up(), and at
bringup_cpu().

Given smp_init() already initializes the idle tasks for all *possible*
CPUs, no further initialization should be required. Now, removing
init_idle() from idle_thread_get() exposes some interesting expectations
with regards to the idle task's preempt_count: the secondary startup always
issues a preempt_disable(), requiring some reset of the preempt count to 0
between hot-unplug and hotplug, which is currently served by
idle_thread_get() -> init_idle().

Given the idle task is supposed to have preemption disabled once and never
see it re-enabled, it seems that what we actually want is to initialize its
preempt_count to PREEMPT_DISABLED and leave it there. Do that, and remove
init_idle() from idle_thread_get().
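
The resulting idle_thread_get() then reduces to the following (sketch of
the kernel/smpboot.c change, see the diffstat below):

  static inline void *idle_thread_get(unsigned int cpu)
  {
          struct task_struct *tsk = per_cpu(idle_threads, cpu);

          if (!tsk)
                  return ERR_PTR(-ENOMEM);
          /* init_idle() call gone: no more preempt_count reset here */
          return tsk;
  }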

Secondary startups were patched via coccinelle:

  @begone@
  @@

  -preempt_disable();
  ...
  cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);

Signed-off-by: Valentin Schneider 
---
 arch/alpha/kernel/smp.c  | 1 -
 arch/arc/kernel/smp.c| 1 -
 arch/arm/kernel/smp.c| 1 -
 arch/arm64/include/asm/preempt.h | 2 +-
 arch/arm64/kernel/smp.c  | 1 -
 arch/csky/kernel/smp.c   | 1 -
 arch/ia64/kernel/smpboot.c   | 1 -
 arch/mips/kernel/smp.c   | 1 -
 arch/openrisc/kernel/smp.c   | 2 --
 arch/parisc/kernel/smp.c | 1 -
 arch/powerpc/kernel/smp.c| 1 -
 arch/riscv/kernel/smpboot.c  | 1 -
 arch/s390/include/asm/preempt.h  | 4 ++--
 arch/s390/kernel/smp.c   | 1 -
 arch/sh/kernel/smp.c | 2 --
 arch/sparc/kernel/smp_32.c   | 1 -
 arch/sparc/kernel/smp_64.c   | 3 ---
 arch/x86/include/asm/preempt.h   | 2 +-
 arch/x86/kernel/smpboot.c| 1 -
 arch/xtensa/kernel/smp.c | 1 -
 include/asm-generic/preempt.h| 2 +-
 init/main.c  | 6 +-
 kernel/fork.c| 2 +-
 kernel/sched/core.c  | 2 +-
 kernel/smpboot.c | 1 -
 25 files changed, 8 insertions(+), 34 deletions(-)

diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index f4dd9f3f3001..4b2575f936d4 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -166,7 +166,6 @@ smp_callin(void)
DBGS(("smp_callin: commencing CPU %d current %p active_mm %p\n",
  cpuid, current, current->active_mm));
 
-   preempt_disable();
cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
 }
 
diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
index 52906d314537..db0e104d6835 100644
--- a/arch/arc/kernel/smp.c
+++ b/arch/arc/kernel/smp.c
@@ -189,7 +189,6 @@ void start_kernel_secondary(void)
pr_info("## CPU%u LIVE ##: Executing Code...\n", cpu);
 
local_irq_enable();
-   preempt_disable();
cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
 }
 
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 74679240a9d8..c7bb168b0d97 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -432,7 +432,6 @@ asmlinkage void secondary_start_kernel(void)
 #endif
pr_debug("CPU%u: Booted secondary processor\n", cpu);
 
-   preempt_disable();
trace_hardirqs_off();
 
/*
diff --git a/arch/arm64/include/asm/preempt.h b/arch/arm64/include/asm/preempt.h
index 80e946b2abee..e83f0982b99c 100644
--- a/arch/arm64/include/asm/preempt.h
+++ b/arch/arm64/include/asm/preempt.h
@@ -23,7 +23,7 @@ static inline void preempt_count_set(u64 pc)
 } while (0)
 
 #define init_idle_preempt_count(p, cpu) do { \
-   task_thread_info(p)->preempt_count = PREEMPT_ENABLED; \
+   task_thread_info(p)->preempt_count = PREEMPT_DISABLED; \
 } while (0)
 
 static inline void set_preempt_need_resched(void)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 357590beaabb..48fd89256739 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -223,7 +223,6 @@ asmlinkage notrace void secondary_start_kernel(void)
init_gic_priority_masking();
 
rcu_cpu_starting(cpu);
-   preempt_disable();
trace_hardirqs_off();
 
/*
diff --git a/arch/csky/kernel/smp.c b/arch/csky/kernel/smp.c
index 0f9f5eef9338..e2993539af8e 100644
--- a/arch/csky/kernel/smp.c
+++ b/arch/csky/kernel/smp.c
@@ -281,7 +281,6 @@ void csky_start_secondary(void)
pr_info("CPU%u Online: %s...\n", cpu, __func__);
 
local_irq_enable();
-   preempt_disable();

[RFC PATCH 0/5] Generic IPI sending tracepoint

2022-10-07 Thread Valentin Schneider
Background
==========

Detecting IPI *reception* is relatively easy, e.g. using
trace_irq_handler_{entry,exit} or even just function-trace
flush_smp_call_function_queue() for SMP calls.  

Figuring out their *origin* is trickier, as there is no generic tracepoint tied
to e.g. smp_call_function():

o AFAIA x86 has no tracepoint tied to sending IPIs, only receiving them
  (cf. trace_call_function{_single}_entry()).
o arm/arm64 do have trace_ipi_raise(), which gives us the target cpus but also a
  mostly useless string (smp_calls will all be "Function call interrupts").
o Other architectures don't seem to have any IPI-sending related tracepoint.  

I believe one reason the tracepoints used by arm/arm64 ended up as they did
is that these archs used to handle IPIs differently from regular interrupts
(the IRQ driver would directly invoke an IPI-handling routine), which meant
they never showed up in trace_irq_handler_{entry, exit}. The
trace_ipi_{entry,exit} tracepoints gave a way to trace IPI reception, but
those have become redundant as of:

  56afcd3dbd19 ("ARM: Allow IPIs to be handled as normal interrupts")
  d3afc7f12987 ("arm64: Allow IPIs to be handled as normal interrupts")

which gave IPIs a "proper" handler function used through
generic_handle_domain_irq(), which makes them show up via
trace_irq_handler_{entry, exit}.

Changing stuff up
=================

Per the above, it would make sense to reshuffle trace_ipi_raise() and move it
into generic code. This also came up during Daniel's talk on Osnoise at the CPU
isolation MC of LPC 2022 [1]. 

Now, to be useful, such a tracepoint needs to export:
o targeted CPU(s)
o calling context

The only way to get the calling context with trace_ipi_raise() is to trigger a
stack dump, e.g. $(trace-cmd -e ipi* -T echo 42).

As for the targeted CPUs, the existing tracepoint does export them, albeit in
cpumask form, which is quite inconvenient from a tooling perspective. For
instance, as far as I'm aware, it's not possible to do event filtering on a
cpumask via trace-cmd.

Because of the above points, this series introduces a new tracepoint.

Patches
=======

This results in having trace events for:

o smp_call_function*()
o smp_send_reschedule()
o irq_work_queue*()

This is incomplete; just looking at arm64, there are more IPI types that aren't
covered:

  IPI_CPU_STOP,
  IPI_CPU_CRASH_STOP,
  IPI_TIMER,
  IPI_WAKEUP,

... But it feels like a good starting point.

Another thing worth mentioning is that depending on the callsite, the _RET_IP_
fed to the tracepoint is not always useful - generic_exec_single() doesn't tell
you much about the actual callback being sent via IPI, so there might be value
in exploding the single tracepoint into at least one variant for smp_calls.

Links
=====

[1]: https://youtu.be/5gT57y4OzBM?t=14234

Valentin Schneider (5):
  trace: Add trace_ipi_send_{cpu, cpumask}
  sched, smp: Trace send_call_function_single_ipi()
  smp: Add a multi-CPU variant to send_call_function_single_ipi()
  irq_work: Trace calls to arch_irq_work_raise()
  treewide: Rename and trace arch-definitions of smp_send_reschedule()

 arch/alpha/kernel/smp.c  |  2 +-
 arch/arc/kernel/smp.c|  2 +-
 arch/arm/kernel/smp.c|  5 +
 arch/arm64/kernel/smp.c  |  3 +--
 arch/csky/kernel/smp.c   |  2 +-
 arch/hexagon/kernel/smp.c|  2 +-
 arch/ia64/kernel/smp.c   |  4 ++--
 arch/loongarch/include/asm/smp.h |  2 +-
 arch/mips/include/asm/smp.h  |  2 +-
 arch/openrisc/kernel/smp.c   |  2 +-
 arch/parisc/kernel/smp.c |  4 ++--
 arch/powerpc/kernel/smp.c|  4 ++--
 arch/riscv/kernel/smp.c  |  4 ++--
 arch/s390/kernel/smp.c   |  2 +-
 arch/sh/kernel/smp.c |  2 +-
 arch/sparc/kernel/smp_32.c   |  2 +-
 arch/sparc/kernel/smp_64.c   |  2 +-
 arch/x86/include/asm/smp.h   |  2 +-
 arch/xtensa/kernel/smp.c |  2 +-
 include/linux/smp.h  |  1 +
 include/trace/events/ipi.h   | 27 +++
 kernel/irq_work.c| 12 +++-
 kernel/sched/core.c  |  7 +--
 kernel/smp.c | 18 +-
 24 files changed, 84 insertions(+), 31 deletions(-)

--
2.31.1




[RFC PATCH 2/5] sched, smp: Trace send_call_function_single_ipi()

2022-10-07 Thread Valentin Schneider
send_call_function_single_ipi() is the thing that sends IPIs at the bottom
of smp_call_function*() via either generic_exec_single() or
smp_call_function_many_cond(). Give it an IPI-related tracepoint.

Note that this ends up tracing any IPI sent via __smp_call_single_queue(),
which covers __ttwu_queue_wakelist() and irq_work_queue_on() "for free".

Signed-off-by: Valentin Schneider 
---
 arch/arm/kernel/smp.c   | 3 ---
 arch/arm64/kernel/smp.c | 1 -
 kernel/sched/core.c | 7 +--
 kernel/smp.c| 4 
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 978db2d96b44..3b280d55c1c4 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -48,9 +48,6 @@
 #include 
 #include 
 
-#define CREATE_TRACE_POINTS
-#include <trace/events/ipi.h>
-
 /*
  * as from 2.5, kernels no longer have an init_tasks structure
  * so we need some other way of telling a new secondary core
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index ffc5d76cf695..937d2623e06b 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -51,7 +51,6 @@
 #include 
 #include 
 
-#define CREATE_TRACE_POINTS
 #include <trace/events/ipi.h>
 
 DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 60fdc0faf1c9..14e5e137172f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -81,6 +81,7 @@
 #include 
 #include 
 #undef CREATE_TRACE_POINTS
+#include <trace/events/ipi.h>
 
 #include "sched.h"
 #include "stats.h"
@@ -3753,10 +3754,12 @@ void send_call_function_single_ipi(int cpu)
 {
struct rq *rq = cpu_rq(cpu);
 
-   if (!set_nr_if_polling(rq->idle))
+   if (!set_nr_if_polling(rq->idle)) {
+   trace_ipi_send_cpu(_RET_IP_, cpu);
arch_send_call_function_single_ipi(cpu);
-   else
+   } else {
trace_sched_wake_idle_without_ipi(cpu);
+   }
 }
 
 /*
diff --git a/kernel/smp.c b/kernel/smp.c
index e8cdc025a046..7a7a22d69972 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -26,6 +26,10 @@
 #include 
 #include 
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/ipi.h>
+#undef CREATE_TRACE_POINTS
+
 #include "smpboot.h"
 #include "sched/smp.h"
 
-- 
2.31.1




[RFC PATCH 1/5] trace: Add trace_ipi_send_{cpu, cpumask}

2022-10-07 Thread Valentin Schneider
trace_ipi_raise is unsuitable for generically tracing IPI sources; add a
variant of it that takes a callsite and a CPU. Define a macro helper for
handling IPIs sent to multiple CPUs.

Signed-off-by: Valentin Schneider 
---
 include/trace/events/ipi.h | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/include/trace/events/ipi.h b/include/trace/events/ipi.h
index 0be71dad6ec0..fd2f2aeb36fe 100644
--- a/include/trace/events/ipi.h
+++ b/include/trace/events/ipi.h
@@ -35,6 +35,33 @@ TRACE_EVENT(ipi_raise,
TP_printk("target_mask=%s (%s)", __get_bitmask(target_cpus), 
__entry->reason)
 );
 
+TRACE_EVENT(ipi_send_cpu,
+
+   TP_PROTO(unsigned long callsite, unsigned int cpu),
+
+   TP_ARGS(callsite, cpu),
+
+   TP_STRUCT__entry(
+   __field(unsigned long, callsite)
+   __field(unsigned int, cpu)
+   ),
+
+   TP_fast_assign(
+   __entry->callsite = callsite;
+   __entry->cpu  = cpu;
+   ),
+
+   TP_printk("callsite=%pS target_cpu=%d", (void *)__entry->callsite, 
__entry->cpu)
+);
+
+#define trace_ipi_send_cpumask(callsite, mask) do {\
+   if (static_key_false(&__tracepoint_ipi_send_cpu.key)) { \
+   int cpu;\
+   for_each_cpu(cpu, mask) \
+   trace_ipi_send_cpu(callsite, cpu);  \
+   }   \
+} while (0)
+
 DECLARE_EVENT_CLASS(ipi_handler,
 
TP_PROTO(const char *reason),
-- 
2.31.1




[RFC PATCH 4/5] irq_work: Trace calls to arch_irq_work_raise()

2022-10-07 Thread Valentin Schneider
Adding a tracepoint to send_call_function_single_ipi() covers
irq_work_queue_on() when the CPU isn't the local one - add a tracepoint to
irq_work_raise() to cover the other cases.

Signed-off-by: Valentin Schneider 
---
 kernel/irq_work.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 7afa40fe5cc4..5a550b681878 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -22,6 +22,8 @@
 #include 
 #include 
 
+#include <trace/events/ipi.h>
+
 static DEFINE_PER_CPU(struct llist_head, raised_list);
 static DEFINE_PER_CPU(struct llist_head, lazy_list);
 static DEFINE_PER_CPU(struct task_struct *, irq_workd);
@@ -74,6 +76,14 @@ void __weak arch_irq_work_raise(void)
 */
 }
 
+static inline void irq_work_raise(void)
+{
+   if (arch_irq_work_has_interrupt())
+   trace_ipi_send_cpu(_RET_IP_, smp_processor_id());
+
+   arch_irq_work_raise();
+}
+
 /* Enqueue on current CPU, work must already be claimed and preempt disabled */
 static void __irq_work_queue_local(struct irq_work *work)
 {
@@ -99,7 +109,7 @@ static void __irq_work_queue_local(struct irq_work *work)
 
/* If the work is "lazy", handle it from next tick if any */
if (!lazy_work || tick_nohz_tick_stopped())
-   arch_irq_work_raise();
+   irq_work_raise();
 }
 
 /* Enqueue the irq work @work on the current CPU */
-- 
2.31.1




[RFC PATCH 3/5] smp: Add a multi-CPU variant to send_call_function_single_ipi()

2022-10-07 Thread Valentin Schneider
This simply wraps around the arch function and prepends it with a
tracepoint, bringing parity with send_call_function_single_ipi().

Signed-off-by: Valentin Schneider 
---
 kernel/smp.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 7a7a22d69972..387735180aed 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -160,6 +160,12 @@ void __init call_function_init(void)
smpcfd_prepare_cpu(smp_processor_id());
 }
 
+static inline void send_call_function_ipi_mask(const struct cpumask *mask)
+{
+   trace_ipi_send_cpumask(_RET_IP_, mask);
+   arch_send_call_function_ipi_mask(mask);
+}
+
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
 
 static DEFINE_STATIC_KEY_FALSE(csdlock_debug_enabled);
@@ -970,7 +976,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
if (nr_cpus == 1)
send_call_function_single_ipi(last_cpu);
else if (likely(nr_cpus > 1))
-   arch_send_call_function_ipi_mask(cfd->cpumask_ipi);
+   send_call_function_ipi_mask(cfd->cpumask_ipi);
 
	cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->pinged, this_cpu, CFD_SEQ_NOCPU, CFD_SEQ_PINGED);
}
-- 
2.31.1




[RFC PATCH 5/5] treewide: Rename and trace arch-definitions of smp_send_reschedule()

2022-10-07 Thread Valentin Schneider
To be able to trace invocations of smp_send_reschedule(), rename the
arch-specific definitions of it to arch_smp_send_reschedule() and wrap it
into an smp_send_reschedule() that contains a tracepoint.
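
The generic wrapper lands in kernel/smp.c; roughly the following (a
sketch, since the kernel/smp.c hunk is truncated below - the callsite is
the useful part here, as scheduler_ipi() is inline and has no address
worth recording):

  void smp_send_reschedule(int cpu)
  {
          trace_ipi_send_cpu(_RET_IP_, cpu);
          arch_smp_send_reschedule(cpu);
  }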

Signed-off-by: Valentin Schneider 
---
 arch/alpha/kernel/smp.c  | 2 +-
 arch/arc/kernel/smp.c| 2 +-
 arch/arm/kernel/smp.c| 2 +-
 arch/arm64/kernel/smp.c  | 2 +-
 arch/csky/kernel/smp.c   | 2 +-
 arch/hexagon/kernel/smp.c| 2 +-
 arch/ia64/kernel/smp.c   | 4 ++--
 arch/loongarch/include/asm/smp.h | 2 +-
 arch/mips/include/asm/smp.h  | 2 +-
 arch/openrisc/kernel/smp.c   | 2 +-
 arch/parisc/kernel/smp.c | 4 ++--
 arch/powerpc/kernel/smp.c| 4 ++--
 arch/riscv/kernel/smp.c  | 4 ++--
 arch/s390/kernel/smp.c   | 2 +-
 arch/sh/kernel/smp.c | 2 +-
 arch/sparc/kernel/smp_32.c   | 2 +-
 arch/sparc/kernel/smp_64.c   | 2 +-
 arch/x86/include/asm/smp.h   | 2 +-
 arch/xtensa/kernel/smp.c | 2 +-
 include/linux/smp.h  | 1 +
 kernel/smp.c | 6 ++
 21 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index f4e20f75438f..38637eb9eebd 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -562,7 +562,7 @@ handle_ipi(struct pt_regs *regs)
 }
 
 void
-smp_send_reschedule(int cpu)
+arch_smp_send_reschedule(int cpu)
 {
 #ifdef DEBUG_IPI_MSG
if (cpu == hard_smp_processor_id())
diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
index ab9e75e90f72..ae2e6a312361 100644
--- a/arch/arc/kernel/smp.c
+++ b/arch/arc/kernel/smp.c
@@ -292,7 +292,7 @@ static void ipi_send_msg(const struct cpumask *callmap, enum ipi_msg_type msg)
ipi_send_msg_one(cpu, msg);
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
ipi_send_msg_one(cpu, IPI_RESCHEDULE);
 }
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 3b280d55c1c4..f216ac890b6f 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -745,7 +745,7 @@ void __init set_smp_ipi_range(int ipi_base, int n)
ipi_setup(smp_processor_id());
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 937d2623e06b..8d108edc4a89 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -976,7 +976,7 @@ void __init set_smp_ipi_range(int ipi_base, int n)
ipi_setup(smp_processor_id());
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/csky/kernel/smp.c b/arch/csky/kernel/smp.c
index 4b605aa2e1d6..fd7f81be16dd 100644
--- a/arch/csky/kernel/smp.c
+++ b/arch/csky/kernel/smp.c
@@ -140,7 +140,7 @@ void smp_send_stop(void)
on_each_cpu(ipi_stop, NULL, 1);
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
send_ipi_message(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/hexagon/kernel/smp.c b/arch/hexagon/kernel/smp.c
index 4ba93e59370c..4e8bee25b8c6 100644
--- a/arch/hexagon/kernel/smp.c
+++ b/arch/hexagon/kernel/smp.c
@@ -217,7 +217,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
}
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
send_ipi(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/ia64/kernel/smp.c b/arch/ia64/kernel/smp.c
index e2cc59db86bc..ea4f009a232b 100644
--- a/arch/ia64/kernel/smp.c
+++ b/arch/ia64/kernel/smp.c
@@ -220,11 +220,11 @@ kdump_smp_send_init(void)
  * Called with preemption disabled.
  */
 void
-smp_send_reschedule (int cpu)
+arch_smp_send_reschedule (int cpu)
 {
ia64_send_ipi(cpu, IA64_IPI_RESCHEDULE, IA64_IPI_DM_INT, 0);
 }
-EXPORT_SYMBOL_GPL(smp_send_reschedule);
+EXPORT_SYMBOL_GPL(arch_smp_send_reschedule);
 
 /*
  * Called with preemption disabled.
diff --git a/arch/loongarch/include/asm/smp.h b/arch/loongarch/include/asm/smp.h
index 71189b28bfb2..3fcca134dfb1 100644
--- a/arch/loongarch/include/asm/smp.h
+++ b/arch/loongarch/include/asm/smp.h
@@ -83,7 +83,7 @@ extern void show_ipi_list(struct seq_file *p, int prec);
  * it goes straight through and wastes no time serializing
  * anything. Worst case is that we lose a reschedule ...
  */
-static inline void smp_send_reschedule(int cpu)
+static inline void arch_smp_send_reschedule(int cpu)
 {
loongson3_send_ipi_single(cpu, SMP_RESCHEDULE);
 }
diff --git a/arch/mips/include/asm/smp.h b/arch/mips/include/asm/smp.h
index 5d9ff61004ca..9806e79895d9 100644
--- a/arch/mips/include/asm/smp.h
+++ b/arch/mips/include/asm/smp.h
@@ -66,7 +66,7 @@ extern void calculate_cpu_foreign_map(void);
  * it goes straight through and wastes no time serializing
  * anything. Worst case is that we lose a

Re: [RFC PATCH 4/5] irq_work: Trace calls to arch_irq_work_raise()

2022-10-11 Thread Valentin Schneider
On 08/10/22 15:34, Steven Rostedt wrote:
> On Fri,  7 Oct 2022 16:45:32 +0100
> Valentin Schneider  wrote:
>>  }
>>  
>> +static inline void irq_work_raise(void)
>> +{
>> +if (arch_irq_work_has_interrupt())
>> +trace_ipi_send_cpu(_RET_IP_, smp_processor_id());
>
> To save on the branch, let's make the above:
>
>   if (trace_ipi_send_cpu_enabled() && arch_irq_work_has_interrupt())
>
> As the "trace_*_enabled()" is a static branch that will make it a nop
> when the tracepoint is not enabled.
>

Makes sense, thanks for the suggestion.

> -- Steve
>
>
>> +
>> +arch_irq_work_raise();
>> +}
>> +
>>  /* Enqueue on current CPU, work must already be claimed and preempt 
>> disabled */
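
For the record, irq_work_raise() with that suggestion folded in would look
something like this (a sketch; trace_ipi_send_cpu_enabled() is the
static-key test generated for the tracepoint, so the whole thing compiles
down to a NOP when the event is off):

  static inline void irq_work_raise(void)
  {
          if (trace_ipi_send_cpu_enabled() && arch_irq_work_has_interrupt())
                  trace_ipi_send_cpu(_RET_IP_, smp_processor_id());

          arch_irq_work_raise();
  }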




Re: [RFC PATCH 0/5] Generic IPI sending tracepoint

2022-10-11 Thread Valentin Schneider
+Cc Douglas

On 07/10/22 17:01, Marcelo Tosatti wrote:
> Hi Valentin,
>
> On Fri, Oct 07, 2022 at 04:41:40PM +0100, Valentin Schneider wrote:
>> Background
>> ==
>> 
>> As for the targeted CPUs, the existing tracepoint does export them, albeit in
>> cpumask form, which is quite inconvenient from a tooling perspective. For
>> instance, as far as I'm aware, it's not possible to do event filtering on a
>> cpumask via trace-cmd.
>
> https://man7.org/linux/man-pages/man1/trace-cmd-set.1.html
>
>-f filter
>Specify a filter for the previous event. This must come after
>a -e. This will filter what events get recorded based on the
>content of the event. Filtering is passed to the kernel
>directly so what filtering is allowed may depend on what
>version of the kernel you have. Basically, it will let you
>use C notation to check if an event should be processed or
>not.
>
>==, >=, <=, >, <, &, |, && and ||
>
>The above are usually safe to use to compare fields.
>
> This looks overkill to me (consider large number of bits set in mask).
>
> +#define trace_ipi_send_cpumask(callsite, mask) do {\
> + if (static_key_false(&__tracepoint_ipi_send_cpu.key)) { \
> +   int cpu;\
> +   for_each_cpu(cpu, mask) \
> +   trace_ipi_send_cpu(callsite, cpu);\
> + }   \
> +} while (0)
>

Indeed, I expected pushback on this :-)

I went for this due to how much simpler an int is to process/use compared
to a cpumask. There is the trigger example I listed above, but also the
consumption of the trace event itself.

Consider this event collected on an arm64 QEMU instance (output from trace-cmd):

    <...>-234   [001]    37.251567: ipi_raise: target_mask=00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000004 (Function call interrupts)

That sort of formatting has been an issue downstream for things like LISA
[1] where events are aggregated into Pandas tables, and we need to play
silly games for performance reasons because bitmasks aren't a native Python
type.

I had a look at libtraceevent to see how this data is exposed and if the
answer would be better tooling:

tep_get_field_val() just yields an unsigned long long of value 0x200018,
which AFAICT is just the [length, offset] thing associated with dynamic
arrays. Not really usable, and I don't see anything exported in the lib to
extract and use those values.

tep_get_field_raw() is better: it handles the dynamic array for us and
yields a pointer to the cpumask array at the tail of the record. With that
it's easy to get an output such as: cpumask[size=32]=[4,0,0,0,]. Still,
this isn't a native type for many programming languages.
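
For illustration, a minimal consumer sketch using that accessor (as I
understand the libtraceevent API; the bit-twiddling is on us, and
"target_cpus" is the ipi_raise field name):

  #include <stdio.h>
  #include <event-parse.h>

  #define BITS_PER_LONG (8 * sizeof(unsigned long))

  static void print_ipi_targets(struct tep_event *event,
                                struct tep_record *record)
  {
          unsigned long *mask;
          int len;

          /* Resolves the __data_loc indirection for us */
          mask = tep_get_field_raw(NULL, event, "target_cpus",
                                   record, &len, 0);
          if (!mask)
                  return;

          /* 'len' is in bytes, one bit per CPU */
          for (int cpu = 0; cpu < len * 8; cpu++) {
                  if (mask[cpu / BITS_PER_LONG] & (1UL << (cpu % BITS_PER_LONG)))
                          printf("CPU%d targeted\n", cpu);
          }
  }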

In contrast, this is immediately readable and consumable by userspace tools:

    <...>-234   [001]    37.250882: ipi_send_cpu: callsite=__smp_call_single_queue+0x5c target_cpu=2

Thinking out loud, it makes way more sense to record a cpumask in the
tracepoint, but perhaps we could have a postprocessing step to transform
those into N events each targeting a single CPU?

[1]: 
https://github.com/ARM-software/lisa/blob/37b51243a94b27ea031ff62bb4ce818a59a7f6ef/lisa/trace.py#L4756

>
>> 
>> Because of the above points, this is introducing a new tracepoint.
>> 
>> Patches
>> ===
>> 
>> This results in having trace events for:
>> 
>> o smp_call_function*()
>> o smp_send_reschedule()
>> o irq_work_queue*()
>> 
>> This is incomplete, just looking at arm64 there's more IPI types that aren't 
>> covered:
>> 
>>   IPI_CPU_STOP,
>>   IPI_CPU_CRASH_STOP,
>>   IPI_TIMER,
>>   IPI_WAKEUP,
>> 
>> ... But it feels like a good starting point.
>
> Can't you have a single tracepoint (or variant with cpumask) that would
> cover such cases as well?
>
> Maybe (as parameters for tracepoint):
>
>   * type (reschedule, smp_call_function, timer, wakeup, ...).
>
>   * function address: valid for smp_call_function, irq_work_queue
> types.
>

That's a good point, I wasn't sure about having a parameter serving as
discriminant for another, but the function address would be either valid or
NULL which is fine. So perhaps:
o callsite (i.e. _RET_IP_), serves as type
o address of callback tied to IPI, if any
o target CPUs
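
Something like the following, perhaps (a sketch only; the exact cpumask
recording helpers are TBD at this point - v2 below ends up with dedicated
__cpumask() helpers that look like this):

  TRACE_EVENT(ipi_send_cpumask,

          TP_PROTO(const struct cpumask *cpumask, unsigned long callsite, void *callback),

          TP_ARGS(cpumask, callsite, callback),

          TP_STRUCT__entry(
                  __cpumask(cpumask)
                  __field(void *, callsite)
                  __field(void *, callback)
          ),

          TP_fast_assign(
                  __assign_cpumask(cpumask, cpumask_bits(cpumask));
                  __entry->callsite = (void *)callsite;
                  __entry->callback = callback;
          ),

          TP_printk("cpumask=%s callsite=%pS callback=%pS",
                    __get_cpumask(cpumask), __entry->callsite, __entry->callback)
  );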

>> Another thing worth mentioning is that depending on the callsite, the 
>> _RET_IP_
>> fed to the tracepoint is not alway

Re: [RFC PATCH 0/5] Generic IPI sending tracepoint

2022-10-11 Thread Valentin Schneider
On 11/10/22 18:22, Daniel Bristot de Oliveira wrote:
> On 10/11/22 18:17, Valentin Schneider wrote:
>> Thinking out loud, it makes way more sense to record a cpumask in the
>> tracepoint, but perhaps we could have a postprocessing step to transform
>> those into N events each targeting a single CPU?
>
> My approach on the tracers/rtla is to make the simple things in kernel, and 
> beautify
> things in user-space.
>
> You could keep the tracepoint as a mask, and then make it pretty, like 
> cpus=3-5,8
> in user-space. For example with a trace-cmd/perf loadable plugin, libtracefs 
> helper.
>

That's a nice idea. The one downside I see is that it means registering an
event handler for all events with cpumasks rather than directly targeting
cpumask fields, but that doesn't look too horrible. I'll dig a bit in that
direction.

> For rtla I was thinking to make a new tool to parse them. and make it pretty 
> there.
>
> -- Daniel




[RFC PATCH v2 1/8] DO-NOT-MERGE: tracing: Add __cpumask to denote a trace event field that is a cpumask_t

2022-11-02 Thread Valentin Schneider
From: "Steven Rostedt (Google)" 

The trace events have a __bitmask field that can be used for anything
that requires bitmasks. Although currently it is only used for CPU
masks, it could be used in the future for any type of bitmasks.

There is some user space tooling that wants to know if a field is a CPU
mask and not just some random unsigned long bitmask. Introduce
"__cpumask()" helper functions that work the same as the current
__bitmask() helpers but displays in the format file:

  field:__data_loc cpumask_t *[] mask;  offset:36;  size:4; signed:0;

Instead of:

  field:__data_loc unsigned long[] mask;  offset:32;  size:4; signed:0;

The main difference is the type. Instead of "unsigned long" it is
"cpumask_t *". Note, this type field needs to be a real type in the
__dynamic_array() logic that both __cpumask and __bitmask use, but the
comparison field requires it to be a scalar type whereas cpumask_t is a
structure (non-scalar). But everything works when making it a pointer.

Valentin added changes to remove the need of passing in "nr_bits" and the
__cpumask will always use nr_cpumask_bits as its size.

Requested-by: Valentin Schneider 
Reviewed-by: Valentin Schneider 
Signed-off-by: Valentin Schneider 
Signed-off-by: Steven Rostedt (Google) 
---
 include/trace/bpf_probe.h|  6 
 include/trace/perf.h |  6 
 include/trace/stages/stage1_struct_define.h  |  6 
 include/trace/stages/stage2_data_offsets.h   |  6 
 include/trace/stages/stage3_trace_output.h   |  6 
 include/trace/stages/stage4_event_fields.h   |  6 
 include/trace/stages/stage5_get_offsets.h|  6 
 include/trace/stages/stage6_event_callback.h | 20 
 include/trace/stages/stage7_class_define.h   |  2 ++
 samples/trace_events/trace-events-sample.c   |  2 +-
 samples/trace_events/trace-events-sample.h   | 34 +++-
 11 files changed, 91 insertions(+), 9 deletions(-)

diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 6a13220d2d27b..155c495b89ead 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -21,6 +21,9 @@
 #undef __get_bitmask
 #define __get_bitmask(field) (char *)__get_dynamic_array(field)

+#undef __get_cpumask
+#define __get_cpumask(field) (char *)__get_dynamic_array(field)
+
 #undef __get_sockaddr
 #define __get_sockaddr(field) ((struct sockaddr *)__get_dynamic_array(field))

@@ -40,6 +43,9 @@
 #undef __get_rel_bitmask
 #define __get_rel_bitmask(field) (char *)__get_rel_dynamic_array(field)

+#undef __get_rel_cpumask
+#define __get_rel_cpumask(field) (char *)__get_rel_dynamic_array(field)
+
 #undef __get_rel_sockaddr
 #define __get_rel_sockaddr(field) ((struct sockaddr 
*)__get_rel_dynamic_array(field))

diff --git a/include/trace/perf.h b/include/trace/perf.h
index 5800d13146c3d..8f3bf1e177070 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -21,6 +21,9 @@
 #undef __get_bitmask
 #define __get_bitmask(field) (char *)__get_dynamic_array(field)

+#undef __get_cpumask
+#define __get_cpumask(field) (char *)__get_dynamic_array(field)
+
 #undef __get_sockaddr
 #define __get_sockaddr(field) ((struct sockaddr *)__get_dynamic_array(field))

@@ -41,6 +44,9 @@
 #undef __get_rel_bitmask
 #define __get_rel_bitmask(field) (char *)__get_rel_dynamic_array(field)

+#undef __get_rel_cpumask
+#define __get_rel_cpumask(field) (char *)__get_rel_dynamic_array(field)
+
 #undef __get_rel_sockaddr
 #define __get_rel_sockaddr(field) ((struct sockaddr 
*)__get_rel_dynamic_array(field))

diff --git a/include/trace/stages/stage1_struct_define.h b/include/trace/stages/stage1_struct_define.h
index 1b7bab60434c1..69e0dae453bfa 100644
--- a/include/trace/stages/stage1_struct_define.h
+++ b/include/trace/stages/stage1_struct_define.h
@@ -32,6 +32,9 @@
 #undef __bitmask
 #define __bitmask(item, nr_bits) __dynamic_array(char, item, -1)

+#undef __cpumask
+#define __cpumask(item) __dynamic_array(char, item, -1)
+
 #undef __sockaddr
 #define __sockaddr(field, len) __dynamic_array(u8, field, len)

@@ -47,6 +50,9 @@
 #undef __rel_bitmask
 #define __rel_bitmask(item, nr_bits) __rel_dynamic_array(char, item, -1)

+#undef __rel_cpumask
+#define __rel_cpumask(item) __rel_dynamic_array(char, item, -1)
+
 #undef __rel_sockaddr
 #define __rel_sockaddr(field, len) __rel_dynamic_array(u8, field, len)

diff --git a/include/trace/stages/stage2_data_offsets.h b/include/trace/stages/stage2_data_offsets.h
index 1b7a8f764fddd..469b6a64293de 100644
--- a/include/trace/stages/stage2_data_offsets.h
+++ b/include/trace/stages/stage2_data_offsets.h
@@ -38,6 +38,9 @@
 #undef __bitmask
 #define __bitmask(item, nr_bits) __dynamic_array(unsigned long, item, -1)

+#undef __cpumask
+#define __cpumask(item) __dynamic_array(unsigned long, item, -1)
+
 #undef __sockaddr
 #define __sockaddr(field, len) __dynamic_array(u8, field, len)

@@ -53,5 +56,8 @@
 #undef __rel_bitmask
 #defin

[RFC PATCH v2 2/8] trace: Add trace_ipi_send_cpumask()

2022-11-02 Thread Valentin Schneider
trace_ipi_raise() is unsuitable for generically tracing IPI sources due to
its "reason" argument being an uninformative string (on arm64 all you get
is "Function call interrupts" for SMP calls).

Add a variant of it that exports a target cpumask, a callsite and a
callback.

Signed-off-by: Valentin Schneider 
---
 include/trace/events/ipi.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/include/trace/events/ipi.h b/include/trace/events/ipi.h
index 0be71dad6ec03..b1125dc27682c 100644
--- a/include/trace/events/ipi.h
+++ b/include/trace/events/ipi.h
@@ -35,6 +35,28 @@ TRACE_EVENT(ipi_raise,
TP_printk("target_mask=%s (%s)", __get_bitmask(target_cpus), 
__entry->reason)
 );
 
+TRACE_EVENT(ipi_send_cpumask,
+
+   TP_PROTO(const struct cpumask *cpumask, unsigned long callsite, void *callback),
+
+   TP_ARGS(cpumask, callsite, callback),
+
+   TP_STRUCT__entry(
+   __cpumask(cpumask)
+   __field(void *, callsite)
+   __field(void *, callback)
+   ),
+
+   TP_fast_assign(
+   __assign_cpumask(cpumask, cpumask_bits(cpumask));
+   __entry->callsite = (void *)callsite;
+   __entry->callback = callback;
+   ),
+
+   TP_printk("cpumask=%s callsite=%pS callback=%pS",
+ __get_cpumask(cpumask), __entry->callsite, __entry->callback)
+);
+
 DECLARE_EVENT_CLASS(ipi_handler,
 
TP_PROTO(const char *reason),
-- 
2.31.1




[RFC PATCH v2 3/8] sched, smp: Trace IPIs sent via send_call_function_single_ipi()

2022-11-02 Thread Valentin Schneider
send_call_function_single_ipi() is the thing that sends IPIs at the bottom
of smp_call_function*() via either generic_exec_single() or
smp_call_function_many_cond(). Give it an IPI-related tracepoint.

Note that this ends up tracing any IPI sent via __smp_call_single_queue(),
which covers __ttwu_queue_wakelist() and irq_work_queue_on() "for free".

Signed-off-by: Valentin Schneider 
---
 arch/arm/kernel/smp.c   | 3 ---
 arch/arm64/kernel/smp.c | 1 -
 kernel/sched/core.c | 7 +--
 kernel/smp.c| 4 
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 978db2d96b446..3b280d55c1c40 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -48,9 +48,6 @@
 #include 
 #include 
 
-#define CREATE_TRACE_POINTS
-#include <trace/events/ipi.h>
-
 /*
  * as from 2.5, kernels no longer have an init_tasks structure
  * so we need some other way of telling a new secondary core
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index ffc5d76cf6955..937d2623e06ba 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -51,7 +51,6 @@
 #include 
 #include 
 
-#define CREATE_TRACE_POINTS
 #include <trace/events/ipi.h>
 
 DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cb2aa2b54c7a4..02181f8072b5f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -81,6 +81,7 @@
 #include 
 #include 
 #undef CREATE_TRACE_POINTS
+#include <trace/events/ipi.h>
 
 #include "sched.h"
 #include "stats.h"
@@ -3746,10 +3747,12 @@ void send_call_function_single_ipi(int cpu)
 {
struct rq *rq = cpu_rq(cpu);
 
-   if (!set_nr_if_polling(rq->idle))
+   if (!set_nr_if_polling(rq->idle)) {
+   trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, NULL);
arch_send_call_function_single_ipi(cpu);
-   else
+   } else {
trace_sched_wake_idle_without_ipi(cpu);
+   }
 }
 
 /*
diff --git a/kernel/smp.c b/kernel/smp.c
index 06a413987a14a..e2ca1e2f31274 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -26,6 +26,10 @@
 #include 
 #include 
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/ipi.h>
+#undef CREATE_TRACE_POINTS
+
 #include "smpboot.h"
 #include "sched/smp.h"
 
-- 
2.31.1




[RFC PATCH v2 4/8] smp: Trace IPIs sent via arch_send_call_function_ipi_mask()

2022-11-02 Thread Valentin Schneider
This simply wraps around the arch function and prepends it with a
tracepoint, similar to send_call_function_single_ipi().

Signed-off-by: Valentin Schneider 
---
 kernel/smp.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index e2ca1e2f31274..c4d561cf50d45 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -160,6 +160,13 @@ void __init call_function_init(void)
smpcfd_prepare_cpu(smp_processor_id());
 }
 
+static inline void
+send_call_function_ipi_mask(const struct cpumask *mask)
+{
+   trace_ipi_send_cpumask(mask, _RET_IP_, func);
+   arch_send_call_function_ipi_mask(mask);
+}
+
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
 
 static DEFINE_STATIC_KEY_FALSE(csdlock_debug_enabled);
@@ -970,7 +977,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
if (nr_cpus == 1)
send_call_function_single_ipi(last_cpu);
else if (likely(nr_cpus > 1))
-   arch_send_call_function_ipi_mask(cfd->cpumask_ipi);
+   send_call_function_ipi_mask(cfd->cpumask_ipi);
 
	cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->pinged, this_cpu, CFD_SEQ_NOCPU, CFD_SEQ_PINGED);
}
-- 
2.31.1




[RFC PATCH v2 5/8] irq_work: Trace self-IPIs sent via arch_irq_work_raise()

2022-11-02 Thread Valentin Schneider
IPIs sent to remote CPUs via irq_work_queue_on() are now covered by
trace_ipi_send_cpumask(), add another instance of the tracepoint to cover
self-IPIs.

Signed-off-by: Valentin Schneider 
---
 kernel/irq_work.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 7afa40fe5cc43..aec38c294ce68 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -22,6 +22,8 @@
 #include 
 #include 
 
+#include <trace/events/ipi.h>
+
 static DEFINE_PER_CPU(struct llist_head, raised_list);
 static DEFINE_PER_CPU(struct llist_head, lazy_list);
 static DEFINE_PER_CPU(struct task_struct *, irq_workd);
@@ -74,6 +76,16 @@ void __weak arch_irq_work_raise(void)
 */
 }
 
+static inline void irq_work_raise(struct irq_work *work)
+{
+   if (trace_ipi_send_cpumask_enabled() && arch_irq_work_has_interrupt())
+   trace_ipi_send_cpumask(cpumask_of(smp_processor_id()),
+  _RET_IP_,
+  work->func);
+
+   arch_irq_work_raise();
+}
+
 /* Enqueue on current CPU, work must already be claimed and preempt disabled */
 static void __irq_work_queue_local(struct irq_work *work)
 {
@@ -99,7 +111,7 @@ static void __irq_work_queue_local(struct irq_work *work)
 
/* If the work is "lazy", handle it from next tick if any */
if (!lazy_work || tick_nohz_tick_stopped())
-   arch_irq_work_raise();
+   irq_work_raise(work);
 }
 
 /* Enqueue the irq work @work on the current CPU */
-- 
2.31.1




[RFC PATCH v2 6/8] treewide: Trace IPIs sent via smp_send_reschedule()

2022-11-02 Thread Valentin Schneider
To be able to trace invocations of smp_send_reschedule(), rename the
arch-specific definitions of it to arch_smp_send_reschedule() and wrap it
into an smp_send_reschedule() that contains a tracepoint.

Signed-off-by: Valentin Schneider 
[csky bits]
Acked-by: Guo Ren 
---
 arch/alpha/kernel/smp.c  | 2 +-
 arch/arc/kernel/smp.c| 2 +-
 arch/arm/kernel/smp.c| 2 +-
 arch/arm64/kernel/smp.c  | 2 +-
 arch/csky/kernel/smp.c   | 2 +-
 arch/hexagon/kernel/smp.c| 2 +-
 arch/ia64/kernel/smp.c   | 4 ++--
 arch/loongarch/include/asm/smp.h | 2 +-
 arch/mips/include/asm/smp.h  | 2 +-
 arch/openrisc/kernel/smp.c   | 2 +-
 arch/parisc/kernel/smp.c | 4 ++--
 arch/powerpc/kernel/smp.c| 4 ++--
 arch/powerpc/kvm/book3s_hv.c | 1 +
 arch/riscv/kernel/smp.c  | 4 ++--
 arch/s390/kernel/smp.c   | 2 +-
 arch/sh/kernel/smp.c | 2 +-
 arch/sparc/kernel/smp_32.c   | 2 +-
 arch/sparc/kernel/smp_64.c   | 2 +-
 arch/x86/include/asm/smp.h   | 2 +-
 arch/x86/kvm/svm/svm.c   | 1 +
 arch/x86/kvm/x86.c   | 1 +
 arch/xtensa/kernel/smp.c | 2 +-
 include/linux/smp.h  | 2 +-
 kernel/smp.c | 8 
 24 files changed, 35 insertions(+), 24 deletions(-)

diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index f4e20f75438f8..38637eb9eebd5 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -562,7 +562,7 @@ handle_ipi(struct pt_regs *regs)
 }
 
 void
-smp_send_reschedule(int cpu)
+arch_smp_send_reschedule(int cpu)
 {
 #ifdef DEBUG_IPI_MSG
if (cpu == hard_smp_processor_id())
diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
index ad93fe6e4b77d..409cfa4675b40 100644
--- a/arch/arc/kernel/smp.c
+++ b/arch/arc/kernel/smp.c
@@ -292,7 +292,7 @@ static void ipi_send_msg(const struct cpumask *callmap, enum ipi_msg_type msg)
ipi_send_msg_one(cpu, msg);
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
ipi_send_msg_one(cpu, IPI_RESCHEDULE);
 }
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 3b280d55c1c40..f216ac890b6f9 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -745,7 +745,7 @@ void __init set_smp_ipi_range(int ipi_base, int n)
ipi_setup(smp_processor_id());
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 937d2623e06ba..8d108edc4a89f 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -976,7 +976,7 @@ void __init set_smp_ipi_range(int ipi_base, int n)
ipi_setup(smp_processor_id());
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/csky/kernel/smp.c b/arch/csky/kernel/smp.c
index 4b605aa2e1d65..fd7f81be16dd6 100644
--- a/arch/csky/kernel/smp.c
+++ b/arch/csky/kernel/smp.c
@@ -140,7 +140,7 @@ void smp_send_stop(void)
on_each_cpu(ipi_stop, NULL, 1);
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
send_ipi_message(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/hexagon/kernel/smp.c b/arch/hexagon/kernel/smp.c
index 4ba93e59370c4..4e8bee25b8c68 100644
--- a/arch/hexagon/kernel/smp.c
+++ b/arch/hexagon/kernel/smp.c
@@ -217,7 +217,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
}
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
send_ipi(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/ia64/kernel/smp.c b/arch/ia64/kernel/smp.c
index e2cc59db86bc2..ea4f009a232b4 100644
--- a/arch/ia64/kernel/smp.c
+++ b/arch/ia64/kernel/smp.c
@@ -220,11 +220,11 @@ kdump_smp_send_init(void)
  * Called with preemption disabled.
  */
 void
-smp_send_reschedule (int cpu)
+arch_smp_send_reschedule (int cpu)
 {
ia64_send_ipi(cpu, IA64_IPI_RESCHEDULE, IA64_IPI_DM_INT, 0);
 }
-EXPORT_SYMBOL_GPL(smp_send_reschedule);
+EXPORT_SYMBOL_GPL(arch_smp_send_reschedule);
 
 /*
  * Called with preemption disabled.
diff --git a/arch/loongarch/include/asm/smp.h b/arch/loongarch/include/asm/smp.h
index 71189b28bfb27..3fcca134dfb1b 100644
--- a/arch/loongarch/include/asm/smp.h
+++ b/arch/loongarch/include/asm/smp.h
@@ -83,7 +83,7 @@ extern void show_ipi_list(struct seq_file *p, int prec);
  * it goes straight through and wastes no time serializing
  * anything. Worst case is that we lose a reschedule ...
  */
-static inline void smp_send_reschedule(int cpu)
+static inline void arch_smp_send_reschedule(int cpu)
 {
loongson3_send_ipi_single(cpu, SMP_RESCHEDULE);
 }
diff --git a/arch/mips/include/asm/smp.h b/arch/mips/include/asm/smp.h
index 5d9ff61004ca7..9806e79895d99 100644
--- a/arch/mips/include/asm/smp.h
+++ b/arch/mips/include/asm

[RFC PATCH v2 8/8] sched, smp: Trace smp callback causing an IPI

2022-11-02 Thread Valentin Schneider
The newly-introduced ipi_send_cpumask tracepoint has a "callback" parameter
which so far has only been fed with NULL.

While CSD_TYPE_SYNC/ASYNC and CSD_TYPE_IRQ_WORK share a similar backing
struct layout (meaning their callback func can be accessed without caring
about the actual CSD type), CSD_TYPE_TTWU doesn't even have a function
attached to its struct. This means we need to check the type of a CSD
before eventually dereferencing its associated callback.

This isn't as trivial as it sounds: the CSD type is stored in
__call_single_node.u_flags, which get cleared right before the callback is
executed via csd_unlock(). This implies checking the CSD type before it is
enqueued on the call_single_queue, as the target CPU's queue can be flushed
before we get to sending an IPI.

Furthermore, send_call_function_single_ipi() only has a CPU parameter, and
would need to have an additional argument to trickle down the invoked
function. This is somewhat silly, as the extra argument will always be
pushed down to the function even when nothing is being traced, which is
unnecessary overhead.

Two options present themselves:
a) Create copies of send_call_function_{single_ipi, ipi_mask}() that take
   an extra argument used for tracing, so that codepaths remain unchanged
   when tracing isn't in effect (a sort of manual -fipa-sra).

b) Stash the CSD func somewhere as a side effect, so that the portion of
   send_call_function_{single_ipi, ipi_mask}() under the tracepoint's
   static key can fetch it.

a) creates redundant code, and b) is quite fragile due to requiring extra
care for "reentrant" functions (async SMP calls).

This implements a).
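
Call sites then pick the variant under the tracepoint's static key,
presumably something like this (a sketch of the assumed shape, not
verbatim from the patch - the kernel/smp.c hunk is truncated below):

  /* Only pay for pushing 'func' down when the event is actually on */
  if (trace_ipi_send_cpumask_enabled())
          send_call_function_single_ipi_trace(cpu, func);
  else
          send_call_function_single_ipi(cpu);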

Signed-off-by: Valentin Schneider 
---
 kernel/irq_work.c   |  2 ++
 kernel/sched/core.c | 35 ---
 kernel/sched/smp.h  |  1 +
 kernel/smp.c| 42 ++
 4 files changed, 65 insertions(+), 15 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index aec38c294ce68..fcfa75c4a5daf 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -24,6 +24,8 @@
 
 #include <trace/events/ipi.h>
 
+#include "sched/smp.h"
+
 static DEFINE_PER_CPU(struct llist_head, raised_list);
 static DEFINE_PER_CPU(struct llist_head, lazy_list);
 static DEFINE_PER_CPU(struct task_struct *, irq_workd);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 02181f8072b5f..41196ca67e913 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3743,17 +3743,30 @@ void sched_ttwu_pending(void *arg)
rq_unlock_irqrestore(rq, &rf);
 }
 
-void send_call_function_single_ipi(int cpu)
-{
-   struct rq *rq = cpu_rq(cpu);
-
-   if (!set_nr_if_polling(rq->idle)) {
-   trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, NULL);
-   arch_send_call_function_single_ipi(cpu);
-   } else {
-   trace_sched_wake_idle_without_ipi(cpu);
-   }
-}
+/*
+ * We want a variant that traces the function causing the IPI to be sent, but
+ * we don't want the extra argument to cause unnecessary overhead when tracing
+ * isn't happening.
+ */
+#define GEN_CFSI(suffix, IPI_EXP, ...)					\
+void send_call_function_single_ipi##suffix(__VA_ARGS__)		\
+{									\
+	struct rq *rq = cpu_rq(cpu);					\
+									\
+	if (!set_nr_if_polling(rq->idle)) {				\
+		IPI_EXP;						\
+		arch_send_call_function_single_ipi(cpu);		\
+	} else {							\
+		trace_sched_wake_idle_without_ipi(cpu);			\
+	}								\
+}
+
+GEN_CFSI(/* nop */,
+	 /* nop */,
+	 int cpu)
+GEN_CFSI(_trace,
+	 trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, func),
+	 int cpu, smp_call_func_t func)
 
 /*
  * Queue a task on the target CPUs wake_list and wake the CPU via IPI if
diff --git a/kernel/sched/smp.h b/kernel/sched/smp.h
index 2eb23dd0f2856..8075ad5e84181 100644
--- a/kernel/sched/smp.h
+++ b/kernel/sched/smp.h
@@ -7,6 +7,7 @@
 extern void sched_ttwu_pending(void *arg);
 
 extern void send_call_function_single_ipi(int cpu);
+extern void send_call_function_single_ipi_trace(int cpu, smp_call_func_t func);
 
 #ifdef CONFIG_SMP
 extern void flush_smp_call_function_queue(void);
diff --git a/kernel/smp.c b/kernel/smp.c
index b96579fe08f09..3b8e6456ac7e7 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -161,12 +161,18 @@ void __init call_function_init(void)
 }
 
 static inline void
-send_call_function_ip

[RFC PATCH v2 7/8] smp: reword smp call IPI comment

2022-11-02 Thread Valentin Schneider
Accessing the call_single_queue hasn't involved a spinlock since 2014:

  6897fc22ea01 ("kernel: use lockless list for smp_call_function_single")

The llist operations (namely cmpxchg() and xchg()) provide similar ordering
guarantees; update the comment to lessen confusion.
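
For context, the pattern the comment sits above is this (a sketch of
__smp_call_single_queue(); llist_add() is a cmpxchg, which is what now
provides the ordering the comment describes):

  if (llist_add(node, &per_cpu(call_single_queue, cpu)))
          send_call_function_single_ipi(cpu);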

Signed-off-by: Valentin Schneider 
---
 kernel/smp.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 44fa4b9b1f46b..b96579fe08f09 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -503,9 +503,10 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
 #endif
 
/*
-* The list addition should be visible before sending the IPI
-* handler locks the list to pull the entry off it because of
-* normal cache coherency rules implied by spinlocks.
+* The list addition should be visible to the target CPU when it pops
+* the head of the list to pull the entry off it in the IPI handler
+* because of normal cache coherency rules implied by the underlying
+* llist ops.
 *
 * If IPIs can go out of order to the cache coherency protocol
 * in an architecture, sufficient synchronisation should be added
-- 
2.31.1




[RFC PATCH v2 0/8] Generic IPI sending tracepoint

2022-11-02 Thread Valentin Schneider
Background
==========

Detecting IPI *reception* is relatively easy, e.g. using
trace_irq_handler_{entry,exit} or even just function-trace
flush_smp_call_function_queue() for SMP calls.  

Figuring out their *origin* is trickier, as there is no generic tracepoint tied
to e.g. smp_call_function():

o AFAIA x86 has no tracepoint tied to sending IPIs, only receiving them
  (cf. trace_call_function{_single}_entry()).
o arm/arm64 do have trace_ipi_raise(), which gives us the target cpus but also a
  mostly useless string (smp_calls will all be "Function call interrupts").
o Other architectures don't seem to have any IPI-sending related tracepoint.  

I believe one reason the tracepoints used by arm/arm64 ended up as they did
is that these archs used to handle IPIs differently from regular interrupts
(the IRQ driver would directly invoke an IPI-handling routine), which meant
they never showed up in trace_irq_handler_{entry, exit}. The
trace_ipi_{entry,exit} tracepoints gave a way to trace IPI reception, but
those have become redundant as of:

  56afcd3dbd19 ("ARM: Allow IPIs to be handled as normal interrupts")
  d3afc7f12987 ("arm64: Allow IPIs to be handled as normal interrupts")

which gave IPIs a "proper" handler function used through
generic_handle_domain_irq(), which makes them show up via
trace_irq_handler_{entry, exit}.

Changing stuff up
=================

Per the above, it would make sense to reshuffle trace_ipi_raise() and move it
into generic code. This also came up during Daniel's talk on Osnoise at the CPU
isolation MC of LPC 2022 [1]. 

Now, to be useful, such a tracepoint needs to export:
o targeted CPU(s)
o calling context

The only way to get the calling context with trace_ipi_raise() is to trigger a
stack dump, e.g. $(trace-cmd -e ipi* -T echo 42).

This series instead introduces a new tracepoint which exports the relevant
context (callsite, and requested callback for when the callsite isn't
helpful), and is usable by all architectures as it sits in generic code.

Another thing worth mentioning is that depending on the callsite, the _RET_IP_
fed to the tracepoint is not always useful - generic_exec_single() doesn't tell
you much about the actual callback being sent via IPI, which is why the new
tracepoint also has a @callback argument.

Patches
=======

o Patch 1 is included for convenience and will most likely be merged
  independently. FYI I have libtraceevent patches [2] to improve the
  pretty-printing of cpumasks using the new type, which look like:
  <...>-3322  [021]   560.402583: ipi_send_cpumask: cpumask=14,17,21 callsite=on_each_cpu_cond_mask+0x40 callback=flush_tlb_func+0x0
  <...>-187   [010]   562.590584: ipi_send_cpumask: cpumask=0-23 callsite=on_each_cpu_cond_mask+0x40 callback=do_sync_core+0x0

o Patches 2-6 spread out the tracepoint across relevant sites.
o Patch 8 is trying to be smart about tracing the callback associated with the
  IPI.

This results in having IPI trace events for:

o smp_call_function*()
o smp_send_reschedule()
o irq_work_queue*()
o standalone uses of __smp_call_single_queue()

This is incomplete; just looking at arm64, there are more IPI types that aren't
covered:

  IPI_CPU_STOP,
  IPI_CPU_CRASH_STOP,
  IPI_TIMER,
  IPI_WAKEUP,

... But it feels like a good starting point.

Links
=====

[1]: https://youtu.be/5gT57y4OzBM?t=14234
[2]: https://lore.kernel.org/all/20221018151630.1513535-1-vschn...@redhat.com/

Revisions
=========

v1 -> v2


o Ditched single-CPU tracepoint
o Changed tracepoint signature to include callback
o Changed tracepoint callsite field to void *; the parameter is still UL to save
  up on casts due to using _RET_IP_.
o Fixed linking failures due to not exporting smp_send_reschedule()

Steven Rostedt (Google) (1):
  tracing: Add __cpumask to denote a trace event field that is a
cpumask_t

Valentin Schneider (7):
  trace: Add trace_ipi_send_cpumask()
  sched, smp: Trace IPIs sent via send_call_function_single_ipi()
  smp: Trace IPIs sent via arch_send_call_function_ipi_mask()
  irq_work: Trace self-IPIs sent via arch_irq_work_raise()
  treewide: Trace IPIs sent via smp_send_reschedule()
  smp: reword smp call IPI comment
  sched, smp: Trace smp callback causing an IPI

 arch/alpha/kernel/smp.c  |  2 +-
 arch/arc/kernel/smp.c|  2 +-
 arch/arm/kernel/smp.c|  5 +-
 arch/arm64/kernel/smp.c  |  3 +-
 arch/csky/kernel/smp.c   |  2 +-
 arch/hexagon/kernel/smp.c|  2 +-
 arch/ia64/kernel/smp.c   |  4 +-
 arch/loongarch/include/asm/smp.h |  2 +-
 arch/mips/include/asm/smp.h  |  2 +-
 arch/openrisc/kernel/smp.c   |  2 +-
 arch/parisc/kernel/smp.c |  4 +-
 arch/powerpc/kernel/smp.c

Re: [RFC PATCH v2 4/8] smp: Trace IPIs sent via arch_send_call_function_ipi_mask()

2022-11-17 Thread Valentin Schneider
On 17/11/22 10:08, Peter Zijlstra wrote:
> On Wed, Nov 02, 2022 at 06:33:32PM +0000, Valentin Schneider wrote:
>> This simply wraps around the arch function and prepends it with a
>> tracepoint, similar to send_call_function_single_ipi().
>>
>> Signed-off-by: Valentin Schneider 
>> ---
>>  kernel/smp.c | 9 -
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/smp.c b/kernel/smp.c
>> index e2ca1e2f31274..c4d561cf50d45 100644
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -160,6 +160,13 @@ void __init call_function_init(void)
>>  smpcfd_prepare_cpu(smp_processor_id());
>>  }
>>
>> +static inline void
>
> Given the use of _RET_IP_, I would strongly recommend you use
> __always_inline.
>

Noted, thanks

>> +send_call_function_ipi_mask(const struct cpumask *mask)
>> +{
>> +trace_ipi_send_cpumask(mask, _RET_IP_, func);
>
> What's func?
>

A rebase fail... That's only plugged in later.

>> +arch_send_call_function_ipi_mask(mask);
>> +}




Re: [RFC PATCH v2 6/8] treewide: Trace IPIs sent via smp_send_reschedule()

2022-11-17 Thread Valentin Schneider
On 17/11/22 10:12, Peter Zijlstra wrote:
> On Wed, Nov 02, 2022 at 06:33:34PM +0000, Valentin Schneider wrote:
>
>> diff --git a/kernel/smp.c b/kernel/smp.c
>> index c4d561cf50d45..44fa4b9b1f46b 100644
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -167,6 +167,14 @@ send_call_function_ipi_mask(const struct cpumask *mask)
>>  arch_send_call_function_ipi_mask(mask);
>>  }
>>
>> +void smp_send_reschedule(int cpu)
>> +{
>> +/* XXX scheduler_ipi is inline :/ */
>> +trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, NULL);
>> +arch_smp_send_reschedule(cpu);
>> +}
>> +EXPORT_SYMBOL_GPL(smp_send_reschedule);
>
> Yeah, no.. I see some crazy archs do this, but no we're not exporting
> this in generic.

So the list is: ia64, powerpc, riscv
and they all seem to do it because of KVM:
  c4cb768f0277 ("[IA64] export smp_send_reschedule")
  de56a948b918 ("KVM: PPC: Add support for Book3S processors in hypervisor 
mode")
  d3d7a0ce020e ("RISC-V: Export kernel symbols for kvm")

Other archs get out of it either because their smp_send_reschedule() is
inline (e.g. x86), or because they don't allow building KVM as a module
(e.g. arm64).

If I can cobble the tracepoint+reschedule in an inline helper, then that
wouldn't require any new exports - I'm fighting a bit with the header maze
ATM, but hopefully I can get somewhere with this.
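
Roughly what I have in mind (sketch only, assuming every arch definition
gets renamed to arch_smp_send_reschedule()):

  /* include/linux/smp.h - sketch, not a final patch */
  extern void arch_smp_send_reschedule(int cpu);

  #define smp_send_reschedule(cpu)                                 \
  ({                                                               \
          trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, NULL); \
          arch_smp_send_reschedule(cpu);                           \
  })

i.e. the wrapper is inlined at every callsite and only the arch function
remains out of line.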




Re: [RFC PATCH v2 8/8] sched, smp: Trace smp callback causing an IPI

2022-11-17 Thread Valentin Schneider
On 17/11/22 15:12, Peter Zijlstra wrote:
> On Wed, Nov 02, 2022 at 06:33:36PM +0000, Valentin Schneider wrote:
> *yuck*

:-)

>
> How about something like so?
>
> ---
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -24,6 +24,8 @@
>
>  #include 
>
> +#include "sched/smp.h"
> +
>  static DEFINE_PER_CPU(struct llist_head, raised_list);
>  static DEFINE_PER_CPU(struct llist_head, lazy_list);
>  static DEFINE_PER_CPU(struct task_struct *, irq_workd);
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3763,16 +3763,17 @@ void sched_ttwu_pending(void *arg)
>   rq_unlock_irqrestore(rq, &rf);
>  }
>
> -void send_call_function_single_ipi(int cpu)
> +bool send_call_function_single_ipi(int cpu)
>  {
>   struct rq *rq = cpu_rq(cpu);
>
>   if (!set_nr_if_polling(rq->idle)) {
> - trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, NULL);
>   arch_send_call_function_single_ipi(cpu);
> - } else {
> - trace_sched_wake_idle_without_ipi(cpu);
> + return true;
>   }
> +
> + trace_sched_wake_idle_without_ipi(cpu);
> + return false;
>  }
>
>  /*
> --- a/kernel/sched/smp.h
> +++ b/kernel/sched/smp.h
> @@ -6,7 +6,7 @@
>
>  extern void sched_ttwu_pending(void *arg);
>
> -extern void send_call_function_single_ipi(int cpu);
> +extern bool send_call_function_single_ipi(int cpu);
>
>  #ifdef CONFIG_SMP
>  extern void flush_smp_call_function_queue(void);
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -163,7 +163,6 @@ void __init call_function_init(void)
>  static inline void
>  send_call_function_ipi_mask(const struct cpumask *mask)
>  {
> - trace_ipi_send_cpumask(mask, _RET_IP_, func);
>   arch_send_call_function_ipi_mask(mask);
>  }
>
> @@ -438,11 +437,16 @@ static void __smp_call_single_queue_debu
>   struct cfd_seq_local *seq = this_cpu_ptr(&cfd_seq_local);
>   struct call_function_data *cfd = this_cpu_ptr(&cfd_data);
>   struct cfd_percpu *pcpu = per_cpu_ptr(cfd->pcpu, cpu);
> + struct __call_single_data *csd;
> +
> + csd = container_of(node, call_single_data_t, node.llist);
> + WARN_ON_ONCE(!(CSD_TYPE(csd) & (CSD_TYPE_SYNC | CSD_TYPE_ASYNC)));
>
>   cfd_seq_store(pcpu->seq_queue, this_cpu, cpu, CFD_SEQ_QUEUE);
>   if (llist_add(node, &per_cpu(call_single_queue, cpu))) {
>   cfd_seq_store(pcpu->seq_ipi, this_cpu, cpu, CFD_SEQ_IPI);
>   cfd_seq_store(seq->ping, this_cpu, cpu, CFD_SEQ_PING);
> + trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, csd->func);
>   send_call_function_single_ipi(cpu);
>   cfd_seq_store(seq->pinged, this_cpu, cpu, CFD_SEQ_PINGED);
>   } else {
> @@ -487,6 +491,27 @@ static __always_inline void csd_unlock(s
>
>  static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
>
> +static __always_inline
> +bool raw_smp_call_single_queue(int cpu, struct llist_node *node)
> +{
> + /*
> +  * The list addition should be visible to the target CPU when it pops
> +  * the head of the list to pull the entry off it in the IPI handler
> +  * because of normal cache coherency rules implied by the underlying
> +  * llist ops.
> +  *
> +  * If IPIs can go out of order to the cache coherency protocol
> +  * in an architecture, sufficient synchronisation should be added
> +  * to arch code to make it appear to obey cache coherency WRT
> +  * locking and barrier primitives. Generic code isn't really
> +  * equipped to do the right thing...
> +  */
> + if (llist_add(node, &per_cpu(call_single_queue, cpu)))
> + return send_call_function_single_ipi(cpu);
> +
> + return false;
> +}
> +
>  void __smp_call_single_queue(int cpu, struct llist_node *node)
>  {
>  #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
> @@ -503,19 +528,28 @@ void __smp_call_single_queue(int cpu, st
>  #endif
>
>   /*
> -  * The list addition should be visible to the target CPU when it pops
> -  * the head of the list to pull the entry off it in the IPI handler
> -  * because of normal cache coherency rules implied by the underlying
> -  * llist ops.
> -  *
> -  * If IPIs can go out of order to the cache coherency protocol
> -  * in an architecture, sufficient synchronisation should be added
> -  * to arch code to make it appear to obey cache coherency WRT
> -  * locking and barrier primitives. Generic code isn't really
> -  * equipped to do the right thing...
> -  */
> - if (llist_add(node, &per_cpu(call_s

[PATCH v3 0/8] Generic IPI sending tracepoint

2022-12-02 Thread Valentin Schneider
Background
==========

Detecting IPI *reception* is relatively easy, e.g. using
trace_irq_handler_{entry,exit} or even just function-trace
flush_smp_call_function_queue() for SMP calls.  

Figuring out their *origin*, is trickier as there is no generic tracepoint tied
to e.g. smp_call_function():

o AFAIA x86 has no tracepoint tied to sending IPIs, only receiving them
  (cf. trace_call_function{_single}_entry()).
o arm/arm64 do have trace_ipi_raise(), which gives us the target cpus but also a
  mostly useless string (smp_calls will all be "Function call interrupts").
o Other architectures don't seem to have any IPI-sending related tracepoint.  

I believe one reason those tracepoints used by arm/arm64 ended up as they were
is because these archs used to handle IPIs differently from regular interrupts
(the IRQ driver would directly invoke an IPI-handling routine), which meant
they never showed up in trace_irq_handler_{entry, exit}. The trace_ipi_{entry,exit}
tracepoints gave a way to trace IPI reception but those have become redundant as
of: 

  56afcd3dbd19 ("ARM: Allow IPIs to be handled as normal interrupts")
  d3afc7f12987 ("arm64: Allow IPIs to be handled as normal interrupts")

which gave IPIs a "proper" handler function used through
generic_handle_domain_irq(), which makes them show up via
trace_irq_handler_{entry, exit}.

Changing stuff up
=================

Per the above, it would make sense to reshuffle trace_ipi_raise() and move it
into generic code. This also came up during Daniel's talk on Osnoise at the CPU
isolation MC of LPC 2022 [1]. 

Now, to be useful, such a tracepoint needs to export:
o targeted CPU(s)
o calling context

The only way to get the calling context with trace_ipi_raise() is to trigger a
stack dump, e.g. $(trace-cmd -e ipi* -T echo 42).

This is instead introducing a new tracepoint which exports the relevant context
(callsite, and requested callback for when the callsite isn't helpful), and is
usable by all architectures as it sits in generic code. 

Another thing worth mentioning is that depending on the callsite, the _RET_IP_
fed to the tracepoint is not always useful - generic_exec_single() doesn't tell
you much about the actual callback being sent via IPI, which is why the new
tracepoint also has a @callback argument.

Patches
=======

o Patch 1 is included for convenience and will be merged independently. FYI I
  have libtraceevent patches [2] to improve the 
  pretty-printing of cpumasks using the new type, which look like:
  <...>-3322  [021]   560.402583: ipi_send_cpumask: cpumask=14,17,21 
callsite=on_each_cpu_cond_mask+0x40 callback=flush_tlb_func+0x0
  <...>-187   [010]   562.590584: ipi_send_cpumask: cpumask=0-23 
callsite=on_each_cpu_cond_mask+0x40 callback=do_sync_core+0x0

o Patches 2-6 spread out the tracepoint across relevant sites.
  Patch 6 ends up sprinkling lots of #include <trace/events/ipi.h> which
  I'm not the biggest fan of, but is the least horrible solution I've been
  able to come up with so far.
  
o Patch 8 is trying to be smart about tracing the callback associated with the
  IPI.

This results in having IPI trace events for:

o smp_call_function*()
o smp_send_reschedule()
o irq_work_queue*()
o standalone uses of __smp_call_single_queue()

This is incomplete, just looking at arm64 there's more IPI types that aren't
covered: 

  IPI_CPU_STOP,
  IPI_CPU_CRASH_STOP,
  IPI_TIMER,
  IPI_WAKEUP,

... But it feels like a good starting point.

Links
=====

[1]: https://youtu.be/5gT57y4OzBM?t=14234
[2]: https://lore.kernel.org/all/20221116144154.3662923-1-vschn...@redhat.com/

Revisions
=========

v2 -> v3


o Dropped the generic export of smp_send_reschedule(), turned it into a macro
  and a bunch of imports
o Dropped the send_call_function_single_ipi() macro madness, split it into sched
  and smp bits using some of Peter's suggestions

v1 -> v2


o Ditched single-CPU tracepoint
o Changed tracepoint signature to include callback
o Changed tracepoint callsite field to void *; the parameter is still UL to save
  up on casts due to using _RET_IP_.
o Fixed linking failures due to not exporting smp_send_reschedule()

Steven Rostedt (Google) (1):
  tracing: Add __cpumask to denote a trace event field that is a
cpumask_t

Valentin Schneider (7):
  trace: Add trace_ipi_send_cpumask()
  sched, smp: Trace IPIs sent via send_call_function_single_ipi()
  smp: Trace IPIs sent via arch_send_call_function_ipi_mask()
  irq_work: Trace self-IPIs sent via arch_irq_work_raise()
  treewide: Trace IPIs sent via smp_send_reschedule()
  smp: reword smp call IPI comment
  sched, smp: Trace smp callback causing an IPI

 arch/alpha/kernel/smp.c  |  2 +-
 arch/arc/kernel/smp.c|  2 +-
 arch/arm/kernel/smp.c|  5 +-
 arch/arm/mach-actions/platsmp.c  |  2 +
 arch/arm64/kernel/smp.c 

[PATCH v3 1/8] DO-NOT-MERGE: tracing: Add __cpumask to denote a trace event field that is a cpumask_t

2022-12-02 Thread Valentin Schneider
From: "Steven Rostedt (Google)" 

The trace events have a __bitmask field that can be used for anything
that requires bitmasks. Although currently it is only used for CPU
masks, it could be used in the future for any type of bitmasks.

There is some user space tooling that wants to know if a field is a CPU
mask and not just some random unsigned long bitmask. Introduce
"__cpumask()" helper functions that work the same as the current
__bitmask() helpers but displays in the format file:

  field:__data_loc cpumask_t *[] mask;  offset:36;  size:4; signed:0;

Instead of:

  field:__data_loc unsigned long[] mask;  offset:32;  size:4; signed:0;

The main difference is the type. Instead of "unsigned long" it is
"cpumask_t *". Note, this type field needs to be a real type in the
__dynamic_array() logic that both __cpumask and __bitmask use, but the
comparison field requires it to be a scalar type whereas cpumask_t is a
structure (non-scalar). But everything works when making it a pointer.

Valentin added changes to remove the need of passing in "nr_bits" and the
__cpumask will always use nr_cpumask_bits as its size.
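
For reference, a minimal (hypothetical) event tying the three helpers
together:

  TRACE_EVENT(sample_cpumask,

          TP_PROTO(const struct cpumask *mask),

          TP_ARGS(mask),

          TP_STRUCT__entry(
                  __cpumask(target)
          ),

          TP_fast_assign(
                  __assign_cpumask(target, cpumask_bits(mask));
          ),

          TP_printk("target=%s", __get_cpumask(target))
  );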

Link: https://lkml.kernel.org/r/20221014080456.1d32b...@rorschach.local.home

Requested-by: Valentin Schneider 
Reviewed-by: Valentin Schneider 
Signed-off-by: Valentin Schneider 
Signed-off-by: Steven Rostedt (Google) 
---
 include/trace/bpf_probe.h|  6 
 include/trace/perf.h |  6 
 include/trace/stages/stage1_struct_define.h  |  6 
 include/trace/stages/stage2_data_offsets.h   |  6 
 include/trace/stages/stage3_trace_output.h   |  6 
 include/trace/stages/stage4_event_fields.h   |  6 
 include/trace/stages/stage5_get_offsets.h|  6 
 include/trace/stages/stage6_event_callback.h | 20 
 include/trace/stages/stage7_class_define.h   |  2 ++
 samples/trace_events/trace-events-sample.c   |  2 +-
 samples/trace_events/trace-events-sample.h   | 34 +++-
 11 files changed, 91 insertions(+), 9 deletions(-)

diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 6a13220d2d27b..155c495b89ead 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -21,6 +21,9 @@
 #undef __get_bitmask
 #define __get_bitmask(field) (char *)__get_dynamic_array(field)

+#undef __get_cpumask
+#define __get_cpumask(field) (char *)__get_dynamic_array(field)
+
 #undef __get_sockaddr
 #define __get_sockaddr(field) ((struct sockaddr *)__get_dynamic_array(field))

@@ -40,6 +43,9 @@
 #undef __get_rel_bitmask
 #define __get_rel_bitmask(field) (char *)__get_rel_dynamic_array(field)

+#undef __get_rel_cpumask
+#define __get_rel_cpumask(field) (char *)__get_rel_dynamic_array(field)
+
 #undef __get_rel_sockaddr
 #define __get_rel_sockaddr(field) ((struct sockaddr 
*)__get_rel_dynamic_array(field))

diff --git a/include/trace/perf.h b/include/trace/perf.h
index 5800d13146c3d..8f3bf1e177070 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -21,6 +21,9 @@
 #undef __get_bitmask
 #define __get_bitmask(field) (char *)__get_dynamic_array(field)

+#undef __get_cpumask
+#define __get_cpumask(field) (char *)__get_dynamic_array(field)
+
 #undef __get_sockaddr
 #define __get_sockaddr(field) ((struct sockaddr *)__get_dynamic_array(field))

@@ -41,6 +44,9 @@
 #undef __get_rel_bitmask
 #define __get_rel_bitmask(field) (char *)__get_rel_dynamic_array(field)

+#undef __get_rel_cpumask
+#define __get_rel_cpumask(field) (char *)__get_rel_dynamic_array(field)
+
 #undef __get_rel_sockaddr
 #define __get_rel_sockaddr(field) ((struct sockaddr 
*)__get_rel_dynamic_array(field))

diff --git a/include/trace/stages/stage1_struct_define.h 
b/include/trace/stages/stage1_struct_define.h
index 1b7bab60434c1..69e0dae453bfa 100644
--- a/include/trace/stages/stage1_struct_define.h
+++ b/include/trace/stages/stage1_struct_define.h
@@ -32,6 +32,9 @@
 #undef __bitmask
 #define __bitmask(item, nr_bits) __dynamic_array(char, item, -1)

+#undef __cpumask
+#define __cpumask(item) __dynamic_array(char, item, -1)
+
 #undef __sockaddr
 #define __sockaddr(field, len) __dynamic_array(u8, field, len)

@@ -47,6 +50,9 @@
 #undef __rel_bitmask
 #define __rel_bitmask(item, nr_bits) __rel_dynamic_array(char, item, -1)

+#undef __rel_cpumask
+#define __rel_cpumask(item) __rel_dynamic_array(char, item, -1)
+
 #undef __rel_sockaddr
 #define __rel_sockaddr(field, len) __rel_dynamic_array(u8, field, len)

diff --git a/include/trace/stages/stage2_data_offsets.h 
b/include/trace/stages/stage2_data_offsets.h
index 1b7a8f764fddd..469b6a64293de 100644
--- a/include/trace/stages/stage2_data_offsets.h
+++ b/include/trace/stages/stage2_data_offsets.h
@@ -38,6 +38,9 @@
 #undef __bitmask
 #define __bitmask(item, nr_bits) __dynamic_array(unsigned long, item, -1)

+#undef __cpumask
+#define __cpumask(item) __dynamic_array(unsigned long, item, -1)
+
 #undef __sockaddr
 #define __sockaddr(field, le

[PATCH v3 2/8] trace: Add trace_ipi_send_cpumask()

2022-12-02 Thread Valentin Schneider
trace_ipi_raise() is unsuitable for generically tracing IPI sources due to
its "reason" argument being an uninformative string (on arm64 all you get
is "Function call interrupts" for SMP calls).

Add a variant of it that exports a target CPU, a callsite and a callback.

Signed-off-by: Valentin Schneider 
Reviewed-by: Steven Rostedt (Google) 
---
 include/trace/events/ipi.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/include/trace/events/ipi.h b/include/trace/events/ipi.h
index 0be71dad6ec03..b1125dc27682c 100644
--- a/include/trace/events/ipi.h
+++ b/include/trace/events/ipi.h
@@ -35,6 +35,28 @@ TRACE_EVENT(ipi_raise,
TP_printk("target_mask=%s (%s)", __get_bitmask(target_cpus), 
__entry->reason)
 );
 
+TRACE_EVENT(ipi_send_cpumask,
+
+   TP_PROTO(const struct cpumask *cpumask, unsigned long callsite, void 
*callback),
+
+   TP_ARGS(cpumask, callsite, callback),
+
+   TP_STRUCT__entry(
+   __cpumask(cpumask)
+   __field(void *, callsite)
+   __field(void *, callback)
+   ),
+
+   TP_fast_assign(
+   __assign_cpumask(cpumask, cpumask_bits(cpumask));
+   __entry->callsite = (void *)callsite;
+   __entry->callback = callback;
+   ),
+
+   TP_printk("cpumask=%s callsite=%pS callback=%pS",
+ __get_cpumask(cpumask), __entry->callsite, __entry->callback)
+);
+
 DECLARE_EVENT_CLASS(ipi_handler,
 
TP_PROTO(const char *reason),
-- 
2.31.1




[PATCH v3 4/8] smp: Trace IPIs sent via arch_send_call_function_ipi_mask()

2022-12-02 Thread Valentin Schneider
This simply wraps around the arch function and prepends it with a
tracepoint, similar to send_call_function_single_ipi().

Signed-off-by: Valentin Schneider 
Reviewed-by: Steven Rostedt (Google) 
---
 kernel/smp.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index e2ca1e2f31274..93b4386cd3096 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -160,6 +160,13 @@ void __init call_function_init(void)
smpcfd_prepare_cpu(smp_processor_id());
 }
 
+static __always_inline void
+send_call_function_ipi_mask(const struct cpumask *mask)
+{
+   trace_ipi_send_cpumask(mask, _RET_IP_, NULL);
+   arch_send_call_function_ipi_mask(mask);
+}
+
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
 
 static DEFINE_STATIC_KEY_FALSE(csdlock_debug_enabled);
@@ -970,7 +977,7 @@ static void smp_call_function_many_cond(const struct 
cpumask *mask,
if (nr_cpus == 1)
send_call_function_single_ipi(last_cpu);
else if (likely(nr_cpus > 1))
-   arch_send_call_function_ipi_mask(cfd->cpumask_ipi);
+   send_call_function_ipi_mask(cfd->cpumask_ipi);
 
cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->pinged, this_cpu, 
CFD_SEQ_NOCPU, CFD_SEQ_PINGED);
}
-- 
2.31.1




[PATCH v3 3/8] sched, smp: Trace IPIs sent via send_call_function_single_ipi()

2022-12-02 Thread Valentin Schneider
send_call_function_single_ipi() is the thing that sends IPIs at the bottom
of smp_call_function*() via either generic_exec_single() or
smp_call_function_many_cond(). Give it an IPI-related tracepoint.

Note that this ends up tracing any IPI sent via __smp_call_single_queue(),
which covers __ttwu_queue_wakelist() and irq_work_queue_on() "for free".

Signed-off-by: Valentin Schneider 
Reviewed-by: Steven Rostedt (Google) 
---
 arch/arm/kernel/smp.c   | 3 ---
 arch/arm64/kernel/smp.c | 1 -
 kernel/sched/core.c | 7 +--
 kernel/smp.c| 4 
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 978db2d96b446..3b280d55c1c40 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -48,9 +48,6 @@
 #include 
 #include 
 
-#define CREATE_TRACE_POINTS
-#include <trace/events/ipi.h>
-
 /*
  * as from 2.5, kernels no longer have an init_tasks structure
  * so we need some other way of telling a new secondary core
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index ffc5d76cf6955..937d2623e06ba 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -51,7 +51,6 @@
 #include 
 #include 
 
-#define CREATE_TRACE_POINTS
 #include <trace/events/ipi.h>
 
 DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index daff72f003858..40587b0d99329 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -81,6 +81,7 @@
 #include 
 #include 
 #undef CREATE_TRACE_POINTS
+#include <trace/events/ipi.h>
 
 #include "sched.h"
 #include "stats.h"
@@ -3746,10 +3747,12 @@ void send_call_function_single_ipi(int cpu)
 {
struct rq *rq = cpu_rq(cpu);
 
-   if (!set_nr_if_polling(rq->idle))
+   if (!set_nr_if_polling(rq->idle)) {
+   trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, NULL);
arch_send_call_function_single_ipi(cpu);
-   else
+   } else {
trace_sched_wake_idle_without_ipi(cpu);
+   }
 }
 
 /*
diff --git a/kernel/smp.c b/kernel/smp.c
index 06a413987a14a..e2ca1e2f31274 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -26,6 +26,10 @@
 #include 
 #include 
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/ipi.h>
+#undef CREATE_TRACE_POINTS
+
 #include "smpboot.h"
 #include "sched/smp.h"
 
-- 
2.31.1




[PATCH v3 5/8] irq_work: Trace self-IPIs sent via arch_irq_work_raise()

2022-12-02 Thread Valentin Schneider
IPIs sent to remote CPUs via irq_work_queue_on() are now covered by
trace_ipi_send_cpumask(), add another instance of the tracepoint to cover
self-IPIs.

Signed-off-by: Valentin Schneider 
Reviewed-by: Steven Rostedt (Google) 
---
 kernel/irq_work.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 7afa40fe5cc43..c33e88e32a67a 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -22,6 +22,8 @@
 #include 
 #include 
 
+#include <trace/events/ipi.h>
+
 static DEFINE_PER_CPU(struct llist_head, raised_list);
 static DEFINE_PER_CPU(struct llist_head, lazy_list);
 static DEFINE_PER_CPU(struct task_struct *, irq_workd);
@@ -74,6 +76,16 @@ void __weak arch_irq_work_raise(void)
 */
 }
 
+static __always_inline void irq_work_raise(struct irq_work *work)
+{
+   if (trace_ipi_send_cpumask_enabled() && arch_irq_work_has_interrupt())
+   trace_ipi_send_cpumask(cpumask_of(smp_processor_id()),
+  _RET_IP_,
+  work->func);
+
+   arch_irq_work_raise();
+}
+
 /* Enqueue on current CPU, work must already be claimed and preempt disabled */
 static void __irq_work_queue_local(struct irq_work *work)
 {
@@ -99,7 +111,7 @@ static void __irq_work_queue_local(struct irq_work *work)
 
/* If the work is "lazy", handle it from next tick if any */
if (!lazy_work || tick_nohz_tick_stopped())
-   arch_irq_work_raise();
+   irq_work_raise(work);
 }
 
 /* Enqueue the irq work @work on the current CPU */
-- 
2.31.1




[PATCH v3 6/8] treewide: Trace IPIs sent via smp_send_reschedule()

2022-12-02 Thread Valentin Schneider
To be able to trace invocations of smp_send_reschedule(), rename the
arch-specific definitions of it to arch_smp_send_reschedule() and wrap it
into an smp_send_reschedule() that contains a tracepoint.

Changes to include the declaration of the tracepoint were driven by the
following coccinelle script:

  @func_use@
  @@
  smp_send_reschedule(...);

  @include@
  @@
  #include <trace/events/ipi.h>

  @no_include depends on func_use && !include@
  @@
#include <...>
  +
  + #include <trace/events/ipi.h>
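
For reference, the script can be replayed with something like (file name
hypothetical):

  $ spatch --sp-file ipi_include.cocci --in-place --dir .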

Signed-off-by: Valentin Schneider 
[csky bits]
Acked-by: Guo Ren 
---
 arch/alpha/kernel/smp.c  | 2 +-
 arch/arc/kernel/smp.c| 2 +-
 arch/arm/kernel/smp.c| 2 +-
 arch/arm/mach-actions/platsmp.c  | 2 ++
 arch/arm64/kernel/smp.c  | 2 +-
 arch/csky/kernel/smp.c   | 2 +-
 arch/hexagon/kernel/smp.c| 2 +-
 arch/ia64/kernel/smp.c   | 4 ++--
 arch/loongarch/include/asm/smp.h | 2 +-
 arch/mips/include/asm/smp.h  | 2 +-
 arch/mips/kernel/rtlx-cmp.c  | 2 ++
 arch/openrisc/kernel/smp.c   | 2 +-
 arch/parisc/kernel/smp.c | 4 ++--
 arch/powerpc/kernel/smp.c| 6 --
 arch/powerpc/kvm/book3s_hv.c | 3 +++
 arch/powerpc/platforms/powernv/subcore.c | 2 ++
 arch/riscv/kernel/smp.c  | 4 ++--
 arch/s390/kernel/smp.c   | 2 +-
 arch/sh/kernel/smp.c | 2 +-
 arch/sparc/kernel/smp_32.c   | 2 +-
 arch/sparc/kernel/smp_64.c   | 2 +-
 arch/x86/include/asm/smp.h   | 2 +-
 arch/x86/kvm/svm/svm.c   | 4 
 arch/x86/kvm/x86.c   | 2 ++
 arch/xtensa/kernel/smp.c | 2 +-
 include/linux/smp.h  | 8 ++--
 virt/kvm/kvm_main.c  | 1 +
 27 files changed, 47 insertions(+), 25 deletions(-)

diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index f4e20f75438f8..38637eb9eebd5 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -562,7 +562,7 @@ handle_ipi(struct pt_regs *regs)
 }
 
 void
-smp_send_reschedule(int cpu)
+arch_smp_send_reschedule(int cpu)
 {
 #ifdef DEBUG_IPI_MSG
if (cpu == hard_smp_processor_id())
diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
index ad93fe6e4b77d..409cfa4675b40 100644
--- a/arch/arc/kernel/smp.c
+++ b/arch/arc/kernel/smp.c
@@ -292,7 +292,7 @@ static void ipi_send_msg(const struct cpumask *callmap, 
enum ipi_msg_type msg)
ipi_send_msg_one(cpu, msg);
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
ipi_send_msg_one(cpu, IPI_RESCHEDULE);
 }
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 3b280d55c1c40..f216ac890b6f9 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -745,7 +745,7 @@ void __init set_smp_ipi_range(int ipi_base, int n)
ipi_setup(smp_processor_id());
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/arm/mach-actions/platsmp.c b/arch/arm/mach-actions/platsmp.c
index f26618b435145..7b208e96fbb67 100644
--- a/arch/arm/mach-actions/platsmp.c
+++ b/arch/arm/mach-actions/platsmp.c
@@ -20,6 +20,8 @@
 #include 
 #include 
 
+#include <trace/events/ipi.h>
+
 #define OWL_CPU1_ADDR  0x50
 #define OWL_CPU1_FLAG  0x5c
 
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 937d2623e06ba..8d108edc4a89f 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -976,7 +976,7 @@ void __init set_smp_ipi_range(int ipi_base, int n)
ipi_setup(smp_processor_id());
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/csky/kernel/smp.c b/arch/csky/kernel/smp.c
index 4b605aa2e1d65..fd7f81be16dd6 100644
--- a/arch/csky/kernel/smp.c
+++ b/arch/csky/kernel/smp.c
@@ -140,7 +140,7 @@ void smp_send_stop(void)
on_each_cpu(ipi_stop, NULL, 1);
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
send_ipi_message(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/hexagon/kernel/smp.c b/arch/hexagon/kernel/smp.c
index 4ba93e59370c4..4e8bee25b8c68 100644
--- a/arch/hexagon/kernel/smp.c
+++ b/arch/hexagon/kernel/smp.c
@@ -217,7 +217,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
}
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
send_ipi(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/ia64/kernel/smp.c b/arch/ia64/kernel/smp.c
index e2cc59db86bc2..ea4f009a232b4 100644
--- a/arch/ia64/kernel/smp.c
+++ b/arch/ia64/kernel/smp.c
@@ -220,11 +220,11 @@ kdump_smp_send_init(void)
  * Called with preemption disabled.
  */
 void
-smp_send_reschedule (int cpu)
+arch_smp_send_resch

[PATCH v3 7/8] smp: reword smp call IPI comment

2022-12-02 Thread Valentin Schneider
Accessing the call_single_queue hasn't involved a spinlock since 2014:

  6897fc22ea01 ("kernel: use lockless list for smp_call_function_single")

The llist operations (namely cmpxchg() and xchg()) provide similar ordering
guarantees; update the comment to lessen confusion.
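
For reference, the op providing that ordering, simplified from lib/llist.c
(barrier comment mine):

  bool llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
                       struct llist_head *head)
  {
          struct llist_node *first = READ_ONCE(head->first);

          do {
                  new_last->next = first;
          } while (!try_cmpxchg(&head->first, &first, new_first));

          /*
           * The successful cmpxchg() is fully ordered, so the list addition
           * is visible before any later store that kicks the IPI.
           */
          return !first;
  }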

Signed-off-by: Valentin Schneider 
---
 kernel/smp.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 93b4386cd3096..821b5986721ac 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -495,9 +495,10 @@ void __smp_call_single_queue(int cpu, struct llist_node 
*node)
 #endif
 
/*
-* The list addition should be visible before sending the IPI
-* handler locks the list to pull the entry off it because of
-* normal cache coherency rules implied by spinlocks.
+* The list addition should be visible to the target CPU when it pops
+* the head of the list to pull the entry off it in the IPI handler
+* because of normal cache coherency rules implied by the underlying
+* llist ops.
 *
 * If IPIs can go out of order to the cache coherency protocol
 * in an architecture, sufficient synchronisation should be added
-- 
2.31.1




[PATCH v3 8/8] sched, smp: Trace smp callback causing an IPI

2022-12-02 Thread Valentin Schneider
Context
=======

The newly-introduced ipi_send_cpumask tracepoint has a "callback" parameter
which so far has only been fed with NULL.

While CSD_TYPE_SYNC/ASYNC and CSD_TYPE_IRQ_WORK share a similar backing
struct layout (meaning their callback func can be accessed without caring
about the actual CSD type), CSD_TYPE_TTWU doesn't even have a function
attached to its struct. This means we need to check the type of a CSD
before eventually dereferencing its associated callback.

This isn't as trivial as it sounds: the CSD type is stored in
__call_single_node.u_flags, which get cleared right before the callback is
executed via csd_unlock(). This implies checking the CSD type before it is
enqueued on the call_single_queue, as the target CPU's queue can be flushed
before we get to sending an IPI.

Furthermore, send_call_function_single_ipi() only has a CPU parameter, and
would need to have an additional argument to trickle down the invoked
function. This is somewhat silly, as the extra argument will always be
pushed down to the function even when nothing is being traced, which is
unnecessary overhead.

Changes
=======

send_call_function_single_ipi() is only used by smp.c, and is defined in
sched/core.c as it contains scheduler-specific ops (set_nr_if_polling() of
a CPU's idle task).

Split it into two parts: the scheduler bits remain in sched/core.c, and the
actual IPI emission is moved into smp.c. This lets us define an
__always_inline helper function that can take the related callback as
parameter without creating useless register pressure in the non-traced path
which only gains a (disabled) static branch.

Do the same thing for the multi IPI case.

Signed-off-by: Valentin Schneider 
---
 kernel/sched/core.c | 18 +++-
 kernel/sched/smp.h  |  2 +-
 kernel/smp.c| 72 +
 3 files changed, 66 insertions(+), 26 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 40587b0d99329..e59aac936dcb9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3743,16 +3743,20 @@ void sched_ttwu_pending(void *arg)
rq_unlock_irqrestore(rq, &rf);
 }
 
-void send_call_function_single_ipi(int cpu)
+/*
+ * Prepare the scene for sending an IPI for a remote smp_call
+ *
+ * Returns true if the caller can proceed with sending the IPI.
+ * Returns false otherwise.
+ */
+bool call_function_single_prep_ipi(int cpu)
 {
-   struct rq *rq = cpu_rq(cpu);
-
-   if (!set_nr_if_polling(rq->idle)) {
-   trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, NULL);
-   arch_send_call_function_single_ipi(cpu);
-   } else {
+   if (set_nr_if_polling(cpu_rq(cpu)->idle)) {
trace_sched_wake_idle_without_ipi(cpu);
+   return false;
}
+
+   return true;
 }
 
 /*
diff --git a/kernel/sched/smp.h b/kernel/sched/smp.h
index 2eb23dd0f2856..21ac44428bb02 100644
--- a/kernel/sched/smp.h
+++ b/kernel/sched/smp.h
@@ -6,7 +6,7 @@
 
 extern void sched_ttwu_pending(void *arg);
 
-extern void send_call_function_single_ipi(int cpu);
+extern bool call_function_single_prep_ipi(int cpu);
 
 #ifdef CONFIG_SMP
 extern void flush_smp_call_function_queue(void);
diff --git a/kernel/smp.c b/kernel/smp.c
index 821b5986721ac..5cd680a7e78ef 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -161,9 +161,18 @@ void __init call_function_init(void)
 }
 
 static __always_inline void
-send_call_function_ipi_mask(const struct cpumask *mask)
+send_call_function_single_ipi(int cpu, smp_call_func_t func)
 {
-   trace_ipi_send_cpumask(mask, _RET_IP_, NULL);
+   if (call_function_single_prep_ipi(cpu)) {
+   trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, func);
+   arch_send_call_function_single_ipi(cpu);
+   }
+}
+
+static __always_inline void
+send_call_function_ipi_mask(const struct cpumask *mask, smp_call_func_t func)
+{
+   trace_ipi_send_cpumask(mask, _RET_IP_, func);
arch_send_call_function_ipi_mask(mask);
 }
 
@@ -430,12 +439,16 @@ static void __smp_call_single_queue_debug(int cpu, struct 
llist_node *node)
struct cfd_seq_local *seq = this_cpu_ptr(&cfd_seq_local);
struct call_function_data *cfd = this_cpu_ptr(&cfd_data);
struct cfd_percpu *pcpu = per_cpu_ptr(cfd->pcpu, cpu);
+   struct __call_single_data *csd;
+
+   csd = container_of(node, call_single_data_t, node.llist);
+   WARN_ON_ONCE(!(CSD_TYPE(csd) & (CSD_TYPE_SYNC | CSD_TYPE_ASYNC)));
 
cfd_seq_store(pcpu->seq_queue, this_cpu, cpu, CFD_SEQ_QUEUE);
if (llist_add(node, &per_cpu(call_single_queue, cpu))) {
cfd_seq_store(pcpu->seq_ipi, this_cpu, cpu, CFD_SEQ_IPI);
cfd_seq_store(seq->ping, this_cpu, cpu, CFD_SEQ_PING);
-   send_call_function_single_ipi(cpu);
+   send_call_function_single_ipi(cpu, csd->func);
cf

Re: [PATCH v3 6/8] treewide: Trace IPIs sent via smp_send_reschedule()

2023-01-09 Thread Valentin Schneider
On 08/01/23 20:17, Huacai Chen wrote:
> Hi, Valentin,
>
> On Fri, Dec 2, 2022 at 11:59 PM Valentin Schneider  
> wrote:
>> @@ -83,7 +83,7 @@ extern void show_ipi_list(struct seq_file *p, int prec);
>>   * it goes straight through and wastes no time serializing
>>   * anything. Worst case is that we lose a reschedule ...
>>   */
>> -static inline void smp_send_reschedule(int cpu)
>> +static inline void arch_smp_send_reschedule(int cpu)
>>  {
>> loongson_send_ipi_single(cpu, SMP_RESCHEDULE);
>>  }
> This function has been moved to arch/loongarch/kernel/smp.c since 6.2.
>

Thanks! I'll make sure to rerun the coccinelle script for the next version.




Re: [PATCH v3 3/8] sched, smp: Trace IPIs sent via send_call_function_single_ipi()

2023-01-09 Thread Valentin Schneider
On 07/01/23 12:04, Ingo Molnar wrote:
> * Valentin Schneider  wrote:
>
>> send_call_function_single_ipi() is the thing that sends IPIs at the bottom
>> of smp_call_function*() via either generic_exec_single() or
>> smp_call_function_many_cond(). Give it an IPI-related tracepoint.
>> 
>> Note that this ends up tracing any IPI sent via __smp_call_single_queue(),
>> which covers __ttwu_queue_wakelist() and irq_work_queue_on() "for free".
>> 
>> Signed-off-by: Valentin Schneider 
>> Reviewed-by: Steven Rostedt (Google) 
>
> Acked-by: Ingo Molnar 
>
> Patch series logistics:
>
>  - No objections from the scheduler side, this feature looks pretty useful.
>

Thanks!

>  - Certain patches are incomplete, others are noted as being merged 
>separately, so I presume you'll send an updated/completed series 
>eventually?
>

The first patch from Steve is now in, so can drop it.

The other patches are complete, though I need to rebase them and regenerate
the treewide patch to catch any changes that came with 6.2. I'll do that
this week.

The "incompleteness" pointed out in the cover letter is about the types of
IPIs that can be traced. This series covers the ones that end up invoking
some core code (coincidentally those are the most common ones); others, such
as tick_broadcast() for arm, arm64, riscv and hexagon, remain unaffected.

I'm not that much interested in these (other than maybe the tick broadcast
one; they are all fairly infrequent), but I'm happy to have a shot at them
for the sake of completeness - either in that series or in a followup, up
to you.

>  - We can merge this via the scheduler tree I suspect, as most callbacks 
>affected relate to tip:sched/core and tmp:smp/core - but if you have 
>some other preferred tree that's fine too.
>

Either sound good to me.

> Thanks,
>
>   Ingo




[PATCH v4 0/7] Generic IPI sending tracepoint

2023-01-19 Thread Valentin Schneider
Background
==========

Detecting IPI *reception* is relatively easy, e.g. using
trace_irq_handler_{entry,exit} or even just function-trace
flush_smp_call_function_queue() for SMP calls.  

Figuring out their *origin*, is trickier as there is no generic tracepoint tied
to e.g. smp_call_function():

o AFAIA x86 has no tracepoint tied to sending IPIs, only receiving them
  (cf. trace_call_function{_single}_entry()).
o arm/arm64 do have trace_ipi_raise(), which gives us the target cpus but also a
  mostly useless string (smp_calls will all be "Function call interrupts").
o Other architectures don't seem to have any IPI-sending related tracepoint.  

I believe one reason those tracepoints used by arm/arm64 ended up as they were
is because these archs used to handle IPIs differently from regular interrupts
(the IRQ driver would directly invoke an IPI-handling routine), which meant
they never showed up in trace_irq_handler_{entry, exit}. The trace_ipi_{entry,exit}
tracepoints gave a way to trace IPI reception but those have become redundant as
of: 

  56afcd3dbd19 ("ARM: Allow IPIs to be handled as normal interrupts")
  d3afc7f12987 ("arm64: Allow IPIs to be handled as normal interrupts")

which gave IPIs a "proper" handler function used through
generic_handle_domain_irq(), which makes them show up via
trace_irq_handler_{entry, exit}.

Changing stuff up
=================

Per the above, it would make sense to reshuffle trace_ipi_raise() and move it
into generic code. This also came up during Daniel's talk on Osnoise at the CPU
isolation MC of LPC 2022 [1]. 

Now, to be useful, such a tracepoint needs to export:
o targeted CPU(s)
o calling context

The only way to get the calling context with trace_ipi_raise() is to trigger a
stack dump, e.g. $(trace-cmd -e ipi* -T echo 42).

This is instead introducing a new tracepoint which exports the relevant context
(callsite, and requested callback for when the callsite isn't helpful), and is
usable by all architectures as it sits in generic code. 

Another thing worth mentioning is that depending on the callsite, the _RET_IP_
fed to the tracepoint is not always useful - generic_exec_single() doesn't tell
you much about the actual callback being sent via IPI, which is why the new
tracepoint also has a @callback argument.

Patches
=======

o Patches 1-5 spread out the tracepoint across relevant sites.
  Patch 5 ends up sprinkling lots of #include <trace/events/ipi.h> which
  I'm not the biggest fan of, but is the least horrible solution I've been
  able to come up with so far.
  
o Patch 7 is trying to be smart about tracing the callback associated with the
  IPI.

This results in having IPI trace events for:

o smp_call_function*()
o smp_send_reschedule()
o irq_work_queue*()
o standalone uses of __smp_call_single_queue()

This is incomplete, just looking at arm64 there's more IPI types that aren't
covered: 

  IPI_CPU_STOP,
  IPI_CPU_CRASH_STOP,
  IPI_TIMER,
  IPI_WAKEUP,

but apart from IPI_TIMER (cf. tick_broadcast()), those IPIs are both infrequent
and accompanied by identifiable interference (stopper or cpuhp threads being
scheduled). I've added a point in my todolist to handle those in a later series
for the sake of completeness.

Links
=====

[1]: https://youtu.be/5gT57y4OzBM?t=14234

Revisions
=========

v3 -> v4


o Rebased against 6.2-rc4
  Re-ran my coccinelle scripts for the treewide change; only loongarch needed
  changes
o Dropped cpumask trace event field patch (now in 6.2-rc1)
o Applied RB and Ack tags
  Ingo, I wasn't sure if you meant to Ack the whole series or just the patch you
  replied to, so since I didn't want to unlawfully forge any tag I only added
  the one.
o Did a small pass on comments and changelogs

v2 -> v3


o Dropped the generic export of smp_send_reschedule(), turned it into a macro
  and a bunch of imports
o Dropped the send_call_function_single_ipi() macro madness, split it into sched
  and smp bits using some of Peter's suggestions

v1 -> v2


o Ditched single-CPU tracepoint
o Changed tracepoint signature to include callback
o Changed tracepoint callsite field to void *; the parameter is still UL to save
  up on casts due to using _RET_IP_.
o Fixed linking failures due to not exporting smp_send_reschedule()

git range-diff v3 vs v4
=======================

1:  6820c1880d97d < -:  - tracing: Add __cpumask to denote a trace 
event field that is a cpumask_t
2:  ef594e168af0d ! 1:  8f1309849c859 trace: Add trace_ipi_send_cpumask()
@@ Commit message
 its "reason" argument being an uninformative string (on arm64 all you 
get
 is "Function call interrupts" for SMP calls).
 
-Add a variant of it that exports a target CPU, a callsite and a 
callback.
+Add a variant of it that exports a target cpumask, a callsite and a 
callback.
 
   

[PATCH v4 1/7] trace: Add trace_ipi_send_cpumask()

2023-01-19 Thread Valentin Schneider
trace_ipi_raise() is unsuitable for generically tracing IPI sources due to
its "reason" argument being an uninformative string (on arm64 all you get
is "Function call interrupts" for SMP calls).

Add a variant of it that exports a target cpumask, a callsite and a callback.

Signed-off-by: Valentin Schneider 
Reviewed-by: Steven Rostedt (Google) 
---
 include/trace/events/ipi.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/include/trace/events/ipi.h b/include/trace/events/ipi.h
index 0be71dad6ec03..b1125dc27682c 100644
--- a/include/trace/events/ipi.h
+++ b/include/trace/events/ipi.h
@@ -35,6 +35,28 @@ TRACE_EVENT(ipi_raise,
TP_printk("target_mask=%s (%s)", __get_bitmask(target_cpus), 
__entry->reason)
 );
 
+TRACE_EVENT(ipi_send_cpumask,
+
+   TP_PROTO(const struct cpumask *cpumask, unsigned long callsite, void 
*callback),
+
+   TP_ARGS(cpumask, callsite, callback),
+
+   TP_STRUCT__entry(
+   __cpumask(cpumask)
+   __field(void *, callsite)
+   __field(void *, callback)
+   ),
+
+   TP_fast_assign(
+   __assign_cpumask(cpumask, cpumask_bits(cpumask));
+   __entry->callsite = (void *)callsite;
+   __entry->callback = callback;
+   ),
+
+   TP_printk("cpumask=%s callsite=%pS callback=%pS",
+ __get_cpumask(cpumask), __entry->callsite, __entry->callback)
+);
+
 DECLARE_EVENT_CLASS(ipi_handler,
 
TP_PROTO(const char *reason),
-- 
2.31.1




[PATCH v4 2/7] sched, smp: Trace IPIs sent via send_call_function_single_ipi()

2023-01-19 Thread Valentin Schneider
send_call_function_single_ipi() is the thing that sends IPIs at the bottom
of smp_call_function*() via either generic_exec_single() or
smp_call_function_many_cond(). Give it an IPI-related tracepoint.

Note that this ends up tracing any IPI sent via __smp_call_single_queue(),
which covers __ttwu_queue_wakelist() and irq_work_queue_on() "for free".

Signed-off-by: Valentin Schneider 
Reviewed-by: Steven Rostedt (Google) 
Acked-by: Ingo Molnar 
---
 arch/arm/kernel/smp.c   | 3 ---
 arch/arm64/kernel/smp.c | 1 -
 kernel/sched/core.c | 7 +--
 kernel/smp.c| 4 
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 36e6efad89f30..45b8ca2ce521f 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -48,9 +48,6 @@
 #include 
 #include 
 
-#define CREATE_TRACE_POINTS
-#include <trace/events/ipi.h>
-
 /*
  * as from 2.5, kernels no longer have an init_tasks structure
  * so we need some other way of telling a new secondary core
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index ffc5d76cf6955..937d2623e06ba 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -51,7 +51,6 @@
 #include 
 #include 
 
-#define CREATE_TRACE_POINTS
 #include <trace/events/ipi.h>
 
 DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bb1ee6d7bddea..c0d09eb249603 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -81,6 +81,7 @@
 #include 
 #include 
 #undef CREATE_TRACE_POINTS
+#include <trace/events/ipi.h>
 
 #include "sched.h"
 #include "stats.h"
@@ -3819,10 +3820,12 @@ void send_call_function_single_ipi(int cpu)
 {
struct rq *rq = cpu_rq(cpu);
 
-   if (!set_nr_if_polling(rq->idle))
+   if (!set_nr_if_polling(rq->idle)) {
+   trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, NULL);
arch_send_call_function_single_ipi(cpu);
-   else
+   } else {
trace_sched_wake_idle_without_ipi(cpu);
+   }
 }
 
 /*
diff --git a/kernel/smp.c b/kernel/smp.c
index 06a413987a14a..e2ca1e2f31274 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -26,6 +26,10 @@
 #include 
 #include 
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/ipi.h>
+#undef CREATE_TRACE_POINTS
+
 #include "smpboot.h"
 #include "sched/smp.h"
 
-- 
2.31.1




[PATCH v4 3/7] smp: Trace IPIs sent via arch_send_call_function_ipi_mask()

2023-01-19 Thread Valentin Schneider
This simply wraps around the arch function and prepends it with a
tracepoint, similar to send_call_function_single_ipi().

Signed-off-by: Valentin Schneider 
Reviewed-by: Steven Rostedt (Google) 
---
 kernel/smp.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index e2ca1e2f31274..93b4386cd3096 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -160,6 +160,13 @@ void __init call_function_init(void)
smpcfd_prepare_cpu(smp_processor_id());
 }
 
+static __always_inline void
+send_call_function_ipi_mask(const struct cpumask *mask)
+{
+   trace_ipi_send_cpumask(mask, _RET_IP_, NULL);
+   arch_send_call_function_ipi_mask(mask);
+}
+
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
 
 static DEFINE_STATIC_KEY_FALSE(csdlock_debug_enabled);
@@ -970,7 +977,7 @@ static void smp_call_function_many_cond(const struct 
cpumask *mask,
if (nr_cpus == 1)
send_call_function_single_ipi(last_cpu);
else if (likely(nr_cpus > 1))
-   arch_send_call_function_ipi_mask(cfd->cpumask_ipi);
+   send_call_function_ipi_mask(cfd->cpumask_ipi);
 
cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->pinged, this_cpu, 
CFD_SEQ_NOCPU, CFD_SEQ_PINGED);
}
-- 
2.31.1




[PATCH v4 5/7] treewide: Trace IPIs sent via smp_send_reschedule()

2023-01-19 Thread Valentin Schneider
To be able to trace invocations of smp_send_reschedule(), rename the
arch-specific definitions of it to arch_smp_send_reschedule() and wrap it
into an smp_send_reschedule() that contains a tracepoint.

Changes to include the declaration of the tracepoint were driven by the
following coccinelle script:

  @func_use@
  @@
  smp_send_reschedule(...);

  @include@
  @@
  #include <trace/events/ipi.h>

  @no_include depends on func_use && !include@
  @@
#include <...>
  +
  + #include <trace/events/ipi.h>

Signed-off-by: Valentin Schneider 
[csky bits]
Acked-by: Guo Ren 
[riscv bits]
Acked-by: Palmer Dabbelt 
---
 arch/alpha/kernel/smp.c  |  2 +-
 arch/arc/kernel/smp.c|  2 +-
 arch/arm/kernel/smp.c|  2 +-
 arch/arm/mach-actions/platsmp.c  |  2 ++
 arch/arm64/kernel/smp.c  |  2 +-
 arch/csky/kernel/smp.c   |  2 +-
 arch/hexagon/kernel/smp.c|  2 +-
 arch/ia64/kernel/smp.c   |  4 ++--
 arch/loongarch/kernel/smp.c  |  4 ++--
 arch/mips/include/asm/smp.h  |  2 +-
 arch/mips/kernel/rtlx-cmp.c  |  2 ++
 arch/openrisc/kernel/smp.c   |  2 +-
 arch/parisc/kernel/smp.c |  4 ++--
 arch/powerpc/kernel/smp.c|  6 --
 arch/powerpc/kvm/book3s_hv.c |  3 +++
 arch/powerpc/platforms/powernv/subcore.c |  2 ++
 arch/riscv/kernel/smp.c  |  4 ++--
 arch/s390/kernel/smp.c   |  2 +-
 arch/sh/kernel/smp.c |  2 +-
 arch/sparc/kernel/smp_32.c   |  2 +-
 arch/sparc/kernel/smp_64.c   |  2 +-
 arch/x86/include/asm/smp.h   |  2 +-
 arch/x86/kvm/svm/svm.c   |  4 
 arch/x86/kvm/x86.c   |  2 ++
 arch/xtensa/kernel/smp.c |  2 +-
 include/linux/smp.h  | 11 +--
 virt/kvm/kvm_main.c  |  2 ++
 27 files changed, 52 insertions(+), 26 deletions(-)

diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index f4e20f75438f8..38637eb9eebd5 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -562,7 +562,7 @@ handle_ipi(struct pt_regs *regs)
 }
 
 void
-smp_send_reschedule(int cpu)
+arch_smp_send_reschedule(int cpu)
 {
 #ifdef DEBUG_IPI_MSG
if (cpu == hard_smp_processor_id())
diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
index ad93fe6e4b77d..409cfa4675b40 100644
--- a/arch/arc/kernel/smp.c
+++ b/arch/arc/kernel/smp.c
@@ -292,7 +292,7 @@ static void ipi_send_msg(const struct cpumask *callmap, 
enum ipi_msg_type msg)
ipi_send_msg_one(cpu, msg);
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
ipi_send_msg_one(cpu, IPI_RESCHEDULE);
 }
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 45b8ca2ce521f..dea24a6e0ed6f 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -744,7 +744,7 @@ void __init set_smp_ipi_range(int ipi_base, int n)
ipi_setup(smp_processor_id());
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/arm/mach-actions/platsmp.c b/arch/arm/mach-actions/platsmp.c
index f26618b435145..7b208e96fbb67 100644
--- a/arch/arm/mach-actions/platsmp.c
+++ b/arch/arm/mach-actions/platsmp.c
@@ -20,6 +20,8 @@
 #include 
 #include 
 
+#include <trace/events/ipi.h>
+
 #define OWL_CPU1_ADDR  0x50
 #define OWL_CPU1_FLAG  0x5c
 
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 937d2623e06ba..8d108edc4a89f 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -976,7 +976,7 @@ void __init set_smp_ipi_range(int ipi_base, int n)
ipi_setup(smp_processor_id());
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/csky/kernel/smp.c b/arch/csky/kernel/smp.c
index 4b605aa2e1d65..fd7f81be16dd6 100644
--- a/arch/csky/kernel/smp.c
+++ b/arch/csky/kernel/smp.c
@@ -140,7 +140,7 @@ void smp_send_stop(void)
on_each_cpu(ipi_stop, NULL, 1);
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
send_ipi_message(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/hexagon/kernel/smp.c b/arch/hexagon/kernel/smp.c
index 4ba93e59370c4..4e8bee25b8c68 100644
--- a/arch/hexagon/kernel/smp.c
+++ b/arch/hexagon/kernel/smp.c
@@ -217,7 +217,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
}
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
send_ipi(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/ia64/kernel/smp.c b/arch/ia64/kernel/smp.c
index e2cc59db86bc2..ea4f009a232b4 100644
--- a/arch/ia64/kernel/smp.c
+++ b/arch/ia64/kernel/smp.c
@@ -220,11 +220,11 @@ kdump_smp_send_init(void)
  * Called with preemption disabl

[PATCH v4 4/7] irq_work: Trace self-IPIs sent via arch_irq_work_raise()

2023-01-19 Thread Valentin Schneider
IPIs sent to remote CPUs via irq_work_queue_on() are now covered by
trace_ipi_send_cpumask(), add another instance of the tracepoint to cover
self-IPIs.

Signed-off-by: Valentin Schneider 
Reviewed-by: Steven Rostedt (Google) 
---
 kernel/irq_work.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 7afa40fe5cc43..c33e88e32a67a 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -22,6 +22,8 @@
 #include 
 #include 
 
+#include <trace/events/ipi.h>
+
 static DEFINE_PER_CPU(struct llist_head, raised_list);
 static DEFINE_PER_CPU(struct llist_head, lazy_list);
 static DEFINE_PER_CPU(struct task_struct *, irq_workd);
@@ -74,6 +76,16 @@ void __weak arch_irq_work_raise(void)
 */
 }
 
+static __always_inline void irq_work_raise(struct irq_work *work)
+{
+   if (trace_ipi_send_cpumask_enabled() && arch_irq_work_has_interrupt())
+   trace_ipi_send_cpumask(cpumask_of(smp_processor_id()),
+  _RET_IP_,
+  work->func);
+
+   arch_irq_work_raise();
+}
+
 /* Enqueue on current CPU, work must already be claimed and preempt disabled */
 static void __irq_work_queue_local(struct irq_work *work)
 {
@@ -99,7 +111,7 @@ static void __irq_work_queue_local(struct irq_work *work)
 
/* If the work is "lazy", handle it from next tick if any */
if (!lazy_work || tick_nohz_tick_stopped())
-   arch_irq_work_raise();
+   irq_work_raise(work);
 }
 
 /* Enqueue the irq work @work on the current CPU */
-- 
2.31.1




[PATCH v4 6/7] smp: reword smp call IPI comment

2023-01-19 Thread Valentin Schneider
Accessing the call_single_queue hasn't involved a spinlock since 2014:

  6897fc22ea01 ("kernel: use lockless list for smp_call_function_single")

The llist operations (namely cmpxchg() and xchg()) provide similar ordering
guarantees; update the comment to lessen confusion.

Signed-off-by: Valentin Schneider 
---
 kernel/smp.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 93b4386cd3096..821b5986721ac 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -495,9 +495,10 @@ void __smp_call_single_queue(int cpu, struct llist_node 
*node)
 #endif
 
/*
-* The list addition should be visible before sending the IPI
-* handler locks the list to pull the entry off it because of
-* normal cache coherency rules implied by spinlocks.
+* The list addition should be visible to the target CPU when it pops
+* the head of the list to pull the entry off it in the IPI handler
+* because of normal cache coherency rules implied by the underlying
+* llist ops.
 *
 * If IPIs can go out of order to the cache coherency protocol
 * in an architecture, sufficient synchronisation should be added
-- 
2.31.1




[PATCH v4 7/7] sched, smp: Trace smp callback causing an IPI

2023-01-19 Thread Valentin Schneider
Context
=======

The newly-introduced ipi_send_cpumask tracepoint has a "callback" parameter
which so far has only been fed with NULL.

While CSD_TYPE_SYNC/ASYNC and CSD_TYPE_IRQ_WORK share a similar backing
struct layout (meaning their callback func can be accessed without caring
about the actual CSD type), CSD_TYPE_TTWU doesn't even have a function
attached to its struct. This means we need to check the type of a CSD
before eventually dereferencing its associated callback.

This isn't as trivial as it sounds: the CSD type is stored in
__call_single_node.u_flags, which gets cleared right before the callback is
executed via csd_unlock(). This implies checking the CSD type before it is
enqueued on the call_single_queue, as the target CPU's queue can be flushed
before we get to sending an IPI.

Furthermore, send_call_function_single_ipi() only has a CPU parameter, and
would need to have an additional argument to trickle down the invoked
function. This is somewhat silly, as the extra argument will always be
pushed down to the function even when nothing is being traced, which is
unnecessary overhead.

Changes
=======

send_call_function_single_ipi() is only used by smp.c, and is defined in
sched/core.c as it contains scheduler-specific ops (set_nr_if_polling() of
a CPU's idle task).

Split it into two parts: the scheduler bits remain in sched/core.c, and the
actual IPI emission is moved into smp.c. This lets us define an
__always_inline helper function that can take the related callback as
parameter without creating useless register pressure in the non-traced path
which only gains a (disabled) static branch.

Do the same thing for the multi IPI case.
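
For illustration, a minimal sketch of the enqueue-side check this results
in (helper names follow kernel/smp.c, the wrapping function name is made
up; a sketch of the idea, not the exact hunk):

	static void queue_csd_and_trace(int cpu, struct llist_node *node)
	{
		call_single_data_t *csd =
			container_of(node, call_single_data_t, node.llist);
		/*
		 * Resolve the traced callback *before* llist_add(): once
		 * the node is visible, flush_smp_call_function_queue() on
		 * the target CPU can run the CSD, at which point
		 * csd_unlock() clears node.u_flags.
		 */
		smp_call_func_t func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
					sched_ttwu_pending : csd->func;

		if (llist_add(node, &per_cpu(call_single_queue, cpu)))
			send_call_function_single_ipi(cpu, func);
	}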

Signed-off-by: Valentin Schneider 
---
 kernel/sched/core.c | 18 +++-
 kernel/sched/smp.h  |  2 +-
 kernel/smp.c| 72 +
 3 files changed, 66 insertions(+), 26 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c0d09eb249603..9733c3ecdbf16 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3816,16 +3816,20 @@ void sched_ttwu_pending(void *arg)
rq_unlock_irqrestore(rq, &rf);
 }
 
-void send_call_function_single_ipi(int cpu)
+/*
+ * Prepare the scene for sending an IPI for a remote smp_call
+ *
+ * Returns true if the caller can proceed with sending the IPI.
+ * Returns false otherwise.
+ */
+bool call_function_single_prep_ipi(int cpu)
 {
-   struct rq *rq = cpu_rq(cpu);
-
-   if (!set_nr_if_polling(rq->idle)) {
-   trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, NULL);
-   arch_send_call_function_single_ipi(cpu);
-   } else {
+   if (set_nr_if_polling(cpu_rq(cpu)->idle)) {
trace_sched_wake_idle_without_ipi(cpu);
+   return false;
}
+
+   return true;
 }
 
 /*
diff --git a/kernel/sched/smp.h b/kernel/sched/smp.h
index 2eb23dd0f2856..21ac44428bb02 100644
--- a/kernel/sched/smp.h
+++ b/kernel/sched/smp.h
@@ -6,7 +6,7 @@
 
 extern void sched_ttwu_pending(void *arg);
 
-extern void send_call_function_single_ipi(int cpu);
+extern bool call_function_single_prep_ipi(int cpu);
 
 #ifdef CONFIG_SMP
 extern void flush_smp_call_function_queue(void);
diff --git a/kernel/smp.c b/kernel/smp.c
index 821b5986721ac..5cd680a7e78ef 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -161,9 +161,18 @@ void __init call_function_init(void)
 }
 
 static __always_inline void
-send_call_function_ipi_mask(const struct cpumask *mask)
+send_call_function_single_ipi(int cpu, smp_call_func_t func)
 {
-   trace_ipi_send_cpumask(mask, _RET_IP_, NULL);
+   if (call_function_single_prep_ipi(cpu)) {
+   trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, func);
+   arch_send_call_function_single_ipi(cpu);
+   }
+}
+
+static __always_inline void
+send_call_function_ipi_mask(const struct cpumask *mask, smp_call_func_t func)
+{
+   trace_ipi_send_cpumask(mask, _RET_IP_, func);
arch_send_call_function_ipi_mask(mask);
 }
 
@@ -430,12 +439,16 @@ static void __smp_call_single_queue_debug(int cpu, struct llist_node *node)
struct cfd_seq_local *seq = this_cpu_ptr(&cfd_seq_local);
struct call_function_data *cfd = this_cpu_ptr(&cfd_data);
struct cfd_percpu *pcpu = per_cpu_ptr(cfd->pcpu, cpu);
+   struct __call_single_data *csd;
+
+   csd = container_of(node, call_single_data_t, node.llist);
+   WARN_ON_ONCE(!(CSD_TYPE(csd) & (CSD_TYPE_SYNC | CSD_TYPE_ASYNC)));
 
cfd_seq_store(pcpu->seq_queue, this_cpu, cpu, CFD_SEQ_QUEUE);
if (llist_add(node, &per_cpu(call_single_queue, cpu))) {
cfd_seq_store(pcpu->seq_ipi, this_cpu, cpu, CFD_SEQ_IPI);
cfd_seq_store(seq->ping, this_cpu, cpu, CFD_SEQ_PING);
-   send_call_function_single_ipi(cpu);
+   send_call_function_single_ipi(cpu, csd->func);
cf

Re: [PATCH v4 0/7] Generic IPI sending tracepoint

2023-02-14 Thread Valentin Schneider


Hey folks,

On 19/01/23 14:36, Valentin Schneider wrote:
> Patches
> ===
>
> o Patches 1-5 spread out the tracepoint across relevant sites.
>   Patch 5 ends up sprinkling lots of #include <trace/events/ipi.h> which I'm not
>   the biggest fan of, but is the least horrible solution I've been able to come
>   up with so far.
>
> o Patch 7 is trying to be smart about tracing the callback associated with the
>   IPI.
>
> This results in having IPI trace events for:
>
> o smp_call_function*()
> o smp_send_reschedule()
> o irq_work_queue*()
> o standalone uses of __smp_call_single_queue()
>

This still rebases cleanly on top of the latest tip/sched/core; any
objections to parking it there?




[PATCH v5 0/7] Generic IPI sending tracepoint

2023-03-07 Thread Valentin Schneider
ule(), turned it into a macro
  and a bunch of imports
o Dropped the send_call_function_single_ipi() macro madness, split it into sched
  and smp bits using some of Peter's suggestions

v1 -> v2


o Ditched single-CPU tracepoint
o Changed tracepoint signature to include callback
o Changed tracepoint callsite field to void *; the parameter is still UL to save
  up on casts due to using _RET_IP_.
o Fixed linking failures due to not exporting smp_send_reschedule()

Valentin Schneider (7):
  trace: Add trace_ipi_send_cpumask()
  sched, smp: Trace IPIs sent via send_call_function_single_ipi()
  smp: Trace IPIs sent via arch_send_call_function_ipi_mask()
  irq_work: Trace self-IPIs sent via arch_irq_work_raise()
  treewide: Trace IPIs sent via smp_send_reschedule()
  smp: reword smp call IPI comment
  sched, smp: Trace smp callback causing an IPI

 arch/alpha/kernel/smp.c  |  2 +-
 arch/arc/kernel/smp.c|  2 +-
 arch/arm/kernel/smp.c|  5 +-
 arch/arm/mach-actions/platsmp.c  |  2 +
 arch/arm64/kernel/smp.c  |  3 +-
 arch/csky/kernel/smp.c   |  2 +-
 arch/hexagon/kernel/smp.c|  2 +-
 arch/ia64/kernel/smp.c   |  4 +-
 arch/loongarch/kernel/smp.c  |  4 +-
 arch/mips/include/asm/smp.h  |  2 +-
 arch/mips/kernel/rtlx-cmp.c  |  2 +
 arch/openrisc/kernel/smp.c   |  2 +-
 arch/parisc/kernel/smp.c |  4 +-
 arch/powerpc/kernel/smp.c|  6 +-
 arch/powerpc/kvm/book3s_hv.c |  3 +
 arch/powerpc/platforms/powernv/subcore.c |  2 +
 arch/riscv/kernel/smp.c  |  4 +-
 arch/s390/kernel/smp.c   |  2 +-
 arch/sh/kernel/smp.c |  2 +-
 arch/sparc/kernel/smp_32.c   |  2 +-
 arch/sparc/kernel/smp_64.c   |  2 +-
 arch/x86/include/asm/smp.h   |  2 +-
 arch/x86/kvm/svm/svm.c   |  4 ++
 arch/x86/kvm/x86.c   |  2 +
 arch/xtensa/kernel/smp.c |  2 +-
 include/linux/smp.h  | 11 +++-
 include/trace/events/ipi.h   | 22 +++
 kernel/irq_work.c| 14 -
 kernel/sched/core.c  | 19 --
 kernel/sched/smp.h   |  2 +-
 kernel/smp.c | 78 +++-
 virt/kvm/kvm_main.c  |  2 +
 32 files changed, 164 insertions(+), 53 deletions(-)

--
2.31.1




[PATCH v5 1/7] trace: Add trace_ipi_send_cpumask()

2023-03-07 Thread Valentin Schneider
trace_ipi_raise() is unsuitable for generically tracing IPI sources due to
its "reason" argument being an uninformative string (on arm64 all you get
is "Function call interrupts" for SMP calls).

Add a variant of it that exports a target cpumask, a callsite and a callback.

Signed-off-by: Valentin Schneider 
Reviewed-by: Steven Rostedt (Google) 
---
 include/trace/events/ipi.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/include/trace/events/ipi.h b/include/trace/events/ipi.h
index 0be71dad6ec03..b1125dc27682c 100644
--- a/include/trace/events/ipi.h
+++ b/include/trace/events/ipi.h
@@ -35,6 +35,28 @@ TRACE_EVENT(ipi_raise,
	TP_printk("target_mask=%s (%s)", __get_bitmask(target_cpus), __entry->reason)
 );
 
+TRACE_EVENT(ipi_send_cpumask,
+
+   TP_PROTO(const struct cpumask *cpumask, unsigned long callsite, void *callback),
+
+   TP_ARGS(cpumask, callsite, callback),
+
+   TP_STRUCT__entry(
+   __cpumask(cpumask)
+   __field(void *, callsite)
+   __field(void *, callback)
+   ),
+
+   TP_fast_assign(
+   __assign_cpumask(cpumask, cpumask_bits(cpumask));
+   __entry->callsite = (void *)callsite;
+   __entry->callback = callback;
+   ),
+
+   TP_printk("cpumask=%s callsite=%pS callback=%pS",
+ __get_cpumask(cpumask), __entry->callsite, __entry->callback)
+);
+
 DECLARE_EVENT_CLASS(ipi_handler,
 
TP_PROTO(const char *reason),
-- 
2.31.1




[PATCH v5 3/7] smp: Trace IPIs sent via arch_send_call_function_ipi_mask()

2023-03-07 Thread Valentin Schneider
This simply wraps around the arch function and prepends it with a
tracepoint, similar to send_call_function_single_ipi().

Signed-off-by: Valentin Schneider 
Reviewed-by: Steven Rostedt (Google) 
---
 kernel/smp.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index e2ca1e2f31274..93b4386cd3096 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -160,6 +160,13 @@ void __init call_function_init(void)
smpcfd_prepare_cpu(smp_processor_id());
 }
 
+static __always_inline void
+send_call_function_ipi_mask(const struct cpumask *mask)
+{
+   trace_ipi_send_cpumask(mask, _RET_IP_, NULL);
+   arch_send_call_function_ipi_mask(mask);
+}
+
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
 
 static DEFINE_STATIC_KEY_FALSE(csdlock_debug_enabled);
@@ -970,7 +977,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
if (nr_cpus == 1)
send_call_function_single_ipi(last_cpu);
else if (likely(nr_cpus > 1))
-   arch_send_call_function_ipi_mask(cfd->cpumask_ipi);
+   send_call_function_ipi_mask(cfd->cpumask_ipi);
 
cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->pinged, this_cpu, 
CFD_SEQ_NOCPU, CFD_SEQ_PINGED);
}
-- 
2.31.1




[PATCH v5 2/7] sched, smp: Trace IPIs sent via send_call_function_single_ipi()

2023-03-07 Thread Valentin Schneider
send_call_function_single_ipi() is the thing that sends IPIs at the bottom
of smp_call_function*() via either generic_exec_single() or
smp_call_function_many_cond(). Give it an IPI-related tracepoint.

Note that this ends up tracing any IPI sent via __smp_call_single_queue(),
which covers __ttwu_queue_wakelist() and irq_work_queue_on() "for free".

Signed-off-by: Valentin Schneider 
Reviewed-by: Steven Rostedt (Google) 
Acked-by: Ingo Molnar 
---
 arch/arm/kernel/smp.c   | 3 ---
 arch/arm64/kernel/smp.c | 1 -
 kernel/sched/core.c | 7 +--
 kernel/smp.c| 4 
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 0b8c25763adc3..b6c832e195427 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -48,9 +48,6 @@
 #include 
 #include 
 
-#define CREATE_TRACE_POINTS
-#include <trace/events/ipi.h>
-
 /*
  * as from 2.5, kernels no longer have an init_tasks structure
  * so we need some other way of telling a new secondary core
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 4e83272642552..438c16fc44633 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -51,7 +51,6 @@
 #include 
 #include 
 
-#define CREATE_TRACE_POINTS
 #include <trace/events/ipi.h>
 
 DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index af017e038b482..85114f75f1c9c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -81,6 +81,7 @@
 #include 
 #include 
 #undef CREATE_TRACE_POINTS
+#include <trace/events/ipi.h>
 
 #include "sched.h"
 #include "stats.h"
@@ -3830,10 +3831,12 @@ void send_call_function_single_ipi(int cpu)
 {
struct rq *rq = cpu_rq(cpu);
 
-   if (!set_nr_if_polling(rq->idle))
+   if (!set_nr_if_polling(rq->idle)) {
+   trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, NULL);
arch_send_call_function_single_ipi(cpu);
-   else
+   } else {
trace_sched_wake_idle_without_ipi(cpu);
+   }
 }
 
 /*
diff --git a/kernel/smp.c b/kernel/smp.c
index 06a413987a14a..e2ca1e2f31274 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -26,6 +26,10 @@
 #include 
 #include 
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/ipi.h>
+#undef CREATE_TRACE_POINTS
+
 #include "smpboot.h"
 #include "sched/smp.h"
 
-- 
2.31.1




[PATCH v5 5/7] treewide: Trace IPIs sent via smp_send_reschedule()

2023-03-07 Thread Valentin Schneider
To be able to trace invocations of smp_send_reschedule(), rename the
arch-specific definitions of it to arch_smp_send_reschedule() and wrap it
into an smp_send_reschedule() that contains a tracepoint.

Changes to include the declaration of the tracepoint were driven by the
following coccinelle script:

  @func_use@
  @@
  smp_send_reschedule(...);

  @include@
  @@
  #include <trace/events/ipi.h>

  @no_include depends on func_use && !include@
  @@
#include <...>
  +
  + #include <trace/events/ipi.h>
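
The resulting generic wrapper then has roughly this shape (a sketch
assuming the macro form mentioned in the v5 changelog; the real definition
lands in include/linux/smp.h):

	/* Sketch: emit the tracepoint, then defer to the renamed arch hook. */
	#define smp_send_reschedule(cpu)				 \
	({								 \
		trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, NULL); \
		arch_smp_send_reschedule(cpu);				 \
	})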

Signed-off-by: Valentin Schneider 
[csky bits]
Acked-by: Guo Ren 
[riscv bits]
Acked-by: Palmer Dabbelt 
---
 arch/alpha/kernel/smp.c  |  2 +-
 arch/arc/kernel/smp.c|  2 +-
 arch/arm/kernel/smp.c|  2 +-
 arch/arm/mach-actions/platsmp.c  |  2 ++
 arch/arm64/kernel/smp.c  |  2 +-
 arch/csky/kernel/smp.c   |  2 +-
 arch/hexagon/kernel/smp.c|  2 +-
 arch/ia64/kernel/smp.c   |  4 ++--
 arch/loongarch/kernel/smp.c  |  4 ++--
 arch/mips/include/asm/smp.h  |  2 +-
 arch/mips/kernel/rtlx-cmp.c  |  2 ++
 arch/openrisc/kernel/smp.c   |  2 +-
 arch/parisc/kernel/smp.c |  4 ++--
 arch/powerpc/kernel/smp.c|  6 --
 arch/powerpc/kvm/book3s_hv.c |  3 +++
 arch/powerpc/platforms/powernv/subcore.c |  2 ++
 arch/riscv/kernel/smp.c  |  4 ++--
 arch/s390/kernel/smp.c   |  2 +-
 arch/sh/kernel/smp.c |  2 +-
 arch/sparc/kernel/smp_32.c   |  2 +-
 arch/sparc/kernel/smp_64.c   |  2 +-
 arch/x86/include/asm/smp.h   |  2 +-
 arch/x86/kvm/svm/svm.c   |  4 
 arch/x86/kvm/x86.c   |  2 ++
 arch/xtensa/kernel/smp.c |  2 +-
 include/linux/smp.h  | 11 +--
 virt/kvm/kvm_main.c  |  2 ++
 27 files changed, 52 insertions(+), 26 deletions(-)

diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index 0ede4b044e869..7439b2377df57 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -562,7 +562,7 @@ handle_ipi(struct pt_regs *regs)
 }
 
 void
-smp_send_reschedule(int cpu)
+arch_smp_send_reschedule(int cpu)
 {
 #ifdef DEBUG_IPI_MSG
if (cpu == hard_smp_processor_id())
diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
index ad93fe6e4b77d..409cfa4675b40 100644
--- a/arch/arc/kernel/smp.c
+++ b/arch/arc/kernel/smp.c
@@ -292,7 +292,7 @@ static void ipi_send_msg(const struct cpumask *callmap, 
enum ipi_msg_type msg)
ipi_send_msg_one(cpu, msg);
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
ipi_send_msg_one(cpu, IPI_RESCHEDULE);
 }
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index b6c832e195427..46b23dc1f94ad 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -744,7 +744,7 @@ void __init set_smp_ipi_range(int ipi_base, int n)
ipi_setup(smp_processor_id());
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/arm/mach-actions/platsmp.c b/arch/arm/mach-actions/platsmp.c
index f26618b435145..7b208e96fbb67 100644
--- a/arch/arm/mach-actions/platsmp.c
+++ b/arch/arm/mach-actions/platsmp.c
@@ -20,6 +20,8 @@
 #include 
 #include 
 
+#include <trace/events/ipi.h>
+
 #define OWL_CPU1_ADDR  0x50
 #define OWL_CPU1_FLAG  0x5c
 
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 438c16fc44633..66f2745062dda 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -976,7 +976,7 @@ void __init set_smp_ipi_range(int ipi_base, int n)
ipi_setup(smp_processor_id());
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/csky/kernel/smp.c b/arch/csky/kernel/smp.c
index b45d1073307f2..be77383acb5fc 100644
--- a/arch/csky/kernel/smp.c
+++ b/arch/csky/kernel/smp.c
@@ -140,7 +140,7 @@ void smp_send_stop(void)
on_each_cpu(ipi_stop, NULL, 1);
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
send_ipi_message(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/hexagon/kernel/smp.c b/arch/hexagon/kernel/smp.c
index 4ba93e59370c4..4e8bee25b8c68 100644
--- a/arch/hexagon/kernel/smp.c
+++ b/arch/hexagon/kernel/smp.c
@@ -217,7 +217,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
}
 }
 
-void smp_send_reschedule(int cpu)
+void arch_smp_send_reschedule(int cpu)
 {
send_ipi(cpumask_of(cpu), IPI_RESCHEDULE);
 }
diff --git a/arch/ia64/kernel/smp.c b/arch/ia64/kernel/smp.c
index e2cc59db86bc2..ea4f009a232b4 100644
--- a/arch/ia64/kernel/smp.c
+++ b/arch/ia64/kernel/smp.c
@@ -220,11 +220,11 @@ kdump_smp_send_init(void)
 * Called with preemption disabled.

[PATCH v5 4/7] irq_work: Trace self-IPIs sent via arch_irq_work_raise()

2023-03-07 Thread Valentin Schneider
IPIs sent to remote CPUs via irq_work_queue_on() are now covered by
trace_ipi_send_cpumask(); add another instance of the tracepoint to cover
self-IPIs.

Signed-off-by: Valentin Schneider 
Reviewed-by: Steven Rostedt (Google) 
---
 kernel/irq_work.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 7afa40fe5cc43..c33e88e32a67a 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -22,6 +22,8 @@
 #include 
 #include 
 
+#include <trace/events/ipi.h>
+
 static DEFINE_PER_CPU(struct llist_head, raised_list);
 static DEFINE_PER_CPU(struct llist_head, lazy_list);
 static DEFINE_PER_CPU(struct task_struct *, irq_workd);
@@ -74,6 +76,16 @@ void __weak arch_irq_work_raise(void)
 */
 }
 
+static __always_inline void irq_work_raise(struct irq_work *work)
+{
+   if (trace_ipi_send_cpumask_enabled() && arch_irq_work_has_interrupt())
+   trace_ipi_send_cpumask(cpumask_of(smp_processor_id()),
+  _RET_IP_,
+  work->func);
+
+   arch_irq_work_raise();
+}
+
 /* Enqueue on current CPU, work must already be claimed and preempt disabled */
 static void __irq_work_queue_local(struct irq_work *work)
 {
@@ -99,7 +111,7 @@ static void __irq_work_queue_local(struct irq_work *work)
 
/* If the work is "lazy", handle it from next tick if any */
if (!lazy_work || tick_nohz_tick_stopped())
-   arch_irq_work_raise();
+   irq_work_raise(work);
 }
 
 /* Enqueue the irq work @work on the current CPU */
-- 
2.31.1




[PATCH v5 6/7] smp: reword smp call IPI comment

2023-03-07 Thread Valentin Schneider
Accessing the call_single_queue hasn't involved a spinlock since 2014:

  6897fc22ea01 ("kernel: use lockless list for smp_call_function_single")

The llist operations (namely cmpxchg() and xchg()) provide similar ordering
guarantees; update the comment to lessen confusion.

Signed-off-by: Valentin Schneider 
---
 kernel/smp.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 93b4386cd3096..821b5986721ac 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -495,9 +495,10 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
 #endif
 
/*
-* The list addition should be visible before sending the IPI
-* handler locks the list to pull the entry off it because of
-* normal cache coherency rules implied by spinlocks.
+* The list addition should be visible to the target CPU when it pops
+* the head of the list to pull the entry off it in the IPI handler
+* because of normal cache coherency rules implied by the underlying
+* llist ops.
 *
 * If IPIs can go out of order to the cache coherency protocol
 * in an architecture, sufficient synchronisation should be added
-- 
2.31.1




[PATCH v5 7/7] sched, smp: Trace smp callback causing an IPI

2023-03-07 Thread Valentin Schneider
Context
=======

The newly-introduced ipi_send_cpumask tracepoint has a "callback" parameter
which so far has only been fed with NULL.

While CSD_TYPE_SYNC/ASYNC and CSD_TYPE_IRQ_WORK share a similar backing
struct layout (meaning their callback func can be accessed without caring
about the actual CSD type), CSD_TYPE_TTWU doesn't even have a function
attached to its struct. This means we need to check the type of a CSD
before eventually dereferencing its associated callback.

This isn't as trivial as it sounds: the CSD type is stored in
__call_single_node.u_flags, which gets cleared right before the callback is
executed via csd_unlock(). This implies checking the CSD type before it is
enqueued on the call_single_queue, as the target CPU's queue can be flushed
before we get to sending an IPI.

Furthermore, send_call_function_single_ipi() only has a CPU parameter, and
would need to have an additional argument to trickle down the invoked
function. This is somewhat silly, as the extra argument will always be
pushed down to the function even when nothing is being traced, which is
unnecessary overhead.

Changes
=======

send_call_function_single_ipi() is only used by smp.c, and is defined in
sched/core.c as it contains scheduler-specific ops (set_nr_if_polling() of
a CPU's idle task).

Split it into two parts: the scheduler bits remain in sched/core.c, and the
actual IPI emission is moved into smp.c. This lets us define an
__always_inline helper function that can take the related callback as
parameter without creating useless register pressure in the non-traced path
which only gains a (disabled) static branch.

Do the same thing for the multi IPI case.

Signed-off-by: Valentin Schneider 
---
 kernel/sched/core.c | 18 +++-
 kernel/sched/smp.h  |  2 +-
 kernel/smp.c| 72 +
 3 files changed, 66 insertions(+), 26 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 85114f75f1c9c..60c79b4e4a5b1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3827,16 +3827,20 @@ void sched_ttwu_pending(void *arg)
rq_unlock_irqrestore(rq, &rf);
 }
 
-void send_call_function_single_ipi(int cpu)
+/*
+ * Prepare the scene for sending an IPI for a remote smp_call
+ *
+ * Returns true if the caller can proceed with sending the IPI.
+ * Returns false otherwise.
+ */
+bool call_function_single_prep_ipi(int cpu)
 {
-   struct rq *rq = cpu_rq(cpu);
-
-   if (!set_nr_if_polling(rq->idle)) {
-   trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, NULL);
-   arch_send_call_function_single_ipi(cpu);
-   } else {
+   if (set_nr_if_polling(cpu_rq(cpu)->idle)) {
trace_sched_wake_idle_without_ipi(cpu);
+   return false;
}
+
+   return true;
 }
 
 /*
diff --git a/kernel/sched/smp.h b/kernel/sched/smp.h
index 2eb23dd0f2856..21ac44428bb02 100644
--- a/kernel/sched/smp.h
+++ b/kernel/sched/smp.h
@@ -6,7 +6,7 @@
 
 extern void sched_ttwu_pending(void *arg);
 
-extern void send_call_function_single_ipi(int cpu);
+extern bool call_function_single_prep_ipi(int cpu);
 
 #ifdef CONFIG_SMP
 extern void flush_smp_call_function_queue(void);
diff --git a/kernel/smp.c b/kernel/smp.c
index 821b5986721ac..5cd680a7e78ef 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -161,9 +161,18 @@ void __init call_function_init(void)
 }
 
 static __always_inline void
-send_call_function_ipi_mask(const struct cpumask *mask)
+send_call_function_single_ipi(int cpu, smp_call_func_t func)
 {
-   trace_ipi_send_cpumask(mask, _RET_IP_, NULL);
+   if (call_function_single_prep_ipi(cpu)) {
+   trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, func);
+   arch_send_call_function_single_ipi(cpu);
+   }
+}
+
+static __always_inline void
+send_call_function_ipi_mask(const struct cpumask *mask, smp_call_func_t func)
+{
+   trace_ipi_send_cpumask(mask, _RET_IP_, func);
arch_send_call_function_ipi_mask(mask);
 }
 
@@ -430,12 +439,16 @@ static void __smp_call_single_queue_debug(int cpu, struct llist_node *node)
struct cfd_seq_local *seq = this_cpu_ptr(&cfd_seq_local);
struct call_function_data *cfd = this_cpu_ptr(&cfd_data);
struct cfd_percpu *pcpu = per_cpu_ptr(cfd->pcpu, cpu);
+   struct __call_single_data *csd;
+
+   csd = container_of(node, call_single_data_t, node.llist);
+   WARN_ON_ONCE(!(CSD_TYPE(csd) & (CSD_TYPE_SYNC | CSD_TYPE_ASYNC)));
 
cfd_seq_store(pcpu->seq_queue, this_cpu, cpu, CFD_SEQ_QUEUE);
if (llist_add(node, &per_cpu(call_single_queue, cpu))) {
cfd_seq_store(pcpu->seq_ipi, this_cpu, cpu, CFD_SEQ_IPI);
cfd_seq_store(seq->ping, this_cpu, cpu, CFD_SEQ_PING);
-   send_call_function_single_ipi(cpu);
+   send_call_function_single_ipi(cpu, csd->func);
cf

Re: [PATCH v5 1/7] trace: Add trace_ipi_send_cpumask()

2023-03-22 Thread Valentin Schneider
On 22/03/23 11:30, Peter Zijlstra wrote:
> On Wed, Mar 22, 2023 at 10:39:55AM +0100, Peter Zijlstra wrote:
>> On Tue, Mar 07, 2023 at 02:35:52PM +, Valentin Schneider wrote:
>> > +TRACE_EVENT(ipi_send_cpumask,
>> > +
>> > +  TP_PROTO(const struct cpumask *cpumask, unsigned long callsite, void *callback),
>> > +
>> > +  TP_ARGS(cpumask, callsite, callback),
>> > +
>> > +  TP_STRUCT__entry(
>> > +  __cpumask(cpumask)
>> > +  __field(void *, callsite)
>> > +  __field(void *, callback)
>> > +  ),
>> > +
>> > +  TP_fast_assign(
>> > +  __assign_cpumask(cpumask, cpumask_bits(cpumask));
>> > +  __entry->callsite = (void *)callsite;
>> > +  __entry->callback = callback;
>> > +  ),
>> > +
>> > +  TP_printk("cpumask=%s callsite=%pS callback=%pS",
>> > +__get_cpumask(cpumask), __entry->callsite, __entry->callback)
>> > +);
>>
>> Would it make sense to add a variant like: ipi_send_cpu() that records a
>> single cpu instead of a cpumask. A lot of sites seems to do:
>> cpumask_of(cpu) for that first argument, and it seems to me it is quite
>> daft to have to memcpy a full multi-word cpumask in those cases.
>>
>> Remember, nr_possible_cpus > 64 is quite common these days.
>
> Something we litte bit like so...
>

I was wondering whether we could stick with a single trace event, but let
ftrace be aware of weight=1 vs weight>1 cpumasks.

For weight>1, it would memcpy() as usual; for weight=1, it could write a
pointer to a cpu_bit_bitmap[] equivalent embedded in the trace itself.
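
(For reference, that weight=1 trick is how cpumask_of() already works
outside of tracing; a simplified sketch of the upstream mechanism from
include/linux/cpumask.h / kernel/cpu.c:)

	/*
	 * cpu_bit_bitmap[] is a constant table laid out so that, for any
	 * cpu, row (1 + cpu % BITS_PER_LONG) rewound by cpu / BITS_PER_LONG
	 * words has only bit 'cpu' set. cpumask_of() thus returns a pointer
	 * into shared storage: no allocation, no copy.
	 */
	static inline const struct cpumask *get_cpu_mask(unsigned int cpu)
	{
		const unsigned long *p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG];

		p -= cpu / BITS_PER_LONG;
		return to_cpumask(p);
	}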

Unfortunately, ftrace bitmasks are represented as a u32 made of two 16-bit
values: [offset in event record, size], so there isn't a straightforward
way to point to a "reusable" cpumask. AFAICT the only alternative would be
to do that via a different trace event, but then we should just go with a
plain old uint - i.e. do what you're doing here, so:

Tested-and-reviewed-by: Valentin Schneider 

(with the tiny typo fix below)

> @@ -35,6 +35,28 @@ TRACE_EVENT(ipi_raise,
>   TP_printk("target_mask=%s (%s)", __get_bitmask(target_cpus), __entry->reason)
>  );
>
> +TRACE_EVENT(ipi_send_cpu,
> +
> + TP_PROTO(const unsigned int cpu, unsigned long callsite, void *callback),
> +
> + TP_ARGS(cpu, callsite, callback),
> +
> + TP_STRUCT__entry(
> + __field(unsigned int, cpu)
> + __field(void *, callsite)
> + __field(void *, callback)
> + ),
> +
> + TP_fast_assign(
> + __entry->cpu = cpu;
> + __entry->callsite = (void *)callsite;
> + __entry->callback = callback;
> + ),
> +
> + TP_printk("cpu=%s callsite=%pS callback=%pS",
^
  s/s/u/

> +   __entry->cpu, __entry->callsite, __entry->callback)
> +);
> +




Re: [PATCH v5 7/7] sched, smp: Trace smp callback causing an IPI

2023-03-22 Thread Valentin Schneider
On 22/03/23 10:53, Peter Zijlstra wrote:
> On Tue, Mar 07, 2023 at 02:35:58PM +0000, Valentin Schneider wrote:
>
>> @@ -477,6 +490,25 @@ static __always_inline void csd_unlock(struct __call_single_data *csd)
>>  smp_store_release(&csd->node.u_flags, 0);
>>  }
>>
>> +static __always_inline void
>> +raw_smp_call_single_queue(int cpu, struct llist_node *node, smp_call_func_t func)
>> +{
>> +/*
>> + * The list addition should be visible to the target CPU when it pops
>> + * the head of the list to pull the entry off it in the IPI handler
>> + * because of normal cache coherency rules implied by the underlying
>> + * llist ops.
>> + *
>> + * If IPIs can go out of order to the cache coherency protocol
>> + * in an architecture, sufficient synchronisation should be added
>> + * to arch code to make it appear to obey cache coherency WRT
>> + * locking and barrier primitives. Generic code isn't really
>> + * equipped to do the right thing...
>> + */
>> +if (llist_add(node, &per_cpu(call_single_queue, cpu)))
>> +send_call_function_single_ipi(cpu, func);
>> +}
>> +
>>  static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
>>
>>  void __smp_call_single_queue(int cpu, struct llist_node *node)
>> @@ -493,21 +525,25 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
>>  }
>>  }
>>  #endif
>>  /*
>> + * We have to check the type of the CSD before queueing it, because
>> + * once queued it can have its flags cleared by
>> + *   flush_smp_call_function_queue()
>> + * even if we haven't sent the smp_call IPI yet (e.g. the stopper
>> + * executes migration_cpu_stop() on the remote CPU).
>>   */
>> +if (trace_ipi_send_cpumask_enabled()) {
>> +call_single_data_t *csd;
>> +smp_call_func_t func;
>> +
>> +csd = container_of(node, call_single_data_t, node.llist);
>> +func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
>> +sched_ttwu_pending : csd->func;
>> +
>> +raw_smp_call_single_queue(cpu, node, func);
>> +} else {
>> +raw_smp_call_single_queue(cpu, node, NULL);
>> +}
>>  }
>
> Hurmph... so we only really consume @func when we IPI. Would it not be
> more useful to trace this thing for *every* csd enqeued?

It's true that any CSD enqueued on that CPU's call_single_queue in the
[first CSD llist_add()'ed, IPI IRQ hits] timeframe is a potential source of
interference.

However, can we be sure the first CSD isn't an indirect cause of the
following ones? Say the target CPU exits RCU EQS due to the IPI: there's a
bit of time before it gets to flush_smp_call_function_queue() during which
some other CSD could be enqueued *because* of that change in state.

I couldn't find an easy example of that; I might be biased, as this is
where I'd like to go wrt IPI'ing isolated CPUs in usermode. But regardless,
when correlating an IPI IRQ with its source, we'd always have to look at
the first CSD in that CSD stack.




Re: [PATCH v5 7/7] sched, smp: Trace smp callback causing an IPI

2023-03-22 Thread Valentin Schneider
On 22/03/23 15:04, Peter Zijlstra wrote:
> On Wed, Mar 22, 2023 at 12:20:28PM +0000, Valentin Schneider wrote:
>> On 22/03/23 10:53, Peter Zijlstra wrote:
>
>> > Hurmph... so we only really consume @func when we IPI. Would it not be
>> > more useful to trace this thing for *every* csd enqeued?
>>
>> It's true that any CSD enqueued on that CPU's call_single_queue in the
>> [first CSD llist_add()'ed, IPI IRQ hits] timeframe is a potential source of
>> interference.
>>
>> However, can we be sure that first CSD isn't an indirect cause for the
>> following ones? say the target CPU exits RCU EQS due to the IPI, there's a
>> bit of time before it gets to flush_smp_call_function_queue() where some other CSD
>> could be enqueued *because* of that change in state.
>>
>> I couldn't find a easy example of that, I might be biased as this is where
>> I'd like to go wrt IPI'ing isolated CPUs in usermode. But regardless, when
>> correlating an IPI IRQ with its source, we'd always have to look at the
>> first CSD in that CSD stack.
>
> So I was thinking something like this:
>
> ---
> Subject: trace,smp: Trace all smp_function_call*() invocations
> From: Peter Zijlstra 
> Date: Wed Mar 22 14:58:36 CET 2023
>
> (Ab)use the trace_ipi_send_cpu*() family to trace all
> smp_function_call*() invocations, not only those that result in an
> actual IPI.
>
> The queued entries log their callback function while the actual IPIs
> are traced on generic_smp_call_function_single_interrupt().
>
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  kernel/smp.c |   58 ++
>  1 file changed, 30 insertions(+), 28 deletions(-)
>
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -106,18 +106,20 @@ void __init call_function_init(void)
>  }
>
>  static __always_inline void
> -send_call_function_single_ipi(int cpu, smp_call_func_t func)
> +send_call_function_single_ipi(int cpu)
>  {
>   if (call_function_single_prep_ipi(cpu)) {
> - trace_ipi_send_cpu(cpu, _RET_IP_, func);
> + trace_ipi_send_cpu(cpu, _RET_IP_,
> +generic_smp_call_function_single_interrupt);

Hm, this does get rid of the func being passed down the helpers, but this
means the trace events are now stateful, i.e. I need the first and last
events in a CSD stack to figure out which one actually caused the IPI.

It also requires whoever is looking at the trace to be aware of which IPIs
are attached to a CSD, and which ones aren't. ATM that's only the resched
IPI, but per the cover letter there's more to come (e.g. tick_broadcast()
for arm64/riscv and a few others). For instance:

   hackbench-157   [001]    10.894320: ipi_send_cpu: cpu=3 callsite=check_preempt_curr+0x37 callback=0x0
   hackbench-157   [001]    10.895068: ipi_send_cpu: cpu=3 callsite=try_to_wake_up+0x29e callback=sched_ttwu_pending+0x0
   hackbench-157   [001]    10.895068: ipi_send_cpu: cpu=3 callsite=try_to_wake_up+0x29e callback=generic_smp_call_function_single_interrupt+0x0

That first one sent a RESCHEDULE IPI, the second one a CALL_FUNCTION one,
but you really have to know what you're looking at...

Are you worried about the @func being pushed down? Staring at x86 asm is
not good for the soul, but AFAICT this does cause an extra register to be
popped in the prologue because all of the helpers are __always_inline, so
both paths of the static key(s) are in the same stackframe.

I can "improve" this with:

---
diff --git a/kernel/smp.c b/kernel/smp.c
index 5cd680a7e78ef..55f120dae1713 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -511,6 +511,26 @@ raw_smp_call_single_queue(int cpu, struct llist_node *node, smp_call_func_t func
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
 
+static noinline void __smp_call_single_queue_trace(int cpu, struct llist_node *node)
+{
+   call_single_data_t *csd;
+   smp_call_func_t func;
+
+
+   /*
+* We have to check the type of the CSD before queueing it, because
+* once queued it can have its flags cleared by
+*   flush_smp_call_function_queue()
+* even if we haven't sent the smp_call IPI yet (e.g. the stopper
+* executes migration_cpu_stop() on the remote CPU).
+*/
+   csd = container_of(node, call_single_data_t, node.llist);
+   func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
+   sched_ttwu_pending : csd->func;
+
+   raw_smp_call_single_queue(cpu, node, func);
+}
+
 void __smp_call_single_queue(int cpu, struct llist_node *node)
 {
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
@@ -525,25 +545,10 @@ void __smp_call_single_queue(int cpu, struc

Re: [PATCH v5 7/7] sched, smp: Trace smp callback causing an IPI

2023-03-22 Thread Valentin Schneider
On 22/03/23 18:22, Peter Zijlstra wrote:
> On Wed, Mar 22, 2023 at 05:01:13PM +0000, Valentin Schneider wrote:
>
>> > So I was thinking something like this:
>
>> Hm, this does get rid of the func being passed down the helpers, but this
>> means the trace events are now stateful, i.e. I need the first and last
>> events in a CSD stack to figure out which one actually caused the IPI.
>
> Isn't much of tracing stateful? I mean, why am I always writing awk
> programs to parse trace output?
>
> The one that is directly followed by
> generic_smp_call_function_single_interrupt() (horrible name that), is
> the one that tripped the IPI.
>

Right.

>> It also requires whoever is looking at the trace to be aware of which IPIs
>> are attached to a CSD, and which ones aren't. ATM that's only the resched
>> IPI, but per the cover letter there's more to come (e.g. tick_broadcast()
>> for arm64/riscv and a few others). For instance:
>> 
>>    hackbench-157   [001]    10.894320: ipi_send_cpu: cpu=3 callsite=check_preempt_curr+0x37 callback=0x0
>
> Arguably we should be setting callback to scheduler_ipi(), except
> ofcourse, that's not an actual function...
>
> Maybe we can do "extern inline" for the actual users and provide a dummy
> function for the symbol when tracing.
>

Huh, I wasn't aware that was an option; I'll look into that. I did scribble
down a comment next to smp_send_reschedule(), but having a decodable
function name would be better!
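
For the record, a minimal sketch of the "extern inline" pattern being
alluded to (gnu_inline semantics; the name is illustrative and this is not
what the kernel does today):

	/* In a header: inlined at every call site, emits no symbol. */
	extern inline __attribute__((gnu_inline)) void scheduler_ipi(void)
	{
		/* fast-path body */
	}

	/* In exactly one .c file: provides the standalone symbol, so a
	 * tracepoint can record an address that %pS decodes nicely. */
	void scheduler_ipi(void)
	{
		/* same body */
	}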

>>    hackbench-157   [001]    10.895068: ipi_send_cpu: cpu=3 callsite=try_to_wake_up+0x29e callback=sched_ttwu_pending+0x0
>>    hackbench-157   [001]    10.895068: ipi_send_cpu: cpu=3 callsite=try_to_wake_up+0x29e callback=generic_smp_call_function_single_interrupt+0x0
>> 
>> That first one sent a RESCHEDULE IPI, the second one a CALL_FUNCTION one,
>> but you really have to know what you're looking at...
>
> But you have to know that anyway, you can't do tracing and not know wtf
> you're doing. Or rather, if you do, I don't give a crap and you can keep
> the pieces :-)
>
> Grepping the callback should be pretty quick resolution at to what trips
> it, no?
>
> (also, if you *really* can't manage, we can always add yet another
> argument that gives a type thingy)
>

Ah, I was a bit unclear here: I don't care too much about the IPI type
being used, but rather about being able to figure out on IRQ entry where
that IPI came from. Thinking some more about it now, I don't think logging
*all* CSDs causes an issue there, as you'd look at the
earliest-not-seen-yet event targeting this CPU anyway.

That'll be made easy once I get to having cpumask filters for ftrace, so
I can just issue something like:

  trace-cmd record -e 'ipi_send_cpu' -f "cpu == 3" -e 'ipi_send_cpumask' -f "cpus \in {3}" -T hackbench

(it's somewhere on the todolist...)

TL;DR: I *think* I've convinced myself logging all of them isn't an issue -
I'm going to play with this on something "smarter" than just hackbench
under QEMU just to drill it in.




Re: [PATCH v5 7/7] sched, smp: Trace smp callback causing an IPI

2023-03-23 Thread Valentin Schneider
On 22/03/23 15:04, Peter Zijlstra wrote:
> @@ -798,14 +794,20 @@ static void smp_call_function_many_cond(
>   }
>  
>   /*
> +  * Trace each smp_function_call_*() as an IPI, actual IPIs
> +  * will be traced with func==generic_smp_call_function_single_ipi().
> +  */
> + trace_ipi_send_cpumask(cfd->cpumask_ipi, _RET_IP_, func);

I just got a trace pointing out this can emit an event even though no IPI
is sent if e.g. the cond_func predicate filters all CPUs in the argument
mask:

  ipi_send_cpumask: cpumask= callsite=on_each_cpu_cond_mask+0x3c callback=flush_tlb_func+0x0

Maybe something like so on top?

---
diff --git a/kernel/smp.c b/kernel/smp.c
index ba5478814e677..1dc452017d000 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -791,6 +791,8 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
}
}
 
+   if (!nr_cpus)
+   goto local;
/*
 * Trace each smp_function_call_*() as an IPI, actual IPIs
 * will be traced with func==generic_smp_call_function_single_ipi().
@@ -804,10 +806,10 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 */
if (nr_cpus == 1)
send_call_function_single_ipi(last_cpu);
-   else if (likely(nr_cpus > 1))
+   else
send_call_function_ipi_mask(cfd->cpumask_ipi);
}
-
+local:
if (run_local && (!cond_func || cond_func(this_cpu, info))) {
unsigned long flags;
 




Re: [PATCH v5 7/7] sched, smp: Trace smp callback causing an IPI

2023-03-23 Thread Valentin Schneider
On 23/03/23 18:41, Peter Zijlstra wrote:
> On Thu, Mar 23, 2023 at 04:25:25PM +0000, Valentin Schneider wrote:
>> On 22/03/23 15:04, Peter Zijlstra wrote:
>> > @@ -798,14 +794,20 @@ static void smp_call_function_many_cond(
>> >}
>> >  
>> >/*
>> > +   * Trace each smp_function_call_*() as an IPI, actual IPIs
>> > +   * will be traced with func==generic_smp_call_function_single_ipi().
>> > +   */
>> > +  trace_ipi_send_cpumask(cfd->cpumask_ipi, _RET_IP_, func);
>> 
>> I just got a trace pointing out this can emit an event even though no IPI
>> is sent if e.g. the cond_func predicate filters all CPUs in the argument
>> mask:
>> 
>>   ipi_send_cpumask: cpumask= callsite=on_each_cpu_cond_mask+0x3c callback=flush_tlb_func+0x0
>> 
>> Maybe something like so on top?
>> 
>> ---
>> diff --git a/kernel/smp.c b/kernel/smp.c
>> index ba5478814e677..1dc452017d000 100644
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -791,6 +791,8 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
>>  }
>>  }
>>  
>> +if (!nr_cpus)
>> +goto local;
>
> Hmm, this isn't right. You can get nr_cpus==0 even though it did add
> some to various lists but never was first.
>

Duh, glanced over that.

> But urgh, even if we were to say count nr_queued we'd never get the mask
> right, because we don't track which CPUs have the predicate matched,
> only those we need to actually send an IPI to :/
>
> Ooh, I think we can clear those bits from cfd->cpumask, arguably that's
> a correctness fix too, because the 'run_remote && wait' case shouldn't
> wait on things we didn't queue.
>

Yeah, that makes sense to me. Just one tiny suggestion below.

> Hmm?
>
>
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -728,9 +728,9 @@ static void smp_call_function_many_cond(
>   int cpu, last_cpu, this_cpu = smp_processor_id();
>   struct call_function_data *cfd;
>   bool wait = scf_flags & SCF_WAIT;
> + int nr_cpus = 0, nr_queued = 0;
>   bool run_remote = false;
>   bool run_local = false;
> - int nr_cpus = 0;
>  
>   lockdep_assert_preemption_disabled();
>  
> @@ -772,8 +772,10 @@ static void smp_call_function_many_cond(
>   for_each_cpu(cpu, cfd->cpumask) {
>   call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu);
>  
> - if (cond_func && !cond_func(cpu, info))
> + if (cond_func && !cond_func(cpu, info)) {
> + __cpumask_clear_cpu(cpu, cfd->cpumask);
>   continue;
> + }
>  
>   csd_lock(csd);
>   if (wait)
> @@ -789,13 +791,15 @@ static void smp_call_function_many_cond(
>   nr_cpus++;
>   last_cpu = cpu;
>   }
> + nr_queued++;
>   }
>  
>   /*
>* Trace each smp_function_call_*() as an IPI, actual IPIs
> * will be traced with func==generic_smp_call_function_single_ipi().
>*/
> - trace_ipi_send_cpumask(cfd->cpumask_ipi, _RET_IP_, func);
> + if (nr_queued)

With your change to cfd->cpumask, we could ditch nr_queued and make this

if (!cpumask_empty(cfd->cpumask))

since cfd->cpumask now only contains CPUs that have had a CSD queued.

> + trace_ipi_send_cpumask(cfd->cpumask, _RET_IP_, func);
>  
>   /*
>* Choose the most efficient way to send an IPI. Note that the

