[PATCH v3 0/8] Generic IPI sending tracepoint
Background == Detecting IPI *reception* is relatively easy, e.g. using trace_irq_handler_{entry,exit} or even just function-trace flush_smp_call_function_queue() for SMP calls. Figuring out their *origin*, is trickier as there is no generic tracepoint tied to e.g. smp_call_function(): o AFAIA x86 has no tracepoint tied to sending IPIs, only receiving them (cf. trace_call_function{_single}_entry()). o arm/arm64 do have trace_ipi_raise(), which gives us the target cpus but also a mostly useless string (smp_calls will all be "Function call interrupts"). o Other architectures don't seem to have any IPI-sending related tracepoint. I believe one reason those tracepoints used by arm/arm64 ended up as they were is because these archs used to handle IPIs differently from regular interrupts (the IRQ driver would directly invoke an IPI-handling routine), which meant they never showed up in trace_irq_handler_{entry, exit}. The trace_ipi_{entry,exit} tracepoints gave a way to trace IPI reception but those have become redundant as of: 56afcd3dbd19 ("ARM: Allow IPIs to be handled as normal interrupts") d3afc7f12987 ("arm64: Allow IPIs to be handled as normal interrupts") which gave IPIs a "proper" handler function used through generic_handle_domain_irq(), which makes them show up via trace_irq_handler_{entry, exit}. Changing stuff up = Per the above, it would make sense to reshuffle trace_ipi_raise() and move it into generic code. This also came up during Daniel's talk on Osnoise at the CPU isolation MC of LPC 2022 [1]. Now, to be useful, such a tracepoint needs to export: o targeted CPU(s) o calling context The only way to get the calling context with trace_ipi_raise() is to trigger a stack dump, e.g. $(trace-cmd -e ipi* -T echo 42). This is instead introducing a new tracepoint which exports the relevant context (callsite, and requested callback for when the callsite isn't helpful), and is usable by all architectures as it sits in generic code. Another thing worth mentioning is that depending on the callsite, the _RET_IP_ fed to the tracepoint is not always useful - generic_exec_single() doesn't tell you much about the actual callback being sent via IPI, which is why the new tracepoint also has a @callback argument. Patches === o Patch 1 is included for convenience and will be merged independently. FYI I have libtraceevent patches [2] to improve the pretty-printing of cpumasks using the new type, which look like: <...>-3322 [021] 560.402583: ipi_send_cpumask: cpumask=14,17,21 callsite=on_each_cpu_cond_mask+0x40 callback=flush_tlb_func+0x0 <...>-187 [010] 562.590584: ipi_send_cpumask: cpumask=0-23 callsite=on_each_cpu_cond_mask+0x40 callback=do_sync_core+0x0 o Patches 2-6 spread out the tracepoint across relevant sites. Patch 6 ends up sprinkling lots of #include which I'm not the biggest fan of, but is the least horrible solution I've been able to come up with so far. o Patch 8 is trying to be smart about tracing the callback associated with the IPI. This results in having IPI trace events for: o smp_call_function*() o smp_send_reschedule() o irq_work_queue*() o standalone uses of __smp_call_single_queue() This is incomplete, just looking at arm64 there's more IPI types that aren't covered: IPI_CPU_STOP, IPI_CPU_CRASH_STOP, IPI_TIMER, IPI_WAKEUP, ... But it feels like a good starting point. Links = [1]: https://youtu.be/5gT57y4OzBM?t=14234 [2]: https://lore.kernel.org/all/20221116144154.3662923-1-vschn...@redhat.com/ Revisions = v2 -> v3 o Dropped the generic export of smp_send_reschedule(), turned it into a macro and a bunch of imports o Dropped the send_call_function_single_ipi() macro madness, split it into sched and smp bits using some of Peter's suggestions v1 -> v2 o Ditched single-CPU tracepoint o Changed tracepoint signature to include callback o Changed tracepoint callsite field to void *; the parameter is still UL to save up on casts due to using _RET_IP_. o Fixed linking failures due to not exporting smp_send_reschedule() Steven Rostedt (Google) (1): tracing: Add __cpumask to denote a trace event field that is a cpumask_t Valentin Schneider (7): trace: Add trace_ipi_send_cpumask() sched, smp: Trace IPIs sent via send_call_function_single_ipi() smp: Trace IPIs sent via arch_send_call_function_ipi_mask() irq_work: Trace self-IPIs sent via arch_irq_work_raise() treewide: Trace IPIs sent via smp_send_reschedule() smp: reword smp call IPI comment sched, smp: Trace smp callback causing an IPI arch/alpha/kernel/smp.c | 2 +- arch/arc/kernel/smp.c| 2 +- arch/arm/kernel/smp.c| 5 +- arch/arm/mach-actions/platsmp.c | 2 + arch/arm64/kernel/smp.c | 3 +- arch/csky/kernel/smp.c | 2 +- arch/hexagon/kernel/sm
[PATCH v3 1/8] DO-NOT-MERGE: tracing: Add __cpumask to denote a trace event field that is a cpumask_t
From: "Steven Rostedt (Google)" The trace events have a __bitmask field that can be used for anything that requires bitmasks. Although currently it is only used for CPU masks, it could be used in the future for any type of bitmasks. There is some user space tooling that wants to know if a field is a CPU mask and not just some random unsigned long bitmask. Introduce "__cpumask()" helper functions that work the same as the current __bitmask() helpers but displays in the format file: field:__data_loc cpumask_t *[] mask;offset:36; size:4; signed:0; Instead of: field:__data_loc unsigned long[] mask; offset:32; size:4; signed:0; The main difference is the type. Instead of "unsigned long" it is "cpumask_t *". Note, this type field needs to be a real type in the __dynamic_array() logic that both __cpumask and__bitmask use, but the comparison field requires it to be a scalar type whereas cpumask_t is a structure (non-scalar). But everything works when making it a pointer. Valentin added changes to remove the need of passing in "nr_bits" and the __cpumask will always use nr_cpumask_bits as its size. Link: https://lkml.kernel.org/r/20221014080456.1d32b...@rorschach.local.home Requested-by: Valentin Schneider Reviewed-by: Valentin Schneider Signed-off-by: Valentin Schneider Signed-off-by: Steven Rostedt (Google) --- include/trace/bpf_probe.h| 6 include/trace/perf.h | 6 include/trace/stages/stage1_struct_define.h | 6 include/trace/stages/stage2_data_offsets.h | 6 include/trace/stages/stage3_trace_output.h | 6 include/trace/stages/stage4_event_fields.h | 6 include/trace/stages/stage5_get_offsets.h| 6 include/trace/stages/stage6_event_callback.h | 20 include/trace/stages/stage7_class_define.h | 2 ++ samples/trace_events/trace-events-sample.c | 2 +- samples/trace_events/trace-events-sample.h | 34 +++- 11 files changed, 91 insertions(+), 9 deletions(-) diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h index 6a13220d2d27b..155c495b89ead 100644 --- a/include/trace/bpf_probe.h +++ b/include/trace/bpf_probe.h @@ -21,6 +21,9 @@ #undef __get_bitmask #define __get_bitmask(field) (char *)__get_dynamic_array(field) +#undef __get_cpumask +#define __get_cpumask(field) (char *)__get_dynamic_array(field) + #undef __get_sockaddr #define __get_sockaddr(field) ((struct sockaddr *)__get_dynamic_array(field)) @@ -40,6 +43,9 @@ #undef __get_rel_bitmask #define __get_rel_bitmask(field) (char *)__get_rel_dynamic_array(field) +#undef __get_rel_cpumask +#define __get_rel_cpumask(field) (char *)__get_rel_dynamic_array(field) + #undef __get_rel_sockaddr #define __get_rel_sockaddr(field) ((struct sockaddr *)__get_rel_dynamic_array(field)) diff --git a/include/trace/perf.h b/include/trace/perf.h index 5800d13146c3d..8f3bf1e177070 100644 --- a/include/trace/perf.h +++ b/include/trace/perf.h @@ -21,6 +21,9 @@ #undef __get_bitmask #define __get_bitmask(field) (char *)__get_dynamic_array(field) +#undef __get_cpumask +#define __get_cpumask(field) (char *)__get_dynamic_array(field) + #undef __get_sockaddr #define __get_sockaddr(field) ((struct sockaddr *)__get_dynamic_array(field)) @@ -41,6 +44,9 @@ #undef __get_rel_bitmask #define __get_rel_bitmask(field) (char *)__get_rel_dynamic_array(field) +#undef __get_rel_cpumask +#define __get_rel_cpumask(field) (char *)__get_rel_dynamic_array(field) + #undef __get_rel_sockaddr #define __get_rel_sockaddr(field) ((struct sockaddr *)__get_rel_dynamic_array(field)) diff --git a/include/trace/stages/stage1_struct_define.h b/include/trace/stages/stage1_struct_define.h index 1b7bab60434c1..69e0dae453bfa 100644 --- a/include/trace/stages/stage1_struct_define.h +++ b/include/trace/stages/stage1_struct_define.h @@ -32,6 +32,9 @@ #undef __bitmask #define __bitmask(item, nr_bits) __dynamic_array(char, item, -1) +#undef __cpumask +#define __cpumask(item) __dynamic_array(char, item, -1) + #undef __sockaddr #define __sockaddr(field, len) __dynamic_array(u8, field, len) @@ -47,6 +50,9 @@ #undef __rel_bitmask #define __rel_bitmask(item, nr_bits) __rel_dynamic_array(char, item, -1) +#undef __rel_cpumask +#define __rel_cpumask(item) __rel_dynamic_array(char, item, -1) + #undef __rel_sockaddr #define __rel_sockaddr(field, len) __rel_dynamic_array(u8, field, len) diff --git a/include/trace/stages/stage2_data_offsets.h b/include/trace/stages/stage2_data_offsets.h index 1b7a8f764fddd..469b6a64293de 100644 --- a/include/trace/stages/stage2_data_offsets.h +++ b/include/trace/stages/stage2_data_offsets.h @@ -38,6 +38,9 @@ #undef __bitmask #define __bitmask(item, nr_bits) __dynamic_array(unsigned long, item, -1) +#undef __cpumask +#define __cpumask(item) __dynamic_array(unsigned long, item, -1) + #undef __sockaddr #define __sockaddr(field, len) __dynamic_array(u8, field, len) @@ -53,5 +56,8 @@ #undef
[PATCH v3 2/8] trace: Add trace_ipi_send_cpumask()
trace_ipi_raise() is unsuitable for generically tracing IPI sources due to its "reason" argument being an uninformative string (on arm64 all you get is "Function call interrupts" for SMP calls). Add a variant of it that exports a target CPU, a callsite and a callback. Signed-off-by: Valentin Schneider Reviewed-by: Steven Rostedt (Google) --- include/trace/events/ipi.h | 22 ++ 1 file changed, 22 insertions(+) diff --git a/include/trace/events/ipi.h b/include/trace/events/ipi.h index 0be71dad6ec03..b1125dc27682c 100644 --- a/include/trace/events/ipi.h +++ b/include/trace/events/ipi.h @@ -35,6 +35,28 @@ TRACE_EVENT(ipi_raise, TP_printk("target_mask=%s (%s)", __get_bitmask(target_cpus), __entry->reason) ); +TRACE_EVENT(ipi_send_cpumask, + + TP_PROTO(const struct cpumask *cpumask, unsigned long callsite, void *callback), + + TP_ARGS(cpumask, callsite, callback), + + TP_STRUCT__entry( + __cpumask(cpumask) + __field(void *, callsite) + __field(void *, callback) + ), + + TP_fast_assign( + __assign_cpumask(cpumask, cpumask_bits(cpumask)); + __entry->callsite = (void *)callsite; + __entry->callback = callback; + ), + + TP_printk("cpumask=%s callsite=%pS callback=%pS", + __get_cpumask(cpumask), __entry->callsite, __entry->callback) +); + DECLARE_EVENT_CLASS(ipi_handler, TP_PROTO(const char *reason), -- 2.31.1 ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
[PATCH v3 4/8] smp: Trace IPIs sent via arch_send_call_function_ipi_mask()
This simply wraps around the arch function and prepends it with a tracepoint, similar to send_call_function_single_ipi(). Signed-off-by: Valentin Schneider Reviewed-by: Steven Rostedt (Google) --- kernel/smp.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/kernel/smp.c b/kernel/smp.c index e2ca1e2f31274..93b4386cd3096 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -160,6 +160,13 @@ void __init call_function_init(void) smpcfd_prepare_cpu(smp_processor_id()); } +static __always_inline void +send_call_function_ipi_mask(const struct cpumask *mask) +{ + trace_ipi_send_cpumask(mask, _RET_IP_, NULL); + arch_send_call_function_ipi_mask(mask); +} + #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG static DEFINE_STATIC_KEY_FALSE(csdlock_debug_enabled); @@ -970,7 +977,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask, if (nr_cpus == 1) send_call_function_single_ipi(last_cpu); else if (likely(nr_cpus > 1)) - arch_send_call_function_ipi_mask(cfd->cpumask_ipi); + send_call_function_ipi_mask(cfd->cpumask_ipi); cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->pinged, this_cpu, CFD_SEQ_NOCPU, CFD_SEQ_PINGED); } -- 2.31.1 ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
[PATCH v3 3/8] sched, smp: Trace IPIs sent via send_call_function_single_ipi()
send_call_function_single_ipi() is the thing that sends IPIs at the bottom of smp_call_function*() via either generic_exec_single() or smp_call_function_many_cond(). Give it an IPI-related tracepoint. Note that this ends up tracing any IPI sent via __smp_call_single_queue(), which covers __ttwu_queue_wakelist() and irq_work_queue_on() "for free". Signed-off-by: Valentin Schneider Reviewed-by: Steven Rostedt (Google) --- arch/arm/kernel/smp.c | 3 --- arch/arm64/kernel/smp.c | 1 - kernel/sched/core.c | 7 +-- kernel/smp.c| 4 4 files changed, 9 insertions(+), 6 deletions(-) diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index 978db2d96b446..3b280d55c1c40 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -48,9 +48,6 @@ #include #include -#define CREATE_TRACE_POINTS -#include - /* * as from 2.5, kernels no longer have an init_tasks structure * so we need some other way of telling a new secondary core diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index ffc5d76cf6955..937d2623e06ba 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -51,7 +51,6 @@ #include #include -#define CREATE_TRACE_POINTS #include DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index daff72f003858..40587b0d99329 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -81,6 +81,7 @@ #include #include #undef CREATE_TRACE_POINTS +#include #include "sched.h" #include "stats.h" @@ -3746,10 +3747,12 @@ void send_call_function_single_ipi(int cpu) { struct rq *rq = cpu_rq(cpu); - if (!set_nr_if_polling(rq->idle)) + if (!set_nr_if_polling(rq->idle)) { + trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, NULL); arch_send_call_function_single_ipi(cpu); - else + } else { trace_sched_wake_idle_without_ipi(cpu); + } } /* diff --git a/kernel/smp.c b/kernel/smp.c index 06a413987a14a..e2ca1e2f31274 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -26,6 +26,10 @@ #include #include +#define CREATE_TRACE_POINTS +#include +#undef CREATE_TRACE_POINTS + #include "smpboot.h" #include "sched/smp.h" -- 2.31.1 ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
[PATCH v3 5/8] irq_work: Trace self-IPIs sent via arch_irq_work_raise()
IPIs sent to remote CPUs via irq_work_queue_on() are now covered by trace_ipi_send_cpumask(), add another instance of the tracepoint to cover self-IPIs. Signed-off-by: Valentin Schneider Reviewed-by: Steven Rostedt (Google) --- kernel/irq_work.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/kernel/irq_work.c b/kernel/irq_work.c index 7afa40fe5cc43..c33e88e32a67a 100644 --- a/kernel/irq_work.c +++ b/kernel/irq_work.c @@ -22,6 +22,8 @@ #include #include +#include + static DEFINE_PER_CPU(struct llist_head, raised_list); static DEFINE_PER_CPU(struct llist_head, lazy_list); static DEFINE_PER_CPU(struct task_struct *, irq_workd); @@ -74,6 +76,16 @@ void __weak arch_irq_work_raise(void) */ } +static __always_inline void irq_work_raise(struct irq_work *work) +{ + if (trace_ipi_send_cpumask_enabled() && arch_irq_work_has_interrupt()) + trace_ipi_send_cpumask(cpumask_of(smp_processor_id()), + _RET_IP_, + work->func); + + arch_irq_work_raise(); +} + /* Enqueue on current CPU, work must already be claimed and preempt disabled */ static void __irq_work_queue_local(struct irq_work *work) { @@ -99,7 +111,7 @@ static void __irq_work_queue_local(struct irq_work *work) /* If the work is "lazy", handle it from next tick if any */ if (!lazy_work || tick_nohz_tick_stopped()) - arch_irq_work_raise(); + irq_work_raise(work); } /* Enqueue the irq work @work on the current CPU */ -- 2.31.1 ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
[PATCH v3 6/8] treewide: Trace IPIs sent via smp_send_reschedule()
To be able to trace invocations of smp_send_reschedule(), rename the arch-specific definitions of it to arch_smp_send_reschedule() and wrap it into an smp_send_reschedule() that contains a tracepoint. Changes to include the declaration of the tracepoint were driven by the following coccinelle script: @func_use@ @@ smp_send_reschedule(...); @include@ @@ #include @no_include depends on func_use && !include@ @@ #include <...> + + #include Signed-off-by: Valentin Schneider [csky bits] Acked-by: Guo Ren --- arch/alpha/kernel/smp.c | 2 +- arch/arc/kernel/smp.c| 2 +- arch/arm/kernel/smp.c| 2 +- arch/arm/mach-actions/platsmp.c | 2 ++ arch/arm64/kernel/smp.c | 2 +- arch/csky/kernel/smp.c | 2 +- arch/hexagon/kernel/smp.c| 2 +- arch/ia64/kernel/smp.c | 4 ++-- arch/loongarch/include/asm/smp.h | 2 +- arch/mips/include/asm/smp.h | 2 +- arch/mips/kernel/rtlx-cmp.c | 2 ++ arch/openrisc/kernel/smp.c | 2 +- arch/parisc/kernel/smp.c | 4 ++-- arch/powerpc/kernel/smp.c| 6 -- arch/powerpc/kvm/book3s_hv.c | 3 +++ arch/powerpc/platforms/powernv/subcore.c | 2 ++ arch/riscv/kernel/smp.c | 4 ++-- arch/s390/kernel/smp.c | 2 +- arch/sh/kernel/smp.c | 2 +- arch/sparc/kernel/smp_32.c | 2 +- arch/sparc/kernel/smp_64.c | 2 +- arch/x86/include/asm/smp.h | 2 +- arch/x86/kvm/svm/svm.c | 4 arch/x86/kvm/x86.c | 2 ++ arch/xtensa/kernel/smp.c | 2 +- include/linux/smp.h | 8 ++-- virt/kvm/kvm_main.c | 1 + 27 files changed, 47 insertions(+), 25 deletions(-) diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c index f4e20f75438f8..38637eb9eebd5 100644 --- a/arch/alpha/kernel/smp.c +++ b/arch/alpha/kernel/smp.c @@ -562,7 +562,7 @@ handle_ipi(struct pt_regs *regs) } void -smp_send_reschedule(int cpu) +arch_smp_send_reschedule(int cpu) { #ifdef DEBUG_IPI_MSG if (cpu == hard_smp_processor_id()) diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c index ad93fe6e4b77d..409cfa4675b40 100644 --- a/arch/arc/kernel/smp.c +++ b/arch/arc/kernel/smp.c @@ -292,7 +292,7 @@ static void ipi_send_msg(const struct cpumask *callmap, enum ipi_msg_type msg) ipi_send_msg_one(cpu, msg); } -void smp_send_reschedule(int cpu) +void arch_smp_send_reschedule(int cpu) { ipi_send_msg_one(cpu, IPI_RESCHEDULE); } diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index 3b280d55c1c40..f216ac890b6f9 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -745,7 +745,7 @@ void __init set_smp_ipi_range(int ipi_base, int n) ipi_setup(smp_processor_id()); } -void smp_send_reschedule(int cpu) +void arch_smp_send_reschedule(int cpu) { smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE); } diff --git a/arch/arm/mach-actions/platsmp.c b/arch/arm/mach-actions/platsmp.c index f26618b435145..7b208e96fbb67 100644 --- a/arch/arm/mach-actions/platsmp.c +++ b/arch/arm/mach-actions/platsmp.c @@ -20,6 +20,8 @@ #include #include +#include + #define OWL_CPU1_ADDR 0x50 #define OWL_CPU1_FLAG 0x5c diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 937d2623e06ba..8d108edc4a89f 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -976,7 +976,7 @@ void __init set_smp_ipi_range(int ipi_base, int n) ipi_setup(smp_processor_id()); } -void smp_send_reschedule(int cpu) +void arch_smp_send_reschedule(int cpu) { smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE); } diff --git a/arch/csky/kernel/smp.c b/arch/csky/kernel/smp.c index 4b605aa2e1d65..fd7f81be16dd6 100644 --- a/arch/csky/kernel/smp.c +++ b/arch/csky/kernel/smp.c @@ -140,7 +140,7 @@ void smp_send_stop(void) on_each_cpu(ipi_stop, NULL, 1); } -void smp_send_reschedule(int cpu) +void arch_smp_send_reschedule(int cpu) { send_ipi_message(cpumask_of(cpu), IPI_RESCHEDULE); } diff --git a/arch/hexagon/kernel/smp.c b/arch/hexagon/kernel/smp.c index 4ba93e59370c4..4e8bee25b8c68 100644 --- a/arch/hexagon/kernel/smp.c +++ b/arch/hexagon/kernel/smp.c @@ -217,7 +217,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus) } } -void smp_send_reschedule(int cpu) +void arch_smp_send_reschedule(int cpu) { send_ipi(cpumask_of(cpu), IPI_RESCHEDULE); } diff --git a/arch/ia64/kernel/smp.c b/arch/ia64/kernel/smp.c index e2cc59db86bc2..ea4f009a232b4 100644 --- a/arch/ia64/kernel/smp.c +++ b/arch/ia64/kernel/smp.c @@ -220,11 +220,11 @@ kdump_smp_send_init(void) * Called with preemption disabled. */ void -smp_send_reschedule (int cpu) +arch_smp_send_reschedule (int cpu) { ia64_sen
[PATCH v3 7/8] smp: reword smp call IPI comment
Accessing the call_single_queue hasn't involved a spinlock since 2014: 6897fc22ea01 ("kernel: use lockless list for smp_call_function_single") The llist operations (namely cmpxchg() and xchg()) provide similar ordering guarantees, update the comment to lessen confusion. Signed-off-by: Valentin Schneider --- kernel/smp.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/kernel/smp.c b/kernel/smp.c index 93b4386cd3096..821b5986721ac 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -495,9 +495,10 @@ void __smp_call_single_queue(int cpu, struct llist_node *node) #endif /* -* The list addition should be visible before sending the IPI -* handler locks the list to pull the entry off it because of -* normal cache coherency rules implied by spinlocks. +* The list addition should be visible to the target CPU when it pops +* the head of the list to pull the entry off it in the IPI handler +* because of normal cache coherency rules implied by the underlying +* llist ops. * * If IPIs can go out of order to the cache coherency protocol * in an architecture, sufficient synchronisation should be added -- 2.31.1 ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
[PATCH v3 8/8] sched, smp: Trace smp callback causing an IPI
Context === The newly-introduced ipi_send_cpumask tracepoint has a "callback" parameter which so far has only been fed with NULL. While CSD_TYPE_SYNC/ASYNC and CSD_TYPE_IRQ_WORK share a similar backing struct layout (meaning their callback func can be accessed without caring about the actual CSD type), CSD_TYPE_TTWU doesn't even have a function attached to its struct. This means we need to check the type of a CSD before eventually dereferencing its associated callback. This isn't as trivial as it sounds: the CSD type is stored in __call_single_node.u_flags, which get cleared right before the callback is executed via csd_unlock(). This implies checking the CSD type before it is enqueued on the call_single_queue, as the target CPU's queue can be flushed before we get to sending an IPI. Furthermore, send_call_function_single_ipi() only has a CPU parameter, and would need to have an additional argument to trickle down the invoked function. This is somewhat silly, as the extra argument will always be pushed down to the function even when nothing is being traced, which is unnecessary overhead. Changes === send_call_function_single_ipi() is only used by smp.c, and is defined in sched/core.c as it contains scheduler-specific ops (set_nr_if_polling() of a CPU's idle task). Split it into two parts: the scheduler bits remain in sched/core.c, and the actual IPI emission is moved into smp.c. This lets us define an __always_inline helper function that can take the related callback as parameter without creating useless register pressure in the non-traced path which only gains a (disabled) static branch. Do the same thing for the multi IPI case. Signed-off-by: Valentin Schneider --- kernel/sched/core.c | 18 +++- kernel/sched/smp.h | 2 +- kernel/smp.c| 72 + 3 files changed, 66 insertions(+), 26 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 40587b0d99329..e59aac936dcb9 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3743,16 +3743,20 @@ void sched_ttwu_pending(void *arg) rq_unlock_irqrestore(rq, &rf); } -void send_call_function_single_ipi(int cpu) +/* + * Prepare the scene for sending an IPI for a remote smp_call + * + * Returns true if the caller can proceed with sending the IPI. + * Returns false otherwise. + */ +bool call_function_single_prep_ipi(int cpu) { - struct rq *rq = cpu_rq(cpu); - - if (!set_nr_if_polling(rq->idle)) { - trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, NULL); - arch_send_call_function_single_ipi(cpu); - } else { + if (set_nr_if_polling(cpu_rq(cpu)->idle)) { trace_sched_wake_idle_without_ipi(cpu); + return false; } + + return true; } /* diff --git a/kernel/sched/smp.h b/kernel/sched/smp.h index 2eb23dd0f2856..21ac44428bb02 100644 --- a/kernel/sched/smp.h +++ b/kernel/sched/smp.h @@ -6,7 +6,7 @@ extern void sched_ttwu_pending(void *arg); -extern void send_call_function_single_ipi(int cpu); +extern bool call_function_single_prep_ipi(int cpu); #ifdef CONFIG_SMP extern void flush_smp_call_function_queue(void); diff --git a/kernel/smp.c b/kernel/smp.c index 821b5986721ac..5cd680a7e78ef 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -161,9 +161,18 @@ void __init call_function_init(void) } static __always_inline void -send_call_function_ipi_mask(const struct cpumask *mask) +send_call_function_single_ipi(int cpu, smp_call_func_t func) { - trace_ipi_send_cpumask(mask, _RET_IP_, NULL); + if (call_function_single_prep_ipi(cpu)) { + trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, func); + arch_send_call_function_single_ipi(cpu); + } +} + +static __always_inline void +send_call_function_ipi_mask(const struct cpumask *mask, smp_call_func_t func) +{ + trace_ipi_send_cpumask(mask, _RET_IP_, func); arch_send_call_function_ipi_mask(mask); } @@ -430,12 +439,16 @@ static void __smp_call_single_queue_debug(int cpu, struct llist_node *node) struct cfd_seq_local *seq = this_cpu_ptr(&cfd_seq_local); struct call_function_data *cfd = this_cpu_ptr(&cfd_data); struct cfd_percpu *pcpu = per_cpu_ptr(cfd->pcpu, cpu); + struct __call_single_data *csd; + + csd = container_of(node, call_single_data_t, node.llist); + WARN_ON_ONCE(!(CSD_TYPE(csd) & (CSD_TYPE_SYNC | CSD_TYPE_ASYNC))); cfd_seq_store(pcpu->seq_queue, this_cpu, cpu, CFD_SEQ_QUEUE); if (llist_add(node, &per_cpu(call_single_queue, cpu))) { cfd_seq_store(pcpu->seq_ipi, this_cpu, cpu, CFD_SEQ_IPI); cfd_seq_store(seq->ping, this_cpu, cpu, CFD_SEQ_PING); - send_call_function_single_ipi(cpu); + send_call_function_single_ipi(cpu, csd->func); cfd_seq_store(seq->pinged, this_cpu, cpu, CFD_SEQ_PINGED); } else {