On Mon, Mar 16, 2026 at 05:40:50PM +0000, Dmitry Ilvokhin wrote:

[...]

> A possible generic solution is a trace_contended_release() for spin
> locks, for example:
> 
>     if (trace_contended_release_enabled() &&
>         atomic_read(&lock->val) & ~_Q_LOCKED_MASK)
>         trace_contended_release(lock);
> 
> This might work on x86, but could increase code size and regress
> performance on arches where spin_unlock() is inlined, such as arm64
> under !PREEMPTION.

I took a stab at this idea and submitted an RFC [1].

The implementation builds on the earlier observation from Matthew that
_raw_spin_unlock() is not inlined in most configurations. In those
cases, when the tracepoint is disabled, the change adds a single NOP on
the fast path, with the conditional check staying out of line. The
measured text size increase in this configuration is +983 bytes.

For configurations where _raw_spin_unlock() is inlined, the
instrumentation increases code size more noticeably (+71 KB in my
measurements), since the check and the out-of-line call are replicated
at each call site.

This provides a generic release-side signal for contended locks,
allowing correlation of lock holders with waiters and measurement of
contended hold times.

The RFC addresses the same visibility gap without introducing per-lock
instrumentation.

If this tradeoff is acceptable, this could be a generic alternative to
lock-specific tracepoints.

[1]: 
https://lore.kernel.org/all/51aad0415b78c5a39f2029722118fa01eac77538.1773858853.gi...@ilvokhin.com
 