On 8/29/25 9:51 AM, Jakub Jelinek wrote:
On Thu, Aug 28, 2025 at 02:24:33PM -0700, Andi Kleen wrote:
Jakub Jelinek <[email protected]> writes:

On Wed, Aug 27, 2025 at 03:52:11PM +0200, Michal Jires wrote:
This new pass heuristically detects symbols referenced by toplevel
assembly to prevent their optimization.

The heuristic works by comparing identifiers in the assembly to known
symbols.

The pass is split into two passes, one in LGEN and one in WPA.
One pass must run in WPA to be able to reference any symbol.
However, in WPA there may be multiple symbols with the same name,
so we handle those local symbols in LGEN.

Why the heuristics when in GCC 15+ toplevel assembly can express those
exactly?

The kernel maintainers rejected that.

Then maybe they should reconsider, because nothing else can work reliably.
Gas supports macros, so even if you teach the compiler the detailed
assembly syntax of each target, due to the macros it isn't
possible to find out if some identifier will end up being stringified, or
will have some prefixes/suffixes appended to it, or if it will end up being
say a register name rather than a symbol, etc.  Not to mention that GCC wants
inline asms to be black boxes and that various programs actually
post-process the assembly created by the compiler, so what appears in inline
asm doesn't really have to be even valid assembler, it can be arbitrary
text.
Last I heard, the kernel has only a few toplevel asms that would
need to be tweaked, and it can be done with macros.
If you have
asm (".section whatever; .globl foo; foo: ...; use bar; .previous");
as something that defines foo and uses bar, in GCC 15 that should be
written as
extern int foo, bar; // Can be functions too
asm (".section whatever; .globl %cc0; %cc0: ...; use %cc1; .previous"
      :: ":" (&foo), "-s" (&bar));
Then the compiler reliably knows what to rename in the asm template and
what should be kept unmodified.

        Jakub

There are 27 unique toplevel asm statements in the following files.
That is when building only vmlinux with default settings.
Other configurations probably have a few more.

arch/x86/include/asm/alternative.h
arch/x86/include/asm/cfi.h
arch/x86/include/asm/paravirt.h
arch/x86/include/asm/static_call.h
arch/x86/kernel/alternative.c
arch/x86/kernel/callthunks.c
arch/x86/kernel/kprobes/opt.c
arch/x86/kernel/rethook.c
arch/x86/kernel/static_call.c
arch/x86/kernel/uprobes.c
arch/x86/lib/error-inject.c
include/linux/btf_ids.h
include/linux/export-internal.h
include/linux/export.h
include/linux/init.h
include/linux/objtool.h
include/linux/pci.h
include/linux/tracepoint.h
kernel/configs.c

with the worst of them looking like this:

#define DEFINE_ASM_FUNC(func, instr, sec)               \
        asm (".pushsection " #sec ", \"ax\"\n"          \
             ".global " #func "\n\t"                    \
             ".type " #func ", @function\n\t"           \
             ASM_FUNC_ALIGN "\n"                        \
             #func ":\n\t"                              \
             ASM_ENDBR                                  \
             instr "\n\t"                               \
             ASM_RET                                    \
             ".size " #func ", . - " #func "\n\t"       \
             ".popsection")

where "instr" can be a dozen instructions. This would require propagating the asm inputs/outputs alongside instr, and the input/output indices would be offset by one at the macro's call site. Fortunately there are only a few of these, but it would still be messier than the previously rejected Linux LTO patchset. We might have better reasons for these changes design-wise, but for the Linux folks the reason is still the same.



The heuristics are not fully reliable, but they are enough to build the kernel without changes to the assembly.
So at least for the "Does this project benefit from LTO?" stage it seems
very useful.

False positives are not ideal, but they only disable optimizations that would not be possible without LTO. Though I should move this pass behind a non-default flag and enable it only when needed.
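
To make the false-positive case concrete, here is a simplified sketch of this kind of identifier matching. It is not the code from the patch: the function names, the set of characters treated as part of an identifier, and the flat array of symbol names are all assumptions made for illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: characters assumed to be valid inside an
   assembler identifier (this set is a guess, targets differ).  */
static bool
is_ident_char (unsigned char c)
{
  return isalnum (c) || c == '_' || c == '.' || c == '$';
}

/* Hypothetical sketch: scan ASM_TEXT and set REFERENCED[i] for every
   name in SYMBOLS that appears as a whole identifier.  Returns the
   number of distinct symbols marked.  */
static size_t
mark_asm_references (const char *asm_text,
                     const char *const *symbols, size_t nsymbols,
                     bool *referenced)
{
  size_t marked = 0;
  const char *p = asm_text;

  for (size_t i = 0; i < nsymbols; i++)
    referenced[i] = false;

  while (*p)
    {
      if (!is_ident_char ((unsigned char) *p))
        {
          p++;
          continue;
        }

      /* Collect one maximal run of identifier characters.  */
      const char *start = p;
      while (*p && is_ident_char ((unsigned char) *p))
        p++;
      size_t len = (size_t) (p - start);

      /* Compare the whole token against every known symbol name.  */
      for (size_t i = 0; i < nsymbols; i++)
        if (!referenced[i]
            && strlen (symbols[i]) == len
            && memcmp (start, symbols[i], len) == 0)
          {
            referenced[i] = true;
            marked++;
          }
    }
  return marked;
}
```

With the known symbols "foo", "bar", "baz" and "rax", scanning ".globl foo; foo: mov %rax, bar" marks foo and bar, and also a C symbol that happens to be named rax, even though the assembler will treat that token as a register. That is the false positive: the symbol is kept conservatively and only loses optimizations.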




I didn't realize that extended toplevel assembly without LTO works fully. I will try to make it work with LTO as well for comparison.

Does the current extended toplevel assembly have a way to specify that it defines a local symbol?

Michal
