[Bug target/119386] [14/15 Regression][x64] Shared libraries can no longer be compiled with profiling

ardb at kernel dot org via Gcc-bugs Fri, 21 Mar 2025 13:24:30 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386


--- Comment #39 from Ard Biesheuvel <ardb at kernel dot org> ---
(In reply to Alexander Monakov from comment #38)
> (In reply to Ard Biesheuvel from comment #37)
> > Yes, we can drop -mcmodel=kernel, and use -mcmodel=small instead. This is
> > why I'm not keen on relying on that - it is ill-defined and there is really
> > no need to have this special case. In the kernel, we are trying to move away
> > from all the special sauce in the toolchain - x86 especially is affected by
> > this, whereas arm64 and other architectures just use -mcmodel=small. The
> > primary sticking point is the relative cost of RIP-relative LEA vs 32-bit
> > absolute MOV but that gap appears to have been closing in recent designs.
> 
> There's a couple places where GCC restricts offsets differently for
> -mcmodel=kernel vs. -mcmodel=small in the x86 backend. It's been determined
> it doesn't matter? How so?
> https://gcc.gnu.org/cgit/gcc/tree/gcc/config/i386/predicates.md#n239
> 

PIC code can run anywhere, so it can also run in the top 2 GB of the 64-bit
address space, which is what the kernel code model is limited to.
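To make "the top 2 GB" concrete, here is a minimal sketch in C of the range check, assuming the usual reading that the kernel code model places everything in the last 2 GiB of the address space (addresses from 0xffffffff80000000 upward). The names TOP_2G_BASE and in_top_2g() are made up for this illustration and do not come from GCC or Linux.

```c
#include <stdbool.h>
#include <stdint.h>

/* Lowest address of the top 2 GiB of a 64-bit address space
 * (illustrative constant, not taken from any toolchain header). */
#define TOP_2G_BASE UINT64_C(0xffffffff80000000)

/* Hypothetical helper: true if addr lies in the range that the
 * kernel code model is limited to. */
static bool in_top_2g(uint64_t addr)
{
    return addr >= TOP_2G_BASE;
}
```

PIC code places no such constraint on its load address, so an image built that way works inside this range as well as outside it.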

> > I'm not sure what that would solve. When linking the kernel, all
> > R_X86_64_PLT32 can be resolved directly, and so there is never the need for
> > a PLT in practice. The compiler does not have to care about this
> > distinction. Relaxing a CALL via a PLT into a direct one is much easier than
> > relaxing a GOT based data reference into a direct one.
> 
> It's not just about the calls. On x86-64 it's less pronounced, but on arm64
> telling the compiler up front that everything ends up in the final
> executable can improve codegen when referencing extern variables, for
> instance:
> 
> //__attribute__((visibility("hidden")))
> extern int a[];
> 
> int f(void)
> {
>     return a[1];
> }
> 
> gcc -O2 -fpie gets you
> 
> f:
>  adrp x0, 0 <_GLOBAL_OFFSET_TABLE_>
>     R_AARCH64_ADR_PREL_PG_HI21 _GLOBAL_OFFSET_TABLE_
>  ldr  x0, [x0]
>     R_AARCH64_LD64_GOTPAGE_LO15 a
>  ldr  w0, [x0, #4]
>  ret
> 
> and with the attribute uncommented, emulating what -fstatic-pie would do:
> 
> f:
>  adrp x0, 0 <a>
>     R_AARCH64_ADR_PREL_PG_HI21 a+0x4
>  ldr  w0, [x0]
>     R_AARCH64_LDST32_ABS_LO12_NC a+0x4
>  ret

In Linux, we don't even bother with PIC codegen, even though we link with -pie.
The non-PIC AArch64 small code model uses PC-relative references for code and
data.
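As a rough sketch of why such PC-relative references survive relocation, the C below models the arithmetic behind an ADRP + LDR pair like the one quoted above: ADRP encodes the distance between the 4 KiB page of the instruction and the page of the target, and the load supplies the low 12 bits. The function names are hypothetical and the addresses in the usage note are arbitrary; the sketch assumes the image is only ever moved by a multiple of 4 KiB.

```c
#include <stdint.h>

#define PAGE_MASK_4K (~UINT64_C(0xfff))

/* Page delta that ADRP would encode for an instruction at pc
 * referring to target (hypothetical helper, ignores the +/-4 GiB
 * range limit of the real instruction). */
static int64_t adrp_page_delta(uint64_t pc, uint64_t target)
{
    return (int64_t)((target & PAGE_MASK_4K) - (pc & PAGE_MASK_4K));
}

/* Low 12 bits of the target, as used by the LO12 relocation. */
static uint64_t lo12(uint64_t target)
{
    return target & UINT64_C(0xfff);
}

/* Address the instruction pair computes at run time. */
static uint64_t resolve(uint64_t pc, int64_t page_delta, uint64_t low)
{
    return (pc & PAGE_MASK_4K) + (uint64_t)page_delta + low;
}
```

For example, with pc = 0xffff800010000abc and target = 0xffff800010203004 the page delta is 0x203000 and the low bits are 4. Sliding both addresses by the same page-aligned offset leaves both values unchanged, which is why no GOT entry or load-time fixup is needed for the reference.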

I do agree that it would be better for this behavior to be explicit, so I'd
switch Linux to such an option if it ever appeared, but only to keep the
existing behavior.

We do use hidden visibility in Linux (using #pragma) in some places, when
building PIC code that must not ever use absolute references (which makes the
use of a GOT impossible, since GOT entries hold absolute addresses).
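A minimal sketch of the #pragma approach, reusing the names a and f from the quoted example. It assumes GCC's `#pragma GCC visibility push(hidden)` / `pop` pair; the definition of a is placed in the same file only so that the snippet links on its own, whereas in real use it would live in another translation unit.

```c
/* Every declaration between push and pop gets hidden visibility,
 * as if each carried __attribute__((visibility("hidden"))). */
#pragma GCC visibility push(hidden)
extern int a[];
int f(void);
#pragma GCC visibility pop

/* Defined here only to keep the example self-contained. */
int a[] = { 1, 2, 3 };

int f(void)
{
    /* With a hidden, a PIC build may reference it PC-relatively
     * instead of loading its address from the GOT. */
    return a[1];
}
```

The pragma form covers a whole header or source file at once, which avoids annotating each extern declaration individually.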

[Bug target/119386] [14/15 Regression][x64] Shared libraries can no longer be compiled with profiling
