https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125436
Bug ID: 125436
Summary: calls to __tls_get_addr in ms_abi functions with -fPIC
clobber callee-save registers
Product: gcc
Version: 13.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: puetzk at puetzk dot org
Target Milestone: ---
Created attachment 64536
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64536&action=edit
g++ -g -shared -fPIC -O1 foo.cpp -o libfoo.so
If a function accesses a `thread_local` variable, the resulting code will
call __tls_get_addr() (at least for the default -mtls-dialect=gnu).
However, if the function containing the TLS access is `__attribute((ms_abi))`,
or is inlined into a function that is `__attribute((ms_abi))`, the code
generation does not appear to account for the possibility that this implicit
function call will use the sysv_abi interpretation of caller-save/callee-save,
and therefore __tls_get_addr (or particularly __tls_get_addr_slow) may clobber
registers that the caller did not preserve, corrupting state upon return.
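A minimal sketch of the pattern described above. This is a hypothetical reduction, not the attached foo.cpp: the names follow the Compiler Explorer listing in this report, the value of magic_number is assumed from the expected output, and the MS_ABI macro is only there so the snippet still compiles on non-x86-64 targets.

```cpp
// Hypothetical reduction of the pattern; not the attached foo.cpp.
#if defined(__x86_64__) && defined(__GNUC__)
#define MS_ABI __attribute__((ms_abi))
#else
#define MS_ABI /* attribute is x86-64 specific */
#endif

int magic_number = 0x12345678;      // assumed value
thread_local int tls_instance = 0;

extern "C" MS_ABI int ms_foo(int i) {
    int ret = -1;
    if (i >= 0) {
        // The compiler may keep `ret` in a register that ms_abi treats as
        // callee-save but sysv_abi treats as caller-save (e.g. edi).
        ret = magic_number;
        // In a -fPIC shared object with -mtls-dialect=gnu, this access
        // becomes an implicit call to __tls_get_addr.
        tls_instance += i;
    }
    return ret;
}
```

Built into an ordinary executable this does not go through __tls_get_addr, so it only shows the problem when compiled as in the report (-shared -fPIC -O1).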
For example, the attached code miscompiles on Ubuntu 24.04 with g++
13.3.0-6ubuntu2~24.04.1:
g++ -g -shared -fPIC -O1 foo.cpp -o libfoo.so
gcc -g -DUSE_DL main.c -ldl
./a.out
> ms_abi = ac6778d0 (garbage from __tls_get_addr_slow, varies from run to run)
> sysv = 12345678 (correct answer)
> ms_abi = 54479fb8 (different garbage, since now the DTV is already set)
> ms_abi = 54479fb8 (stable from now on)
As seen from https://godbolt.org/z/YhqqhjnbT
> ms_foo:
> push rdi #
> push rsi #
> push rbx #
> test ecx, ecx # i
> js .L8 #,
> mov ebx, ecx # i, tmp95
> # /app/example.cpp:29: ret = magic_number;
> mov edi, DWORD PTR magic_number[rip] # <retval>, magic_number
>
> # /app/example.cpp:35: tls_instance += i;
> lea rdi, tls_instance@tlsld[rip]
> call __tls_get_addr@PLT #
> mov rsi, rax # tmp86, tmp96
> add DWORD PTR tls_instance@dtpoff[rsi], ebx # tls_instance, i
> .L6:
> # /app/example.cpp:46: }
> mov eax, edi #, <retval>
> pop rbx #
> pop rsi #
> pop rdi #
> ret
> .L8:
> # /app/example.cpp:20: int ret = -1;
> mov edi, -1 # <retval>,
> # /app/example.cpp:45: return ret;
> jmp .L6 #
So ret is allocated into edi, which is clobbered during the access to
tls_instance, then edi is transferred back into eax for the return value.
Flipping through versions on compiler explorer, codegen very similar to
this is seen in every gcc version <= 15.1
Something changed in 15.2, because it gets
> mov ebp, DWORD PTR magic_number[rip]
i.e. `ret` is allocated in ebp rather than edi. This avoids the bug (since ebp
is callee-save in both ms_abi and sysv_abi), but I think it's an incidental
register-allocation change rather than an actual fix, because the other
register preserving needed at a boundary between ms_abi and sysv_abi does not
appear, meaning __tls_get_addr could still clobber other things that the caller
of ms_foo() expected it to preserve.
gcc 16.1 (and trunk) change back to allocating `ret` in edi, but now they hoist
the __tls_get_addr call above the load of magic_number:
> lea rdi, "tls_instance"@tlsld[rip] #
> call "__tls_get_addr"@PLT #
> # /app/example.cpp:10: ret = magic_number;
> mov edx, DWORD PTR "magic_number"[rip] # <retval>, magic_number
This too avoids the immediate symptom, but still leaves ms_foo in violation of
its __attribute__((ms_abi)) by having the __tls_get_addr call potentially
clobber the registers (RDI, RSI, XMM6-XMM15, I think?) that ms_abi considers
callee-save but sysv_abi considers caller-save.
FWIW, clang gets this right, and so does gcc when there's actually an explicit
call to a sysv_abi function in the source code, rather than an implicit call to
__tls_get_addr as an implementation detail.
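Given that explicit sysv_abi calls are handled correctly, one possible workaround (an untested assumption on my part, not something verified against the attached test case) would be to move the TLS access into a non-inlined sysv_abi helper. The ABI boundary is then an explicit call in the source. The names add_to_tls, tls_counter and ms_bar below are hypothetical.

```cpp
// Hypothetical workaround sketch; helper and variable names are made up.
#if defined(__x86_64__) && defined(__GNUC__)
#define MS_ABI   __attribute__((ms_abi))
#define SYSV_ABI __attribute__((sysv_abi))
#else
#define MS_ABI
#define SYSV_ABI
#endif

#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

thread_local int tls_counter = 0;

// noinline keeps the TLS access (and any implicit __tls_get_addr call)
// inside a function that uses the SysV calling convention.
NOINLINE SYSV_ABI static int add_to_tls(int i) {
    tls_counter += i;
    return tls_counter;
}

extern "C" MS_ABI int ms_bar(int i) {
    int keep = i * 2;   // value that must survive the call below
    add_to_tls(i);      // explicit ms_abi -> sysv_abi call boundary
    return keep;
}
```

This relies on noinline being honoured; if the helper were inlined into ms_bar, the TLS access would again sit directly inside the ms_abi function.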
I can't find anything in
https://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt that suggests
anything other than treating __tls_get_addr as a normal sysv_abi function, e.g.
it says
> The functions defined above [ the @TLSCALL ones for -mtls-dialect=gnu2 ] use
> custom calling conventions that require them to preserve any registers they
> modify. This penalizes the case that requires dynamic TLS, since it must
> preserve (*) all call-clobbered registers before calling __tls_get_addr()
Which certainly sounds to me like `__tls_get_addr` does *not* make any special
promises, and is free to clobber anything other than RBX, RSP, RBP, and
R12–R15.