H.J. proposed to switch the default for GCC 16 (turning on -mtls-dialect=gnu2 by default). This is a bit tricky because when we tried to make the switch in Fedora (for eventual implementation), we hit an ABI compatibility problem:

  _dl_tlsdesc_dynamic doesn't preserve all caller-saved registers
  <https://sourceware.org/bugzilla/show_bug.cgi?id=31372>

This means that changing the default can have a backwards
compatibility impact on older distributions.

(a) Do nothing special and switch the default. Maybe try to backport
the glibc fix to more release branches and distributions. I think we
implicitly decided to follow this path when we decided this was a
glibc bug and not a GCC bug. The downside is that missing the bug fix
can result in unexpected, difficult-to-diagnose behavior. However,
when we rebuilt Fedora, the problem was exceedingly rare (we observed
one single failure, if I recall correctly).

(b) Introduce binary markup to indicate that binaries may need the
glibc fix, and that glibc has the fix.

  [PATCH] x86-64: Add GLIBC_ABI_GNU2_TLS [BZ #33129]
  <https://inbox.sourceware.org/libc-alpha/20250704205341.155335-1-hjl.to...@gmail.com/>

This requires changes to all linkers, GCC and glibc.

(c) Introduce a new relocation type with the same behavior as
R_X86_64_TLSDESC. Unpatched glibc will not support it and error out
during relocation processing. Requires linker, GCC and glibc changes.
Does not produce a nice error message, unlike the GLIBC_ABI_GNU2_TLS
change. Ideally would need package manager changes to produce the
right dependencies (with GLIBC_ABI_GNU2_TLS, this could happen
automatically).

(d) Make the GCC default conditional on the glibc version used at GCC
build time. Add __memcmpeq support to GCC 16. Maybe add
errno@@GLIBC_2.43 to glibc 2.43. Even today, it is likely that
binaries contain at least one symbol version reference to something
that is relatively recent, and the __memcmpeq and errno changes would
increase this effect. Combined with the backport mentioned under (a),
that could be enough to force glibc upgrades in pretty much all cases.
We have __libc_start_main@@GLIBC_2.34, so if the glibc backports go
back to 2.34 (or even 2.31), only shared objects suffer from this
issue. Among the Fedora binaries, the outliers without dependencies on
recent glibc are mostly Perl modules, and I expect the errno and
__memcmpeq changes would cover at least some of these. This is not as
clean as (b) and (c), but only needs glibc and GCC changes (for
__memcmpeq). It does not achieve 100% bug prevention, but given that
bugs seem to be rare, this may be good enough.

(e) Skip over GNU2 TLS altogether and implement inline TLS sequences
(GNU3 descriptors?) that do not have the dlopen incompatibility of
initial-exec TLS. This is currently vaporware. It requires nontrivial
glibc changes, GCC changes, linker changes, and x86-64 psABI work to
define new relocation types and perhaps relaxations. This is probably
what we want long-term. User experience is similar to (c), but with
more implementation sequences.

For comparison with an initial-exec TLS read,

  movq threadvar@gottpoff(%rip), %rax
  movl %fs:(%rax), %eax

this could look like this:

  movl threadvar@gottpslot, %eax
  movq %fs:(%rax), %rax
  movl threadvar@gottlsslotoff, %ecx
  movl (%rcx, %rax), %eax

Or with the descriptor in one word:

  movq threadvar@gottpslotoff, %rax
  movq %rax, %rdx
  movq %fs:(%eax), %rax
  shrq $32, %rdx
  movl (%rax, %rdx), %eax

Or with slightly shorter instructions, using a 32-bit descriptor
(which still could cover at least 3 GiB of TLS data per thread):

  movl threadvar@gottpslotoff, %eax
  movzbl %al, %edx
  shr $8, %eax
  movq %fs:64(%edx), %rdx
  mov (%rdx, %rax), %eax

And if we want a negative TLS slot index (which glibc would not use,
and I think it's incompatible with local-exec TLS anyway):

  movq threadvar@gottpslotoff, %rax
  movslq %eax, %rdx
  shrq $32, %rax
  movq %fs:(%rdx), %rdx
  movl (%rdx, %rax), %eax

There might be other variant sequences. Implementing this on the glibc
side would require fundamental changes to the TLS allocator, which is
why this isn't straightforward.
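
To make the one-word descriptor variant easier to follow, here is a
rough C model of what that instruction sequence computes. This is only
a sketch under my own assumptions: the struct thread_area layout and
the make_descriptor, read_tls_int and demo_read names are hypothetical
and do not correspond to anything in glibc. The low 32 bits of the
descriptor are taken as the byte offset of a block pointer relative to
the thread pointer, and the high 32 bits as the offset of the variable
within that block.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the memory the thread pointer (%fs base)
   refers to: a small table of pointers to per-module TLS blocks. */
struct thread_area {
    void *slots[4];
};

/* Pack a descriptor: low 32 bits = byte offset of the slot from the
   thread pointer, high 32 bits = offset of the variable in its block. */
static uint64_t make_descriptor(uint32_t slot_byte_offset, uint32_t var_offset)
{
    return ((uint64_t)var_offset << 32) | slot_byte_offset;
}

/* Mirrors the one-word sequence: load the block pointer through the
   slot, shift the descriptor right by 32, then load from block + offset. */
static int read_tls_int(const struct thread_area *tp, uint64_t desc)
{
    uint32_t slot_byte_offset = (uint32_t)desc;
    uint32_t var_offset = (uint32_t)(desc >> 32);
    const char *block;
    int value;

    memcpy(&block, (const char *)tp + slot_byte_offset, sizeof block);
    memcpy(&value, block + var_offset, sizeof value);
    return value;
}

/* Builds one fake TLS block, places a variable at element 2, reads it. */
static int demo_read(void)
{
    int block[4] = { 0, 0, 42, 0 };
    struct thread_area area = { { 0 } };
    area.slots[1] = block;

    uint64_t desc = make_descriptor((uint32_t)(1 * sizeof(void *)),
                                    (uint32_t)(2 * sizeof(int)));
    return read_tls_int(&area, desc);
}
```

The model leaves out everything that makes this hard in practice, in
particular how the allocator would guarantee that every slot is
populated before the first access.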

(f) A less ambitious variant of (e): A new TLS descriptor callback
that returns the address of the TLS variable, and not the offset from
the thread pointer. This is much easier to implement on the glibc
side. The current GNU2 TLS descriptor callback is optimized for static
TLS access. We can avoid a memory access in the static TLS callback if
we use the RDFSBASE instruction (if glibc detects run-time support).
It's a new relocation type, so this too needs GCC, linker, and ABI
changes. However, these changes are largely mechanical (except perhaps
for the relaxation support). Basically, TLS accesses would change from

  leaq threadvar@TLSDESC(%rip), %rax
  call *threadvar@TLSCALL(%rax)
  movl %fs:(%rax), %eax

to:

  leaq threadvar@TLSDESC2(%rip), %rax
  call *threadvar@TLSCALL2(%rax)
  movl (%rax), %eax

And the implementation of the static TLS case would change from

  endbr64
  movq 8(%rax), %rax
  retq

to:

  endbr64
  rdfsbase %rax
  addq %rsi, %rax
  retq

But I don't think this detour is worth it if we eventually want to
land on (e).

I'm personally leaning towards (d) or (a) for GCC 16. I dislike (b).
And (e) is unrealistic in the short term.

Thanks,
Florian