On Mon, Jul 7, 2025 at 10:39 AM Florian Weimer via Gcc <gcc@gcc.gnu.org> wrote:
>
> H.J. proposed to switch the default for GCC 16 (turning on
> -mtls-dialect=gnu2 by default).  This is a bit tricky because when we
> tried to make the switch in Fedora (for eventual implementation), we hit
> an ABI compatibility problem:
>
>   _dl_tlsdesc_dynamic doesn't preserve all caller-saved registers
>   <https://sourceware.org/bugzilla/show_bug.cgi?id=31372>
>
> This means that changing the defaults can have backwards compatibility
> impact with older distributions.
>
> (a) Do nothing special and switch the default.  Maybe try to
> backport the glibc fix to more release branches and distributions.  I
> think we implicitly decided to follow this path when we decided this was
> a glibc bug and not a GCC bug.  The downside is that missing the bug fix
> can result in unexpected, difficult-to-diagnose behavior.  However, when
> we rebuilt Fedora, the problem was exceedingly rare (we observed one
> single failure, if I recall correctly).
>
> (b) Introduce binary markup to indicate that binaries may need the glibc
> fix, and that glibc has the fix.
>
>   [PATCH] x86-64: Add GLIBC_ABI_GNU2_TLS [BZ #33129]
>   <https://inbox.sourceware.org/libc-alpha/20250704205341.155335-1-hjl.to...@gmail.com/>
>
> This requires changes to all linkers, GCC and glibc.
>
> (c) Introduce a new relocation type with the same behavior as
> R_X86_64_TLSDESC.  Unpatched glibc will not support it and error out
> during relocation processing.  Requires linker changes, GCC and glibc
> changes.  Does not produce a nice error message, unlike the
> GLIBC_ABI_GNU2_TLS change.  Ideally would need package manager changes
> to produce the right dependencies (with GLIBC_ABI_GNU2_TLS, this could
> happen automatically).
>
> (d) Make the GCC default conditional on the glibc version used at GCC
> build time.  Add __memcmpeq support to GCC 16.  Maybe add
> errno@@GLIBC_2.43 to glibc 2.43.  Even today, it is likely that binaries
> contain at least one symbol version reference to something that is
> relatively recent, and the __memcmpeq and errno changes would increase
> this effect.  Combined with the backport mentioned under (a), that could
> be enough to force glibc upgrades in pretty much all cases.  We have
> __libc_start_main@@GLIBC_2.34, so if the glibc backports go back to 2.34
> (or even 2.31), only shared objects suffer from this issue.  Among the
> Fedora binaries, the outliers without dependencies on recent glibc are
> mostly Perl modules, and I expect the errno and __memcmpeq would cover
> at least some of these.  This is not as clean as (b) and (c), but only
> needs glibc and GCC changes (for __memcmpeq).  It does not achieve 100%
> bug prevention, but given that bugs seem to be rare, this may be good
> enough.
>
> (e) Skip over GNU2 TLS altogether and implement inline TLS sequences
> (GNU3 descriptors?) that do not have the dlopen incompatibility of
> initial-exec TLS.  This is currently vaporware.  It requires nontrivial
> glibc changes, GCC changes, linker changes, and x86-64 psABI work to
> define new relocation types and perhaps relaxations.  This is probably
> what we want long-term.  User experience is similar to (c), but with
> more implementation sequences.
>
> For comparison with an initial-exec TLS read,
>
>         movq    threadvar@gottpoff(%rip), %rax
>         movl    %fs:(%rax), %eax
>
> this could look like this:
>
>         movl    threadvar@gottpslot, %eax
>         movq    %fs:(%rax), %rax
>         movl    threadvar@gottlsslotoff, %ecx
>         movl    (%rcx, %rax), %eax
>
> Or with the descriptor in one word:
>
>         movq    threadvar@gottpslotoff, %rax
>         movq    %rax, %rdx
>         movq    %fs:(%eax), %rax
>         shrq    $32, %rdx
>         movl    (%rax, %rdx), %eax
>
> Or with a bit shorter instruction, using a 32-bit descriptor (which
> still could cover at least 3 GiB of TLS data per thread):
>
>         movl    threadvar@gottpslotoff, %eax
>         movzbl  %al, %edx
>         shr     $8, %eax
>         movq    %fs:64(%edx), %rdx
>         mov     (%rdx, %rax), %eax
>
> And if we want a negative TLS slot index (which glibc would not use, and
> I think it's incompatible with local-exec TLS anyway):
>
>         movq    threadvar@gottpslotoff, %rax
>         movslq  %eax, %rdx
>         shrq    $32, %rax
>         movq    %fs:(%rdx), %rdx
>         movl    (%rdx, %rax), %eax
>
> There might be other variant sequences.
>
> Implementing this on the glibc side would require fundamental changes to
> the TLS allocator, which is why this isn't straightforward.
>
> (f) A less ambitious variant of (e): A new TLS descriptor callback that
> returns the address of the TLS variable, and not the offset from the
> thread pointer.  This is much easier to implement on the glibc side.
> The current GNU2 TLS descriptor callback is optimized for static TLS
> access.  We can avoid a memory access in the static TLS callback if we
> use the RDFSBASE instruction (if glibc detects run-time support).  It's
> a new relocation type, so this too needs GCC, linker, ABI changes.
> However, these changes are largely mechanical (except perhaps for the
> relaxation support).  Basically, TLS accesses would change from
>
>         leaq    threadvar@TLSDESC(%rip), %rax
>         call    *threadvar@TLSCALL(%rax)
>         movl    %fs:(%rax), %eax
>
> to:
>
>         leaq    threadvar@TLSDESC2(%rip), %rax
>         call    *threadvar@TLSCALL2(%rax)
>         movl    (%rax), %eax
>
> And the implementation of the static TLS case would change from
>
>         endbr64
>         movq    8(%rax), %rax
>         retq
>
> to:
>
>         endbr64
>         rdfsbase %rax
>         addq    %rsi, %rax
>         retq
>
> But I don't think this detour is worth it if we eventually want to land
> on (e).
>
>
> I'm personally leaning towards (d) or (a) for GCC 16.  I dislike (b).
> And (e) is unrealistic in the short term.

I think both (a) and (d) are reasonable, though I am missing a
configure-time flag to override the changed default.  Even with
glibc fixed, we likely do not want this change in older
enterprise code streams, given that there might be unknown external
tooling that could be confused by it.

Richard.

> Thanks,
> Florian
>
