H.J. proposed to switch the default for GCC 16 (turning on
-mtls-dialect=gnu2 by default).  This is a bit tricky because when we
tried to make the switch in Fedora (for eventual implementation), we hit
an ABI compatibility problem:

  _dl_tlsdesc_dynamic doesn't preserve all caller-saved registers
  <https://sourceware.org/bugzilla/show_bug.cgi?id=31372>

This means that changing the default can have a backwards-compatibility
impact on older distributions: binaries built with the new default may
misbehave when run on a glibc that lacks the fix.

(a) Do nothing special and switch the default.  Maybe try to
backport the glibc fix to more release branches and distributions.  I
think we implicitly decided to follow this path when we decided this was
a glibc bug and not a GCC bug.  The downside is that missing the bug fix
can result in unexpected, difficult-to-diagnose behavior.  However, when
we rebuilt Fedora, the problem was exceedingly rare (we observed a
single failure, if I recall correctly).

(b) Introduce binary markup to indicate that binaries may need the glibc
fix, and that glibc has the fix.

  [PATCH] x86-64: Add GLIBC_ABI_GNU2_TLS [BZ #33129]
  <https://inbox.sourceware.org/libc-alpha/20250704205341.155335-1-hjl.to...@gmail.com/>

This requires changes to all linkers, GCC and glibc.

(c) Introduce a new relocation type with the same behavior as
R_X86_64_TLSDESC.  Unpatched glibc will not support it and will error
out during relocation processing.  Requires linker, GCC, and glibc
changes.  Does not produce a nice error message, unlike the
GLIBC_ABI_GNU2_TLS change.  Ideally, this would need package manager
changes to produce the right dependencies (with GLIBC_ABI_GNU2_TLS, this
could happen automatically).

(d) Make the GCC default conditional on the glibc version used at GCC
build time.  Add __memcmpeq support to GCC 16.  Maybe add
errno@@GLIBC_2.43 to glibc 2.43.  Even today, it is likely that binaries
contain at least one symbol version reference to something that is
relatively recent, and the __memcmpeq and errno changes would increase
this effect.  Combined with the backport mentioned under (a), that could
be enough to force glibc upgrades in pretty much all cases.  We have
__libc_start_main@@GLIBC_2.34, so if the glibc backports go back to 2.34
(or even 2.31), only shared objects suffer from this issue.  Among the
Fedora binaries, the outliers without dependencies on recent glibc are
mostly Perl modules, and I expect the errno and __memcmpeq changes would
cover at least some of these.  This is not as clean as (b) and (c), but only
needs glibc and GCC changes (for __memcmpeq).  It does not achieve 100%
bug prevention, but given that bugs seem to be rare, this may be good
enough.
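To make the "conditional on the glibc version" part of (d) concrete,
here is a rough C sketch of such a check.  It is not how GCC's configure
logic is written; the function names are hypothetical, and the threshold
is left as a parameter because the right cut-off depends on how far the
fix gets backported:

```c
#include <limits.h>   /* on glibc this pulls in <features.h>, defining __GLIBC__ */

/* Return 1 if version (major, minor) is at least (need_major, need_minor). */
int version_at_least(int major, int minor, int need_major, int need_minor)
{
    if (major != need_major)
        return major > need_major;
    return minor >= need_minor;
}

/* Hypothetical policy hook: enable the gnu2 default only if the glibc
   headers seen at build time are new enough.  On non-glibc systems the
   macros are undefined and the answer is conservatively "no". */
int default_to_gnu2(int need_major, int need_minor)
{
#if defined(__GLIBC__) && defined(__GLIBC_MINOR__)
    return version_at_least(__GLIBC__, __GLIBC_MINOR__,
                            need_major, need_minor);
#else
    (void) need_major;
    (void) need_minor;
    return 0;
#endif
}
```

A build-time check like this only says something about the build host.
The forced upgrade on the target system would still come from symbol
version references such as the __memcmpeq and errno ones mentioned above.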

(e) Skip over GNU2 TLS altogether and implement inline TLS sequences
(GNU3 descriptors?) that do not have the dlopen incompatibility of
initial-exec TLS.  This is currently vaporware.  It requires nontrivial
glibc changes, GCC changes, linker changes, and x86-64 psABI work to
define new relocation types and perhaps relaxations.  This is probably
what we want long-term.  User experience is similar to (c), but with
more implementation sequences.

For comparison with an initial-exec TLS read,

        movq    threadvar@gottpoff(%rip), %rax
        movl    %fs:(%rax), %eax

this could look like this:

        movl    threadvar@gottpslot, %eax
        movq    %fs:(%rax), %rax
        movl    threadvar@gottlsslotoff, %ecx
        movl    (%rcx, %rax), %eax

Or with the descriptor in one word:

        movq    threadvar@gottpslotoff, %rax
        movq    %rax, %rdx
        movq    %fs:(%eax), %rax
        shrq    $32, %rdx
        movl    (%rax, %rdx), %eax

Or with a slightly shorter instruction sequence, using a 32-bit
descriptor (which still could cover at least 3 GiB of TLS data per
thread):

        movl    threadvar@gottpslotoff, %eax
        movzbl  %al, %edx
        shr     $8, %eax
        movq    %fs:64(%edx), %rdx
        mov     (%rdx, %rax), %eax

And if we want a negative TLS slot index (which glibc would not use, and
I think it's incompatible with local-exec TLS anyway):

        movq    threadvar@gottpslotoff, %rax
        movslq  %eax, %rdx
        shrq    $32, %rax
        movq    %fs:(%rdx), %rdx
        movl    (%rdx, %rax), %eax

There might be other variant sequences.
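
The packed-descriptor variants above differ only in how the slot and
offset fields are split out of one word.  A small C model of that
unpacking, assuming the field layout implied by the shifts and
extensions in the sequences (the struct and function names are invented
here, and the %fs-relative loads are not modeled):

```c
#include <stdint.h>

/* Fields extracted from a packed descriptor word. */
struct tls_fields {
    int64_t  slot;    /* thread-pointer-relative location of the slot */
    uint64_t offset;  /* offset of the variable within the TLS block */
};

/* 64-bit descriptor: low 32 bits are the slot (zero-extended, as with
   the %eax addressing), high 32 bits are the offset (shrq $32). */
struct tls_fields decode_desc64(uint64_t desc)
{
    struct tls_fields f;
    f.slot = (int64_t) (uint32_t) desc;
    f.offset = desc >> 32;
    return f;
}

/* 32-bit descriptor: low 8 bits are the slot (movzbl %al), the
   remaining 24 bits are the offset (shr $8). */
struct tls_fields decode_desc32(uint32_t desc)
{
    struct tls_fields f;
    f.slot = desc & 0xff;
    f.offset = desc >> 8;
    return f;
}

/* 64-bit descriptor with a signed slot: low 32 bits are sign-extended
   (movslq), so the slot can be negative.  The conversion to int32_t
   relies on the usual two's-complement behavior of GCC and Clang. */
struct tls_fields decode_desc64_signed(uint64_t desc)
{
    struct tls_fields f;
    f.slot = (int32_t) (uint32_t) desc;
    f.offset = desc >> 32;
    return f;
}
```

For example, under this model the word 0x00000010fffffff8 decodes to slot
-8 and offset 0x10 in the signed variant.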

Implementing this on the glibc side would require fundamental changes to
the TLS allocator, which is why this isn't straightforward.

(f) A less ambitious variant of (e): A new TLS descriptor callback that
returns the address of the TLS variable, and not the offset from the
thread pointer.  This is much easier to implement on the glibc side.
The current GNU2 TLS descriptor callback is optimized for static TLS
access.  We can avoid a memory access in the static TLS callback if we
use the RDFSBASE instruction (if glibc detects run-time support).  It's
a new relocation type, so this too needs GCC, linker, and ABI changes.
However, these changes are largely mechanical (except perhaps for the
relaxation support).  Basically, TLS accesses would change from

        leaq    threadvar@TLSDESC(%rip), %rax
        call    *threadvar@TLSCALL(%rax)
        movl    %fs:(%rax), %eax

to:

        leaq    threadvar@TLSDESC2(%rip), %rax
        call    *threadvar@TLSCALL2(%rax)
        movl    (%rax), %eax

And the implementation of the static TLS case would change from

        endbr64
        movq    8(%rax), %rax
        retq

to:

        endbr64
        rdfsbase %rax
        addq    %rsi, %rax
        retq

But I don't think this detour is worth it if we eventually want to land
on (e).
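
In C terms, the difference between the two callback contracts in (f) is
roughly the following.  This is only a model: the struct layout follows
the 8(%rax) load in the current static callback, the thread pointer is
passed in explicitly instead of being read via %fs or RDFSBASE, and all
names are made up:

```c
#include <stdint.h>

/* Simplified TLS descriptor: entry point plus one argument word.  For
   the static TLS case the argument is the offset from the thread
   pointer. */
struct tlsdesc {
    void    *entry;
    uint64_t arg;
};

/* Current contract: the callback returns the offset from the thread
   pointer, and the caller applies it with a %fs-relative access. */
uint64_t tlsdesc_static_offset(const struct tlsdesc *desc)
{
    return desc->arg;
}

/* Proposed contract: the callback returns the address of the variable,
   so the caller can dereference the result directly. */
uintptr_t tlsdesc2_static_address(const struct tlsdesc *desc,
                                  uintptr_t thread_pointer)
{
    return thread_pointer + (uintptr_t) desc->arg;
}
```

Both contracts name the same memory location; what changes is whether
the caller or the callback adds the thread pointer.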


I'm personally leaning towards (d) or (a) for GCC 16.  I dislike (b).
And (e) is unrealistic in the short term.

Thanks,
Florian
