Re: GCC 12.5 Release Candidate available from gcc.gnu.org

2025-07-07 Thread Richard Biener via Gcc
On Mon, 7 Jul 2025, Iain Sandoe wrote:

> 
> 
> > On 4 Jul 2025, at 08:53, Richard Biener via Gcc  wrote:
> > 
> > The first release candidate for GCC 12.5 is available from
> > 
> > https://gcc.gnu.org/pub/gcc/snapshots/12.5.0-RC-20250704/
> > ftp://gcc.gnu.org/pub/gcc/snapshots/12.5.0-RC-20250704/
> > 
> > and shortly its mirrors.  It has been generated from git commit
> > r12-11250-gb71ac987cd1499.
> > 
> > I have so far bootstrapped and tested the release candidate on
> > x86_64-linux.
> > Please test it and report any issues to bugzilla.
> > 
> > If all goes well, we'd like to release 12.5 on Friday, Jul 11th
> > and close the branch.
> 
> I have tested this on a range of Darwin/macOS platforms and, 
> unfortunately, identified that I have omitted one backport that
> has considerable fallout on the latest macOS + latest Xcode.
> 
> The newer OS tools now emit a warning for the use of an
> obsolete command line option - which leads to around 13k test
> failures (e.g. 
> https://gcc.gnu.org/pipermail/gcc-testresults/2025-July/852085.html)
> 
> The patch that’s needed is completely Darwin-local:
> r14-2269-g3c776fdf1a8258 
> 
> This affects Darwin23 (macOS 14/Sonoma) and later OS versions
> that need the newer tools.
> 
> I wonder if it would be possible to apply this, since the branch will
> now be closed and therefore there’s no opportunity to fix it in the
> future.

Yes, this is fine to apply.

Richard.


Re: Switching x86-64 to GNU2 TLS descriptors

2025-07-07 Thread Richard Biener via Gcc
On Mon, Jul 7, 2025 at 10:39 AM Florian Weimer via Gcc  wrote:

Switching x86-64 to GNU2 TLS descriptors

2025-07-07 Thread Florian Weimer via Gcc
H.J. proposed to switch the default for GCC 16 (turning on
-mtls-dialect=gnu2 by default).  This is a bit tricky because when we
tried to make the switch in Fedora (for eventual implementation), we hit
an ABI compatibility problem:

  _dl_tlsdesc_dynamic doesn't preserve all caller-saved registers
  

This means that changing the defaults can have backwards compatibility
impact with older distributions.

(a) Do nothing special and switch the default.  Maybe try to
backport the glibc fix to more release branches and distributions.  I
think we implicitly decided to follow this path when we decided this was
a glibc bug and not a GCC bug.  The downside is that missing the bug fix
can result in unexpected, difficult-to-diagnose behavior.  However, when
we rebuilt Fedora, the problem was exceedingly rare (we observed one
single failure, if I recall correctly).

(b) Introduce binary markup to indicate that binaries may need the glibc
fix, and that glibc has the fix.

  [PATCH] x86-64: Add GLIBC_ABI_GNU2_TLS [BZ #33129]
  


This requires changes to all linkers, GCC and glibc.

(c) Introduce a new relocation type with the same behavior as
R_X86_64_TLSDESC.  Unpatched glibc will not support it and error out
during relocation processing.  Requires linker changes, GCC and glibc
changes.  Does not produce a nice error message, unlike the
GLIBC_ABI_GNU2_TLS change.  Ideally would need package manager changes
to produce the right dependencies (with GLIBC_ABI_GNU2_TLS, this could
happen automatically).

(d) Make the GCC default conditional on the glibc version used at GCC
build time.  Add __memcmpeq support to GCC 16.  Maybe add
errno@@GLIBC_2.43 to glibc 2.43.  Even today, it is likely that binaries
contain at least one symbol version reference to something that is
relatively recent, and the __memcmpeq and errno changes would increase
this effect.  Combined with the backport mentioned under (a), that could
be enough to force glibc upgrades in pretty much all cases.  We have
__libc_start_main@@GLIBC_2.34, so if the glibc backports go back to 2.34
(or even 2.31), only shared objects suffer from this issue.  Among the
Fedora binaries, the outliers without dependencies on recent glibc are
mostly Perl modules, and I expect the errno and __memcmpeq would cover
at least some of these.  This is not as clean as (b) and (c), but only
needs glibc and GCC changes (for __memcmpeq).  It does not achieve 100%
bug prevention, but given that bugs seem to be rare, this may be good
enough.

(e) Skip over GNU2 TLS altogether and implement inline TLS sequences
(GNU3 descriptors?) that do not have the dlopen incompatibility of
initial-exec TLS.  This is currently vaporware.  It requires nontrivial
glibc changes, GCC changes, linker changes, and x86-64 psABI work to
define new relocation types and perhaps relaxations.  This is probably
what we want long-term.  User experience is similar to (c), but with
more implementation sequences.

For comparison with an initial-exec TLS read,

        movq    threadvar@gottpoff(%rip), %rax
        movl    %fs:(%rax), %eax

this could look like this:

        movl    threadvar@gottpslot, %eax
        movq    %fs:(%rax), %rax
        movl    threadvar@gottlsslotoff, %ecx
        movl    (%rcx, %rax), %eax

Or with the descriptor in one word:

        movq    threadvar@gottpslotoff, %rax
        movq    %rax, %rdx
        movq    %fs:(%eax), %rax
        shrq    $32, %rdx
        movl    (%rax, %rdx), %eax

Or with a slightly shorter instruction sequence, using a 32-bit descriptor
(which still could cover at least 3 GiB of TLS data per thread):

        movl    threadvar@gottpslotoff, %eax
        movzbl  %al, %edx
        shr     $8, %eax
        movq    %fs:64(%edx), %rdx
        mov     (%rdx, %rax), %eax

And if we want a negative TLS slot index (which glibc would not use, and
I think it's incompatible with local-exec TLS anyway):

        movq    threadvar@gottpslotoff, %rax
        movslq  %eax, %rdx
        shrq    $32, %rax
        movq    %fs:(%rdx), %rdx
        movl    (%rdx, %rax), %eax

There might be other variant sequences.

Implementing this on the glibc side would require fundamental changes to
the TLS allocator, which is why this isn't straightforward.
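
To make the one-word descriptor variant above easier to follow, here is a
rough C++ model of what that instruction sequence computes.  It is only a
sketch: ThreadArea, make_descriptor and load_tls_int are made-up names, the
"thread area" is an ordinary struct standing in for whatever %fs points at,
and the split of the descriptor (low 32 bits = byte offset of the slot in the
thread area, high 32 bits = offset of the variable inside that TLS block) is
read off the instruction sequence, not taken from any existing ABI document.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the per-thread area addressed via %fs.
struct ThreadArea {
    unsigned char* blocks[4];  // one TLS block pointer per slot
};

// Pack the slot's byte offset (low 32 bits) and the variable's offset
// inside the TLS block (high 32 bits) into one 64-bit descriptor.
constexpr std::uint64_t make_descriptor(std::uint32_t slot_byte_offset,
                                        std::uint32_t var_offset) {
    return (static_cast<std::uint64_t>(var_offset) << 32) | slot_byte_offset;
}

// Model of the five-instruction read of a 32-bit TLS variable.
inline std::int32_t load_tls_int(const ThreadArea& tp, std::uint64_t desc) {
    // movq desc, %rax ; movq %rax, %rdx
    std::uint32_t slot_byte_offset = static_cast<std::uint32_t>(desc);
    // shrq $32, %rdx
    std::uint64_t var_offset = desc >> 32;
    assert(slot_byte_offset + sizeof(unsigned char*) <= sizeof tp);

    // movq %fs:(%eax), %rax  -- fetch the TLS block pointer from the slot
    unsigned char* block;
    std::memcpy(&block,
                reinterpret_cast<const unsigned char*>(&tp) + slot_byte_offset,
                sizeof block);

    // movl (%rax, %rdx), %eax  -- read the variable inside the block
    std::int32_t value;
    std::memcpy(&value, block + var_offset, sizeof value);
    return value;
}
```

The point of the model is that no function call is involved: the access is two
dependent loads plus a shift, which is why the allocator has to guarantee that
the slot is populated before any such access runs.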

(f) A less ambitious variant of (e): A new TLS descriptor callback that
returns the address of the TLS variable, and not the offset from the
thread pointer.  This is much easier to implement on the glibc side.
The current GNU2 TLS descriptor callback is optimized for static TLS
access.  We can avoid a memory access in the static TLS callback if we
use the RDFSBASE instruction (if glibc detects run-time support).  It's
a new relocation type, so this too needs GCC, linker, ABI changes.
However, these changes are largely mechanical (except perhaps for the
relaxation support).  Basically, TLS accesses would change fr

Re: scan-*-dump-times across multiple functions considered harmful

2025-07-07 Thread David Malcolm via Gcc
On Thu, 2025-07-03 at 10:12 +0100, Joern Wolfgang Rennecke wrote:


> 
> On 02/07/2025 18:59, David Malcolm wrote:
>   ...
> > Brainstorming some ideas on other possible approaches on making our
> > tests less brittle; for context I did some investigation back in 2018
> > about implementing "optimization remarks" like clang does: diagnostics
> > about optimization decisions, so you could have a dg directive like
> > this on a particular line:
> > 
> >    foo ();  /* { dg-remark "inlined call to 'foo' into 'bar'" } */
> 
> I like the idea.  However, it seems unlikely that we can make a
> clean switchover in this decade, unless you find one or more
> corporate sponsors.

> We probably always want dump files without a rigid structure, because
> it makes it easier to add debug output when you flesh out a new pass
> or a change to an existing one.  We can make the calls that generate
> the json output also emit output in the dump file, so we won't carry
> a doubled maintenance burden; however, this means the current ad-hoc 
> messages would become more unified; thus the testsuite will have to
> be adjusted.
> FWIW, even if you were to get rid of the current dump files (which
> I think would be stifling for GCC development for the above
> reasons), you would have to adjust the testsuite.
> So, we could use the json framework for new dump output that is
> contributed before or along with the parts of the testsuite that
> scan for it, but for any legacy dump output that is scanned for
> in the testsuite, that requires adjusting the testsuite.  More
> than 26K dejagnu scan-*-dump* directives in the gcc15 testsuite.
> And you'll have a big flag day, or a ton of small ones.  Plus
> all the friction that this will create with porting patches up
> and down gcc versions.
> That is a lot of thankless work, which I can't imagine doing as a
> hobby.  And considering people at the start of their career who
> might think of doing some unpaid drudge work in hope of getting 
> recognition that'll get them some paid work, with paying work
> for GCC drying up, they would more likely do something for LLVM,
> which also seems to better align with the skills of recent
> graduates.

Hi Joern

I don't think the two approaches (dumpfiles vs remarks) are mutually
exclusive: I was thinking of an approach where we extend the existing
dump mechanism so that messages go both to the dumpfiles, *and* are
emitted as remarks, with a new command-line option to enable the latter
sink.

Thus people writing new testcases would have the option of using
remarks to get greater precision about what is being tested, but the
existing testcases continue to work without needing porting, and
dumpfiles can continue to contain ad-hoc information.  In particular,
there wouldn't need to be a big "flag day" change, but obviously test
cases using dg-remark would only work for versions of GCC that support
remarks.

> 
> So, unless/until you have (a) corporate sponsor(s) to pay for
> the work on the existing testsuite - and that work is
> successfully concluded - we will have to find a way to
> make the scans of the dump files more maintainable.
> In fact, if we can solve the maintenance hassle of having
> multiple functions in a test by making the scan patterns more specific,
> so we don't have to split the tests up, that will put us in
> better position if/when the transition to a more organized
> optimization records system is made.
> 

I can try prototyping something before Cauldron (though I have a fairly
full plate already for GCC 16).

Dave
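
For readers who have not written one of these tests, a small sketch of the
kind of dump scan under discussion may help.  The function names are made up,
and the expected count of 3 is only what one would guess if square is inlined
twice and also kept out of line; it has not been checked against any
particular GCC version, which is rather the point: the count depends on every
function in the dump, so adding or changing an unrelated function can break
it.  Note that dg-remark is only a proposal in this thread and does not exist
in GCC today, so it is not used here.

```cpp
// { dg-do compile }
// { dg-options "-O2 -fdump-tree-optimized" }

// Hypothetical helper; at -O2 it is likely to be inlined into its caller
// while an out-of-line copy is kept as well.
int square (int x)
{
  return x * x;
}

int sum_of_squares (int a, int b)
{
  return square (a) + square (b);
}

// Brittle: this counts multiplications in the whole dump file, not in one
// function.  The count is illustrative only.
// { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } }
```

A more specific pattern, or a check attached to a particular source line, would
keep the test tied to the one function it is actually about.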




Re: Switching x86-64 to GNU2 TLS descriptors

2025-07-07 Thread Florian Weimer via Gcc
* Richard Biener:

> I think both (a) or (d) are reasonable, though I am missing a
> configure time flag to override the changed default.  Even with
> glibc fixed we likely do not want to have this change in older
> enterprise code streams given there might be unknown external
> tooling that might be confused.

Yes, a configure flag makes sense.

> Oh, and what exactly is the advantage of GNU TLS2 descriptors?

The GNU2 TLS descriptor callback preserves most registers, and does not
need to save many registers on its fast path.  This isn't true for
__tls_get_addr, which follows the standard calling convention.  The
descriptors can be specialized based on the DSO that defines the TLS
variable.  So GNU2 TLS descriptors are expected to be a bit
faster.

Thanks,
Florian
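
To see the difference on a concrete case, here is a minimal sketch (counter,
bump and bump_by are made-up names).  Built into a shared object with
something like "g++ -O2 -fPIC -shared", the accesses below are general-dynamic
TLS accesses.  With -mtls-dialect=gnu they should go through a call to
__tls_get_addr; with -mtls-dialect=gnu2 they should instead use a
"leaq counter@TLSDESC(%rip), %rax" / "call *counter@TLSCALL(%rax)" pair.
Comparing the two builds with objdump -dr is one way to check what a given
toolchain actually emits.

```cpp
#include <cassert>

// Hypothetical thread-local variable; each thread gets its own copy.
thread_local int counter = 0;

// Increment the calling thread's copy of `counter` and return it.
// In a shared object this is the access whose code sequence
// -mtls-dialect selects.
int bump() {
    return ++counter;
}

// Add `n` to the calling thread's counter and return the new value.
int bump_by(int n) {
    assert(n >= 0);
    counter += n;
    return counter;
}
```

The source is the same for both dialects; only the generated access sequence
and the relocations differ.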



Re: Switching x86-64 to GNU2 TLS descriptors

2025-07-07 Thread H.J. Lu via Gcc
On Mon, Jul 7, 2025 at 4:37 PM Florian Weimer  wrote:
>
> H.J. proposed to switch the default for GCC 16 (turning on
> -mtls-dialect=gnu2 by default).  This is a bit tricky because when we
> tried to make the switch in Fedora (for eventual implementation), we hit
> an ABI compatibility problem:
>
>   _dl_tlsdesc_dynamic doesn't preserve all caller-saved registers
>   
>
> This means that changing the defaults can have backwards compatibility
> impact with older distributions.
>
> (a) Do nothing special and switch the default.  Maybe try to
> backport the glibc fix to more release branches and distributions.  I
> think we implicitly decided to follow this path when we decided this was
> a glibc bug and not a GCC bug.  The downside is that missing the bug fix
> can result in unexpected, difficult-to-diagnose behavior.  However, when
> we rebuilt Fedora, the problem was exceedingly rare (we observed one
> single failure, if I recall correctly).
>
> (b) Introduce binary markup to indicate that binaries may need the glibc
> fix, and that glibc has the fix.
>
>   [PATCH] x86-64: Add GLIBC_ABI_GNU2_TLS [BZ #33129]
>   
> 
>
> This requires changes to all linkers, GCC and glibc.

This option is independent of GCC.   Only glibc and linker changes
are needed.   It just introduces a glibc version dependency whenever
GNU2 TLS is used, regardless whether it is the default or not.


Offset vtable address

2025-07-07 Thread Thomas de Bock via Gcc
I am currently working with the C++ front end and trying to compare the
addresses of the "trgt" (originally a type) and "src" (originally a class
instance) vtables.

So far I am successfully retrieving the vptr value of the src instance at
runtime, and the vtable address of the trgt at compile time, with the
following code:


  tree src_vptr = build_vfield_ref(src_obj, TREE_TYPE(src_obj));
  tree trgt_vtbl_decl = get_vtable_decl(target_type, 0);

  // Gives the actual vtable address; the vptr does not (it is offset past
  // the offset-to-top and RTTI entries)
  tree trgt_vtbl_addr = build_address(trgt_vtbl_decl);

However, as the comment suggests, this is not satisfactory: because the vptr
is offset past the offset-to-top and RTTI pointer entries, the two addresses
differ by 0x10 bytes.

Not wanting to just offset by a hardcoded amount, I tried to adjust for this by
getting the DECL_INITIAL of the trgt_vtbl_decl and then taking the
declaration at index 2 from the CONSTRUCTOR_ELTS, but this resulted in strange
values.
Does anyone know how I can account for this offset using the C++ front end
API, or is just hardcoding the offset to be 2 pointers fine too?
I am hoping to get this merged, so I want to follow proper convention. Any help
is very much appreciated, thank you.



Re: Offset vtable address

2025-07-07 Thread Thomas de Bock via Gcc
Managed to get it working with:


  tree index = build_int_cst (NULL_TREE,
                              -2 * TARGET_VTABLE_DATA_ENTRY_DISTANCE);
  tree src_vptr = build_address(build_vtbl_ref(src_obj, index));
  tree trgt_vtbl_decl = get_vtable_decl(target_type, 0);
  tree trgt_vtbl_addr = build_address(trgt_vtbl_decl);
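
For anyone unfamiliar with the layout this relies on: under the Itanium C++
ABI, a primary vtable starts with the offset-to-top and RTTI entries, and an
object's vptr points just past them, which is where the two-entry adjustment
comes from.  Below is a standalone sketch that shows this from user code.  It
is not front-end code; Base, read_vptr, offset_to_top and rtti_slot are
illustrative names, reading through the vptr like this is ABI-specific and not
portable C++, and it assumes RTTI is enabled and that data entries are one
pointer wide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <typeinfo>

struct Base {
    virtual ~Base() = default;
    virtual int id() const { return 1; }
};

// Assumption: vtable data entries are pointer-sized on this target.
static_assert(sizeof(std::ptrdiff_t) == sizeof(void*),
              "sketch assumes pointer-sized vtable entries");

// Read the vptr stored at the start of a polymorphic object.
inline void* const* read_vptr(const Base& obj) {
    void* const* vptr;
    std::memcpy(&vptr, &obj, sizeof vptr);
    assert(vptr != nullptr);
    return vptr;
}

// Entry two slots before the address point: offset-to-top.
inline std::ptrdiff_t offset_to_top(const Base& obj) {
    std::ptrdiff_t off;
    std::memcpy(&off, read_vptr(obj) - 2, sizeof off);
    return off;
}

// Entry one slot before the address point: pointer to the type_info.
inline const std::type_info* rtti_slot(const Base& obj) {
    const std::type_info* ti;
    std::memcpy(&ti, read_vptr(obj) - 1, sizeof ti);
    return ti;
}
```

For a complete object of type Base, offset_to_top should be 0 and rtti_slot
should match &typeid(Base), so the start of the vtable object sits two entries
below the value stored in the vptr.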





Re: Switching x86-64 to GNU2 TLS descriptors

2025-07-07 Thread Adhemerval Zanella Netto via Gcc



On 07/07/25 05:37, Florian Weimer via Gcc wrote:

Re: Switching x86-64 to GNU2 TLS descriptors

2025-07-07 Thread Richard Biener via Gcc
On Mon, Jul 7, 2025 at 10:50 AM Richard Biener
 wrote: