Re: New TLS usage in libgcc_s.so.1, compatibility impact

2024-01-15 Thread Adhemerval Zanella Netto via Gcc



On 15/01/24 09:46, Szabolcs Nagy wrote:
> The 01/13/2024 13:49, Florian Weimer wrote:
>> This commit
>>
>> commit 8abddb187b33480d8827f44ec655f45734a1749d
>> Author: Andrew Burgess 
>> Date:   Sat Aug 5 14:31:06 2023 +0200
>>
>> libgcc: support heap-based trampolines
>> 
>> Add support for heap-based trampolines on x86_64-linux, aarch64-linux,
>> and x86_64-darwin. Implement the __builtin_nested_func_ptr_created and
>> __builtin_nested_func_ptr_deleted functions for these targets.
>> 
>> Co-Authored-By: Maxim Blinov 
>> Co-Authored-By: Iain Sandoe 
>> Co-Authored-By: Francois-Xavier Coudert 
>>
>> added TLS usage to libgcc_s.so.1.  The way that libgcc_s is currently
>> built, it ends up using a dynamic TLS variant on the Linux targets.
>> This means that there is no up-front TLS allocation with glibc (but
>> there would be one with musl).
>>
>> There is still a compatibility impact because glibc assigns a TLS module
>> ID upfront.  This seems to be what causes the
>> ust/libc-wrapper/test_libc-wrapper test in lttng-tools to fail.  We end
>> up with an infinite regress during process termination because
>> libgcc_s.so.1 has been loaded, resulting in a DTV update.  When this
>> happens, the bottom of the stack looks like this:
>>
>> #4447 0x77f288f0 in free () from 
>> /lib64/liblttng-ust-libc-wrapper.so.1
>> #4448 0x77fdb142 in free (ptr=)
>> at ../include/rtld-malloc.h:50
>> #4449 _dl_update_slotinfo (req_modid=3, new_gen=2) at ../elf/dl-tls.c:822
>> #4450 0x77fdb214 in update_get_addr (ti=0x77f2bfc0, 
>> gen=) at ../elf/dl-tls.c:916
>> #4451 0x77fddccc in __tls_get_addr ()
>> at ../sysdeps/x86_64/tls_get_addr.S:55
>> #4452 0x77f288f0 in free () from 
>> /lib64/liblttng-ust-libc-wrapper.so.1
>> #4453 0x77fdb142 in free (ptr=)
>> at ../include/rtld-malloc.h:50
>> #4454 _dl_update_slotinfo (req_modid=2, new_gen=2) at ../elf/dl-tls.c:822
>> #4455 0x77fdb214 in update_get_addr (ti=0x77f39fa0, 
>> gen=) at ../elf/dl-tls.c:916
>> #4456 0x77fddccc in __tls_get_addr ()
>> at ../sysdeps/x86_64/tls_get_addr.S:55
>> #4457 0x77f36113 in lttng_ust_cancelstate_disable_push ()
>>    from /lib64/liblttng-ust-common.so.1
>> #4458 0x77f4c2e8 in ust_lock_nocheck () from /lib64/liblttng-ust.so.1
>> #4459 0x77f5175a in lttng_ust_cleanup () from 
>> /lib64/liblttng-ust.so.1
>> #4460 0x77fca0f2 in _dl_call_fini (
>> closure_map=closure_map@entry=0x77fbe000) at dl-call_fini.c:43
>> #4461 0x77fce06e in _dl_fini () at dl-fini.c:114
>> #4462 0x77d82fe6 in __run_exit_handlers () from /lib64/libc.so.6
>>
>> Cc:ing  for awareness.
>>
>> The issue also requires a recent glibc with changes to DTV management:
>> commit d2123d68275acc0f061e73d5f86ca504e0d5a344 ("elf: Fix slow tls
>> access after dlopen [BZ #19924]").  If I understand things correctly,
>> before this glibc change, we didn't deallocate the old DTV, so there was
>> no call to the free function.
> 
> with 19924 fixed, after a dlopen or dlclose every thread updates
> its dtv on the next dynamic tls access.
> 
> before that, dtv was only updated up to the generation of the
> module being accessed for a particular tls access.
> 
> so hitting the free in the dtv update path is now more likely
> but the free is not new, it was there before.
> 
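Szabolcs's contrast between the two DTV update strategies can be shown with a deliberately simplified toy model. This is a sketch, not glibc's code. The names `slot_gen`, `global_gen`, `toy_dtv`, `toy_module_changed` and `toy_update` are hypothetical, dlopen/dlclose are reduced to bumping a generation counter, and the model only counts how often the update path calls `free`. Following the thread, it calls `free` for every changed slot even when the entry is NULL, as the unpatched loader does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model (hypothetical names, NOT glibc's data structures) of how a
   thread's DTV catches up after dlopen/dlclose.  */
#define NSLOTS 4

static size_t slot_gen[NSLOTS]; /* generation at which each slot last changed */
static size_t global_gen;       /* bumped by every dlopen/dlclose */

struct toy_dtv
{
  size_t gen;                   /* generation this thread's DTV reflects */
  void *block[NSLOTS];          /* per-module TLS blocks, NULL if unallocated */
  unsigned free_calls;          /* how often the update path called free */
};

/* dlopen or dlclose affecting SLOT: only global bookkeeping changes;
   threads notice on their next dynamic TLS access.  */
static void
toy_module_changed (int slot)
{
  assert (slot >= 0 && slot < NSLOTS);
  slot_gen[slot] = ++global_gen;
}

/* Bring DTV up to NEW_GEN.  free is called for every slot that changed
   in (dtv->gen, new_gen], even if the entry is NULL: free (NULL) does
   nothing, but an interposed free still runs.  */
static void
toy_update (struct toy_dtv *dtv, size_t new_gen)
{
  for (int i = 0; i < NSLOTS; i++)
    if (slot_gen[i] > dtv->gen && slot_gen[i] <= new_gen)
      {
        free (dtv->block[i]);
        dtv->block[i] = NULL;
        dtv->free_calls++;
      }
  dtv->gen = new_gen;
}
```

With one module changed at generation 1 and another (for example a freshly loaded libgcc_s.so.1) at generation 2, a thread that only touches the first module's TLS calls `free` once when it updates just to that module's generation (the pre-19924 behavior), but twice when every update goes to the global generation.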
> also note that this is unlikely to happen on aarch64 since
> tlsdesc only does dynamic tls access after a 512-byte static
> tls reservation runs out.
> 
>>
>> On the glibc side, we should recommend that intercepting mallocs and their
>> dependencies use initial-exec TLS because that kind of TLS does not use
>> malloc.  If intercepting mallocs using dynamic TLS work at all, that's
>> totally by accident, and was in the past helped by glibc bug 19924.  (I
> 
> right.
> 
>> don't think there is anything special about libgcc_s.so.1 that triggers
>> the test failure above, it is just an object with dynamic TLS that is
>> implicitly loaded via dlopen at the right stage of the test.)  In this
>> particular case, we can also paper over the test failure in glibc by not
>> calling free at all, because the argument is a null pointer:
>>
>> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
>> index 7b3dd9ab60..14c71cbd06 100644
>> --- a/elf/dl-tls.c
>> +++ b/elf/dl-tls.c
>> @@ -819,7 +819,8 @@ _dl_update_slotinfo (unsigned long int req_modid, size_t new_gen)
>>                      dtv entry free it.  Note: this is not AS-safe.  */
>>                   /* XXX Ideally we will at some point create a memory
>>                      pool.  */
>> -                 free (dtv[modid].pointer.to_free);
>> +                 if (dtv[modid].pointer.to_free != NULL)
>> +                   free (dtv[modid].pointer.to_free);
>>                   dtv[modid].pointer.val = TLS_DTV_UNALLOCATED;
>>                   dtv[modid].pointer.to_free = NULL;
> 
> can be done, but !=NULL is more likely since we do modid reuse
> after dlclose.
> 
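For the recommendation quoted above that intercepting mallocs use initial-exec TLS, here is a minimal sketch of the usual pattern: a per-thread recursion guard in a malloc wrapper. The names `wrapper_depth`, `wrapper_enter` and `wrapper_leave` are hypothetical; the `tls_model ("initial-exec")` attribute is GCC's documented variable attribute.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-thread recursion depth for a hypothetical malloc wrapper.  The
   initial-exec model places the variable in static TLS, so an access is
   at most a GOT load plus a thread-pointer-relative load: there is no
   __tls_get_addr call and thus no path back into malloc or free.
   Caveat: a shared object using initial-exec TLS can fail to dlopen
   once the static TLS surplus is exhausted.  */
static __thread unsigned wrapper_depth
  __attribute__ ((tls_model ("initial-exec")));

/* Returns true for the outermost wrapper call on this thread, where it
   is safe to run tracing code that may itself allocate.  */
static bool
wrapper_enter (void)
{
  return wrapper_depth++ == 0;
}

static void
wrapper_leave (void)
{
  assert (wrapper_depth > 0);
  wrapper_depth--;
}
```

A wrapped `free` would call `wrapper_enter`, do its tracing only when that returns true, forward to the real `free`, and then call `wrapper_leave`.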
> there is also

Re: Switching x86-64 to GNU2 TLS descriptors

2025-07-07 Thread Adhemerval Zanella Netto via Gcc



On 07/07/25 05:37, Florian Weimer via Gcc wrote:
> H.J. proposed to switch the default for GCC 16 (turning on
> -mtls-dialect=gnu2 by default).  This is a bit tricky because when we
> tried to make the switch in Fedora (for eventual implementation), we hit
> an ABI compatibility problem:
> 
>   _dl_tlsdesc_dynamic doesn't preserve all caller-saved registers
>   
> 
> This means that changing the defaults can have backwards compatibility
> impact with older distributions.
> 
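To make the proposed default switch concrete: the C source does not change, only the code GCC emits for each dynamic TLS access. The following is a minimal sketch assuming the file is built into a shared object. `request_count` and `bump_request_count` are hypothetical names, and the register-clobber statement reflects GCC's TLS descriptor calling convention as I understand it.

```c
/* Hypothetical example; build it as part of a shared object:
     gcc -O2 -fPIC -shared -mtls-dialect=gnu  ...  (calls __tls_get_addr)
     gcc -O2 -fPIC -shared -mtls-dialect=gnu2 ...  (calls through the TLS
                                                    descriptor, var@TLSCALL)
   In the gnu2 form, GCC treats the descriptor call as clobbering only
   %rax and the flags, so a resolver such as _dl_tlsdesc_dynamic must
   preserve every other register, including caller-saved ones.  */
__thread int request_count;

int
bump_request_count (void)
{
  /* Default-visibility TLS in a PIC object uses a dynamic access model.  */
  return ++request_count;
}
```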
> (a) Do nothing special and switch the default.  Maybe try to
> backport the glibc fix to more release branches and distributions.  I
> think we implicitly decided to follow this path when we decided this was
> a glibc bug and not a GCC bug.  The downside is that missing the bug fix
> can result in unexpected, difficult-to-diagnose behavior.  However, when
> we rebuilt Fedora, the problem was exceedingly rare (we observed one
> single failure, if I recall correctly).
> 
> (b) Introduce binary markup to indicate that binaries may need the glibc
> fix, and that glibc has the fix.
> 
>   [PATCH] x86-64: Add GLIBC_ABI_GNU2_TLS [BZ #33129]
>   
> 
> 
> This requires changes to all linkers, GCC and glibc.
> 
> (c) Introduce a new relocation type with the same behavior as
> R_X86_64_TLSDESC.  Unpatched glibc will not support it and error out
> during relocation processing.  Requires linker changes, GCC and glibc
> changes.  Does not produce a nice error message, unlike the
> GLIBC_ABI_GNU2_TLS change.  Ideally would need package manager changes
> to produce the right dependencies (with GLIBC_ABI_GNU2_TLS, this could
> happen automatically).
> 
> (d) Make the GCC default conditional on the glibc version used at GCC
> build time.  Add __memcmpeq support to GCC 16.  Maybe add
> errno@@GLIBC_2.43 to glibc 2.43.  Even today, it is likely that binaries
> contain at least one symbol version reference to something that is
> relatively recent, and the __memcmpeq and errno changes would increase
> this effect.  Combined with the backport mentioned under (a), that could
> be enough to force glibc upgrades in pretty much all cases.  We have
> __libc_start_main@@GLIBC_2.34, so if the glibc backports go back to 2.34
> (or even 2.31), only shared objects suffer from this issue.  Among the
> Fedora binaries, the outliers without dependencies on recent glibc are
> mostly Perl modules, and I expect the errno and __memcmpeq changes would cover
> at least some of these.  This is not as clean as (b) and (c), but only
> needs glibc and GCC changes (for __memcmpeq).  It does not achieve 100%
> bug prevention, but given that bugs seem to be rare, this may be good
> enough.
> 
> (e) Skip over GNU2 TLS altogether and implement inline TLS sequences
> (GNU3 descriptors?) that do not have the dlopen incompatibility of
> initial-exec TLS.  This is currently vaporware.  It requires nontrivial
> glibc changes, GCC changes, linker changes, and x86-64 psABI work to
> define new relocation types and perhaps relaxations.  This is probably
> what we want long-term.  User experience is similar to (c), but with
> more implementation sequences.
> 
> For comparison with an initial-exec TLS read,
> 
>   movq    threadvar@gottpoff(%rip), %rax
>   movl    %fs:(%rax), %eax
> 
> this could look like this:
> 
>   movl    threadvar@gottpslot, %eax
>   movq    %fs:(%rax), %rax
>   movl    threadvar@gottlsslotoff, %ecx
>   movl    (%rcx, %rax), %eax
> 
> Or with the descriptor in one word:
> 
>   movq    threadvar@gottpslotoff, %rax
>   movq    %rax, %rdx
>   movq    %fs:(%eax), %rax
>   shrq    $32, %rdx
>   movl    (%rax, %rdx), %eax
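Read as C, the one-word sequence treats the descriptor as two packed 32-bit fields: the low half is the thread-pointer offset of a slot holding a TLS block's base address, and the high half is the variable's offset within that block. The sketch below is a hypothetical model of this proposed (not yet existing) sequence. The thread pointer is passed as an ordinary pointer instead of `%fs`, and `make_desc_one_word`/`tls_addr_one_word` are invented names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor layout (a proposal only; no ABI defines it):
   bits 0..31 = offset from the thread pointer of a slot that stores the
   base address of a TLS block; bits 32..63 = offset within that block.  */
static uint64_t
make_desc_one_word (uint32_t slot_off, uint32_t var_off)
{
  return ((uint64_t) var_off << 32) | slot_off;
}

/* TP stands in for the %fs base.  */
static const unsigned char *
tls_addr_one_word (const unsigned char *tp, uint64_t desc)
{
  uint32_t slot_off = (uint32_t) desc;          /* %eax in %fs:(%eax) */
  uint32_t var_off = (uint32_t) (desc >> 32);   /* shrq $32, %rdx */
  const unsigned char *block;
  memcpy (&block, tp + slot_off, sizeof block); /* movq %fs:(%eax), %rax */
  return block + var_off;                       /* address in (%rax, %rdx) */
}
```

Using `%eax` as the address register in the assembly is what discards the high half when forming the slot address; the C cast to `uint32_t` does the same.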
> 
> Or with a bit shorter instruction, using a 32-bit descriptor (which
> still could cover at least 3 GiB of TLS data per thread):
> 
>   movl    threadvar@gottpslotoff, %eax
>   movzbl  %al, %edx
>   shr     $8, %eax
>   movq    %fs:64(%edx), %rdx
>   mov     (%rdx, %rax), %eax
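The 32-bit descriptor sequence splits the descriptor differently. The low byte becomes a byte displacement added to `%fs:64` to locate the slot, and the remaining 24 bits are the offset within the slot's block, so one block can be addressed up to 16 MiB - 1 bytes. The following is a hypothetical C model of that proposal, with invented names (`make_desc_32`, `tls_addr_32`) and an ordinary pointer standing in for `%fs`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit layout read off the assembly: bits 0..7 are a
   byte displacement from %fs:64 to the slot, bits 8..31 are the offset
   of the variable within the block that the slot points to.  */
static uint32_t
make_desc_32 (uint32_t slot_disp, uint32_t var_off)
{
  assert (slot_disp <= 0xff);     /* must fit in %al */
  assert (var_off <= 0xffffff);   /* what survives shr $8 */
  return (var_off << 8) | slot_disp;
}

/* TP stands in for the %fs base.  */
static const unsigned char *
tls_addr_32 (const unsigned char *tp, uint32_t desc)
{
  uint32_t slot_disp = desc & 0xff;  /* movzbl %al, %edx */
  uint32_t var_off = desc >> 8;      /* shr $8, %eax */
  const unsigned char *block;
  memcpy (&block, tp + 64 + slot_disp, sizeof block); /* movq %fs:64(%edx), %rdx */
  return block + var_off;            /* address in (%rdx, %rax) */
}
```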
> 
> And if we want a negative TLS slot index (which glibc would not use, and
> I think it's incompatible with local-exec TLS anyway):
> 
>   movq    threadvar@gottpslotoff, %rax
>   movslq  %eax, %rdx
>   shrq    $32, %rax
>   movq    %fs:(%rdx), %rdx
>   movl    (%rdx, %rax), %eax
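The negative-slot variant differs from the one-word form only in that `movslq` sign-extends the low half, so the slot may sit below the thread pointer. A hypothetical C model of that proposal follows. The names are invented, an ordinary pointer stands in for `%fs`, and, as noted above, glibc would not actually use negative slot offsets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: bits 0..31 are a *signed* slot offset from the
   thread pointer, bits 32..63 the offset within the slot's block.  */
static uint64_t
make_desc_signed (int32_t slot_off, uint32_t var_off)
{
  return ((uint64_t) var_off << 32) | (uint32_t) slot_off;
}

/* TP stands in for the %fs base.  */
static const unsigned char *
tls_addr_signed (const unsigned char *tp, uint64_t desc)
{
  /* Converting the low half back to int32_t is implementation-defined
     in ISO C for negative values; GCC wraps modulo 2^32, like movslq.  */
  int64_t slot_off = (int32_t) (uint32_t) desc; /* movslq %eax, %rdx */
  uint64_t var_off = desc >> 32;                /* shrq $32, %rax */
  const unsigned char *block;
  memcpy (&block, tp + slot_off, sizeof block); /* movq %fs:(%rdx), %rdx */
  return block + var_off;                       /* address in (%rdx, %rax) */
}
```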
> 
> There might be other variant sequences.
> 
> Implementing this on the glibc side would require fundamental changes to
> the TLS allocator, which is why this isn't straightforward.
> 
> (f) A less ambitious variant of (e): A new TLS descriptor callback that
> returns the address of the TLS variable, and not the offset from the
> thread pointer.  This is much easier to implement on the glibc side.
> The current GNU2 TLS descriptor callback is optimized for static TLS
> access.  We can avoid a memory access in the static TLS callback if we
> use the RDFSBASE instructi