Disabling TLS address caching to help QEMU on GNU/Linux
Currently, the GNU/Linux ABI does not really specify whether the thread
pointer (the address of the TCB) may change at a function boundary.

Traditionally, GCC assumes that the ABI allows caching addresses of
thread-local variables across function calls.  Such caching varies in
aggressiveness between targets, probably due to differences in the
choice of -mtls-dialect=gnu and -mtls-dialect=gnu2 as the default for
the targets.  (Caching with -mtls-dialect=gnu2 appears to be more
aggressive.)

In addition to that, glibc defines errno like this:

  extern int *__errno_location (void) __attribute__ ((__const__));
  #define errno (*__errno_location ())

And the const attribute has the side effect of caching the address of
errno within the same stack frame.

With stackful coroutines, such address caching is only valid if
coroutines are only ever resumed on the same thread on which they were
suspended.  (The C++ coroutine implementation is not stackful and is not
affected by this at the ABI level.)  Historically, I think we took the
position that cross-thread resumption is undefined.  But the ABIs aren't
crystal-clear on this matter.

One important piece of software for GNU is QEMU (not just for GNU/Linux;
Hurd development also benefits from virtualization).  QEMU uses stackful
coroutines extensively.  There are some hard-to-change code areas where
resumption unfortunately happens across threads.  These increasingly
cause problems with more inlining, inter-procedural analysis, and a
general push towards LTO (which is also needed for some security
hardening features).

Should the GNU toolchain offer something to help out the QEMU
developers?  Maybe GCC could offer an option to disable the caching for
all TLS models.  glibc could detect that mode based on a new
preprocessor macro and adjust its __errno_location declaration and
similar function declarations.  There will be a performance impact from
this, of course, but it would make the QEMU usage well-defined (at the
lowest levels).

If this is a programming model that should be supported, then restoring
some of the optimizations would be possible, by annotating
context-switching functions and TLS-address-dependent functions.  But I
think QEMU would immediately benefit from just the simple approach that
disables address caching of TLS variables.

Thanks,
Florian
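
[For illustration, a minimal sketch of the hazard described above.  Here
coroutine_yield is a hypothetical stand-in for a stackful-coroutine switch
such as QEMU's qemu_coroutine_yield; this is not the actual QEMU code.]

  /* Minimal sketch only; coroutine_yield is a hypothetical stand-in for a
     stackful-coroutine switch that may resume on a different thread.  */
  #include <errno.h>

  extern void coroutine_yield (void);

  int
  do_io (void)
  {
    /* __errno_location is declared __attribute__ ((const)), so the
       compiler may compute the address of errno once here ...  */
    errno = 0;

    /* Suspended on thread A, possibly resumed on thread B.  */
    coroutine_yield ();

    /* ... and reuse the cached address here, which still points into
       thread A's TCB rather than the resuming thread's.  */
    return errno;
  }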
Re: Disabling TLS address caching to help QEMU on GNU/Linux
Hi Florian,

This also affects fibres implementations (both C++ and D ones, at least
from discussion with both communities).

> On 20 Jul 2021, at 15:52, Florian Weimer via Gcc wrote:
>
> Currently, the GNU/Linux ABI does not really specify whether the thread
> pointer (the address of the TCB) may change at a function boundary.
>
> Traditionally, GCC assumes that the ABI allows caching addresses of
> thread-local variables across function calls.  Such caching varies in
> aggressiveness between targets, probably due to differences in the
> choice of -mtls-dialect=gnu and -mtls-dialect=gnu2 as the default for
> the targets.  (Caching with -mtls-dialect=gnu2 appears to be more
> aggressive.)
>
> In addition to that, glibc defines errno as this:
>
>   extern int *__errno_location (void) __attribute__ ((__const__));
>   #define errno (*__errno_location ())
>
> And the const attribute has the side effect of caching the address of
> errno within the same stack frame.
>
> With stackful coroutines, such address caching is only valid if
> coroutines are only ever resumed on the same thread on which they were
> suspended.  (The C++ coroutine implementation is not stackful and is not
> affected by this at the ABI level.)

There are C++20 coroutine library writers who want to switch threads in
symmetric transfers [I am not entirely convinced about this at present,
and it certainly would be suspect with TLS address caching enabled,
since a TLS pointer could equally be cached in the coroutine frame].

The C++20 coroutine ABI is silent on such matters (it only describes the
visible part of the coroutine frame and the builtins used by the std
library).

> Historically, I think we took the
> position that cross-thread resumption is undefined.  But the ABIs aren't
> crystal-clear on this matter.
>
> One important piece of software for GNU is QEMU (not just for GNU/Linux,
> Hurd development also benefits from virtualization).  QEMU uses stackful
> coroutines extensively.  There are some hard-to-change code areas where
> resumption happens across threads unfortunately.  These increasingly
> cause problems with more inlining, inter-procedural analysis, and a
> general push towards LTO (which is also needed for some security
> hardening features).
>
> Should the GNU toolchain offer something to help out the QEMU
> developers?  Maybe GCC could offer an option to disable the caching for
> all TLS models.  glibc could detect that mode based on a new
> preprocessor macro and adjust its __errno_location declaration and
> similar function declarations.  There will be a performance impact of
> this, of course, but it would make the QEMU usage well-defined (at the
> lowest levels).
>
> If this is a programming model that should be supported, then restoring
> some of the optimizations would be possible, by annotating
> context-switching functions and TLS-address-dependent functions.  But I
> think QEMU would immediately benefit from just the simple approach that
> disables address caching of TLS variables.

IMO the general cases you note above are enough reason to want some
mechanism to control this,

thanks
Iain

>
> Thanks,
> Florian
>
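
[For concreteness, a hypothetical C++20 sketch of the pattern mentioned
above: an awaiter that resumes the coroutine on a freshly created thread.
The "task" and "resume_on_new_thread" types are illustrative only and not
part of any real library; the point is that if the address of errno, or of
any thread_local, were cached in the coroutine frame across the co_await,
it would refer to the wrong thread after resumption.]

  // Hypothetical sketch only; neither "task" nor "resume_on_new_thread"
  // exists in any real library.
  #include <coroutine>
  #include <thread>
  #include <cerrno>

  struct task {
    struct promise_type {
      task get_return_object () { return {}; }
      std::suspend_never initial_suspend () { return {}; }
      std::suspend_never final_suspend () noexcept { return {}; }
      void return_void () {}
      void unhandled_exception () {}
    };
  };

  struct resume_on_new_thread {
    bool await_ready () { return false; }
    void await_suspend (std::coroutine_handle<> h) {
      // Resumption happens on a newly created thread.
      std::thread ([h] { h.resume (); }).detach ();
    }
    void await_resume () {}
  };

  task example ()
  {
    errno = 0;                        // runs on the caller's thread
    co_await resume_on_new_thread {};
    // From here on the coroutine runs on the new thread; an address of
    // errno cached in the frame before the co_await would now be stale.
    (void) errno;
  }

[Compiled with -std=c++20; a real program would also have to synchronize
with the detached thread before exiting.]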
Re: Question about PIC code and GOT
On Fri, 11 Jun 2021, Vincent Dupaquis wrote:

> I've got the feeling that the GOT is not convenient and goes in the
> opposite direction from the one we are trying to achieve with PIC; at
> least this is the reason why I'm trying to avoid it.
>
> Any clue on the reason why it has been implemented that way?

Without going into processor-specific details, which may imply
additional requirements, you need to have a place to store the final
load-time addresses of preemptible symbols, and the GOT serves exactly
that purpose.

The only case where you could possibly avoid the creation of a GOT with
PIC code, again barring any processor-specific requirements, is a
statically linked position-independent executable (PIE).

Late answer, but HTH.

  Maciej
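
[As an illustration of the point above; "shared_counter" is a made-up
symbol, and the relocation shown in the comment is the x86-64 form, which
will differ on other processors.]

  // Illustrative only.  Built with -fPIC, an access to a preemptible
  // global cannot use its link-time address directly, because another
  // object may interpose the symbol at run time.
  extern int shared_counter;    // defined elsewhere, interposable

  int read_counter ()
  {
    // The compiler therefore loads the symbol's load-time address from a
    // GOT slot (filled in by the dynamic linker) and dereferences it;
    // roughly, on x86-64:
    //   movq shared_counter@GOTPCREL(%rip), %rax
    //   movl (%rax), %eax
    return shared_counter;
  }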
Re: Making *-netbsd-* to mean ELF not a.out for all CPUs
On Fri, 11 Jun 2021, John Ericson wrote:

> I would like to propose that GNU tools consistently interpret configs
> with "netbsd" as meaning ELF as opposed to a.out.  Currently, newer CPUs
> do that, but older ones have "netbsd" mean a.out for historical reasons,
> and "netbsdelf" is used instead.  This inconsistency is a bit of a
> nuisance to my distro / package set[2], which aims to support cross
> compilation to/from arbitrary platforms without special cases.  Other
> platforms that formerly used a.out (like Linux) have long since changed
> the default to be ELF, so I don't know why NetBSD shouldn't too.

Have you verified that `config.guess' will DTRT to complement your
change when indeed natively run on an old a.out installation of NetBSD?

  Maciej