On Tue, Jul 20, 2021 at 4:54 PM Florian Weimer via Gcc <gcc@gcc.gnu.org> wrote:
>
> Currently, the GNU/Linux ABI does not really specify whether the thread
> pointer (the address of the TCB) may change at a function boundary.
>
> Traditionally, GCC assumes that the ABI allows caching addresses of
> thread-local variables across function calls. Such caching varies in
> aggressiveness between targets, probably due to differences in the
> choice of -mtls-dialect=gnu and -mtls-dialect=gnu2 as the default for
> the targets. (Caching with -mtls-dialect=gnu2 appears to be more
> aggressive.)
>
> In addition to that, glibc defines errno as this:
>
>   extern int *__errno_location (void) __attribute__ ((__const__));
>   #define errno (*__errno_location ())
>
> And the const attribute has the side effect of caching the address of
> errno within the same stack frame.
>
> With stackful coroutines, such address caching is only valid if
> coroutines are only ever resumed on the same thread on which they were
> suspended. (The C++ coroutine implementation is not stackful and is not
> affected by this at the ABI level.) Historically, I think we took the
> position that cross-thread resumption is undefined. But the ABIs aren't
> crystal-clear on this matter.
>
> One important piece of software for GNU is QEMU (not just for GNU/Linux;
> Hurd development also benefits from virtualization). QEMU uses stackful
> coroutines extensively. There are some hard-to-change code areas where
> resumption happens across threads, unfortunately. These increasingly
> cause problems with more inlining, inter-procedural analysis, and a
> general push towards LTO (which is also needed for some security
> hardening features).
>
> Should the GNU toolchain offer something to help out the QEMU
> developers? Maybe GCC could offer an option to disable the caching for
> all TLS models. glibc could detect that mode based on a new
> preprocessor macro and adjust its __errno_location declaration and
> similar function declarations. There will be a performance impact of
> this, of course, but it would make the QEMU usage well-defined (at the
> lowest levels).
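The errno case is easy to demonstrate: because the const attribute
promises a result that depends only on the arguments (here, none), GCC
is free to fold repeated calls within one frame into a single call.
A minimal sketch (illustration only, not glibc code; the function name
set_errno_twice is made up):

  extern int *__errno_location (void) __attribute__ ((__const__));
  #define errno (*__errno_location ())

  void
  set_errno_twice (void)
  {
    errno = 0;   /* __errno_location called here ...               */
    errno = 22;  /* ... and its cached result may be reused here,
                    which is wrong if the stack frame has migrated
                    to another thread in between.                  */
  }
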
But how does TLS usage transfer between threads? On the GIMPLE level the
TLS pointer is not visible, and thus we'd happily CSE its address:

__thread int x[2];
void bar (int *);

int *foo(int i)
{
  int *p = &x[i];
  bar (p);
  return &x[i];
}

results in

int * foo (int i)
{
  int * p;
  sizetype _5;
  sizetype _6;

  <bb 2> [local count: 1073741824]:
  _5 = (sizetype) i_1(D);
  _6 = _5 * 4;
  p_2 = &x + _6;
  bar (p_2);
  return p_2;

}

To make this work as expected one would need to expose the TLS pointer
access.

> If this is a programming model that should be supported, then restoring
> some of the optimizations would be possible, by annotating
> context-switching functions and TLS-address-dependent functions. But I
> think QEMU would immediately benefit from just the simple approach that
> disables address caching of TLS variables.
>
> Thanks,
> Florian
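FWIW, until such an option exists, the CSE shown above can also be
defeated at the source level by routing the address computation through
a function the optimizer must treat as opaque. A rough sketch (the
helper name addr_of_x is hypothetical; it relies on the GCC noipa
attribute, available since GCC 8):

__thread int x[2];

/* Every call really recomputes &x[i] from the executing thread's
   thread pointer: noipa disables inlining and IPA analysis, so the
   call is a black box and the compiler cannot assume the result is
   invariant across it.  */
__attribute__ ((noipa)) int *
addr_of_x (int i)
{
  return &x[i];
}

/* Callers still must re-fetch the address after every suspension
   point; a pointer obtained before a possible thread switch is
   stale by construction.  */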