On Tue, 18 Nov 2025 11:02:32 -0800 (PST)
Jeremy Drake wrote:

> On Tue, 18 Nov 2025, Takashi Yano via Cygwin wrote:
> 
> > Hi Jeremy,
> >
> > On Mon, 17 Nov 2025 11:16:41 -0800 (PST)
> > Jeremy Drake wrote:
> > > On Sat, 8 Nov 2025, Takashi Yano via Cygwin wrote:
> > >
> > > > I looked into the problem, and found that the executable for
> > > > the following code registers two pthread_keys with each destructor;
> > > > one is void emutls_destroy(void *ptr) in libgcc/emutls.c, and the
> > > > other is void run(void *p) in libstdc++-v3/libsupc++/atexit_thread.cc.
> > > > emutls_destroy() free's the memory erea of static thread_local X,
> > > > that is accessed from X::~X() which is called from run(). As a result,
> > > > if the emutls_destroy() is called before run(), run() referres to
> > > > the memory erea already free'ed.
> > > >
> > > > I think this is a bug of gcc. This issue does not occur in Linux,
> > > > because Linux does not use emutls.
> > >
> > >
> > > There is a similar longstanding issue in mingw-w64.  The problem there is
> > > that the pthread_key destructors run before the native Windows TLS
> > > callbacks.  emutls still uses pthread_key to manage static thread_locals,
> > > but C++ destructors are called from the Windows TLS callbacks (by way of
> > > __cxa_thread_atexit if memory serves).
> >
> > Thanks for the information. When I compile my reproducer with mingw
> > compiler, the issue does not seem to happen. How does mingw handle
> > this issue?
> 
> I remember working on this a while back, and adjusting the order that
> destructors are called to try to make it as correct as I could, but this
> last scenario was not fixable in the existing model.  LIU Hao actually
> made a new thread model for Win32/GCC largely to get all the destructors
> to run in the standards-compliant order.  Perhaps he can shed some light
> on what is supposed to happen here from the C/C++ standard side?
> 
> >
> > > Cygwin should have it comparatively easy: it controls all the pieces (it
> > > doesn't need to care about when Windows TLS callbacks happen because if
> > > somebody calls ExitThread they get the undefined behavior they deserve).
> > > Couldn't Cygwin simply provide its own __cxa_thread_atexit and ensure
> > > destructors registered there run before pthread_key destructors?
> >
> > It is not difficult to add a workaround for this issue in cygwin side.
> > However, IIRC, BSD libc does the same with cygwin 3.7.0-dev. I don't
> > think it is good idea to add workaround to cygwin for a bug of apps
> > on cygwin.
> >
> > > Regardless, is it really undefined in what order pthread_key destructors
> > > run?  I would expect they'd run in reverse order of registration (most
> > > recently registered first).  Wouldn't that prevent this issue too
> > > (without mucking about with the Itanium C++ ABI)?
> >
> > https://pubs.opengroup.org/onlinepubs/9799919799/ says:
> > "The order of destructor calls is unspecified if more than one destructor
> >  exists for a thread when it exits."
> >
> > As you expected, the reverse-order'ed destructor-call hides the issue.
> > (That is what 3.6.5 does.)
> 
> This sounds like pthread_key destructors are not fit for purpose for
> running C++ destructors then, unless possibly used to register a single
> "meta-destructor" that runs the destructors in the proper order...  I
> think Cygwin would be better served with a different __cxa_thread_atexit
> implementation since the order of destructor calls is significant to the
> C++ standard.  Then it would be a matter of running those *before* the
> pthread_key destructors.

This is the issue of gcc, so I think implementing __cxa_thread_atexit in
cygwin is not the right thing to do.

I tried to modify gcc to use "a single meta-destructor" in libgcc/emutls.c
and libstdc++/libsupc++/atexit_thread.cc. Please have a look at the attached
patch for gcc. I confirmed that the patch solves the issue and my check code
works as expected.

I think the patch is not smart enough to apply to upstream, so any comments
and suggestions would be appreciated.

-- 
Takashi Yano <[email protected]>
--- origsrc/gcc-13.4.0/libgcc/emutls.c  2025-06-06 01:03:02.000000000 +0900
+++ src/gcc-13.4.0/libgcc/emutls.c      2025-12-02 11:57:20.180683200 +0900
@@ -50,6 +50,14 @@ struct __emutls_array
   void **data[];
 };
 
+struct __emutls_thread_specific
+{
+  void *emutls_array_ptr;
+  void *atexit_thread_specific_ptr;
+};
+
+static void (*atexit_thread_fn)(void *);
+
 /* EMUTLS_ATTR is provided to allow targets to build the emulated tls
    routines as weak definitions, for example.
    If there is no definition, fall back to the default.  */
@@ -62,6 +70,13 @@ void *__emutls_get_address (struct __emu
 EMUTLS_ATTR
 void __emutls_register_common (struct __emutls_object *, word, word, void *);
 
+EMUTLS_ATTR
+void __emutls_atexit_thread_init (void (*)(void *));
+EMUTLS_ATTR
+void *__emutls_atexit_thread_getspecific (void);
+EMUTLS_ATTR
+void __emutls_atexit_thread_setspecific (void *);
+
 #ifdef __GTHREADS
 #ifdef __GTHREAD_MUTEX_INIT
 static __gthread_mutex_t emutls_mutex = __GTHREAD_MUTEX_INIT;
@@ -74,14 +89,24 @@ static pointer emutls_size;
 static void
 emutls_destroy (void *ptr)
 {
-  struct __emutls_array *arr = ptr;
-  pointer size = arr->size;
-  pointer i;
+  struct __emutls_thread_specific *dat = ptr;
+  struct __emutls_array *arr = dat->emutls_array_ptr;
+
+  if (atexit_thread_fn && dat->atexit_thread_specific_ptr)
+    atexit_thread_fn (dat->atexit_thread_specific_ptr);
 
-  for (i = 0; i < size; ++i)
+  if (arr)
     {
-      if (arr->data[i])
-       free (arr->data[i][-1]);
+      pointer size = arr->size;
+      pointer i;
+
+      for (i = 0; i < size; ++i)
+       {
+         if (arr->data[i])
+           free (arr->data[i][-1]);
+       }
+
+      free (arr);
     }
 
   free (ptr);
@@ -132,6 +157,60 @@ emutls_alloc (struct __emutls_object *ob
   return ret;
 }
 
+static void
+emutls_init_once (void)
+{
+  static __gthread_once_t once = __GTHREAD_ONCE_INIT;
+  __gthread_once (&once, emutls_init);
+}
+
+EMUTLS_ATTR void
+__emutls_atexit_thread_init (void (*fn)(void *))
+{
+  emutls_init_once ();
+  atexit_thread_fn = fn;
+}
+
+static void *
+emutls_get_thread_specific (void)
+{
+  void *dat = __gthread_getspecific (emutls_key);
+  if (__builtin_expect (dat == NULL, 0))
+    {
+      dat = calloc (1, sizeof (struct __emutls_thread_specific));
+      __gthread_setspecific (emutls_key, (void *) dat);
+    }
+  return dat;
+}
+
+EMUTLS_ATTR void *
+__emutls_atexit_thread_getspecific (void)
+{
+  struct __emutls_thread_specific *dat = emutls_get_thread_specific ();
+  return dat->atexit_thread_specific_ptr;
+}
+
+EMUTLS_ATTR void
+__emutls_atexit_thread_setspecific (void *ptr)
+{
+  struct __emutls_thread_specific *dat = emutls_get_thread_specific ();
+  dat->atexit_thread_specific_ptr = ptr;
+}
+
+static void *
+emutls_getspecific (void)
+{
+  struct __emutls_thread_specific *dat = emutls_get_thread_specific ();
+  return dat->emutls_array_ptr;
+}
+
+static void
+emutls_setspecific (void *ptr)
+{
+  struct __emutls_thread_specific *dat = emutls_get_thread_specific ();
+  dat->emutls_array_ptr = ptr;
+}
+
 /* Despite applying the attribute to the declaration, in this case the mis-
    match between the builtin's declaration [void * (*)(void *)] and the
    implementation here, causes the decl. attributes to be discarded.  */
@@ -153,8 +232,7 @@ __emutls_get_address (struct __emutls_ob
 
   if (__builtin_expect (offset == 0, 0))
     {
-      static __gthread_once_t once = __GTHREAD_ONCE_INIT;
-      __gthread_once (&once, emutls_init);
+      emutls_init_once ();
       __gthread_mutex_lock (&emutls_mutex);
       offset = obj->loc.offset;
       if (offset == 0)
@@ -165,7 +243,7 @@ __emutls_get_address (struct __emutls_ob
       __gthread_mutex_unlock (&emutls_mutex);
     }
 
-  struct __emutls_array *arr = __gthread_getspecific (emutls_key);
+  struct __emutls_array *arr = emutls_getspecific ();
   if (__builtin_expect (arr == NULL, 0))
     {
       pointer size = offset + 32;
@@ -173,7 +251,7 @@ __emutls_get_address (struct __emutls_ob
       if (arr == NULL)
        abort ();
       arr->size = size;
-      __gthread_setspecific (emutls_key, (void *) arr);
+      emutls_setspecific ((void *) arr);
     }
   else if (__builtin_expect (offset > arr->size, 0))
     {
@@ -187,7 +265,7 @@ __emutls_get_address (struct __emutls_ob
       arr->size = size;
       memset (arr->data + orig_size, 0,
              (size - orig_size) * sizeof (void *));
-      __gthread_setspecific (emutls_key, (void *) arr);
+      emutls_setspecific ((void *) arr);
     }
 
   void *ret = arr->data[offset - 1];
--- origsrc/gcc-13.4.0/libgcc/libgcc-std.ver.in 2025-06-06 01:03:02.000000000 
+0900
+++ src/gcc-13.4.0/libgcc/libgcc-std.ver.in     2025-12-02 11:57:20.182683000 
+0900
@@ -302,6 +302,9 @@ GCC_4.3.0 {
 
   __emutls_get_address
   __emutls_register_common
+  __emutls_atexit_thread_init
+  __emutls_atexit_thread_getspecific
+  __emutls_atexit_thread_setspecific
   __PFX__ffssi2
   __PFX__extendxftf2
   __PFX__trunctfxf2
--- origsrc/gcc-13.4.0/libstdc++-v3/libsupc++/atexit_thread.cc  2025-06-06 
01:03:03.000000000 +0900
+++ src/gcc-13.4.0/libstdc++-v3/libsupc++/atexit_thread.cc      2025-12-02 
11:57:20.184683500 +0900
@@ -71,6 +71,11 @@ __cxxabiv1::__cxa_thread_atexit (void (_
 
 #else /* _GLIBCXX_HAVE___CXA_THREAD_ATEXIT_IMPL */
 
+extern "C" void __emutls_atexit_thread_init (void (*)(void *));
+extern "C" void *__emutls_atexit_thread_getspecific (void);
+extern "C" void __emutls_atexit_thread_setspecific (void *);
+#define __emutls_atexit_thread_delete() __emutls_atexit_thread_init (NULL);
+
 namespace {
   // One element in a singly-linked stack of cleanups.
   struct elt
@@ -83,8 +88,6 @@ namespace {
 #endif
   };
 
-  // Keep a per-thread list of cleanups in gthread_key storage.
-  __gthread_key_t key;
   // But also support non-threaded mode.
   elt *single_thread;
 
@@ -112,8 +115,8 @@ namespace {
     void *e;
     if (__gthread_active_p ())
       {
-       e = __gthread_getspecific (key);
-       __gthread_setspecific (key, NULL);
+       e = __emutls_atexit_thread_getspecific ();
+       __emutls_atexit_thread_setspecific (NULL);
       }
     else
       {
@@ -127,8 +130,8 @@ namespace {
   // key init/delete rather than atexit so that delete is run on dlclose.
   void key_init() {
     struct key_s {
-      key_s() { __gthread_key_create (&key, run); }
-      ~key_s() { __gthread_key_delete (key); }
+      key_s() { __emutls_atexit_thread_init (run); }
+      ~key_s() { __emutls_atexit_thread_delete (); }
     };
     static key_s ks;
     // Also make sure the destructors are run by std::exit.
@@ -163,7 +166,7 @@ __cxxabiv1::__cxa_thread_atexit (void (_
 
   elt *first;
   if (__gthread_active_p ())
-    first = static_cast<elt*>(__gthread_getspecific (key));
+    first = static_cast<elt*>(__emutls_atexit_thread_getspecific ());
   else
     first = single_thread;
 
@@ -183,7 +186,7 @@ __cxxabiv1::__cxa_thread_atexit (void (_
 #endif
 
   if (__gthread_active_p ())
-    __gthread_setspecific (key, new_elt);
+    __emutls_atexit_thread_setspecific (new_elt);
   else
     single_thread = new_elt;
 
-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple

Reply via email to