Hi, Anindya, did a good analysis of the problem with mpv using gpu video output backend (it is using EGL and mesa if I correctly followed).
For people not reading ports@ here a resume: the destructor function used in pthread_key_create() needs to be present in memory until _rthread_tls_destructors() is called. in the case of mesa, eglInitialize() function could load, via dlopen(), code which will use pthread_key_create() with destructor. once dlclose() is called, the object is unloaded from memory, but a reference to destructor is kept, leading to segfault when _rthread_tls_destructors() run and use the destructor (because pointing to unloaded code). We already take care of such situation with __cxa_thread_atexit_impl (in libc/stdlib/thread_atexit.c), by keeping an additionnal reference on object loaded (it makes ld.so aware that it is still used and so dlclose() doesn't unload it). I used the same idiom for pthread_key_create() and used dlctl(3) in the same way with the destructor address. With the following diff, I am not able to reproduce the segfault anymore with mpv. diff 393e7b397988bb6abe46729de1794883d2b9d5cf /home/semarie/repos/openbsd/src blob - 58b39bb7df70f54c708d2e2b11a3978806a86005 file + lib/libc/thread/rthread_tls.c --- lib/libc/thread/rthread_tls.c +++ lib/libc/thread/rthread_tls.c @@ -22,10 +22,8 @@ #include <errno.h> #include <pthread.h> #include <stdlib.h> +#include <dlfcn.h> -#include <pthread.h> -#include <stdlib.h> - #include "rthread.h" @@ -58,6 +56,9 @@ pthread_key_create(pthread_key_t *key, void (*destruct rkeys[hint].used = 1; rkeys[hint].destructor = destructor; + if (destructor != NULL) + dlctl(NULL, DL_REFERENCE, destructor); + *key = hint++; if (hint >= PTHREAD_KEYS_MAX) hint = 0; I am still unsure if we want to solve this problem this way, and if this diff would introduce more problems than it tries to solve. At first place, it might be better to not register a destructor when using dlopen()... Comments would be appreciated. Thanks. -- Sebastien Marie On Wed, May 05, 2021 at 01:20:02AM -0700, Anindya Mukherjee wrote: > Hi, > > I have been investigating the crash on exit problem with mpv in ports > with vo=gpu. I think I made a little bit of progress and thought I'd > share my findings. > > The crash (SIGSEGV) happens when thread local destructors > are called from /usr/src/lib/libc/thread/rthread_tls.c:182 in > _rthread_tls_destructors after the gpu thread exits: vo_thread in > video/out/vo.c:1067. The crashing call stack looks like this: > > #0 0x00000176ffdc9680 in ?? () > #1 0x0000017748d347b5 in _rthread_tls_destructors (thread=0x17798917840) at > /usr/src/lib/libc/thread/rthread_tls.c:182 > #2 0x0000017748d98623 in _libc_pthread_exit (retval=<error reading variable: > Unhandled dwarf expression opcode 0xa3>) at > /usr/src/lib/libc/thread/rthread.c:150 > #3 0x0000017795b22189 in _rthread_start (v=<error reading variable: > Unhandled dwarf expression opcode 0xa3>) at > /usr/src/lib/librthread/rthread.c:97 > #4 0x0000017748d0c5ba in __tfork_thread () at > /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84 > > Note that some of the traces were taken from different runs so there > might be some mismatch between the handles/addresses. > > It crashes because the destructor is dangling. This mystified me because > if I look at the mpv source, there is no thread local data for the gpu > thread. Indeed, right after the gpu thread starts running if we look > inside the thread structure, local_storage is null. However, if we look > at the same thread at the point of the crash, its local_storage is > populated: > > (gdb) p *(*thread).local_storage > $3 = { > keyid = 7, > next = 0x177353442e0, > data = 0x1770c276000 > } > > The keys are indexed by the keyid in the rkeys array, from where the > destructor is fetched in _rthread_tls_destructors: > > (gdb) p rkeys[7] > $6 = { > used = 1, > destructor = 0x43dd33e2680 > } > > This destructor now points to invalid memory. It turns out the thread > local storage is being initialised here: > > #0 _libc_pthread_key_create (key=0x43dd380da08, destructor=0x43dd33e2680) at > /usr/src/lib/libc/thread/rthread_tls.c:42 > #1 0x0000043dd33e2667 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #2 0x0000043e793f82f7 in pthread_once (once_control=0x43dd380d9f8, > init_routine=0x43e793db3c0 <_libc_pthread_key_create>) at > /usr/src/lib/libc/thread/rthread_once.c:26 > #3 0x0000043dd33e24bd in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #4 0x0000043dd305475f in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #5 0x0000043dd3036c70 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #6 0x0000043dd30e3ca3 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #7 0x0000043dd30e4b96 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #8 0x0000043dd302d89a in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #9 0x0000043dd3031162 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #10 0x0000043dd3031ec6 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #11 0x0000043dd30f7a8b in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #12 0x0000043dd311a94e in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #13 0x0000043dd311addf in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #14 0x0000043dd33ae4a6 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #15 0x0000043dd33ad6a2 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #16 0x0000043dd276e1d3 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #17 0x0000043b8f0816c6 in gl_clear (ra=0x43e50a61a50, dst=0x43e399aa950, > color=0x43e0e382370, scissor=0x43e0e382390) at > ../mpv-0.33.1/video/out/opengl/ra_gl.c:684 > #18 0x0000043b8f061db8 in gl_video_render_frame (p=0x43db938c050, > frame=0x43e399bb350, fbo=..., flags=3) at > ../mpv-0.33.1/video/out/gpu/video.c:3251 > #19 0x0000043b8f089a8f in draw_frame (vo=0x43e32b9f450, frame=0x43e399bb350) > at ../mpv-0.33.1/video/out/vo_gpu.c:87 > #20 0x0000043b8f0882a4 in render_frame (vo=0x43e32b9f450) at > ../mpv-0.33.1/video/out/vo.c:957 > #21 0x0000043b8f087735 in vo_thread (ptr=0x43e32b9f450) at > ../mpv-0.33.1/video/out/vo.c:1095 > #22 0x0000043da1682181 in _rthread_start (v=<error reading variable: > Unhandled dwarf expression opcode 0xa3>) at > /usr/src/lib/librthread/rthread.c:96 > #23 0x0000043e793b35ba in __tfork_thread () at > /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84 > > So the mpv code is not directly aware of the TLS, and it is being > allocated in iris_dri.so. This trace was taken at the start of video > playback, and at this point iris_dri.so is loaded (using dlopen) and the > destructor is valid. > > #0 dlopen (libname=0xbbddfb93320 "/usr/X11R6/lib/modules/dri/iris_dri.so", > flags=258) at /usr/src/libexec/ld.so/dlfcn.c:51 > #1 0x00000bbd76e126e6 in loader_open_driver (driver_name=0xbbd52b277e0 > "iris", out_driver_handle=0xbbd417eda28, search_path_vars=<optimized out>) at > /usr/xenocara/lib/mesa/mk/libloader/../../src/loader/loader.c:579 > #2 0x00000bbd76e0a7a8 in dri2_open_driver (disp=<optimized out>) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:771 > #3 dri2_load_driver_common (disp=<optimized out>, > driver_extensions=<optimized out>) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:783 > #4 dri2_load_driver_dri3 (disp=<error reading variable: Unhandled dwarf > expression opcode 0xa3>) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:808 > #5 0x00000bbd76e0331c in dri2_initialize_x11_dri3 (drv=<optimized out>, > disp=0xbbd4180c000) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/platform_x11.c:1393 > #6 dri2_initialize_x11 (drv=<error reading variable: Unhandled dwarf > expression opcode 0xa3>, disp=0xbbd4180c000) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/platform_x11.c:1554 > #7 0x00000bbd76e0c352 in dri2_initialize (drv=0xbbd417fd200, > disp=0xbbd4180c000) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1143 > #8 0x00000bbd76e0649e in _eglMatchAndInitialize (disp=0xbbd4180c000) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/egldriver.c:75 > #9 _eglMatchDriver (disp=0xbbd4180c000) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/egldriver.c:98 > #10 0x00000bbd76dfa1c1 in eglInitialize (dpy=<optimized out>, major=0x0, > minor=0x0) at /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/eglapi.c:617 > #11 0x00000bbb39afbca0 in mpegl_init (ctx=0xbbd41809350) at > ../mpv-0.33.1/video/out/opengl/context_x11egl.c:109 > #12 0x00000bbb39ad0c96 in ra_ctx_create (vo=0xbbd5b2df050, context_type=0x0, > context_name=0x0, opts=...) at ../mpv-0.33.1/video/out/gpu/context.c:185 > #13 0x00000bbb39b083a7 in preinit (vo=0xbbd5b2df050) at > ../mpv-0.33.1/video/out/vo_gpu.c:298 > #14 0x00000bbb39b06679 in vo_thread (ptr=0xbbd5b2df050) at > ../mpv-0.33.1/video/out/vo.c:1080 > #15 0x00000bbe30c09181 in _rthread_start (v=<error reading variable: > Unhandled dwarf expression opcode 0xa3>) at > /usr/src/lib/librthread/rthread.c:96 > #16 0x00000bbdb38aa5ba in __tfork_thread () at > /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84 > > However, when mpv is shutting down, iris_dri.so is > unloaded here: > > #0 dlclose (handle=0xb79c1fa5800) at /usr/src/libexec/ld.so/dlfcn.c:274 > #1 0x00000b7a51b541e0 in dri2_display_destroy (disp=0xb7969481000) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1204 > #2 0x00000b7a51b55407 in dri2_display_release (disp=0xb7969481000) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1188 > #3 dri2_terminate (drv=<error reading variable: Unhandled dwarf expression > opcode 0xa3>, disp=0xb7969481000) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1285 > #4 0x00000b7a51b43db7 in eglTerminate (dpy=<optimized out>) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/eglapi.c:675 > #5 0x00000b775b50a174 in mpegl_uninit (ctx=0xb796948f550) at > ../mpv-0.33.1/video/out/opengl/context_x11egl.c:51 > #6 0x00000b775b4dedc8 in ra_ctx_destroy (ctx_ptr=0xb796faf1158) at > ../mpv-0.33.1/video/out/gpu/context.c:211 > #7 0x00000b775b516d8d in uninit (vo=0xb796fabb650) at > ../mpv-0.33.1/video/out/vo_gpu.c:286 > #8 0x00000b775b514994 in vo_thread (ptr=0xb796fabb650) at > ../mpv-0.33.1/video/out/vo.c:1136 > #9 0x00000b796775c181 in _rthread_start (v=<error reading variable: > Unhandled dwarf expression opcode 0xa3>) at > /usr/src/lib/librthread/rthread.c:96 > #10 0x00000b7a3e8ef5ba in __tfork_thread () at > /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84 > > So iris_dri.so is unloaded before the _rthread_tls_destructors function > gets to the destructor. This is the cause of the crash. I did a quick > and dirty test by doing this: > LD_PRELOAD=/usr/X11R6/lib/modules/dri/iris_dri.so ./mpv -v file.mp4 > and indeed now mpv does not crash on exit (vo=gpu is being used by > default), because the destructor is being resolved from LD_PRELOAD. > > I intend to look at this on Linux to see why it does not crash there, > but haven't gotten to it yet. In the meanwhile, I wonder if we can > patch the OpenBSD port in some way to prevent the dangling TLS > destructor. If anyone has a clean solution based on the above > information please feel free to chime in. I'd love to get this fixed. > > Regards, > Anindya >