[ Added the Score-P devs, as they also implement a tool based on OMPT. Added the libunwind devs, because they also know about the general problem of TLS variables and signals. ]
Dear John, On 05/12/2018 02:19 AM, John Mellor-Crummey wrote:
While using a sampling-based profiler (HPCToolkit) to measure the performance of an application using a dynamically-linked version of the LLVM OpenMP runtime, I encountered a deadlock on x86_64. Although I haven’t considered other architectures in detail, I believe that they may be similarly affected. Here’s what I believe I have observed: there is a subtle race condition between TLS setup for an OpenMP runtime and a profiler that inspects it through the OMPT interface. A thread executing code in __kmp_launch_worker in the context of the LLVM OpenMP runtime library acquired the lock controlling access to TLS state (__tls_get_addr calls tls_get_addr_tail calls pthread_mutex_lock) to set up TLS needed for its access to its thread local variable __kmp_gtid in frame 24 of the call stack shown below. Immediately after acquiring the TLS lock by setting its __lock field with a CMPXCHG, but before recording the lock owner or finishing TLS setup, the thread was interrupted by our profiler. As a normal part of its operation to record a sample, our profiler uses the OMPT tools API to check if the thread is an OpenMP thread by inspecting the thread id being maintained by the OpenMP runtime. A call to a runtime entry point through the OMPT API led to an access to __kmp_gtid in frame 5 of the call stack. However, TLS has still not been set up for the OpenMP runtime shared library for this thread, causing the access to __kmp_gtid to go through the same protocol as before (__tls_get_addr calls tls_get_addr_tail calls pthread_mutex_lock). However, the lock has already been acquired in frame 21, so it is unavailable for acquisition in frames 0-2, causing deadlock. The TLS lock is implemented as a recursive lock, but the profiler interrupted the lock acquisition in libpthread before the owner field of the recursive lock was set, so the inner call to pthread_mutex_lock can't succeed.

*This is a serious problem if a profiler using the OMPT interface can cause a deadlock.* We need a design of the OMPT interface and OpenMP runtime implementations that makes this impossible.

After thinking about this for a while, I think that a profiler can arrange to receive the ompt_callback_thread_begin and then set a thread local flag in its own TLS variables to note that a thread is an OpenMP thread. A profiler must not invoke any ompt runtime entry point on a thread that has not announced itself as an OpenMP thread by previously calling ompt_callback_thread_begin. An OpenMP runtime should ensure that its TLS is allocated before invoking the callback ompt_callback_thread_begin. Similarly, a profiler shouldn’t invoke an OMPT callback on a thread after receiving ompt_callback_thread_end. If a profiler thread doesn’t use the OMPT interface to inspect a thread that hasn’t announced itself as an OpenMP thread, it won’t access any TLS state that the OpenMP library may maintain. Does anyone care to comment or offer a vision of a different solution?
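The flag protocol proposed above could look roughly like the following minimal sketch. All function names here are hypothetical; a real tool would register the first two as its ompt_callback_thread_begin and ompt_callback_thread_end handlers with the signatures the OMPT header requires. The sketch also assumes that the profiler's own TLS is already set up on the thread when a sample arrives.

```c
#include <stdbool.h>

/* Profiler-owned per-thread flag (C11 thread-local storage).
   Assumption: the profiler's own TLS is already set up when a
   sampling signal can arrive on this thread. */
static _Thread_local bool is_openmp_thread = false;

/* Hypothetical handler, to be wired to ompt_callback_thread_begin. */
void profiler_on_thread_begin(void) { is_openmp_thread = true; }

/* Hypothetical handler, to be wired to ompt_callback_thread_end. */
void profiler_on_thread_end(void) { is_openmp_thread = false; }

/* Checked in the sample handler before calling any OMPT runtime
   entry point: false means "do not touch the OpenMP runtime". */
bool profiler_may_query_ompt(void) { return is_openmp_thread; }
```

A sample handler would call profiler_may_query_ompt() first and skip every OMPT runtime entry point when it returns false, so a thread whose runtime TLS is still being set up is never inspected.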
There is actually a very simple solution for this: declare the TLS variable with the "initial-exec" model [1]. This avoids the repeated calls to __tls_get_addr, which are expensive anyway and, because they can call malloc, are not async-signal-safe either. It is still wise, though, to touch any TLS variable before any signal can be triggered. Maybe OMPT can signal this, so that the OMPT user can set up interrupt sources after that has been done.

Best,
Bert

[1] https://www.akkadia.org/drepper/tls.pdf
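A minimal sketch of the "initial-exec" suggestion, assuming GCC or Clang. The variable my_gtid and its two accessors are hypothetical stand-ins for a runtime's per-thread id such as __kmp_gtid; this is not the LLVM OpenMP runtime's actual code.

```c
/* GCC/Clang attribute: access this variable through a fixed offset
   from the thread pointer instead of calling __tls_get_addr.
   `my_gtid` is a hypothetical stand-in for a variable like __kmp_gtid. */
static __thread int my_gtid
    __attribute__((tls_model("initial-exec"))) = -1;

/* Hypothetical accessors; neither one takes the dynamic TLS path. */
void set_my_gtid(int gtid) { my_gtid = gtid; }
int get_my_gtid(void) { return my_gtid; }
```

The same model can be selected for a whole translation unit with the compiler option -ftls-model=initial-exec. One caveat to keep in mind: a shared library that uses initial-exec TLS needs space in the static TLS area, so loading it with dlopen() after program start can fail if that space is exhausted.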
Below my signature block are some details of the thread state that I observed, in case you want to validate my assessment of the situation.
-- Dipl.-Inf. Bert Wesarg wiss. Mitarbeiter Technische Universität Dresden Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH) 01062 Dresden 📞 +49 (351) 463-42451 📠 +49 (351) 463-37773 📧 [email protected]
_______________________________________________ Libunwind-devel mailing list [email protected] https://lists.nongnu.org/mailman/listinfo/libunwind-devel
