https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744
--- Comment #20 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Gleb Natapov from comment #19)
> (In reply to Jakub Jelinek from comment #18)
> > (In reply to Gleb Natapov from comment #16)
> > > Can you suggest an alternative to libgcc patch? Use other TLS model?
> > > Allocate per thread storage dynamically somehow?
> >
> > If we want to use TLS (which I hope we don't), then e.g. a single __thread
> > pointer with some registered destructor that would free it on process exit
> > could do the job, and on the first exception it would try to allocate memory
> > for the cache and other stuff and use that (otherwise, if memory allocation
> > fails, just take a lock and be non-scalable).
> >
> I see that sjlj uses __gthread_setspecific/__gthread_getspecific. Can we do
> the same here?

Can? Yes. Want? Nope. It is worse than TLS.

> > Another alternative, perhaps much better, if Torvald is going to improve
> > rwlocks sufficiently, would be to use rwlock to guard writes to the cache
> > etc. too, and perhaps somewhat enlarge the cache (either statically, or
> > allow extending it through allocation).
> > I'd expect that usually these apps that use exceptions too much only care
> > about a couple of shared libraries, so writes to the cache ought to be rare.
> >
> As I said in my previous reply, I tested the new rwlock and in congested case
> it still slows down the system significantly, not the implementation fault,
> cpu just does not like locked instruction much. Not having a lock will be
> significantly better.

You still need at least one lock; the array of locks is definitely a bad idea.
Perhaps, if you are worried about using 2 different rwlocks, it would be possible to just use the glibc internal one, by adding an alternate entry point next to dl_iterate_phdr. dl_iterate_phdr itself would then be documented to allow only a single thread in the callback at a time, which it satisfies now, and in a newer libc it could wrlock _dl_load_lock. The alternate entry point would be documented to allow multiple threads in the callback at once (i.e. it could rdlock _dl_load_lock). On the libgcc side, the unwinder would first call dl_iterate_phdr_rd (or whatever name it would have) and perform only a read-only lookup in the cache; if that didn't find anything, it would call dl_iterate_phdr afterwards and update the cache.