[Bug libgcc/71744] Concurrently throwing exceptions is not scalable

jakub at gcc dot gnu.org Wed, 14 Sep 2016 04:28:55 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744


--- Comment #20 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Gleb Natapov from comment #19)
> (In reply to Jakub Jelinek from comment #18)
> > (In reply to Gleb Natapov from comment #16)
> > > Can you suggest an alternative to the libgcc patch? Use a different TLS
> > > model? Allocate per-thread storage dynamically somehow?
> > 
> > If we want to use TLS (which I hope we don't), then e.g. a single __thread
> > pointer with some registered destructor that would free it on process exit
> > could do the job, and on the first exception it would try to allocate memory
> > for the cache and other stuff and use that (otherwise, if memory allocation
> > fails, just take a lock and be non-scalable).
> >
> I see that sjlj uses __gthread_setspecific/__gthread_getspecific. Can we do
> the same here?

Can? Yes.  Want?  Nope.  It is worse than TLS.
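A minimal sketch of the single-`__thread`-pointer idea quoted above, in C, under stated assumptions: `struct frame_cache`, `note_lookup` and the other names are hypothetical, and the cache contents are reduced to a lookup counter. The lookup path is a plain TLS load. A `pthread_key_create` destructor is armed once per thread only so the buffer is freed when that thread terminates (one possible reading of "registered destructor"; key destructors do not run for the main thread on `exit()`), and `pthread_setspecific` is never called on the lookup path. If allocation fails, the code falls back to a shared cache behind a mutex, i.e. the non-scalable path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread cache; a real unwinder cache would hold PC ranges. */
struct frame_cache
{
  unsigned long lookups;
};

/* One TLS pointer per thread: the only static TLS this scheme needs. */
static __thread struct frame_cache *tls_cache;

/* Shared fallback used when allocation fails: correct but not scalable. */
static struct frame_cache shared_cache;
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

static pthread_key_t cache_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;
static int key_ok;

/* Key destructor: frees a thread's cache when that thread terminates. */
static void
free_cache (void *p)
{
  free (p);
}

static void
make_key (void)
{
  key_ok = pthread_key_create (&cache_key, free_cache) == 0;
}

/* Return this thread's cache, allocating it on first use, or NULL if
   allocation (or destructor registration) failed. */
static struct frame_cache *
get_thread_cache (void)
{
  if (tls_cache != NULL)
    return tls_cache;                  /* fast path: a plain TLS load */

  pthread_once (&key_once, make_key);
  if (!key_ok)
    return NULL;

  struct frame_cache *c = calloc (1, sizeof *c);
  if (c == NULL)
    return NULL;

  /* Called once per thread, only so that the destructor will run. */
  if (pthread_setspecific (cache_key, c) != 0)
    {
      free (c);
      return NULL;
    }
  tls_cache = c;
  return c;
}

/* Hypothetical stand-in for a cache lookup: returns how many lookups the
   cache used by this thread has served so far. */
unsigned long
note_lookup (void)
{
  struct frame_cache *c = get_thread_cache ();
  if (c != NULL)
    return ++c->lookups;               /* thread-private: no lock needed */

  pthread_mutex_lock (&shared_lock);   /* allocation failed: serialize */
  unsigned long n = ++shared_cache.lookups;
  pthread_mutex_unlock (&shared_lock);
  return n;
}
```

The design choice here is that the key is used only for cleanup, so the cost Jakub objects to in `__gthread_getspecific` never appears on the hot path.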

> > Another alternative, perhaps much better, if Torvald is going to improve
> > rwlocks sufficiently, would be to use rwlock to guard writes to the cache
> > etc. too, and perhaps somewhat enlarge the cache (either statically, or
> > allow extending it through allocation).
> > I'd expect that usually these apps that use exceptions too much only care
> > about a couple of shared libraries, so writes to the cache ought to be rare.
> >
> As I said in my previous reply, I tested the new rwlock, and in the congested
> case it still slows down the system significantly. That is not the
> implementation's fault; the CPU just does not like locked instructions much.
> Not having a lock at all would be significantly better.

You still need at least one lock; the array of locks is definitely a bad idea.
If you are worried about using two different rwlocks, it might be possible to
use just the glibc-internal one by adding an alternate dl_iterate_phdr entry
point. dl_iterate_phdr would then be documented to allow only a single thread
in the callback, which it satisfies now, and in a newer libc it could wrlock
_dl_load_lock. The alternate entry point would be documented to allow multiple
threads in the callback (i.e. it could rdlock _dl_load_lock). On the libgcc
side, the unwinder would first call dl_iterate_phdr_rd (or whatever name it
ends up with) and perform only a read-only lookup in the cache; if that finds
nothing, it would then call dl_iterate_phdr and update the cache.
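The two-phase lookup proposed above might look roughly like this in C. `dl_iterate_phdr_rd` does not exist in glibc; here it is a hypothetical stand-in that simply forwards to `dl_iterate_phdr` (which today runs one callback at a time), so the sketch runs as-is. The cache layout, `find_object_base`, and the round-robin replacement are also assumptions. The libgcc unwinder's own cache additionally invalidates entries when objects are loaded or unloaded (using the `dlpi_adds`/`dlpi_subs` counters); this sketch omits that, so it is only valid while no libraries are unloaded.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 8   /* hypothetical cache size */

struct cache_entry
{
  uintptr_t start, end;   /* [start, end) of a PT_LOAD segment */
  uintptr_t base;         /* dlpi_addr of the owning object */
  int valid;
};

static struct cache_entry cache[CACHE_SIZE];
static unsigned next_slot;   /* round-robin replacement (assumption) */

struct lookup
{
  uintptr_t pc, base;
  int found;
};

/* Hypothetical stand-in for the proposed read-locked entry point.
   No such function exists in glibc; it just forwards here. */
static int
dl_iterate_phdr_rd (int (*cb) (struct dl_phdr_info *, size_t, void *),
                    void *data)
{
  return dl_iterate_phdr (cb, data);
}

/* Phase 1 callback: read-only cache probe, never writes shared state. */
static int
probe_cache (struct dl_phdr_info *info, size_t size, void *data)
{
  struct lookup *l = data;
  (void) info;
  (void) size;
  for (unsigned i = 0; i < CACHE_SIZE; i++)
    if (cache[i].valid && l->pc >= cache[i].start && l->pc < cache[i].end)
      {
        l->base = cache[i].base;
        l->found = 1;
        break;
      }
  return 1;   /* the cache is global, so one callback invocation suffices */
}

/* Phase 2 callback: walk PT_LOAD segments and record a match in the cache.
   Only ever run under the exclusive dl_iterate_phdr. */
static int
fill_cache (struct dl_phdr_info *info, size_t size, void *data)
{
  struct lookup *l = data;
  (void) size;
  for (int i = 0; i < info->dlpi_phnum; i++)
    {
      const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
      if (ph->p_type != PT_LOAD)
        continue;
      uintptr_t start = info->dlpi_addr + ph->p_vaddr;
      uintptr_t end = start + ph->p_memsz;
      if (l->pc >= start && l->pc < end)
        {
          struct cache_entry *e = &cache[next_slot++ % CACHE_SIZE];
          e->start = start;
          e->end = end;
          e->base = info->dlpi_addr;
          e->valid = 1;
          l->base = info->dlpi_addr;
          l->found = 1;
          return 1;   /* stop iterating */
        }
    }
  return 0;   /* keep looking in the next object */
}

/* Find the load base of the object containing PC; returns 1 on success. */
int
find_object_base (uintptr_t pc, uintptr_t *base)
{
  assert (base != NULL);
  struct lookup l = { pc, 0, 0 };
  dl_iterate_phdr_rd (probe_cache, &l);   /* fast, read-only phase */
  if (!l.found)
    dl_iterate_phdr (fill_cache, &l);     /* slow phase that updates cache */
  if (l.found)
    *base = l.base;
  return l.found;
}
```

A second lookup of the same PC is served entirely by the read-only phase, which is the case the proposal expects to dominate once the handful of relevant libraries are cached.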
