Re: performance of exception handling

Florian Weimer via Gcc Mon, 11 May 2020 03:42:10 -0700

* Thomas Neumann via Gcc:

> Currently, exception handling scales poorly due to global mutexes when
> throwing. This can be seen with a small demo script here:
> https://repl.it/repls/DeliriousPrivateProfiler
> Using a thread count >1 is much slower than running single threaded.
> This global locking is particularly painful on a machine with more than a
> hundred cores, as there mutexes are expensive and contention becomes
> much more likely due to the high degree of parallelism.
>
> Of course conventional wisdom is not to use exceptions when exceptions
> can occur somewhat frequently. But I think that is a silly argument, see
> the WG21 paper P0709 for a detailed discussion.


Link: <https://wg21.link/P0709>

I'm not sure if your summary is correct.

The claim in the paper that program bugs should not result in catchable
exceptions also does not match my limited experience with
application servers: They tend to install an exception handler of last
resort to catch unexpected exceptions (“bugs”) from processed requests
and log them, instead of letting them terminate the entire application
server.

> In particular since there is no technical reason why they have to be
> slow, it is just the current implementation that is slow.

I agree, the present state is not inherently due to the exception
handling model, it's a consequence of the current implementation.

> In the current gcc implementation on Linux the bottleneck is
> _Unwind_Find_FDE, or more precisely, the function dl_iterate_phdr,
> that is called for every frame and that iterates over all shared
> libraries while holding a global lock.
> That is inherently slow, both due to global locking and due to the data
> structures involved.

In particular, the libgcc unwinder relies on the global lock to protect
its own cache, so we cannot remove the lock from glibc.

> And it is not easy to speed that up with, e.g., a thread local cache,
> as glibc has no mechanism to notify us if a shared library is added or
> removed.

It is of course possible to change glibc.

My current preferred solution is something that moves the entire code
that locates the relevant FDE table into glibc.  This is all the code in
_Unwind_IteratePhdrCallback until the first read_encoded_value_with_base
call.  The callback mechanism would be gone, so _Unwind_Find_FDE
would call __dl_ehframe_find (see below) and then perform the remaining
processing from _Unwind_IteratePhdrCallback.

The glibc interface would look like this:

/* Data returned by __dl_ehframe_find.  */
struct dl_ehframe_info
{
  /* The link map of the object which contains the address.  */
  const struct link_map *dlehf_map;

  /* A pointer to its dynamic section.  This is a null pointer in
     statically linked applications.  */
  const ElfW(Dyn) *dlehf_dynamic;

  /* A pointer to the start of the PT_GNU_EH_FRAME segment for the
     object.  This is a null pointer if the object does not contain
     such a segment.  */
  const void *dlehf_ehframe;

  /* The size of the segment, or zero if not present.  */
  size_t dlehf_ehframe_size;

  /* Text and data base for the DWARF information in the segment.  */
  ElfW(Addr) dlehf_text_base;
  ElfW(Addr) dlehf_data_base;
};

/* Find the PT_GNU_EH_FRAME segment of the object which contains
   ADDRESS and write information about it to *RESULT.  Return -1 if
   nothing was found, or 0 on success.  (*RESULT can be written to on
   failure, too.)  */
int __dl_ehframe_find (ElfW(Addr) __address,
                       struct dl_ehframe_info *__result)
  __THROW __nonnull ((2)) __attribute_warn_unused_result__;

It is the responsibility of the glibc implementation of __dl_ehframe_find
to provide proper synchronization with the dynamic loader.  We can start
out with a lock-based implementation, as we have it today, and optimize
it later.
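
As a rough illustration of the lock-based starting point, here is a
minimal sketch built on the existing dl_iterate_phdr, which already
runs its callback under the loader lock.  The names ehframe_lookup,
ehframe_lookup_callback and find_ehframe_sketch are hypothetical, not
part of glibc or libgcc, and the text/data base fields of the proposed
struct are left out:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type, loosely modeled on struct dl_ehframe_info.  */
struct ehframe_lookup
{
  uintptr_t address;     /* In: the address to look up.  */
  int found;             /* Out: nonzero if an object contains ADDRESS.  */
  uintptr_t load_base;   /* Out: load address of that object.  */
  const void *ehframe;   /* Out: PT_GNU_EH_FRAME start, or NULL.  */
  size_t ehframe_size;   /* Out: size of the segment, or zero.  */
};

/* Called once per loaded object while dl_iterate_phdr holds its lock.  */
int
ehframe_lookup_callback (struct dl_phdr_info *info, size_t size, void *data)
{
  struct ehframe_lookup *lookup = data;
  const ElfW(Phdr) *eh_frame_hdr = NULL;
  int contains = 0;
  (void) size;

  for (size_t i = 0; i < info->dlpi_phnum; ++i)
    {
      const ElfW(Phdr) *phdr = &info->dlpi_phdr[i];
      if (phdr->p_type == PT_LOAD)
        {
          uintptr_t start = info->dlpi_addr + phdr->p_vaddr;
          if (lookup->address >= start
              && lookup->address - start < phdr->p_memsz)
            contains = 1;
        }
      else if (phdr->p_type == PT_GNU_EH_FRAME)
        eh_frame_hdr = phdr;
    }

  if (!contains)
    return 0;   /* Not this object; keep iterating.  */

  lookup->found = 1;
  lookup->load_base = info->dlpi_addr;
  if (eh_frame_hdr != NULL)
    {
      lookup->ehframe
        = (const void *) (info->dlpi_addr + eh_frame_hdr->p_vaddr);
      lookup->ehframe_size = eh_frame_hdr->p_memsz;
    }
  return 1;     /* Stop the iteration.  */
}

/* Return 0 if an object containing ADDRESS was found, -1 otherwise.  */
int
find_ehframe_sketch (uintptr_t address, struct ehframe_lookup *result)
{
  assert (result != NULL);
  result->address = address;
  result->found = 0;
  result->load_base = 0;
  result->ehframe = NULL;
  result->ehframe_size = 0;
  dl_iterate_phdr (ehframe_lookup_callback, result);
  return result->found ? 0 : -1;
}
```

This still walks every loaded object under the lock, so it is no
faster than what we have today; it only shows the shape of the
interface that a later lock-free implementation would have to keep.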

Based on prior discussions, this works because unwinding with a corrupt
stack or a stack containing unmapped objects is already undefined today,
so the live stack keeps all pointers returned from __dl_ehframe_find
valid.

The cache as it exists today would be removed from libgcc, but we
probably want to add a small cache that avoids the need to call into
glibc while unwinding through the same object (in which case we probably
should add boundary information to struct dl_ehframe_info).
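
To make the small-cache idea concrete, here is a sketch of a
one-entry cache that lives for the duration of a single unwind.  It
assumes the lookup result carries the start and end address of the
object, which is the boundary information mentioned above.  All names
(object_range_info, unwind_lookup_cache, cached_lookup,
demo_slow_lookup) are hypothetical, and demo_slow_lookup is only a
stand-in for the call into glibc:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup result with boundary information.  */
struct object_range_info
{
  uintptr_t start;       /* First address covered by the object.  */
  uintptr_t end;         /* One past the last covered address.  */
  const void *ehframe;   /* Start of its PT_GNU_EH_FRAME segment.  */
};

/* One-entry cache, meant to be part of the per-unwind state.  */
struct unwind_lookup_cache
{
  struct object_range_info last;
  int valid;
};

typedef int (*slow_lookup_fn) (uintptr_t, struct object_range_info *);

/* Return 0 and fill *RESULT, calling SLOW only on a cache miss.  */
int
cached_lookup (struct unwind_lookup_cache *cache, uintptr_t address,
               slow_lookup_fn slow, struct object_range_info *result)
{
  assert (cache != NULL && result != NULL);
  if (cache->valid
      && address >= cache->last.start && address < cache->last.end)
    {
      *result = cache->last;   /* Same object as the previous frame.  */
      return 0;
    }
  if (slow (address, result) != 0)
    return -1;
  cache->last = *result;
  cache->valid = 1;
  return 0;
}

/* Stand-in for the glibc call: one fake object at [0x1000, 0x2000).  */
int demo_slow_calls;

int
demo_slow_lookup (uintptr_t address, struct object_range_info *result)
{
  ++demo_slow_calls;
  if (address < 0x1000 || address >= 0x2000)
    return -1;
  result->start = 0x1000;
  result->end = 0x2000;
  result->ehframe = NULL;
  return 0;
}
```

Keeping the cache in per-unwind state rather than in a global or
thread-local variable means it never outlives the stack frames that
keep the object mapped, so it needs no invalidation on dlclose.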

The advantage of doing it this way is that it does not require
recompiling and relinking objects.

Thanks,
Florian
