* Thomas Neumann via Gcc:

> Currently, exception handling scales poorly due to global mutexes when
> throwing. This can be seen with a small demo script here:
> https://repl.it/repls/DeliriousPrivateProfiler
> Using a thread count >1 is much slower than running single threaded.
> This global locking is particularly painful on a machine with more than a
> hundred cores, as there mutexes are expensive and contention becomes
> much more likely due to the high degree of parallelism.
>
> Of course conventional wisdom is not to use exceptions when exceptions
> can occur somewhat frequently. But I think that is a silly argument, see
> the WG21 paper P0709 for a detailed discussion.

Link: <https://wg21.link/P0709>

I'm not sure if your summary is correct. The claim in the paper that
program bugs should not result in catchable exceptions also does not
match my limited experience with application servers: they tend to
install an exception handler of last resort to catch unexpected
exceptions (“bugs”) from processed requests and log them, instead of
letting them terminate the entire application server.

> In particular since there is no technical reason why they have to be
> slow, it is just the current implementation that is slow.

I agree, the present state is not inherent in the exception handling
model; it is a consequence of the current implementation.

> In the current gcc implementation on Linux the bottleneck is
> _Unwind_Find_FDE, or more precisely, the function dl_iterate_phdr,
> that is called for every frame and that iterates over all shared
> libraries while holding a global lock.
> That is inherently slow, both due to global locking and due to the data
> structures involved.

In particular, the libgcc unwinder relies on the global lock to protect
its own cache, so we cannot remove the lock from glibc.

> And it is not easy to speed that up with, e.g., a thread local cache,
> as glibc has no mechanism to notify us if a shared library is added or
> removed.

It is of course possible to change glibc. My current preferred
solution is something that moves the entire code that locates the
relevant FDE table into glibc. This is all the code in
_Unwind_IteratePhdrCallback until the first
read_encoded_value_with_base call. And the callback mechanism would be
gone, so _Unwind_Find_FDE would call __dl_ehframe_find (see below) and
then perform the remaining processing from _Unwind_IteratePhdrCallback.

The glibc interface would look like this:

/* Data returned by __dl_ehframe_find.  */
struct dl_ehframe_info
{
  /* The link map of the object which contains the address.  */
  const struct link_map *dlehf_map;

  /* A pointer to its dynamic section.  This is a null pointer in
     statically linked applications.  */
  const ElfW(Dyn) *dlehf_dynamic;

  /* A pointer to the start of the PT_GNU_EH_FRAME segment for the
     object.  This is a null pointer if the object does not contain
     such a segment.  */
  const void *dlehf_ehframe;

  /* The size of the segment, or zero if not present.  */
  size_t dlehf_ehframe_size;

  /* Text and data base for the DWARF information in the segment.  */
  ElfW(Addr) dlehf_text_base;
  ElfW(Addr) dlehf_data_base;
};

/* Find the PT_GNU_EH_FRAME segment of the object which contains
   ADDRESS and write information about it to *RESULT.  Return -1 if
   nothing was found, or 0 on success.  (*RESULT can be written to on
   failure, too.)  */
int __dl_ehframe_find (ElfW(Addr) __address,
                       struct dl_ehframe_info *__result)
  __THROW __nonnull ((2)) __attribute_warn_unused_result__;

It is the responsibility of the glibc implementation of
__dl_ehframe_find to provide proper synchronization with the dynamic
loader. We can start out with a lock-based implementation, as we have
it today, and optimize it later. Based on prior discussions, this
works because unwinding with a corrupt stack or a stack containing
unmapped objects is already undefined today, so the live stack keeps
all pointers returned from __dl_ehframe_find valid.

The cache as it exists today would be removed from libgcc, but we
probably want to add a small cache that avoids the need to call into
glibc while unwinding through the same object (in which case we
probably should add boundary information to struct dl_ehframe_info).

The advantage of doing it this way is that it does not require
recompiling and relinking objects.

Thanks,
Florian
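
The small per-unwind cache mentioned above could be sketched roughly as
follows. This is only an illustration under assumptions: the struct is a
simplified stand-in for struct dl_ehframe_info, the map_start/map_end
boundary fields are hypothetical (the proposed interface does not define
them), and find_ehframe_stub merely imitates __dl_ehframe_find with two
made-up address ranges so the example can run on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct dl_ehframe_info.  The boundary fields
   are an assumption; they are not part of the proposed interface.  */
struct ehframe_info_sketch
{
  const void *ehframe;
  size_t ehframe_size;
  uintptr_t map_start;  /* Assumed: first address covered by the object.  */
  uintptr_t map_end;    /* Assumed: one past the last covered address.  */
};

/* Counts calls to the stub, i.e. trips into the (pretend) glibc.  */
static int slow_lookups;

/* Hypothetical stub imitating __dl_ehframe_find.  It pretends that two
   objects are loaded, at [0x1000, 0x2000) and [0x3000, 0x4000).  */
static int
find_ehframe_stub (uintptr_t address, struct ehframe_info_sketch *result)
{
  static const char fake_a[16] = { 0 }, fake_b[16] = { 0 };
  ++slow_lookups;
  if (address >= 0x1000 && address < 0x2000)
    {
      *result = (struct ehframe_info_sketch)
        { fake_a, sizeof fake_a, 0x1000, 0x2000 };
      return 0;
    }
  if (address >= 0x3000 && address < 0x4000)
    {
      *result = (struct ehframe_info_sketch)
        { fake_b, sizeof fake_b, 0x3000, 0x4000 };
      return 0;
    }
  return -1;
}

/* One-entry cache that lives only for the duration of a single unwind.  */
struct unwind_cache_sketch
{
  int valid;
  struct ehframe_info_sketch info;
};

/* Return 0 and fill *RESULT if ADDRESS belongs to a known object.  The
   stub is called only if ADDRESS is outside the cached object.  */
static int
cached_find_sketch (struct unwind_cache_sketch *cache, uintptr_t address,
                    struct ehframe_info_sketch *result)
{
  assert (cache != NULL && result != NULL);
  if (cache->valid
      && address >= cache->info.map_start
      && address < cache->info.map_end)
    {
      *result = cache->info;
      return 0;
    }
  if (find_ehframe_stub (address, result) != 0)
    return -1;          /* Cache is left untouched on failure.  */
  cache->info = *result;
  cache->valid = 1;
  return 0;
}

/* Walk COUNT frame addresses with a fresh cache and return how many
   lookups had to go through the stub.  */
int
walk_frames_sketch (const uintptr_t *frames, size_t count)
{
  struct unwind_cache_sketch cache = { 0 };
  struct ehframe_info_sketch info;
  int before = slow_lookups;
  for (size_t i = 0; i < count; ++i)
    (void) cached_find_sketch (&cache, frames[i], &info);
  return slow_lookups - before;
}
```

With the frame addresses 0x1010, 0x1800, 0x1020, 0x3004, 0x3100,
walk_frames_sketch reports two slow lookups, one per object crossed.
Because the cache is local to one unwind, it needs no locking of its
own, in line with the argument that the live stack keeps the returned
pointers valid.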