https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744
--- Comment #25 from Gleb Natapov <gleb at scylladb dot com> ---
(In reply to Jakub Jelinek from comment #24)
> (In reply to Gleb Natapov from comment #23)
> > I am not sure I agree. 64 lock will take one page of memory, which is
> > negligible amount nowadays and we can drop the array if compiled for single
> > threaded machine.
>
> It is perhaps negligible for your app, but libc has lots of users with
> different needs. And dl_load_lock isn't the only widely used lock in libc,
> are we going to use page of array locks in each case such lock has
> scalability issues with certain use cases?

That's a fair point. I think the severity of the issue should be taken into
account. I can tell from our experience (and, judging by a web search, we are
not alone) that for exception-throwing languages like C++ the issue is very
serious. And no, we do not use exceptions for flow control, but when errors
happen they tend to happen in bunches, and when the first bunch slows the
system to a crawl it causes even more errors. The only workaround is not to
use exceptions, which is not acceptable for us, so fixing the issue at its
root is the only option. Using Torvald's rwlock would definitely be better
than the current state, but not as good as a per-thread lock.

> > Such interface will make new dl_iterate_phdr_rd to libgcc specific, also
> > scalablity will depend on cache efficiency, so while benchmark will show
> > much better result, real application will not benefit. Complex C++
> > applications tend to have deep call chains.
>
> Why would it be libgcc specific? It would be another libc supported API,
> with clear documentation on what it does and any user could just use it.

I think I misunderstood what you proposed. My patch essentially does what you
suggest already. It calls the function dl_iterate_phdr_parallel instead of
dl_iterate_phdr_rd, but otherwise it is the same: it can run multiple
callbacks in parallel, so we only disagree on how the _parallel_ part is
achieved internally.
On the glibc list there were some concerns about widening the interface,
though. They may prefer to use symbol versioning to change the semantics of
dl_iterate_phdr (I am not sure yet if and how this can be done).

> As
> I said, the number of entries in the cache could be extended.

Unless the cache grows dynamically it would be hard to guess a proper size,
and the price of guessing too low is too high. Finding a proper size
dynamically would require a lot of cache management code, which I do not
think belongs here.
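
For illustration only, here is a minimal sketch of the lock-array idea discussed above: readers take a single lock chosen per thread, while a writer (something like dlopen/dlclose) takes all of them. This is not the code from the patch. The names stripes_init, reader_lock, reader_unlock, writer_lock and writer_unlock are hypothetical, and the sketch assumes a 64-byte cache line, so that 64 padded locks occupy one 4 KiB page on typical 64-bit targets.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NLOCKS     64   /* number of locks in the array (from the discussion) */
#define CACHE_LINE 64   /* assumed cache-line size in bytes */

/* One lock per cache line, so readers on different slots do not false-share. */
struct padded_lock {
    _Alignas(CACHE_LINE) pthread_mutex_t m;
};

static_assert(sizeof(struct padded_lock) % CACHE_LINE == 0,
              "each lock must fill whole cache lines");

static struct padded_lock stripes[NLOCKS];
static atomic_size_t next_slot;                 /* round-robin slot allocator */
static _Thread_local size_t my_slot = SIZE_MAX; /* SIZE_MAX: not assigned yet */

/* Hypothetical helper: must be called once before any lock is taken. */
static void stripes_init(void)
{
    for (size_t i = 0; i < NLOCKS; i++)
        pthread_mutex_init(&stripes[i].m, NULL);
}

/* Reader side: lock only the calling thread's slot and return its index. */
static size_t reader_lock(void)
{
    if (my_slot == SIZE_MAX)
        my_slot = atomic_fetch_add(&next_slot, 1) % NLOCKS;
    pthread_mutex_lock(&stripes[my_slot].m);
    return my_slot;
}

static void reader_unlock(size_t slot)
{
    pthread_mutex_unlock(&stripes[slot].m);
}

/* Writer side: take every lock, always in ascending order, which excludes
   all readers and avoids deadlock between concurrent writers. */
static void writer_lock(void)
{
    for (size_t i = 0; i < NLOCKS; i++)
        pthread_mutex_lock(&stripes[i].m);
}

static void writer_unlock(void)
{
    for (size_t i = NLOCKS; i-- > 0; )
        pthread_mutex_unlock(&stripes[i].m);
}
```

With this layout, up to NLOCKS threads can run their callbacks without touching a shared cache line, and threads that map to the same slot serialize only against each other. The cost moves to the writer, which has to acquire all NLOCKS locks.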