https://sourceware.org/bugzilla/show_bug.cgi?id=27695
Bug ID: 27695
Summary: ld has poor performance characteristics when loading large quantities of .so files
Product: binutils
Version: 2.28
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: ld
Assignee: unassigned at sourceware dot org
Reporter: steve.gargolinski at gmail dot com
Target Milestone: ---

Our application is growing, and its startup time is increasing significantly on Linux while remaining fairly consistent on Windows. A typical startup workflow that we've been measuring takes about 10 seconds on Windows and over 60 seconds on Linux with comparable hardware. Profiling attributes the startup time difference between the platforms entirely to ld.so.

We did a bunch of experimentation and investigation and realized that our growing quantity of dynamic libraries is a major contributor to this change. To replicate this outside of our product, we generated a small sample application that measures the time to load 100,000 small generated classes (constructor, virtual destructor) spread across a varying quantity of dynamic libraries. Loading these 100,000 classes in one dynamic library takes about 0.3 seconds. Loading the same 100,000 classes spread across 1,000 libraries takes over 9 seconds!

Back to our real-world use case. In our product we generally load libraries with RTLD_GLOBAL. One of the main performance bottlenecks we were able to identify is in _dl_lookup_symbol_x(). When searching the global scope (symbol_scope[0]), the search found nothing more than 50% of the time, and did so with linear performance.

    return _dl_lookup_symbol_x(undef_name, undef_map, ref, symbol_scope, version, type_class, flags, skip_map);

A major portion of our 60 second startup time is spent here. We experimented with adding a hashset of symbols previously loaded into the global scope (updated in add_to_global()) so that we could get constant-time lookup on this check instead of linear. This was a major improvement to both our test application and our real product.
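To illustrate the idea, here is a minimal standalone sketch in C. It is a toy model, not the glibc patch: every name in it (toy_lib, toy_scope, toy_add_to_global, toy_lookup_linear, toy_lookup_filtered) is hypothetical, and the per-library check is simplified to a string comparison loop. It only shows how a set of names maintained when a library enters the global scope lets a miss return without visiting any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a loaded object: just a list of exported names. */
struct toy_lib {
    const char **syms;
    size_t nsyms;
};

/* Toy global scope: the loaded objects plus a set of every exported name. */
struct toy_scope {
    const struct toy_lib **libs;
    size_t nlibs, cap_libs;
    const char **set;   /* open-addressing table; NULL marks an empty slot */
    size_t set_cap;     /* zero or a power of two */
    size_t set_len;
};

/* 64-bit FNV-1a string hash. */
static uint64_t fnv1a(const char *s) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (; *s; ++s) { h ^= (unsigned char)*s; h *= 0x100000001b3ULL; }
    return h;
}

/* Insert into a table with free slots; returns 1 if the name was new. */
static size_t set_put(const char **tab, size_t cap, const char *name) {
    assert(cap != 0 && (cap & (cap - 1)) == 0);
    size_t mask = cap - 1, i = fnv1a(name) & mask;
    while (tab[i] != NULL) {
        if (strcmp(tab[i], name) == 0) return 0;
        i = (i + 1) & mask;
    }
    tab[i] = name;
    return 1;
}

/* Keep the load factor at or below 50% so probing always terminates. */
static int set_insert(struct toy_scope *sc, const char *name) {
    if ((sc->set_len + 1) * 2 > sc->set_cap) {
        size_t ncap = sc->set_cap ? sc->set_cap * 2 : 16;
        const char **ntab = calloc(ncap, sizeof *ntab);
        if (ntab == NULL) return -1;
        for (size_t i = 0; i < sc->set_cap; ++i)
            if (sc->set[i] != NULL) set_put(ntab, ncap, sc->set[i]);
        free(sc->set);
        sc->set = ntab;
        sc->set_cap = ncap;
    }
    sc->set_len += set_put(sc->set, sc->set_cap, name);
    return 0;
}

static int set_contains(const struct toy_scope *sc, const char *name) {
    if (sc->set_cap == 0) return 0;
    size_t mask = sc->set_cap - 1;
    for (size_t i = fnv1a(name) & mask; sc->set[i] != NULL; i = (i + 1) & mask)
        if (strcmp(sc->set[i], name) == 0) return 1;
    return 0;
}

/* Append a library to the scope and record its names in the set. */
int toy_add_to_global(struct toy_scope *sc, const struct toy_lib *lib) {
    if (sc->nlibs == sc->cap_libs) {
        size_t ncap = sc->cap_libs ? sc->cap_libs * 2 : 8;
        const struct toy_lib **nl = realloc(sc->libs, ncap * sizeof *nl);
        if (nl == NULL) return -1;
        sc->libs = nl;
        sc->cap_libs = ncap;
    }
    sc->libs[sc->nlibs++] = lib;
    for (size_t i = 0; i < lib->nsyms; ++i)
        if (set_insert(sc, lib->syms[i]) != 0) return -1;
    return 0;
}

/* Baseline: ask every library in turn; a miss visits all of them. */
const struct toy_lib *toy_lookup_linear(const struct toy_scope *sc,
                                        const char *name, size_t *visited) {
    for (size_t i = 0; i < sc->nlibs; ++i) {
        ++*visited;
        for (size_t j = 0; j < sc->libs[i]->nsyms; ++j)
            if (strcmp(sc->libs[i]->syms[j], name) == 0) return sc->libs[i];
    }
    return NULL;
}

/* With the set: a name never added to the scope is rejected up front. */
const struct toy_lib *toy_lookup_filtered(const struct toy_scope *sc,
                                          const char *name, size_t *visited) {
    if (!set_contains(sc, name)) return NULL;
    return toy_lookup_linear(sc, name, visited);
}
```

In this model the set only speeds up misses; a hit still walks the libraries to find the defining one. The sketch also ignores anything a real loader would have to handle, such as removing names when a library is unloaded, symbol versioning, and thread safety.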
The test application mentioned above, which previously took 9 seconds to load 1,000 libraries, now performs the same operation in 1 second. We've prototyped a strategy to dynamically patch ld.so at startup of our application, and our workflow time measurements improved from 60 seconds to 30 seconds. Still not nearly as fast as Windows, but a major improvement. We've tested this on a bunch of versions of multiple distributions and have been able to improve all of them.

With this change we're adding some memory overhead. Also, timing improvements will not be seen by applications loading a small number of dynamic libraries (the change can even cause a performance regression there due to time spent populating the hashset) - but it's a huge improvement to our use case.

I'm happy to share any of the fixes or investigations in more detail. Improving ld.so performance as dynamic library quantity scales is really important to our use case, and we're looking for input on whether this can be a useful addition to the glibc codebase.