We currently have a latent bug in glibc: constructing a C++ object with static or thread storage duration and a non-trivial destructor can fail. The reason is that the compiler registers the destructor through __cxa_atexit (or __cxa_thread_atexit_impl for thread-local objects), and these functions may have to allocate memory. We can avoid that if we know how many such registration calls exist in an object (for C++, the compiler will never emit these calls repeatedly in a loop, so the count is fixed at build time). Then we can allocate the resources beforehand, either during process and thread start, or when dlopen is called and new objects are loaded.
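To make the preallocation idea concrete, here is a minimal C++ sketch. It is not glibc code: ExitHandlerTable, reserve, add and run_all are hypothetical names invented for illustration, and the count passed to reserve stands in for whatever number the link editor would record. The point is only that the one step that can fail (the allocation) moves to load time, and the registration itself no longer allocates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for per-object exit-handler storage.
// None of these names exist in glibc.
struct ExitHandler {
    void (*fn)(void *);
    void *arg;
};

class ExitHandlerTable {
public:
    // Called at load time (process start, thread start, or dlopen) with
    // the count assumed to be recorded by the link editor. This is the
    // only step that can fail, and it fails before any constructor runs.
    bool reserve(std::size_t count) {
        std::free(slots_);
        slots_ = static_cast<ExitHandler *>(
            std::calloc(count ? count : 1, sizeof(ExitHandler)));
        capacity_ = slots_ ? count : 0;
        used_ = 0;
        return slots_ != nullptr;
    }

    // Analogue of a __cxa_atexit registration: it performs no allocation,
    // so it cannot fail as long as the recorded count was correct.
    bool add(void (*fn)(void *), void *arg) {
        if (used_ == capacity_)
            return false;  // more registrations than were announced
        slots_[used_++] = {fn, arg};
        return true;
    }

    // Run the handlers in reverse order of registration.
    void run_all() {
        assert(used_ <= capacity_);
        while (used_ > 0) {
            ExitHandler h = slots_[--used_];
            h.fn(h.arg);
        }
    }

    ~ExitHandlerTable() { std::free(slots_); }

private:
    ExitHandler *slots_ = nullptr;
    std::size_t capacity_ = 0;
    std::size_t used_ = 0;
};
```

In this sketch the loader would call reserve once per loaded object (and once per thread for the thread-local counter), and each compiler-emitted registration would map to one add call.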
What would be the most ELF-flavored way to implement this? After the final link, I expect that the count (or counts, since we need a separate counter for thread-local storage) would show up under a new dynamic tag in the dynamic segment. This is a very good fit because older loaders will just ignore a tag they do not know. But the question remains what GCC should emit into assembler output and object files so that the link editor can compute the total count from it.

Thanks,
Florian