https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95775
--- Comment #4 from Yichao Yu <yyc1992 at gmail dot com> ---
> Hey. My opinion is similar to Richi's. If you really want a highly optimized
> library, you should rather use a dlopen mechanism with pre-built set of
> options.

Well, a few things,

1. That sounds like an argument against `target_clones` and `target`. If dlopen'ing different libraries is your recommended solution then none of these would be needed.
2. The solution you propose puts all the pressure on the user of the library. That has a few problems.
2.1. There are strictly more users than libraries (assuming the library is used at all), so this forces more (repeated) work to be done.
2.2. The author of the library, and to a lesser degree the builder of the library, has the best knowledge of the set of features that can benefit the library and are the most useful for the deployment environment. The author of the user code of the library, who has to implement the dispatch/loading logic, in general has much less complete knowledge of which targets to support.
2.3. It'll be even worse for code size, since this forces each user to carry their own library, and now all data has to be duplicated as well, in addition to code.

Also because,

3. There's no standard way of doing this AFAICT.

Now (3) is really the main point. I'm fine with whatever mechanism allows multiple versions of the code to be available, as long as it requires no more effort/cost from/for the user (and to a lesser degree the author) of the library. If one such mechanism is provided by gcc/glibc/binutils, so that library writers don't have to invent their own loading and detection mechanism, and it won't cause unnecessary indirection (as cheap as ifunc) and will just work for the user to either link or dlopen, then I think it doesn't really matter if that's backed by one file, multiple files, or whatever one can come up with. Currently, the only mechanism available that fits this description AFAICT is `target_clones`/`ifunc`.
Unless there's a roadmap that I'm not aware of to replace this mechanism with a similar one backed by multiple files, I don't think suggesting such a mechanism is the right approach.

Again, I said in the very first post that I totally agree this won't be the method that gives absolutely the best performance, but neither is `target_clones`. I also completely agree that this option can be misused and that the compiler should not do it on its own before getting smarter, but this is far from the first option that can be misused, and given how cheap memory is and how multiple loads of the same library don't take more memory, this isn't even close to being the worst one to misuse either.
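
For reference, here is a minimal sketch of what the `target_clones`/`ifunc` mechanism looks like from the library author's side. The function name `sum_floats`, the macro `MULTIVERSIONED`, and the choice of `avx2` are hypothetical and only for illustration; the sketch assumes GCC (or a compatible compiler) on x86-64 with glibc, and falls back to a single baseline build elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: ifunc-based multiversioning is only enabled for GNU-compatible
   compilers on x86-64 with glibc. __GLIBC__ becomes visible once a libc
   header such as <assert.h> has been included. On other platforms the
   macro expands to nothing and only one baseline version is built. */
#if defined(__GNUC__) && defined(__x86_64__) && defined(__GLIBC__)
#define MULTIVERSIONED __attribute__((target_clones("avx2", "default")))
#else
#define MULTIVERSIONED
#endif

/* Hypothetical library entry point. With target_clones, the compiler emits
   one clone per listed target plus a resolver; the dynamic loader picks a
   clone once, at symbol resolution time, based on the running CPU. */
MULTIVERSIONED
float sum_floats(const float *data, size_t n)
{
    assert(data != NULL || n == 0);

    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

The user of the library just calls `sum_floats(buf, len)` and links or dlopens a single shared object as usual; no detection or loading code is needed on their side, which is the property argued for above.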