https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95776
Bug ID: 95776
Summary: Reduce indirection with target_clones at link time (with LTO)
Product: gcc
Version: 10.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: yyc1992 at gmail dot com
CC: marxin at gcc dot gnu.org
Target Milestone: ---

Currently, if a function is not visible outside the final library (static, or internal or hidden visibility), a call through the PLT is replaced with a direct call to the function.

With target_clones, this is also possible within the same compilation unit for static functions as callees. A caller that has the same cloning attribute will simply call the cloned function without indirection.

However, this stops working when the two are combined. Even with the maximum set of options and attributes to help it (hidden visibility, same compilation unit, -Wl,-Bsymbolic, LTO), the call to the cloned function from a caller with a matching cloning attribute still goes through the PLT.

Test code

```
__attribute__((noinline,visibility("hidden")))
int f1(int *p)
{
    asm volatile ("" :: "r"(p) : "memory");
    return *p;
}

__attribute__((noinline,visibility("hidden"),target_clones("default,avx2")))
int f2(int *p)
{
    asm volatile ("" :: "r"(p) : "memory");
    return *p;
}

__attribute__((noinline))
int g1(int *p)
{
    return f1(p);
}

__attribute__((noinline,target_clones("default,avx2")))
int g2(int *p)
{
    return f2(p);
}
```

Compiled with `-fPIC -flto -O3 -Wl,-Bsymbolic -shared`.

The call in `g1` calls `f1` directly, whereas the calls in the two clones of `g2` both go through `f2@plt`. The same applies to inlining: target_clones prevents inlining even with LTO enabled.

I assume this happens because the optimization can only be done at link time, and the link step either isn't passed enough information to determine this or it simply isn't implemented? It should be possible, since it can already be done within a single compilation unit.
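
For comparison, the single-compilation-unit case that the report describes as already working might look roughly like the sketch below. The names `f2_static`, `g2_static` and the `CLONES` macro are illustrative and do not appear in the report. The platform guard is an assumption added so that the sketch still compiles on targets without x86 `target_clones` support.

```c
/* Sketch of the single-compilation-unit case: a static callee and its
   caller carry the same target_clones list. Names are hypothetical. */
#if defined(__x86_64__) && defined(__linux__)
/* Assumption: x86-64 Linux toolchain with ifunc support. */
#define CLONES __attribute__((target_clones("default,avx2")))
#else
/* Fallback so the sketch still compiles elsewhere, without clones. */
#define CLONES
#endif

/* Static callee: not visible outside this translation unit. */
CLONES __attribute__((noinline))
static int f2_static(int *p)
{
    /* Same compiler barrier as in the report's test code. */
    __asm__ volatile ("" :: "r"(p) : "memory");
    return *p;
}

/* Caller with the matching clone list. */
CLONES __attribute__((noinline))
int g2_static(int *p)
{
    return f2_static(p);
}
```

Building this with something like `-O3 -fPIC` and disassembling the object (for example with `objdump -d`) should make it possible to check whether each clone of `g2_static` calls the corresponding clone of `f2_static` directly, and to compare that with the hidden-visibility LTO case above.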