https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95776

            Bug ID: 95776
           Summary: Reduce indirection with target_clones at link time
                    (with LTO)
           Product: gcc
           Version: 10.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: lto
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yyc1992 at gmail dot com
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Currently, if a function is not visible outside the final library (static, or
with internal or hidden visibility), a call to it through the PLT is replaced
with a direct call to the function.

With target_clones, this is also possible within the same compilation unit
when the callee is a static function: a caller that has the same cloning
attribute simply calls the matching clone without indirection.

However, this stops working when the two are combined. Even with every option
and attribute that could help (hidden visibility, same compilation unit,
-Wl,-Bsymbolic, LTO), the call to the cloned function from a caller with a
matching cloning attribute still goes through the PLT.

Test code

```
__attribute__((noinline,visibility("hidden"))) int f1(int *p)
{
    asm volatile ("" :: "r"(p) : "memory");
    return *p;
}

__attribute__((noinline,visibility("hidden"),target_clones("default,avx2")))
int f2(int *p)
{
    asm volatile ("" :: "r"(p) : "memory");
    return *p;
}

__attribute__((noinline)) int g1(int *p)
{
    return f1(p);
}

__attribute__((noinline,target_clones("default,avx2"))) int g2(int *p)
{
    return f2(p);
}
```

When compiled with `-fPIC -flto -O3 -Wl,-Bsymbolic -shared`, `g1` calls `f1`
directly, whereas both clones of `g2` call `f2@plt`.

The same also applies to inlining: target_clones prevents inlining even with
LTO enabled.

I assume this happens because the optimization can only be done at link time,
and the link step either is not passed enough information to determine this or
the optimization simply has not been implemented. I assume it should be
possible, since it can already be done within a single compilation unit.
