https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99551

            Bug ID: 99551
           Summary: aarch64: csel is used for cold scalar computation
                    which affects performance
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

This is an optimization bug. I don't know at which layer it should
be fixed, so I'm reporting it as a target bug.

The cold path slows down the hot path because it is if-converted to
csel, so the cold computation runs unconditionally:

long foo(long x, int c)
{
    if (__builtin_expect(c,0))
        x = (x + 15) & ~15;
    return x;
}


compiles to

foo:
        cmp     w1, 0
        add     x1, x0, 15
        and     x1, x1, -16
        csel    x0, x1, x0, ne
        ret
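Read back into C, the csel sequence corresponds roughly to the sketch below. foo_ifconverted is a hypothetical name used only for illustration. Both functions return the same values. The difference is that the add/and pair, which rounds x up to a multiple of 16, is evaluated on every call. On the hot path (c == 0) the returned x0 still has to wait for that pair, because csel takes both candidate values as source operands.

```c
/* The function from the report. */
long foo(long x, int c)
{
    if (__builtin_expect(c, 0))
        x = (x + 15) & ~15;
    return x;
}

/* Hypothetical name: the csel sequence above, read back into C. */
long foo_ifconverted(long x, int c)
{
    /* add x1, x0, 15 ; and x1, x1, -16 : runs on every call, c == 0 included.
       Rounds x up to a multiple of 16. */
    long rounded = (x + 15) & ~15L;
    /* cmp w1, 0 ; csel x0, x1, x0, ne : the result waits on `rounded`
       because csel reads both of its source registers. */
    return c != 0 ? rounded : x;
}
```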

I think it would be better to use a branch when the user has
explicitly marked the computation as cold.
For example, this is faster if c is always 0:

long foo(long x, int c)
{
    if (__builtin_expect(c,0)) {
        asm ("");
        x = (x + 15) & ~15;
    }
    return x;
}

foo:
        cbnz    w1, .L7
        ret
.L7:
        add     x0, x0, 15
        and     x0, x0, -16
        ret
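One way to check the "faster if c is always 0" claim is a latency micro-benchmark that chains each result into the next call. The sketch below is a hypothetical harness and is not part of the report. time_chain, foo_csel and foo_branch are illustrative names. It relies on the GCC-specific noipa attribute (GCC 8+), which stops inlining and interprocedural constant propagation so a constant c cannot be folded away. It uses ISO C clock(), so it measures processor time.

```c
#include <time.h>

/* Variant from the first listing (if-converted to csel per the report).
   noipa (GCC 8+) blocks inlining and interprocedural constant propagation,
   so a constant c at the call site cannot be folded into the body. */
__attribute__((noipa)) long foo_csel(long x, int c)
{
    if (__builtin_expect(c, 0))
        x = (x + 15) & ~15;
    return x;
}

/* Variant from the second listing; __asm__ ("") is the same empty asm
   statement, spelled so it also compiles in strict ISO C modes. */
__attribute__((noipa)) long foo_branch(long x, int c)
{
    if (__builtin_expect(c, 0)) {
        __asm__ ("");
        x = (x + 15) & ~15;
    }
    return x;
}

/* Hypothetical harness: each result is fed back in as the next x, so the
   loop measures the latency of the returned value. Stores the final x in
   *out and returns elapsed processor time in seconds. */
static double time_chain(long (*fn)(long, int), long x0, int c,
                         long iters, long *out)
{
    long x = x0;
    clock_t t0 = clock();
    for (long i = 0; i < iters; i++)
        x = fn(x, c);
    clock_t t1 = clock();
    *out = x;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

For example, call time_chain(foo_csel, 1, c, 100000000, &r) and the same with foo_branch, taking c from argc - 1 so its value is only known at run time. Then compare the two times. The indirect call adds overhead to both variants and dilutes the difference, so treat the numbers as indicative only.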
