https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99551
Bug ID: 99551
Summary: aarch64: csel is used for cold scalar computation which affects performance
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---
This is an optimization bug. I don't know at which layer it should
be fixed, so I am reporting it as a target bug.
The cold path affects the performance of the hot code because csel is used:
long foo(long x, int c)
{
if (__builtin_expect(c,0))
x = (x + 15) & ~15;
return x;
}
compiles to:
foo:
cmp w1, 0
add x1, x0, 15
and x1, x1, -16
csel x0, x1, x0, ne
ret
I think it would be better to use a branch when the user has
explicitly marked the computation as cold.
For example, the following is faster if c is always 0 (the empty asm
statement keeps the cold path from being turned into a csel):
long foo(long x, int c)
{
if (__builtin_expect(c,0)) {
asm ("");
x = (x + 15) & ~15;
}
return x;
}
which compiles to:
foo:
cbnz w1, .L7
ret
.L7:
add x0, x0, 15
and x0, x0, -16
ret