https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125557
Bug ID: 125557
Summary: Missed if-conversion of load addresses
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
Target Milestone: ---
The reduced testcase is:
#include <stdint.h>
#include <stddef.h>

const uint8_t *
advance_loop (const uint8_t *ip, size_t tag, const uint8_t *end)
{
  while (ip < end)
    {
      size_t tag_type = tag & 3;
      if (tag_type == 0)
        {
          size_t nlt = (tag >> 2) + 1;
          tag = ip[nlt];
          ip += nlt + 1;
        }
      else
        {
          tag = ip[tag_type];
          ip += tag_type + 1;
        }
    }
  return ip;
}
For aarch64 at -O3 LLVM manages to if-convert it into:
.LBB0_1:
        add     x8, x0, x1, lsr #2
        ands    x9, x1, #0x3
        add     x9, x0, x9
        add     x10, x8, #2
        csinc   x8, x9, x8, ne
        ldrb    w1, [x8]
        csinc   x0, x10, x9, eq
        cmp     x0, x2
        b.lo    .LBB0_1
whereas GCC keeps conditional branches. From what I can tell this is because
both arms of the diamond have memory loads, and GCC doesn't figure out that it
can conditionally select the addresses for an unconditional load.
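To illustrate the address selection described above, here is a hand-written source-level sketch of the same loop in which the offset is selected first and a single unconditional load follows. The names advance_loop_selected and off are made up for this illustration and are not part of the testcase, and I have not checked what code GCC generates for this form.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical hand-converted variant of advance_loop: both arms of the
   diamond are folded into one offset selection, so the loop body contains
   a single load from the selected address.  */
const uint8_t *
advance_loop_selected (const uint8_t *ip, size_t tag, const uint8_t *end)
{
  while (ip < end)
    {
      size_t tag_type = tag & 3;
      /* Select the offset instead of branching around two loads.  */
      size_t off = tag_type == 0 ? (tag >> 2) + 1 : tag_type;
      tag = ip[off];
      ip += off + 1;
    }
  return ip;
}
```

Each iteration still performs exactly one load from the same address as the original would, so the rewrite does not introduce any access that the original loop did not make.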
This loop is hot in the Snappy compression library, where it leads to GCC being
behind LLVM on aarch64 machines.
Perhaps the problem is that phiopt does not transform PHI <*P, *Q> into
P' = PHI <P, Q>; result = *P'?