https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122749
Bug ID: 122749
Summary: [16 Regression] useless type conversions inserted during vectorization blocking MLA
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Target Milestone: ---
The following example:

int foo2 (char *buf, int len) {
    int x;
    for (int i = 0; i < len; i++) {
        x += (int) i * buf[i];
    }
    return x;
}

compiled with -O3 -mcpu=neoverse-v2 used to generate a 4x unrolled MLA sequence:

    mla z29.s, p7/m, z2.s, z0.s
    mla z27.s, p7/m, z4.s, z26.s
    mla z30.s, p7/m, z1.s, z0.s
    mla z28.s, p7/m, z23.s, z3.s

but now generates separate MUL and ADD instructions:

    mul z2.s, z2.s, z1.s
    mul z4.s, z4.s, z26.s
    mul z1.s, z24.s, z1.s
    mul z3.s, z23.s, z3.s
    add z29.s, z2.s, z29.s
    add z30.s, z1.s, z30.s
    add z28.s, z3.s, z28.s
    add z0.s, z4.s, z0.s

It looks like we no longer match the aarch64_pred_fma pattern in combine.
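
For reference, a minimal scalar sketch of why the two sequences are interchangeable for this loop: with 32-bit integer lanes, a multiply-accumulate and a multiply followed by an add give the same value, so the regression is about instruction count rather than correctness. The helper names (lane_mla, lane_mul_add, foo2_model) are hypothetical and do not come from GCC. The accumulator is assumed to start at 0 here, whereas the test case above leaves x uninitialized. Unsigned arithmetic is used so that wraparound is well defined.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane of MLA:
   accumulate a * b into acc in one step. */
static uint32_t lane_mla(uint32_t acc, uint32_t a, uint32_t b)
{
    return acc + a * b;
}

/* Hypothetical scalar model of the same lane done as MUL then ADD. */
static uint32_t lane_mul_add(uint32_t acc, uint32_t a, uint32_t b)
{
    uint32_t prod = a * b;   /* mul */
    return prod + acc;       /* add */
}

/* The reduction from the report, with x started at 0 (an assumption).
   use_mla selects which of the two lane models performs the update. */
static int32_t foo2_model(const char *buf, int len, int use_mla)
{
    uint32_t x = 0;
    for (int i = 0; i < len; i++) {
        uint32_t a = (uint32_t) i;
        uint32_t b = (uint32_t) (int32_t) buf[i];  /* widen char to 32 bits */
        x = use_mla ? lane_mla(x, a, b) : lane_mul_add(x, a, b);
    }
    return (int32_t) x;
}
```

Calling foo2_model on the same buffer with use_mla set to 1 and to 0 should return the same value, which is what makes folding MUL + ADD into MLA a pure optimization.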