https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122749

            Bug ID: 122749
           Summary: [16 Regression] useless type conversions inserted
                    during vectorization blocking MLA
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

The following example:

int foo2 (char *buf, int len) {
    int x = 0;
    for (int i = 0; i < len; i++) {
        x += (int) i * buf[i];
    }
    return x;
}

When compiled with -O3 -mcpu=neoverse-v2, this used to generate a 4x unrolled SVE MLA sequence:

        mla     z29.s, p7/m, z2.s, z0.s
        mla     z27.s, p7/m, z4.s, z26.s
        mla     z30.s, p7/m, z1.s, z0.s
        mla     z28.s, p7/m, z23.s, z3.s

but it now generates separate MUL and ADD instructions:

        mul     z2.s, z2.s, z1.s
        mul     z4.s, z4.s, z26.s
        mul     z1.s, z24.s, z1.s
        mul     z3.s, z23.s, z3.s
        add     z29.s, z2.s, z29.s
        add     z30.s, z1.s, z30.s
        add     z28.s, z3.s, z28.s
        add     z0.s, z4.s, z0.s
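Both listings follow the same overall strategy: four independent vector accumulators, each updated per lane by a multiply followed by an add, summed at the end. The sketch below models that strategy in scalar C. The lane count, unroll factor, helper names (`ref_sum`, `unrolled_sum`) and the scalar remainder loop are assumptions for illustration and do not reproduce GCC's actual output. `unsigned char` stands in for plain `char` (unsigned on AArch64), and `uint32_t` replaces `int` so overflow wraps instead of being undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parameters: 4 x 32-bit lanes, 4 independent accumulators. */
#define LANES  4
#define UNROLL 4

/* Scalar reference for foo2 with x initialised to 0. */
static uint32_t ref_sum(const unsigned char *buf, int len)
{
    uint32_t x = 0;
    for (int i = 0; i < len; i++)
        x += (uint32_t)i * buf[i];
    return x;
}

/* Same sum computed the way an unrolled vector loop does: UNROLL
   accumulators of LANES lanes, each lane updated by a multiply-accumulate
   (the per-lane work of one MLA), reduced to a scalar at the end. */
static uint32_t unrolled_sum(const unsigned char *buf, int len)
{
    assert(buf || len <= 0);
    uint32_t acc[UNROLL][LANES] = {{0}};
    const int step = LANES * UNROLL;
    int i = 0;
    for (; i + step <= len; i += step)
        for (int u = 0; u < UNROLL; u++)
            for (int l = 0; l < LANES; l++) {
                int k = i + u * LANES + l;
                acc[u][l] += (uint32_t)k * buf[k];   /* one MLA lane */
            }
    /* Addition mod 2^32 is associative and commutative, so summing the
       accumulators in this order does not change the result. */
    uint32_t x = 0;
    for (int u = 0; u < UNROLL; u++)
        for (int l = 0; l < LANES; l++)
            x += acc[u][l];
    /* Remainder handled by a plain scalar loop in this sketch. */
    for (; i < len; i++)
        x += (uint32_t)i * buf[i];
    return x;
}
```

For example, with an input of four 1s both functions return 0 + 1 + 2 + 3 = 6.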

It looks like we no longer match the aarch64_pred_fma pattern in combine.
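What is being lost is, per lane, the fusion of a multiply and an add into a single multiply-accumulate. Below is a minimal scalar model (not GCC code) of the two forms, assuming 4 x 32-bit lanes. `model_mla`, `model_mul_add` and `LANES` are hypothetical names. With an all-true governing predicate both models leave the accumulator with identical values. Judging by these models, the regression is one of code quality (8 instructions instead of 4 in the listings above) rather than correctness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed lane count: 4 x 32-bit lanes (a 128-bit vector). */
#define LANES 4

/* Models "mla zda.s, pg/m, zn.s, zm.s": each active lane gets
   zda += zn * zm; inactive lanes keep their old value (merging, /m).
   uint32_t makes overflow wrap, as it does in the vector lanes. */
static void model_mla(uint32_t zda[LANES], const bool pg[LANES],
                      const uint32_t zn[LANES], const uint32_t zm[LANES])
{
    assert(zda && pg && zn && zm);
    for (int i = 0; i < LANES; i++)
        if (pg[i])
            zda[i] += zn[i] * zm[i];
}

/* Models the unpredicated pair "mul t.s, zn.s, zm.s" followed by
   "add zda.s, t.s, zda.s": same arithmetic, two instructions. */
static void model_mul_add(uint32_t zda[LANES],
                          const uint32_t zn[LANES], const uint32_t zm[LANES])
{
    assert(zda && zn && zm);
    uint32_t t[LANES];
    for (int i = 0; i < LANES; i++)
        t[i] = zn[i] * zm[i];          /* mul */
    for (int i = 0; i < LANES; i++)
        zda[i] = t[i] + zda[i];        /* add */
}
```

For instance, with zn = {5, 6, 7, 2}, zm = {10, 20, 30, 1} and an accumulator of {1, 2, 3, 0xFFFFFFFF}, both models produce {51, 122, 213, 1}, with the last lane wrapping.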
