https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98350
            Bug ID: 98350
           Summary: Reassociation breaks FMA chains
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

Consider the testcase:

#define N 1024
double a[N];
double b[N];
double c[N];
double d[N];
double e[N];
double f[N];
double g[N];
double h[N];
double j[N];
double k[N];
double l[N];
double m[N];
double o[N];
double p[N];

void
foo (void)
{
  for (int i = 0; i < N; i++)
    {
      a[i] += b[i] * c[i] + d[i] * e[i] + f[i] * g[i] + h[i] * j[i]
              + k[i] * l[i] + m[i] * o[i] + p[i];
    }
}

For -Ofast --param=tree-reassoc-width=1 GCC generates the loop:

.L2:
        ldr     q1, [x1, x0]
        ldr     q0, [x12, x0]
        ldr     q3, [x14, x0]
        fadd    v0.2d, v0.2d, v1.2d
        ldr     q1, [x13, x0]
        ldr     q2, [x11, x0]
        fmla    v0.2d, v3.2d, v1.2d
        ldr     q1, [x10, x0]
        ldr     q3, [x9, x0]
        fmla    v0.2d, v2.2d, v1.2d
        ldr     q1, [x8, x0]
        ldr     q2, [x7, x0]
        fmla    v0.2d, v3.2d, v1.2d
        ldr     q1, [x6, x0]
        ldr     q3, [x5, x0]
        fmla    v0.2d, v2.2d, v1.2d
        ldr     q1, [x4, x0]
        ldr     q2, [x3, x0]
        fmla    v0.2d, v3.2d, v1.2d
        ldr     q1, [x2, x0]
        fmla    v0.2d, v2.2d, v1.2d
        str     q0, [x1, x0]
        add     x0, x0, 16
        cmp     x0, 8192
        bne     .L2

with --param=tree-reassoc-width=4 it generates:

.L2:
        ldr     q5, [x11, x0]
        ldr     q4, [x7, x0]
        ldr     q0, [x3, x0]
        ldr     q3, [x12, x0]
        ldr     q1, [x8, x0]
        ldr     q2, [x4, x0]
        fmul    v3.2d, v3.2d, v5.2d
        fmul    v1.2d, v1.2d, v4.2d
        fmul    v2.2d, v2.2d, v0.2d
        ldr     q16, [x1, x0]
        ldr     q18, [x14, x0]
        ldr     q17, [x13, x0]
        ldr     q0, [x2, x0]
        ldr     q7, [x10, x0]
        ldr     q6, [x9, x0]
        ldr     q5, [x6, x0]
        ldr     q4, [x5, x0]
        fmla    v3.2d, v18.2d, v17.2d
        fadd    v0.2d, v0.2d, v16.2d
        fmla    v1.2d, v7.2d, v6.2d
        fmla    v2.2d, v5.2d, v4.2d
        fadd    v0.2d, v0.2d, v3.2d
        fadd    v1.2d, v1.2d, v2.2d
        fadd    v0.2d, v0.2d, v1.2d
        str     q0, [x1, x0]
        add     x0, x0, 16
        cmp     x0, 8192
        bne     .L2

The reassociation is evident. The problem here is that the fmla chains are
something we'd want to preserve.
Is there a way we can get the reassoc pass to handle FMAs more intelligently?