https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118611
Bug ID: 118611
Summary: LRA inserts unneeded reload on FMA chain
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization, ra
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Target Milestone: ---
Target: aarch64*

The following example:

#include <arm_neon.h>

float32x4_t
bad (float32x4_t x, float32x4_t c0, float32x4_t c1, float32x4_t c3,
     float32x4_t c2)
{
  float32x4_t z2 = vmulq_f32 (x, x);
  float32x4_t p1 = vfmaq_laneq_f32 (c1, z2, c3, 0);
  float32x4_t p2 = vfmaq_laneq_f32 (c2, z2, c3, 2);

  // A MOV is inserted to save p1, which p6 still needs. (Correct behaviour)
  float32x4_t p5 = vfmaq_f32 (p1, z2, p1);
  float32x4_t p6 = vfmaq_f32 (p1, z2, p2);

  // A MOV is inserted to save p5, which is only used once. (Unneeded)
  float32x4_t y = vfmaq_f32 (p5, x, p6);

  return vfmaq_f32 (c0, x, y);
}

compiled with -O3 generates:

bad:
        fmul    v31.4s, v0.4s, v0.4s
        fmla    v2.4s, v31.4s, v3.s[0]
        fmla    v4.4s, v31.4s, v3.s[2]
        mov     v30.16b, v2.16b
        fmla    v30.4s, v31.4s, v2.4s
        fmla    v2.4s, v31.4s, v4.4s
        mov     v31.16b, v30.16b
        fmla    v31.4s, v0.4s, v2.4s
        fmla    v1.4s, v0.4s, v31.4s
        mov     v0.16b, v1.16b
        ret

The second MOV is unneeded: v30 (p5) is not live after the FMLA that computes y, so that FMLA could accumulate into v30 directly.

The lifetime is already known at this point. Insn 17, the FMA computing y, carries a REG_DEAD note for r103 (_10, the addend p5):

(insn 17 16 18 2 (set (reg:V4SF 101 [ _8 ])
        (fma:V4SF (reg:V4SF 118 [ x ])
            (reg:V4SF 102 [ _9 ])
            (reg:V4SF 103 [ _10 ]))) "":11639:10 2407 {fmav4sf4}
     (expr_list:REG_DEAD (reg:V4SF 103 [ _10 ])
        (expr_list:REG_DEAD (reg:V4SF 102 [ _9 ])
            (nil))))

Operand 3 (the addend) is tied to the output by the "0" constraint, and since r103 dies in this insn, its register could simply be reused as the output. LRA nevertheless reloads the tied operand through a new pseudo, r125, with a copy before and after the insn:

         Choosing alt 0 in insn 17:  (0) =w  (1) w  (2) w  (3) 0 {fmav4sf4}
      Creating newreg=125 from oldreg=103, assigning class FP_REGS to r125
   17: r125:V4SF={r118:V4SF*r102:V4SF+r125:V4SF}
      REG_DEAD r103:V4SF
      REG_DEAD r102:V4SF
    Inserting insn reload before:
   33: r125:V4SF=r103:V4SF

    Inserting insn reload after:
   34: r101:V4SF=r125:V4SF