https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118611

            Bug ID: 118611
           Summary: LRA inserts unneeded reload on FMA chain
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization, ra
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*

The following example:

#include <arm_neon.h>

float32x4_t
bad (float32x4_t x, float32x4_t c0, float32x4_t c1, float32x4_t c3,
     float32x4_t c2)
{
    float32x4_t z2 = vmulq_f32 (x, x);

    float32x4_t p1 = vfmaq_laneq_f32 (c1, z2, c3, 0);
    float32x4_t p2 = vfmaq_laneq_f32 (c2, z2, c3, 2);

    // Mov is inserted to save p1, which is used again below. (Correct behaviour)
    float32x4_t p5 = vfmaq_f32 (p1, z2, p1);
    float32x4_t p6 = vfmaq_f32 (p1, z2, p2);

    // Mov is inserted to save p5, which is only used once. (Unneeded)
    float32x4_t y = vfmaq_f32 (p5, x, p6);
    return vfmaq_f32 (c0, x, y);
}

compiled with -O3 generates:

bad:
  fmul v31.4s, v0.4s, v0.4s
  fmla v2.4s, v31.4s, v3.s[0]
  fmla v4.4s, v31.4s, v3.s[2]
  mov v30.16b, v2.16b
  fmla v30.4s, v31.4s, v2.4s
  fmla v2.4s, v31.4s, v4.4s
  mov v31.16b, v30.16b
  fmla v31.4s, v0.4s, v2.4s
  fmla v1.4s, v0.4s, v31.4s
  mov v0.16b, v1.16b
  ret

where the second MOV (mov v31.16b, v30.16b) is unneeded because v30 isn't live
after the FMLA that follows it.
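
The liveness claim can be checked mechanically. The following is a minimal
sketch, not anything taken from GCC: the struct, the table and the function
name live_after are hypothetical, and the table is a hand transcription of the
listing above with each instruction reduced to the vector registers it reads
and writes (v31 becomes 31, and a lane operand such as v3.s[0] counts as a read
of v3).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One instruction of the listing, reduced to the registers it touches.
   For fmla the destination is also read, since it is the accumulator.  */
struct insn {
    int def;     /* register written */
    int use[3];  /* registers read, -1 for an unused slot */
};

/* Hand transcription of the -O3 output for bad() (an assumption of this
   sketch, not compiler-generated data).  */
static const struct insn listing[] = {
    { 31, {  0,  0, -1 } },  /* 0: fmul v31, v0, v0       */
    {  2, {  2, 31,  3 } },  /* 1: fmla v2, v31, v3.s[0]  */
    {  4, {  4, 31,  3 } },  /* 2: fmla v4, v31, v3.s[2]  */
    { 30, {  2, -1, -1 } },  /* 3: mov  v30, v2           */
    { 30, { 30, 31,  2 } },  /* 4: fmla v30, v31, v2      */
    {  2, {  2, 31,  4 } },  /* 5: fmla v2, v31, v4       */
    { 31, { 30, -1, -1 } },  /* 6: mov  v31, v30          */
    { 31, { 31,  0,  2 } },  /* 7: fmla v31, v0, v2       */
    {  1, {  1,  0, 31 } },  /* 8: fmla v1, v0, v31       */
    {  0, {  1, -1, -1 } },  /* 9: mov  v0, v1            */
};
#define LISTING_LEN (sizeof listing / sizeof listing[0])

/* Return true if the value held in REG after instruction AFTER is read
   by a later instruction before REG is overwritten.  v0 carries the
   return value, so it is treated as read at the final ret.  */
bool live_after(int reg, size_t after)
{
    assert(after < LISTING_LEN);
    for (size_t i = after + 1; i < LISTING_LEN; i++) {
        for (int k = 0; k < 3; k++)
            if (listing[i].use[k] == reg)
                return true;   /* read before any overwrite */
        if (listing[i].def == reg)
            return false;      /* overwritten without being read */
    }
    return reg == 0;
}
```

With this table, live_after(2, 4) is true: p1 in v2 is still read by the FMLA
at index 5, so the first MOV is justified. live_after(30, 7) is false: nothing
reads v30 after the FMLA at index 7, so that FMLA could have accumulated into
v30 without the copy to v31.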

It seems that the lifetime is known, since r103 (the accumulator) is marked
REG_DEAD on insn 17:

(insn 17 16 18 2 (set (reg:V4SF 101 [ _8 ])
        (fma:V4SF (reg:V4SF 118 [ x ])
            (reg:V4SF 102 [ _9 ])
            (reg:V4SF 103 [ _10 ]))) "":11639:10 2407 {fmav4sf4}
     (expr_list:REG_DEAD (reg:V4SF 103 [ _10 ])
        (expr_list:REG_DEAD (reg:V4SF 102 [ _9 ])
            (nil))))

but LRA still creates a new pseudo for it and inserts reloads around the insn:

      Choosing alt 0 in insn 17:  (0) =w  (1) w  (2) w  (3) 0 {fmav4sf4}
      Creating newreg=125 from oldreg=103, assigning class FP_REGS to r125
   17: r125:V4SF={r118:V4SF*r102:V4SF+r125:V4SF}
      REG_DEAD r103:V4SF
      REG_DEAD r102:V4SF
    Inserting insn reload before:
   33: r125:V4SF=r103:V4SF
    Inserting insn reload after:
   34: r101:V4SF=r125:V4SF
