https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88833

            Bug ID: 88833
           Summary: [SVE] Redundant moves for WHILELO-based loops
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---

Compiling this function with -O3 -march=armv8-a+sve:

subroutine foo(x, y, z)
  real :: x(100), y(100), z(100)
  x = y + z(1)
end subroutine foo

gives:

foo_:
.LFB0:
        .cfi_startproc
        mov     x4, 100
        mov     x5, x4        // Redundant
        mov     x3, 0
        ptrue   p1.s, all
        whilelo p0.s, xzr, x4
        ld1rw   z1.s, p1/z, [x2]
        .p2align 3,,7
.L2:
        ld1w    z0.s, p0/z, [x1, x3, lsl 2]
        fadd    z0.s, z0.s, z1.s
        st1w    z0.s, p0, [x0, x3, lsl 2]
        incw    x3
        whilelo p0.s, x3, x5
        bne     .L2
        ret
        .cfi_endproc

There's no need for the move into x5 here.  x4 is never modified after it is
set, so we should just be able to use x4 for both WHILELOs.
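
As a hand-edited sketch of the expected result (not actual compiler output),
the code above with the extra move dropped and x4 reused in the loop would
look roughly like this, with the .cfi and alignment directives left out for
brevity:

foo_:
        mov     x4, 100
        mov     x3, 0
        ptrue   p1.s, all
        whilelo p0.s, xzr, x4
        ld1rw   z1.s, p1/z, [x2]
.L2:
        ld1w    z0.s, p0/z, [x1, x3, lsl 2]
        fadd    z0.s, z0.s, z1.s
        st1w    z0.s, p0, [x0, x3, lsl 2]
        incw    x3
        whilelo p0.s, x3, x4  // Same limit register as the first WHILELO
        bne     .L2
        ret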

Although the move itself shouldn't be expensive in context, it suggests that
the register allocator (RA) isn't seeing an accurate picture, which could hurt
in more complex cases.
