https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88833
            Bug ID: 88833
           Summary: [SVE] Redundant moves for WHILELO-based loops
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---

Compiling this function with -O3 -march=armv8-a+sve:

    subroutine foo(x, y, z)
      real :: x(100), y(100), z(100)
      x = y + z(1)
    end subroutine foo

gives:

    foo_:
    .LFB0:
            .cfi_startproc
            mov     x4, 100
            mov     x5, x4          // Redundant
            mov     x3, 0
            ptrue   p1.s, all
            whilelo p0.s, xzr, x4
            ld1rw   z1.s, p1/z, [x2]
            .p2align 3,,7
    .L2:
            ld1w    z0.s, p0/z, [x1, x3, lsl 2]
            fadd    z0.s, z0.s, z1.s
            st1w    z0.s, p0, [x0, x3, lsl 2]
            incw    x3
            whilelo p0.s, x3, x5
            bne     .L2
            ret
            .cfi_endproc

There's no need for the move here.  We should just be able to use x4
for both WHILELOs.  Although the move itself shouldn't be expensive in
context, it suggests that the RA isn't seeing an accurate picture,
which could hurt in more complex cases.
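For readers less familiar with SVE, here is a rough scalar model in C of what the loop does. It is only a sketch, not GCC's internal representation: the names `VL`, `whilelo`, `any_active` and `foo_model` are made up for illustration, `VL` is fixed at 4 lanes whereas real hardware chooses the vector length, and the model does not reproduce WHILELO's behaviour near the top of the 64-bit range. The point is that the loop bound is loop-invariant, so a single value (x4 in the assembly) can feed both WHILELOs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VL 4  /* hypothetical number of 32-bit lanes; real SVE hardware decides this */

/* Scalar model of WHILELO: lane i is active while base + i < limit (unsigned). */
static void whilelo(bool pred[VL], uint64_t base, uint64_t limit)
{
    /* This model does not handle wraparound of base + i. */
    assert(base <= UINT64_MAX - VL);
    for (int i = 0; i < VL; i++)
        pred[i] = (base + (uint64_t)i) < limit;
}

/* True if any lane is active; stands in for the flags tested by "bne .L2". */
static bool any_active(const bool pred[VL])
{
    for (int i = 0; i < VL; i++)
        if (pred[i])
            return true;
    return false;
}

/* Model of foo: x[i] = y[i] + z[0] for i in [0, n). */
void foo_model(float *x, const float *y, const float *z, uint64_t n)
{
    const uint64_t limit = n;  /* plays the role of x4; never changes in the loop */
    uint64_t i = 0;            /* x3 */
    const float z1 = z[0];     /* ld1rw: broadcast z(1) to all lanes */
    bool p0[VL];

    whilelo(p0, 0, limit);     /* whilelo p0.s, xzr, x4 */
    do {
        /* ld1w / fadd / st1w, applied to active lanes only */
        for (int lane = 0; lane < VL; lane++)
            if (p0[lane])
                x[i + lane] = y[i + lane] + z1;
        i += VL;               /* incw x3 */
        whilelo(p0, i, limit); /* same bound as above; no copy into x5 is needed */
    } while (any_active(p0));  /* bne .L2 */
}
```

Both calls to `whilelo` read the same `limit`, which is the situation the report expects the register allocator to recognise.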