https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88838
Bug ID: 88838
Summary: [SVE] Use 32-bit WHILELO in LP64 mode
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rsandifo at gcc dot gnu.org
Target Milestone: ---

Compiling this test with -O3 -march=armv8-a+sve:

    void
    f (int *restrict x, int *restrict y, int *restrict z, int n)
    {
      for (int i = 0; i < n; i += 1)
        x[i] = y[i] + z[i];
    }

produces:

    f:
    .LFB0:
            .cfi_startproc
            cmp     w3, 0
            ble     .L1
            mov     x4, 0
            sxtw    x3, w3
            whilelo p0.s, xzr, x3
            .p2align 3,,7
    .L3:
            ld1w    z1.s, p0/z, [x1, x4, lsl 2]
            ld1w    z0.s, p0/z, [x2, x4, lsl 2]
            add     z0.s, z0.s, z1.s
            st1w    z0.s, p0, [x0, x4, lsl 2]
            incw    x4
            whilelo p0.s, x4, x3
            bne     .L3
    .L1:
            ret

We could (and should) avoid the SXTW by using WHILELO on W registers instead of X registers.

vect_verify_full_masking checks which IV widths are supported for WHILELO but prefers to go to Pmode width.  This is because using Pmode allows ivopts to reuse the IV for indices (as in the loads and store above).  However, it would be better to use a 32-bit WHILELO with a truncated 64-bit IV if:

(a) the limit is extended from 32 bits, and

(b) the detection loop in vect_verify_full_masking detects that using a 32-bit IV would be correct.

The thing to avoid is the case in which a 32-bit IV might wrap (see vect_set_loop_masks_directly).  In that case it would be better to stick with 64-bit WHILELOs.
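For contrast, a hypothetical variant (not taken from the report) in which condition (a) does not hold: the trip count is a genuine 64-bit value rather than a limit extended from 32 bits, so a 32-bit IV could wrap for very large n and the vectorizer would need to keep the 64-bit WHILELO form.

```c
/* Hypothetical contrast case, not from the report: `n` is a 64-bit
   count, so it is not an extension of a 32-bit limit.  A truncated
   32-bit IV could wrap once n reaches 2^32 iterations, which is the
   situation vect_set_loop_masks_directly must rule out before a
   32-bit WHILELO can be used.  */
void
g (int *restrict x, int *restrict y, int *restrict z, long n)
{
  for (long i = 0; i < n; i += 1)
    x[i] = y[i] + z[i];
}
```

Here the bound on i comes straight from a 64-bit register, so the detection step described in (b) cannot prove that a 32-bit IV is safe.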