https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100849
            Bug ID: 100849
           Summary: Poor placement of vector IVs
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---

Vector IV increments are usually placed at the beginning of a loop body.
This means that both the old and new IV values are live at the same time,
forcing a move.  E.g.:

int x[100], y[100];

void
f1 (void)
{
  for (int i = 0; i < 100; ++i)
    x[i] = (i & 11) == 2 ? y[i] : 1;
}

produces:

  <bb 3> [local count: 268435400]:
  # vect_vec_iv_.7_47 = PHI <_48(3), { 4, 5, 6, 7 }(2)>
  # ivtmp.21_21 = PHI <ivtmp.21_16(3), 0(2)>
  _48 = vect_vec_iv_.7_47 + { 4, 4, 4, 4 };
  vect__1.8_50 = vect_vec_iv_.7_47 & { 11, 11, 11, 11 };
  vect_iftmp.11_54 = MEM <vector(4) int> [(int *)&y + 16B + ivtmp.21_21 * 1];
  vect_iftmp.12_58 = .VCOND (vect__1.8_50, { 2, 2, 2, 2 }, vect_iftmp.11_54, { 1, 1, 1, 1 }, 113);
  MEM <vector(4) int> [(int *)&x + 16B + ivtmp.21_21 * 1] = vect_iftmp.12_58;
  ivtmp.21_16 = ivtmp.21_21 + 16;
  if (ivtmp.21_16 != 384)
    goto <bb 3>; [96.00%]
  else
    goto <bb 4>; [4.00%]

It might be better to place the vector IV at the same place as the
original scalar increment (or at the end of the loop body?)

The AArch64 Advanced SIMD code is:

.L2:
        mov     v0.16b, v1.16b
        add     x2, x4, x0
        add     v1.4s, v1.4s, v6.4s
        add     x1, x3, x0
        add     x0, x0, 16
        ldr     q3, [x2, 16]
        and     v0.16b, v0.16b, v5.16b
        cmeq    v0.4s, v0.4s, v4.4s
        bsl     v0.16b, v3.16b, v2.16b
        str     q0, [x1, 16]
        cmp     x0, 384
        bne     .L2
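
For illustration only, here is a hand-written sketch of the ordering
suggested above, using GNU C vector extensions.  It is not vectorizer
output: the names v4si and f1_iv_at_end are made up for this example,
and it starts at element 0 rather than reproducing the offset seen in
the dump.  The only point is that the vector IV is incremented after
its last use in the body, so the old and new values need not be live
together.  The compiler remains free to reorder these statements, so
this shows the intended placement rather than guaranteeing it.

```c
#include <string.h>

/* Hypothetical 4 x int vector type (GNU C vector extension).  */
typedef int v4si __attribute__ ((vector_size (16)));

int x[100], y[100];

/* Scalar reference: the loop from the report.  */
void
f1 (void)
{
  for (int i = 0; i < 100; ++i)
    x[i] = (i & 11) == 2 ? y[i] : 1;
}

/* Hand-vectorized sketch with the vector IV increment placed last.  */
void
f1_iv_at_end (void)
{
  v4si iv = { 0, 1, 2, 3 };
  const v4si step = { 4, 4, 4, 4 };
  const v4si mask = { 11, 11, 11, 11 };
  const v4si two = { 2, 2, 2, 2 };
  const v4si one = { 1, 1, 1, 1 };

  for (int i = 0; i < 100; i += 4)
    {
      v4si vy, res, cmp;

      /* memcpy avoids any alignment assumption on y.  */
      memcpy (&vy, &y[i], sizeof vy);

      /* Vector comparison yields -1 in true lanes and 0 elsewhere.  */
      cmp = (iv & mask) == two;

      /* Lane-wise select: y[i] where the condition holds, else 1.  */
      res = (vy & cmp) | (one & ~cmp);
      memcpy (&x[i], &res, sizeof res);

      /* Increment after the last use of the old IV value.  */
      iv += step;
    }
}
```

Both functions should leave x with the same contents for any y, which
gives a quick way to check the sketch against the scalar loop.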