On Thu, 25 Jun 2026, Pengfei Li wrote:
>
> On 25/06/2026 12:13, Richard Biener wrote:
> > On Thu, 25 Jun 2026, Pengfei Li wrote:
> >
> >> In loop vectorization analysis, the target can suggest an unroll factor
> >> based on the cost model to expose more ILP. However, when a loop has a
> >> known iteration count that is no greater than the current vectorization
> >> factor, the vectorized loop will be executed at most once. In this case,
> >> applying a suggested unroll factor greater than 1 only increases the
> >> code size and complexity of the loop body.
> >>
> >> The testcase added in this patch has a fixed 16-iteration byte SAD loop.
> >> When compiling it on some AArch64 SVE targets, the cost model suggests
> >> an unroll factor of 4 even though one vector iteration in VNx16QI mode
> >> covers all 16 scalar iterations. The extra unrolled chunks are fully
> >> masked off and redundant.
> >>
> >> This fixes the issue by resetting the suggested unroll factor when the
> >> iteration count is known to be no greater than the current VF.
> > Huh, but we have already analyzed the loop with unroll factor == 1
> > so failing here is exactly what we want?
>
> Just to make sure I understand: are you saying that this is the wrong place to
> reject the suggested unroll factor?

I'm saying that the factor will be rejected by costing when re-analyzing
the loop with the suggested unroll factor. Does it not?

> I put the check here to make it next to an existing max-VF check, but do you
> think this should be handled by just skipping the vect re-analysis with unroll
> factor > 1 in an outer function like vect_analyze_loop_1() ?

No, I'm saying we should do what the target asks us to, and I think
we'll reject it outright anyway via an existing check. Yes, we do
extra work, but then the fix belongs in the target: it should not
suggest such unrolling when it's obviously nonsense.
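
To make the arithmetic concrete, here is a minimal sketch of the kind of
clamp a target could apply before suggesting an unroll factor. It is not
GCC code: clamp_unroll_factor and its parameters are hypothetical, and it
uses plain integers where the vectorizer works with poly_uint64 values, so
treat it only as an illustration of the condition under discussion.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: return the unroll factor a
   target might suggest once a known iteration count is taken into
   account.  NITERS == 0 stands for "iteration count not known at
   compile time"; VF is the vectorization factor in scalar iterations
   (a plain integer here, a simplification of GCC's poly_uint64).  */
unsigned
clamp_unroll_factor (unsigned long niters, unsigned long vf,
		     unsigned suggested)
{
  assert (vf > 0);

  /* If one vector iteration already covers every scalar iteration,
     any further unrolled copies would be fully masked off.  */
  if (niters != 0 && niters <= vf)
    return 1;

  return suggested;
}
```

With the 16-iteration testcase and a VF of 16, clamp_unroll_factor (16, 16, 4)
yields 1, whereas a long-running loop keeps the suggested factor.
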
Richard.
>
> Thanks,
> Pengfei
>
> >> Bootstrapped and tested on aarch64-linux-gnu and x86_64-linux-gnu.
> >>
> >> gcc/ChangeLog:
> >>
> >> * tree-vect-loop.cc (vect_estimate_min_profitable_iters): Reset
> >> the suggested unroll factor for small iteration-count loops.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.target/aarch64/sve/vect-no-unroll-1.c: New test.
> >> ---
> >> .../gcc.target/aarch64/sve/vect-no-unroll-1.c | 17 +++++++++
> >> gcc/tree-vect-loop.cc | 38 +++++++++++++------
> >> 2 files changed, 43 insertions(+), 12 deletions(-)
> >> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c
> >>
> >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c
> >> new file mode 100644
> >> index 00000000000..7dfa851a1da
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c
> >> @@ -0,0 +1,17 @@
> >> +/* Check that a small-niters loop is not over-unrolled. */
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -mtune=neoverse-v2 -mautovec-preference=sve-only" } */
> >> +
> >> +#include <stdint.h>
> >> +#include <stdlib.h>
> >> +
> >> +int
> >> +foo (uint8_t *p1, uint8_t *p2)
> >> +{
> >> + int sum = 0;
> >> + for (int i = 0; i < 16; i++)
> >> + sum += abs (p1[i] - p2[i]);
> >> + return sum;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-not {\tld1b\t[^\n]*, mul vl} } } */
> >> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> >> index 0167d52b28b..45e5eca5724 100644
> >> --- a/gcc/tree-vect-loop.cc
> >> +++ b/gcc/tree-vect-loop.cc
> >> @@ -4420,19 +4420,33 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
> >> *suggested_unroll_factor
> >> = loop_vinfo->vector_costs->suggested_unroll_factor ();
> >> - if (suggested_unroll_factor && *suggested_unroll_factor > 1
> >> - && LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo) != MAX_VECTORIZATION_FACTOR
> >> - && !known_le (LOOP_VINFO_VECT_FACTOR (loop_vinfo) *
> >> - *suggested_unroll_factor,
> >> - LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo)))
> >> + if (suggested_unroll_factor && *suggested_unroll_factor > 1)
> >> {
> >> - if (dump_enabled_p ())
> >> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >> - "can't unroll as unrolled vectorization factor larger"
> >> - " than maximum vectorization factor: "
> >> - HOST_WIDE_INT_PRINT_UNSIGNED "\n",
> >> - LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo));
> >> - *suggested_unroll_factor = 1;
> >> + if (LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo) != MAX_VECTORIZATION_FACTOR
> >> + && !known_le (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
> >> + * *suggested_unroll_factor,
> >> + LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo)))
> >> + {
> >> + if (dump_enabled_p ())
> >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >> + "can't unroll as unrolled vectorization factor "
> >> + "larger than maximum vectorization factor: "
> >> + HOST_WIDE_INT_PRINT_UNSIGNED "\n",
> >> + LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo));
> >> + *suggested_unroll_factor = 1;
> >> + }
> >> + else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >> + && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> >> + LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
> >> + {
> >> + if (dump_enabled_p ())
> >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >> + "can't unroll as the loop iteration count is "
> >> + "no greater than the vectorization factor: "
> >> + HOST_WIDE_INT_PRINT_UNSIGNED "\n",
> >> + LOOP_VINFO_INT_NITERS (loop_vinfo));
> >> + *suggested_unroll_factor = 1;
> >> + }
> >> }
> >>
> >> vec_outside_cost = (int)(vec_prologue_cost + vec_epilogue_cost);
> >>
>
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)