During loop vectorization analysis, the target's cost model can suggest
an unroll factor to expose more instruction-level parallelism (ILP).
However, when a loop has a known iteration count that is no greater than
the current vectorization factor (VF), the vectorized loop body executes
at most once. In that case, applying a suggested unroll factor greater
than 1 only increases the code size and complexity of the loop body.

The testcase added in this patch is a byte sum-of-absolute-differences
(SAD) loop with a fixed trip count of 16. When compiling it for some
AArch64 SVE targets (the test uses -mtune=neoverse-v2), the cost model
suggests an unroll factor of 4 even though one vector iteration in
VNx16QI mode covers all 16 scalar iterations. The three extra unrolled
copies are fully masked off and therefore redundant.
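
As a rough illustration of the arithmetic above, the following C sketch
counts how many lanes are active in each unrolled copy. It assumes the
minimum SVE vector length of 128 bits, so that a VNx16QI vector holds 16
byte lanes. `active_lanes` is a hypothetical helper written for this
message, not a GCC function.

```c
#include <assert.h>

/* Hypothetical helper: return the number of active lanes in unrolled
   copy CHUNK (0-based) when a loop with NITERS scalar iterations runs
   with LANES lanes per vector.  */
unsigned
active_lanes (unsigned niters, unsigned lanes, unsigned chunk)
{
  assert (lanes > 0);

  /* First scalar iteration handled by this copy.  */
  unsigned long long start = (unsigned long long) chunk * lanes;
  if (start >= niters)
    return 0;	/* Fully masked off.  */

  unsigned long long remaining = niters - start;
  return remaining < lanes ? (unsigned) remaining : lanes;
}
```

With niters = 16 and 16 lanes, copy 0 has 16 active lanes and copies 1
to 3 have none, so an unroll factor of 4 only adds three copies whose
predicates are all false.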

This patch fixes the issue by resetting the suggested unroll factor to 1
when the iteration count is known to be no greater than the current VF.
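
The new check can be sketched in isolation as follows. This is a
simplified model, not the code in tree-vect-loop.cc: `struct loop_info`
and `effective_unroll_factor` are hypothetical names, and the VF is
assumed to be a plain integer, whereas GCC represents it as a poly_uint64
and compares it with known_le so that the test must hold for every
possible vector length.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the loop state the vectorizer consults.  */
struct loop_info
{
  bool niters_known;	/* Is the iteration count a compile-time constant?  */
  unsigned niters;	/* Iteration count, valid if NITERS_KNOWN.  */
  unsigned vf;		/* Vectorization factor (plain integer here).  */
};

/* Return the unroll factor to use given the target's SUGGESTED factor.
   Reset it to 1 when a single vector iteration already covers the
   whole loop.  */
unsigned
effective_unroll_factor (const struct loop_info *loop, unsigned suggested)
{
  assert (loop->vf > 0 && suggested > 0);

  if (suggested > 1 && loop->niters_known && loop->niters <= loop->vf)
    return 1;

  return suggested;
}
```

For the testcase (niters = 16, VF = 16) a suggested factor of 4 is reset
to 1. A loop with 64 known iterations, or with an unknown iteration
count, keeps the suggested factor.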

Bootstrapped and tested on aarch64-linux-gnu and x86_64-linux-gnu.

gcc/ChangeLog:

	* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Reset
	the suggested unroll factor for small iteration-count loops.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/vect-no-unroll-1.c: New test.
---
.../gcc.target/aarch64/sve/vect-no-unroll-1.c | 17 +++++++++
gcc/tree-vect-loop.cc | 38 +++++++++++++------
2 files changed, 43 insertions(+), 12 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c
new file mode 100644
index 00000000000..7dfa851a1da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c
@@ -0,0 +1,17 @@
+/* Check that a small-niters loop is not over-unrolled. */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=neoverse-v2 -mautovec-preference=sve-only" } */
+
+#include <stdint.h>
+#include <stdlib.h>
+
+int
+foo (uint8_t *p1, uint8_t *p2)
+{
+  int sum = 0;
+  for (int i = 0; i < 16; i++)
+    sum += abs (p1[i] - p2[i]);
+  return sum;
+}
+
+/* { dg-final { scan-assembler-not {\tld1b\t[^\n]*, mul vl} } } */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 0167d52b28b..45e5eca5724 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -4420,19 +4420,33 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
*suggested_unroll_factor
= loop_vinfo->vector_costs->suggested_unroll_factor ();
- if (suggested_unroll_factor && *suggested_unroll_factor > 1
- && LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo) != MAX_VECTORIZATION_FACTOR
- && !known_le (LOOP_VINFO_VECT_FACTOR (loop_vinfo) *
- *suggested_unroll_factor,
- LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo)))
+ if (suggested_unroll_factor && *suggested_unroll_factor > 1)
{
- if (dump_enabled_p ())
- dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
- "can't unroll as unrolled vectorization factor larger"
- " than maximum vectorization factor: "
- HOST_WIDE_INT_PRINT_UNSIGNED "\n",
- LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo));
- *suggested_unroll_factor = 1;
+ if (LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo) != MAX_VECTORIZATION_FACTOR
+ && !known_le (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+ * *suggested_unroll_factor,
+ LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo)))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "can't unroll as unrolled vectorization factor "
+ "larger than maximum vectorization factor: "
+ HOST_WIDE_INT_PRINT_UNSIGNED "\n",
+ LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo));
+ *suggested_unroll_factor = 1;
+ }
+ else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+ && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
+ LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "can't unroll as the loop iteration count is "
+ "no greater than the vectorization factor: "
+ HOST_WIDE_INT_PRINT_UNSIGNED "\n",
+ LOOP_VINFO_INT_NITERS (loop_vinfo));
+ *suggested_unroll_factor = 1;
+ }
}
vec_outside_cost = (int)(vec_prologue_cost + vec_epilogue_cost);
--
2.43.0