Hi All,
The vectorizer now tries to maintain the VF that the user requested by
increasing the unroll factor when the user used #pragma GCC unroll and the
loop has been vectorized.

This change makes the AArch64 backend honor the initial value set by the
vectorizer.
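
To illustrate why the starting value matters, here is a simplified model. It is not the GCC implementation: it assumes only that the function accumulates a maximum over per-operation candidates, as the name max_unroll_factor suggests. The names suggest_unroll and per_op are hypothetical.

```c
#include <stddef.h>

/* Simplified model, NOT the GCC code.  SEED plays the role of the value
   the vectorizer stored in m_suggested_unroll_factor; PER_OP holds
   hypothetical per-operation unroll candidates.  */
static unsigned int
suggest_unroll (unsigned int seed, const unsigned int *per_op, size_t n)
{
  /* Starting from SEED instead of 1 makes the seed a lower bound.  */
  unsigned int max_unroll = seed;

  for (size_t i = 0; i < n; i++)
    if (per_op[i] > max_unroll)
      max_unroll = per_op[i];

  return max_unroll;
}
```

In this model a seed of 2 with candidates {1, 1} yields 2, whereas the old starting value of 1 would yield 1 and drop the user's request.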
Consider the loop

void f1 (int *restrict a, int n)
{
#pragma GCC unroll 4
  for (int i = 0; i < n; i++)
    a[i] *= 2;
}

The target can then choose to create multiple epilogues to deal with the "rest".
The example above now generates:
.L4:
        ldr     q31, [x2]
        add     v31.4s, v31.4s, v31.4s
        str     q31, [x2], 16
        cmp     x2, x3
        bne     .L4

as V4SI maintains the requested VF, but e.g. #pragma GCC unroll 8 generates:

.L4:
        ldp     q30, q31, [x2]
        add     v30.4s, v30.4s, v30.4s
        add     v31.4s, v31.4s, v31.4s
        stp     q30, q31, [x2], 32
        cmp     x3, x2
        bne     .L4
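
The relationship between the requested unroll count and the number of vectors processed per iteration in the two listings can be sketched as simple arithmetic. This is an illustration only, assuming 128-bit Advanced SIMD vectors holding 32-bit elements. The helper vectors_per_iteration is hypothetical and not part of GCC, and rounding down for counts that are not a multiple of the lane count is likewise an assumption.

```c
/* Hypothetical helper, NOT GCC code: number of vectors handled per loop
   iteration so that REQUESTED scalar iterations are covered, assuming
   each vector holds LANES elements.  */
static unsigned int
vectors_per_iteration (unsigned int requested, unsigned int lanes)
{
  /* A single vector already covers requests up to LANES elements.  */
  if (requested <= lanes)
    return 1;

  /* Assumption: round down when REQUESTED is not a multiple of LANES.  */
  return requested / lanes;
}
```

With lanes = 128 / 32 = 4 (V4SI), a request of 4 gives one vector per iteration, matching the ldr/str loop, and a request of 8 gives two, matching the ldp/stp loop.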

Note that as a follow-up I plan to look into asking the vectorizer to
generate multiple epilogues when we unroll like this.  For now I have added a
TODO, since e.g. for early break we don't support vector epilogues yet.

Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf and x86_64-pc-linux-gnu (-m32 and -m64) with no
issues.

Ok for master?
Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64.cc
	(aarch64_vector_costs::determine_suggested_unroll_factor): Use
	m_suggested_unroll_factor instead of 1.
	(aarch64_vector_costs::finish_cost): Add todo for epilogues.

---
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9e3f2885bccb62550c5fcfdf93d72fbc2e63233e..cf6f56a08d67044c8dc34578902eb4cb416641bd 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18075,7 +18075,7 @@ aarch64_vector_costs::determine_suggested_unroll_factor ()
if (!sve && !TARGET_SVE2 && m_has_avg)
return 1;
- unsigned int max_unroll_factor = 1;
+ unsigned int max_unroll_factor = m_suggested_unroll_factor;
for (auto vec_ops : m_ops)
{
aarch64_simd_vec_issue_info const *vec_issue
@@ -18293,6 +18293,8 @@ aarch64_vector_costs::finish_cost (const vector_costs *uncast_scalar_costs)
m_costs[vect_body]);
m_suggested_unroll_factor = determine_suggested_unroll_factor ();
+ /* TODO: Add support for multiple epilogues and costing for early break. */
+
/* For gather and scatters there's an additional overhead for the first
iteration. For low count loops they're not beneficial so model the
overhead as loop prologue costs. */