Hi All,

The vectorizer now tries to maintain the target VF that a user wanted by
increasing the unroll factor if the user used #pragma GCC unroll and we've
vectorized the loop.
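
As a rough illustration of that relationship, here is a minimal sketch. It is
not the vectorizer's code: sketch_unroll_factor is a hypothetical name, and the
division rule is only inferred from the two examples in this message (unroll 4
with V4SI needs no extra unrolling, unroll 8 needs two vectors per iteration).

```c
#include <assert.h>

/* Hypothetical helper, not a GCC function.  Given the count from
   "#pragma GCC unroll N" and the number of lanes the chosen vector mode
   handles per iteration, return how many copies of the vector body are
   needed to cover N scalar iterations.  The rule is an assumption
   inferred from the examples in this message.  */
static unsigned int
sketch_unroll_factor (unsigned int requested, unsigned int lanes)
{
  /* Both inputs must be non-zero for the division to make sense.  */
  assert (requested != 0 && lanes != 0);

  /* One vector already covers the request: no extra unrolling.  */
  if (requested <= lanes)
    return 1;

  /* Otherwise repeat the vector body until N elements are covered.  */
  return requested / lanes;
}
```

With V4SI (four 32-bit lanes), sketch_unroll_factor (4, 4) gives 1 and
sketch_unroll_factor (8, 4) gives 2.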
This change makes the AArch64 backend honor this initial value being set by the
vectorizer.

Consider the loop

void f1 (int *restrict a, int n)
{
#pragma GCC unroll 4
  for (int i = 0; i < n; i++)
    a[i] *= 2;
}

The target can then choose to create multiple epilogues to deal with the "rest".

The example above now generates:

.L4:
        ldr     q31, [x2]
        add     v31.4s, v31.4s, v31.4s
        str     q31, [x2], 16
        cmp     x2, x3
        bne     .L4

as V4SI maintains the requested VF, but e.g. #pragma GCC unroll 8 generates:

.L4:
        ldp     q30, q31, [x2]
        add     v30.4s, v30.4s, v30.4s
        add     v31.4s, v31.4s, v31.4s
        stp     q30, q31, [x2], 32
        cmp     x3, x2
        bne     .L4

Note that as a follow-up I plan on looking into asking the vectorizer to
generate multiple epilogues when we do unroll like this. Atm I added a TODO
since e.g. for early break we don't support vector epilogues yet.

Bootstrapped and regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
x86_64-pc-linux-gnu -m32, -m64 and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64.cc
	(aarch64_vector_costs::determine_suggested_unroll_factor): Use
	m_suggested_unroll_factor instead of 1.
	(aarch64_vector_costs::finish_cost): Add todo for epilogues.

---
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9e3f2885bccb62550c5fcfdf93d72fbc2e63233e..cf6f56a08d67044c8dc34578902eb4cb416641bd 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18075,7 +18075,7 @@ aarch64_vector_costs::determine_suggested_unroll_factor ()
   if (!sve && !TARGET_SVE2 && m_has_avg)
     return 1;
 
-  unsigned int max_unroll_factor = 1;
+  unsigned int max_unroll_factor = m_suggested_unroll_factor;
   for (auto vec_ops : m_ops)
     {
       aarch64_simd_vec_issue_info const *vec_issue
@@ -18293,6 +18293,8 @@ aarch64_vector_costs::finish_cost (const vector_costs *uncast_scalar_costs)
 					     m_costs[vect_body]);
       m_suggested_unroll_factor = determine_suggested_unroll_factor ();
 
+      /* TODO: Add support for multiple epilogues and costing for early break. */
+
       /* For gather and scatters there's an additional overhead for the first
 	 iteration. For low count loops they're not beneficial so model the
 	 overhead as loop prologue costs. */
--