> > > > > > > > - /* Loops vectorized with a variable factor won't benefit from > > > > + /* Loops vectorized would have already taken into account unrolling > specified > > > > + by the user as the suggested unroll factor, as such we need to > > > > prevent the > > > > + RTL unroller from unrolling twice. The only exception is static > > > > known > > > > + iterations where we would have expected the loop to be fully > > > > unrolled. > > > > + Loops vectorized with a variable factor won't benefit from > > > > unrolling/peeling. */ > > > > - if (!vf.is_constant ()) > > > > + if (LOOP_VINFO_USER_UNROLL (loop_vinfo) > > > > > > ... this is the transform phase - is LOOP_VINFO_USER_UNROLL copied > > > from the earlier attempt? > > > > Ah, I see I forgot to copy it when the loop_vinfo is copied.. Will fix. > >
I've been looking more into the behavior and I think it's correct not to copy
it from an earlier attempt.
The flag would be re-set every time during vect_estimate_min_profitable_iters
as we have to recalculate
the unroll based on the assumed_vf.
When vect_analyze_loop_2 initializes the costing structure, we just set it
again during vect_analyze_loop_costing
as loop->unroll is not cleared until vectorization succeeds.
For the epilogue it would be false, which I think makes sense as the epilogues
should determine their VF solely
based on that of the previous attempt? Because I think it makes sense that the
epilogues should be able to tell
the vectorizer that it wants to re-use the same mode for the next attempt, just
without the unrolling.
> In the end whatever we do it's going to be a matter of documenting
> the interaction between vectorization and #pragma GCC unroll.
>
Docs added
> The way you handle it is reasonable, the question is whether to
> set loop->unroll to 1 in the end (disable any further unrolling)
> or to 0 (only auto-unroll based on heuristics). I'd argue 0
> makes more sense - iff we chose to apply the extra unrolling
> during vectorization.
0 does make more sense to me as well. I think where we got crossed earlier was
that I was mentioning that
Having unroll > 1 after this wasn't a good idea, so was a miscommunication. But
0 makes much sense.
Bootstrapped Regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
-m32, -m64 and no issues.
Ok for master?
Thanks,
Tamar
gcc/ChangeLog:
* tree-vectorizer.h (vector_costs::set_suggested_unroll_factor,
LOOP_VINFO_USER_UNROLL): New.
(class _loop_vec_info): Add user_unroll.
* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Set
suggested_unroll_factor before calling backend costing.
(_loop_vec_info::_loop_vec_info): Initialize user_unroll.
(vect_transform_loop): Clear the loop->unroll value if the pragma was
used.
doc/extend.texi (pragma unroll): Document vectorizer interaction.
-- inline copy of patch --
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index
e87a3c271f8420d8fd175823b5bb655f76c89afe..f8261d13903afc90d3341c09ab3fdbd0ab96ea49
100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -10398,6 +10398,11 @@ unrolled @var{n} times regardless of any commandline
arguments.
When the option is @var{preferred} then the user is allowed to override the
unroll amount through commandline options.
+If the loop was vectorized the unroll factor specified will be used to seed the
+vectorizer unroll factor. Whether the loop is unrolled or not will be
+determined by target costing. The resulting vectorized loop may still be
+unrolled more in later passes depending on the target costing.
+
@end table
@node Thread-Local
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index
fe6f3cf188e40396b299ff9e814cc402bc2d4e2d..1fbf92b5f4b176ada7379930b73ab503fb423e99
100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1073,6 +1073,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in,
vec_info_shared *shared)
peeling_for_gaps (false),
peeling_for_niter (false),
early_breaks (false),
+ user_unroll (false),
no_data_dependencies (false),
has_mask_store (false),
scalar_loop_scaling (profile_probability::uninitialized ()),
@@ -4983,6 +4984,26 @@ vect_estimate_min_profitable_iters (loop_vec_info
loop_vinfo,
}
}
+ /* Seed the target cost model with what the user requested if the unroll
+ factor is larger than 1 vector VF. */
+ auto user_unroll = LOOP_VINFO_LOOP (loop_vinfo)->unroll;
+ if (user_unroll > 1)
+ {
+ LOOP_VINFO_USER_UNROLL (loop_vinfo) = true;
+ int unroll_fact = user_unroll / assumed_vf;
+ unroll_fact = 1 << ceil_log2 (unroll_fact);
+ if (unroll_fact > 1)
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "setting unroll factor to %d based on user requested "
+ "unroll factor %d and suggested vectorization "
+ "factor: %d\n",
+ unroll_fact, user_unroll, assumed_vf);
+ loop_vinfo->vector_costs->set_suggested_unroll_factor (unroll_fact);
+ }
+ }
+
/* Complete the target-specific cost calculations. */
loop_vinfo->vector_costs->finish_cost (loop_vinfo->scalar_costs);
vec_prologue_cost = loop_vinfo->vector_costs->prologue_cost ();
@@ -12373,6 +12394,13 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple
*loop_vectorized_call)
dump_printf_loc (MSG_NOTE, vect_location, "Disabling unrolling due to"
" variable-length vectorization factor\n");
}
+
+ /* When we have unrolled the loop due to a user requested value we should
+ leave it up to the RTL unroll heuristics to determine if it's still worth
+ while to unroll more. */
+ if (LOOP_VINFO_USER_UNROLL (loop_vinfo))
+ loop->unroll = 0;
+
/* Free SLP instances here because otherwise stmt reference counting
won't work. */
slp_instance instance;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index
a2f33a5ecd60288fe7f28ee639ff8b6a77667796..5675309ab16dc7f7f0b98e229e4798075e6cdd7e
100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -970,6 +970,10 @@ public:
/* Main loop IV cond. */
gcond* loop_iv_cond;
+ /* True if we have an unroll factor requested by the user through pragma GCC
+ unroll. */
+ bool user_unroll;
+
/* True if there are no loop carried data dependencies in the loop.
If loop->safelen <= 1, then this is always true, either the loop
didn't have any loop carried data dependencies, or the loop is being
@@ -1094,6 +1098,7 @@ public:
#define LOOP_VINFO_CHECK_UNEQUAL_ADDRS(L) (L)->check_unequal_addrs
#define LOOP_VINFO_CHECK_NONZERO(L) (L)->check_nonzero
#define LOOP_VINFO_LOWER_BOUNDS(L) (L)->lower_bounds
+#define LOOP_VINFO_USER_UNROLL(L) (L)->user_unroll
#define LOOP_VINFO_GROUPED_STORES(L) (L)->grouped_stores
#define LOOP_VINFO_SLP_INSTANCES(L) (L)->slp_instances
#define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor
@@ -1693,6 +1698,7 @@ public:
unsigned int outside_cost () const;
unsigned int total_cost () const;
unsigned int suggested_unroll_factor () const;
+ void set_suggested_unroll_factor (unsigned int);
machine_mode suggested_epilogue_mode () const;
protected:
@@ -1791,6 +1797,15 @@ vector_costs::suggested_unroll_factor () const
return m_suggested_unroll_factor;
}
+/* Set the suggested unroll factor. */
+
+inline void
+vector_costs::set_suggested_unroll_factor (unsigned unroll_fact)
+{
+ gcc_checking_assert (!m_finished);
+ m_suggested_unroll_factor = unroll_fact;
+}
+
/* Return the suggested epilogue mode. */
inline machine_mode
rb19435.patch
Description: rb19435.patch
