When we do loop masking via mask or length a single scalar iteration
should be sufficient to avoid excess accesses. This fixes the last
known FAILs with --param vect-force-slp=1.
Bootstrap and regtest running on x86_64-unknown-linux-gnu.
Do we know of a case where the peeling isn't sufficient with VL vectors?
The CI will probably fail because of dependent patches I just pushed :/
Thanks,
Richard.
PR tree-optimization/117558
* tree-vect-stmts.cc (get_group_load_store_type): Exempt
VL vector types from not sufficient gap peeling testing.
---
gcc/tree-vect-stmts.cc | 41 +++++++++++++++++++----------------------
1 file changed, 19 insertions(+), 22 deletions(-)
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index d3552266eee..b5f90803eed 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2181,33 +2181,30 @@ get_group_load_store_type (vec_info *vinfo,
stmt_vec_info stmt_info,
/* Peeling for gaps assumes that a single scalar iteration
is enough to make sure the last vector iteration doesn't
- access excess elements. */
+ access excess elements. For variable-length vectors the
+ required loop masking ensures a single iteration is always
+ sufficient. */
+ unsigned HOST_WIDE_INT cnunits, cvf, cremain, cpart_size;
if (overrun_p
- && (!can_div_trunc_p (group_size
- * LOOP_VINFO_VECT_FACTOR (loop_vinfo) - gap,
- nunits, &tem, &remain)
- || maybe_lt (remain + group_size, nunits)))
- {
+ && nunits.is_constant (&cnunits)
+ && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&cvf)
+ && ((cremain = (group_size * cvf - gap) % cnunits), true)
+ && cremain + group_size < cnunits
/* But peeling a single scalar iteration is enough if
we can use the next power-of-two sized partial
access and that is sufficiently small to be covered
by the single scalar iteration. */
- unsigned HOST_WIDE_INT cnunits, cvf, cremain, cpart_size;
- if (!nunits.is_constant (&cnunits)
- || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&cvf)
- || (((cremain = (group_size * cvf - gap) % cnunits), true)
- && ((cpart_size = (1 << ceil_log2 (cremain))), true)
- && (cremain + group_size < cpart_size
- || vector_vector_composition_type
- (vectype, cnunits / cpart_size,
- &half_vtype) == NULL_TREE)))
- {
- if (dump_enabled_p ())
- dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
- "peeling for gaps insufficient for "
- "access\n");
- return false;
- }
+ && ((cpart_size = (1 << ceil_log2 (cremain))), true)
+ && (cremain + group_size < cpart_size
+ || vector_vector_composition_type
+ (vectype, cnunits / cpart_size,
+ &half_vtype) == NULL_TREE))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "peeling for gaps insufficient for "
+ "access\n");
+ return false;
}
}
}
--
2.43.0