On Fri, 16 May 2025, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > Targets recently got the ability to request the vector mode to be
> > used for a vector epilogue (or the epilogue of a vector epilogue).  The
> > following adds the ability for it to indicate the epilogue should use
> > loop masking, irrespective of the --param vect-partial-vector-usage
> > setting.
>
> Is overriding the --param a good idea?  I can imagine we'd eventually
> want a --param to override the override of the --param. :)

;)  So for x86 my plan is to use options_set to check whether the user
explicitly specified the --param and, in that case, honor it.  But I
don't want the default of the --param to affect all loops in the same
way, so the default for x86 would be to let target heuristics decide.

> > The simple prototype below uses a separate flag from the epilogue
> > mode, but I wonder how we want to more generally handle
> > whether to use masking or not when iterating over modes.  Currently
> > we mostly rely on --param vect-partial-vector-usage.  aarch64
> > and riscv have both variable-length modes but also fixed-size modes
> > where for the latter, like on x86, the target couldn't request
> > a mode specifically with or without masking.  It seems both
> > aarch64 and riscv fully rely on cost comparison and fully
> > exploiting the mode iteration space (but not masked vs. non-masked?!)
> > here?
> >
> > I was thinking of adding a vectorization_mode class that would
> > encapsulate the mode and whether to allow masking or alternatively
> > to make the vector_modes array (and the m_suggested_epilogue_mode)
> > a std::pair of mode and mask flag?
>
> Predicated vs. non-predicated SVE is interesting for the main loop.
> The class sounds like it would be useful for that.
>
> I suppose predicated vs. non-predicated SVE is also potentially
> interesting for an unrolled epilogue, although there, it would in
> theory be better to predicate only the last vector iteration
> (i.e. part predicated, part unpredicated).

Yes, the latter is what we want for AVX512: keep the main loop
unpredicated but have the epilogue predicated (using the same VF).

> So I suppose unpredicated SVE epilogue loops might be interesting
> until that partial predication is implemented, but I'm not sure how
> useful unpredicated SVE epilogue loops would be "once" the partial
> predication is supported.
>
> I don't imagine we'll often know a priori for AArch64 which type
> of vector epilogue is best.  Since switching between SVE and
> Advanced SIMD is assumed to be essentially free, I think we'll
> still rely on the current approach of costing both and seeing
> which is cheaper.

So the other case we might run into on x86 is a known loop trip count
where fully vectorizing the epilogue is still not possible: while we
have half-SSE modes, like V8QImode, we don't have V4QI or V2QI, so even
with multiple epilogues we'd still end up with an iterating scalar
epilogue.  Those cases might be good candidates for a predicated
epilogue as well.  So in the end we'd prefer branchless epilogues.

Predication on x86 is quite a bit more expensive, so I don't see us
using a predicated main vector loop anytime soon, and I'd expect that
to be the case for all archs when using a fixed-size mode?  Is that
the case for -msve-vector-bits=X as well?  Is there an advantage to
not using a predicated main vector loop?

Richard.

> Thanks,
> Richard
>
> >
> > For the x86 case going the prototype way would be sufficient, we
> > wouldn't want to say use a masked AVX epilogue for an AVX512 loop,
> > so any further iteration on epilogue modes if the requested mode
> > would fail to vectorize is OK to be unmasked.
> >
> > Any comments on this?  You are not yet using m_suggested_epilogue_mode
> > to get more than one vector epilogue, this might be a way to add
> > heuristics when to use a masked epilogue.
> >
> > Thanks,
> > Richard.
> >
> >         * tree-vectorizer.h (vector_costs::suggested_epilogue_mode):
> >         Add masked output parameter and return m_masked_epilogue.
> >         (vector_costs::m_masked_epilogue): New tristate flag.
> >         (vector_costs::vector_costs): Initialize m_masked_epilogue.
> >         * tree-vect-loop.cc (vect_analyze_loop_1): Pass in masked
> >         flag to optionally initialize can_use_partial_vectors_p.
> >         (vect_analyze_loop): For epilogues also get whether to use
> >         a masked epilogue for this loop from the target and use
> >         that for the first epilogue mode we try.
> > ---
> >  gcc/tree-vect-loop.cc | 29 +++++++++++++++++++++--------
> >  gcc/tree-vectorizer.h | 12 +++++++++---
> >  2 files changed, 30 insertions(+), 11 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 2d1a6883e6b..4af510ff20c 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -3407,6 +3407,7 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared,
> >                       const vect_loop_form_info *loop_form_info,
> >                       loop_vec_info orig_loop_vinfo,
> >                       const vector_modes &vector_modes, unsigned &mode_i,
> > +                     int masked_p,
> >                       machine_mode &autodetected_vector_mode,
> >                       bool &fatal)
> >  {
> > @@ -3415,6 +3416,8 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared,
> >
> >    machine_mode vector_mode = vector_modes[mode_i];
> >    loop_vinfo->vector_mode = vector_mode;
> > +  if (masked_p != -1)
> > +    loop_vinfo->can_use_partial_vectors_p = masked_p;
> >    unsigned int suggested_unroll_factor = 1;
> >    unsigned slp_done_for_suggested_uf = 0;
> >
> > @@ -3580,7 +3583,7 @@ vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
> >        cached_vf_per_mode[last_mode_i] = -1;
> >        opt_loop_vec_info loop_vinfo
> >          = vect_analyze_loop_1 (loop, shared, &loop_form_info,
> > -                               NULL, vector_modes, mode_i,
> > +                               NULL, vector_modes, mode_i, -1,
> >                                 autodetected_vector_mode, fatal);
> >        if (fatal)
> >          break;
> > @@ -3665,19 +3668,24 @@ vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
> >       array may contain length-agnostic and length-specific modes.  Their
> >       ordering is not guaranteed, so we could end up picking a mode for the main
> >       loop that is after the epilogue's optimal mode.  */
> > +  int masked_p = -1;
> >    if (!unlimited_cost_model (loop)
> > -      && first_loop_vinfo->vector_costs->suggested_epilogue_mode () != VOIDmode)
> > +      && (first_loop_vinfo->vector_costs->suggested_epilogue_mode (masked_p)
> > +          != VOIDmode))
> >      {
> >        vector_modes[0]
> > -        = first_loop_vinfo->vector_costs->suggested_epilogue_mode ();
> > +        = first_loop_vinfo->vector_costs->suggested_epilogue_mode (masked_p);
> >        cached_vf_per_mode[0] = 0;
> >      }
> >    else
> >      vector_modes[0] = autodetected_vector_mode;
> >    mode_i = 0;
> >
> > -  bool supports_partial_vectors =
> > -    partial_vectors_supported_p () && param_vect_partial_vector_usage != 0;
> > +  /* ???  If we'd compute LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P unconditionally
> > +     for the main loop we could re-use that as short-cut here.  Not if
> > +     the suggested epilogue mode is different, of course.  */
> > +  bool supports_partial_vectors = partial_vectors_supported_p () && ((param_vect_partial_vector_usage != 0 && masked_p != 0)
> > +     || masked_p == 1);
> >    poly_uint64 first_vinfo_vf = LOOP_VINFO_VECT_FACTOR (first_loop_vinfo);
> >
> >    loop_vec_info orig_loop_vinfo = first_loop_vinfo;
> > @@ -3707,7 +3715,7 @@ vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
> >        opt_loop_vec_info loop_vinfo
> >          = vect_analyze_loop_1 (loop, shared, &loop_form_info,
> >                                 orig_loop_vinfo,
> > -                               vector_modes, mode_i,
> > +                               vector_modes, mode_i, masked_p,
> >                                 autodetected_vector_mode, fatal);
> >        if (fatal)
> >          break;
> > @@ -3738,6 +3746,10 @@ vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
> >            break;
> >          }
> >
> > +      /* ???  x86 does not have masked / unmasked modes, not sure
> > +         if SVE fixed-size modes would support masking?  That said,
> > +         the modes array should be changed to pair(mode, masked_p).  */
> > +      masked_p = -1;
> >        if (mode_i == vector_modes.length ())
> >          break;
> >      }
> > @@ -3748,12 +3760,13 @@ vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
> >
> >    /* When we selected a first vectorized epilogue, see if the target
> >       suggests to have another one.  */
> > +  masked_p = -1;
> >    if (!unlimited_cost_model (loop)
> > -      && (orig_loop_vinfo->vector_costs->suggested_epilogue_mode ()
> > +      && (orig_loop_vinfo->vector_costs->suggested_epilogue_mode (masked_p)
> >            != VOIDmode))
> >      {
> >        vector_modes[0]
> > -        = orig_loop_vinfo->vector_costs->suggested_epilogue_mode ();
> > +        = orig_loop_vinfo->vector_costs->suggested_epilogue_mode (masked_p);
> >        cached_vf_per_mode[0] = 0;
> >        mode_i = 0;
> >      }
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index 7aa2b02b63c..6022bc0d55a 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -1693,7 +1693,7 @@ public:
> >    unsigned int outside_cost () const;
> >    unsigned int total_cost () const;
> >    unsigned int suggested_unroll_factor () const;
> > -  machine_mode suggested_epilogue_mode () const;
> > +  machine_mode suggested_epilogue_mode (int &masked) const;
> >
> >  protected:
> >    unsigned int record_stmt_cost (stmt_vec_info, vect_cost_model_location,
> > @@ -1717,8 +1717,12 @@ protected:
> >    unsigned int m_suggested_unroll_factor;
> >
> >    /* The suggested mode to be used for a vectorized epilogue or VOIDmode,
> > -     determined at finish_cost.  */
> > +     determined at finish_cost.  m_masked_epilogue is whether in case
> > +     of non-VOIDmode epilogue mode the epilogue should use masked
> > +     vectorization, regardless of --param vect-partial-vector-usage
> > +     setting.  If -1 then the --param setting takes precedence.  */
> >    machine_mode m_suggested_epilogue_mode;
> > +  int m_masked_epilogue;
> >
> >    /* True if finish_cost has been called.  */
> >    bool m_finished;
> > @@ -1734,6 +1738,7 @@ vector_costs::vector_costs (vec_info *vinfo, bool costing_for_scalar)
> >      m_costs (),
> >      m_suggested_unroll_factor(1),
> >      m_suggested_epilogue_mode(VOIDmode),
> > +    m_masked_epilogue (-1),
> >      m_finished (false)
> >  {
> >  }
> > @@ -1794,9 +1799,10 @@ vector_costs::suggested_unroll_factor () const
> >  /* Return the suggested epilogue mode.  */
> >
> >  inline machine_mode
> > -vector_costs::suggested_epilogue_mode () const
> > +vector_costs::suggested_epilogue_mode (int &masked_p) const
> >  {
> >    gcc_checking_assert (m_finished);
> > +  masked_p = m_masked_epilogue;
> >    return m_suggested_epilogue_mode;
> >  }
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
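
For readers following the tristate, here is a minimal stand-alone sketch of how the masked flag interacts with the --param, plus one possible shape of the x86 policy described in the reply.  It is not GCC code: the function names, the MASK_* constants and the boolean inputs are all hypothetical, and only the boolean expression in the first function is copied from the patched supports_partial_vectors computation.

```cpp
// Stand-alone model of the tristate in the patch.  All names here are
// hypothetical; nothing below is part of GCC.

// Values mirroring m_masked_epilogue / masked_p in the patch:
//   -1 = target has no preference, the --param decides
//    0 = target asks for an unmasked epilogue
//    1 = target asks for a masked epilogue
constexpr int MASK_DEFER = -1;
constexpr int MASK_OFF = 0;
constexpr int MASK_ON = 1;

// Same boolean expression as the patched supports_partial_vectors:
// target_supports stands in for partial_vectors_supported_p (),
// param_usage for param_vect_partial_vector_usage.
constexpr bool
epilogue_may_use_partial_vectors (bool target_supports, int param_usage,
                                  int masked_p)
{
  return target_supports
         && ((param_usage != 0 && masked_p != MASK_OFF)
             || masked_p == MASK_ON);
}

// Assumed shape of the x86 policy from the reply: if the user set the
// --param explicitly (the options_set check), defer to it; otherwise
// let a target heuristic pick masked or unmasked.
constexpr int
target_masked_epilogue (bool param_set_by_user, bool heuristic_wants_mask)
{
  return param_set_by_user ? MASK_DEFER
                           : (heuristic_wants_mask ? MASK_ON : MASK_OFF);
}
```

With these definitions, a target request of MASK_ON enables partial vectors even when the --param is 0, MASK_OFF disables them even when the --param is nonzero, and MASK_DEFER reproduces the pre-patch behaviour.  In every case the result is still false when the target does not support partial vectors at all.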