Richard Biener <rguent...@suse.de> writes:
> The following fixes the computation of supports_partial_vectors which
> is used to prune the set of modes to iterate over for epilog
> vectorization.  The used partial_vectors_supported_p predicate
> only looks for while_ult while we also support predication when
> mask modes are integer modes as for AVX512.
>
> I've noticed this isn't very effective on x86_64 anyway since
> if the main loop mode is autodetected we skip re-analyzing
> mode_i == 0, but then mode_i == 1 is usually the very same
> large mode.
>
> Thus I do wonder if we should instead always (or when
> --param vect-partial-vector-usage != 0, or when the target would
> support predication in principle) perform main loop analysis
> with partial vectors in mind (start with can_use_partial_vectors_p =
> true), but only at the end honor the --param when deciding on
> using_partial_vectors_p.  We can then remember can_use_partial_vectors_p
> for each analyzed mode and use that more specific info for the
> pruning?

Yeah, sounds like that could work.  In principle, epilogue loops should
be strictly easier to vectorise than main loops.  If you know that the
epilogue "loop" never iterates, there could in principle be cases where
we'd need to clear can_use_partial_vectors_p for the main loop but not
for the epilogue loop.  I can't think of any situation like that off-hand
though.  Likewise for unrolling.

> For the missed skipping we probably want to increment
> mode_i based on vect_chooses_same_modes_p, like we do in
> vect_analyze_loop_1.  I'll propose a patch for this - but this
> would regress --param vect-partial-vector-usage=1 on x86 without
> the patch below.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> OK?
>
>       * tree-vect-loop.cc (vect_analyze_loop): Consider AVX512
>       style masking when computing supports_partial_vectors.
> ---
>  gcc/tree-vect-loop.cc | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index c824b5abaaf..b91ef4a2325 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3742,8 +3742,15 @@ vect_analyze_loop (class loop *loop, gimple
> *loop_vectorized_call,
>    vector_modes[0] = autodetected_vector_mode;
>    mode_i = 0;
>
> -  bool supports_partial_vectors =
> -    partial_vectors_supported_p () && param_vect_partial_vector_usage != 0;
> +  bool supports_partial_vectors = param_vect_partial_vector_usage != 0;
> +  machine_mode mask_mode;
> +  if (supports_partial_vectors
> +      && !partial_vectors_supported_p ()
> +      && !(VECTOR_MODE_P (first_loop_vinfo->vector_mode)
> +           && targetm.vectorize.get_mask_mode
> +                (first_loop_vinfo->vector_mode).exists (&mask_mode)
> +           && SCALAR_INT_MODE_P (mask_mode)))
> +    supports_partial_vectors = false;

LGTM FWIW.  I suppose an alternative would be to do this check within the
loop and use vector_modes[mode_i] rather than first_loop_vinfo->vector_mode,
so that we test the mode that we intend to use.
But maybe the extra precision (if that's what it is) isn't useful in
practice.

Thanks,
Richard

>    poly_uint64 first_vinfo_vf = LOOP_VINFO_VECT_FACTOR (first_loop_vinfo);
>
>    loop_vec_info orig_loop_vinfo = first_loop_vinfo;
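
For reference, here is a rough standalone model of the condition in the
patch and of the per-mode alternative mentioned above.  It is only a sketch
and not GCC code: ModeInfo, Target, MaskKind and both functions are
hypothetical stand-ins invented for illustration.  has_while_ult plays the
role of partial_vectors_supported_p (), is_vector the role of
VECTOR_MODE_P, and the optional mask field the role of
targetm.vectorize.get_mask_mode (...).exists (&mask_mode) followed by
SCALAR_INT_MODE_P (mask_mode).

```cpp
// Standalone model of the predicate discussed in this thread.
// Nothing here is a GCC type or API; every name is a hypothetical stand-in.
#include <optional>
#include <vector>

// Models whether a mask mode is an integer mode (AVX512 style) or a vector mode.
enum class MaskKind { ScalarInt, Vector };

// Hypothetical description of one candidate vector mode.
struct ModeInfo {
  bool is_vector;                // stand-in for VECTOR_MODE_P (mode)
  std::optional<MaskKind> mask;  // empty when the target reports no mask mode
};

// Hypothetical description of the target.
struct Target {
  bool has_while_ult;            // stand-in for partial_vectors_supported_p ()
};

// The patch's condition written positively:
//   usage != 0 && (while_ult available || mode has an integer mask mode).
bool supports_partial_vectors(const Target &target, int partial_vector_usage,
                              const ModeInfo &mode) {
  if (partial_vector_usage == 0)
    return false;
  if (target.has_while_ult)
    return true;
  return mode.is_vector && mode.mask.has_value()
         && *mode.mask == MaskKind::ScalarInt;
}

// The alternative: evaluate the same condition for each candidate mode
// rather than once for the main loop's mode.
std::vector<bool> supports_partial_vectors_per_mode(
    const Target &target, int partial_vector_usage,
    const std::vector<ModeInfo> &modes) {
  std::vector<bool> result;
  result.reserve(modes.size());
  for (const ModeInfo &mode : modes)
    result.push_back(
        supports_partial_vectors(target, partial_vector_usage, mode));
  return result;
}
```

In this model, a target without while_ult still reports support for a
vector mode whose mask is an integer mode, which is the case the patch adds.
The per-mode variant can give different answers for different entries of
the mode list, which is the extra precision in question.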