On Tue, 27 Oct 2020, Richard Sandiford wrote: > Sorry for the very late comment (was out last week)? > > Richard Biener <rguent...@suse.de> writes: > > This enables SLP store group splitting also for loop vectorization. > > For the existing testcase gcc.dg/vect/vect-complex-5.c this then > > generates much better code, likewise for the PR97428 testcase. > > > > Both of those have a splitting opportunity splitting the group > > into two equal (vector-sized) halves, still the patch enables > > quite arbitrary splitting since generally the interleaving scheme > > results in quite awkward code for even small groups. If any > > problems surface with this it's easy to restrict the splitting > > to known-good cases. Is there any additional constraints for > > non-constant sized vectors? Note this interacts with vector > > size iteration (but comparing interleaving cost with SLP cost > > of a smaller vector size doesn't reliably pick the smaller > > vector size). > > Not sure about the variable-sized vector aspect. For SVE it > isn't really natural to split the store itself up: I think we'd > instead want to keep a unified store and blend in the stored > values where necessary. E.g. rather than split: > > a a a a b b c c > > into: > > a a a a > b b > c c > > we'd be better off having predicated groups of the form: > > a a a a _ _ _ _ > _ _ _ _ b b _ _ > _ _ _ _ _ _ c c > > This is one thing on the very long todo list :-/
Hmm, I see. Looking at the case of a group_size == 3 store right now which (for the sake of register pressure) would benefit from V4xy vectorization and a masked store, doing sth "smart" to fill up lane 4 (duplicating another one would always work but possibly make loads more expensive, masking would work here as well). The splitting and mode iteration also has "interesting" interaction (as well as the load group being in the need of a common vector type ...). So yeah - loads of work ahead. > > @@ -2323,6 +2323,36 @@ vect_analyze_slp_instance (vec_info *vinfo, > > rest, max_tree_size); > > return res; > > } > > + > > + /* For loop vectorization split into arbitrary pieces of size > 1. > > */ > > + if (is_a <loop_vec_info> (vinfo) > > + && (i > 1 && i < group_size)) > > + { > > + gcc_assert ((const_nunits & (const_nunits - 1)) == 0); > > FWIW, we have pow2p_hwi for this. > > > + unsigned group1_size = i; > > + > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_NOTE, vect_location, > > + "Splitting SLP group at stmt %u\n", i); > > + > > + stmt_vec_info rest = vect_split_slp_store_group (stmt_info, > > + group1_size); > > + /* Loop vectorization cannot handle gaps in stores, make sure > > + the split group appears as strided. */ > > + STMT_VINFO_STRIDED_P (rest) = 1; > > + DR_GROUP_GAP (rest) = 0; > > + STMT_VINFO_STRIDED_P (stmt_info) = 1; > > + DR_GROUP_GAP (stmt_info) = 0; > > Does something undo the STMT_VINFO_STRIDED_P assignments if SLP > vectorisation fails? If not, mightn't this pessimise things in > that case? Realise that that won't be a concern once everything > is SLP, just wondering. > > (Sorry if this has already been dealt with by later patches or > is otherwise just noise.) Good question, and no. Likewise the splitting is never "undone". A later refactoring makes cleaning this up easier I guess (not sure at which point we actually need to do the splitting). Richard. > Thanks, > Richard > > > + > > + bool res = vect_analyze_slp_instance (vinfo, bst_map, stmt_info, > > + max_tree_size); > > + if (i + 1 < group_size) > > + res |= vect_analyze_slp_instance (vinfo, bst_map, > > + rest, max_tree_size); > > + > > + return res; > > + } > > + > > /* Even though the first vector did not all match, we might be able > > to SLP > > (some) of the remainder. FORNOW ignore this possibility. */ > > } > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imend