On Tue, 2 Nov 2021, Tamar Christina wrote: > Hi All, > > Following my current SVE predicate optimization series a problem has presented > itself in that the way vector masks are generated for masked operations relies > on CSE to share masks efficiently. > > The issue however is that masking is done using the & operand and & is > associative and so reassoc decides to reassociate the masked operations.
But it does this for the purpose of canonicalization and thus CSE. > This makes CSE then unable to CSE an unmasked and a masked operation leading > to > duplicate operations being performed. > > To counter this we want to add an RPO pass over the vectorized loop body when > vectorization succeeds. This makes it then no longer reliant on the RTL level > CSE. > > I have not added a testcase for this as it requires the changes in my patch > series, however the entire series relies on this patch to work so all the > tests there cover it. > > Bootstrapped Regtested on aarch64-none-linux-gnu, > x86_64-linux-gnu and no issues. > > Ok for master? You are running VN over _all_ loop bodies rather only those vectorized. We loop over vectorized loops earlier for optimizing masked store sequences. I suppose you could hook in there. I'll also notice that we have pass_pre_slp_scalar_cleanup which eventually runs plus we have a late FRE. So I don't understand why it doesn't work to CSE later. for (i = 1; i < number_of_loops (cfun); i++) { loop_vec_info loop_vinfo; bool has_mask_store; loop = get_loop (cfun, i); if (!loop || !loop->aux) continue; loop_vinfo = (loop_vec_info) loop->aux; has_mask_store = LOOP_VINFO_HAS_MASK_STORE (loop_vinfo); delete loop_vinfo; if (has_mask_store && targetm.vectorize.empty_mask_is_expensive (IFN_MASK_STORE)) optimize_mask_stores (loop); loop->aux = NULL; } > Thanks, > Tamar > > gcc/ChangeLog: > > * tree-vectorizer.c (vectorize_loops): Do local CSE through RPVN upon > successful vectorization. > > --- inline copy of patch -- > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c > index > 4712dc6e7f907637774482a71036a0bd381c2bd2..1e370d60fb19b03c3b6bce45c660af4b6d32dc51 > 100644 > --- a/gcc/tree-vectorizer.c > +++ b/gcc/tree-vectorizer.c > @@ -81,7 +81,7 @@ along with GCC; see the file COPYING3. If not see > #include "gimple-pretty-print.h" > #include "opt-problem.h" > #include "internal-fn.h" > - > +#include "tree-ssa-sccvn.h" > > /* Loop or bb location, with hotness information. */ > dump_user_location_t vect_location; > @@ -1323,6 +1323,27 @@ vectorize_loops (void) > ??? Also while we try hard to update loop-closed SSA form we fail > to properly do this in some corner-cases (see PR56286). */ > rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa_only_virtuals); > + > + for (i = 1; i < number_of_loops (cfun); i++) > + { > + loop = get_loop (cfun, i); > + if (!loop || !single_exit (loop)) > + continue; > + > + bitmap exit_bbs; > + /* Perform local CSE, this esp. helps because we emit code for > + predicates that need to be shared for optimal predicate usage. > + However reassoc will re-order them and prevent CSE from working > + as it should. CSE only the loop body, not the entry. */ > + exit_bbs = BITMAP_ALLOC (NULL); > + bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index); > + bitmap_set_bit (exit_bbs, loop->latch->index); > + > + do_rpo_vn (cfun, loop_preheader_edge (loop), exit_bbs); > + > + BITMAP_FREE (exit_bbs); > + } > + > return TODO_cleanup_cfg; > } > > > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)