On December 30, 2019 3:15:14 PM GMT+01:00, Richard Sandiford <richard.sandif...@arm.com> wrote: >This patch fixes an awkward corner case in which: > >(a) we apply if-conversion to a loop; > >(b) the original scalar loop doesn't have a vdef, and thus doesn't > need a virtual phi; > >(c) the vectorised main loop does need a vdef and a virtual phi (see >below); > >(d) we also vectorise the epilogue; and > >(e) the vectorised epilogue still needs a scalar epilogue > >The specific case in which (c) applies is if a read-only loop is >vectorised using IFN_LOAD_LANES, which uses clobber statements to >mark the lifetime of the temporary array. > >The vectoriser relies on the SSA renamer to update virtual operands. >All would probably be well if it postponed this update until after >it had vectorised both the main loop and the epilogue loop. However, >when vectorising the epilogue, vect_do_peeling does: > > create_lcssa_for_virtual_phi (loop); > update_ssa (TODO_update_ssa_only_virtuals); > >(with "loop" in this case being the to-be-vectorised epilogue loop). >So the vectoriser puts the virtual operand into SSA form for the >vectorised main loop as a separate step, during the early stages >of vectorising the epilogue. > >I wasn't sure at first why that update_ssa was there. It looked >initially like it was related to create_lcssa_for_virtual_phi, >which seemed strange when create_lcssa_for_virtual_phi keeps the >SSA form up-to-date. But before r241099 it had the following comment, >which AFAICT is still the reason: > > /* We might have a queued need to update virtual SSA form. As we > delete the update SSA machinery below after doing a regular > incremental SSA update during loop copying make sure we don't > lose that fact. > ??? Needing to update virtual SSA form by renaming is unfortunate > but not all of the vectorizer code inserting new loads / stores > properly assigns virtual operands to those statements. */ > >The patch restores that comment since IMO it's helpful. > >(a), (d) and (e) mean that we copy the original un-if-converted scalar >loop to act as the scalar epilogue. The update_ssa above means that >this >copying needs to cope with any new virtual SSA names in the main loop. >The code to do that (reasonably) assumed that one of two things was >true: > >(1) the scalar loop and the vector loops don't have vdefs, and so no > virtual operand update is needed. The definition that applies > on entry to the loops is the same in all cases. > >(2) the scalar loop and the vector loops have virtual phis, and so -- > after applying create_lcssa_for_virtual_phi on the to-be-vectorised > epilogue loop -- the virtual operand update can be handled in the > same way as for normal SSA names. > >But (b) and (c) together mean that the scalar loop and the >still-to-be-vectorised epilogue loop have no virtual phi that (2) >can use. We'd therefore keep the original vuses when duplicating, >rather than updating them to the definition that applies on exit >from the epilogue loop. (Since the epilogue is still unvectorised >and has no vdefs, the definition that applies on exit is the same >as the one that applies on entry.) > >This patch therefore adds a third case: the scalar loop and >to-be-vectorised epilogue have no virtual defs, but the main loop does. > >Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
Ok. Richard. >Richard > > >2019-12-30 Richard Sandiford <richard.sandif...@arm.com> > >gcc/ > * tree-vect-loop-manip.c (create_lcssa_for_virtual_phi): Return > the incoming virtual operand definition. > (vect_do_peeling): When vectorizing an epilogue loop, handle the > case in which the main loop has a virtual phi and the epilogue > and scalar loops don't. Restore an earlier comment about the > update_ssa call. > >gcc/testsuite/ > * gcc.dg/vect/vect-epilogues-2.c: New test. > >Index: gcc/tree-vect-loop-manip.c >=================================================================== >--- gcc/tree-vect-loop-manip.c 2019-12-30 14:12:25.000000000 +0000 >+++ gcc/tree-vect-loop-manip.c 2019-12-30 14:12:26.082419824 +0000 >@@ -1249,9 +1249,12 @@ slpeel_can_duplicate_loop_p (const class >the *guard[12] routines, which assume loop closed SSA form for all PHIs > (but normally loop closed SSA form doesn't require virtual PHIs to be > in the same form). Doing this early simplifies the checking what >- uses should be renamed. */ >+ uses should be renamed. > >-static void >+ If we create a new phi after the loop, return the definition that >+ applies on entry to the loop, otherwise return null. */ >+ >+static tree > create_lcssa_for_virtual_phi (class loop *loop) > { > gphi_iterator gsi; >@@ -1283,10 +1286,12 @@ create_lcssa_for_virtual_phi (class loop > && !flow_bb_inside_loop_p (loop, gimple_bb (stmt))) > FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) > SET_USE (use_p, new_vop); >+ >+ return PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop)); > } > break; > } >- >+ return NULL_TREE; > } > > /* Function vect_get_loop_location. >@@ -2483,9 +2488,42 @@ vect_do_peeling (loop_vec_info loop_vinf >class loop *prolog, *epilog = NULL, *loop = LOOP_VINFO_LOOP >(loop_vinfo); > class loop *first_loop = loop; >bool irred_flag = loop_preheader_edge (loop)->flags & >EDGE_IRREDUCIBLE_LOOP; >- create_lcssa_for_virtual_phi (loop); >+ >+ /* We might have a queued need to update virtual SSA form. As we >+ delete the update SSA machinery below after doing a regular >+ incremental SSA update during loop copying make sure we don't >+ lose that fact. >+ ??? Needing to update virtual SSA form by renaming is >unfortunate >+ but not all of the vectorizer code inserting new loads / stores >+ properly assigns virtual operands to those statements. */ > update_ssa (TODO_update_ssa_only_virtuals); > >+ create_lcssa_for_virtual_phi (loop); >+ >+ /* If we're vectorizing an epilogue loop, the update_ssa above will >+ have ensured that the virtual operand is in SSA form throughout >the >+ vectorized main loop. Normally it is possible to trace the >updated >+ vector-stmt vdefs back to scalar-stmt vdefs and vector-stmt vuses >+ back to scalar-stmt vuses, meaning that the effect of the SSA >update >+ remains local to the main loop. However, there are rare cases in >+ which the vectorized loop has vdefs even when the original scalar >+ loop didn't. For example, vectorizing a load with IFN_LOAD_LANES >+ introduces clobbers of the temporary vector array, which in turn >+ needs new vdefs. If the scalar loop doesn't write to memory, >these >+ new vdefs will be the only ones in the vector loop. >+ >+ In that case, update_ssa will have added a new virtual phi to the >+ main loop, which previously didn't need one. Ensure that we >(locally) >+ maintain LCSSA form for the virtual operand, just as we would >have >+ done if the virtual phi had existed from the outset. This makes >it >+ easier to duplicate the scalar epilogue loop below. */ >+ tree vop_to_rename = NULL_TREE; >+ if (loop_vec_info orig_loop_vinfo = LOOP_VINFO_ORIG_LOOP_INFO >(loop_vinfo)) >+ { >+ class loop *orig_loop = LOOP_VINFO_LOOP (orig_loop_vinfo); >+ vop_to_rename = create_lcssa_for_virtual_phi (orig_loop); >+ } >+ > if (MAY_HAVE_DEBUG_BIND_STMTS) > { > gcc_assert (!adjust_vec.exists ()); >@@ -2706,6 +2744,26 @@ vect_do_peeling (loop_vec_info loop_vinf > as the transformations mentioned above make less or no sense when not > vectorizing. */ > epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop; >+ if (vop_to_rename) >+ { >+ /* Vectorizing the main loop can sometimes introduce a vdef to >+ a loop that previously didn't have one; see the comment above >+ the definition of VOP_TO_RENAME for details. The definition >+ D that holds on E will then be different from the definition >+ VOP_TO_RENAME that holds during SCALAR_LOOP, so we need to >+ rename VOP_TO_RENAME to D when copying the loop. >+ >+ The virtual operand is in LCSSA form for the main loop, >+ and no stmt between the main loop and E needs a vdef, >+ so we know that D is provided by a phi rather than by a >+ vdef on a normal gimple stmt. */ >+ basic_block vdef_bb = e->src; >+ gphi *vphi; >+ while (!(vphi = get_virtual_phi (vdef_bb))) >+ vdef_bb = get_immediate_dominator (CDI_DOMINATORS, vdef_bb); >+ gcc_assert (vop_to_rename != gimple_phi_result (vphi)); >+ set_current_def (vop_to_rename, gimple_phi_result (vphi)); >+ } > epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e); > if (!epilog) > { >Index: gcc/testsuite/gcc.dg/vect/vect-epilogues-2.c >=================================================================== >--- /dev/null 2019-09-17 11:41:18.176664108 +0100 >+++ gcc/testsuite/gcc.dg/vect/vect-epilogues-2.c 2019-12-30 >14:12:26.082419824 +0000 >@@ -0,0 +1,57 @@ >+/* { dg-do compile } */ >+ >+int >+f1 (int *x, int n) >+{ >+ int res = 0; >+ for (int i = 0; i < n; ++i) >+ res += x[i * 2] == 1 ? 2 : 3; >+ return res; >+} >+ >+int >+f2 (int *x) >+{ >+ int res = 0; >+ for (int i = 0; i < 0x83; ++i) >+ res += x[i * 2] == 1 ? 2 : 3; >+ return res; >+} >+ >+int >+f3 (int *x, int n) >+{ >+ int res = 0; >+ for (int i = 0; i < n; ++i) >+ res += x[i * 2] == 1 ? 2 : 3; >+ return res + x[0x100]; >+} >+ >+int >+f4 (int *x) >+{ >+ int res = 0; >+ for (int i = 0; i < 0x83; ++i) >+ res += x[i * 2] == 1 ? 2 : 3; >+ return res + x[0x100]; >+} >+ >+int >+f5 (int *x, int n, int a) >+{ >+ int res = 0; >+ for (int i = 0; i < n; ++i) >+ res += x[i * 2] == 1 ? 2 : 3; >+ x[a] += 1; >+ return res; >+} >+ >+int >+f6 (int *x, int a) >+{ >+ int res = 0; >+ for (int i = 0; i < 0x83; ++i) >+ res += x[i * 2] == 1 ? 2 : 3; >+ x[a] += 1; >+ return res; >+}