On Fri, 1 Mar 2024, Andre Vieira (lists) wrote:

> Hi,
>
> Bootstrapped and tested the gcc-13 backport of this on gcc-12 for
> aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu and no regressions.
>
> OK to push to gcc-12 branch?

OK.

Thanks,
Richard.

> Kind regards,
> Andre Vieira
>
> On 10/11/2023 13:16, Richard Biener wrote:
> > The following fixes the issue that when SLP stmts are internal defs
> > but appear invariant because they end up only using invariant defs
> > then they get scheduled outside of the loop.  This nice optimization
> > breaks down when loop masks or lens are applied since those are not
> > explicitly tracked as dependences.  The following makes sure to never
> > schedule internal defs outside of the vectorized loop when the
> > loop uses masks/lens.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> >
> > 	PR tree-optimization/110221
> > 	* tree-vect-slp.cc (vect_schedule_slp_node): When loop
> > 	masking / len is applied make sure to not schedule
> > 	intenal defs outside of the loop.
> >
> > 	* gfortran.dg/pr110221.f: New testcase.
> > ---
> >  gcc/testsuite/gfortran.dg/pr110221.f | 17 +++++++++++++++++
> >  gcc/tree-vect-slp.cc                 | 10 ++++++++++
> >  2 files changed, 27 insertions(+)
> >  create mode 100644 gcc/testsuite/gfortran.dg/pr110221.f
> >
> > diff --git a/gcc/testsuite/gfortran.dg/pr110221.f
> > b/gcc/testsuite/gfortran.dg/pr110221.f
> > new file mode 100644
> > index 00000000000..8b57384313a
> > --- /dev/null
> > +++ b/gcc/testsuite/gfortran.dg/pr110221.f
> > @@ -0,0 +1,17 @@
> > +C PR middle-end/68146
> > +C { dg-do compile }
> > +C { dg-options "-O2 -w" }
> > +C { dg-additional-options "-mavx512f --param vect-partial-vector-usage=2" {
> > target avx512f } }
> > +      SUBROUTINE CJYVB(V,Z,V0,CBJ,CDJ,CBY,CYY)
> > +      IMPLICIT DOUBLE PRECISION (A,B,G,O-Y)
> > +      IMPLICIT COMPLEX*16 (C,Z)
> > +      DIMENSION CBJ(0:*),CDJ(0:*),CBY(0:*)
> > +      N=INT(V)
> > +      CALL GAMMA2(VG,GA)
> > +      DO 65 K=1,N
> > +        CBY(K)=CYY
> > +65    CONTINUE
> > +      CDJ(0)=V0/Z*CBJ(0)-CBJ(1)
> > +      DO 70 K=1,N
> > +70    CDJ(K)=-(K+V0)/Z*CBJ(K)+CBJ(K-1)
> > +      END
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index 3e5814c3a31..80e279d8f50 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -9081,6 +9081,16 @@ vect_schedule_slp_node (vec_info *vinfo,
> >        /* Emit other stmts after the children vectorized defs which is
> >           earliest possible.  */
> >        gimple *last_stmt = NULL;
> > +      if (auto loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> > +        if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> > +            || LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> > +          {
> > +            /* But avoid scheduling internal defs outside of the loop when
> > +               we might have only implicitly tracked loop mask/len defs.  */
> > +            gimple_stmt_iterator si
> > +              = gsi_after_labels (LOOP_VINFO_LOOP (loop_vinfo)->header);
> > +            last_stmt = *si;
> > +          }
> >        bool seen_vector_def = false;
> >        FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> >          if (SLP_TREE_DEF_TYPE (child) == vect_internal_def)
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)