On Fri, 1 Mar 2024, Andre Vieira (lists) wrote:
> Hi,
>
> Bootstrapped and tested the gcc-13 backport of this on gcc-12 for
> aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu and no regressions.
>
> OK to push to gcc-12 branch?

OK.

Thanks,
Richard.

> Kind regards,
> Andre Vieira
>
> On 10/11/2023 13:16, Richard Biener wrote:
> > The following fixes the issue that when SLP stmts are internal defs
> > but appear invariant because they end up only using invariant defs
> > then they get scheduled outside of the loop. This nice optimization
> > breaks down when loop masks or lens are applied since those are not
> > explicitly tracked as dependences. The following makes sure to never
> > schedule internal defs outside of the vectorized loop when the
> > loop uses masks/lens.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> >
> > PR tree-optimization/110221
> > * tree-vect-slp.cc (vect_schedule_slp_node): When loop
> > masking / len is applied make sure to not schedule
> > internal defs outside of the loop.
> >
> > * gfortran.dg/pr110221.f: New testcase.
> > ---
> > gcc/testsuite/gfortran.dg/pr110221.f | 17 +++++++++++++++++
> > gcc/tree-vect-slp.cc | 10 ++++++++++
> > 2 files changed, 27 insertions(+)
> > create mode 100644 gcc/testsuite/gfortran.dg/pr110221.f
> >
> > diff --git a/gcc/testsuite/gfortran.dg/pr110221.f b/gcc/testsuite/gfortran.dg/pr110221.f
> > new file mode 100644
> > index 00000000000..8b57384313a
> > --- /dev/null
> > +++ b/gcc/testsuite/gfortran.dg/pr110221.f
> > @@ -0,0 +1,17 @@
> > +C PR middle-end/68146
> > +C { dg-do compile }
> > +C { dg-options "-O2 -w" }
> > +C { dg-additional-options "-mavx512f --param vect-partial-vector-usage=2" { target avx512f } }
> > +      SUBROUTINE CJYVB(V,Z,V0,CBJ,CDJ,CBY,CYY)
> > +      IMPLICIT DOUBLE PRECISION (A,B,G,O-Y)
> > +      IMPLICIT COMPLEX*16 (C,Z)
> > +      DIMENSION CBJ(0:*),CDJ(0:*),CBY(0:*)
> > +      N=INT(V)
> > +      CALL GAMMA2(VG,GA)
> > +      DO 65 K=1,N
> > +        CBY(K)=CYY
> > +65    CONTINUE
> > +      CDJ(0)=V0/Z*CBJ(0)-CBJ(1)
> > +      DO 70 K=1,N
> > +70      CDJ(K)=-(K+V0)/Z*CBJ(K)+CBJ(K-1)
> > +      END
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index 3e5814c3a31..80e279d8f50 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -9081,6 +9081,16 @@ vect_schedule_slp_node (vec_info *vinfo,
> >        /* Emit other stmts after the children vectorized defs which is
> >           earliest possible.  */
> >        gimple *last_stmt = NULL;
> > +      if (auto loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> > +        if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> > +            || LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> > +          {
> > +            /* But avoid scheduling internal defs outside of the loop when
> > +               we might have only implicitly tracked loop mask/len defs.  */
> > +            gimple_stmt_iterator si
> > +              = gsi_after_labels (LOOP_VINFO_LOOP (loop_vinfo)->header);
> > +            last_stmt = *si;
> > +          }
> >        bool seen_vector_def = false;
> >        FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> >          if (SLP_TREE_DEF_TYPE (child) == vect_internal_def)
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
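
For readers skimming the thread, the placement rule described in the quoted
commit message can be sketched as a toy model. This is only an illustration
under stated assumptions: DefKind, Placement, Node and schedule are
hypothetical names invented here, not GCC types or functions, and the real
vect_schedule_slp_node works on GIMPLE statements rather than on an enum.

```cpp
#include <vector>

// Hypothetical, simplified stand-ins for illustration only; not GCC types.
enum class DefKind { External, Internal };
enum class Placement { BeforeLoop, InLoop };

struct Node {
  DefKind kind;
  // Where the vector def of each child was emitted.
  std::vector<Placement> children;
};

// Earliest placement for a node's vector statement in this toy model.
// Without masks/lens, a node whose children all live before the loop may
// itself be emitted before the loop.  With masks/lens, an internal def is
// pinned inside the loop, because the mask/len it implicitly depends on is
// only defined there.
Placement schedule(const Node &node, bool loop_uses_mask_or_len) {
  Placement earliest = Placement::BeforeLoop;
  if (node.kind == DefKind::Internal && loop_uses_mask_or_len)
    earliest = Placement::InLoop;
  // A node can never be emitted before any of its children.
  for (Placement child : node.children)
    if (child == Placement::InLoop)
      earliest = Placement::InLoop;
  return earliest;
}
```

In this model an internal node with only pre-loop children is still hoisted
when the loop uses neither masks nor lengths, and stays in the loop otherwise,
which is the behaviour the patch aims for.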