Re: [PATCH] tree-optimization/110221 - SLP and loop mask/len

Richard Biener Fri, 01 Mar 2024 01:48:51 -0800

On Fri, 1 Mar 2024, Andre Vieira (lists) wrote:

> Hi,
> 
> Bootstrapped and tested the gcc-13 backport of this on gcc-12 for
> aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu and no regressions.
> 
> OK to push to gcc-12 branch?


OK.

Thanks,
Richard.

> Kind regards,
> Andre Vieira
> 
> On 10/11/2023 13:16, Richard Biener wrote:
> > The following fixes the issue that when SLP stmts are internal defs
> > but appear invariant because they end up only using invariant defs
> > then they get scheduled outside of the loop.  This nice optimization
> > breaks down when loop masks or lens are applied since those are not
> > explicitly tracked as dependences.  The following makes sure to never
> > schedule internal defs outside of the vectorized loop when the
> > loop uses masks/lens.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> > 
> >  PR tree-optimization/110221
> >  * tree-vect-slp.cc (vect_schedule_slp_node): When loop
> >  masking / len is applied make sure to not schedule
> >  internal defs outside of the loop.
> > 
> >     * gfortran.dg/pr110221.f: New testcase.
> > ---
> >   gcc/testsuite/gfortran.dg/pr110221.f | 17 +++++++++++++++++
> >   gcc/tree-vect-slp.cc                 | 10 ++++++++++
> >   2 files changed, 27 insertions(+)
> >   create mode 100644 gcc/testsuite/gfortran.dg/pr110221.f
> > 
> > diff --git a/gcc/testsuite/gfortran.dg/pr110221.f b/gcc/testsuite/gfortran.dg/pr110221.f
> > new file mode 100644
> > index 00000000000..8b57384313a
> > --- /dev/null
> > +++ b/gcc/testsuite/gfortran.dg/pr110221.f
> > @@ -0,0 +1,17 @@
> > +C PR middle-end/68146
> > +C { dg-do compile }
> > +C { dg-options "-O2 -w" }
> > +C { dg-additional-options "-mavx512f --param vect-partial-vector-usage=2" { target avx512f } }
> > +      SUBROUTINE CJYVB(V,Z,V0,CBJ,CDJ,CBY,CYY)
> > +      IMPLICIT DOUBLE PRECISION (A,B,G,O-Y)
> > +      IMPLICIT COMPLEX*16 (C,Z)
> > +      DIMENSION CBJ(0:*),CDJ(0:*),CBY(0:*)
> > +      N=INT(V)
> > +      CALL GAMMA2(VG,GA)
> > +      DO 65 K=1,N
> > +        CBY(K)=CYY
> > +65    CONTINUE
> > +      CDJ(0)=V0/Z*CBJ(0)-CBJ(1)
> > +      DO 70 K=1,N
> > +70      CDJ(K)=-(K+V0)/Z*CBJ(K)+CBJ(K-1)
> > +      END
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index 3e5814c3a31..80e279d8f50 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -9081,6 +9081,16 @@ vect_schedule_slp_node (vec_info *vinfo,
> >           /* Emit other stmts after the children vectorized defs which is
> >     earliest possible.  */
> >         gimple *last_stmt = NULL;
> > +      if (auto loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> > +   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> > +       || LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> > +     {
> > +       /* But avoid scheduling internal defs outside of the loop when
> > +          we might have only implicitly tracked loop mask/len defs.  */
> > +       gimple_stmt_iterator si
> > +         = gsi_after_labels (LOOP_VINFO_LOOP (loop_vinfo)->header);
> > +       last_stmt = *si;
> > +     }
> >         bool seen_vector_def = false;
> >         FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> >    if (SLP_TREE_DEF_TYPE (child) == vect_internal_def)
> 

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
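
For readers following the quoted patch description, here is a minimal toy
sketch in C++ of the scheduling rule it describes. This is not GCC code. It
assumes a hypothetical model in which statement positions are plain integers
(-1 means "before the loop", 0 means "first statement of the loop header after
labels", larger values are later in the loop body). The names Child, DefKind,
schedule_pos, kBeforeLoop and kLoopHeaderStart are invented for illustration
and do not exist in tree-vect-slp.cc.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model only: hypothetical types, not GCC's SLP data structures.
enum class DefKind { Internal, External };

struct Child {
  DefKind kind;
  int def_pos;  // position of the child's vectorized def (used for Internal only)
};

// Assumed position encoding for this sketch.
constexpr int kBeforeLoop = -1;      // insertion point outside (before) the loop
constexpr int kLoopHeaderStart = 0;  // first statement of the loop header

// Return the position after which a node's vector statement is emitted.
// Without masks/lens, a node with no internal-def children stays before
// the loop. With masks/lens, the starting point is clamped to the loop
// header so the statement can never end up outside the loop.
int schedule_pos(const std::vector<Child>& children, bool uses_mask_or_len) {
  int last = uses_mask_or_len ? kLoopHeaderStart : kBeforeLoop;
  for (const Child& c : children) {
    if (c.kind == DefKind::Internal) {
      assert(c.def_pos >= kBeforeLoop);  // sanity check on the toy encoding
      last = std::max(last, c.def_pos);  // emit after the latest child def
    }
  }
  return last;
}
```

In this model a node whose children are all external is placed before the
loop unless masks or lengths are in use, in which case it is pinned to the
start of the loop header. That mirrors the intent of initializing last_stmt
from gsi_after_labels on the loop header in the patch, under the simplifying
assumptions stated above.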
