> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: Tuesday, September 2, 2025 1:44 PM
> To: Tamar Christina <[email protected]>
> Cc: [email protected]; nd <[email protected]>
> Subject: Re: [PATCH 1/3]middle-end: clear the user unroll flag if the cost 
> model has
> overriden it
> 
> On Tue, 2 Sep 2025, Tamar Christina wrote:
> 
> > > What was it that made you propose this change?
> >
> > When we have a loop of say int and a pragma unroll 4
> >
> > If the vectorizer picks V4SI as the mode, the requested unroll ended up
> > exactly matching the VF. As such the requested unroll is 1 and we don't
> > clear the pragma.
> >
> > So it did honor the requested unroll factor. However since we didn't set
> > the unroll amount back and left it at 4 the rtl unroller won't use the
> > rtl cost model at all and just unroll the vector loop 4 times.
> 
> Ah, OK.
> 
> > This change isn't to bypass the rtl cost model, it's to allow it to be
> > used rather than overriding it after vectorization.
> 
> OK, fine.  But still, consider
> 
> #pragma unroll 4
>  for (int i = 0; i < 64; ++i)
>   {
>     a[4*i+0] = i;
>     a[4*i+1] = i;
>     a[4*i+2] = i;
>     a[4*i+3] = i;
>   }
> 
> so VF == 1, suggested_unroll_factor == 4.  If we don't up VF to 4
> should we still claim we did any unrolling?  If the target suggested
> a unroll factor of two, should we instead change ->unroll to 2?
> Should the user unroll factor override the vector target one?
> 

I think the target unroll factor should always win out, primarily because
of throughput based costing.  The loop above on a 4 VX system should
by the vectorizer already be using VF = 4, suggested_unroll_factor == 4.

We also don't ever force unrolling for predicated SVE because for
predicated SVE we have to balance predicate throughput limitations
of any given CPU.  Having the user unroll factor be able to override
the cost model one will almost certainly lead to worse performance
in this case.

> The documentation currently says
> 
> "If the loop was vectorized the unroll factor specified will be used to
> seed the
> vectorizer unroll factor.  Whether the loop is unrolled or not will be
> determined by target costing.  The resulting vectorized loop may still be
> unrolled more in later passes depending on the target costing."
> 
> The last sentence does I think support your patch, maybe it could
> be clarified by changing to "depending on the target costing as if
> no unroll pragma were specified."?  Or "After vectorization the
> unroll pragma will be dropped."
> 

I agree, will update.

> The docs do not say what happens when #pragma GCC unroll specifies
> a factor that's less than the VF, it only implicitly says it seeds
> the VF, but in reality it only seeds additional unrolling ontop of
> the VF choice.
> 
> Maybe it should say "If the loop is vectorized the vectorizer considers
> extra unrolling to honor the unroll factor.  After vectorizing a loop
> the pragma is dropped."
> 

Also agreed, will use this instead.

> Though I'd say it would be nice if #pragma GCC unroll 4 would get you
> V4SI and not V8SI.  On compare-costs targets the target costing hooks
> could see to boost preference of VF == unroll factor, but on x86 I
> can't see how this could be done but outside rejecting VF != unroll
> factor.

Did you mean the inverse here? that you get V8SI instead of V4SI with unroll == 
2?
I think my original patch here tried this, i.e. it increased assumed_vf instead 
of changing
the unroll factor,  but I think after review back and forth we ended up at 
seeding
suggested_unroll_factor instead.  Isn't one of the problems that if the 
assumed_vf is larger
than the largest vector size for that datatype that we may fail?

Thanks,
Tamar

> 
> Richard.
> 
> 
> > Tamar
> > ________________________________
> > From: Richard Biener <[email protected]>
> > Sent: Tuesday, September 2, 2025 12:10 PM
> > To: Tamar Christina <[email protected]>
> > Cc: [email protected] <[email protected]>; nd <[email protected]>
> > Subject: Re: [PATCH 1/3]middle-end: clear the user unroll flag if the cost 
> > model
> has overriden it
> >
> > On Tue, 2 Sep 2025, Tamar Christina wrote:
> >
> > > If the user has requested loop unrolling through pragma GCC unroll then 
> > > at the
> > > moment we only set LOOP_VINFO_USER_UNROLL if the vectorizer has not
> overrode the
> > > unroll factor (through backend costing) or if the VF made the requested 
> > > unroll
> > > factor be 1.
> > >
> > > But of these events are costing related, and so it stands to reason that 
> > > we
> > > should set LOOP_VINFO_USER_UNROLL to we return the RTL unroller to use
> the
> > > backend costing for any further unrolling.
> >
> > I can't parse this fully.
> >
> > Hmm, but shouldn't we at least special-case when !new_res (aka
> > the requested unrolling wasn't used) but VF == 1, aka vectorization
> > didn't do any unrolling?
> >
> > Or is the reasoning that the vector cost model is as good as the
> > RTL one (which doesn't exist...) and thus if that decided not to
> > unroll do not let the RTL cost model override it?  But that means
> > to set loop->unroll to 1, not 0 as we do for LOOP_VINFO_USER_UNROLL.
> > The RTL "cost mode" will simply unroll by 8 if possible.
> >
> > What was it that made you propose this change?
> >
> > Richard.
> >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > -m32, -m64 and no issues.
> > >
> > > Ok for master? Tests using this are in the next patch.
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >        * tree-vect-loop.cc (vect_analyze_loop_1): If the unroll pragma 
> > > was set
> > >        mark it as handled.
> > >
> > > ---
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index
> 3b56f865ec7c3fdb8e9def37ec446a25086b4e28..f393b45232b3390959a2be75
> 6db3aa8b35002bf0 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -2941,11 +2941,13 @@ vect_analyze_loop_1 (class loop *loop,
> vec_info_shared *shared,
> > >              {
> > >                delete loop_vinfo;
> > >                loop_vinfo = unroll_vinfo;
> > > -             LOOP_VINFO_USER_UNROLL (loop_vinfo) = user_unroll > 1;
> > >              }
> > >            else
> > >              delete unroll_vinfo;
> > >          }
> > > +
> > > +     /* Record that we have honored a user unroll factor.  */
> > > +     LOOP_VINFO_USER_UNROLL (loop_vinfo) = user_unroll > 1;
> > >      }
> > >
> > >    /* Remember the autodetected vector mode.  */
> > >
> > >
> > >
> >
> > --
> > Richard Biener <[email protected]>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> >
> 
> --
> Richard Biener <[email protected]>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to