On Tue, 2 Sep 2025, Tamar Christina wrote:
> > What was it that made you propose this change?
>
> When we have a loop of say int and a pragma unroll 4
>
> If the vectorizer picks V4SI as the mode, the requested unroll ended up
> exactly matching the VF. As such the requested unroll is 1 and we don't
> clear the pragma.
>
> So it did honor the requested unroll factor. However since we didn't set
> the unroll amount back and left it at 4 the rtl unroller won't use the
> rtl cost model at all and just unroll the vector loop 4 times.
Ah, OK.
> This change isn't to bypass the rtl cost model, it's to allow it to be
> used rather than overriding it after vectorization.
OK, fine. But still, consider
#pragma unroll 4
for (int i = 0; i < 64; ++i)
{
a[4*i+0] = i;
a[4*i+1] = i;
a[4*i+2] = i;
a[4*i+3] = i;
}
so VF == 1, suggested_unroll_factor == 4. If we don't up VF to 4
should we still claim we did any unrolling? If the target suggested
a unroll factor of two, should we instead change ->unroll to 2?
Should the user unroll factor override the vector target one?
The documentation currently says
"If the loop was vectorized the unroll factor specified will be used to
seed the
vectorizer unroll factor. Whether the loop is unrolled or not will be
determined by target costing. The resulting vectorized loop may still be
unrolled more in later passes depending on the target costing."
The last sentence does I think support your patch, maybe it could
be clarified by changing to "depending on the target costing as if
no unroll pragma were specified."? Or "After vectorization the
unroll pragma will be dropped."
The docs do not say what happens when #pragma GCC unroll specifies
a factor that's less than the VF, it only implicitly says it seeds
the VF, but in reality it only seeds additional unrolling ontop of
the VF choice.
Maybe it should say "If the loop is vectorized the vectorizer considers
extra unrolling to honor the unroll factor. After vectorizing a loop
the pragma is dropped."
Though I'd say it would be nice if #pragma GCC unroll 4 would get you
V4SI and not V8SI. On compare-costs targets the target costing hooks
could see to boost preference of VF == unroll factor, but on x86 I
can't see how this could be done but outside rejecting VF != unroll
factor.
Richard.
> Tamar
> ________________________________
> From: Richard Biener <[email protected]>
> Sent: Tuesday, September 2, 2025 12:10 PM
> To: Tamar Christina <[email protected]>
> Cc: [email protected] <[email protected]>; nd <[email protected]>
> Subject: Re: [PATCH 1/3]middle-end: clear the user unroll flag if the cost
> model has overriden it
>
> On Tue, 2 Sep 2025, Tamar Christina wrote:
>
> > If the user has requested loop unrolling through pragma GCC unroll then at
> > the
> > moment we only set LOOP_VINFO_USER_UNROLL if the vectorizer has not
> > overrode the
> > unroll factor (through backend costing) or if the VF made the requested
> > unroll
> > factor be 1.
> >
> > But of these events are costing related, and so it stands to reason that we
> > should set LOOP_VINFO_USER_UNROLL to we return the RTL unroller to use the
> > backend costing for any further unrolling.
>
> I can't parse this fully.
>
> Hmm, but shouldn't we at least special-case when !new_res (aka
> the requested unrolling wasn't used) but VF == 1, aka vectorization
> didn't do any unrolling?
>
> Or is the reasoning that the vector cost model is as good as the
> RTL one (which doesn't exist...) and thus if that decided not to
> unroll do not let the RTL cost model override it? But that means
> to set loop->unroll to 1, not 0 as we do for LOOP_VINFO_USER_UNROLL.
> The RTL "cost mode" will simply unroll by 8 if possible.
>
> What was it that made you propose this change?
>
> Richard.
>
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > -m32, -m64 and no issues.
> >
> > Ok for master? Tests using this are in the next patch.
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-loop.cc (vect_analyze_loop_1): If the unroll pragma was
> > set
> > mark it as handled.
> >
> > ---
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index
> > 3b56f865ec7c3fdb8e9def37ec446a25086b4e28..f393b45232b3390959a2be756db3aa8b35002bf0
> > 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -2941,11 +2941,13 @@ vect_analyze_loop_1 (class loop *loop,
> > vec_info_shared *shared,
> > {
> > delete loop_vinfo;
> > loop_vinfo = unroll_vinfo;
> > - LOOP_VINFO_USER_UNROLL (loop_vinfo) = user_unroll > 1;
> > }
> > else
> > delete unroll_vinfo;
> > }
> > +
> > + /* Record that we have honored a user unroll factor. */
> > + LOOP_VINFO_USER_UNROLL (loop_vinfo) = user_unroll > 1;
> > }
> >
> > /* Remember the autodetected vector mode. */
> >
> >
> >
>
> --
> Richard Biener <[email protected]>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)