RE: [PATCH 1/4]middle-end: document pragma unroll n [PR116140]

Tamar Christina Tue, 13 May 2025 06:08:25 -0700

> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: Tuesday, May 13, 2025 1:36 PM
> To: Jakub Jelinek <[email protected]>
> Cc: Tamar Christina <[email protected]>; Jonathan Wakely
> <[email protected]>; [email protected]; nd <[email protected]>
> Subject: Re: [PATCH 1/4]middle-end: document pragma unroll n
> <requested|preferred> [PR116140]
> 
> On Tue, 13 May 2025, Jakub Jelinek wrote:
> 
> > On Tue, May 13, 2025 at 10:40:16AM +0000, Tamar Christina wrote:
> > > > That's true.  The names are already optional, I can just drop the
> > > > "requested" altogether.
> > >
> > > > I'll give it a few to give others a chance to commit and I'll respin
> > > > dropping "requested"
> >
> > Is the intended behavior of the "weak" version that the compiler can
> > increase or decrease it based on command line options etc., or that it
> > must unroll at least N times but with command line options etc. it could
> > be something higher than that?
> >
> > Perhaps
> > #pragma GCC unroll 16
> > vs.
> > #pragma GCC unroll >= 16
> > or
> > #pragma GCC unroll 16+
> > ?
> > As for keywords, I was worried about macros, but seems GCC unroll pragma
> > doesn't have macro expansion in the name nor arguments part, so when one
> > wants to macro expand the count, one needs to use _Pragma and create the
> > right expression as string literal.
> 
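
For reference, a minimal sketch of the _Pragma route Jakub describes above,
assuming the usual two-level stringification idiom.  The names DO_PRAGMA,
UNROLL, UNROLL_COUNT and sum_array are made up for illustration and are not
part of the patch:

```c
#include <stddef.h>

/* Two levels: UNROLL's argument is macro-expanded before DO_PRAGMA
   stringifies it, so the pragma sees a literal count.  */
#define DO_PRAGMA(x) _Pragma(#x)
#define UNROLL(n) DO_PRAGMA(GCC unroll n)

/* Hypothetical tuning knob, for illustration only.  */
#define UNROLL_COUNT 4

long sum_array(const int *a, size_t n)
{
    long total = 0;
    /* Expands to _Pragma("GCC unroll 4").  */
    UNROLL(UNROLL_COUNT)
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Writing "#pragma GCC unroll UNROLL_COUNT" directly would not work if, as
noted above, the pragma arguments are not macro expanded.
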
> I think the intent for the given case is that GCC unrolls the loop,
> but not as much as with -funroll-loops (factor 8 IIRC).  But when
> vectorizing then the unroll request is satisfied already (given
> vectorization effectively unrolls).
> 
> IMO it should be possible to just use
> 
> #pragma GCC unroll
> 
> for this.  That doesn't do the limiting to 4 times unrolling, but leaves
> it to the (non-existent) cost modeling of the RTL unroller.
> 
> I think we should avoid overengineering this for PR116140
> which is just a case where we do _not_ want further unrolling
> after vectorization.


This particular patch is for the case where the user may want more scalar
unrolling (it has no bearing on the vector patch).  The comment was
that before, with the hand-unrolled loop, -funroll-loops could be used
to override this.

Unrolling by larger amounts is not free.  The pre-header becomes more
expensive, and unrolling further only makes sense *if* your micro-architecture
can actually do better with it.  It would be bad on e.g. in-order cores.

That's presumably why std::find was unrolled only 4x by default, as that made
more sense, especially if it is used within a loop.

Without this patch, we can't have a good default while still allowing users to
override it.
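
To make that concrete, here is a rough sketch of a find-style scalar loop
with a fixed unroll count.  The function find_index is hypothetical and is
not the libstdc++ code.  The existing pragma simply hard-codes the 4, which
is the behaviour this patch wants to relax into an overridable default.  The
new syntax is still under discussion, so it is not shown here:

```c
#include <stddef.h>

/* Returns the index of the first element equal to key, or n if absent.  */
size_t find_index(const int *a, size_t n, int key)
{
    size_t i;
    /* Existing form: asks for exactly 4x unrolling of this loop.  */
#pragma GCC unroll 4
    for (i = 0; i < n; i++)
        if (a[i] == key)
            break;
    return i;
}
```
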

Again, this has nothing to do with vectorization at all.

Thanks,
Tamar

> 
> Richard.
