On Thu, 20 Jul 2023, Richard Sandiford wrote:

> Tamar Christina <tamar.christ...@arm.com> writes:
> > Hi All,
> >
> > The resulting predicate register of a whilelo is not
> > restricted to the lower half of the predicate register file.
> >
> > As such these tests started failing after recent changes
> > because the whilelo outside the loop is getting assigned p15.
> 
> It's the whilelo in the loop for me.  We go from:
> 
> .L3:
>         ld1b    z31.b, p7/z, [x4, x3]
>         movprfx z30, z31
>         mul     z30.b, p5/m, z30.b, z29.b
>         st1b    z30.b, p7, [x4, x3]
>         mov     p6.b, p7.b
>         add     x3, x3, x0
>         whilelo p7.b, w3, w1
>         b.any   .L3
> 
> to:
> 
> .L3:
>         ld1b    z31.b, p7/z, [x3, x2]
>         movprfx z29, z31
>         mul     z29.b, p6/m, z29.b, z30.b
>         st1b    z29.b, p7, [x3, x2]
>         add     x2, x2, x0
>         whilelo p15.b, w2, w1
>         b.any   .L4
>         [...]
>         .p2align 2,,3
> .L4:
>         mov     p7.b, p15.b
>         b       .L3
> 
> This adds an extra (admittedly unconditional) branch to every non-final
> vector iteration, which seems unfortunate.  I don't think we'd see
> p8-p15 otherwise, since the result of the whilelo is used as a
> governing predicate by the next iteration of the loop.
> 
> This happens because the scalar loop is given an 89% chance of iterating.
> Previously we gave the vector loop an 83.33% chance of iterating, whereas
> after 061f74c06735e1fa35b910ae we give it a 12% chance.  0.89^16 == 15.50%,
> so the new probabilities definitely preserve the original probabilities
> more closely.  But for purely heuristic probabilities like these, I'm
> not sure we should lean so heavily into the idea that the vector
> latch is unlikely.
> 
> Honza, Richi, any thoughts?  Just wanted to double-check that this
> was operating as expected before making the tests accept the (arguably)
> less efficient code.  It looks like the commit was more aimed at fixing
> the profile counts for the epilogues, rather than the main loop.

The above looks like a failed coalescing.  Can you track down where
that happens and why?

And yes, the profile counts were supposed to be fixed, not only for
the epilog but, for header copying, also for the main loop.  I'm not
sure anything goes wrong here, though: these are only estimates, and
IIRC we estimate a loop to iterate 4 times when we don't know better.
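
For reference, the arithmetic quoted above can be reproduced with a short
sketch.  It assumes a simple model that is not taken from the GCC sources:
each scalar iteration continues independently with the same probability,
and one vector iteration covers 16 scalar iterations (as the 0.89^16 figure
implies).  The function names are hypothetical.

```python
def vector_latch_probability(p_scalar: float, lanes: int) -> float:
    """Chance that `lanes` consecutive scalar iterations all continue.

    Assumption: iterations are independent with the same probability.
    """
    return p_scalar ** lanes


def expected_iterations(p_latch: float) -> float:
    """Mean iteration count of a loop whose latch is taken with p_latch.

    Assumption: geometric model, so the mean is 1 / (1 - p_latch).
    """
    return 1.0 / (1.0 - p_latch)


# Scalar latch probability from the mail, 16 byte lanes per vector iteration.
p_vector = vector_latch_probability(0.89, 16)
print(f"vector latch probability: {p_vector:.4f}")  # about 0.155

# Compare the old 83.33% vector latch estimate with the scaled one.
print(f"old estimate:    {expected_iterations(5 / 6):.2f} iterations")
print(f"scaled estimate: {expected_iterations(p_vector):.2f} iterations")
```

Under this model the old 83.33% figure corresponds to about six vector
iterations, while the scaled figure corresponds to just over one, which is
consistent with the vector latch now being treated as unlikely.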

Richard.

> Thanks,
> Richard
> 
> > This widens the regexp.
> >
> > Tested on aarch64-none-linux-gnu and passes again.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/testsuite/ChangeLog:
> >
> >     * gcc.target/aarch64/sve/live_1.c: Update assembly.
> >
> > --- inline copy of patch -- 
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/live_1.c b/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
> > index 80ee176d1807bf628ad47551d69ff5d84deda79e..2db6c3c209a9514646e92628f3d2dd58d466539c 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
> > @@ -27,10 +27,10 @@
> >  
> >  TEST_ALL (EXTRACT_LAST)
> >  
> > -/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].b, } 2 } } */
> > -/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].h, } 4 } } */
> > -/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].s, } 4 } } */
> > -/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].d, } 4 } } */
> > +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.b, } 2 } } */
> > +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.h, } 4 } } */
> > +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.s, } 4 } } */
> > +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.d, } 4 } } */
> >  
> >  /* { dg-final { scan-assembler-times {\tlastb\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
> >  /* { dg-final { scan-assembler-times {\tlastb\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
> 
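
The effect of widening the regexp in the quoted patch can be checked with a
small sketch.  DejaGnu evaluates these patterns as Tcl regexps; Python's re
module is used here only for illustration, on the assumption that the two
agree for these simple constructs.  The sample assembly is the pair of
whilelo lines from the loops quoted earlier in this mail.

```python
import re

# Patterns copied from the patch, before and after the change.
OLD = re.compile(r"\twhilelo\tp[0-7].b, ")
NEW = re.compile(r"\twhilelo\tp[0-9]+.b, ")

# One whilelo writing p7 and one writing p15, as in the quoted loops.
asm = "\twhilelo\tp7.b, w3, w1\n\twhilelo\tp15.b, w2, w1\n"

# scan-assembler-times counts matches, so do the same here.
old_count = len(OLD.findall(asm))
new_count = len(NEW.findall(asm))

print("old pattern matches:", old_count)
print("new pattern matches:", new_count)
```

The old pattern misses the p15 line because [0-7] accepts only a single
digit, so the match count drops below what the test expects; the widened
pattern counts both lines.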

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
