On Tue, Jan 14, 2025 at 2:35 PM Richard Biener <rguent...@suse.de> wrote:
>
> On Tue, 14 Jan 2025, Christoph Müllner wrote:
>
> > On Tue, Jan 14, 2025 at 1:46 PM Richard Biener <rguent...@suse.de> wrote:
> > >
> > > On Tue, 14 Jan 2025, Christoph Müllner wrote:
> > >
> > > > As reported in PR117079, commit ab18785840d7b8 broke the test 
> > > > pr105493.c.
> > > > When looking at the generated code, we can see that the generated code
> > > > is vectorized differently, resulting in a reduction from 225 
> > > > instructions
> > > > down to 109. On the performance side, no changes were measured on a 
> > > > 5950X.
> > > >
> > > > This patch adjusts the test condition to fit how the function gets
> > > > vectorized after ab18785840d7b8 (and probably further related changes).
> > >
> > > Can you adjust the comment?  We didn't vectorize the first loop in GCC 14
> > > but now we do.  We should probably check that, dumping -vect-details
> > > and checking for both loops being vectorized.
> > >
> > > The slp1 scan should then be changed to scan there are no stores to tmp
> > > left, like with maybe
> > >
> > >  scan-tree-dump-not "MEM.*tmp.* = " "slp1"
> >
> > Thanks for this idea!
> > The exact proposal won't work because of the "tmp" (we had before
> > "MEM[(uint8_t *)pix1_90(D) + 4B];"
> > and similar). The significant difference is that we don't have "MEM("
> > anymore, but "MEM <vector".
> > So this works:
> >
> > /* All loops should be vectorized.  */
> > /* { dg-final { scan-tree-dump-not "MEM\\\[\\\(" "slp1" } } */
> > /* { dg-final { scan-tree-dump "MEM <vector\\\(8\\\) unsigned char>
> > \\\[\[\^\]\]\*\\\]" "slp1" } } */
>
> But this isn't a very meaningful test?  As the comment suggested
> we now eliminate all loads + stores to 'tmp' as we vectorize both
> the first loop and the final reduction.
>
> In the dump I'm looking at there isn't any MEM accessing 'tmp'
> anymore, so we should be able to assert this?

Understood now. I was misled by ivtmp_291 and friends and I thought
this might lead to an issue, but that's not the case (and should not happen).
Testing for the absence of any stores to tmp is the right thing to do.

Thanks!

>
> Richard.
>
>
> > >
> > > OK with those changes.
> > >
> > > Richard.
> > >
> > > > Signed-off-by: Christoph Müllner <christoph.muell...@vrull.eu>
> > > >
> > > >       PR target/117079
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >       * gcc.target/i386/pr105493.c: Fix expected vectorization
> > > > ---
> > > >  gcc/testsuite/gcc.target/i386/pr105493.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/gcc/testsuite/gcc.target/i386/pr105493.c 
> > > > b/gcc/testsuite/gcc.target/i386/pr105493.c
> > > > index c6fd16753cd9..fb0bf8aa0af7 100644
> > > > --- a/gcc/testsuite/gcc.target/i386/pr105493.c
> > > > +++ b/gcc/testsuite/gcc.target/i386/pr105493.c
> > > > @@ -48,4 +48,4 @@ foo ( uint8_t *pix1, int i_pix1, uint8_t *pix2, int 
> > > > i_pix2 )
> > > >
> > > >  /* The first loop should be vectorized, which will eliminate redundant 
> > > > stores
> > > >     and loads.  */
> > > > -/* { dg-final { scan-tree-dump-times "  MEM <vector\\\(4\\\) unsigned 
> > > > int> \\\[\[\^\]\]\*\\\] = " 4 "slp1" } } */
> > > > +/* { dg-final { scan-tree-dump "= MEM <vector\\\(8\\\) unsigned char> 
> > > > \\\[\[\^\]\]\*\\\]" "slp1" } } */
> > > >
> > >
> > > --
> > > Richard Biener <rguent...@suse.de>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> >
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to