On Thu, Jul 6, 2023 at 11:37 PM Maciej W. Rozycki <ma...@embecosm.com> wrote:
>
> The bb-slp-pr95839.c test assumes quad-single float vector support, but
> some targets only support pairs of floats, causing this test to fail
> with such targets.  Limit this test to targets that support at least
> 128-bit vectors then, and add a complementing test that can be run with
> targets that have support for 64-bit vectors only.  There is no need to
> adjust bb-slp-pr95839-2.c as 128 bits are needed even for the smallest
> vector of doubles, so support is implied by the presence of vectors of
> doubles.

I wonder why you see the testcase FAIL, on x86-64 when doing

typedef float __attribute__((vector_size(32))) v4f32;

v4f32 f(v4f32 a, v4f32 b)
{
  /* Check that we vectorize this CTOR without any loads.  */
  return (v4f32){a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3],
      a[4] + b[4], a[5] + b[5], a[6] + b[6], a[7] + b[7]};
}

I see we vectorize the add and the "store".  We fail to perform
extraction from the incoming vectors (unless you enable AVX),
that's a missed optimization.

So with paired floats I would expect sth similar?  Maybe
x86 is saved by kind-of-presence (but disabled) of V8SFmode vectors.

That said, we should handle this better so can you file an
enhancement bugreport for this?

Thanks,
Richard.

> gcc/testsuite/
> * gcc.dg/vect/bb-slp-pr95839.c: Limit to `vect128' targets.
> * gcc.dg/vect/bb-slp-pr95839-v8.c: New test.
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c |   14 ++++++++++++++
>  gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c    |    1 +
>  2 files changed, 15 insertions(+)
>
> gcc-test-bb-slp-pr95839-vect128.diff
> Index: gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c
> ===================================================================
> --- /dev/null
> +++ gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect64 } */
> +/* { dg-additional-options "-w -Wno-psabi" } */
> +
> +typedef float __attribute__((vector_size(8))) v2f32;
> +
> +v2f32 f(v2f32 a, v2f32 b)
> +{
> +  /* Check that we vectorize this CTOR without any loads.  */
> +  return (v2f32){a[0] + b[0], a[1] + b[1]};
> +}
> +
> +/* { dg-final { scan-tree-dump "optimized: basic block" "slp2" } } */
> Index: gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
> ===================================================================
> --- gcc.orig/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
> +++ gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect128 } */
>  /* { dg-additional-options "-w -Wno-psabi" } */
>
>  typedef float __attribute__((vector_size(16))) v4f32;