On Thu, 25 Aug 2022, Richard Sandiford wrote:

> Builds of glibc with SVE enabled have been failing since V1DI was added
> to the aarch64 port. The problem is that BB SLP starts the (hopeless)
> attempt to use variable-length modes to vectorise a single-element
> vector, and that now gets further than it did before.
>
> Initially we tried getting a vector mode with 1 + 1X DI elements
> (i.e. 1 DI per 128-bit vector chunk). We don't provide such a mode --
> it would be VNx1DI -- because it isn't a native SVE format. We then
> try just 1 DI, which previously failed but now succeeds.
>
> There are numerous ways we could fix this. Perhaps the most obvious
> would be to skip variable-length modes for BB SLP. However, I think
> that'd just be kicking the can down the road, since eventually we want
> to support BB SLP and VLA vectors using predication.
>
> However, if we do use VLA vectors for BB SLP, the vector modes
> we use should actually be variable length. We don't want to use
> variable-length vectors for some element types/group sizes and
> fixed-length vectors for others, since it would be difficult
> to handle the seams.
>
> The same principle applies during loop vectorisation. We can't
> use a mixture of variable-length and fixed-length vectors for
> the same loop because the relative unroll/vectorisation factors
> would not be constant (compile-time) multiples of each other.
>
> This patch therefore makes get_related_vectype_for_scalar_type
> check that the provided number of units is interoperable with
> the provided prevailing mode. The function is generally quite
> forgiving -- it does basic things like checking for scalarness
> itself rather than expecting callers to do them -- so the new
> check feels in keeping with that.
>
> This seems to subsume the fix for PR96974. I'm not sure it's
> worth reverting that code to an assert though, so the patch just
> drops the scan for the associated message.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.
>
> OK to install?
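As a worked illustration of the size argument in the quoted mail, here are the byte sizes involved, written as polynomials in X, the runtime number of 128-bit SVE chunks beyond the first (the convention GCC's poly_int sizes use for SVE). The mail does not name the prevailing mode; this assumes it is a full SVE vector of 16 + 16X bytes (for example VNx2DI). DI elements are 8 bytes.

```latex
\begin{aligned}
\text{prevailing mode (assumed full SVE vector):}\quad & 16 + 16X \ \text{bytes}\\
\text{first attempt, } 1 + 1X \text{ DI (would be VNx1DI):}\quad & (1 + X)\cdot 8 = 8 + 8X \ \text{bytes},
  \quad \tfrac{16 + 16X}{8 + 8X} = 2 \ \text{(constant)}\\
\text{fallback, } 1 \text{ DI (V1DI):}\quad & 1 \cdot 8 = 8 \ \text{bytes},
  \quad \tfrac{16 + 16X}{8} = 2 + 2X,\ \tfrac{8}{16 + 16X} \ \text{(neither constant)}
\end{aligned}
```

So the 1 + 1X request would pass a size-interoperability test but has no mode on aarch64, while the V1DI fallback fails it in both directions, which is the case the new check turns into a NULL_TREE return.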
OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
> 	* tree-vect-stmts.cc (get_related_vectype_for_scalar_type): Check
> 	that the requested number of units is interoperable with the requested
> 	prevailing mode.
>
> gcc/testsuite/
> 	* gcc.target/aarch64/sve/slp_15.c: New test.
> 	* g++.target/aarch64/sve/pr96974.C: Remove scan test.
> ---
>  gcc/testsuite/g++.target/aarch64/sve/pr96974.C |  4 +---
>  gcc/testsuite/gcc.target/aarch64/sve/slp_15.c  | 17 +++++++++++++++++
>  gcc/tree-vect-stmts.cc                         | 10 ++++++++++
>  3 files changed, 28 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/slp_15.c
>
> diff --git a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
> index 54000f568ab..2f6ebd6ce3d 100644
> --- a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
> +++ b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-Ofast -march=armv8.2-a+sve -fdisable-tree-fre4 -fdump-tree-slp-details" } */
> +/* { dg-options "-Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" } */
>
>  float a;
>  int
> @@ -14,5 +14,3 @@ struct c {
>  }
>  int coeffs[10];
>  } f;
> -
> -/* { dg-final { scan-tree-dump "Not vectorized: Incompatible number of vector subparts between" "slp1" { target lp64 } } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp_15.c b/gcc/testsuite/gcc.target/aarch64/sve/slp_15.c
> new file mode 100644
> index 00000000000..23f6d567cc5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/slp_15.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3" } */
> +
> +struct foo
> +{
> +  void *handle;
> +  void *arg;
> +};
> +
> +void
> +dlinfo_doit (struct foo *args)
> +{
> +  __UINTPTR_TYPE__ **l = args->handle;
> +
> +  *(__UINTPTR_TYPE__ *) args->arg = 0;
> +  *(__UINTPTR_TYPE__ *) args->arg = **l;
> +}
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index c9dab217f05..7748c42c70f 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -11486,6 +11486,16 @@ get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
>
>    unsigned int nbytes = GET_MODE_SIZE (inner_mode);
>
> +  /* Interoperability between modes requires one to be a constant multiple
> +     of the other, so that the number of vectors required for each operation
> +     is a compile-time constant.  */
> +  if (prevailing_mode != VOIDmode
> +      && !constant_multiple_p (nunits * nbytes,
> +                               GET_MODE_SIZE (prevailing_mode))
> +      && !constant_multiple_p (GET_MODE_SIZE (prevailing_mode),
> +                               nunits * nbytes))
> +    return NULL_TREE;
> +
>    /* For vector types of elements whose mode precision doesn't
>       match their types precision we use a element type of mode
>       precision.  The vectorization routines will have to make sure

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
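The new tree-vect-stmts.cc hunk can be modelled outside GCC to see which size pairs it accepts. This is a minimal sketch, not GCC's code: `poly_size`, `is_constant_multiple` and `interoperable_p` are hypothetical names, the type only models degree-1 sizes c0 + c1*X, and the constant-multiple test assumes the divisor's constant term is nonzero (true for mode sizes). The patch's `prevailing_mode != VOIDmode` guard is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a degree-1 poly_int: value = c0 + c1 * X, where X
   is a runtime quantity (for SVE, the number of 128-bit chunks beyond the
   first).  This is not GCC's poly_int class.  */
typedef struct
{
  long c0;
  long c1;
} poly_size;

/* Hypothetical stand-in for constant_multiple_p: return true if A == K * B
   for some integer K known at compile time.  */
static bool
is_constant_multiple (poly_size a, poly_size b)
{
  /* Assumption: mode sizes always have a nonzero constant term.  */
  assert (b.c0 != 0);
  if (a.c0 % b.c0 != 0)
    return false;
  long k = a.c0 / b.c0;
  /* The runtime coefficient must scale by the same factor K.  */
  return a.c1 == k * b.c1;
}

/* Mirrors the shape of the new check: a vector of NUNITS elements, each
   NBYTES bytes, can coexist with a prevailing vector of PREVAILING bytes
   only if one size is a constant multiple of the other.  */
static bool
interoperable_p (poly_size nunits, long nbytes, poly_size prevailing)
{
  poly_size size = { nunits.c0 * nbytes, nunits.c1 * nbytes };
  return (is_constant_multiple (size, prevailing)
          || is_constant_multiple (prevailing, size));
}
```

With a prevailing size of 16 + 16X bytes, `interoperable_p ((poly_size) { 1, 0 }, 8, (poly_size) { 16, 16 })` is false (the V1DI case from the mail), while `interoperable_p ((poly_size) { 1, 1 }, 8, (poly_size) { 16, 16 })` is true (the VNx1DI size, for which aarch64 provides no mode). Both directions are tested because either mode may be the larger one.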