On Mon, 20 Jan 2025, Tamar Christina wrote: > Hi All, > > When registering masks for SIMD clone we end up using nmasks instead of > nvectors where nmasks seems to compute the number of input masks required for > the call given the current simdlen. > > This is however wrong as vect_record_loop_mask wants to know how many masks > you > want to create from the given vectype. i.e. which level of rgroups to create. > > This ends up mismatching with vect_get_loop_mask which uses nvectors and if > the > return type is narrower than the input types there will be a mismatch which > causes us to try to read from the given rgroup. It only happens to work if > the > function had an additional argument that's wider or if all elements and return > types are the same size. > > This fixes it by using nvectors during registration as well, which has already > taken into account SLP and VF. > > Bootstrapped Regtested on aarch64-none-linux-gnu, > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > -m32, -m64 and no issues. > > Ok for master?
OK. This was a fragile bit IIRC but your testing hopefully covered all important cases (GCN might be missing, but is somewhat peculiar to test). Thanks, Richard. > Thanks, > Tamar > > gcc/ChangeLog: > > PR middle-end/118273 > * tree-vect-stmts.cc (vectorizable_simd_clone_call): Use nvectors when > doing mask registrations. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/vect-simd-clone-4.c: New test. > > --- > diff --git a/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-4.c > b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-4.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..9b52af70393333ffa4af2b49c7cef9ad93ca1525 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-4.c > @@ -0,0 +1,15 @@ > +/* { dg-do compile } */ > +/* { dg-options "-std=c99" } */ > +/* { dg-additional-options "-O3 -march=armv8-a" } */ > + > +#pragma GCC target ("+sve") > + > +extern char __attribute__ ((simd, const)) fn3 (short); > +void test_fn3 (float *a, float *b, double *c, int n) > +{ > + for (int i = 0; i < n; ++i) > + a[i] = fn3 (c[i]); > +} > + > +/* { dg-final { scan-assembler {\s+_ZGVsMxv_fn3\n} } } */ > + > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > index > 833029fcb00108abc605042376e9811651d5cd64..21fb5cf5bd47ad9e37762909c6103adbf8752e2a > 100644 > --- a/gcc/tree-vect-stmts.cc > +++ b/gcc/tree-vect-stmts.cc > @@ -4561,14 +4561,9 @@ vectorizable_simd_clone_call (vec_info *vinfo, > stmt_vec_info stmt_info, > case SIMD_CLONE_ARG_TYPE_MASK: > if (loop_vinfo > && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)) > - { > - unsigned nmasks > - = exact_div (ncopies * bestn->simdclone->simdlen, > - TYPE_VECTOR_SUBPARTS (vectype)).to_constant (); > - vect_record_loop_mask (loop_vinfo, > - &LOOP_VINFO_MASKS (loop_vinfo), > - nmasks, vectype, op); > - } > + vect_record_loop_mask (loop_vinfo, > + &LOOP_VINFO_MASKS (loop_vinfo), > + ncopies, vectype, op); > > break; > } > > > > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)