On Mon, 20 Jan 2025, Tamar Christina wrote:

> Hi All,
> 
> When registering masks for SIMD clone we end up using nmasks instead of
> nvectors where nmasks seems to compute the number of input masks required for
> the call given the current simdlen.
> 
> This is however wrong as vect_record_loop_mask wants to know how many masks you
> want to create from the given vectype. i.e. which level of rgroups to create.
> 
> This ends up mismatching with vect_get_loop_mask which uses nvectors and if the
> return type is narrower than the input types there will be a mismatch which
> causes us to try to read from the given rgroup.  It only happens to work if the
> function had an additional argument that's wider or if all elements and return
> types are the same size.
> 
> This fixes it by using nvectors during registration as well, which has already
> taken into account SLP and VF.
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf and x86_64-pc-linux-gnu (-m32, -m64)
> with no issues.
> 
> Ok for master?

OK.  This was a fragile bit IIRC but your testing hopefully covered
all important cases (GCN might be missing, but is somewhat peculiar
to test).

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>       PR middle-end/118273
>       * tree-vect-stmts.cc (vectorizable_simd_clone_call): Use nvectors when
>       doing mask registrations.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.target/aarch64/vect-simd-clone-4.c: New test.
> 
> ---
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-4.c b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-4.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..9b52af70393333ffa4af2b49c7cef9ad93ca1525
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-4.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile }  */
> +/* { dg-options "-std=c99" } */
> +/* { dg-additional-options "-O3 -march=armv8-a" } */
> +
> +#pragma GCC target ("+sve")
> +
> +extern char __attribute__ ((simd, const)) fn3 (short);
> +void test_fn3 (float *a, float *b, double *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    a[i] = fn3 (c[i]);
> +}
> +
> +/* { dg-final { scan-assembler {\s+_ZGVsMxv_fn3\n} } } */
> +
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 833029fcb00108abc605042376e9811651d5cd64..21fb5cf5bd47ad9e37762909c6103adbf8752e2a 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -4561,14 +4561,9 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>           case SIMD_CLONE_ARG_TYPE_MASK:
>             if (loop_vinfo
>                 && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> -             {
> -               unsigned nmasks
> -                 = exact_div (ncopies * bestn->simdclone->simdlen,
> -                              TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();
> -               vect_record_loop_mask (loop_vinfo,
> -                                      &LOOP_VINFO_MASKS (loop_vinfo),
> -                                      nmasks, vectype, op);
> -             }
> +             vect_record_loop_mask (loop_vinfo,
> +                                    &LOOP_VINFO_MASKS (loop_vinfo),
> +                                    ncopies, vectype, op);
>  
>             break;
>           }
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
