Re: [PATCH][RFC] __builtin_shuffle sometimes should produce zip1 rather than TBL (PR82199)

Richard Sandiford Sat, 11 Jul 2020 01:40:16 -0700

Dmitrij Pochepko <dmitrij.poche...@bell-sw.com> writes:
> Hi,
>
> please take a look at updated version (attached).


Looks good, but a couple more points, sorry:

> +  /* Convert the perm constant if we can.  Require even, odd as the pairs.  
> */
> +  for (unsigned int i = 0; i < nelt; i += 2)
> +    {
> +      poly_int64 elt0 = d->perm[i];
> +      poly_int64 elt1 = d->perm[i + 1];
> +      poly_int64 newelt;
> +      if (!multiple_p (elt0, 2, &newelt) || maybe_ne (elt0 + 1, elt1))
> +     return false;
> +      newpermconst.quick_push (elt0.to_constant () / 2);

This should push “newelt” instead.

> diff --git a/gcc/testsuite/gcc.target/aarch64/vdup_n_3.c 
> b/gcc/testsuite/gcc.target/aarch64/vdup_n_3.c
> new file mode 100644
> index 0000000..5234f5e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vdup_n_3.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#define vector __attribute__((vector_size(4*sizeof(float))))
> +
> +/* These are both dups. */
> +vector float f(vector float a, vector float b)
> +{
> +  return __builtin_shuffle (a, a, (vector int){0, 1, 0, 1});
> +}
> +vector float f1(vector float a, vector float b)
> +{
> +  return __builtin_shuffle (a, a, (vector int){2, 3, 2, 3});
> +}
> +
> +/* { dg-final { scan-assembler-times {[ \t]*dup[ \t]+v[0-9]+\.2d} 2 } } */

This test is endian-agnostic and should work for all targets, but…

> diff --git a/gcc/testsuite/gcc.target/aarch64/vzip_1.c 
> b/gcc/testsuite/gcc.target/aarch64/vzip_1.c
> new file mode 100644
> index 0000000..5998211
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vzip_1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#define vector __attribute__((vector_size(2*sizeof(float))))
> +
> +vector float f(vector float a, vector float b)
> +{
> +  return __builtin_shuffle (a, b, (vector int){0, 2});
> +}
> +
> +/* { dg-final { scan-assembler-times {[ \t]*zip1[ \t]+v[0-9]+\.2s} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/vzip_2.c 
> b/gcc/testsuite/gcc.target/aarch64/vzip_2.c
> new file mode 100644
> index 0000000..2acafde
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vzip_2.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#define vector __attribute__((vector_size(4*sizeof(float))))
> +
> +vector float f(vector float a, vector float b)
> +{
> +  /* This is the same as zip1 v.2d as {0, 1, 4, 5} can be converted to {0, 
> 2}. */
> +  return __builtin_shuffle (a, b, (vector int){0, 1, 4, 5});
> +}
> +
> +/* { dg-final { scan-assembler-times {[ \t]*zip1[ \t]+v[0-9]+\.2d} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/vzip_3.c 
> b/gcc/testsuite/gcc.target/aarch64/vzip_3.c
> new file mode 100644
> index 0000000..8fa80f7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vzip_3.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#define vector __attribute__((vector_size(4*sizeof(float))))
> +
> +vector float f(vector float a, vector float b)
> +{
> +  /* This is the same as zip1 v.2d as {4, 5, 0, 1} can be converted to {2, 
> 0}. */
> +  return __builtin_shuffle (a, b, (vector int){4, 5, 0, 1});
> +}
> +
> +/* { dg-final { scan-assembler-times {[ \t]*zip1[ \t]+v[0-9]+\.2d} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/vzip_4.c 
> b/gcc/testsuite/gcc.target/aarch64/vzip_4.c
> new file mode 100644
> index 0000000..aefc55c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vzip_4.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#define vector __attribute__((vector_size(4*sizeof(float))))
> +
> +vector float f(vector float a, vector float b)
> +{
> +  /* This is the same as zip2 v.2d as {2, 3, 6, 7} can be converted to {1, 
> 3}. */
> +  return __builtin_shuffle (a, b, (vector int){2, 3, 6, 7});
> +}
> +
> +/* { dg-final { scan-assembler-times {[ \t]*zip2[ \t]+v[0-9]+\.2d} 1 } } */

…the shuffles in these tests are specific to little-endian targets.
For big-endian targets, architectural lane 0 is the last in memory
rather than first, so e.g. the big-endian permute vector for vzip_4.c
would be: { 0, 1, 5, 6 } instead.

I guess the options are:

(1) add __ARM_BIG_ENDIAN preprocessor tests to choose the right mask
(2) restrict the tests to aarch64_little_endian.
(3) have two scan-assembler lines with opposite zip1/zip2 choices, e.g:

/* { dg-final { scan-assembler-times {[ \t]*zip1[ \t]+v[0-9]+\.2d} 1 { target 
aarch64_big_endian } } } */
/* { dg-final { scan-assembler-times {[ \t]*zip2[ \t]+v[0-9]+\.2d} 1 { target 
aarch64_little_endian } } } */

(untested).

I guess (3) is probably best, but (2) is fine too if you're not set
up to test big-endian.

Sorry for not noticing last time.  I only realised when testing the
patch locally.

Thanks,
Richard

Re: [PATCH][RFC] __builtin_shuffle sometimes should produce zip1 rather than TBL (PR82199)

Reply via email to