Dmitrij Pochepko <[email protected]> writes:
> Hi,
>
> please take a look at updated version (attached).
Looks good, but a couple more points, sorry:
> + /* Convert the perm constant if we can. Require even, odd as the pairs.
> */
> + for (unsigned int i = 0; i < nelt; i += 2)
> + {
> + poly_int64 elt0 = d->perm[i];
> + poly_int64 elt1 = d->perm[i + 1];
> + poly_int64 newelt;
> + if (!multiple_p (elt0, 2, &newelt) || maybe_ne (elt0 + 1, elt1))
> + return false;
> + newpermconst.quick_push (elt0.to_constant () / 2);
This should push “newelt” instead.
> diff --git a/gcc/testsuite/gcc.target/aarch64/vdup_n_3.c
> b/gcc/testsuite/gcc.target/aarch64/vdup_n_3.c
> new file mode 100644
> index 0000000..5234f5e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vdup_n_3.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#define vector __attribute__((vector_size(4*sizeof(float))))
> +
> +/* These are both dups. */
> +vector float f(vector float a, vector float b)
> +{
> + return __builtin_shuffle (a, a, (vector int){0, 1, 0, 1});
> +}
> +vector float f1(vector float a, vector float b)
> +{
> + return __builtin_shuffle (a, a, (vector int){2, 3, 2, 3});
> +}
> +
> +/* { dg-final { scan-assembler-times {[ \t]*dup[ \t]+v[0-9]+\.2d} 2 } } */
This test is endian-agnostic and should work for all targets, but…
> diff --git a/gcc/testsuite/gcc.target/aarch64/vzip_1.c
> b/gcc/testsuite/gcc.target/aarch64/vzip_1.c
> new file mode 100644
> index 0000000..5998211
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vzip_1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#define vector __attribute__((vector_size(2*sizeof(float))))
> +
> +vector float f(vector float a, vector float b)
> +{
> + return __builtin_shuffle (a, b, (vector int){0, 2});
> +}
> +
> +/* { dg-final { scan-assembler-times {[ \t]*zip1[ \t]+v[0-9]+\.2s} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/vzip_2.c
> b/gcc/testsuite/gcc.target/aarch64/vzip_2.c
> new file mode 100644
> index 0000000..2acafde
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vzip_2.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#define vector __attribute__((vector_size(4*sizeof(float))))
> +
> +vector float f(vector float a, vector float b)
> +{
> + /* This is the same as zip1 v.2d as {0, 1, 4, 5} can be converted to {0,
> 2}. */
> + return __builtin_shuffle (a, b, (vector int){0, 1, 4, 5});
> +}
> +
> +/* { dg-final { scan-assembler-times {[ \t]*zip1[ \t]+v[0-9]+\.2d} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/vzip_3.c
> b/gcc/testsuite/gcc.target/aarch64/vzip_3.c
> new file mode 100644
> index 0000000..8fa80f7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vzip_3.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#define vector __attribute__((vector_size(4*sizeof(float))))
> +
> +vector float f(vector float a, vector float b)
> +{
> + /* This is the same as zip1 v.2d as {4, 5, 0, 1} can be converted to {2,
> 0}. */
> + return __builtin_shuffle (a, b, (vector int){4, 5, 0, 1});
> +}
> +
> +/* { dg-final { scan-assembler-times {[ \t]*zip1[ \t]+v[0-9]+\.2d} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/vzip_4.c
> b/gcc/testsuite/gcc.target/aarch64/vzip_4.c
> new file mode 100644
> index 0000000..aefc55c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vzip_4.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#define vector __attribute__((vector_size(4*sizeof(float))))
> +
> +vector float f(vector float a, vector float b)
> +{
> + /* This is the same as zip2 v.2d as {2, 3, 6, 7} can be converted to {1,
> 3}. */
> + return __builtin_shuffle (a, b, (vector int){2, 3, 6, 7});
> +}
> +
> +/* { dg-final { scan-assembler-times {[ \t]*zip2[ \t]+v[0-9]+\.2d} 1 } } */
…the shuffles in these tests are specific to little-endian targets.
For big-endian targets, architectural lane 0 is the last in memory
rather than first, so e.g. the big-endian permute vector for vzip_4.c
would be: { 0, 1, 5, 6 } instead.
I guess the options are:
(1) add __ARM_BIG_ENDIAN preprocessor tests to choose the right mask
(2) restrict the tests to aarch64_little_endian.
(3) have two scan-assembler lines with opposite zip1/zip2 choices, e.g:
/* { dg-final { scan-assembler-times {[ \t]*zip1[ \t]+v[0-9]+\.2d} 1 { target
aarch64_big_endian } } } */
/* { dg-final { scan-assembler-times {[ \t]*zip2[ \t]+v[0-9]+\.2d} 1 { target
aarch64_little_endian } } } */
(untested).
I guess (3) is probably best, but (2) is fine too if you're not set
up to test big-endian.
Sorry for not noticing last time. I only realised when testing the
patch locally.
Thanks,
Richard