Dmitrij Pochepko <dmitrij.poche...@bell-sw.com> writes: > Hi, > > please take a look at updated version (attached).
Looks good, but a couple more points, sorry: > + /* Convert the perm constant if we can. Require even, odd as the pairs. > */ > + for (unsigned int i = 0; i < nelt; i += 2) > + { > + poly_int64 elt0 = d->perm[i]; > + poly_int64 elt1 = d->perm[i + 1]; > + poly_int64 newelt; > + if (!multiple_p (elt0, 2, &newelt) || maybe_ne (elt0 + 1, elt1)) > + return false; > + newpermconst.quick_push (elt0.to_constant () / 2); This should push “newelt” instead. > diff --git a/gcc/testsuite/gcc.target/aarch64/vdup_n_3.c > b/gcc/testsuite/gcc.target/aarch64/vdup_n_3.c > new file mode 100644 > index 0000000..5234f5e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/vdup_n_3.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#define vector __attribute__((vector_size(4*sizeof(float)))) > + > +/* These are both dups. */ > +vector float f(vector float a, vector float b) > +{ > + return __builtin_shuffle (a, a, (vector int){0, 1, 0, 1}); > +} > +vector float f1(vector float a, vector float b) > +{ > + return __builtin_shuffle (a, a, (vector int){2, 3, 2, 3}); > +} > + > +/* { dg-final { scan-assembler-times {[ \t]*dup[ \t]+v[0-9]+\.2d} 2 } } */ This test is endian-agnostic and should work for all targets, but… > diff --git a/gcc/testsuite/gcc.target/aarch64/vzip_1.c > b/gcc/testsuite/gcc.target/aarch64/vzip_1.c > new file mode 100644 > index 0000000..5998211 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/vzip_1.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#define vector __attribute__((vector_size(2*sizeof(float)))) > + > +vector float f(vector float a, vector float b) > +{ > + return __builtin_shuffle (a, b, (vector int){0, 2}); > +} > + > +/* { dg-final { scan-assembler-times {[ \t]*zip1[ \t]+v[0-9]+\.2s} 1 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/vzip_2.c > b/gcc/testsuite/gcc.target/aarch64/vzip_2.c > new file mode 100644 > index 0000000..2acafde > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/vzip_2.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#define vector __attribute__((vector_size(4*sizeof(float)))) > + > +vector float f(vector float a, vector float b) > +{ > + /* This is the same as zip1 v.2d as {0, 1, 4, 5} can be converted to {0, > 2}. */ > + return __builtin_shuffle (a, b, (vector int){0, 1, 4, 5}); > +} > + > +/* { dg-final { scan-assembler-times {[ \t]*zip1[ \t]+v[0-9]+\.2d} 1 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/vzip_3.c > b/gcc/testsuite/gcc.target/aarch64/vzip_3.c > new file mode 100644 > index 0000000..8fa80f7 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/vzip_3.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#define vector __attribute__((vector_size(4*sizeof(float)))) > + > +vector float f(vector float a, vector float b) > +{ > + /* This is the same as zip1 v.2d as {4, 5, 0, 1} can be converted to {2, > 0}. */ > + return __builtin_shuffle (a, b, (vector int){4, 5, 0, 1}); > +} > + > +/* { dg-final { scan-assembler-times {[ \t]*zip1[ \t]+v[0-9]+\.2d} 1 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/vzip_4.c > b/gcc/testsuite/gcc.target/aarch64/vzip_4.c > new file mode 100644 > index 0000000..aefc55c > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/vzip_4.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#define vector __attribute__((vector_size(4*sizeof(float)))) > + > +vector float f(vector float a, vector float b) > +{ > + /* This is the same as zip2 v.2d as {2, 3, 6, 7} can be converted to {1, > 3}. */ > + return __builtin_shuffle (a, b, (vector int){2, 3, 6, 7}); > +} > + > +/* { dg-final { scan-assembler-times {[ \t]*zip2[ \t]+v[0-9]+\.2d} 1 } } */ …the shuffles in these tests are specific to little-endian targets. For big-endian targets, architectural lane 0 is the last in memory rather than first, so e.g. the big-endian permute vector for vzip_4.c would be: { 0, 1, 5, 6 } instead. I guess the options are: (1) add __ARM_BIG_ENDIAN preprocessor tests to choose the right mask (2) restrict the tests to aarch64_little_endian. (3) have two scan-assembler lines with opposite zip1/zip2 choices, e.g: /* { dg-final { scan-assembler-times {[ \t]*zip1[ \t]+v[0-9]+\.2d} 1 { target aarch64_big_endian } } } */ /* { dg-final { scan-assembler-times {[ \t]*zip2[ \t]+v[0-9]+\.2d} 1 { target aarch64_little_endian } } } */ (untested). I guess (3) is probably best, but (2) is fine too if you're not set up to test big-endian. Sorry for not noticing last time. I only realised when testing the patch locally. Thanks, Richard