Richard Biener <rguent...@suse.de> writes:
> On Mon, 12 Jun 2017, Tamar Christina wrote:
>> Hi All,
>>
>> this patch implements a optimization rewriting
>>
>> x * copysign (1.0, y) and
>> x * copysign (-1.0, y)
>>
>> to:
>>
>> x ^ (y & (1 << sign_bit_position))
>>
>> This is done by creating a special builtin during matching and generate the
>> appropriate instructions during expand. This new builtin is called XORSIGN.
>>
>> The expansion of xorsign depends on if the backend has an appropriate optab
>> available. If this is not the case then we use a modified version of the
>> existing copysign which does not take the abs value of the first argument
>> as a fall back.
>>
>> This patch is a revival of a previous patch
>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00069.html
>>
>> Bootstrapped on both aarch64-none-linux-gnu and x86_64 with no issues.
>> Regression done on aarch64-none-linux-gnu and no regressions.
>>
>> Ok for trunk?
>
> Without looking at the patch a few comments.
>
> First, nowadays please add an internal function instead of builtins.
> You can even take advantage of Richards work to directly tie those
> to optabs (he might want to chime in to tell you how). You don't need
> the fortran FE changes in that case.

Yeah, it should just be a case of adding:

  DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)

to internal-fn.def.  The supposedly useful thing about this is that it
automatically extends to vectors, so you shouldn't need the xorsign vector
builtins or the aarch64_builtin_vectorized_function change.

However, we don't yet support SLP vectorisation of internal functions.
I have a patch for that that I've been looking for an excuse to post
(at the moment I think it only helps SVE).  If this goes in I can post
it as a follow-on.

In:

> diff --git a/gcc/testsuite/gcc.dg/vec-xorsign_exec.c b/gcc/testsuite/gcc.dg/vec-xorsign_exec.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..f8c8befd336c7f2743a1621d3b0f53d78bab9df7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vec-xorsign_exec.c
> @@ -0,0 +1,53 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> +/* { dg-additional-options "-march=armv8-a" { target { aarch64*-*-* } } }*/
> +
> +extern void abort ();
> +
> +#define N 16
> +float a[N] = {-0.1f, -3.2f, -6.3f, -9.4f,
> +	      -12.5f, -15.6f, -18.7f, -21.8f,
> +	      24.9f, 27.1f, 30.2f, 33.3f,
> +	      36.4f, 39.5f, 42.6f, 45.7f};
> +float b[N] = {-1.2f, 3.4f, -5.6f, 7.8f,
> +	      -9.0f, 1.0f, -2.0f, 3.0f,
> +	      -4.0f, -5.0f, 6.0f, 7.0f,
> +	      -8.0f, -9.0f, 10.0f, 11.0f};
> +float r[N];
> +
> +float ad[N] = {-0.1fd, -3.2d, -6.3d, -9.4d,
> +	       -12.5d, -15.6d, -18.7d, -21.8d,
> +	       24.9d, 27.1d, 30.2d, 33.3d,
> +	       36.4d, 39.5d, 42.6d, 45.7d};
> +float bd[N] = {-1.2d, 3.4d, -5.6d, 7.8d,
> +	       -9.0d, 1.0d, -2.0d, 3.0d,
> +	       -4.0d, -5.0d, 6.0d, 7.0d,
> +	       -8.0d, -9.0d, 10.0d, 11.0d};
> +float rd[N];

Looks like these last three were meant to be doubles.
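
To make that concrete, here is a minimal sketch of what the double half
of the test might look like.  It assumes the arrays were meant to be
double and that the intended builtin is __builtin_copysign (the double
variant).  check_double_xorsign is a hypothetical name, and returning a
status rather than calling abort () is only there so the sketch can be
driven from a harness.  The check loop compares rd[i] with the
reference value.

```c
#define N 16

/* Same values as the quoted test, but declared as double and written
   with plain double literals (assumption: this was the intent).  */
double ad[N] = {-0.1, -3.2, -6.3, -9.4,
                -12.5, -15.6, -18.7, -21.8,
                24.9, 27.1, 30.2, 33.3,
                36.4, 39.5, 42.6, 45.7};
double bd[N] = {-1.2, 3.4, -5.6, 7.8,
                -9.0, 1.0, -2.0, 3.0,
                -4.0, -5.0, 6.0, 7.0,
                -8.0, -9.0, 10.0, 11.0};
double rd[N];

/* Hypothetical helper: fills rd[] and returns 0 if every element
   matches the reference expression, 1 otherwise.  */
int
check_double_xorsign (void)
{
  int i;

  /* Candidate loop for the x * copysign (1.0, y) rewrite.  */
  for (i = 0; i < N; i++)
    rd[i] = ad[i] * __builtin_copysign (1.0, bd[i]);

  /* Check the double results against the double inputs.  */
  for (i = 0; i < N; i++)
    if (rd[i] != ad[i] * __builtin_copysign (1.0, bd[i]))
      return 1;

  return 0;
}
```

In the real test, main would call abort () when the helper returns
nonzero.
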
> +
> +int
> +main (void)
> +{
> +  int i;
> +
> +  for (i = 0; i < N; i++)
> +    r[i] = a[i] * _builtin_copysignf (1.0f, b[i]);
> +
> +  /* check results:  */
> +  for (i = 0; i < N; i++)
> +    if (r[i] != a[i] * __builtin_copysignf (1.0f, b[i]))
> +      abort ();
> +
> +  for (i = 0; i < N; i++)
> +    rd[i] = ad[i] * _builtin_copysignd (1.0d, bd[i]);
> +
> +  /* check results:  */
> +  for (i = 0; i < N; i++)
> +    if (r[i] != ad[i] * __builtin_copysignd (1.0d, bd[i]))
> +      abort ();
> +
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */

Why does only one loop get vectorised?

Thanks,
Richard
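
P.S. For anyone following along, here is a rough sketch of the identity
the patch relies on, written out by hand for float.  It is not GCC's
expansion code.  xorsign_bits, mul_copysign and same_bits are
hypothetical names, and the sketch assumes float is a 32-bit IEEE
binary32 value with the sign in bit 31.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: x ^ (y & (1 << sign_bit_position)) on the raw
   bits.  Assumes a 32-bit IEEE float with the sign in bit 31.  */
static float
xorsign_bits (float x, float y)
{
  uint32_t xb, yb;

  memcpy (&xb, &x, sizeof xb);
  memcpy (&yb, &y, sizeof yb);
  xb ^= yb & (UINT32_C (1) << 31);   /* flip x's sign iff y is negative */
  memcpy (&x, &xb, sizeof x);
  return x;
}

/* The form the patch matches.  */
static float
mul_copysign (float x, float y)
{
  return x * __builtin_copysignf (1.0f, y);
}

/* Bitwise comparison, so that +0.0 and -0.0 are told apart.  */
static int
same_bits (float a, float b)
{
  return memcmp (&a, &b, sizeof a) == 0;
}
```

For finite x the two forms should agree bit for bit, signed zeros
included, since multiplying by +/-1.0 is exact.  NaN inputs are left
out of the sketch on purpose.
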