Re: [GSoC-19] Implementing narrowing functions like fadd

2019-07-06 Thread Tejas Joshi
Hello.
I am trying to add the fadd function variants. Since fadd takes two
arguments, it should be called from fold_const_call_sss (). I modeled
the new function closely on the calls and cases for real_nextafter (),
and also used gdb to look at the backtrace. Although I have made the
changes following real_nextafter, real_fadd is not called by the test
program, whereas real_nextafter does get called. I can't find any
other places to add calls for fadd. What is missing?
The patch is attached herewith.

int
main ()
{
  float x;
  x = __builtin_fadd (3.5,1.4);
}

Also, fadd should not have an faddf variant; it is introduced here
only for the sake of it.

Thanks,
-Tejas

On Wed, 3 Jul 2019 at 18:29, Tejas Joshi wrote:
>
> Hello.
> Functions like fadd and faddl take two arguments, add them, and
> return the result in a narrower precision than the argument type.
> Would it be appropriate to use the do_add function directly?
> The thing to consider about the narrower return type is how it can be
> achieved. The functions that operate on real numbers, like real_round
> and so on, do not consider the return type and do their calculations
> on the entire real number representation. So would just defining these
> functions and their return types in builtins.def and other appropriate
> places do the trick?
> For example:
> BT_FN_FLOAT_DOUBLE_DOUBLE as the return and argument types for FADD
>
> Or does the result have to be narrowed by zeroing the trailing
> out-of-precision bits?
> Also, if the sum, or either argument, needs more precision than the
> return type provides, it would not fit in the narrowed type. For
> example, 2^24 + 1 fits exactly in double but loses its least
> significant bit in float and becomes 2^24. How are these cases
> supposed to be handled?
>
> Thanks,
> -Tejas
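
The precision question in the quoted message can be checked with a few
lines of plain C, independent of the patch. The sketch below is only an
illustration: narrowing_add_sketch is a hypothetical helper, not a GCC
function, and it rounds twice (once to double, once to float), so it
merely approximates the single rounding that fadd is meant to perform.
It assumes IEEE binary32/binary64 and the default round-to-nearest mode.

```c
#include <assert.h>

/* Hypothetical helper (an assumption for illustration, not GCC code):
   add two doubles, then narrow the sum to float.  The volatile stores
   force real double and float values even on targets that keep excess
   precision in registers.  */
float
narrowing_add_sketch (double x, double y)
{
  volatile double sum = x + y;
  volatile float narrowed = (float) sum;
  return narrowed;
}

/* A few cases that show which values survive narrowing.  */
void
check_narrowing_examples (void)
{
  /* 2^32 is a power of two, so float represents it exactly.  */
  assert (narrowing_add_sketch (4294967296.0, 0.0) == 4294967296.0f);

  /* 2^24 + 1 needs 25 significand bits; float has 24, so the sum is
     rounded to a neighbouring float (2^24 under ties-to-even).  */
  assert (narrowing_add_sketch (16777216.0, 1.0) == 16777216.0f);

  /* 2^24 + 2 is representable again, since the spacing there is 2.  */
  assert (narrowing_add_sketch (16777216.0, 2.0) == 16777218.0f);
}
```

No bits are zeroed by hand here; the conversion rounds the value to the
nearest float.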
diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index e5c9e063c48..47cb81006e0 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -387,6 +387,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT_PTR,
 		 BT_VOID, BT_UINT, BT_PTR)
 DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT_FLOAT_FLOAT,
 		 BT_FLOAT, BT_FLOAT, BT_FLOAT)
+DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT_DOUBLE_DOUBLE,
+		 BT_FLOAT, BT_DOUBLE, BT_DOUBLE)
 DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE_DOUBLE,
 		 BT_DOUBLE, BT_DOUBLE, BT_DOUBLE)
 DEF_FUNCTION_TYPE_2 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE,
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 85a945877a4..371fb62b645 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -2035,6 +2035,7 @@ mathfn_built_in_2 (tree type, combined_fn fn)
     CASE_MATHFN (EXP2)
     CASE_MATHFN (EXPM1)
     CASE_MATHFN (FABS)
+    CASE_MATHFN (FADD)
     CASE_MATHFN (FDIM)
     CASE_MATHFN_FLOATN (FLOOR)
     CASE_MATHFN_FLOATN (FMA)
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 8bb7027aac7..1d065eae345 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -352,6 +352,9 @@ DEF_C99_C90RES_BUILTIN (BUILT_IN_FABSL, "fabsl", BT_FN_LONGDOUBLE_LONGDOUBLE, AT
 #define FABS_TYPE(F) BT_FN_##F##_##F
 DEF_EXT_LIB_FLOATN_NX_BUILTINS (BUILT_IN_FABS, "fabs", FABS_TYPE, ATTR_CONST_NOTHROW_LEAF_LIST)
 #undef FABS_TYPE
+DEF_EXT_LIB_BUILTIN(BUILT_IN_FADD, "fadd", BT_FN_FLOAT_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_EXT_LIB_BUILTIN(BUILT_IN_FADDF, "faddf", BT_FN_FLOAT_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_EXT_LIB_BUILTIN(BUILT_IN_FADDL, "faddl", BT_FN_FLOAT_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_FABSD32, "fabsd32", BT_FN_DFLOAT32_DFLOAT32, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_FABSD64, "fabsd64", BT_FN_DFLOAT64_DFLOAT64, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_FABSD128, "fabsd128", BT_FN_DFLOAT128_DFLOAT128, ATTR_CONST_NOTHROW_LEAF_LIST)
diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index d9b546e6803..ee939f85005 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -570,6 +570,16 @@ fold_const_nextafter (real_value *result, const real_value *arg0,
   return true;
 }
 
+static bool
+fold_const_fadd (real_value *result, const real_value *arg0,
+		 const real_value *arg1, const real_format *format)
+{
+  if (!real_fadd (result, format, arg0, arg1))
+    return true;
+  else
+    return false;
+}
+
 /* Try to evaluate:
 
   *RESULT = ldexp (*ARG0, ARG1)
@@ -1366,6 +1376,9 @@ fold_const_call_sss (real_value *result, combined_fn fn,
     CASE_CFN_NEXTTOWARD:
       return fold_const_nextafter (result, arg0, arg1, format);
 
+    CASE_CFN_FADD:
+      return fold_const_fadd (result, arg0, arg1, format);
+
     default:
       return false;
     }
diff --git a/gcc/real.c b/gcc/real.c
index ab71430709f..6379cd0bcdc 100644
--- a/gcc/real.c
+++ b/gcc/real.c
@@ -5093,6 +5093,17 @@ real_roundeven (REAL_VALUE_TYPE *r, format_helper fmt,
 real_round (r, fmt, x);
 }
 
+bool
+real_fadd (REAL_VALUE_TYPE *r, format_helper fmt,
+	   const REAL_VALUE_TYPE *x, const REAL_VALUE_TYPE

gcc-9-20190706 is now available

2019-07-06 Thread gccadmin
Snapshot gcc-9-20190706 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/9-20190706/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 9 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-9-branch 
revision 273169

You'll find:

 gcc-9-20190706.tar.xz    Complete GCC

  SHA256=ed1e0ec2ff59fbc12d7802b675de39391a46ef55ec41407ea08929d9e2b91f87
  SHA1=9fd1cac310277f046585fb486c755a1632ebbe89

Diffs from 9-20190629 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Nested loop vectorisation issue

2019-07-06 Thread Thomas Womack
Good morning.

I have some code that looks like

typedef unsigned long long uint64;
typedef unsigned int uint32;

typedef struct { uint64 x[8]; } __attribute__((aligned(64))) v_t;

inline v_t xor(v_t a, v_t b)
{
  v_t Q;
  for (int i=0; i<8; i++) Q.x[i] = a.x[i] ^ b.x[i];
  return Q;
}

void xor_matrix_precomp(v_t* __restrict__ a, v_t* __restrict__ c, v_t* 
__restrict__ d, int n)
{
  uint32 i,j;
  for (i=0; i>3, b=j&7;
  acc = xor(acc, c[j*256 + ((vi.x[w] >> (8*b))&0xff)]);
}
  d[i] = xor(d[i], acc);
}
}

built with 

/home/nfsworld/tooling/gcc-9.1-isl16/bin/gcc -O3 -fomit-frame-pointer 
-march=skylake-avx512 -mprefer-vector-width=512 -S badeg.c


and the inner xor_matrix loop is not vectorised at all: it carries ‘acc’ in 
eight x86-64 registers rather than one ZMM.  What am I missing?

-fopt-info-all-vec is not helping me, it points out that it can’t vectorise the 
i or j loop but says nothing about the loop in the inlined xor() call.
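
For comparison, one way to state the eight-lane xor as a single 512-bit
operation is GCC's vector_size extension. This is only a sketch:
xor_scalar, xor_vec, v8u64 and xor_variants_agree are names made up
here, and whether this form keeps the accumulator in one ZMM register
with the flags above has not been checked.

```c
#include <assert.h>
#include <string.h>

typedef unsigned long long uint64;

/* Layout taken from the message above.  */
typedef struct { uint64 x[8]; } __attribute__((aligned(64))) v_t;

/* Eight 64-bit lanes as one 512-bit vector (GCC vector extension).  */
typedef uint64 v8u64 __attribute__((vector_size(64)));

/* Scalar version as posted, renamed so both variants can coexist.  */
static inline v_t xor_scalar(v_t a, v_t b)
{
  v_t Q;
  for (int i=0; i<8; i++) Q.x[i] = a.x[i] ^ b.x[i];
  return Q;
}

/* Hypothetical variant: copy into the vector type, xor all lanes with
   one vector operation, and copy back.  */
static inline v_t xor_vec(v_t a, v_t b)
{
  v8u64 va, vb;
  v_t Q;
  assert(sizeof(v_t) == sizeof(v8u64));
  memcpy(&va, &a, sizeof va);
  memcpy(&vb, &b, sizeof vb);
  va ^= vb;
  memcpy(&Q, &va, sizeof Q);
  return Q;
}

/* Returns 1 when both variants produce the same 64 bytes.  */
int xor_variants_agree(void)
{
  v_t a, b;
  for (int i=0; i<8; i++) {
    a.x[i] = 0x0123456789abcdefULL * (uint64)(i + 1);
    b.x[i] = 0xfedcba9876543210ULL >> i;
  }
  v_t s = xor_scalar(a, b), v = xor_vec(a, b);
  return memcmp(&s, &v, sizeof s) == 0;
}
```

The memcpy calls avoid casting between the struct and the vector type;
they are expected to be optimised away, but that too is an assumption.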

I’m sure I’m missing something obvious; many thanks for your help.

Tom