Hi Richard,

On 02/05/18 11:15, Richard Biener wrote:
> On Mon, Apr 30, 2018 at 7:41 PM, Kyrill  Tkachov
> <kyrylo.tkac...@foss.arm.com> wrote:
>> Hi all,
>>
>> We can improve the performance of complex floating-point multiplications by
>> inlining the expansion a bit more aggressively.
>> We can inline complex x = a * b as:
>> x = (ar*br - ai*bi) + i(ar*bi + br*ai);
>> if (isunordered (__real__ x, __imag__ x))
>>   x = __muldc3 (a, b); //Or __mulsc3 for single-precision
>>
>> That way the common case where no NaNs are produced we can avoid the libgcc
>> call and fall back to the
>> NaN handling stuff in libgcc if either components of the expansion are NaN.
>>
>> The implementation is done in expand_complex_multiplication in
>> tree-complex.c and the above expansion
>> will be done when optimising for -O1 and greater and when not optimising for
>> size.
>> At -O0 and -Os the single call to libgcc will be emitted.
>>
>> For the code:
>> __complex double
>> foo (__complex double a, __complex double b)
>> {
>>   return a * b;
>> }
>>
>> We will now emit at -O2 for aarch64:
>> foo:
>>         fmul    d16, d1, d3
>>         fmul    d6, d1, d2
>>         fnmsub  d5, d0, d2, d16
>>         fmadd   d4, d0, d3, d6
>>         fcmp    d5, d4
>>         bvs     .L8
>>         fmov    d1, d4
>>         fmov    d0, d5
>>         ret
>> .L8:
>>         stp     x29, x30, [sp, -16]!
>>         mov     x29, sp
>>         bl      __muldc3
>>         ldp     x29, x30, [sp], 16
>>         ret
>>
>> Instead of just a branch to __muldc3.
>>
>> Bootstrapped and tested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
>> x86_64-unknown-linux-gnu.
>>
>> Ok for trunk? (GCC 9)
>
> +         /* If optimizing for size or not at all just do a libcall.  */
> +         if (optimize == 0 || optimize_function_for_size_p (cfun))
> +           {
> +             expand_complex_libcall (gsi, ar, ai, br, bi, MULT_EXPR);
> +             return;
> +           }
>
> use optimize_bb_for_size_p instead please (get BB from the mult stmt).
>

Done.

>  /* Expand a complex multiplication or division to a libcall to the c99
> +   compliant routines.  Unlike expand_complex_libcall create and insert
> +   the call, assign it to an output variable and return that rather than
> +   modifying existing statements in place.  */
> +
> +static tree
> +insert_complex_mult_libcall (gimple_stmt_iterator *gsi, tree type, tree ar,
> +                             tree ai, tree br, tree bi)
> +{
>
> can you please try merging both functions instead?
>

I tried that initially, it looked a bit awkward because we sometimes want to 
modify
an existing assignment in-place, other times we want to insert it as a new 
statement.
This latest version keeps only one function and adds an inplace_p parameter.

> Also it shows a possible issue if with -fnon-call-exceptions the original
> multiplication has EH edges.  I think you want to side-step that
> by doing the libcall-only way in that case as well (stmt_can_throw_internal).

Ok, done.

>
> +         tree isunordered_decl = builtin_decl_explicit 
(BUILT_IN_ISUNORDERED);
> +         tree isunordered_res = create_tmp_var (integer_type_node);
> +         gimple *tmpr_unord_check
> +           = gimple_build_call (isunordered_decl, 2, tmpr, tmpi);
> +         gimple_call_set_lhs (tmpr_unord_check, isunordered_res);
> +
> +         gsi_insert_before (gsi, tmpr_unord_check, GSI_SAME_STMT);
> +         gimple *check
> +           = gimple_build_cond (NE_EXPR, isunordered_res, integer_zero_node,
> +                                NULL_TREE, NULL_TREE);
>
> why use BUILT_IN_ISUNORDERED but not a GIMPLE_COND with
> UNORDERED_EXPR?  Note again that might trap/throw with -fsignalling-nans
> so better avoid this transform for flag_signalling_nans as well...


No reason, I was just not familiar enough with the tree-level expressions.
Using UNORDERED_EXPR does look much better.

>
> +         /* We have a conditional block with some assignments in cond_bb.
> +            Wire up the PHIs to wrap up.  */
> +         if (gimple_in_ssa_p (cfun))
> +           {
>
> we are always in SSA form(?)  (probably tree-complex.c can use some TLC here).

I think we're always in SSA form. I suspected this was an artifact from the 
past.
I've removed these conditionals in this latest version and testing doesn't show 
any problems.


>
> +       /* If we are not worrying about NaNs expand to
> +         (ar*br - ai*bi) + i(ar*bi + br*ai) directly.  */
> +       expand_complex_multiplication_limited_range (gsi, inner_type, ar, ai,
> +                                                     br, bi, &rr, &ri);
>
> I think the function is badly worded - this isn't about limited
> ranges, no?  Which
> also means that we can dispatch to this simple variant not only for
> flag_complex_method != 2 but for !HONOR_NANS && !HONOR_INFINITIES?
> Maybe that should be done as followup.
>

Hmm, I've struggled a bit to come up with a good names.
I've renamed it to expand_complex_multiplication_components.
Is that more descriptive?

I'm happy to enable this expansion path for !HONOR_NANS && !HONOR_INFINITIES as 
a separate patch.

Bootstrapped and tested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf, 
x86_64-unknown-linux-gnu.

Thanks,
Kyrill

2018-05-02  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

    PR tree-optimization/70291
    * tree-complex.c (expand_complex_libcall): Add type, inplace_p
    arguments.  Change return type to tree.  Emit libcall as a new
    statement rather than replacing existing one when inplace_p is true.
    (expand_complex_multiplication_components): New function.
    (expand_complex_multiplication): Expand floating-point complex
    multiplication using the above.
    (expand_complex_division): Rename inner_type parameter to type.
    Update expand_complex_libcall call-site.
    (expand_complex_operations_1): Update expand_complex_multiplication
    and expand_complex_division call-sites.

2018-05-02  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

    PR tree-optimization/70291
    * gcc.dg/complex-6.c: New test.
    * gcc.dg/complex-7.c: Likewise.
diff --git a/gcc/testsuite/gcc.dg/complex-6.c b/gcc/testsuite/gcc.dg/complex-6.c
new file mode 100644
index 0000000000000000000000000000000000000000..e70322bf6f378d1d6947de4c10f671bc0a7ded49
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/complex-6.c
@@ -0,0 +1,13 @@
+/* PR tree-optimization/70291.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cplxlower" } */
+
+__complex float
+foo (__complex float a, __complex float b)
+{
+  return a * b;
+}
+
+/* { dg-final { scan-tree-dump-times "unord" 1 "cplxlower1" } } */
+/* { dg-final { scan-tree-dump-times "__mulsc3" 1 "cplxlower1" } } */
diff --git a/gcc/testsuite/gcc.dg/complex-7.c b/gcc/testsuite/gcc.dg/complex-7.c
new file mode 100644
index 0000000000000000000000000000000000000000..78f1a290e34af0c092a00639d979bc32b332b1da
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/complex-7.c
@@ -0,0 +1,13 @@
+/* PR tree-optimization/70291.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cplxlower" } */
+
+__complex double
+foo (__complex double a, __complex double b)
+{
+  return a * b;
+}
+
+/* { dg-final { scan-tree-dump-times "unord" 1 "cplxlower1" } } */
+/* { dg-final { scan-tree-dump-times "__muldc3" 1 "cplxlower1" } } */
diff --git a/gcc/tree-complex.c b/gcc/tree-complex.c
index 622b8696399b9e9d8bddcc6340d2f8d8ca852637..d194493ff859783678036ca3527ea7942eecc869 100644
--- a/gcc/tree-complex.c
+++ b/gcc/tree-complex.c
@@ -978,22 +978,22 @@ expand_complex_addition (gimple_stmt_iterator *gsi, tree inner_type,
 }
 
 /* Expand a complex multiplication or division to a libcall to the c99
-   compliant routines.  */
+   compliant routines.  TYPE is the complex type of the operation.
+   If INPLACE_P replace the statement at GSI with
+   the libcall and return NULL_TREE.  Else insert the call, assign its
+   result to an output variable and return that variable.  If INPLACE_P
+   is true then the statement being replaced should be an assignment
+   statement.  */
 
-static void
-expand_complex_libcall (gimple_stmt_iterator *gsi, tree ar, tree ai,
-			tree br, tree bi, enum tree_code code)
+static tree
+expand_complex_libcall (gimple_stmt_iterator *gsi, tree type, tree ar, tree ai,
+			tree br, tree bi, enum tree_code code, bool inplace_p)
 {
   machine_mode mode;
   enum built_in_function bcode;
-  tree fn, type, lhs;
-  gimple *old_stmt;
+  tree fn, lhs;
   gcall *stmt;
 
-  old_stmt = gsi_stmt (*gsi);
-  lhs = gimple_assign_lhs (old_stmt);
-  type = TREE_TYPE (lhs);
-
   mode = TYPE_MODE (type);
   gcc_assert (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT);
 
@@ -1008,21 +1008,65 @@ expand_complex_libcall (gimple_stmt_iterator *gsi, tree ar, tree ai,
   fn = builtin_decl_explicit (bcode);
 
   stmt = gimple_build_call (fn, 4, ar, ai, br, bi);
-  gimple_call_set_lhs (stmt, lhs);
-  update_stmt (stmt);
-  gsi_replace (gsi, stmt, false);
 
-  if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
-    gimple_purge_dead_eh_edges (gsi_bb (*gsi));
 
-  if (gimple_in_ssa_p (cfun))
+  if (inplace_p)
     {
+      gimple *old_stmt = gsi_stmt (*gsi);
+      lhs = gimple_assign_lhs (old_stmt);
+      gimple_call_set_lhs (stmt, lhs);
+      update_stmt (stmt);
+      gsi_replace (gsi, stmt, false);
+
+      if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
+	gimple_purge_dead_eh_edges (gsi_bb (*gsi));
+
       type = TREE_TYPE (type);
       update_complex_components (gsi, stmt,
-				 build1 (REALPART_EXPR, type, lhs),
-				 build1 (IMAGPART_EXPR, type, lhs));
+				  build1 (REALPART_EXPR, type, lhs),
+				  build1 (IMAGPART_EXPR, type, lhs));
       SSA_NAME_DEF_STMT (lhs) = stmt;
+      return NULL_TREE;
     }
+
+  lhs = create_tmp_var (type);
+  gimple_call_set_lhs (stmt, lhs);
+
+  lhs = make_ssa_name (lhs, stmt);
+  gimple_call_set_lhs (stmt, lhs);
+
+  update_stmt (stmt);
+  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+  return lhs;
+}
+
+/* Perform a complex multiplication on two complex constants A, B represented
+   by AR, AI, BR, BI of type TYPE.
+   The operation we want is: a * b = (ar*br - ai*bi) + i(ar*bi + br*ai).
+   Insert the GIMPLE statements into GSI.  Store the real and imaginary
+   components of the result into RR and RI.  */
+
+static void
+expand_complex_multiplication_components (gimple_stmt_iterator *gsi,
+					     tree type, tree ar, tree ai,
+					     tree br, tree bi,
+					     tree *rr, tree *ri)
+{
+  tree t1, t2, t3, t4;
+
+  t1 = gimplify_build2 (gsi, MULT_EXPR, type, ar, br);
+  t2 = gimplify_build2 (gsi, MULT_EXPR, type, ai, bi);
+  t3 = gimplify_build2 (gsi, MULT_EXPR, type, ar, bi);
+
+  /* Avoid expanding redundant multiplication for the common
+     case of squaring a complex number.  */
+  if (ar == br && ai == bi)
+    t4 = t3;
+  else
+    t4 = gimplify_build2 (gsi, MULT_EXPR, type, ai, br);
+
+  *rr = gimplify_build2 (gsi, MINUS_EXPR, type, t1, t2);
+  *ri = gimplify_build2 (gsi, PLUS_EXPR, type, t3, t4);
 }
 
 /* Expand complex multiplication to scalars:
@@ -1030,11 +1074,12 @@ expand_complex_libcall (gimple_stmt_iterator *gsi, tree ar, tree ai,
 */
 
 static void
-expand_complex_multiplication (gimple_stmt_iterator *gsi, tree inner_type,
+expand_complex_multiplication (gimple_stmt_iterator *gsi, tree type,
 			       tree ar, tree ai, tree br, tree bi,
 			       complex_lattice_t al, complex_lattice_t bl)
 {
   tree rr, ri;
+  tree inner_type = TREE_TYPE (type);
 
   if (al < bl)
     {
@@ -1080,27 +1125,79 @@ expand_complex_multiplication (gimple_stmt_iterator *gsi, tree inner_type,
     case PAIR (VARYING, VARYING):
       if (flag_complex_method == 2 && SCALAR_FLOAT_TYPE_P (inner_type))
 	{
-	  expand_complex_libcall (gsi, ar, ai, br, bi, MULT_EXPR);
-	  return;
-	}
-      else
-	{
-	  tree t1, t2, t3, t4;
-
-	  t1 = gimplify_build2 (gsi, MULT_EXPR, inner_type, ar, br);
-	  t2 = gimplify_build2 (gsi, MULT_EXPR, inner_type, ai, bi);
-	  t3 = gimplify_build2 (gsi, MULT_EXPR, inner_type, ar, bi);
-
-	  /* Avoid expanding redundant multiplication for the common
-	     case of squaring a complex number.  */
-	  if (ar == br && ai == bi)
-	    t4 = t3;
-	  else
-	    t4 = gimplify_build2 (gsi, MULT_EXPR, inner_type, ai, br);
+	  /* If optimizing for size or not at all just do a libcall.
+	     Same if there are exception-handling edges or signaling NaNs.  */
+	  if (optimize == 0 || optimize_bb_for_size_p (gsi_bb (*gsi))
+	     || stmt_can_throw_internal (gsi_stmt (*gsi))
+	     || flag_signaling_nans)
+	    {
+	      expand_complex_libcall (gsi, type, ar, ai, br, bi,
+				      MULT_EXPR, true);
+	      return;
+	    }
 
-	  rr = gimplify_build2 (gsi, MINUS_EXPR, inner_type, t1, t2);
-	  ri = gimplify_build2 (gsi, PLUS_EXPR, inner_type, t3, t4);
+	  /* Else, expand x = a * b into
+	     x = (ar*br - ai*bi) + i(ar*bi + br*ai);
+	     if (isunordered (__real__ x, __imag__ x))
+		x = __muldc3 (a, b);  */
+
+	  tree tmpr, tmpi;
+	  expand_complex_multiplication_components (gsi, inner_type, ar, ai,
+						     br, bi, &tmpr, &tmpi);
+
+	  gimple *check
+	    = gimple_build_cond (UNORDERED_EXPR, tmpr, tmpi,
+				 NULL_TREE, NULL_TREE);
+
+	  basic_block orig_bb = gsi_bb (*gsi);
+	  /* We want to keep track of the original complex multiplication
+	     statement as we're going to modify it later in
+	     update_complex_assignment.  Make sure that insert_cond_bb leaves
+	     that statement in the join block.  */
+	  gsi_prev (gsi);
+	  basic_block cond_bb
+	    = insert_cond_bb (gsi_bb (*gsi), gsi_stmt (*gsi), check,
+			      profile_probability::very_unlikely ());
+
+
+	  gimple_stmt_iterator cond_bb_gsi = gsi_last_bb (cond_bb);
+	  gsi_insert_after (&cond_bb_gsi, gimple_build_nop (), GSI_NEW_STMT);
+
+	  tree libcall_res
+	    = expand_complex_libcall (&cond_bb_gsi, type, ar, ai, br,
+				       bi, MULT_EXPR, false);
+	  tree cond_real = gimplify_build1 (&cond_bb_gsi, REALPART_EXPR,
+					    inner_type, libcall_res);
+	  tree cond_imag = gimplify_build1 (&cond_bb_gsi, IMAGPART_EXPR,
+					    inner_type, libcall_res);
+
+	  basic_block join_bb = single_succ_edge (cond_bb)->dest;
+	  *gsi = gsi_start_nondebug_after_labels_bb (join_bb);
+
+	  /* We have a conditional block with some assignments in cond_bb.
+	     Wire up the PHIs to wrap up.  */
+	  rr = create_tmp_var (inner_type);
+	  rr = make_ssa_name (rr);
+	  ri = create_tmp_var (inner_type);
+	  ri = make_ssa_name (ri);
+	  edge cond_to_join = single_succ_edge (cond_bb);
+	  edge orig_to_join = find_edge (orig_bb, join_bb);
+
+	  gphi *real_phi = create_phi_node (rr, gsi_bb (*gsi));
+	  add_phi_arg (real_phi, cond_real, cond_to_join,
+			UNKNOWN_LOCATION);
+	  add_phi_arg (real_phi, tmpr, orig_to_join, UNKNOWN_LOCATION);
+
+	  gphi *imag_phi = create_phi_node (ri, gsi_bb (*gsi));
+	  add_phi_arg (imag_phi, cond_imag, cond_to_join,
+			UNKNOWN_LOCATION);
+	  add_phi_arg (imag_phi, tmpi, orig_to_join, UNKNOWN_LOCATION);
 	}
+      else
+	/* If we are not worrying about NaNs expand to
+	  (ar*br - ai*bi) + i(ar*bi + br*ai) directly.  */
+	expand_complex_multiplication_components (gsi, inner_type, ar, ai,
+						      br, bi, &rr, &ri);
       break;
 
     default:
@@ -1308,13 +1405,14 @@ expand_complex_div_wide (gimple_stmt_iterator *gsi, tree inner_type,
 /* Expand complex division to scalars.  */
 
 static void
-expand_complex_division (gimple_stmt_iterator *gsi, tree inner_type,
+expand_complex_division (gimple_stmt_iterator *gsi, tree type,
 			 tree ar, tree ai, tree br, tree bi,
 			 enum tree_code code,
 			 complex_lattice_t al, complex_lattice_t bl)
 {
   tree rr, ri;
 
+  tree inner_type = TREE_TYPE (type);
   switch (PAIR (al, bl))
     {
     case PAIR (ONLY_REAL, ONLY_REAL):
@@ -1362,7 +1460,7 @@ expand_complex_division (gimple_stmt_iterator *gsi, tree inner_type,
 	case 2:
 	  if (SCALAR_FLOAT_TYPE_P (inner_type))
 	    {
-	      expand_complex_libcall (gsi, ar, ai, br, bi, code);
+	      expand_complex_libcall (gsi, type, ar, ai, br, bi, code, true);
 	      break;
 	    }
 	  /* FALLTHRU */
@@ -1630,7 +1728,7 @@ expand_complex_operations_1 (gimple_stmt_iterator *gsi)
       break;
 
     case MULT_EXPR:
-      expand_complex_multiplication (gsi, inner_type, ar, ai, br, bi, al, bl);
+      expand_complex_multiplication (gsi, type, ar, ai, br, bi, al, bl);
       break;
 
     case TRUNC_DIV_EXPR:
@@ -1638,7 +1736,7 @@ expand_complex_operations_1 (gimple_stmt_iterator *gsi)
     case FLOOR_DIV_EXPR:
     case ROUND_DIV_EXPR:
     case RDIV_EXPR:
-      expand_complex_division (gsi, inner_type, ar, ai, br, bi, code, al, bl);
+      expand_complex_division (gsi, type, ar, ai, br, bi, code, al, bl);
       break;
 
     case NEGATE_EXPR:

Reply via email to