On 10 November 2015 at 20:11, Richard Biener <[email protected]> wrote:
> On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote:
>
>> On 4 November 2015 at 20:35, Richard Biener <[email protected]> wrote:
>> >
>> > Btw, did you investigate code gen differences on x86_64/i586? That
>> > target expands all divisions/modulo ops via divmod, relying on CSE
>> > solely as the HW always computes both div and mod (IIRC).
>> x86_64 has optab_handler for divmod defined, so the transform won't
>> take place on x86.
>
> Ok.
>
>> > +
>> > + gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs
>> > (use_stmt), rhs);
>> > + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
>> >
>> > Ick. Please use
>> >
>> > gimple_set_rhs_from_tree (use_stmt, res);
>> Um there doesn't seem to be gimple_set_rhs_from_tree.
>> I used gimple_assign_set_rhs_from_tree which requires gsi for use_stmt.
>> Is that OK ?
>
> Yes.
>
>> > update_stmt (use_stmt);
>> > if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>> > cfg_changed = true;
>> >
>> > + free_dominance_info (CDI_DOMINATORS);
>> >
>> > do not free dominators.
>>
>> I have done the suggested changes in the attached patch.
>> I have a few questions:
>>
>> a) Does the change to insert DIVMOD call before topmost div or mod
>> stmt with matching operands
>> look correct ?
>
> + /* Insert call-stmt just before the topmost div/mod stmt.
> + top_bb dominates all other basic blocks containing div/mod stms
> + so, the topmost stmt would be the first div/mod stmt with matching
> operands
> + in top_bb. */
> +
> + gcc_assert (top_bb != 0);
> + gimple_stmt_iterator gsi;
> + for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next
> (&gsi))
> + {
> + gimple *g = gsi_stmt (gsi);
> + if (is_gimple_assign (g)
> + && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR
> + || gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR)
> + && operand_equal_p (op1, gimple_assign_rhs1 (g), 0)
> + && operand_equal_p (op2, gimple_assign_rhs2 (g), 0))
> + break;
>
> Looks overly complicated to me. Just remember "topmost" use_stmt
> alongside top_bb (looks like you'll no longer need top_bb if you
> retail top_stmt). And then do
>
> gsi = gsi_for_stmt (top_stmt);
>
> and insert before that.
Thanks, done in this patch. Does it look OK ?
IIUC gimple_uid (stmt1) < gimple_uid (stmt2) can be used to check if
stmt1 occurs before stmt2
only if stmt1 and stmt2 are in the same basic block ?
>
>> b) Handling constants - I dropped handling constants in the attached
>> patch. IIUC we don't want
>> to enable this transform if there's a specialized expansion for some
>> constants for div or mod ?
>
> See expand_divmod which has lots of special cases for constant operands
> not requiring target support for div or mod.
Thanks, would it be OK if I do this in follow up patch ?
>
>> I suppose this would also be target dependent and require a target hook ?
>> For instance arm defines modsi3 pattern to expand mod when 2nd operand
>> is constant and <= 0 or power of 2,
>> while for other cases goes the expand_divmod() route to generate call
>> to __aeabi_idivmod libcall.
>
> Ok, so it lacks a signed mod instruction.
>
>> c) Gating the divmod transform -
>> I tried gating it on checks for optab_handlers on div and mod, however
>> this doesn't enable transform for arm cortex-a9
>> anymore (cortex-a9 doesn't have hardware instructions for integer div and
>> mod).
>> IIUC for cortex-a9,
>> optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because
>> HAVE_divsi3 is 0.
>> However optab_handler (smod_optab, SImode) matches since optab_handler
>> only checks for existence of pattern
>> (and not whether the pattern gets matched).
>> I suppose we should enable the transform only if the divmod, div, and
>> mod pattern do not match rather than checking
>> if the patterns exist via optab_handler ? For a general x % y, modsi3
>> would fail to match but optab_handler(smod_optab, SImode ) still
>> says it's matched.
>
> Ah, of course. Querying for an optab handler is just a cheap
> guesstimate... Not sure how to circumvent this best (sub-target
> enablement of patterns). RTL expansion just goes ahead (of course)
> and sees if expansion eventually fails. Richard?
>
>> Should we define a new target hook combine_divmod, which returns true
>> if transforming to divmod is desirable for that
>> target ?
>> The default definition could be:
>> bool default_combine_divmod (enum machine_mode mode, tree op1, tree op2)
>> {
>> // check for optab_handlers for div/mod/divmod and libfunc for divmod
>> }
>>
>> And for arm, it could be over-ridden to return false if op2 is
>> constant and <= 0 or power of 2.
>> I am not really sure if this is a good idea since I am replicating
>> information from modsi3 pattern.
>> Any change to the pattern may require corresponding change to the hook :/
>
> Yeah, I don't think that is desirable. Ideally we'd have a way
> to query HAVE_* for CODE_FOR_* which would mean target-insns.def
> support for all div/mod/divmod patterns(?) and queries...
>
> Not sure if what would be enough though.
>
> Note that the divmod check is equally flawed.
>
> I think with the above I'd enable the transform when
>
> + if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
> + || (optab_libfunc (divmod_optab, mode) != NULL_RTX
> && optab_handler ([su]div_optab, mode) == CODE_FOR_nothing))
> + return false;
Um this fails for the arm backend (for cortex-a9) because
optab_handler (divmod_optab, mode) != CODE_FOR_nothing is false
optab_libfunc (divmod_optab, mode) != NULL_RTX is true.
optab_handler (div_optab, mode) == CODE_FOR_nothing is true.
which comes down to false || (true && true) which is true and we hit
return false.
AFAIU, we want the transform to be disabled if:
a) optab_handler exists for divmod.
b) optab_handler exists for div.
c) optab_libfunc does not exist for divmod. */
+ if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
+ || optab_handler (div_optab, mode) != CODE_FOR_nothing
+ || optab_libfunc (divmod_optab, mode) == NULL_RTX)
+ return false;
Does that look correct ?
>
> so we either will have a divmod instruction (hopefully not sub-target
> disabled for us) or a libfunc for divmod and for sure no HW divide
> instruction (HW mod can be emulated by HW divide but not the other
> way around).
>
>> d) Adding effective-target-check for divmod: I just enabled it for
>> arm*-*-* for now. I could additionally append more targets,
>> not sure if this is the right approach.
>
> Looks good to me.
Is this version OK if bootstrap/testing passes ?
Thanks,
Prathamesh
>
> Thanks,
> Richard.
2015-11-11 Prathamesh Kulkarni <[email protected]>
Kugan Vivekanandarajah <[email protected]>
* internal-fn.def: Add entry for expand_DIVMOD.
* internal-fn.c (expand_DIVMOD): New function.
* tree-ssa-math-opts.c: Include optabs-libfuncs.h.
(maybe_record_divmod): New function.
(divmod_candidate_p): Likewise.
(convert_to_divmod): Likewise.
(widen_mul_stats): New member divmod_calls_inserted.
(pass_optimize_widening_mul::execute): Call caluclate_dominace_info,
call renumber_gimple_stmt_uids, and add check for TRUNC_MOD_EXPR.
testsuite/
* gcc.dg/pr43721-1.c: New test.
* gcc.dg/pr43721-2.c: Likewise.
* gcc.dg/pr43721-3.c: Likewise.
* gcc.dg/pr43721-4.c: Likewise.
* lib/target-supports.exp: Add check for divmod.
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index a7da373..c0c84d8 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2045,6 +2045,47 @@ expand_GOACC_LOOP (gcall *stmt ATTRIBUTE_UNUSED)
gcc_unreachable ();
}
+/* Expand DIVMOD() using:
+ a) optab handler for udivmod/sdivmod if it is available.
+ b) If optab_handler doesn't exist, Generate call to
+ optab_libfunc for udivmod/sdivmod. */
+
+static void
+expand_DIVMOD (gcall *stmt)
+{
+ tree lhs = gimple_call_lhs (stmt);
+ tree arg0 = gimple_call_arg (stmt, 0);
+ tree arg1 = gimple_call_arg (stmt, 1);
+
+ gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE);
+ tree type = TREE_TYPE (TREE_TYPE (lhs));
+ machine_mode mode = TYPE_MODE (type);
+ optab tab = (TYPE_UNSIGNED (type)) ? udivmod_optab : sdivmod_optab;
+
+ rtx op0 = expand_normal (arg0);
+ rtx op1 = expand_normal (arg1);
+ rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+ rtx libfunc = optab_libfunc (tab, mode);
+ gcc_assert (libfunc != NULL_RTX);
+
+ machine_mode libval_mode =
+ smallest_mode_for_size (2 * GET_MODE_BITSIZE (mode), MODE_INT);
+
+ rtx libval = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
+ libval_mode, 2, op0, mode, op1, mode);
+
+ rtx quotient = simplify_gen_subreg (mode, libval, libval_mode, 0);
+ rtx remainder = simplify_gen_subreg (mode, libval, libval_mode,
+ GET_MODE_SIZE (mode));
+
+ /* Wrap the return value (quotient, remaineder) within COMPLEX_EXPR */
+ expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
+ make_tree (TREE_TYPE (arg0), quotient),
+ make_tree (TREE_TYPE (arg1), remainder)),
+ target, VOIDmode, EXPAND_NORMAL);
+}
+
/* Routines to expand each internal function, indexed by function number.
Each routine has the prototype:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 78266d9..f28579b 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -83,3 +83,5 @@ DEF_INTERNAL_FN (GOACC_DIM_POS, ECF_PURE | ECF_NOTHROW |
ECF_LEAF, ".")
/* OpenACC looping abstraction. See internal-fn.h for usage. */
DEF_INTERNAL_FN (GOACC_LOOP, ECF_PURE | ECF_NOTHROW, NULL)
+
+DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
diff --git a/gcc/testsuite/gcc.dg/pr43721-1.c b/gcc/testsuite/gcc.dg/pr43721-1.c
new file mode 100644
index 0000000..2ec1118
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr43721-1.c
@@ -0,0 +1,11 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+/* { dg-require-effective-target divmod } */
+
+int f(int x, int y)
+{
+ int quotient = x / y;
+ int remainder = x % y;
+ return quotient + remainder;
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 1 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/pr43721-2.c b/gcc/testsuite/gcc.dg/pr43721-2.c
new file mode 100644
index 0000000..974099c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr43721-2.c
@@ -0,0 +1,17 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+/* { dg-require-effective-target divmod } */
+
+int f(int x, int y)
+{
+ extern int early_exit;
+
+ int quot = x / y;
+
+ if (early_exit)
+ return 0;
+
+ int rem = x % y;
+ return quot + rem;
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 1 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/pr43721-3.c b/gcc/testsuite/gcc.dg/pr43721-3.c
new file mode 100644
index 0000000..41cda2e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr43721-3.c
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+/* { dg-require-effective-target divmod } */
+
+int f(int x, int y)
+{
+ extern int flag;
+ int quot;
+
+ if (flag)
+ quot = x / y;
+ else
+ quot = 0;
+
+ int rem = x % y;
+ return quot + rem;
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 0 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/pr43721-4.c b/gcc/testsuite/gcc.dg/pr43721-4.c
new file mode 100644
index 0000000..b4f69b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr43721-4.c
@@ -0,0 +1,19 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+/* { dg-require-effective-target divmod } */
+
+int f(int x, int y)
+{
+ int quot = 0;
+ int rem = 0;
+
+ extern int flag;
+
+ if (flag)
+ quot = x / y;
+ else
+ rem = x % y;
+
+ return quot + rem;
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 0 "widening_mul" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp
b/gcc/testsuite/lib/target-supports.exp
index b543519..c3895a7 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6494,3 +6494,12 @@ proc check_effective_target_vect_max_reduc { } {
}
return 0
}
+
+# Return 1 if the target supports divmod
+
+proc check_effective_target_divmod { } {
+ if { [istarget arm*-*-*] } {
+ return 1
+ }
+ return 0
+}
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 41fcabf..bd6dc9c 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -110,6 +110,8 @@ along with GCC; see the file COPYING3. If not see
#include "tree-ssa.h"
#include "builtins.h"
#include "params.h"
+#include "optabs-libfuncs.h"
+#include "tree-eh.h"
/* This structure represents one basic block that either computes a
division, or is a common dominator for basic block that compute a
@@ -182,6 +184,9 @@ static struct
/* Number of fp fused multiply-add ops inserted. */
int fmas_inserted;
+
+ /* Number of divmod calls inserted. */
+ int divmod_calls_inserted;
} widen_mul_stats;
/* The instance of "struct occurrence" representing the highest
@@ -3493,6 +3498,162 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree
op2)
return true;
}
+/* Set top_stmt to topmost stmt between top_stmt and use_stmt, and add
+ use_stmt to vector use_stmts provided basic block containing top_stmt
+ dominates use_stmt or vice versa. */
+
+static void
+maybe_record_divmod (vec<gimple *>& stmts, gimple *&top_stmt, gimple *use_stmt)
+{
+ basic_block bb = gimple_bb (use_stmt);
+ basic_block top_bb = gimple_bb (top_stmt);
+
+ if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
+ {
+ if (bb != top_bb
+ || gimple_uid (use_stmt) < gimple_uid (top_stmt))
+ top_stmt = use_stmt;
+ }
+ else if (!dominated_by_p (CDI_DOMINATORS, bb, top_bb))
+ return;
+
+ stmts.safe_push (use_stmt);
+}
+
+/* Check if the stmt is a candidate for divmod transform. */
+
+static bool
+divmod_candidate_p (gassign *stmt)
+{
+ enum machine_mode mode = TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt)));
+ const_tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+ optab divmod_optab, div_optab;
+
+ if (TYPE_UNSIGNED (type))
+ {
+ divmod_optab = udivmod_optab;
+ div_optab = udiv_optab;
+ }
+ else
+ {
+ divmod_optab = sdivmod_optab;
+ div_optab = sdiv_optab;
+ }
+
+ /* Disable the transform if:
+ a) optab_handler exists for divmod.
+ b) optab_handler exists for div.
+ c) optab_libfunc does not exist for divmod. */
+
+ if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
+ || optab_handler (div_optab, mode) != CODE_FOR_nothing
+ || optab_libfunc (divmod_optab, mode) == NULL_RTX)
+ return false;
+
+ tree op1 = gimple_assign_rhs1 (stmt);
+ tree op2 = gimple_assign_rhs2 (stmt);
+
+ /* FIXME: Drop handling constants for now. */
+ if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
+ return false;
+
+ if (TYPE_OVERFLOW_TRAPS (type))
+ return false;
+
+ return true;
+}
+
+/* This function looks for:
+ t1 = a TRUNC_DIV_EXPR b;
+ t2 = a TRUNC_MOD_EXPR b;
+ and transforms it to the following sequence:
+ complex_tmp = DIVMOD (a, b);
+ t1 = REALPART_EXPR(a);
+ t2 = IMAGPART_EXPR(b);
+ This change is done only if the target has support for divmod.
+
+ The pass works in two phases:
+ 1) Walk through all immediate uses of stmt's operand and find a
+ TRUNC_DIV_EXPR with matching operands and if such a stmt is found add
+ it to stmts vector.
+ 2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block that
+ dominates other div/mod stmts with same operands) and update entries in
+ stmts vector to use return value of DIMOVD (REALEXPR_PART for div,
+ IMAGPART_EXPR for mod). */
+
+static bool
+convert_to_divmod (gassign *stmt)
+{
+ if (!divmod_candidate_p (stmt))
+ return false;
+
+ tree op1 = gimple_assign_rhs1 (stmt);
+ tree op2 = gimple_assign_rhs2 (stmt);
+
+ vec<gimple *> stmts = vNULL;
+ stmts.safe_push (stmt);
+
+ imm_use_iterator use_iter;
+ gimple *use_stmt;
+ gimple *top_stmt = stmt;
+
+ FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
+ {
+ if (is_gimple_assign (use_stmt)
+ && gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+ && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
+ && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
+ maybe_record_divmod (stmts, top_stmt, use_stmt);
+ }
+
+ if (stmts.length () == 1)
+ return false;
+
+ /* Create the library call and insert the call stmt before top_stmt. */
+ gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
+ tree res = make_temp_ssa_name (
+ build_complex_type (TREE_TYPE (gimple_assign_lhs (stmt))),
+ call_stmt, "divmod_tmp");
+
+ gimple_call_set_lhs (call_stmt, res);
+ gimple_stmt_iterator top_stmt_gsi = gsi_for_stmt (top_stmt);
+ gsi_insert_before (&top_stmt_gsi, call_stmt, GSI_SAME_STMT);
+
+ widen_mul_stats.divmod_calls_inserted++;
+
+ /* Update stmts. */
+ bool cfg_changed = false;
+ for (unsigned i = 0; i < stmts.length (); ++i)
+ {
+ tree rhs;
+ use_stmt = stmts[i];
+
+ switch (gimple_assign_rhs_code (use_stmt))
+ {
+ case TRUNC_DIV_EXPR:
+ rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
+ break;
+
+ case TRUNC_MOD_EXPR:
+ rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op1), res);
+ break;
+
+ default:
+ gcc_unreachable ();
+ }
+
+ gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+ gimple_assign_set_rhs_from_tree (&gsi, rhs);
+ update_stmt (use_stmt);
+
+ if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
+ cfg_changed = true;
+ }
+
+ stmts.release ();
+ return cfg_changed;
+}
+
/* Find integer multiplications where the operands are extended from
smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR
where appropriate. */
@@ -3535,7 +3696,10 @@ pass_optimize_widening_mul::execute (function *fun)
basic_block bb;
bool cfg_changed = false;
+ calculate_dominance_info (CDI_DOMINATORS);
+
memset (&widen_mul_stats, 0, sizeof (widen_mul_stats));
+ renumber_gimple_stmt_uids ();
FOR_EACH_BB_FN (bb, fun)
{
@@ -3563,6 +3727,10 @@ pass_optimize_widening_mul::execute (function *fun)
}
break;
+ case TRUNC_MOD_EXPR:
+ convert_to_divmod (as_a<gassign *> (stmt));
+ break;
+
case PLUS_EXPR:
case MINUS_EXPR:
convert_plusminus_to_widen (&gsi, stmt, code);
@@ -3614,6 +3782,8 @@ pass_optimize_widening_mul::execute (function *fun)
widen_mul_stats.maccs_inserted);
statistics_counter_event (fun, "fused multiply-adds inserted",
widen_mul_stats.fmas_inserted);
+ statistics_counter_event (fun, "divmod calls inserted",
+ widen_mul_stats.divmod_calls_inserted);
return cfg_changed ? TODO_cleanup_cfg : 0;
}