On Wed, Sep 18, 2013 at 11:45 AM, Zhenqiang Chen <[email protected]> wrote:
>
>> -----Original Message-----
>> From: Richard Biener [mailto:[email protected]]
>> Sent: Tuesday, August 27, 2013 8:18 PM
>> To: Richard Earnshaw
>> Cc: Zhenqiang Chen; GCC Patches
>> Subject: Re: [PATCH 1/n] Add conditional compare support
>>
>> On Tue, Aug 27, 2013 at 1:56 PM, Richard Earnshaw <[email protected]>
>> wrote:
>> > On 27/08/13 12:10, Richard Biener wrote:
>> >> What's this for and what's the desired semantics? I don't like having
>> >> extra tree codes for this. Is this for a specific instruction set
>> >> feature?
>> >
>> > The background is to support the conditional compare instructions in
>> > ARM (more effectively) and AArch64 at all.
>> >
>> > The current method used in ARM is to expand into a series of
>> > store-flag instructions and then hope that combine can optimize them
>> > away (though that fails far too often, particularly when the first
>> > instruction in the sequence is combined into another pattern). To
>> > make it work at all the compiler has to lie about the costs of various
>> > store-flag type operations which overall risks producing worse code
>> > and means we also have to support many more complex multi-instruction
>> > patterns than is desirable. I really don't want to go down the same
> route
>> for AArch64.
>> >
>> > The idea behind all this is to capture potential conditional compare
>> > operations early enough in the mid end that we can keep track of them
>> > until RTL expand time and then to emit the correct logic on all
>> > targets depending on what is the best thing for that target. The
>> > current method of lowering into store-flag sequences doesn't really cut
> it.
>>
>> It seems to me that then the initial instruction selection process (aka
> RTL
>> expansion) needs to be improved. As we are expanding with having the CFG
>> around it should be easy enough to detect AND/ORIF cases and do better
>> here. Yeah, I suppose this asks to turn existing jump expansion
> optimizations
>> up-side-down to optimize with the GIMPLE CFG in mind.
>>
>> The current way of LOGICAL_OP_NON_SHORT_CIRCUIT is certainly bogus -
>> fold-const.c is way too early to decide this. Similar to the ongoing work
> of
>> expanding / building-up switch expressions in a GIMPLE pass, moving expand
>> complexity up the pipeline this asks for a GIMPLE phase that moves this
>> decision down closer to RTL expansion.
>> (We have tree-ssa-ifcombine.c that is a related GIMPLE transform pass)
>>
>
> The patch is updated according to your comments. It is a basic support,
> which does not touch ifcombine and jump related optimizations yet.
>
> Current method is:
> 1) In fold-const, if HAVE_conditional_compare, set
> LOGICAL_OP_NON_SHORT_CIRCUIT
> to optimize_function_for_speed_p. So we do not depend on BRANCH_COST.
> 2) Identify CCMP during expanding. A CCMP candidate is a BIT_AND_EXPR
> or BIT_IOR_EXPR, whose operators are compares.
> 3) Add a new op in optab to expand the CCMP to optimized RTL,
> e.g. and_scc_scc/ior_scc_scc in ARM.
>
> Bootstrap on ARM Chrome book.
> No make check regression on Pandaboard.
That's better. A few observations
+static bool
+is_conditional_compare_candidate_p (gimple g)
+{
+ tree rhs = gimple_assign_rhs_to_tree (g);
+ tree lhs, op0, op1;
+ gimple gs0, gs1;
+
+ if (TREE_CODE (rhs) != BIT_AND_EXPR && TREE_CODE (rhs) != BIT_IOR_EXPR)
+ return false;
+
+ lhs = gimple_assign_lhs (g);
+ op0 = TREE_OPERAND (rhs, 0);
+ op1 = TREE_OPERAND (rhs, 1);
+
+ if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME)
+ || !has_single_use (lhs))
+ return false;
+
+ gs0 = SSA_NAME_DEF_STMT (op0);
+ gs1 = SSA_NAME_DEF_STMT (op1);
you are not allowed to use arbitrary definition stmts for code generation (you
may have a look at them though). Use the get_gimple_for_ssa_name helper
to only access definition stmts that have no code generated yet and that
you can modify.
+ gimple gs0 = SSA_NAME_DEF_STMT (TREE_OPERAND (exp, 0));
+ gimple gs1 = SSA_NAME_DEF_STMT (TREE_OPERAND (exp, 1));
See above.
+ if (INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0)))
+ && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1))))
+ {
+ rtx op0, op1, op2, op3, tmp;
+ tree cmp0_op0, cmp0_op1, cmp1_op0, cmp1_op1;
+ enum machine_mode mode;
+
+ cmp0_op0 = gimple_assign_rhs1 (gs0);
+ cmp0_op1 = gimple_assign_rhs2 (gs0);
+ cmp1_op0 = gimple_assign_rhs1 (gs1);
+ cmp1_op1 = gimple_assign_rhs2 (gs1);
+
+ expand_operands (cmp0_op0, cmp0_op1, NULL_RTX, &op0, &op1,
EXPAND_NORMAL);
+ expand_operands (cmp1_op0, cmp1_op1, NULL_RTX, &op2, &op3,
EXPAND_NORMAL);
as you are not expanding the result of gs0 or gs1 you indeed have to
use get_gimple_for_ssa_name.
+ else
+ r = expand_expr_real (rhs, target, tmode, modifier, NULL);
+#else
+ r = expand_expr_real (rhs, target, tmode, modifier, NULL);
+#endif
just place the #else after the else, no need to duplicate the
expand_expr_real call.
+#ifdef HAVE_conditional_compare
+#undef LOGICAL_OP_NON_SHORT_CIRCUIT
+#define LOGICAL_OP_NON_SHORT_CIRCUIT optimize_function_for_speed_p (cfun)
+#else
#ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
#define LOGICAL_OP_NON_SHORT_CIRCUIT \
(BRANCH_COST (optimize_function_for_speed_p (cfun), \
false) >= 2)
#endif
+#endif
that's of course a hack ;) If we want to keep LOGICAL_OP_NON_SHORT_CIRCUIT
then it should become a target hook at this point. But I'd argue that
at the level
of fold-const.c it should simply always be false.
@@ -187,6 +187,7 @@ OPTAB_D (movcc_optab, "mov$acc")
OPTAB_D (cmov_optab, "cmov$a6")
OPTAB_D (cstore_optab, "cstore$a4")
OPTAB_D (ctrap_optab, "ctrap$a4")
+OPTAB_D (ccmp_optab, "conditional_compare")
I'd say the two should match, so ccmp_obtab, "ccmp"
You miss updates to md.texi for the new named patterns.
Richard.
> Thanks!
> -Zhenqiang
>
> ChangeLog:
> 2013-09-18 Zhenqiang Chen <[email protected]>
>
> * config/arm/arm.md (conditional_compare): New.
> * expr.c (is_conditional_compare_candidate_p, expand_ccmp_expr):
> New.
> (expand_expr_real_1): Identify conditional compare.
> * fold-const.c (LOGICAL_OP_NON_SHORT_CIRCUIT): Update.
> * optabs.c (expand_ccmp_op): New.
> (get_rtx_code): Handle BIT_AND_EXPR and BIT_IOR_EXPR.
> * optabs.def (ccmp_optab): New.
> * optabs.h (expand_ccmp_op): New.