[RFC] Add conditional compare support
Hi, Several ports in gcc support conditional compare instructions which is an efficient way to handle SHORT_CIRCUIT. But the information is not represented in TREE or GIMPLE. And it depends on BRANCH_COST and combine pass to generate the instructions. To explicitly represent the semantics of conditional compare (CCMP) and remove the dependence on BRANCH_COST and combine, we propose to add a set of new keywords/operators on TREE. A CCMP operator has three operands: two are from the compare and the other is from the result of previous compare. e.g. r = CMP (a, b) && CMP (c, d) /*CMP can be >, >=, etc.*/ in current gcc, if LOGICAL_OP_NON_SHORT_CIRCUIT, it is like t1 = CMP (a, b) t2 = CMP (c, d) r = t1 & t2 with CCMP, it will be t1 = CMP (a, b) r = CCMP (t1, c, d) SHORT_CIRCUIT expression will be converted to CCMP expressions at the end of front-end. The CCMP will keep until expand pass. In expand, we will define a new operator for ports to generate optimized RTL directly. Current status: * The basic support can bootstrap on ARM Chomebook and x86-64, no ICE and runtime fail in regression tests. * uninit, vrp and reassoc passes can handle CCMP. I will post a series of patches for this feature, which includes * Add new keywords support. * Handle the new keywords in middle-end passes: uninit, vrp, reassoc, ... * Add new op to expand CCMP. Thanks! -Zhenqiang
[PATCH 1/n] Add conditional compare support
Hi, The attached patch is the basic support for conditional compare (CCMP). It adds a set of keywords on TREE to represent CCMP: DEFTREECODE (TRUTH_ANDIF_LT_EXPR, "truth_andif_lt_expr", tcc_ccomparison, 3) DEFTREECODE (TRUTH_ANDIF_LE_EXPR, "truth_andif_le_expr", tcc_ccomparison, 3) DEFTREECODE (TRUTH_ANDIF_GT_EXPR, "truth_andif_gt_expr", tcc_ccomparison, 3) DEFTREECODE (TRUTH_ANDIF_GE_EXPR, "truth_andif_ge_expr", tcc_ccomparison, 3) DEFTREECODE (TRUTH_ANDIF_EQ_EXPR, "truth_andif_eq_expr", tcc_ccomparison, 3) DEFTREECODE (TRUTH_ANDIF_NE_EXPR, "truth_andif_ne_expr", tcc_ccomparison, 3) DEFTREECODE (TRUTH_ORIF_LT_EXPR, "truth_orif_lt_expr", tcc_ccomparison, 3) DEFTREECODE (TRUTH_ORIF_LE_EXPR, "truth_orif_le_expr", tcc_ccomparison, 3) DEFTREECODE (TRUTH_ORIF_GT_EXPR, "truth_orif_gt_expr", tcc_ccomparison, 3) DEFTREECODE (TRUTH_ORIF_GE_EXPR, "truth_orif_ge_expr", tcc_ccomparison, 3) DEFTREECODE (TRUTH_ORIF_EQ_EXPR, "truth_orif_eq_expr", tcc_ccomparison, 3) DEFTREECODE (TRUTH_ORIF_NE_EXPR, "truth_orif_ne_expr", tcc_ccomparison, 3) To distinguish others, the patch dumps CCMP as && to "?&&" || to "?||" A CCMP operator has three operands: two are from the compare and the other is from the result of previous compare. To reuse current codes, the result of previous compare is in TREE_OPERAND (ccmp, 2)/gimple_assign_rhs3. e.g. r = (a > b) && (c > d) with CCMP, it will be t1 = GT_EXPR (a, b) r = TRUTH_ANDIF_GT_EXPR (c, d, t1) The patch does not include ops to expand CCMP to RTL. It just roll CCMP back to BIT operators. So with a dummy "conditional_compare", we can test the patch in any port. In the patch, dummy conditional_compare for i386 and arm are added for test. Bootstrap on x86-64 and ARM Chromebook. No ICE and runtime errors in regression tests. Thanks! -Zhenqiang ChangeLog: 2013-08-21 Zhenqiang Chen * tree.def: Add conditional compare ops: TRUTH_ANDIF_LT_EXPR, TRUTH_ANDIF_LE_EXPR, TRUTH_ANDIF_GT_EXPR, TRUTH_ANDIF_GE_EXPR, TRUTH_ANDIF_EQ_EXPR, TRUTH_ANDIF_NE_EXPR, TRUTH_ORIF_LT_EXPR, TRUTH_ORIF_LE_EXPR, TRUTH_ORIF_GT_EXPR, TRUTH_ORIF_GE_EXPR, TRUTH_ORIF_EQ_EXPR, TRUTH_ORIF_NE_EXPR. * tree.h: Add misc utils for conditional compare. * tree.c (tree_node_structure_for_code, tree_code_size, record_node_allocation_statistics, contains_placeholder_p, find_placeholder_in_expr, substitute_in_expr, substitute_placeholder_in_expr, stabilize_reference_1, build2_stat, simple_cst_equal): Handle conditional compare. (get_code_from_ccompare_expr): New added. (generate_ccompare_code ): New added. * c-family/c-pretty-print.c (pp_c_expression): Handle conditional compare. * cp/error.c (dump_expr): Likewise. * cp/semantics.c (cxx_eval_constant_expression): Likewise. (potential_constant_expression_1): Likewise. (cxx_eval_ccmp_expression): New added. * cfgexpand.c (expand_debug_expr): Handle conditional compare. * expr.c (safe_from_p, expand_expr_real_2): Likewise. (expand_ccmp_to_bitop): New added. * gimple-pretty-print.c (dump_ccomparison_rhs): New added. (dump_gimple_assign): Handle conditional compare. * print-tree.c (print_node): Likewise * tree-dump.c (dequeue_and_dump): Likewise. * tree-pretty-print.c (dump_generic_node, op_code_prio): Likewise. * gimple.c (recalculate_side_effects): Likewise. (get_gimple_rhs_num_ops): Likewise * gimplify.c (goa_stabilize_expr, gimplify_expr, gimple_boolify): Likewise. * tree-inline.c (estimate_operator_cost): Likewise. * tree-ssa-operands.c (get_expr_operands): Likewise. * tree-ssa-loop-niter.c (get_val_for): Likewise. * tree-cfg.c (verify_gimple_assign_ternary): Likewise. 
(verify_gimple_comparison_operands): New added. (verify_gimple_comparison): Call verify_gimple_comparison_operands. * fold-const.c (fold_truth_andor): Generate conditinal compare. * lra-constraints.c (remove_inheritance_pseudos): Initialize variable "set" to NULL_RTX. basic-conditional-compare-support.patch Description: Binary data
[PATCH 2/n] Handle conditional compare in uninit pass
Hi, The attached patch enables uninit pass to handle conditional compare (CCMP). CCMP is a combine of BIT_AND_EXPR/BIT_IOR_EXPR and CMP expression. The codes are similar with those to handle BIT_AND_EXPR/BIT_IOR_EXPR and CMP expression. Bootstrap on x86-64 and ARM Chromebook. No ICE and runtime errors in regression tests. Not test cases prefixed with uninit-pred- fail. Thanks! -Zhenqiang ChangeLog: 2013-08-21 Zhenqiang Chen * tree-ssa-uninit.c (normalize_cond_1, normalize_cond, is_gcond_subset_of, is_norm_cond_subset_of): Handle conditional compare. uninit-CCMP.patch Description: Binary data
[PATCH 3/n] Handle conditional compare in vrp pass
Hi, The patch updates vrp pass to handle conditional compare (CCMP). CCMP is a combine of BIT_AND_EXPR/BIT_IOR_EXPR and CMP expression. The codes are similar with those to handle BIT_AND_EXPR/BIT_IOR_EXPR and CMP expression. When the compare in CCMP can be simplified, CCMP will be converted to a copy, compare or a bit operator. Bootstrap on x86-64 and ARM Chromebook. No ICE and runtime errors in regression tests. All "vrp" related test cases PASS. Thanks! -Zhenqiang ChangeLog: 2013-08-30 Zhenqiang Chen * tree-vrp.c (extract_range_from_assignment, register_edge_assert_for, simplify_stmt_using_ranges): Handle conditional compare. (extract_range_from_comparison, can_simplify_ops_using_ranges_p, simplify_ccmp_to_cmp, simple_ccmp_second_compare, simplify_ccmp_to_op, simplify_ccmp_stmt_using_ranges): New added. testsuite/ChangeLog: 2013-08-30 Zhenqiang Chen * gcc.dg/tree-ssa/ccmp*.c: New test cases. vrp-CCMP.patch Description: Binary data
RE: [PATCH 1/n] Add conditional compare support
> -Original Message- > From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Tuesday, August 27, 2013 8:18 PM > To: Richard Earnshaw > Cc: Zhenqiang Chen; GCC Patches > Subject: Re: [PATCH 1/n] Add conditional compare support > > On Tue, Aug 27, 2013 at 1:56 PM, Richard Earnshaw > wrote: > > On 27/08/13 12:10, Richard Biener wrote: > >> What's this for and what's the desired semantics? I don't like having > >> extra tree codes for this. Is this for a specific instruction set > >> feature? > > > > The background is to support the conditional compare instructions in > > ARM (more effectively) and AArch64 at all. > > > > The current method used in ARM is to expand into a series of > > store-flag instructions and then hope that combine can optimize them > > away (though that fails far too often, particularly when the first > > instruction in the sequence is combined into another pattern). To > > make it work at all the compiler has to lie about the costs of various > > store-flag type operations which overall risks producing worse code > > and means we also have to support many more complex multi-instruction > > patterns than is desirable. I really don't want to go down the same route > for AArch64. > > > > The idea behind all this is to capture potential conditional compare > > operations early enough in the mid end that we can keep track of them > > until RTL expand time and then to emit the correct logic on all > > targets depending on what is the best thing for that target. The > > current method of lowering into store-flag sequences doesn't really cut it. > > It seems to me that then the initial instruction selection process (aka RTL > expansion) needs to be improved. As we are expanding with having the CFG > around it should be easy enough to detect AND/ORIF cases and do better > here. Yeah, I suppose this asks to turn existing jump expansion optimizations > up-side-down to optimize with the GIMPLE CFG in mind. > > The current way of LOGICAL_OP_NON_SHORT_CIRCUIT is certainly bogus - > fold-const.c is way too early to decide this. Similar to the ongoing work of > expanding / building-up switch expressions in a GIMPLE pass, moving expand > complexity up the pipeline this asks for a GIMPLE phase that moves this > decision down closer to RTL expansion. > (We have tree-ssa-ifcombine.c that is a related GIMPLE transform pass) > The patch is updated according to your comments. It is a basic support, which does not touch ifcombine and jump related optimizations yet. Current method is: 1) In fold-const, if HAVE_conditional_compare, set LOGICAL_OP_NON_SHORT_CIRCUIT to optimize_function_for_speed_p. So we do not depend on BRANCH_COST. 2) Identify CCMP during expanding. A CCMP candidate is a BIT_AND_EXPR or BIT_IOR_EXPR, whose operators are compares. 3) Add a new op in optab to expand the CCMP to optimized RTL, e.g. and_scc_scc/ior_scc_scc in ARM. Bootstrap on ARM Chrome book. No make check regression on Pandaboard. Thanks! -Zhenqiang ChangeLog: 2013-09-18 Zhenqiang Chen * config/arm/arm.md (conditional_compare): New. * expr.c (is_conditional_compare_candidate_p, expand_ccmp_expr): New. (expand_expr_real_1): Identify conditional compare. * fold-const.c (LOGICAL_OP_NON_SHORT_CIRCUIT): Update. * optabs.c (expand_ccmp_op): New. (get_rtx_code): Handle BIT_AND_EXPR and BIT_IOR_EXPR. * optabs.def (ccmp_optab): New. * optabs.h (expand_ccmp_op): New. basic-conditional-compare-support2.patch Description: Binary data
RE: [PATCH 1/n] Add conditional compare support
> -Original Message- > From: Richard Earnshaw > Sent: Thursday, September 19, 2013 5:13 PM > To: Zhenqiang Chen > Cc: 'Richard Biener'; GCC Patches > Subject: Re: [PATCH 1/n] Add conditional compare support > > On 18/09/13 10:45, Zhenqiang Chen wrote: > > > >> -Original Message- > >> From: Richard Biener [mailto:richard.guent...@gmail.com] > >> Sent: Tuesday, August 27, 2013 8:18 PM > >> To: Richard Earnshaw > >> Cc: Zhenqiang Chen; GCC Patches > >> Subject: Re: [PATCH 1/n] Add conditional compare support > >> > >> On Tue, Aug 27, 2013 at 1:56 PM, Richard Earnshaw > >> wrote: > >>> On 27/08/13 12:10, Richard Biener wrote: > >>>> What's this for and what's the desired semantics? I don't like > >>>> having extra tree codes for this. Is this for a specific > >>>> instruction set feature? > >>> > >>> The background is to support the conditional compare instructions in > >>> ARM (more effectively) and AArch64 at all. > >>> > >>> The current method used in ARM is to expand into a series of > >>> store-flag instructions and then hope that combine can optimize them > >>> away (though that fails far too often, particularly when the first > >>> instruction in the sequence is combined into another pattern). To > >>> make it work at all the compiler has to lie about the costs of > >>> various store-flag type operations which overall risks producing > >>> worse code and means we also have to support many more complex > >>> multi-instruction patterns than is desirable. I really don't want > >>> to go down the same > > route > >> for AArch64. > >>> > >>> The idea behind all this is to capture potential conditional compare > >>> operations early enough in the mid end that we can keep track of > >>> them until RTL expand time and then to emit the correct logic on all > >>> targets depending on what is the best thing for that target. The > >>> current method of lowering into store-flag sequences doesn't really > >>> cut > > it. > >> > >> It seems to me that then the initial instruction selection process > >> (aka > > RTL > >> expansion) needs to be improved. As we are expanding with having the > >> CFG around it should be easy enough to detect AND/ORIF cases and do > >> better here. Yeah, I suppose this asks to turn existing jump > >> expansion > > optimizations > >> up-side-down to optimize with the GIMPLE CFG in mind. > >> > >> The current way of LOGICAL_OP_NON_SHORT_CIRCUIT is certainly bogus > - > >> fold-const.c is way too early to decide this. Similar to the ongoing > >> work > > of > >> expanding / building-up switch expressions in a GIMPLE pass, moving > >> expand complexity up the pipeline this asks for a GIMPLE phase that > >> moves this decision down closer to RTL expansion. > >> (We have tree-ssa-ifcombine.c that is a related GIMPLE transform > >> pass) > >> > > > > The patch is updated according to your comments. It is a basic > > support, which does not touch ifcombine and jump related optimizations > yet. > > > > Current method is: > > 1) In fold-const, if HAVE_conditional_compare, set > > LOGICAL_OP_NON_SHORT_CIRCUIT > >to optimize_function_for_speed_p. So we do not depend on > BRANCH_COST. > > 2) Identify CCMP during expanding. A CCMP candidate is a BIT_AND_EXPR > >or BIT_IOR_EXPR, whose operators are compares. > > 3) Add a new op in optab to expand the CCMP to optimized RTL, > > e.g. and_scc_scc/ior_scc_scc in ARM. > > > > Bootstrap on ARM Chrome book. > > No make check regression on Pandaboard. > > > > Thanks! 
> > -Zhenqiang > > > > ChangeLog: > > 2013-09-18 Zhenqiang Chen > > > > * config/arm/arm.md (conditional_compare): New. > > * expr.c (is_conditional_compare_candidate_p, expand_ccmp_expr): > > New. > > (expand_expr_real_1): Identify conditional compare. > > * fold-const.c (LOGICAL_OP_NON_SHORT_CIRCUIT): Update. > > * optabs.c (expand_ccmp_op): New. > > (get_rtx_code): Handle BIT_AND_EXPR and BIT_IOR_EXPR. > > * optabs.def (ccmp_optab): New. > > * optabs.h (expand_ccmp_op): New. > > > > > > basic-conditional-compare-support2.patch > > > > > > N¬n‡r¥ªíÂ)emçhÂyhi×¢w
[PATCH, ARM] Fix PR target/58423
Hi, The patch fixes PR target/58423. Bootstrap and no make check regression on Chromebook with ARM mode. Is it OK for trunk? Thanks! -Zhenqiang ChangeLog: 2013-09-23 Zhenqiang Chen PR target/58423 * config/arm/arm.c (arm_emit_ldrd_pop): Attach RTX_FRAME_RELATED_P on INSN. --- clean-trunk/gcc/config/arm/arm.c2013-09-17 14:29:45.632457018 +0800 +++ pr58423/gcc/config/arm/arm.c2013-09-18 14:34:24.708892318 +0800 @@ -17645,8 +17645,8 @@ mem = gen_frame_mem (DImode, stack_pointer_rtx); tmp = gen_rtx_SET (DImode, gen_rtx_REG (DImode, j), mem); -RTX_FRAME_RELATED_P (tmp) = 1; tmp = emit_insn (tmp); +RTX_FRAME_RELATED_P (tmp) = 1; /* Generate dwarf info. */ @@ -17674,8 +17674,8 @@ mem = gen_frame_mem (SImode, stack_pointer_rtx); tmp = gen_rtx_SET (SImode, gen_rtx_REG (SImode, j), mem); -RTX_FRAME_RELATED_P (tmp) = 1; tmp = emit_insn (tmp); +RTX_FRAME_RELATED_P (tmp) = 1; /* Generate dwarf info. */ REG_NOTES (tmp) = alloc_reg_note (REG_CFA_RESTORE, @@ -17699,8 +17699,9 @@ plus_constant (Pmode, stack_pointer_rtx, offset)); - RTX_FRAME_RELATED_P (tmp) = 1; - emit_insn (tmp); + tmp = emit_insn (tmp); + arm_add_cfa_adjust_cfa_note (tmp, offset, + stack_pointer_rtx, stack_pointer_rtx); offset = 0; } @@ -17723,6 +17724,8 @@ gen_rtx_REG (SImode, PC_REGNUM), NULL_RTX); REG_NOTES (par) = dwarf; + arm_add_cfa_adjust_cfa_note (par, UNITS_PER_WORD, + stack_pointer_rtx, stack_pointer_rtx); } }
RE: [PATCH, ARM] Fix PR target/58423
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Yufeng Zhang > Sent: Monday, September 23, 2013 3:16 PM > To: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH, ARM] Fix PR target/58423 > > On 09/23/13 07:58, Zhenqiang Chen wrote: > > --- clean-trunk/gcc/config/arm/arm.c2013-09-17 14:29:45.632457018 > +0800 > > +++ pr58423/gcc/config/arm/arm.c2013-09-18 14:34:24.708892318 +0800 > > @@ -17645,8 +17645,8 @@ > > mem = gen_frame_mem (DImode, stack_pointer_rtx); > > > > tmp = gen_rtx_SET (DImode, gen_rtx_REG (DImode, j), mem); > > -RTX_FRAME_RELATED_P (tmp) = 1; > > tmp = emit_insn (tmp); > > +RTX_FRAME_RELATED_P (tmp) = 1; > > The indent doesn't seem right. Thanks for the comments. My gmail server changes "tab" to "4 spaces". Recreate the patch to remove "tab" and use attachment to make sure no other automatic changes. -Zhenqiang pr58423.patch Description: Binary data
[PATCH] Enhance phiopt to handle BIT_AND_EXPR
Hi, The patch enhances phiopt to handle cases like: if (a == 0 && (...)) return 0; return a; Boot strap and no make check regression on X86-64 and ARM. Is it OK for trunk? Thanks! -Zhenqiang ChangeLog: 2013-09-30 Zhenqiang Chen * tree-ssa-phiopt.c (operand_equal_for_phi_arg_p_1): New. (value_replacement): Move a check to operand_equal_for_phi_arg_p_1. testsuite/ChangeLog: 2013-09-30 Zhenqiang Chen * gcc.dg/tree-ssa/phi-opt-11.c: New test case. phiopt.patch Description: Binary data
[PATCH, ARM] Fix PR target/54892 - [4.7/4.8 Regression], ICE in extract_insn, at recog.c:2123
Hi, In function arm_expand_compare_and_swap, oldval is converted to SImode when its "mode" is QImode/HImode. After "FALLTHRU" to "case SImode", we should use "SImode", other than "mode" (which is QImode/HImode). And INSN atomic_compare_and_swap_1 expects the operand in SImode. No make check regression. Is it OK for 4.7 and trunk? Thanks! -Zhenqiang ChangeLog: 2012-10-19 Zhenqiang Chen PR target/54892 * config/arm/arm.c (arm_expand_compare_and_swap): Use SImode to make sure the mode is correct when falling through from above cases. testsuite/ChangeLog: 2012-10-19 Zhenqiang Chen PR target/54892 * gcc.target/arm/pr54892.c: New. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index fc3a508..265e1cb 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -25437,8 +25437,8 @@ arm_expand_compare_and_swap (rtx operands[]) case SImode: /* Force the value into a register if needed. We waited until after the zero-extension above to do this properly. */ - if (!arm_add_operand (oldval, mode)) - oldval = force_reg (mode, oldval); + if (!arm_add_operand (oldval, SImode)) + oldval = force_reg (SImode, oldval); break; case DImode: /* { dg-do compile } */ int set_role(unsigned char role_id, short m_role) { return __sync_bool_compare_and_swap(&m_role, -1, role_id); }
Re: [PATCH, ARM] Fix PR target/54892 - [4.7/4.8 Regression], ICE in extract_insn, at recog.c:2123
On 19 October 2012 14:53, Ramana Radhakrishnan wrote: > On Fri, Oct 19, 2012 at 7:46 AM, Zhenqiang Chen > wrote: >> Hi, >> >> In function arm_expand_compare_and_swap, oldval is converted to SImode >> when its "mode" is QImode/HImode. After "FALLTHRU" to "case SImode", >> we should use "SImode", other than "mode" (which is QImode/HImode). >> And INSN atomic_compare_and_swap_1 expects the operand in >> SImode. >> >> No make check regression. >> >> Is it OK for 4.7 and trunk? > > Makes sense , OK for both. > Thanks. Committed to 4.7 and trunk. -Zhenqiang
RE: [PATCH] Enhance phiopt to handle BIT_AND_EXPR
> -Original Message- > From: Jeff Law [mailto:l...@redhat.com] > Sent: Wednesday, October 09, 2013 5:00 AM > To: Andrew Pinski; Zhenqiang Chen > Cc: GCC Patches > Subject: Re: [PATCH] Enhance phiopt to handle BIT_AND_EXPR > > On 09/30/13 09:57, Andrew Pinski wrote: > > On Mon, Sep 30, 2013 at 2:29 AM, Zhenqiang Chen > wrote: > >> Hi, > >> > >> The patch enhances phiopt to handle cases like: > >> > >>if (a == 0 && (...)) > >> return 0; > >>return a; > >> > >> Boot strap and no make check regression on X86-64 and ARM. > >> > >> Is it OK for trunk? > > > > From someone who wrote lot of this code (value_replacement in fact), > > this looks good, though I would pull: > > + if (TREE_CODE (gimple_assign_rhs1 (def)) == SSA_NAME) > > +{ > > + gimple def1 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (def)); > > + if (is_gimple_assign (def1) && gimple_assign_rhs_code (def1) == > > + EQ_EXPR) { tree op0 = gimple_assign_rhs1 (def1); tree op1 = > > + gimple_assign_rhs2 (def1); if ((operand_equal_for_phi_arg_p (arg0, > > + op0) > > + && operand_equal_for_phi_arg_p (arg1, op1)) > > + || (operand_equal_for_phi_arg_p (arg0, op1) > > + && operand_equal_for_phi_arg_p (arg1, op0))) > > +{ > > + *code = gimple_assign_rhs_code (def1); > > + return 1; > > +} > > + } > > +} > > > > Out into its own function since it is repeated again for > > gimple_assign_rhs2 (def). > Agreed. Repeating blobs of code like that should be pulled out into its own > subroutine. > > It's been 10 years since we wrote that code Andrew and in looking at it again, > I wonder if/why DOM doesn't handle the stuff in value_replacement. It > really just looks like it's propagation of an edge equivalence. do you recall > the motivation behind value_replacement and why we didn't just let DOM > handle it? > > > > > Also what about cascading BIT_AND_EXPR > > like: > > if((a == 0) & (...) & (...)) > > > > I notice you don't handle that either. > Well, given the structure of how that code is going to look, it'd just be > repeated walking through the input chains. Do-able, yes. But I don't think > handling that should be a requirement for integration. > > I'll go ahead and pull the common bits into a single function and commit on > Zhenqiang's behalf. Thank you! -Zhenqiang
RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0
> -Original Message- > From: Jeff Law [mailto:l...@redhat.com] > Sent: Thursday, October 10, 2013 11:05 AM > To: Zhenqiang Chen; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - > CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0 > > On 08/05/13 02:08, Zhenqiang Chen wrote: > > Hi > > > > The patch reassociates X == CST1 || X == CST2 if popcount (CST2 - > > CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0. > > > > Bootstrap on x86-64 and ARM chromebook. > > No make check regression on x86-64 and panda board. > > > > For some targets/options, the "(X == CST1 || X == CST2)" might be > > converted to "if (x == CST1) else if (x == CST2)" at the beginning. In > > such case, the patch will not be executed. It is hard to write a > > reliable testcase to detect the transformation. So the testcase does > > not check the dump logs from > > reassoc1 pass. It just checks the runtime result. > > > > Is it OK for trunk? > > > > Thanks! > > -Zhenqiang > > > > ChangeLog > > 2013-08-05 Zhenqiang Chen > > > > * tree-ssa-reassoc.c (optimize_range_tests): Reasociate > > X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into > > ((X - CST1) & ~(CST2 - CST1)) == 0. > > > > testsuite/ChangeLog > > 2013-08-05 Zhenqiang Chen > > > > * gcc.dg/reassoc1.c: New testcase. > This is an interesting transformation. I suspect we'd want to gate it on > something like BRANCH_COST. For a related example, see how we handle > (a != 0) || (b != 0) -> (a | b) != 0 in fold-const.c. I suspect the conditions > under which we want to do your transformation are going to be similar if not > the same as those transformations in fold-const.c. Thank you for the comments. I will update the patch to check BRANCH_COST. > Note I've been suggesting the bits I'm referring to in fold-const.c move out > into the tree-ssa optimizers. If they fit well into tree-ssa-reassoc.c I'd look > favorably upon a patch which moved them. The code is similar with the code (in tree-ssa-reassoc.c) for Optimize X == CST1 || X == CST2 if popcount (CST1 ^ CST2) == 1 into (X & ~(CST1 ^ CST2)) == (CST1 & ~(CST1 ^ CST2)) > As far as testing, the way to go will be to look at the reassoc1 dumps, but skip > the test on targets with the "wrong" branch cost. > tree-ssa/vrp87.c has a suitable condition to filter out the test on targets were > the branch cost is too small. I will update the test case. > Out of curiosity, what prompted you to make this transformation? Was it > mostly to deal with a codesize problem or is it a significant win on some hot > code in a benchmark, or something else? Closely related, have you done > anything like instrument the transformation to see how often it applies > during a GCC bootstrap? It comes from Coremark. The code is: if (NEXT_SYMBOL == '+' || NEXT_SYMBOL == '-') For ARM, there are three instructions rather than 4 (in thumb state). For some older gcc, I got performance improvement on ARM chromebook. But I can not reproduce the performance gain now. I will collect logs during GCC bootstrap. Thanks! -Zhenqiang
RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0
> -Original Message- > From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Thursday, October 10, 2013 7:13 PM > To: Zhenqiang Chen > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - > CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0 > > On Mon, Aug 05, 2013 at 04:08:58PM +0800, Zhenqiang Chen wrote: > > ChangeLog > > 2013-08-05 Zhenqiang Chen > > > > * tree-ssa-reassoc.c (optimize_range_tests): Reasociate > > X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into > > ((X - CST1) & ~(CST2 - CST1)) == 0. > > > > testsuite/ChangeLog > > 2013-08-05 Zhenqiang Chen > > > > * gcc.dg/reassoc1.c: New testcase. > > + /* Optimize X == CST1 || X == CST2 > + if popcount (CST2 - CST1) == 1 into > + ((X - CST1) & ~(CST2 - CST1)) == 0. */ if (!any_changes && > + (opcode == BIT_IOR_EXPR)) > > Wouldn't it be better to put it into the same loop that handles the other > merges, rather than separate? As you had mentioned, the transition in this patch does not reduce instructions. But the preexisting optimization does. So we prefer to do the preexisting optimization first. E.g. X == 10 || X == 12 || X == 26 The preexisting one will optimize X == 10 || X == 26. And the patch will transfer X == 10 || X == 12. So I prefer to separate them. > Why the !any_changes guard? Why opcode > == BIT_IOR_EXPR guard? Can't you optimize X != CST1 && X != CST2 if > popcount (CST2 - CST1) == 1 similarly into ((X - CST1) & ~(CST2 - CST1)) != 0? They are not necessary. Patch is updated to check only the BRANCH_COST. > Also, like the preexisting optimization has been generalized for ranges, can't > you generalize this one too? > > I mean if you have say: > (X - 43U) <= 3U || (X - 75U) <= 3U > (originally e.g. X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46 || > X == 75 || X == 45) where popcount (75U - 43U) == 1, can't this be: > ((X - 43U) & ~(75U - 43U)) <= 3U > > For X in [LOWCST1, HIGHCST1] || X in [LOWCST2, HIGHCST2] the > requirements for this optimization would be > 1) LOWCST2 > HIGHCST1 > 2) HIGHCST1 - LOWCST1 == HIGHCST2 - LOWCST2 > 3) popcount (LOWCST2 - LOWCST1) == 1 > > 1) and 2) are the same requirements for the other optimization, while 3) > would be specific to this and would be used only if popcount (LOWCST1 ^ > LOWCST2) != 1. > Because of 1), LOWCST2 - LOWCST1 should be bigger than > HIGHCST1 - LOWCST1 (i.e. the value we <= against in the end), thus IMHO it > should work fine even after generalization. Patch is updated. > +if (tree_log2 (tem) < 0) > + continue; > > This is I guess cheaper than what I was doing there: > gcc_checking_assert (!integer_zerop (lowxor)); > tem = fold_binary (MINUS_EXPR, type, lowxor, > build_int_cst (type, 1)); > if (tem == NULL_TREE) > continue; > tem = fold_binary (BIT_AND_EXPR, type, lowxor, tem); > if (tem == NULL_TREE || !integer_zerop (tem)) > continue; > to check for popcount (x) == 1. Can you replace it even in the preexisting > code? > if (tree_log2 (lowxor) < 0) > continue; Updated. > Of course, if the two optimizations are merged into the same loop, then > some of the continue will need to be replaced by just conditional code inside > of the loop, handling the two different optimizations. > > Thanks for working on this. > > Jakub Bootstrap and no make check regression on X86-64 and ARM. Thanks! -Zhenqiang reassoc-new.patch Description: Binary data
RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0
> -Original Message- > From: Jeff Law [mailto:l...@redhat.com] > Sent: Friday, October 11, 2013 1:20 PM > To: Zhenqiang Chen > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - > CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0 > > On 10/10/13 03:25, Zhenqiang Chen wrote: > > > > > It comes from Coremark. The code is: > > > > if (NEXT_SYMBOL == '+' || NEXT_SYMBOL == '-') > I should have guessed ;-) > > > > > > For ARM, there are three instructions rather than 4 (in thumb state). > > For some older gcc, I got performance improvement on ARM chromebook. > > But I can not reproduce the performance gain now. > > > > I will collect logs during GCC bootstrap. > Thanks. It doesn't have to be anything particularly complex. When I'm > looking at a transformation I usually just put a printf when it triggers and grep > for the string in a make log. Thanks. I check the make log. There are 1906 occurrences. > Depending on what I'm doing, I may dig more deeply into the situations > where its triggering to make sure I fully understand the primary and > secondary effects of a transformation which often leads to tweaking the > implementation. I don't think that level of rigor is needed here. > > Jeff >
RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0
Patch and test cases are updated according to your comments. Bootstrap and no make check regression on X86-64. ChangeLog: 2013-10-14 Zhenqiang Chen * tree-ssa-reassoc.c (try_transfer_range_tests_1): New function, extracted from optimize_range_tests * (try_transfer_range_tests_2): New function. * (try_transfer_range_tests): New function, extracted from optimize_range_tests and calls try_transfer_range_tests_1 and try_transfer_range_tests_2, * (optimize_range_tests): Use try_transfer_range_tests. testsuite/ChangeLog: 2013-10-14 Zhenqiang Chen * gcc.dg/tree-ssa/reassoc-32.c: New test case. * gcc.dg/tree-ssa/reassoc-33.c: New test case. * gcc.dg/tree-ssa/reassoc-34.c: New test case. * gcc.dg/tree-ssa/reassoc-35.c: New test case. * gcc.dg/tree-ssa/reassoc-36.c: New test case. > -Original Message- > From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Saturday, October 12, 2013 3:40 PM > To: Zhenqiang Chen > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - > CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0 > > On Sat, Oct 12, 2013 at 10:08:12AM +0800, Zhenqiang Chen wrote: > > As you had mentioned, the transition in this patch does not reduce > > instructions. But the preexisting optimization does. So we prefer to > > do the preexisting optimization first. E.g. > > > > X == 10 || X == 12 || X == 26 > > Ok, that makes sense. > > @@ -2279,6 +2275,71 @@ optimize_range_tests (enum tree_code opcode, > } > } > > + /* Optimize X == CST1 || X == CST2 > + if popcount (CST2 - CST1) == 1 into > + ((X - CST1) & ~(CST2 - CST1)) == 0. */ > > Mention here also similarly to the other comment that it works also with > ranges. > > + if (BRANCH_COST (optimize_function_for_speed_p (cfun), false) >= 2) > +for (i = first; i < length; i++) > + { > + tree lowi, highi, lowj, highj, type, tem1, tem2, mask; > + > + if (ranges[i].exp == NULL_TREE || ranges[i].in_p) > + continue; > + type = TREE_TYPE (ranges[i].exp); > + if (!INTEGRAL_TYPE_P (type)) > + continue; > + lowi = ranges[i].low; > + if (lowi == NULL_TREE) > + continue; > > The other loop has here: > if (lowi == NULL_TREE) > lowi = TYPE_MIN_VALUE (type); > instead, which is better, can you please change it? > > + highi = ranges[i].high; > + if (highi == NULL_TREE) > + continue; > + for (j = i + 1; j < length && j < i + 64; j++) > + { > + lowj = ranges[j].low; > + if (lowj == NULL_TREE) > + continue; > + highj = ranges[j].high; > + if (highj == NULL_TREE) > + continue; > > The other loop has > if (highj == NULL_TREE) > highj = TYPE_MAX_VALUE (type); > here instead. > > + if (ranges[j].exp == NULL_TREE || ranges[j].in_p > + || (ranges[i].exp != ranges[j].exp)) > + continue; > > Can you please move this test the lowj = assignment, and remove the > ranges[j].exp == NULL_TREE test (both here and in the earlier loop)? I mean, > we've checked at the beginning of for (i = first; i < length; i++) loop that > ranges[i].exp is not NULL_TREE, and if ranges[j].exp is NULL_TREE, it will be > different than ranges[i].exp. > So > if (ranges[i].exp != ranges[j].exp || ranges[j].in_p) > continue; > (and please also collapse the two checks in the first loop, so that both look > similar). > > + /* Check lowj > highi. */ > + tem1 = fold_binary (GT_EXPR, boolean_type_node, > +lowj, highi); > + if (tem1 == NULL_TREE || !integer_onep (tem1)) > + continue; > + /* Check highi - lowi == highj - lowj. 
*/ > + tem1 = fold_binary (MINUS_EXPR, type, highi, lowi); > + if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST) > + continue; > + tem2 = fold_binary (MINUS_EXPR, type, highj, lowj); > + if (tem2 == NULL_TREE || TREE_CODE (tem2) != INTEGER_CST) > + continue; > + if (!tree_int_cst_equal (tem1, tem2)) > + continue; > + /* Check popcount (lowj - lowi) == 1. */ > + tem1 = fold_binary (MINUS_EXPR, type, lowj, lowi); > + if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST) > + continue; > + if (tree_log2 (tem1) < 0) > + continue; > + mask = fold_build1 (BIT_NOT_EXPR, type, tem1); > + tem1 = fold_binary (MINUS_EXPR, type, ranges[i].exp, l
RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0
> -Original Message- > From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Monday, October 14, 2013 4:49 PM > To: Zhenqiang Chen > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - > CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0 > > On Mon, Oct 14, 2013 at 03:10:12PM +0800, Zhenqiang Chen wrote: > > @@ -2131,6 +2133,155 @@ update_range_test (struct range_entry *range, > struct range_entry *otherrange, >return true; > } > > +/* Optimize X == CST1 || X == CST2 > + if popcount (CST1 ^ CST2) == 1 into > + (X & ~(CST1 ^ CST2)) == (CST1 & ~(CST1 ^ CST2)). > + Similarly for ranges. E.g. > + X != 2 && X != 3 && X != 10 && X != 11 > + will be transformed by the previous optimization into > + (X - 2U) <= 1U && (X - 10U) <= 1U > + and this loop can transform that into > + ((X & ~8) - 2U) <= 1U. */ > + > +static bool > +try_transfer_range_tests_1 (enum tree_code opcode, int i, int j, tree type, > + tree lowi, tree lowj, tree highi, tree highj, > + vec *ops, > + struct range_entry *ranges) > > The function names are bad, you aren't transfering anything, but optimizing. > Please rename try_transfer_range_tests to optimize_range_tests_1 and > try_transfer_range_tests_{1,2} to optimize_range_tests_{2,3} or perhaps > better yet to optimize_range_tests_{xor,diff}. try_transfer_range_tests is changed to optimize_range_tests_1 try_transfer_range_tests_1 is changed to optimize_range_tests_xor try_transfer_range_tests_2 is changed to optimize_range_tests_diff > Also, perhaps instead of passing ranges and i and j to these two functions > you could pass struct range_entry *range1, struct range_entry *range2 > (caller would pass ranges + i, ranges + j). Updated to rangei and rangej since we use i, j, lowi, lowj etc at other places. > +/* It does some common checks for function try_transfer_range_tests_1 > and > + try_transfer_range_tests_2. > > Please adjust the comment for the renaming. Please change trans_option to > bool optimize_xor. Updated. > + if (trans_option == 1) > + { > + if (try_transfer_range_tests_1 (opcode, i, j, type, lowi, lowj, > + highi, highj, ops, ranges)) > + { > + any_changes = true; > + break; > + } > + } > + else if (trans_option == 2) > + { > + if (try_transfer_range_tests_2 (opcode, i, j, type, lowi, lowj, > + highi, highj, ops, ranges)) > + { > + any_changes = true; > + break; > + } > + } > > I'd prefer > if (optimize_xor) > any_changes > = optimize_range_tests_xor (opcode, type, lowi, lowj, highi, > highj, ops, ranges + i, ranges + j); > else > any_changes > = optimize_range_tests_xor (opcode, type, lowi, lowj, highi, > highj, ops, ranges + i, ranges + j); > if (any_changes) > break; This is incorrect. The "any_changes" should be "|=" of the return values. In the updated patch, I changed the code segment as bool changes; ... if (optimize_xor) changes = optimize_range_tests_xor (opcode, type, lowi, lowj, highi, highj, ops, ranges + i, ranges + j); else changes = optimize_range_tests_diff (opcode, type, lowi, lowj, highi, highj, ops, ranges + i, ranges + j); if (changes) { any_changes = true; break; } Is it OK? Thanks! -Zhenqiang > Ok with those changes. > > Jakub reassoc-final.patch Description: Binary data
[PATCH] Enhance ifcombine to recover non short circuit branches
Hi, The patch enhances ifcombine pass to recover some non short circuit branches. Basically, it will do the following transformation: Case 1: if (cond1) if (cond2) ==> if (cond1 && cond2) Case 2: if (cond1) goto L1 if (cond2) goto L1 ==> if (cond1 || cond2) goto L1 Case 3: if (cond1) goto L1 else goto L2 L1: if (cond2) goto L2 ==> if (invert (cond1) || cond2) goto L2 Case 4: if (cond1) goto L1 if (cond2) goto L2 L1: ==> if (invert (cond1) && cond2) goto L2 Bootstrap on X86-64 and ARM. Two new FAILs in regression tests: gcc.dg/uninit-pred-8_b.c gcc.dg/uninit-pred-9_b.c uninit pass should be enhanced to handle more complex conditions. Will submit a bug to track it and fix it later. Is it OK for trunk? Thanks! -Zhenqiang ChangeLog: 2013-10-16 Zhenqiang Chen * fold-const.c (simple_operand_p_2): Make it global. * tree.h (simple_operand_p_2): Declare it. * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h. (bb_has_overhead_p, generate_condition_node, ifcombine_ccmp): New functions. (ifcombine_fold_ifandif): New function, extracted from ifcombine_ifandif. (ifcombine_ifandif): Call ifcombine_ccmp. (tree_ssa_ifcombine_bb): Skip optimized bb. testsuite/ChangeLog 2013-10-16 Zhenqiang Chen * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case. * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Updated. ifcombine.patch Description: Binary data
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
On 18 October 2013 00:58, Jeff Law wrote: > On 10/17/13 05:03, Richard Biener wrote: Is it OK for trunk? >>> >>> >>> I had a much simpler change which did basically the same from 4.7 (I >>> can update it if people think this is a better approach). >> >> >> I like that more (note you can now use is_gimple_condexpr as predicate >> for force_gimple_operand). > > The obvious question is whether or not Andrew's simpler change picks up as > many transformations as Zhenqiang's change. If not are the things missed > important. > > Zhenqiang, can you do some testing of your change vs Andrew P.'s change? Here is a rough compare: 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included in my patch). Root cause is that it does not skip "LABEL". The guard to do this opt should be the same the bb_has_overhead_p in my patch. 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate _3 = a_2(D) > 0; _5 = b_4(D) > 0; _6 = _3 | _5; _9 = c_7(D) <= 0; _10 = ~_6; _11 = _9 & _10; if (_11 == 0) With my patch, it will generate _3 = a_2(D) > 0; _5 = b_4(D) > 0; _6 = _3 | _5; _9 = c_7(D) > 0; _10 = _6 | _9; if (_10 != 0) 3) The good thing of Andrew P.'s change is that "Inverse the order of the basic block walk" so it can do combine recursively. But I think we need some heuristic to control the number of ifs. Move too much compares from the inner_bb to outer_bb is not good. 4) Another good thing of Andrew P.'s change is that it reuses some existing functions. So it looks much simple. >> >> With that we should be able to kill the fold-const.c transform? > > That would certainly be nice and an excellent follow-up for Zhenqiang. That's my final goal to "kill the fold-const.c transform". I think we may combine the two changes to make a "simple" and "good" patch. Thanks! -Zhenqiang
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
On 18 October 2013 17:53, Richard Biener wrote: > On Fri, Oct 18, 2013 at 11:21 AM, Zhenqiang Chen > wrote: >> On 18 October 2013 00:58, Jeff Law wrote: >>> On 10/17/13 05:03, Richard Biener wrote: >>>>>> >>>>>> Is it OK for trunk? >>>>> >>>>> >>>>> I had a much simpler change which did basically the same from 4.7 (I >>>>> can update it if people think this is a better approach). >>>> >>>> >>>> I like that more (note you can now use is_gimple_condexpr as predicate >>>> for force_gimple_operand). >>> >>> The obvious question is whether or not Andrew's simpler change picks up as >>> many transformations as Zhenqiang's change. If not are the things missed >>> important. >>> >>> Zhenqiang, can you do some testing of your change vs Andrew P.'s change? >> >> Here is a rough compare: >> >> 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included >> in my patch). Root cause is that it does not skip "LABEL". The guard >> to do this opt should be the same the bb_has_overhead_p in my patch. > > I think we want a "proper" predicate in tree-cfg.c for this, like maybe > a subset of tree_forwarder_block_p or whatever it will end up looking > like (we need "effectively empty BB" elsewhere, for example in vectorization, > add a flag to allow a condition ending the BB and the predicate is done). > >> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not >> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate >> >> _3 = a_2(D) > 0; >> _5 = b_4(D) > 0; >> _6 = _3 | _5; >> _9 = c_7(D) <= 0; >> _10 = ~_6; >> _11 = _9 & _10; >> if (_11 == 0) >> >> With my patch, it will generate >> >> _3 = a_2(D) > 0; >> _5 = b_4(D) > 0; >> _6 = _3 | _5; >> _9 = c_7(D) > 0; >> _10 = _6 | _9; >> if (_10 != 0) > > But that seems like a missed simplification in predicate combining > which should be fixed more generally. > >> 3) The good thing of Andrew P.'s change is that "Inverse the order of >> the basic block walk" so it can do combine recursively. >> >> But I think we need some heuristic to control the number of ifs. Move >> too much compares from >> the inner_bb to outer_bb is not good. > > True, but that's what fold-const.c does, no? Based on current fold-const, we can not generate more than "two" compares in a basic block. But if ifs can be combined recursively in ifcombine, we might generate many compares in a basic block. >> 4) Another good thing of Andrew P.'s change is that it reuses some >> existing functions. So it looks much simple. > > Indeed - that's what I like about it. > >>>> >>>> With that we should be able to kill the fold-const.c transform? >>> >>> That would certainly be nice and an excellent follow-up for Zhenqiang. >> >> That's my final goal to "kill the fold-const.c transform". I think we >> may combine the two changes to make a "simple" and "good" patch. > > Thanks, > Richard. > >> Thanks! >> -Zhenqiang
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
On 18 October 2013 18:15, Richard Biener wrote: > On Fri, Oct 18, 2013 at 12:06 PM, Zhenqiang Chen > wrote: >> On 18 October 2013 17:53, Richard Biener wrote: >>> On Fri, Oct 18, 2013 at 11:21 AM, Zhenqiang Chen >>> wrote: >>>> On 18 October 2013 00:58, Jeff Law wrote: >>>>> On 10/17/13 05:03, Richard Biener wrote: >>>>>>>> >>>>>>>> Is it OK for trunk? >>>>>>> >>>>>>> >>>>>>> I had a much simpler change which did basically the same from 4.7 (I >>>>>>> can update it if people think this is a better approach). >>>>>> >>>>>> >>>>>> I like that more (note you can now use is_gimple_condexpr as predicate >>>>>> for force_gimple_operand). >>>>> >>>>> The obvious question is whether or not Andrew's simpler change picks up as >>>>> many transformations as Zhenqiang's change. If not are the things missed >>>>> important. >>>>> >>>>> Zhenqiang, can you do some testing of your change vs Andrew P.'s change? >>>> >>>> Here is a rough compare: >>>> >>>> 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included >>>> in my patch). Root cause is that it does not skip "LABEL". The guard >>>> to do this opt should be the same the bb_has_overhead_p in my patch. >>> >>> I think we want a "proper" predicate in tree-cfg.c for this, like maybe >>> a subset of tree_forwarder_block_p or whatever it will end up looking >>> like (we need "effectively empty BB" elsewhere, for example in >>> vectorization, >>> add a flag to allow a condition ending the BB and the predicate is done). >>> >>>> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not >>>> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate >>>> >>>> _3 = a_2(D) > 0; >>>> _5 = b_4(D) > 0; >>>> _6 = _3 | _5; >>>> _9 = c_7(D) <= 0; >>>> _10 = ~_6; >>>> _11 = _9 & _10; >>>> if (_11 == 0) >>>> >>>> With my patch, it will generate >>>> >>>> _3 = a_2(D) > 0; >>>> _5 = b_4(D) > 0; >>>> _6 = _3 | _5; >>>> _9 = c_7(D) > 0; >>>> _10 = _6 | _9; >>>> if (_10 != 0) >>> >>> But that seems like a missed simplification in predicate combining >>> which should be fixed more generally. >>> >>>> 3) The good thing of Andrew P.'s change is that "Inverse the order of >>>> the basic block walk" so it can do combine recursively. >>>> >>>> But I think we need some heuristic to control the number of ifs. Move >>>> too much compares from >>>> the inner_bb to outer_bb is not good. >>> >>> True, but that's what fold-const.c does, no? >> >> Based on current fold-const, we can not generate more than "two" >> compares in a basic block. > > I think you are wrong as fold is invoked recursively on a && b && c && d. Please try to compile it with -fdump-tree-gimple. It will be broken into if (a && b) if (c && d) >> But if ifs can be combined recursively in ifcombine, we might generate >> many compares in a basic block. > > Yes, we can end up with that for the other simplifications done in > ifcombine as well. Eventually there needs to be a cost model for this > (use edge probabilities for example, so that if you have profile data > you never combine cold blocks). > > Any takers? I will take it. Thanks! -Zhenqiang > Richard. > >>>> 4) Another good thing of Andrew P.'s change is that it reuses some >>>> existing functions. So it looks much simple. >>> >>> Indeed - that's what I like about it. >>> >>>>>> >>>>>> With that we should be able to kill the fold-const.c transform? >>>>> >>>>> That would certainly be nice and an excellent follow-up for Zhenqiang. >>>> >>>> That's my final goal to "kill the fold-const.c transform". 
I think we >>>> may combine the two changes to make a "simple" and "good" patch. >>> >>> Thanks, >>> Richard. >>> >>>> Thanks! >>>> -Zhenqiang
[PATCH] Fix PR58775
Hi, The patch fixes PR58775. Bootstrap and no make check regression on X86-64? Is it OK? Thanks! -Zhenqiang ChangeLog: 2013-10-18 Zhenqiang Chen PR tree-optimization/58775 * tree-ssa-reassoc.c (next_stmt_of): New function. (not_dominated_by): Check next_stmt_of for new statement. (rewrite_expr_tree): Call ensure_ops_are_available. testsuite/ChangeLog: 2013-10-18 Zhenqiang Chen * gcc.dg/tree-ssa/pr58775.c: New test case. pr58775.patch Description: Binary data
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
On 18 October 2013 10:13, Andrew Pinski wrote: > On Thu, Oct 17, 2013 at 6:39 PM, Andrew Pinski wrote: >> On Thu, Oct 17, 2013 at 4:03 AM, Richard Biener >> wrote: >>> On Thu, Oct 17, 2013 at 4:14 AM, Andrew Pinski wrote: >>>> On Wed, Oct 16, 2013 at 2:12 AM, Zhenqiang Chen >>>> wrote: >>>>> Hi, >>>>> >>>>> The patch enhances ifcombine pass to recover some non short circuit >>>>> branches. Basically, it will do the following transformation: >>>>> >>>>> Case 1: >>>>> if (cond1) >>>>> if (cond2) >>>>> ==> >>>>> if (cond1 && cond2) >>>>> >>>>> Case 2: >>>>> if (cond1) >>>>> goto L1 >>>>> if (cond2) >>>>> goto L1 >>>>> ==> >>>>> if (cond1 || cond2) >>>>> goto L1 >>>>> >>>>> Case 3: >>>>> if (cond1) >>>>> goto L1 >>>>> else >>>>> goto L2 >>>>> L1: >>>>> if (cond2) >>>>> goto L2 >>>>> ==> >>>>> if (invert (cond1) || cond2) >>>>> goto L2 >>>>> >>>>> Case 4: >>>>> if (cond1) >>>>> goto L1 >>>>> if (cond2) >>>>> goto L2 >>>>> L1: >>>>> ==> >>>>> if (invert (cond1) && cond2) >>>>> goto L2 >>>>> >>>>> Bootstrap on X86-64 and ARM. >>>>> >>>>> Two new FAILs in regression tests: >>>>> gcc.dg/uninit-pred-8_b.c >>>>> gcc.dg/uninit-pred-9_b.c >>>>> uninit pass should be enhanced to handle more complex conditions. Will >>>>> submit a bug to track it and fix it later. >>>>> >>>>> Is it OK for trunk? >>>> >>>> I had a much simpler change which did basically the same from 4.7 (I >>>> can update it if people think this is a better approach). >>> >>> I like that more (note you can now use is_gimple_condexpr as predicate >>> for force_gimple_operand). >> >> >> Ok, with both this email and Jakub's email, I decided to port the >> patch to the trunk but I ran into a bug in reassoc which I submitted >> as PR 58775 (with a testcase which shows the issue without this >> patch). > > I forgot to say that the tree-ssa-ifcombine.c hunk of the 4.7 patch > applies correctly. Once PR 58775 is fixed, I will be testing and > submitting the full patch including with the testcases from Zhenqiang. I just post a patch to fix PR 58775. With the patch, your changes can bootstrap on X86-64. Thanks! -Zhenqiang > Thanks, > Andrew > >> >> >>> >>> With that we should be able to kill the fold-const.c transform? >> >> >> I think so but I have never tested removing it. >> >> Thanks, >> Andrew Pinski >> >> >>> >>> Thanks, >>> Richard. >>> >>>> Thanks, >>>> Andrew Pinski >>>> >>>> >>>> 2012-09-29 Andrew Pinski >>>> >>>> * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h. >>>> (ifcombine_ifandif): Handle cases where >>>> maybe_fold_and_comparisons fails, combining the branches >>>> anyways. >>>> (tree_ssa_ifcombine): Inverse the order of >>>> the basic block walk, increases the number of combinings. >>>> * Makefile.in (tree-ssa-ifcombine.o): Update dependencies. >>>> >>>> * testsuite/gcc.dg/tree-ssa/phi-opt-2.c: Expect zero ifs as the >>>> compiler >>>> produces a & b now. >>>> * testsuite/gcc.dg/tree-ssa/phi-opt-9.c: Use a function call >>>> to prevent conditional move to be used. >>>> * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for >>>> "one or more intermediate". >>>> >>>> >>>>> >>>>> Thanks! >>>>> -Zhenqiang >>>>> >>>>> ChangeLog: >>>>> 2013-10-16 Zhenqiang Chen >>>>> >>>>> * fold-const.c (simple_operand_p_2): Make it global. >>>>> * tree.h (simple_operand_p_2): Declare it. >>>>> * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h. >>>>> (bb_has_overhead_p, generate_condition_node, >>>>> ifcombine_ccmp): New functions. >>>>> (ifcombine_fold_ifandif): New function, extracted from >>>>> ifcombine_ifandif. 
>>>>> (ifcombine_ifandif): Call ifcombine_ccmp. >>>>> (tree_ssa_ifcombine_bb): Skip optimized bb. >>>>> >>>>> testsuite/ChangeLog >>>>> 2013-10-16 Zhenqiang Chen >>>>> >>>>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case. >>>>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case. >>>>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case. >>>>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case. >>>>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case. >>>>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case. >>>>> * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Updated.
Re: [PATCH] Fix PR58775
On 18 October 2013 19:09, Jakub Jelinek wrote: > On Fri, Oct 18, 2013 at 06:36:44PM +0800, Zhenqiang Chen wrote: > --- a/gcc/tree-ssa-reassoc.c > +++ b/gcc/tree-ssa-reassoc.c > @@ -2861,6 +2861,19 @@ swap_ops_for_binary_stmt (vec ops, > } > } > > +/* Determine if stmt A is in th next list of stmt B. */ > +static inline bool > +next_stmt_of (gimple a, gimple b) > +{ > + gimple_stmt_iterator gsi; > + for (gsi = gsi_for_stmt (b); !gsi_end_p (gsi); gsi_next (&gsi)) > +{ > + if (gsi_stmt (gsi) == a) > + return true; > +} > + return false; > +} > > How is that different from appears_later_in_bb? More importantly, > not_dominated_by shouldn't be called with the same uid and basic block, > unless the stmts are the same, except for the is_gimple_debug case > (which looks like a bug). And more importantly, if a stmt has zero uid, > we'd better set it from the previous or next non-zero uid stmt, so that > we don't lineary traverse the whole bb many times. Thanks for the comments. Patch is updated to set uid in update_range_test after force_gimple_operand_gsi. I am not sure the patch can cover all cases. If not, I think force_gimple_operand_gsi_1 maybe the only place to set uid for all new generated statements. Thanks! -Zhenqiang > That said, I've lost track of the bugfixes for this, don't know if there is > any patch still pending or not. > > Jakub pr58775-2.patch Description: Binary data
RE: [PATCH 1/n] Add conditional compare support
Hi, The patch is updated according to the comments. Changes include: * Rewrite codes according to Richard Biener's comments. * Change the algorithm to recursively combine compares. So it can handle any number of compares. * Add a set of instruction patterns in ARM backend to match the conditional compares. With this patch, the conditional compare sequence is like CC1 = CCMP (CMP (a, b), CMP (c, d)); CC2 = CCMP (NE (CC1, 0), CMP (e, f)); ... CCn/reg = CCMP (NE (CCn-1, 0), CMP (...)); To test more than two compares, you need the patch discussed in the thread. http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg63743.html Bootstrap and no make check regression for both ARM and THUMB modes on ARM Chromebook. ChangeLog: 2013-10-22 Zhenqiang Chen * config/arm/arm.c (arm_fixed_condition_code_regs, arm_ccmode_to_code, arm_select_dominance_ccmp_mode): New functions. (arm_select_dominance_cc_mode_1): New function extracted from arm_select_dominance_cc_mode. (arm_select_dominance_cc_mode): Call arm_select_dominance_cc_mode_1. * config/arm/arm.md (ccmp, cbranchcc4, ccmp_and, ccmp_ior, ccmp_ior_scc_scc, ccmp_ior_scc_scc_cmp, ccmp_and_scc_scc, ccmp_and_scc_scc_cmp): New. * config/arm/arm-protos.h (arm_select_dominance_ccmp_mode): New. * expr.c (ccmp_candidate_p, used_in_cond_stmt_p, expand_ccmp_expr_2, expand_ccmp_expr_3, expand_ccmp_expr_1, expand_ccmp_expr): New. (expand_expr_real_1): Handle ccmp. * optabs.c: Include gimple.h. (expand_ccmp_op): New. (get_rtx_code): Handle BIT_AND_EXPR and BIT_IOR_EXPR. * optabs.def (ccmp): New. * optabs.h (expand_ccmp_op): New. * doc/md.texi (ccmp): New index. Thanks! -Zhenqiang > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Zhenqiang Chen > Sent: Monday, September 23, 2013 2:50 PM > To: Richard Earnshaw > Cc: 'Richard Biener'; GCC Patches > Subject: RE: [PATCH 1/n] Add conditional compare support > > > > > -Original Message- > > From: Richard Earnshaw > > Sent: Thursday, September 19, 2013 5:13 PM > > To: Zhenqiang Chen > > Cc: 'Richard Biener'; GCC Patches > > Subject: Re: [PATCH 1/n] Add conditional compare support > > > > On 18/09/13 10:45, Zhenqiang Chen wrote: > > > > > >> -Original Message- > > >> From: Richard Biener [mailto:richard.guent...@gmail.com] > > >> Sent: Tuesday, August 27, 2013 8:18 PM > > >> To: Richard Earnshaw > > >> Cc: Zhenqiang Chen; GCC Patches > > >> Subject: Re: [PATCH 1/n] Add conditional compare support > > >> > > >> On Tue, Aug 27, 2013 at 1:56 PM, Richard Earnshaw > > >> > > >> wrote: > > >>> On 27/08/13 12:10, Richard Biener wrote: > > >>>> What's this for and what's the desired semantics? I don't like > > >>>> having extra tree codes for this. Is this for a specific > > >>>> instruction set feature? > > >>> > > >>> The background is to support the conditional compare instructions > > >>> in ARM (more effectively) and AArch64 at all. > > >>> > > >>> The current method used in ARM is to expand into a series of > > >>> store-flag instructions and then hope that combine can optimize > > >>> them away (though that fails far too often, particularly when the > > >>> first instruction in the sequence is combined into another > > >>> pattern). To make it work at all the compiler has to lie about > > >>> the costs of various store-flag type operations which overall > > >>> risks producing worse code and means we also have to support many > > >>> more complex multi-instruction patterns than is desirable. 
I > > >>> really don't want to go down the same > > > route > > >> for AArch64. > > >>> > > >>> The idea behind all this is to capture potential conditional > > >>> compare operations early enough in the mid end that we can keep > > >>> track of them until RTL expand time and then to emit the correct > > >>> logic on all targets depending on what is the best thing for that > > >>> target. The current method of lowering into store-flag sequences > > >>> doesn't really cut > > > it. > > >> > > >> It seems to me that then the initial instruction selection process > > >> (aka > > > RTL > >
RE: [PATCH 1/n] Add conditional compare support
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Thursday, October 24, 2013 12:14 AM > To: Zhenqiang Chen; Richard Earnshaw; 'Richard Biener' > Cc: GCC Patches > Subject: Re: [PATCH 1/n] Add conditional compare support > > > +static enum rtx_code > > +arm_ccmode_to_code (enum machine_mode mode) { > > + switch (mode) > > +{ > > +case CC_DNEmode: > > + return NE; > > Why would you need to encode comparisons in CCmodes? > That looks like a mis-design to me. The CCmodes are used to check whether the result of a previous conditional compare can combine with current compare. By changing it to rtx_code, I can reuse current code (arm_select_dominance_cc_mode_1) to check it. The CCmodes are also used to emit the "condition code" for a conditional execution. E.g. CC1 (CC_DGEmode) = CCMP (a >= 0, b >= 0) ==> cmp a, #0 cmpge b, #0 CC2 = CCMP ( CC1 != 0, c >= 0) ==> cmpge c, #0 The "ge" is from the mode of CC1 in "CC1 != 0". And the mode of CC2 is not necessary the same as CC1. > > +Conditional compare instruction. Operand 2 and 5 are RTLs which > > +perform two comparisons. Operand 1 is AND or IOR, which operates on > > +the result of Operand 2 and 5. Operand 0 is the result of operand 1. > > + > > +A typical @code{ccmp} pattern looks like > > + > > +@smallexample > > +(define_expand "ccmp" > > + [(set (match_operand 0 "register_operand" "") > > +(match_operator 1 "" > > + [(match_operator:SI 2 "comparison_operator" > > + [(match_operand:SI 3 "register_operand") > > + (match_operand:SI 4 "register_operand")]) > > + (match_operator:SI 5 "comparison_operator" > > + [(match_operand:SI 6 "register_operand") > > + (match_operand:SI 7 "register_operand")])]))] > > + "" > > + "@dots{}") > > +@end smallexample > > This documentation is inadequate. What sorts of input combinations are > valid? It depends on target. I will update it with your later comments on * How to use cmp_hook and ccmp_hook to check valid combinations. * How to write patterns to support more than two compares. > How is the middle-end expected to compose more than two compares, as > you describe in the mail? What mode should the middle-end use to allocate > that output register? The ccmp_candidate_p and expand_ccmp_expr_* are using recursively algorithm. One conditional compare can have two and only two compares. But if the operand is the result of a previous conditional compare, we have more than two compares. i.e. CC1 = CCMP (CMP1, CMP2) CC2 = CCMP (CC1 != 0, CMP3) ... CCn = CCMP (CCn-1 != 0, CMPn+1) And these are the representations we want to have on TREE/GIMPLE after front-end in my original patch. http://gcc.1065356.n5.nabble.com/PATCH-1-n-Add-conditional-compare-support-td961953.html > The only thing that makes a sort of sense to me is something akin to how the > old cmp + bcc patterns worked. Except that's not especially clean, and > there's a reason we got rid of them. > > I think the only clean interface is going to be a new hook or hooks. The > return value of the hook with be an rtx akin to the PTEST value returned by > prepare_cmp_insn. Which is a proper input to cbranch/cstore/etc. > > I might think that two hooks would be appropriate because it might make the > ccmp hook easier. I.e. > > res = targetm.cmp_hook(...); > while (more) > res = targetm.ccmp_hook(res, ...); > > For combinations that can't be implemented with ccmp, the hook can return > null. Yip. This will be more efficient. I will update the patch. 
In current patch, arm_select_dominance_cc_mode is the "cmp_hook" and arm_select_dominance_ccmp_mode is the "ccmp_hook". > > +;; The first compare in this pattern is the result of a previous CCMP. > > +;; We can not swap it. And we only need its flag. > > +(define_insn "*ccmp_and" > > + [(set (match_operand 6 "dominant_cc_register" "") > > + (compare > > +(and:SI > > + (match_operator 4 "expandable_comparison_operator" > > + [(match_operand 0 "dominant_cc_register" "") > > + (match_operand:SI 1 "arm_add_operand" "")]) > > + (match_operator:SI 5 "arm_comparison_operator" > > + [(match_operand:SI 2 "s_register_operand" > > + "l,r,r,r,r") > > + (match_operand:SI 3 "arm
[PATCH] Keep REG_INC note in subreg2 pass
Hi, REG_INC note is lost in subreg2 pass when resolve_simple_move, which might lead to wrong dependence for ira. e.g. In function validate_equiv_mem of ira.c, it checks REG_INC note: for (note = REG_NOTES (insn); note; note = XEXP (note, 1)) if ((REG_NOTE_KIND (note) == REG_INC || REG_NOTE_KIND (note) == REG_DEAD) && REG_P (XEXP (note, 0)) && reg_overlap_mentioned_p (XEXP (note, 0), memref)) return 0; Without REG_INC note, validate_equiv_mem will return a wrong result. Refer https://bugs.launchpad.net/gcc-linaro/+bug/1243022 for more detail about a real case in kernel. Bootstrap and no make check regression on X86-64 and ARM. Is it OK for trunk and 4.8? Thanks! -Zhenqiang ChangeLog: 2013-10-24 Zhenqiang Chen * lower-subreg.c (resolve_simple_move): Copy REG_INC note. testsuite/ChangeLog: 2013-10-24 Zhenqiang Chen * gcc.target/arm/lp1243022.c: New test. diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c index 57b4b3c..e710fa5 100644 --- a/gcc/lower-subreg.c +++ b/gcc/lower-subreg.c @@ -1056,6 +1056,22 @@ resolve_simple_move (rtx set, rtx insn) mdest = simplify_gen_subreg (orig_mode, dest, GET_MODE (dest), 0); minsn = emit_move_insn (real_dest, mdest); +#ifdef AUTO_INC_DEC + /* Copy the REG_INC notes. */ + if (MEM_P (real_dest) && !(resolve_reg_p (real_dest) + || resolve_subreg_p (real_dest))) + { + rtx note = find_reg_note (insn, REG_INC, NULL_RTX); + if (note) + { + if (!REG_NOTES (minsn)) + REG_NOTES (minsn) = note; + else + add_reg_note (minsn, REG_INC, note); + } + } +#endif + smove = single_set (minsn); gcc_assert (smove != NULL_RTX); diff --git a/gcc/testsuite/gcc.target/arm/lp1243022.c b/gcc/testsuite/gcc.target/arm/lp1243022.c new file mode 100644 index 000..91a544d --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/lp1243022.c @@ -0,0 +1,201 @@ +/* { dg-do compile { target arm_thumb2 } } */ +/* { dg-options "-O2 -fdump-rtl-subreg2" } */ + +/* { dg-final { scan-rtl-dump "REG_INC" "subreg2" } } */ +/* { dg-final { cleanup-rtl-dump "subreg2" } } */ +struct device; +typedef unsigned int __u32; +typedef unsigned long long u64; +typedef __u32 __le32; +typedef u64 dma_addr_t; +typedef unsigned gfp_t; +int dev_warn (const struct device *dev, const char *fmt, ...); +struct usb_bus +{ +struct device *controller; +}; +struct usb_hcd +{ +struct usb_bus self; +}; +struct xhci_generic_trb +{ +__le32 field[4]; +}; +union xhci_trb +{ +struct xhci_generic_trb generic; +}; +struct xhci_segment +{ +union xhci_trb *trbs; +dma_addr_t dma; +}; +struct xhci_ring +{ +struct xhci_segment *first_seg; +}; +struct xhci_hcd +{ +struct xhci_ring *cmd_ring; +struct xhci_ring *event_ring; +}; +struct usb_hcd *xhci_to_hcd (struct xhci_hcd *xhci) +{ +} +dma_addr_t xhci_trb_virt_to_dma (struct xhci_segment * seg, + union xhci_trb * trb); +struct xhci_segment *trb_in_td (struct xhci_segment *start_seg, +dma_addr_t suspect_dma); +xhci_test_trb_in_td (struct xhci_hcd *xhci, struct xhci_segment *input_seg, + union xhci_trb *start_trb, union xhci_trb *end_trb, + dma_addr_t input_dma, struct xhci_segment *result_seg, + char *test_name, int test_number) +{ +unsigned long long start_dma; +unsigned long long end_dma; +struct xhci_segment *seg; +start_dma = xhci_trb_virt_to_dma (input_seg, start_trb); +end_dma = xhci_trb_virt_to_dma (input_seg, end_trb); +{ +dev_warn (xhci_to_hcd (xhci)->self.controller, + "%d\n", test_number); +dev_warn (xhci_to_hcd (xhci)->self.controller, + "Expected seg %p, got seg %p\n", result_seg, seg); +} +} +xhci_check_trb_in_td_math (struct xhci_hcd *xhci, gfp_t mem_flags) +{ +struct +{ +dma_addr_t 
input_dma; +struct xhci_segment *result_seg; +} +simple_test_vector[] = +{ +{ +0, ((void *) 0) +} +, +{ +xhci->event_ring->first_seg->dma - 16, ((void *) 0)} +, +{ +xhci->event_ring->first_seg->dma - 1, ((void *) 0)} +, +{ +xhci->event_ring->first_seg->dma, xhci->event_ring->first_seg} +, +{ +xhci->event_ring->first_seg->dma + (64 - 1) * 16, +xhci->event_ring->first_seg +} +, +{ +xhci->event_ring->first_seg->dma + (64 - 1) * 16 + 1, ((void *) 0)} +, +{ +xhci->event_ring->first_seg->dma + (64) * 16, ((void *) 0)} +, +{ +(dma_addr_t) (~0), ((void *) 0) +} +}; +
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
On 28 October 2013 02:55, Andrew Pinski wrote: > On Sat, Oct 26, 2013 at 4:49 PM, Andrew Pinski wrote: >> On Sat, Oct 26, 2013 at 2:30 PM, Andrew Pinski wrote: >>> On Fri, Oct 18, 2013 at 2:21 AM, Zhenqiang Chen >>> wrote: >>>> On 18 October 2013 00:58, Jeff Law wrote: >>>>> On 10/17/13 05:03, Richard Biener wrote: >>>>>>>> >>>>>>>> Is it OK for trunk? >>>>>>> >>>>>>> >>>>>>> I had a much simpler change which did basically the same from 4.7 (I >>>>>>> can update it if people think this is a better approach). >>>>>> >>>>>> >>>>>> I like that more (note you can now use is_gimple_condexpr as predicate >>>>>> for force_gimple_operand). >>>>> >>>>> The obvious question is whether or not Andrew's simpler change picks up as >>>>> many transformations as Zhenqiang's change. If not are the things missed >>>>> important. >>>>> >>>>> Zhenqiang, can you do some testing of your change vs Andrew P.'s change? >>>> >>>> Here is a rough compare: >>>> >>>> 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included >>>> in my patch). Root cause is that it does not skip "LABEL". The guard >>>> to do this opt should be the same the bb_has_overhead_p in my patch. >>> >>> This should be an easy change, I am working on this right now. >>> >>>> >>>> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not >>>> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate >>>> >>>> _3 = a_2(D) > 0; >>>> _5 = b_4(D) > 0; >>>> _6 = _3 | _5; >>>> _9 = c_7(D) <= 0; >>>> _10 = ~_6; >>>> _11 = _9 & _10; >>>> if (_11 == 0) >>>> >>>> With my patch, it will generate >>>> >>>> _3 = a_2(D) > 0; >>>> _5 = b_4(D) > 0; >>>> _6 = _3 | _5; >>>> _9 = c_7(D) > 0; >>>> _10 = _6 | _9; >>>> if (_10 != 0) >>> >>> As mentioned otherwise, this seems like a missed optimization inside >>> forwprop. When I originally wrote this code there used to be two >>> cases one for & and one for |, but this was removed sometime and I >>> just made the code evolve with that. >> >> Actually I can make a small patch (3 lines) to my current patch which >> causes tree-ssa-ifcombine.c to produce the behavior of your patch. >> >> if (result_inv) >>{ >> t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t); >> result_inv = false; >>} >> >> I will submit a new patch after some testing of my current patch which >> fixes items 1 and 2. > > > Here is my latest patch which adds the testcases from Zhenqiang's > patch and fixes item 1 and 2. The patch works fine for me. Bootstrap and no make check regression on ARM Chromebook. Thanks! -Zhenqiang > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. > > Thanks, > Andrew Pinski > > ChangeLog: > * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h. > (ifcombine_ifandif): Handle cases where > maybe_fold_and_comparisons fails, combining the branches > anyways. > (tree_ssa_ifcombine): Inverse the order of > the basic block walk, increases the number of combinings. > * gimple.h (gsi_start_nondebug_after_labels_bb): New function. > > > testsuite/ChangeLog: > > * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case. > * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case. > * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case. > * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case. > * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case. > * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case. > > * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent > conditional move to be used. > * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for "one or more > intermediate". 
> > > > >> >> Thanks, >> Andrew Pinski >> >> >>> >>>> >>>> 3) The good thing of Andrew P.'s change is that "Inverse the order of >>>> the basic block walk" so it can do combine recursively. >>>> >>>> But I think we need some heuristic to control the number of ifs. Move >>>> too much compares from >>>> the inner_bb to outer_bb is not good. >>> >>> >>> I think this depends on the target. For MIPS we don't want an upper >>> bound as integer comparisons go directly to GPRs. I wrote this patch >>> with MIPS in mind as that was the target I was working on. >>> >>>> >>>> 4) Another good thing of Andrew P.'s change is that it reuses some >>>> existing functions. So it looks much simple. >>> >>> >>> Thanks, >>> Andrew Pinski >>> >>>> >>>>>> >>>>>> With that we should be able to kill the fold-const.c transform? >>>>> >>>>> That would certainly be nice and an excellent follow-up for Zhenqiang. >>>> >>>> That's my final goal to "kill the fold-const.c transform". I think we >>>> may combine the two changes to make a "simple" and "good" patch. >>>> >>>> Thanks! >>>> -Zhenqiang
Re: [PATCH] Keep REG_INC note in subreg2 pass
On 30 October 2013 02:47, Jeff Law wrote: > On 10/24/13 02:20, Zhenqiang Chen wrote: >> >> Hi, >> >> REG_INC note is lost in subreg2 pass when resolve_simple_move, which >> might lead to wrong dependence for ira. e.g. In function >> validate_equiv_mem of ira.c, it checks REG_INC note: >> >> for (note = REG_NOTES (insn); note; note = XEXP (note, 1)) >> if ((REG_NOTE_KIND (note) == REG_INC >> || REG_NOTE_KIND (note) == REG_DEAD) >> && REG_P (XEXP (note, 0)) >> && reg_overlap_mentioned_p (XEXP (note, 0), memref)) >>return 0; >> >> Without REG_INC note, validate_equiv_mem will return a wrong result. >> >> Referhttps://bugs.launchpad.net/gcc-linaro/+bug/1243022 for more >> >> detail about a real case in kernel. >> >> Bootstrap and no make check regression on X86-64 and ARM. >> >> Is it OK for trunk and 4.8? >> >> Thanks! >> -Zhenqiang >> >> ChangeLog: >> 2013-10-24 Zhenqiang Chen >> >> * lower-subreg.c (resolve_simple_move): Copy REG_INC note. >> >> testsuite/ChangeLog: >> 2013-10-24 Zhenqiang Chen >> >> * gcc.target/arm/lp1243022.c: New test. > > This clearly handles adding a note when the destination is a MEM with a side > effect. What about cases where the side effect is associated with a load > from memory rather than a store to memory? Yes. We should handle load from memory. >> >> >> lp1243022.patch >> >> >> diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c >> index 57b4b3c..e710fa5 100644 >> --- a/gcc/lower-subreg.c >> +++ b/gcc/lower-subreg.c >> @@ -1056,6 +1056,22 @@ resolve_simple_move (rtx set, rtx insn) >> mdest = simplify_gen_subreg (orig_mode, dest, GET_MODE (dest), 0); >> minsn = emit_move_insn (real_dest, mdest); >> >> +#ifdef AUTO_INC_DEC >> + /* Copy the REG_INC notes. */ >> + if (MEM_P (real_dest) && !(resolve_reg_p (real_dest) >> +|| resolve_subreg_p (real_dest))) >> + { >> + rtx note = find_reg_note (insn, REG_INC, NULL_RTX); >> + if (note) >> + { >> + if (!REG_NOTES (minsn)) >> + REG_NOTES (minsn) = note; >> + else >> + add_reg_note (minsn, REG_INC, note); >> + } >> + } >> +#endif > > If MINSN does not have any notes, then this results in MINSN and INSN > sharing the note. Note carefully that notes are chained (see implementation > of add_reg_note). Thus the sharing would result in MINSN and INSN actually > sharing a chain of notes. I'm pretty sure that's not what you intended. I > think you need to always use add_reg_note. Yes. I should use add_reg_note. Here is the updated patch: diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c index ebf364f..16dfa62 100644 --- a/gcc/lower-subreg.c +++ b/gcc/lower-subreg.c @@ -967,7 +967,20 @@ resolve_simple_move (rtx set, rtx insn) rtx reg; reg = gen_reg_rtx (orig_mode); + +#ifdef AUTO_INC_DEC + { + rtx move = emit_move_insn (reg, src); + if (MEM_P (src)) + { + rtx note = find_reg_note (insn, REG_INC, NULL_RTX); + if (note) + add_reg_note (move, REG_INC, XEXP (note, 0)); + } + } +#else emit_move_insn (reg, src); +#endif src = reg; } @@ -1057,6 +1070,16 @@ resolve_simple_move (rtx set, rtx insn) mdest = simplify_gen_subreg (orig_mode, dest, GET_MODE (dest), 0); minsn = emit_move_insn (real_dest, mdest); +#ifdef AUTO_INC_DEC + if (MEM_P (real_dest) && !(resolve_reg_p (real_dest) +|| resolve_subreg_p (real_dest))) +{ + rtx note = find_reg_note (insn, REG_INC, NULL_RTX); + if (note) + add_reg_note (minsn, REG_INC, XEXP (note, 0)); +} +#endif + smove = single_set (minsn); gcc_assert (smove != NULL_RTX);
RE: [PATCH 1/n] Add conditional compare support
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Monday, October 28, 2013 11:07 PM > To: Zhenqiang Chen; Richard Earnshaw; 'Richard Biener' > Cc: GCC Patches > Subject: Re: [PATCH 1/n] Add conditional compare support > > On 10/28/2013 01:32 AM, Zhenqiang Chen wrote: > > Patch is updated according to your comments. Main changes are: > > * Add two hooks: legitimize_cmp_combination and > > legitimize_ccmp_combination > > * Improve document. > > No, these are not the hooks I proposed. > > You should *not* have a ccmp_optab, because the middle-end has > absolutely no idea what mode TARGET should be in. > > The hook needs to return an rtx expression appropriate for feeding to > cbranch et al. E.g. for arm, > > (ne (reg:CC_Z CC_REGNUM) (const_int 0)) > > after having emitted the instruction that sets CC_REGNUM. > > We need to push this to the backend because for ia64 the expression needs > to be of te form > > (ne (reg:BI new_pseudo) (const_int 0)) Thanks for the clarification. Patch is updated: * ccmp_optab is removed. * Add two hooks gen_ccmp_with_cmp_cmp and gen_ccmp_with_ccmp_cmp for backends to generate the conditional compare instructions. Thanks! -Zhenqiang ccmp-hook2.patch Description: Binary data
RE: [PATCH 1/n] Add conditional compare support
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Hans-Peter Nilsson > Sent: Tuesday, October 29, 2013 10:38 AM > To: Zhenqiang Chen > Cc: Richard Earnshaw; 'Richard Biener'; GCC Patches > Subject: RE: [PATCH 1/n] Add conditional compare support > > On Tue, 22 Oct 2013, Zhenqiang Chen wrote: > > ChangeLog: > > 2013-10-22 Zhenqiang Chen > > > > * config/arm/arm.c (arm_fixed_condition_code_regs, > arm_ccmode_to_code, > > arm_select_dominance_ccmp_mode): New functions. > > (arm_select_dominance_cc_mode_1): New function extracted from > > arm_select_dominance_cc_mode. > > (arm_select_dominance_cc_mode): Call > arm_select_dominance_cc_mode_1. > > * config/arm/arm.md (ccmp, cbranchcc4, ccmp_and, ccmp_ior, > > ccmp_ior_scc_scc, ccmp_ior_scc_scc_cmp, ccmp_and_scc_scc, > > ccmp_and_scc_scc_cmp): New. > > * config/arm/arm-protos.h (arm_select_dominance_ccmp_mode): > New. > > * expr.c (ccmp_candidate_p, used_in_cond_stmt_p, > expand_ccmp_expr_2, > > expand_ccmp_expr_3, expand_ccmp_expr_1, expand_ccmp_expr): > New. > > (expand_expr_real_1): Handle ccmp. > > * optabs.c: Include gimple.h. > > (expand_ccmp_op): New. > > (get_rtx_code): Handle BIT_AND_EXPR and BIT_IOR_EXPR. > > * optabs.def (ccmp): New. > > * optabs.h (expand_ccmp_op): New. > > * doc/md.texi (ccmp): New index. > > One thing I don't see other people mentioning, is that this patch has just too > much code inside #ifdef HAVE_ccmp ... #endif. > > I couldn't actually find the part that *requires* that, i.e. > code that does something like gen_ccmp (...) but maybe it's there. In the arm.md, I define "(define_expand "ccmp" ...". It will be transferred to function gen_ccmp (...). And it will define HAVE_ccmp. In the updated patch, I add two backends hooks to generate conditional compare instruction. > Where needed and where the conditioned code is more than a few lines, > such code is preferably transformed into "if (0)":d code using constructs like > that at builtins.c:5354: > > #ifndef HAVE_atomic_clear > # define HAVE_atomic_clear 0 > # define gen_atomic_clear(x,y) (gcc_unreachable (), NULL_RTX) #endif > > Right, this causes dead code, but for maintenance it's *much* better than > when only a fraction of the code being compiled for other targets. (Also, the HAVE_ccmp is automatically generated when the target has ccmp instruction. The code segments in #ifdef HAVE_ccmp ... #endif will *not* be compiled for other targets, which do not define ccmp instruction. > dead code may be eliminated by gcc.) Unfortunately the number of > examples (as above) are few compared to the pages of #if > HAVE_thisorthat'd code. :( > > (And IMHO that whole construct should be the default implementation and > shouldn't have to be written manually in the first place. But that's material > for an invasive patch.) After adding another two hooks, the code in middle-end is just a framework. It fully depends on backend to generate the instruction. Thanks! -Zhenqiang
Re: [PATCH] Keep REG_INC note in subreg2 pass
On 31 October 2013 00:08, Jeff Law wrote: > On 10/30/13 00:09, Zhenqiang Chen wrote: >> >> On 30 October 2013 02:47, Jeff Law wrote: >>> >>> On 10/24/13 02:20, Zhenqiang Chen wrote: >>>> >>>> >>>> Hi, >>>> >>>> REG_INC note is lost in subreg2 pass when resolve_simple_move, which >>>> might lead to wrong dependence for ira. e.g. In function >>>> validate_equiv_mem of ira.c, it checks REG_INC note: >>>> >>>>for (note = REG_NOTES (insn); note; note = XEXP (note, 1)) >>>> if ((REG_NOTE_KIND (note) == REG_INC >>>>|| REG_NOTE_KIND (note) == REG_DEAD) >>>> && REG_P (XEXP (note, 0)) >>>> && reg_overlap_mentioned_p (XEXP (note, 0), memref)) >>>> return 0; >>>> >>>> Without REG_INC note, validate_equiv_mem will return a wrong result. >>>> >>>> Referhttps://bugs.launchpad.net/gcc-linaro/+bug/1243022 for more >>>> >>>> detail about a real case in kernel. >>>> >>>> Bootstrap and no make check regression on X86-64 and ARM. >>>> >>>> Is it OK for trunk and 4.8? >>>> >>>> Thanks! >>>> -Zhenqiang >>>> >>>> ChangeLog: >>>> 2013-10-24 Zhenqiang Chen >>>> >>>> * lower-subreg.c (resolve_simple_move): Copy REG_INC note. >>>> >>>> testsuite/ChangeLog: >>>> 2013-10-24 Zhenqiang Chen >>>> >>>> * gcc.target/arm/lp1243022.c: New test. >>> >>> >>> This clearly handles adding a note when the destination is a MEM with a >>> side >>> effect. What about cases where the side effect is associated with a load >>> from memory rather than a store to memory? >> >> >> Yes. We should handle load from memory. >> >>>> >>>> >>>> lp1243022.patch >>>> >>>> >>>> diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c >>>> index 57b4b3c..e710fa5 100644 >>>> --- a/gcc/lower-subreg.c >>>> +++ b/gcc/lower-subreg.c >>>> @@ -1056,6 +1056,22 @@ resolve_simple_move (rtx set, rtx insn) >>>> mdest = simplify_gen_subreg (orig_mode, dest, GET_MODE (dest), >>>> 0); >>>> minsn = emit_move_insn (real_dest, mdest); >>>> >>>> +#ifdef AUTO_INC_DEC >>>> + /* Copy the REG_INC notes. */ >>>> + if (MEM_P (real_dest) && !(resolve_reg_p (real_dest) >>>> +|| resolve_subreg_p (real_dest))) >>>> + { >>>> + rtx note = find_reg_note (insn, REG_INC, NULL_RTX); >>>> + if (note) >>>> + { >>>> + if (!REG_NOTES (minsn)) >>>> + REG_NOTES (minsn) = note; >>>> + else >>>> + add_reg_note (minsn, REG_INC, note); >>>> + } >>>> + } >>>> +#endif >>> >>> >>> If MINSN does not have any notes, then this results in MINSN and INSN >>> sharing the note. Note carefully that notes are chained (see >>> implementation >>> of add_reg_note). Thus the sharing would result in MINSN and INSN >>> actually >>> sharing a chain of notes. I'm pretty sure that's not what you intended. >>> I >>> think you need to always use add_reg_note. >> >> >> Yes. I should use add_reg_note. 
>> >> Here is the updated patch: >> >> diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c >> index ebf364f..16dfa62 100644 >> --- a/gcc/lower-subreg.c >> +++ b/gcc/lower-subreg.c >> @@ -967,7 +967,20 @@ resolve_simple_move (rtx set, rtx insn) >> rtx reg; >> >> reg = gen_reg_rtx (orig_mode); >> + >> +#ifdef AUTO_INC_DEC >> + { >> + rtx move = emit_move_insn (reg, src); >> + if (MEM_P (src)) >> + { >> + rtx note = find_reg_note (insn, REG_INC, NULL_RTX); >> + if (note) >> + add_reg_note (move, REG_INC, XEXP (note, 0)); >> + } >> + } >> +#else >> emit_move_insn (reg, src); >> +#endif >> src = reg; >> } >> >> @@ -1057,6 +1070,16 @@ resolve_simple_move (rtx set, rtx insn) >> mdest = simplify_gen_subreg (orig_mode, dest, GET_MODE (dest), >> 0); >> minsn = emit_move_insn (real_dest, mdest); >> >> +#ifdef AUTO_INC_DEC >> + if (MEM_P (real_dest) && !(resolve_reg_p (real_dest) >> +|| resolve_subreg_p (real_dest))) > > Formatting nit. This should be formatted as > > if (MEM_P (real_dest) > && !(resolve_reg_p (real_dest) || resolve_subreg_p (real_dest))) > > If that results in too long of a line, then it should wrap like this: > > > if (MEM_P (real_dest) > && !(resolve_reg_p (real_dest) > || resolve_subreg_p (real_dest))) > > OK with that change. Please install on the trunk. The 4.8 maintainers have > the final call for the 4.8 release branch. Thanks. Patch is committed to trunk@r204247 with the change. -Zhenqiang
RE: [PATCH 1/n] Add conditional compare support
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Thursday, October 31, 2013 4:14 AM > To: Zhenqiang Chen > Cc: Richard Earnshaw; 'Richard Biener'; GCC Patches > Subject: Re: [PATCH 1/n] Add conditional compare support > > > +/* RCODE0, RCODE1 and a valid return value should be enum rtx_code. > > + TCODE should be enum tree_code. > > + Check whether two compares are a valid combination in the target to > generate > > + a conditional compare. If valid, return the new compare after > combination. > > + */ > > +DEFHOOK > > +(legitimize_cmp_combination, > > + "This target hook returns the dominance compare if the two compares > > +are\n\ a valid combination. This target hook is required only when > > +the target\n\ supports conditional compare, such as ARM.", int, (int > > +rcode0, int rcode1, int tcode), > > + default_legitimize_cmp_combination) > > + > > +/* RCODE0, RCODE1 and a valid return value should be enum rtx_code. > > + TCODE should be enum tree_code. > > + Check whether two compares are a valid combination in the target to > generate > > + a conditional compare. If valid, return the new compare after > combination. > > + The difference from legitimize_cmp_combination is that its first > compare is > > + the result of a previous conditional compare, which leads to more > constrain > > + on it, since no way to swap the two compares. */ DEFHOOK > > +(legitimize_ccmp_combination, "This target hook returns the > > +dominance compare if the two compares are\n\ a valid combination. > > +This target hook is required only when the target\n\ supports > > +conditional compare, such as ARM.", int, (int rcode0, int rcode1, > > +int tcode), > > + default_legitimize_ccmp_combination) > > + > > Why do these hooks still exist? They should be redundant with ... They are not necessary. Will remove them. > > +/* CMP0 and CMP1 are two compares. USED_AS_CC_P indicates whether > the target > > + is used as CC or not. TCODE should be enum tree_code. > > + The hook will return a condtional compare RTX if all checkes are > > +OK. */ DEFHOOK (gen_ccmp_with_cmp_cmp, "This target hook returns a > > +condtional compare RTX if the two compares are\n\ a valid > > +combination. This target hook is required only when the target\n\ > > +supports conditional compare, such as ARM.", rtx, (gimple cmp0, > > +gimple cmp1, int tcode, bool used_as_cc_p), > > + default_gen_ccmp_with_cmp_cmp) > > + > > +/* CC is the result of a previous conditional compare. CMP1 is a compare. > > + USED_AS_CC_P indicates whether the target is used as CC or not. > > + TCODE should be enum tree_code. > > + The hook will return a condtional compare rtx if all checkes are > > +OK. */ DEFHOOK (gen_ccmp_with_ccmp_cmp, "This target hook returns > a > > +condtional compare RTX if the CC and CMP1 are\n\ a valid combination. > > +This target hook is required only when the target\n\ supports > > +conditional compare, such as ARM.", rtx, (rtx cc, gimple cmp1, int > > +tcode, bool used_as_cc_p), > > + default_gen_ccmp_with_ccmp_cmp) > > + > > ... these. > > Why in the world are you passing down gimple to the backends? The > expand_operands done therein should have been done in expr.c. > The hooks are still not what I suggested, particularly > gen_ccmp_with_cmp_cmp is what I was trying to avoid, passing down two > initial compares like that. > > To be 100% explicit this time, I think the hooks should be Thank you very much! 
> DEFHOOK > (gen_ccmp_first, > "This function emits a comparison insn for the first of a sequence of\n\ > conditional comparisions. It returns a comparison expression appropriate\n\ > for passing to @code{gen_ccmp_next} or to @code{cbranch_optab}.", rtx, > (int code, rtx op0, rtx op1), > NULL) > DEFHOOK > (gen_ccmp_next, > "This function emits a conditional comparison within a sequence of\n\ > conditional comparisons. The @code{prev} expression is the result of a\n\ > prior call to @code{gen_ccmp_first} or @code{gen_ccmp_next}. It may > return\n\ @code{NULL} if the combination of @code{prev} and this > comparison is\n\ not supported, otherwise the result must be appropriate > for passing to @code{gen_ccmp_next} or @code{cbranch_optab}.", rtx, (rtx > prev, int code, rtx op0, rtx op1), > NULL) gen_ccmp_first and gen_ccmp_next are better. But need some improvement. With two compares, we might swap the order of the two compares to get a valid combination.
RE: [PATCH 1/n] Add conditional compare support
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Friday, November 01, 2013 12:27 AM > To: Zhenqiang Chen > Cc: Richard Earnshaw; 'Richard Biener'; GCC Patches > Subject: Re: [PATCH 1/n] Add conditional compare support > > On 10/31/2013 02:06 AM, Zhenqiang Chen wrote: > > With two compares, we might swap the order of the two compares to get > > a valid combination. This is what current ARM does (Please refer > > arm_select_dominace_cc_mode, cmp_and, cmp_ior, and_scc_scc and > ior_scc_scc). > > To improve gen_ccmp_first, we need another function/hook to determine > > which compare is done first. I will double check ARM backend to avoid hook. > > Hmm. I hadn't thought of that, and missed that point when looking through > your previous patches. I think you should have a third hook for that. ARM > will need it, but IA-64 doesn't. Thanks. I add a new hook. The default function will return -1 if the target does not care about the order. +DEFHOOK +(select_ccmp_cmp_order, + "For some target (like ARM), the order of two compares is sensitive for\n\ +conditional compare. cmp0-cmp1 might be an invalid combination. But when\n\ +swapping the order, cmp1-cmp0 is valid. The function will return\n\ + -1: if @code{code1} and @code{code2} are valid combination.\n\ + 1: if @code{code2} and @code{code1} are valid combination.\n\ + 0: both are invalid.", + int, (int code1, int code2), + default_select_ccmp_cmp_order) > > Hook gen_ccmp_next needs one more parameter to indicate AND/IOR > since > > they will generate different instructions. > > True, I forgot about that while writing the hook descriptions. For gen_ccmp_next, I add another parameter CC_P to indicate the result is used as CC or not. If CC_P is false, the gen_ccmp_next will return a general register. This is for code like int test (int a, int b) { return a > 0 && b > 0; } During expand, there might have no branch at all. So gen_ccmp_next can not return CC for "a > 0 && b > 0". +DEFHOOK +(gen_ccmp_next, + "This function emits a conditional comparison within a sequence of\n\ + conditional comparisons. The @code{prev} expression is the result of a\n\ + prior call to @code{gen_ccmp_first} or @code{gen_ccmp_next}. It may return\n\ + @code{NULL} if the combination of @code{prev} and this comparison is\n\ + not supported, otherwise the result must be appropriate for passing to\n\ + @code{gen_ccmp_next} or @code{cbranch_optab} if @code{cc_p} is true.\n\ + If @code{cc_p} is false, it returns a general register. @code{bit_code}\n\ + is AND or IOR, which is the op on the two compares.", + rtx, (rtx prev, int cmp_code, rtx op0, rtx op1, int bit_code, bool cc_p), + NULL) A updated patch is attached. Thanks! -Zhenqiang ccmp-hook3.patch Description: Binary data
[PING] [PATCH, ARM, testcase] Skip target arm-neon for lp1243022.c
Ping? Thanks! -Zhenqiang > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Zhenqiang Chen > Sent: Friday, November 08, 2013 10:51 AM > To: gcc-patches@gcc.gnu.org > Cc: Ramana Radhakrishnan > Subject: [PATCH, ARM, testcase] Skip target arm-neon for lp1243022.c > > Hi, > > lp1243022.c will fail with options: -mfpu=neon -mfloat-abi=hard. > > Logs show it does not generate auto-incremental instruction in pass > auto_inc_dec. In this case, the check of REG_INC note at subreg2 will be > invalid. So skip the check for target arm-neon. > > All PASS with the following options: > > -mthumb/-mcpu=cortex-a9/-mfloat-abi=hard > -mthumb/-mcpu=cortex-a9/-mfloat-abi=soft > -mthumb/-mcpu=cortex-a9/-mfloat-abi=softfp > -mthumb/-mcpu=cortex-a9/-mfloat-abi=soft/-mfpu=vfpv3 > -mthumb/-mcpu=cortex-a9/-mfloat-abi=softfp/-mfpu=vfpv3 > -mthumb/-mcpu=cortex-a9/-mfloat-abi=hard/-mfpu=vfpv3 > -mthumb/-mcpu=cortex-a9/-mfloat-abi=soft/-mfpu=neon > -mthumb/-mcpu=cortex-a9/-mfloat-abi=softfp/-mfpu=neon > -mthumb/-mcpu=cortex-a9/-mfloat-abi=hard/-mfpu=neon > -mthumb/-mcpu=cortex-a15/-mfloat-abi=hard > -mthumb/-mcpu=cortex-a15/-mfloat-abi=soft > -mthumb/-mcpu=cortex-a15/-mfloat-abi=softfp > -mthumb/-mcpu=cortex-a15/-mfloat-abi=soft/-mfpu=vfpv4 > -mthumb/-mcpu=cortex-a15/-mfloat-abi=softfp/-mfpu=vfpv4 > -mthumb/-mcpu=cortex-a15/-mfloat-abi=hard/-mfpu=vfpv4 > -mthumb/-mcpu=cortex-a15/-mfloat-abi=soft/-mfpu=neon > -mthumb/-mcpu=cortex-a15/-mfloat-abi=softfp/-mfpu=neon > -mthumb/-mcpu=cortex-a15/-mfloat-abi=hard/-mfpu=neon > > Is it OK? > > Thanks! > -Zhenqiang > > testsuite/ChangeLog: > 2013-11-08 Zhenqiang Chen > > * gcc.target/arm/lp1243022.c: Skip target arm-neon. > > --- a/gcc/testsuite/gcc.target/arm/lp1243022.c > +++ b/gcc/testsuite/gcc.target/arm/lp1243022.c > @@ -1,7 +1,7 @@ > /* { dg-do compile { target arm_thumb2 } } */ > /* { dg-options "-O2 -fdump-rtl-subreg2" } */ > > -/* { dg-final { scan-rtl-dump "REG_INC" "subreg2" } } */ > +/* { dg-final { scan-rtl-dump "REG_INC" "subreg2" { target { ! > arm_neon} } } } */ > /* { dg-final { cleanup-rtl-dump "subreg2" } } */ struct device; typedef > unsigned int __u32;
RE: [PING] [PATCH, ARM, testcase] Skip target arm-neon for lp1243022.c
> -Original Message- > From: Jeff Law [mailto:l...@redhat.com] > Sent: Thursday, November 28, 2013 12:43 AM > To: Zhenqiang Chen; gcc-patches@gcc.gnu.org > Cc: Ramana Radhakrishnan; Richard Earnshaw > Subject: Re: [PING] [PATCH, ARM, testcase] Skip target arm-neon for > lp1243022.c > > On 11/27/13 02:05, Zhenqiang Chen wrote: > > Ping? > Thanks for including the actual patch you're pinging, it helps :-) > > >>> Hi, > >> > >> lp1243022.c will fail with options: -mfpu=neon -mfloat-abi=hard. > >> > >> Logs show it does not generate auto-incremental instruction in pass > >> auto_inc_dec. In this case, the check of REG_INC note at subreg2 will > >> be invalid. So skip the check for target arm-neon. > >> > >> All PASS with the following options: > >> > >> -mthumb/-mcpu=cortex-a9/-mfloat-abi=hard > >> -mthumb/-mcpu=cortex-a9/-mfloat-abi=soft > >> -mthumb/-mcpu=cortex-a9/-mfloat-abi=softfp > >> -mthumb/-mcpu=cortex-a9/-mfloat-abi=soft/-mfpu=vfpv3 > >> -mthumb/-mcpu=cortex-a9/-mfloat-abi=softfp/-mfpu=vfpv3 > >> -mthumb/-mcpu=cortex-a9/-mfloat-abi=hard/-mfpu=vfpv3 > >> -mthumb/-mcpu=cortex-a9/-mfloat-abi=soft/-mfpu=neon > >> -mthumb/-mcpu=cortex-a9/-mfloat-abi=softfp/-mfpu=neon > >> -mthumb/-mcpu=cortex-a9/-mfloat-abi=hard/-mfpu=neon > >> -mthumb/-mcpu=cortex-a15/-mfloat-abi=hard > >> -mthumb/-mcpu=cortex-a15/-mfloat-abi=soft > >> -mthumb/-mcpu=cortex-a15/-mfloat-abi=softfp > >> -mthumb/-mcpu=cortex-a15/-mfloat-abi=soft/-mfpu=vfpv4 > >> -mthumb/-mcpu=cortex-a15/-mfloat-abi=softfp/-mfpu=vfpv4 > >> -mthumb/-mcpu=cortex-a15/-mfloat-abi=hard/-mfpu=vfpv4 > >> -mthumb/-mcpu=cortex-a15/-mfloat-abi=soft/-mfpu=neon > >> -mthumb/-mcpu=cortex-a15/-mfloat-abi=softfp/-mfpu=neon > >> -mthumb/-mcpu=cortex-a15/-mfloat-abi=hard/-mfpu=neon > >> > >> Is it OK? > >> > >> Thanks! > >> -Zhenqiang > >> > >> testsuite/ChangeLog: > >> 2013-11-08 Zhenqiang Chen > >> > >> * gcc.target/arm/lp1243022.c: Skip target arm-neon. > It seems to me you should be xfailing arm-neon, not skipping the test. > Unless there is some fundamental reason why we can not generate auto-inc > instructions on the neon. Thanks for the comments. Update the test case as xfail. diff --git a/gcc/testsuite/gcc.target/arm/lp1243022.c b/gcc/testsuite/gcc.target/arm/lp1243022.c index 91a544d..b2ebe7e 100644 --- a/gcc/testsuite/gcc.target/arm/lp1243022.c +++ b/gcc/testsuite/gcc.target/arm/lp1243022.c @@ -1,7 +1,7 @@ /* { dg-do compile { target arm_thumb2 } } */ /* { dg-options "-O2 -fdump-rtl-subreg2" } */ -/* { dg-final { scan-rtl-dump "REG_INC" "subreg2" } } */ +/* { dg-final { scan-rtl-dump "REG_INC" "subreg2" { xfail arm_neon } } } */ /* { dg-final { cleanup-rtl-dump "subreg2" } } */ struct device; typedef unsigned int __u32;
[PING ^ 2] [PATCH 1/n] Add conditional compare support
Hi, Patch is rebased with the latest trunk with changes: * simplify_while_replacing (recog.c) should not swap operands of compares in CCMP. * make sure no other instructions can clobber CC except compares in CCMP when expanding CCMP. For easy to read, I add the following description(in expr.c). The following functions expand conditional compare (CCMP) instructions. Here is a short description about the over all algorithm: * ccmp_candidate_p is used to identify the CCMP candidate * expand_ccmp_expr is the main entry, which calls expand_ccmp_expr_1 to expand CCMP. * expand_ccmp_expr_1 uses a recursive algorithm to expand CCMP. It calls two target hooks gen_ccmp_first and gen_ccmp_next to generate CCMP instructions. - gen_ccmp_first expands the first compare in CCMP. - gen_ccmp_next expands the following compares. Another hook select_ccmp_cmp_order is called to determine which compare is done first since not all combination of compares are legal in some target like ARM. We might get more chance when swapping the compares. During expanding, we must make sure that no instruction can clobber the CC reg except the compares. So clobber_cc_p and check_clobber_cc are introduced to do the check. * If the final result is not used in a COND_EXPR (checked by function used_in_cond_stmt_p), it calls cstorecc4 pattern to store the CC to a general register. Bootstrap and no make check regression on X86-64 and ARM chrome book. ChangeLog: 2013-11-28 Zhenqiang Chen * config/arm/arm-protos.h (arm_select_dominance_ccmp_mode, arm_ccmode_to_code): New prototypes. * config/arm/arm.c (arm_select_dominance_cc_mode_1): New function extracted from arm_select_dominance_cc_mode. (arm_ccmode_to_code, arm_code_to_ccmode, arm_convert_to_SImode, arm_select_dominance_ccmp_mode): New functions. (arm_select_ccmp_cmp_order, arm_gen_ccmp_first, arm_gen_ccmp_next): New hooks. (arm_select_dominance_cc_mode): Call arm_select_dominance_cc_mode_1. * config/arm/arm.md (cbranchcc4, cstorecc4, ccmp_and, ccmp_ior): New instruction patterns. * doc/md.texi (ccmp): New index. * doc/tm.texi (TARGET_SELECT_CCMP_CMP_ORDER, TARGET_GEN_CCMP_FIRST, TARGET_GEN_CCMP_NEXT): New hooks. * doc/tm.texi (TARGET_SELECT_CCMP_CMP_ORDER, TARGET_GEN_CCMP_FIRST, TARGET_GEN_CCMP_NEXT): New hooks. * doc/tm.texi.in (TARGET_SELECT_CCMP_CMP_ORDER, TARGET_GEN_CCMP_FIRST, TARGET_GEN_CCMP_NEXT): New hooks. * expmed.c (emit_cstore): Make it global. * expr.c: Include tree-phinodes.h and ssa-iterators.h. (ccmp_candidate_p, used_in_cond_stmt_p, check_clobber_cc, clobber_cc_p, gen_ccmp_next, expand_ccmp_expr_1, expand_ccmp_expr): New functions. (expand_expr_real_1): Handle conditional compare. * optabs.c (get_rtx_code): Make it global and handle BIT_AND_EXPR and BIT_IOR_EXPR. * optabs.h (get_rtx_code, emit_cstore): New prototypes. * recog.c (ccmp_insn_p): New function. (simplify_while_replacing): Do not swap conditional compare insn. * target.def (select_ccmp_cmp_order, gen_ccmp_first, gen_ccmp_next): Define hooks. * targhooks.c (default_select_ccmp_cmp_order): New function. * targhooks.h (default_select_ccmp_cmp_order): New prototypes. 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index c5b16da..e3162c1 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -117,6 +117,9 @@ extern bool gen_movmem_ldrd_strd (rtx *); extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx); extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx, HOST_WIDE_INT); +extern enum machine_mode arm_select_dominance_ccmp_mode (rtx, enum machine_mode, +HOST_WIDE_INT); +enum rtx_code arm_ccmode_to_code (enum machine_mode mode); extern rtx arm_gen_compare_reg (RTX_CODE, rtx, rtx, rtx); extern rtx arm_gen_return_addr_mask (void); extern void arm_reload_in_hi (rtx *); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 129e428..b0fc4f4 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -287,6 +287,12 @@ static unsigned arm_add_stmt_cost (void *data, int count, static void arm_canonicalize_comparison (int *code, rtx *op0, rtx *op1, bool op0_preserve_value); static unsigned HOST_WIDE_INT arm_asan_shadow_offset (void); +static int arm_select_ccmp_cmp_order (int, int); +static rtx arm_gen_ccmp_first (int, rtx, rtx); +static rtx arm_gen_ccmp_next (rtx, int, rtx, rtx, int); +static enum machine_mode arm_select_dominance_cc_mode_1 (enum rtx_code cond1
RE: [PING] [PATCH, ARM, testcase] Skip target arm-neon for lp1243022.c
> -Original Message- > From: Richard Earnshaw > Sent: Friday, November 29, 2013 1:01 AM > To: Zhenqiang Chen > Cc: 'Jeff Law'; gcc-patches@gcc.gnu.org; Ramana Radhakrishnan > Subject: Re: [PING] [PATCH, ARM, testcase] Skip target arm-neon for > lp1243022.c > > On 28/11/13 06:17, Zhenqiang Chen wrote: > > > > > >> -Original Message- > >> From: Jeff Law [mailto:l...@redhat.com] > >> Sent: Thursday, November 28, 2013 12:43 AM > >> To: Zhenqiang Chen; gcc-patches@gcc.gnu.org > >> Cc: Ramana Radhakrishnan; Richard Earnshaw > >> Subject: Re: [PING] [PATCH, ARM, testcase] Skip target arm-neon for > >> lp1243022.c > >> > >> On 11/27/13 02:05, Zhenqiang Chen wrote: > >>> Ping? > >> Thanks for including the actual patch you're pinging, it helps :-) > >> > >>>>> Hi, > >>>> > >>>> lp1243022.c will fail with options: -mfpu=neon -mfloat-abi=hard. > >>>> > >>>> Logs show it does not generate auto-incremental instruction in pass > >>>> auto_inc_dec. In this case, the check of REG_INC note at subreg2 > >>>> will be invalid. So skip the check for target arm-neon. > >>>> > >>>> All PASS with the following options: > >>>> > >>>> -mthumb/-mcpu=cortex-a9/-mfloat-abi=hard > >>>> -mthumb/-mcpu=cortex-a9/-mfloat-abi=soft > >>>> -mthumb/-mcpu=cortex-a9/-mfloat-abi=softfp > >>>> -mthumb/-mcpu=cortex-a9/-mfloat-abi=soft/-mfpu=vfpv3 > >>>> -mthumb/-mcpu=cortex-a9/-mfloat-abi=softfp/-mfpu=vfpv3 > >>>> -mthumb/-mcpu=cortex-a9/-mfloat-abi=hard/-mfpu=vfpv3 > >>>> -mthumb/-mcpu=cortex-a9/-mfloat-abi=soft/-mfpu=neon > >>>> -mthumb/-mcpu=cortex-a9/-mfloat-abi=softfp/-mfpu=neon > >>>> -mthumb/-mcpu=cortex-a9/-mfloat-abi=hard/-mfpu=neon > >>>> -mthumb/-mcpu=cortex-a15/-mfloat-abi=hard > >>>> -mthumb/-mcpu=cortex-a15/-mfloat-abi=soft > >>>> -mthumb/-mcpu=cortex-a15/-mfloat-abi=softfp > >>>> -mthumb/-mcpu=cortex-a15/-mfloat-abi=soft/-mfpu=vfpv4 > >>>> -mthumb/-mcpu=cortex-a15/-mfloat-abi=softfp/-mfpu=vfpv4 > >>>> -mthumb/-mcpu=cortex-a15/-mfloat-abi=hard/-mfpu=vfpv4 > >>>> -mthumb/-mcpu=cortex-a15/-mfloat-abi=soft/-mfpu=neon > >>>> -mthumb/-mcpu=cortex-a15/-mfloat-abi=softfp/-mfpu=neon > >>>> -mthumb/-mcpu=cortex-a15/-mfloat-abi=hard/-mfpu=neon > >>>> > >>>> Is it OK? > >>>> > >>>> Thanks! > >>>> -Zhenqiang > >>>> > >>>> testsuite/ChangeLog: > >>>> 2013-11-08 Zhenqiang Chen > >>>> > >>>> * gcc.target/arm/lp1243022.c: Skip target arm-neon. > >> It seems to me you should be xfailing arm-neon, not skipping the test. > >> Unless there is some fundamental reason why we can not generate > >> auto-inc instructions on the neon. > > > > Thanks for the comments. Update the test case as xfail. > > > > diff --git a/gcc/testsuite/gcc.target/arm/lp1243022.c > > b/gcc/testsuite/gcc.target/arm/lp1243022.c > > index 91a544d..b2ebe7e 100644 > > --- a/gcc/testsuite/gcc.target/arm/lp1243022.c > > +++ b/gcc/testsuite/gcc.target/arm/lp1243022.c > > @@ -1,7 +1,7 @@ > > /* { dg-do compile { target arm_thumb2 } } */ > > /* { dg-options "-O2 -fdump-rtl-subreg2" } */ > > > > -/* { dg-final { scan-rtl-dump "REG_INC" "subreg2" } } */ > > +/* { dg-final { scan-rtl-dump "REG_INC" "subreg2" { xfail arm_neon } > > +} } */ > > /* { dg-final { cleanup-rtl-dump "subreg2" } } */ struct device; > > typedef unsigned int __u32; > > > > > > This test looks horribly fragile, since it's taking a large chunk of code and > expecting a specific optimization to have occurred in exactly one place. 
The > particular instruction was a large pre-modify offset, which isn't supported > > Looking back through the original bug report, the problem was that the > subreg2 pass was losing a REG_INC note that had previously been created. > Of course it's not a bug if it was never created before, but there's no easy > way to tell that. > > On that basis, I think the original patch is the correct one, please install that. Thanks. The original patch was committed @r205509. -Zhenqiang > I must say that I do wonder what the value of some of these tests are in the > absence of a proper unit test environment. > > R.
[PING] [PATCH libgcc] Fix ARM uclinux libgcc config order issue
Ping? This is definitely a bug. The LIB1ASMFUNCS defined in t-bpabi should not be overridden by t-arm. OK for 4.8 and trunk Thanks! -Zhenqiang > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Zhenqiang Chen > Sent: Monday, March 04, 2013 4:26 PM > To: gcc-patches@gcc.gnu.org > Subject: RE: [PATCH libgcc] Fix ARM uclinux libgcc config order issue > > Ping? > > Thanks! > -Zhenqiang > > > -Original Message- > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > > ow...@gcc.gnu.org] On Behalf Of Zhenqiang Chen > > Sent: Friday, January 11, 2013 5:21 PM > > To: gcc-patches@gcc.gnu.org > > Subject: [PATCH libgcc] Fix ARM uclinux libgcc config order issue > > > > Hi, > > > > The patch is to adjust ARM uclinux libgcc config order of arm/t-bpabi > > and > t- > > arm since LIB1ASMFUNCS defined in t-bpabi is overridden in t-arm. > > > > Is it OK for trunk, 4.8 and 4.7? > > > > Thanks! > > -Zhenqiang > > > > 2012-03-11 Zhenqiang Chen > > > > * config.host (arm*-*-uclinux*): Move arm/t-arm before arm/t- > > bpabi. > > > > diff --git a/libgcc/config.host b/libgcc/config.host index > ffd047f..eb65941 > > 100644 > > --- a/libgcc/config.host > > +++ b/libgcc/config.host > > @@ -332,10 +332,10 @@ arm*-*-linux*)# ARM > > GNU/Linux with > > ELF > > ;; > > arm*-*-uclinux*) # ARM ucLinux > > tmake_file="${tmake_file} t-fixedpoint-gnu-prefix" > > + tmake_file="$tmake_file arm/t-arm arm/t-elf t-softfp-sfdf > > t-softfp-excl arm/t-softfp t-softfp" > > tmake_file="${tmake_file} arm/t-bpabi" > > tm_file="$tm_file arm/bpabi-lib.h" > > unwind_header=config/arm/unwind-arm.h > > - tmake_file="$tmake_file arm/t-arm arm/t-elf t-softfp-sfdf > > t-softfp-excl arm/t-softfp t-softfp" > > extra_parts="$extra_parts crti.o crtn.o" > > ;; > > arm*-*-eabi* | arm*-*-symbianelf* | arm*-*-rtems*) > > > > > > > > > >
[PATCH, AARCH64] MULTIARCH_DIRNAME breaks multiarch build
Hi, MULTIARCH_DIRNAME was removed @r196649 since the dir info had been combined in MULTILIB_OSDIRNAMES. But MULTIARCH_DIRNAME was re-added @r201164. With this change, the final multiarch_dir is combined as "aarch64-linux-gnu:aarch64-linux-gnu", which is incorrect and leads to multiarch build fail if the sysroot is in correct multiarch layout. Any reason to add MULTIARCH_DIRNAME? If it is not necessary, can we remove it as the patch? Thanks! -Zhenqiang ChangeLog: 2014-01-10 Zhenqiang Chen * config/aarch64/t-aarch64-linux (MULTIARCH_DIRNAME): Remove. diff --git a/gcc/config/aarch64/t-aarch64-linux b/gcc/config/aarch64/t-aarch64-linux index 147452b..77e33ea 100644 --- a/gcc/config/aarch64/t-aarch64-linux +++ b/gcc/config/aarch64/t-aarch64-linux @@ -23,7 +23,6 @@ LIB1ASMFUNCS = _aarch64_sync_cache_range AARCH_BE = $(if $(findstring TARGET_BIG_ENDIAN_DEFAULT=1, $(tm_defines)),_be) MULTILIB_OSDIRNAMES = .=../lib64$(call if_multiarch,:aarch64$(AARCH_BE)-linux-gnu) -MULTIARCH_DIRNAME = $(call if_multiarch,aarch64$(AARCH_BE)-linux-gnu) # Disable the multilib for linux-gnu targets for the time being; focus # on the baremetal targets.
Re: [PATCH, AARCH64] MULTIARCH_DIRNAME breaks multiarch build
On 10 January 2014 17:23, Matthias Klose wrote: > Am 10.01.2014 09:23, schrieb Zhenqiang Chen: >> Hi, >> >> MULTIARCH_DIRNAME was removed @r196649 since the dir info had been >> combined in MULTILIB_OSDIRNAMES. >> >> But MULTIARCH_DIRNAME was re-added @r201164. With this change, the >> final multiarch_dir is combined as >> "aarch64-linux-gnu:aarch64-linux-gnu", which is incorrect and leads to >> multiarch build fail if the sysroot is in correct multiarch layout. >> >> Any reason to add MULTIARCH_DIRNAME? If it is not necessary, can we >> remove it as the patch? > > see the thread "[patch] set MULTIARCH_DIRNAME for multilib architectures" from > June 2013. I think it is necessary to have the default defined. Yesterday's > build looks ok for me, looking at default and include paths, so maybe I don't > yet understand the issue. In our build, we configure eglbc with rtlddir=/lib libdir=/usr/lib/aarch64-linux-gnu slibdir=/lib/aarch64-linux-gnu And we configure gcc with "--disable-multilib --enable-multiarch", But when building gcc libraries, configure FAIL since it can not find the C libraries. And I try ./xgcc --print-multiarch the output is "aarch64-linux-gnu:aarch64-linux-gnu" Any comments? Thanks! -Zhenqiang > I think aarch64 is the only architecture which introduces MULTILIB_* macros > without actually building any multilib, just to set the default library name > to > lib64. So maybe this has some side effects. > > Matthias >
[PATCH, ARM] Fix two IT issues
Hi, The patch fixes two IT issues: 1) IT block is splitted for A15 (and Cortex-M). For A15 tune, max cond insns (max_insns_skipped) is 2, which is set as the maximum allowed insns in an IT block (see thumb2_final_prescan_insn). So IT which has 3 or 4 instructions, will be splitted. Take the first if-then-else in the test case of the patch as example, it will generate ITs like: cmp r0, #10 ite gt subgt r0, r0, r1 suble r0, r1, r0 ite gt addgt r0, r0, #10 suble r0, r0, #7 It does not make sense to split the IT. For cortex-m, the IT can not be folded if the previous insn is 4 BYTES. 2) For arm_v7m_tune, max cond insns is not aligned with its branch cost. In ifcvt.c, the relation between MAX_CONDITIONAL_EXECUTE (max cond insns) and branch cost is: #ifndef MAX_CONDITIONAL_EXECUTE #define MAX_CONDITIONAL_EXECUTE \ (BRANCH_COST (optimize_function_for_speed_p (cfun), false) \ + 1) #endif So the best value of max cond insns for v7m_tune should be 2. Bootstrap and no make check regression on ARM Chrome book. No make check regression for Cortex-M3. Cortex-M4 O2 performance changes on coremark, dhrystone and eembc-v1: coremark: -0.11% dhrystone: 1.26%. a2time01_lite: 2.63% canrdr01_lite: 4.27% iirflt01_lite: 6.51% rspeed01_lite: 6.51% dither01_lite: 7.36% The biggest regression in eembc-v1 is pntrch01_lite: -0.51% All other regressions < 0.1% Cortex-M4 O3 performance changes are similar with O2, except one regression due to loop alignment change. OK for trunk? Thanks! -Zhenqiang 2014-01-14 Zhenqiang Chen * config/arm/arm.c (arm_v7m_tune): Set max_insns_skipped to 2. (thumb2_final_prescan_insn): Set max to MAX_INSN_PER_IT_BLOCK. testsuite/ChangeLog: 2014-01-14 Zhenqiang Chen * gcc.target/arm/its.c: New test. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 39d23cc..8751675 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -1696,7 +1696,7 @@ const struct tune_params arm_v7m_tune = &v7m_extra_costs, NULL,/* Sched adj cost. */ 1, /* Constant limit. */ - 5, /* Max cond insns. */ + 2, /* Max cond insns. */ ARM_PREFETCH_NOT_BENEFICIAL, true,/* Prefer constant pool. */ arm_cortex_m_branch_cost, @@ -22131,11 +22131,11 @@ thumb2_final_prescan_insn (rtx insn) int mask; int max; - /* Maximum number of conditionally executed instructions in a block - is minimum of the two max values: maximum allowed in an IT block - and maximum that is beneficial according to the cost model and tune. */ - max = (max_insns_skipped < MAX_INSN_PER_IT_BLOCK) ? -max_insns_skipped : MAX_INSN_PER_IT_BLOCK; + /* max_insns_skipped in the tune was already taken into account in the + cost model of ifcvt pass when generating COND_EXEC insns. At this stage + just emit the IT blocks as we can. It does not make sense to split + the IT blocks. */ + max = MAX_INSN_PER_IT_BLOCK; /* Remove the previous insn from the count of insns to be output. */ if (arm_condexec_count) diff --git a/gcc/testsuite/gcc.target/arm/its.c b/gcc/testsuite/gcc.target/arm/its.c new file mode 100644 index 000..8eecdf4 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/its.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +int test (int a, int b) +{ + int r; + if (a > 10) +{ + r = a - b; + r += 10; +} + else +{ + r = b - a; + r -= 7; +} + if (r > 0) +r -= 3; + return r; +} +/* { dg-final { scan-assembler-times "\tit" 2 { target arm_thumb2 } } } */
[PATCH, ARM] ICE when building kernel raid6 neon code
Hi, The patch fixes ICE when building kernel raid6 neon code. lib/raid6/neon4.c: In function 'raid6_ neon4_gen_syndrome_real': lib/raid6/neon4.c:113:1: internal compiler error: in dwarf2out_frame_debug_adjust_cfa, at dwarf2cfi.c:1090 } https://bugs.launchpad.net/gcc-linaro/+bug/1268893 Root cause: When expanding epilogue, REG_CFA_ADJUST_CFA NOTE is added to handle dwarf info issue for shrink-wrap. But for TARGET_APCS_FRAME, shrink-wrap is disabled. And not all dwarf info in arm_expand_epilogue_apcs_frame are correctly updated. arm_emit_vfp_multi_reg_pop is called by both arm_expand_epilogue_apcs_frame and arm_expand_epilogue. So we should not add the NOTE in arm_emit_vfp_multi_reg_pop if shrink-wrap is not enabled. Boot strap and no make check regression on ARM Chromebook. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-01-15 Zhenqiang Chen * config/arm/arm.c (arm_emit_vfp_multi_reg_pop): Do not add REG_CFA_ADJUST_CFA NOTE if shrink-wrap is not enabled. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 18196b3..1ccb796 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -19890,8 +19890,12 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg) par = emit_insn (par); REG_NOTES (par) = dwarf; - arm_add_cfa_adjust_cfa_note (par, 2 * UNITS_PER_WORD * num_regs, - base_reg, base_reg); + /* REG_CFA_ADJUST_CFA NOTE is added to handle dwarf info issue when + shrink-wrap is enabled. So when shrink-wrap is not enabled, we should + not add the note. */ + if (flag_shrink_wrap) +arm_add_cfa_adjust_cfa_note (par, 2 * UNITS_PER_WORD * num_regs, +base_reg, base_reg); } /* Generate and emit a pattern that will be recognized as LDRD pattern. If even
Re: [PATCH, ARM] Keep constants in register when expanding
On 8 August 2014 23:22, Ramana Radhakrishnan wrote: > On Tue, Aug 5, 2014 at 10:31 AM, Zhenqiang Chen > wrote: >> Hi, >> >> For some large constants, ARM will split them during expanding, which >> makes impossible to hoist them out the loop or shared by different >> references (refer the test case in the patch). >> >> The patch keeps some constants in registers. If the constant can not >> be optimized, the cprop and combine passes can optimize them as what >> we do in current expand pass with >> >> define_insn_and_split "*arm_subsi3_insn" >> define_insn_and_split "*arm_andsi3_insn" >> define_insn_and_split "*iorsi3_insn" >> define_insn_and_split "*arm_xorsi3" >> >> The patch does not modify addsi3 since the define_insn_and_split >> "*arm_addsi3" is only valid when (reload_completed || >> !arm_eliminable_register (operands[1])). The cprop and combine passes >> can not optimize the large constant if we put it in register, which >> will lead to regression. >> >> For logic operators, the patch skips changes for constants: >> >> INTVAL (operands[2]) < 0 && const_ok_for_arm (-INTVAL (operands[2]) >> >> since expand pass always uses "sign-extend" to get the value >> (trunc_int_for_mode called from immed_wide_int_const) for rtl, and >> logs show most negative values are UNSIGNED when they are TREE node. >> And combine pass is smart enough to recover the negative value to >> positive value. > > I am unable to verify any change in code generation for this testcase > with and without the patch when I had a play with the patch. > > what gives ? Thanks for trying the patch. Do you add option -fno-gcse which is mentioned in dg-options " -O2 -fno-gcse "? Without it, there is no change for the testcase since cprop pass will propagate the constant to AND expr (A patch to enhance cprop pass was discussed at https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01321.html). In addition, if the constant can not be hoisted out the loop, later combine pass can also optimize it. These (cprop and combine) are reasons why the patch itself has little impact on current tests. Test case in the patch is updated since it does not work for armv7-m. Thanks! -Zhenqiang > Ramana > > >> >> Bootstrap and no make check regression on Chrome book. >> For coremark, dhrystone and eembcv1, no any code size and performance >> change on Cortex-M4. >> No any file in CSiBE has code size change for Cortex-A15 and Cortex-M4. >> No Spec2000 performance regression on Chrome book and dumped assemble >> codes only show very few difference. >> >> OK for trunk? >> >> Thanks! >> -Zhenqiang >> >> ChangeLog: >> 2014-08-05 Zhenqiang Chen >> >> * config/arm/arm.md (subsi3, andsi3, iorsi3, xorsi3): Keep some >> large constants in register other than split them. >> >> testsuite/ChangeLog: >> 2014-08-05 Zhenqiang Chen >> >> * gcc.target/arm/maskdata.c: New test. 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index bd8ea8f..c8b3001 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -1162,9 +1162,16 @@ { if (TARGET_32BIT) { - arm_split_constant (MINUS, SImode, NULL_RTX, - INTVAL (operands[1]), operands[0], - operands[2], optimize && can_create_pseudo_p ()); + if (!const_ok_for_arm (INTVAL (operands[1])) + && can_create_pseudo_p ()) + { + operands[1] = force_reg (SImode, operands[1]); + emit_insn (gen_subsi3 (operands[0], operands[1], operands[2])); + } + else + arm_split_constant (MINUS, SImode, NULL_RTX, +INTVAL (operands[1]), operands[0], operands[2], +optimize && can_create_pseudo_p ()); DONE; } else /* TARGET_THUMB1 */ @@ -2077,6 +2084,17 @@ emit_insn (gen_thumb2_zero_extendqisi2_v6 (operands[0], operands[1])); } + else if (!(const_ok_for_arm (INTVAL (operands[2])) + || const_ok_for_arm (~INTVAL (operands[2])) + /* zero_extendhi instruction is efficient enough. */ + || INTVAL (operands[2]) == 0x + || (INTVAL (operands[2]) < 0 + && const_ok_for_arm (-INTVAL (operands[2] + && can_create_pseudo_p ()) + { + operands[2] = force_reg (SImode, operands[2]); + emit_insn (gen_andsi3 (operands[0], operands[1], operands[2])); + } else arm_split_constant (AND, SImode, NULL_RTX, INTVAL (operands[2]), operands[0], @@ -2882,9 +2900,20 @@ { if (TARGET_32BIT) { - arm_split_constant (IOR, SImode, NULL_RTX, -
[PING ^ 2] [PATCH, ARM] Set max_insns_skipped to MAX_INSN_PER_IT_BLOCK when optimize_size for THUMB2
Ping ^ 2? Thanks! -Zhenqiang > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Zhenqiang Chen > Sent: Tuesday, April 29, 2014 5:38 PM > To: gcc-patches@gcc.gnu.org > Cc: Ramana Radhakrishnan > Subject: [PING] [PATCH, ARM] Set max_insns_skipped to > MAX_INSN_PER_IT_BLOCK when optimize_size for THUMB2 > > Ping? OK for trunk? > > Thanks! > -Zhenqiang > > > -Original Message- > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > > ow...@gcc.gnu.org] On Behalf Of Zhenqiang Chen > > Sent: Tuesday, February 25, 2014 5:35 PM > > To: gcc-patches@gcc.gnu.org > > Cc: Ramana Radhakrishnan > > Subject: [PATCH, ARM] Set max_insns_skipped to > MAX_INSN_PER_IT_BLOCK > > when optimize_size for THUMB2 > > > > Hi, > > > > Current value for max_insns_skipped is 6. For THUMB2, it needs 2 > > (IF-THEN) or 3 (IF-THEN-ELSE) IT blocks to hold all the instructions. > > The overhead > of IT is > > 4 or 6 BYTES. > > > > If we do not generate IT blocks, for IF-THEN, the overhead of > > conditional jump is 2 or 4; for IF-THEN-ELSE, the overhead is 4, 6, or 8. > > > > Most THUMB2 jump instructions are 2 BYTES. Tests on CSiBE show no one > > file has code size regression. So The patch sets max_insns_skipped to > > MAX_INSN_PER_IT_BLOCK. > > > > No make check regression on cortex-m3. > > For CSiBE, no any file has code size regression. And overall there is > >0.01% code size improvement for cortex-a9 and cortex-m4. > > > > Is it OK? > > > > Thanks! > > -Zhenqiang > > > > 2014-02-25 Zhenqiang Chen > > > > * config/arm/arm.c (arm_option_override): Set max_insns_skipped > > to MAX_INSN_PER_IT_BLOCK when optimize_size for THUMB2. > > > > testsuite/ChangeLog: > > 2014-02-25 Zhenqiang Chen > > > > * gcc.target/arm/max-insns-skipped.c: New test. > > > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index > > b49f43e..99cdbc4 100644 > > --- a/gcc/config/arm/arm.c > > +++ b/gcc/config/arm/arm.c > > @@ -2743,6 +2743,15 @@ arm_option_override (void) > >/* If optimizing for size, bump the number of instructions that we > > are prepared to conditionally execute (even on a StrongARM). */ > >max_insns_skipped = 6; > > + > > + /* For THUMB2, it needs 2 (IF-THEN) or 3 (IF-THEN-ELSE) IT > > + blocks > to > > +hold all the instructions. The overhead of IT is 4 or 6 BYTES. > > +If we do not generate IT blocks, for IF-THEN, the overhead of > > +conditional jump is 2 or 4; for IF-THEN-ELSE, the overhead is 4, 6 > > +or 8. Most THUMB2 jump instructions are 2 BYTES. > > +So set max_insns_skipped to MAX_INSN_PER_IT_BLOCK. */ > > + if (TARGET_THUMB2) > > + max_insns_skipped = MAX_INSN_PER_IT_BLOCK; > > } > >else > > max_insns_skipped = current_tune->max_insns_skipped; diff --git > > a/gcc/testsuite/gcc.target/arm/max-insns-skipped.c > > b/gcc/testsuite/gcc.target/arm/max-insns-skipped.c > > new file mode 100644 > > index 000..0a11554 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/arm/max-insns-skipped.c > > @@ -0,0 +1,21 @@ > > +/* { dg-do assemble { target arm_thumb2 } } */ > > +/* { dg-options " -Os " } */ > > + > > +int t (int a, int b, int c, int d) > > +{ > > + int r; > > + if (a > 0) { > > +r = a + b; > > +r += 0x456; > > +r *= 0x1234567; > > +} > > + else { > > +r = b - a; > > +r -= 0x123; > > +r *= 0x12387; > > +r += d; > > + } > > + return r; > > +} > > + > > +/* { dg-final { object-size text <= 40 } } */ > > > > > > > > > >
Re: [PATCH, ARM] Keep constants in register when expanding
On 11 August 2014 19:14, Ramana Radhakrishnan wrote: > On Mon, Aug 11, 2014 at 3:35 AM, Zhenqiang Chen > wrote: >> On 8 August 2014 23:22, Ramana Radhakrishnan >> wrote: >>> On Tue, Aug 5, 2014 at 10:31 AM, Zhenqiang Chen >>> wrote: >>>> Hi, >>>> >>>> For some large constants, ARM will split them during expanding, which >>>> makes impossible to hoist them out the loop or shared by different >>>> references (refer the test case in the patch). >>>> >>>> The patch keeps some constants in registers. If the constant can not >>>> be optimized, the cprop and combine passes can optimize them as what >>>> we do in current expand pass with >>>> >>>> define_insn_and_split "*arm_subsi3_insn" >>>> define_insn_and_split "*arm_andsi3_insn" >>>> define_insn_and_split "*iorsi3_insn" >>>> define_insn_and_split "*arm_xorsi3" >>>> >>>> The patch does not modify addsi3 since the define_insn_and_split >>>> "*arm_addsi3" is only valid when (reload_completed || >>>> !arm_eliminable_register (operands[1])). The cprop and combine passes >>>> can not optimize the large constant if we put it in register, which >>>> will lead to regression. >>>> >>>> For logic operators, the patch skips changes for constants: >>>> >>>> INTVAL (operands[2]) < 0 && const_ok_for_arm (-INTVAL (operands[2]) >>>> >>>> since expand pass always uses "sign-extend" to get the value >>>> (trunc_int_for_mode called from immed_wide_int_const) for rtl, and >>>> logs show most negative values are UNSIGNED when they are TREE node. >>>> And combine pass is smart enough to recover the negative value to >>>> positive value. >>> >>> I am unable to verify any change in code generation for this testcase >>> with and without the patch when I had a play with the patch. >>> >>> what gives ? >> >> Thanks for trying the patch. >> >> Do you add option -fno-gcse which is mentioned in dg-options " -O2 >> -fno-gcse "? Without it, there is no change for the testcase since >> cprop pass will propagate the constant to AND expr (A patch to enhance >> cprop pass was discussed at >> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01321.html). > > Probably not and I can now see the difference in code generated for > Thumb state. Why is it that in ARM state with -mcpu=cortex-a15 we see > the hoisting of the constant without your patch with -fno-gcse ? The difference between ARM and THUMB2 modes are due to rtx_cost difference. For ARM mode, the constant is force_reg in function avoid_expensive_constant (obtabs.c) before gen_andsi3 when expanding. > So, the patch improves code generation for -mcpu=cortex-a15 -mthumb > -fno-gcse for the given testcase ? Yes. >> >> In addition, if the constant can not be hoisted out the loop, later >> combine pass can also optimize it. These (cprop and combine) are >> reasons why the patch itself has little impact on current tests. > > Does this mean you need the referred to patch to be useful as a > pre-requisite ? I fail to understand why this patch needs to go in if > it makes no difference without disabling GCSE. I cannot see -fno-gcse > being used by default for performant code. For some codes, -fno-gcse might get better performance. Please refer paper: A case study: optimizing GCC on ARM for performance of libevas rasterization library http://ctuning.org/dissemination/grow10-03.pdf The issues mentioned in the paper had been solved since arm_split_constant is smart enough to handle the 0xff00ff. But what for other irregular constant? The patch gives a chance to handle them. Thanks! -Zhenqiang > regards > Ramana
Re: [PATCH, ARM] Keep constants in register when expanding
On 12 August 2014 17:24, Richard Earnshaw wrote: > On 05/08/14 10:31, Zhenqiang Chen wrote: >> Hi, >> >> For some large constants, ARM will split them during expanding, which >> makes impossible to hoist them out the loop or shared by different >> references (refer the test case in the patch). >> >> The patch keeps some constants in registers. If the constant can not >> be optimized, the cprop and combine passes can optimize them as what >> we do in current expand pass with >> >> define_insn_and_split "*arm_subsi3_insn" >> define_insn_and_split "*arm_andsi3_insn" >> define_insn_and_split "*iorsi3_insn" >> define_insn_and_split "*arm_xorsi3" >> >> The patch does not modify addsi3 since the define_insn_and_split >> "*arm_addsi3" is only valid when (reload_completed || >> !arm_eliminable_register (operands[1])). The cprop and combine passes >> can not optimize the large constant if we put it in register, which >> will lead to regression. >> >> For logic operators, the patch skips changes for constants: >> >> INTVAL (operands[2]) < 0 && const_ok_for_arm (-INTVAL (operands[2]) >> >> since expand pass always uses "sign-extend" to get the value >> (trunc_int_for_mode called from immed_wide_int_const) for rtl, and >> logs show most negative values are UNSIGNED when they are TREE node. >> And combine pass is smart enough to recover the negative value to >> positive value. >> >> Bootstrap and no make check regression on Chrome book. >> For coremark, dhrystone and eembcv1, no any code size and performance >> change on Cortex-M4. >> No any file in CSiBE has code size change for Cortex-A15 and Cortex-M4. >> No Spec2000 performance regression on Chrome book and dumped assemble >> codes only show very few difference. >> >> OK for trunk? >> > > Not yet. > > I think the principle is sound, but there are some issues that need > sorting out. > > 1) We shouldn't do this when not optimizing; there's no advantage to > hoisting out the constants in that case. Indeed, it may not be > worth-while at -O1 either. Updated. Only apply it when optimize >= 2 > 2) I think you should be using const_ok_for_op rather than working > through all the permitted values. If that function isn't doing the > right thing, then it probably needs extending to handle those cases as well. There are some difference. I want to filter cases for which arm_split_constant will get better result. Patch is updated to add a function const_ok_for_split, in which (i < 0) && const_ok_for_arm (-i) and 0x can not fit into const_ok_for_op. > 3) Why haven't you handled addsi3? As I had mentioned in the original mail, since the define_insn_and_split "*arm_addsi3" is only valid when (reload_completed || !arm_eliminable_register (operands[1])). The cprop and combine passes can not optimize the large constant if we put it in register (and it is not hoisted out of the loop), which will lead to regression. Any reason for *arm_addsi3 with additional condition (reload_completed || !arm_eliminable_register (operands[1]))? Without it, addsi3 can be handled as subsi3. > See below for some specific comments. All are updated. Thanks! -Zhenqiang >> >> ChangeLog: >> 2014-08-05 Zhenqiang Chen >> >> * config/arm/arm.md (subsi3, andsi3, iorsi3, xorsi3): Keep some >> large constants in register other than split them. >> >> testsuite/ChangeLog: >> 2014-08-05 Zhenqiang Chen >> >> * gcc.target/arm/maskdata.c: New test. 
>> >> >> skip-split.patch >> >> >> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md >> index bd8ea8f..c8b3001 100644 >> --- a/gcc/config/arm/arm.md >> +++ b/gcc/config/arm/arm.md >> @@ -1162,9 +1162,16 @@ >> { >>if (TARGET_32BIT) >> { >> - arm_split_constant (MINUS, SImode, NULL_RTX, >> - INTVAL (operands[1]), operands[0], >> - operands[2], optimize && can_create_pseudo_p ()); >> + if (!const_ok_for_arm (INTVAL (operands[1])) >> + && can_create_pseudo_p ()) >> + { >> + operands[1] = force_reg (SImode, operands[1]); >> + emit_insn (gen_subsi3 (operands[0], operands[1], operands[2])); > > The final emit_insn shouldn't be be needed. Once you've forced > operands[1] into a register, you can just fall out the bottom of the > pattern and let the default expansion t
[PATCH, ira] Miss checks in split_live_ranges_for_shrink_wrap
Hi, Function split_live_ranges_for_shrink_wrap has code if (!flag_shrink_wrap) return false; But flag_shrink_wrap is TRUE by default when optimize > 0 even if the port does not support shrink-wrap. To make sure shrink-wrap is enabled, "HAVE_simple_return" must be defined and "HAVE_simple_return" must be TRUE. Please refer function.c and shrink-wrap.c on how shrink-wrap is enabled in thread_prologue_and_epilogue_insns. To make the check easy, the patch defines a MICRO: SUPPORT_SHRINK_WRAP_P and replace the uses in ira.c and ifcvt.c Bootstrap and no make check regression on X86-64. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-08-14 Zhenqiang Chen * shrink-wrap.h: #define SUPPORT_SHRINK_WRAP_P. * ira.c: #include "shrink-wrap.h" (split_live_ranges_for_shrink_wrap): Use SUPPORT_SHRINK_WRAP_P. * ifcvt.c: #include "shrink-wrap.h" (dead_or_predicable): Use SUPPORT_SHRINK_WRAP_P. testsuite/ChangeLog: 2014-08-14 Zhenqiang Chen * gcc.target/arm/split-live-ranges-for-shrink-wrap.c: New test. diff --git a/gcc/shrink-wrap.h b/gcc/shrink-wrap.h index bccfb31..31ce2d4 100644 --- a/gcc/shrink-wrap.h +++ b/gcc/shrink-wrap.h @@ -45,6 +45,9 @@ extern edge get_unconverted_simple_return (edge, bitmap_head, extern void convert_to_simple_return (edge entry_edge, edge orig_entry_edge, bitmap_head bb_flags, rtx returnjump, vec unconverted_simple_returns); +#define SUPPORT_SHRINK_WRAP_P (flag_shrink_wrap && HAVE_simple_return) +#else +#define SUPPORT_SHRINK_WRAP_P false #endif #endif /* GCC_SHRINK_WRAP_H */ diff --git a/gcc/ira.c b/gcc/ira.c index ccc6c79..8d58b60 100644 --- a/gcc/ira.c +++ b/gcc/ira.c @@ -392,6 +392,7 @@ along with GCC; see the file COPYING3. If not see #include "lra.h" #include "dce.h" #include "dbgcnt.h" +#include "shrink-wrap.h" struct target_ira default_target_ira; struct target_ira_int default_target_ira_int; @@ -4775,7 +4776,7 @@ split_live_ranges_for_shrink_wrap (void) bitmap_head need_new, reachable; vec queue; - if (!flag_shrink_wrap) + if (!SUPPORT_SHRINK_WRAP_P) return false; bitmap_initialize (&need_new, 0); diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index e44c1dc..c44e1c2 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -42,6 +42,7 @@ #include "df.h" #include "vec.h" #include "dbgcnt.h" +#include "shrink-wrap.h" #ifndef HAVE_conditional_move #define HAVE_conditional_move 0 @@ -4239,14 +4240,13 @@ dead_or_predicable (basic_block test_bb, basic_block merge_bb, if (NONDEBUG_INSN_P (insn)) df_simulate_find_defs (insn, merge_set); -#ifdef HAVE_simple_return /* If shrink-wrapping, disable this optimization when test_bb is the first basic block and merge_bb exits. The idea is to not move code setting up a return register as that may clobber a register used to pass function parameters, which then must be saved in caller-saved regs. A caller-saved reg requires the prologue, killing a shrink-wrap opportunity. */ - if ((flag_shrink_wrap && HAVE_simple_return && !epilogue_completed) + if ((SUPPORT_SHRINK_WRAP_P && !epilogue_completed) && ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb == test_bb && single_succ_p (new_dest) && single_succ (new_dest) == EXIT_BLOCK_PTR_FOR_FN (cfun) @@ -4293,7 +4293,6 @@ dead_or_predicable (basic_block test_bb, basic_block merge_bb, } BITMAP_FREE (return_regs); } -#endif } no_body:
[Committed] [PATCH, ARM] Set max_insns_skipped to MAX_INSN_PER_IT_BLOCK when optimize_size for THUMB2
On 13 August 2014 20:29, Richard Earnshaw wrote: > On 25/02/14 09:34, Zhenqiang Chen wrote: >> Hi, >> >> Current value for max_insns_skipped is 6. For THUMB2, it needs 2 (IF-THEN) >> or 3 (IF-THEN-ELSE) IT blocks to hold all the instructions. The overhead of >> IT is 4 or 6 BYTES. >> >> If we do not generate IT blocks, for IF-THEN, the overhead of conditional >> jump is 2 or 4; for IF-THEN-ELSE, the overhead is 4, 6, or 8. >> >> Most THUMB2 jump instructions are 2 BYTES. Tests on CSiBE show no one file >> has code size regression. So The patch sets max_insns_skipped to >> MAX_INSN_PER_IT_BLOCK. >> >> No make check regression on cortex-m3. >> For CSiBE, no any file has code size regression. And overall there is >0.01% >> code size improvement for cortex-a9 and cortex-m4. >> >> Is it OK? >> >> Thanks! >> -Zhenqiang >> >> 2014-02-25 Zhenqiang Chen >> >> * config/arm/arm.c (arm_option_override): Set max_insns_skipped >> to MAX_INSN_PER_IT_BLOCK when optimize_size for THUMB2. >> >> testsuite/ChangeLog: >> 2014-02-25 Zhenqiang Chen >> >> * gcc.target/arm/max-insns-skipped.c: New test. >> >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >> index b49f43e..99cdbc4 100644 >> --- a/gcc/config/arm/arm.c >> +++ b/gcc/config/arm/arm.c >> @@ -2743,6 +2743,15 @@ arm_option_override (void) >>/* If optimizing for size, bump the number of instructions that we >> are prepared to conditionally execute (even on a StrongARM). */ >>max_insns_skipped = 6; >> + >> + /* For THUMB2, it needs 2 (IF-THEN) or 3 (IF-THEN-ELSE) IT blocks to >> + hold all the instructions. The overhead of IT is 4 or 6 BYTES. >> + If we do not generate IT blocks, for IF-THEN, the overhead of >> + conditional jump is 2 or 4; for IF-THEN-ELSE, the overhead is 4, 6 >> + or 8. Most THUMB2 jump instructions are 2 BYTES. >> + So set max_insns_skipped to MAX_INSN_PER_IT_BLOCK. */ >> + if (TARGET_THUMB2) >> + max_insns_skipped = MAX_INSN_PER_IT_BLOCK; > > Replacing a single 2-byte branch with a 2-byte IT insn doesn't save any > space in itself. > > Pedantically, we save space with IT blocks if either: > a) we can replace an else clause (saving a branch around that) > b) we can use non-flag setting versions of insns to replace what would > otherwise be 4-byte insns with 2-byte versions. > > I agree that multiple IT instructions is probably not a win (the > cond-exec code doesn't know how to reason about the finer points of item > b) and doesn't even really consider a) either). > > So this is OK, but I think the comment should be simplified. Just say, > for thumb2 we limit the conditional sequence to one IT block. > > OK with that change. Thanks! Commit the patch with the comment change @r213939. ChangeLog: 2014-08-14 Zhenqiang Chen * config/arm/arm.c (arm_option_override): Set max_insns_skipped to MAX_INSN_PER_IT_BLOCK when optimize_size for THUMB2. testsuite/ChangeLog: 2014-08-14 Zhenqiang Chen * gcc.target/arm/max-insns-skipped.c: New test. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 7f62ca4..2f8d327 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -2989,6 +2989,10 @@ arm_option_override (void) /* If optimizing for size, bump the number of instructions that we are prepared to conditionally execute (even on a StrongARM). */ max_insns_skipped = 6; + + /* For THUMB2, we limit the conditional sequence to one IT block. 
*/ + if (TARGET_THUMB2) + max_insns_skipped = MAX_INSN_PER_IT_BLOCK; } else max_insns_skipped = current_tune->max_insns_skipped; diff --git a/gcc/testsuite/gcc.target/arm/max-insns-skipped.c b/gcc/testsuite/gcc.target/arm/max-insns-skipped.c new file mode 100644 index 000..0a11554 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/max-insns-skipped.c @@ -0,0 +1,21 @@ +/* { dg-do assemble { target arm_thumb2 } } */ +/* { dg-options " -Os " } */ + +int t (int a, int b, int c, int d) +{ + int r; + if (a > 0) { +r = a + b; +r += 0x456; +r *= 0x1234567; +} + else { +r = b - a; +r -= 0x123; +r *= 0x12387; +r += d; + } + return r; +} + +/* { dg-final { object-size text <= 40 } } */ > >> } >>else >> max_insns_skipped = current_tune->max_insns_skipped; >> diff --git a/gcc/testsuite/gcc.target/arm/max-insns-skipped.c >> b/gcc/testsuite/gcc.target/arm/max-insns-skipped.c >> new file mode 100644 >> index 000..0a11554 >> --- /dev/null >> +++ b/
RE: [PATCH, ira] Miss checks in split_live_ranges_for_shrink_wrap
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Jeff Law > Sent: Saturday, August 30, 2014 4:54 AM > To: Zhenqiang Chen; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH, ira] Miss checks in split_live_ranges_for_shrink_wrap > > On 08/13/14 20:55, Zhenqiang Chen wrote: > > Hi, > > > > Function split_live_ranges_for_shrink_wrap has code > > > >if (!flag_shrink_wrap) > > return false; > > > > But flag_shrink_wrap is TRUE by default when optimize > 0 even if the > > port does not support shrink-wrap. To make sure shrink-wrap is > > enabled, "HAVE_simple_return" must be defined and > "HAVE_simple_return" > > must be TRUE. > > > > Please refer function.c and shrink-wrap.c on how shrink-wrap is > > enabled in thread_prologue_and_epilogue_insns. > > > > To make the check easy, the patch defines a MICRO: > > SUPPORT_SHRINK_WRAP_P and replace the uses in ira.c and ifcvt.c > > > > Bootstrap and no make check regression on X86-64. > > > > OK for trunk? > > > > Thanks! > > -Zhenqiang > > > > ChangeLog: > > 2014-08-14 Zhenqiang Chen > > > > * shrink-wrap.h: #define SUPPORT_SHRINK_WRAP_P. > > * ira.c: #include "shrink-wrap.h" > > (split_live_ranges_for_shrink_wrap): Use SUPPORT_SHRINK_WRAP_P. > > * ifcvt.c: #include "shrink-wrap.h" > > (dead_or_predicable): Use SUPPORT_SHRINK_WRAP_P. > So what's the motivation behind this patch? I can probably guess the > motivation, but I might guess wrong. Since you know the motivation, it's > best if you just tell everyone what it is. To split live-range of register, split_live_ranges_for_shrink_wrap will introduce additional register copies. If such copies can not be optimized by later optimizations, it will lead to code size and performance regression. My tests on ARM THUMB1 code size show lots of regressions due to additional register copies. Shrink-wrap is not enabled for ARM THUMB1, so I think split_live_ranges_for_shrink_wrap should not be called. > > > > testsuite/ChangeLog: > > 2014-08-14 Zhenqiang Chen > > > > * gcc.target/arm/split-live-ranges-for-shrink-wrap.c: New test. > Testcase wasn't included in the patchkit. > > From a pure bikeshedding standpoint "SUPPORT_SHRINK_WRAP_P" seems > poorly named. SHRINK_WRAPPING_ENABLED seems like a better name to > me. > > Can you repost with the testcase included, name change and basic > rationale behind why you want to make this change. I'm pretty sure > it'll be OK at that point. Thanks. Patch is updated according to your comments. -Zhenqiang ChangeLog: 2014-09-01 Zhenqiang Chen * shrink-wrap.h: #define SHRINK_WRAPPING_ENABLED. * ira.c: #include "shrink-wrap.h" (split_live_ranges_for_shrink_wrap): Use SHRINK_WRAPPING_ENABLED. * ifcvt.c: #include "shrink-wrap.h" (dead_or_predicable): Use SHRINK_WRAPPING_ENABLED. testsuite/ChangeLog: 2014-09-01 Zhenqiang Chen * gcc.target/arm/split-live-ranges-for-shrink-wrap.c: New test. diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 94b96f3..d2af0f9 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -42,6 +42,7 @@ #include "df.h" #include "vec.h" #include "dbgcnt.h" +#include "shrink-wrap.h" #ifndef HAVE_conditional_move #define HAVE_conditional_move 0 @@ -4287,14 +4288,13 @@ dead_or_predicable (basic_block test_bb, basic_block merge_bb, if (NONDEBUG_INSN_P (insn)) df_simulate_find_defs (insn, merge_set); -#ifdef HAVE_simple_return /* If shrink-wrapping, disable this optimization when test_bb is the first basic block and merge_bb exits. 
The idea is to not move code setting up a return register as that may clobber a register used to pass function parameters, which then must be saved in caller-saved regs. A caller-saved reg requires the prologue, killing a shrink-wrap opportunity. */ - if ((flag_shrink_wrap && HAVE_simple_return && !epilogue_completed) + if ((SHRINK_WRAPPING_ENABLED && !epilogue_completed) && ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb == test_bb && single_succ_p (new_dest) && single_succ (new_dest) == EXIT_BLOCK_PTR_FOR_FN (cfun) @@ -4341,7 +4341,6 @@ dead_or_predicable (basic_block test_bb, basic_block merge_bb, } BITMAP_FREE (return_regs); } -#endif } no_body: diff --git a/gcc/ira.c b/gcc/ira.c index 7c18496..f4140e4 100644 --- a/gcc/ira.c +++ b/gcc/ira.
RE: [PATCH 1/n] Add conditional compare support
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Tuesday, November 05, 2013 4:39 AM > To: Zhenqiang Chen > Cc: Richard Earnshaw; 'Richard Biener'; GCC Patches > Subject: Re: [PATCH 1/n] Add conditional compare support > > On 11/04/2013 08:00 PM, Zhenqiang Chen wrote: > > Thanks. I add a new hook. The default function will return -1 if the > > target does not care about the order. > > > > +DEFHOOK > > +(select_ccmp_cmp_order, > > + "For some target (like ARM), the order of two compares is sensitive > > +for\n\ conditional compare. cmp0-cmp1 might be an invalid > > +combination. But when\n\ swapping the order, cmp1-cmp0 is valid. > > +The function will return\n\ > > + -1: if @code{code1} and @code{code2} are valid combination.\n\ > > + 1: if @code{code2} and @code{code1} are valid combination.\n\ > > + 0: both are invalid.", > > + int, (int code1, int code2), > > + default_select_ccmp_cmp_order) > > Fair enough. I'd originally been thinking that returning a tri-state value > akin > to the comparison callback to qsort would allow easy sorting of a whole list > of > comparisons. But probably just as easy to open-code while checking for > invalid combinations. > > Checking for invalid while sorting means that we can then disallow returning > NULL from the other two hooks. Because the backend has already had a > chance to indicate failure. The check is only for the first two compares. And the following compares are not checked. In addition, backend might check more staffs (e.g. arm_select_dominance_cc_mode) to generate a valid compare instruction. > > For gen_ccmp_next, I add another parameter CC_P to indicate the result > > is used as CC or not. If CC_P is false, the gen_ccmp_next will return > > a general register. This is for code like > > > > int test (int a, int b) > > { > > return a > 0 && b > 0; > > } > > During expand, there might have no branch at all. So gen_ccmp_next can > > not return CC for "a > 0 && b > 0". > > Uh, no, this is a terrible idea. There's no need for gen_ccmp_next to re-do > the work of cstore_optab. > > I believe you can use emit_store_flag as a high-level interface here, since > there are technically vagaries due to STORE_FLAG_VALUE. If that turns out > to crash or fail in some way, we can talk about using cstore_optab directly > given some restrictions. emit_store_flag does too much checks. I use cstore_optab to emit the insn. + icode = optab_handler (cstore_optab, CCmode); + if (icode != CODE_FOR_nothing) + { + rtx target = gen_reg_rtx (word_mode); + tmp = emit_cstore (target, icode, NE, CCmode, CCmode, +0, tmp, const0_rtx, 1, word_mode); + if (tmp) + return tmp; + } > It also means that you shouldn't need all of and_scc_scc, ior_scc_scc, > ccmp_and_scc_scc, ccmp_ior_scc_scc. Yes. We only need ccmp_and and ccmp_ior now. I will verify to remove the existing and_scc_scc, ior_scc_scc, and_scc_scc_cmp, ior_scc_scc_cmp once conditional compare is enabled. > Although I don't see cstorecc4 defined for ARM, so there is something > missing. cstorecc4 is added. > > +static int > > +arm_select_ccmp_cmp_order (int cond1, int cond2) { > > + if (cond1 == cond2) > > +return -1; > > + if (comparison_dominates_p ((enum rtx_code) cond1, (enum rtx_code) > cond2)) > > +return 1; > > + if (comparison_dominates_p ((enum rtx_code) cond2, (enum rtx_code) > cond1)) > > +return -1; > > + return 0; > > + > > +} > > This sort does not look stable. 
In particular, > > if (cond1 == cond2) > return 1; > > would seem to better preserve the original order of the comparisons. -1 is to keep the original order. Anyway I change the function as: +/* COND1 and COND2 should be enum rtx_code, which represent two compares. + There are order sensitive for conditional compare. It returns + 1: Keep current order. + -1: Swap the two compares. + 0: Invalid combination. */ + +static int +arm_select_ccmp_cmp_order (int cond1, int cond2) +{ + /* THUMB1 does not support conditional compare. */ + if (TARGET_THUMB1) +return 0; + + if (cond1 == cond2) +return 1; + if (comparison_dominates_p ((enum rtx_code) cond1, (enum rtx_code) cond2)) +return -1; + if (comparison_dominates_p ((enum rtx_code) cond2, (enum rtx_code) cond1)) +return 1; + + return 0; +} Thanks! -Zhenqiang ccmp-hook4.patch Description: Binary data
[PATCH, ARM, testcase] Skip target arm-neon for lp1243022.c
Hi, lp1243022.c will fail with options: -mfpu=neon -mfloat-abi=hard. Logs show it does not generate auto-incremental instruction in pass auto_inc_dec. In this case, the check of REG_INC note at subreg2 will be invalid. So skip the check for target arm-neon. All PASS with the following options: -mthumb/-mcpu=cortex-a9/-mfloat-abi=hard -mthumb/-mcpu=cortex-a9/-mfloat-abi=soft -mthumb/-mcpu=cortex-a9/-mfloat-abi=softfp -mthumb/-mcpu=cortex-a9/-mfloat-abi=soft/-mfpu=vfpv3 -mthumb/-mcpu=cortex-a9/-mfloat-abi=softfp/-mfpu=vfpv3 -mthumb/-mcpu=cortex-a9/-mfloat-abi=hard/-mfpu=vfpv3 -mthumb/-mcpu=cortex-a9/-mfloat-abi=soft/-mfpu=neon -mthumb/-mcpu=cortex-a9/-mfloat-abi=softfp/-mfpu=neon -mthumb/-mcpu=cortex-a9/-mfloat-abi=hard/-mfpu=neon -mthumb/-mcpu=cortex-a15/-mfloat-abi=hard -mthumb/-mcpu=cortex-a15/-mfloat-abi=soft -mthumb/-mcpu=cortex-a15/-mfloat-abi=softfp -mthumb/-mcpu=cortex-a15/-mfloat-abi=soft/-mfpu=vfpv4 -mthumb/-mcpu=cortex-a15/-mfloat-abi=softfp/-mfpu=vfpv4 -mthumb/-mcpu=cortex-a15/-mfloat-abi=hard/-mfpu=vfpv4 -mthumb/-mcpu=cortex-a15/-mfloat-abi=soft/-mfpu=neon -mthumb/-mcpu=cortex-a15/-mfloat-abi=softfp/-mfpu=neon -mthumb/-mcpu=cortex-a15/-mfloat-abi=hard/-mfpu=neon Is it OK? Thanks! -Zhenqiang testsuite/ChangeLog: 2013-11-08 Zhenqiang Chen * gcc.target/arm/lp1243022.c: Skip target arm-neon. --- a/gcc/testsuite/gcc.target/arm/lp1243022.c +++ b/gcc/testsuite/gcc.target/arm/lp1243022.c @@ -1,7 +1,7 @@ /* { dg-do compile { target arm_thumb2 } } */ /* { dg-options "-O2 -fdump-rtl-subreg2" } */ -/* { dg-final { scan-rtl-dump "REG_INC" "subreg2" } } */ +/* { dg-final { scan-rtl-dump "REG_INC" "subreg2" { target { ! arm_neon} } } } */ /* { dg-final { cleanup-rtl-dump "subreg2" } } */ struct device; typedef unsigned int __u32;
[PATCH, 4.8] Backport patches to fix parallel build fail PR 57683
Hi, The issue files in PR 57683 include lra-constraints.o, lra-eliminations.o and tree-switch-conversion.o. Jeff had fixed them in trunk (r197467 and r198999): http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00957.html http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00211.html Can we backport the two patches to fix the parallel build issues in 4.8? Thanks! -Zhenqiang
[PING] [PATCH 1/n] Add conditional compare support
Ping? Thanks! -Zhenqiang > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Zhenqiang Chen > Sent: Wednesday, November 06, 2013 3:39 PM > To: 'Richard Henderson' > Cc: Richard Earnshaw; 'Richard Biener'; GCC Patches > Subject: RE: [PATCH 1/n] Add conditional compare support > > > > -Original Message- > > From: Richard Henderson [mailto:r...@redhat.com] > > Sent: Tuesday, November 05, 2013 4:39 AM > > To: Zhenqiang Chen > > Cc: Richard Earnshaw; 'Richard Biener'; GCC Patches > > Subject: Re: [PATCH 1/n] Add conditional compare support > > > > On 11/04/2013 08:00 PM, Zhenqiang Chen wrote: > > > Thanks. I add a new hook. The default function will return -1 if the > > > target does not care about the order. > > > > > > +DEFHOOK > > > +(select_ccmp_cmp_order, > > > + "For some target (like ARM), the order of two compares is > > > +sensitive for\n\ conditional compare. cmp0-cmp1 might be an > > > +invalid combination. But when\n\ swapping the order, cmp1-cmp0 is > valid. > > > +The function will return\n\ > > > + -1: if @code{code1} and @code{code2} are valid combination.\n\ > > > + 1: if @code{code2} and @code{code1} are valid combination.\n\ > > > + 0: both are invalid.", > > > + int, (int code1, int code2), > > > + default_select_ccmp_cmp_order) > > > > Fair enough. I'd originally been thinking that returning a tri-state > > value akin to the comparison callback to qsort would allow easy > > sorting of a whole list of comparisons. But probably just as easy to > > open-code while checking for invalid combinations. > > > > Checking for invalid while sorting means that we can then disallow > > returning NULL from the other two hooks. Because the backend has > > already had a chance to indicate failure. > > The check is only for the first two compares. And the following compares are > not checked. In addition, backend might check more staffs (e.g. > arm_select_dominance_cc_mode) to generate a valid compare instruction. > > > > For gen_ccmp_next, I add another parameter CC_P to indicate the > > > result is used as CC or not. If CC_P is false, the gen_ccmp_next > > > will return a general register. This is for code like > > > > > > int test (int a, int b) > > > { > > > return a > 0 && b > 0; > > > } > > > During expand, there might have no branch at all. So gen_ccmp_next > > > can not return CC for "a > 0 && b > 0". > > > > Uh, no, this is a terrible idea. There's no need for gen_ccmp_next to > > re-do the work of cstore_optab. > > > > I believe you can use emit_store_flag as a high-level interface here, > > since there are technically vagaries due to STORE_FLAG_VALUE. If that > > turns out to crash or fail in some way, we can talk about using > > cstore_optab directly given some restrictions. > > emit_store_flag does too much checks. I use cstore_optab to emit the insn. > > + icode = optab_handler (cstore_optab, CCmode); > + if (icode != CODE_FOR_nothing) > + { > + rtx target = gen_reg_rtx (word_mode); > + tmp = emit_cstore (target, icode, NE, CCmode, CCmode, > + 0, tmp, const0_rtx, 1, word_mode); > + if (tmp) > + return tmp; > + } > > > It also means that you shouldn't need all of and_scc_scc, ior_scc_scc, > > ccmp_and_scc_scc, ccmp_ior_scc_scc. > > Yes. We only need ccmp_and and ccmp_ior now. > > I will verify to remove the existing and_scc_scc, ior_scc_scc, > and_scc_scc_cmp, ior_scc_scc_cmp once conditional compare is enabled. > > > Although I don't see cstorecc4 defined for ARM, so there is something > > missing. 
> > cstorecc4 is added. > > > > +static int > > > +arm_select_ccmp_cmp_order (int cond1, int cond2) { > > > + if (cond1 == cond2) > > > +return -1; > > > + if (comparison_dominates_p ((enum rtx_code) cond1, (enum > > > +rtx_code) > > cond2)) > > > +return 1; > > > + if (comparison_dominates_p ((enum rtx_code) cond2, (enum > > > + rtx_code) > > cond1)) > > > +return -1; > > > + return 0; > > > + > > > +} > > > > This sort does not look stable. In particular, > > > > if (cond1 == cond2) > > return 1; > > > > would seem to better preserve the origi
[PATCH] Clean up duplicated function seq_cost
Hi, The are two implementations of seq_cost. The function bodies are exactly the same. The patch removes one of them and make the other global. Bootstrap and no make check regression on X86-64. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-10-09 Zhenqiang Chen * cfgloopanal.c (seq_cost): Make it global. * rtl.h (seq_cost): New prototype. * tree-ssa-loop-ivopts.c (seq_cost): Delete. diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c index 7ea1a5f..dd37aa0 100644 --- a/gcc/cfgloopanal.c +++ b/gcc/cfgloopanal.c @@ -304,7 +304,7 @@ get_loop_level (const struct loop *loop) /* Returns estimate on cost of computing SEQ. */ -static unsigned +unsigned seq_cost (const rtx_insn *seq, bool speed) { unsigned cost = 0; diff --git a/gcc/rtl.h b/gcc/rtl.h index e73f731..b697417 100644 --- a/gcc/rtl.h +++ b/gcc/rtl.h @@ -2921,6 +2921,7 @@ extern rtx_insn *find_first_parameter_load (rtx_insn *, rtx_insn *); extern bool keep_with_call_p (const rtx_insn *); extern bool label_is_jump_target_p (const_rtx, const rtx_insn *); extern int insn_rtx_cost (rtx, bool); +extern unsigned seq_cost (const rtx_insn *, bool); /* Given an insn and condition, return a canonical description of the test being made. */ diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index 400798a..087ca26 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -2842,26 +2842,6 @@ get_use_iv_cost (struct ivopts_data *data, struct iv_use *use, return NULL; } -/* Returns estimate on cost of computing SEQ. */ - -static unsigned -seq_cost (rtx_insn *seq, bool speed) -{ - unsigned cost = 0; - rtx set; - - for (; seq; seq = NEXT_INSN (seq)) -{ - set = single_set (seq); - if (set) - cost += set_src_cost (SET_SRC (set), speed); - else - cost++; -} - - return cost; -} - /* Produce DECL_RTL for object obj so it looks like it is stored in memory. */ static rtx produce_memory_decl_rtl (tree obj, int *regno)
RE: [PATCH] Clean up duplicated function seq_cost
> -Original Message- > From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Thursday, October 09, 2014 5:21 PM > To: Zhenqiang Chen > Cc: GCC Patches > Subject: Re: [PATCH] Clean up duplicated function seq_cost > > On Thu, Oct 9, 2014 at 11:20 AM, Richard Biener > wrote: > > On Thu, Oct 9, 2014 at 11:14 AM, Zhenqiang Chen > wrote: > >> Hi, > >> > >> The are two implementations of seq_cost. The function bodies are > >> exactly the same. The patch removes one of them and make the other > global. > >> > >> Bootstrap and no make check regression on X86-64. > >> > >> OK for trunk? > > > > The prototype should go to cfgloopanal.c. > > Err - cfgloopanal.h of course ;) Or rather the function sounds misplaced in > cfgloopanal.c. Thanks for the comments. I think seq_cost should be in rtlanal.c, just like rtx_cost, insn_rtx_cost and so on. ChangeLog: 2014-10-10 Zhenqiang Chen * cfgloopanal.c (seq_cost): Delete. * rtl.h (seq_cost): New prototype. * rtlanal.c (seq_cost): New function. * tree-ssa-loop-ivopts.c (seq_cost): Delete. diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c index 7ea1a5f..006b419 100644 --- a/gcc/cfgloopanal.c +++ b/gcc/cfgloopanal.c @@ -302,26 +302,6 @@ get_loop_level (const struct loop *loop) return mx; } -/* Returns estimate on cost of computing SEQ. */ - -static unsigned -seq_cost (const rtx_insn *seq, bool speed) -{ - unsigned cost = 0; - rtx set; - - for (; seq; seq = NEXT_INSN (seq)) -{ - set = single_set (seq); - if (set) - cost += set_rtx_cost (set, speed); - else - cost++; -} - - return cost; -} - /* Initialize the constants for computing set costs. */ void diff --git a/gcc/rtl.h b/gcc/rtl.h index e73f731..b697417 100644 --- a/gcc/rtl.h +++ b/gcc/rtl.h @@ -2921,6 +2921,7 @@ extern rtx_insn *find_first_parameter_load (rtx_insn *, rtx_insn *); extern bool keep_with_call_p (const rtx_insn *); extern bool label_is_jump_target_p (const_rtx, const rtx_insn *); extern int insn_rtx_cost (rtx, bool); +extern unsigned seq_cost (const rtx_insn *, bool); /* Given an insn and condition, return a canonical description of the test being made. */ diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c index 3063458..93eda12 100644 --- a/gcc/rtlanal.c +++ b/gcc/rtlanal.c @@ -5017,6 +5017,26 @@ insn_rtx_cost (rtx pat, bool speed) return cost > 0 ? cost : COSTS_N_INSNS (1); } +/* Returns estimate on cost of computing SEQ. */ + +unsigned +seq_cost (const rtx_insn *seq, bool speed) +{ + unsigned cost = 0; + rtx set; + + for (; seq; seq = NEXT_INSN (seq)) +{ + set = single_set (seq); + if (set) +cost += set_rtx_cost (set, speed); + else +cost++; +} + + return cost; +} + /* Given an insn INSN and condition COND, return the condition in a canonical form to simplify testing by callers. Specifically: diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index 400798a..087ca26 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -2842,26 +2842,6 @@ get_use_iv_cost (struct ivopts_data *data, struct iv_use *use, return NULL; } -/* Returns estimate on cost of computing SEQ. */ - -static unsigned -seq_cost (rtx_insn *seq, bool speed) -{ - unsigned cost = 0; - rtx set; - - for (; seq; seq = NEXT_INSN (seq)) -{ - set = single_set (seq); - if (set) - cost += set_src_cost (SET_SRC (set), speed); - else - cost++; -} - - return cost; -} - /* Produce DECL_RTL for object obj so it looks like it is stored in memory. */ static rtx produce_memory_decl_rtl (tree obj, int *regno)
[PATCH, ifcvt] Check size cost in noce_try_store_flag_mask
Hi, Function noce_try_store_flag_mask converts "if (test) x = 0;" to "x &= -(test == 0);" But from code size view, "x &= -(test == 0);" might have more instructions than "if (test) x = 0;". The patch checks the cost to determine the conversion is valuable or not. Bootstrap and no make check regression on X86-64. No make check regression with Cortex-M0 qemu. For CSiBE, ARM Cortex-m0 result is a little better. A little regression for MIPS. Roughly no change for PowerPC. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-10-27 Zhenqiang Chen * ifcvt.c (noce_try_store_flag_mask): Check rtx cost. testsuite/ChangeLog: 2014-10-27 Zhenqiang Chen * gcc.target/arm/ifcvt-size-check.c: New test. diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 949d2b4..3abd518 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -1393,6 +1393,14 @@ noce_try_store_flag_mask (struct noce_if_info *if_info) if (!seq) return FALSE; + if (optimize_function_for_size_p (cfun)) + { + int old_cost = COSTS_N_INSNS (if_info->branch_cost + 1); + int new_cost = seq_cost (seq, 0); + if (new_cost > old_cost) + return FALSE; + } + emit_insn_before_setloc (seq, if_info->jump, INSN_LOCATION (if_info->insn_a)); return TRUE; diff --git a/gcc/testsuite/gcc.target/arm/ifcvt-size-check.c b/gcc/testsuite/gcc.target/arm/ifcvt-size-check.c new file mode 100644 index 000..43fa16b --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/ifcvt-size-check.c @@ -0,0 +1,13 @@ +/* { dg-do assemble } */ +/* { dg-options "-mthumb -Os " } */ +/* { dg-require-effective-target arm_thumb1_ok } */ + +int +test (unsigned char iov_len, int count, int i) +{ + unsigned char bytes = 0; + if ((unsigned char) ((char) 127 - bytes) < iov_len) +return 22; + return 0; +} +/* { dg-final { object-size text <= 12 } } */
RE: [Ping] [PATCH, 1/10] two hooks for conditional compare (ccmp)
Thanks for the comments. All comments are accepted and the updated patch is attached. -Zhenqiang > -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Saturday, October 11, 2014 11:00 PM > To: Zhenqiang Chen; gcc-patches@gcc.gnu.org > Subject: Re: [Ping] [PATCH, 1/10] two hooks for conditional compare (ccmp) > > On 09/22/2014 11:43 PM, Zhenqiang Chen wrote: > > > > +@cindex @code{ccmp} instruction pattern @item @samp{ccmp} > Conditional > > +compare instruction. Operand 2 and 5 are RTLs which perform two > > +comparisons. Operand 1 is AND or IOR, which operates on the result > > +of operand 2 and 5. > > +It uses recursive method to support more than two compares. e.g. > > + > > + CC0 = CMP (a, b); > > + CC1 = CCMP (NE (CC0, 0), CMP (e, f)); ... > > + CCn = CCMP (NE (CCn-1, 0), CMP (...)); > > + > > +Two target hooks are used to generate conditional compares. > > +GEN_CCMP_FISRT is used to generate the first CMP. And > GEN_CCMP_NEXT > > +is used to generate the following CCMPs. Operand 1 is AND or IOR. > > +Operand 3 is the result of GEN_CCMP_FISRT or a previous > GEN_CCMP_NEXT. Operand 2 is NE. > > +Operand 4, 5 and 6 is another compare expression. > > + > > +A typical CCMP pattern looks like > > + > > +@smallexample > > +(define_insn "*ccmp_and_ior" > > + [(set (match_operand 0 "dominant_cc_register" "") > > +(compare > > + (match_operator 1 > > + (match_operator 2 "comparison_operator" > > + [(match_operand 3 "dominant_cc_register") > > +(const_int 0)]) > > + (match_operator 4 "comparison_operator" > > + [(match_operand 5 "register_operand") > > +(match_operand 6 "compare_operand"])) > > + (const_int 0)))] > > + "" > > + "@dots{}") > > +@end smallexample > > + > > This whole section should be removed. You do not have a named ccmp > pattern. > Even your example below is an *unnamed pattern. This is an > implementation detail of the aarch64 backend. > > Named patterns are used when that is the interface the middle-end uses to > emit code. But you're not using named patterns, you're using: > > > > > +@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_FIRST (int > @var{code}, > > +rtx @var{op0}, rtx @var{op1}) This function emits a comparison insn > > +for the first of a sequence of conditional comparisions. It returns > > +a comparison expression appropriate for passing to > @code{gen_ccmp_next} or to @code{cbranch_optab}. > > + @code{unsignedp} is used when converting @code{op0} and > @code{op1}'s mode. > > +@end deftypefn > > + > > +@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_NEXT (rtx @var{prev}, > > +int @var{cmp_code}, rtx @var{op0}, rtx @var{op1}, int @var{bit_code}) > > +This function emits a conditional comparison within a sequence of > > +conditional comparisons. The @code{prev} expression is the result of > > +a prior call to @code{gen_ccmp_first} or @code{gen_ccmp_next}. It > > +may return @code{NULL} if the combination of @code{prev} and this > > +comparison is not supported, otherwise the result must be > > +appropriate for passing to @code{gen_ccmp_next} or > @code{cbranch_optab}. @code{bit_code} is AND or IOR, which is the op on > the two compares. > > +@end deftypefn > > Every place above where you refer to the arguments of the function should > use @var; you're using @code for most of them. Use @code{AND} and > @code{IOR}. > > > r~ 1-hooks.patch Description: Binary data
RE: [Ping] [PATCH, 5/10] aarch64: add ccmp operand predicate
Thanks for the comments. All comments are accepted and the updated patch is attached. -Zhenqiang > -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Saturday, October 11, 2014 11:22 PM > To: Zhenqiang Chen; gcc-patches@gcc.gnu.org > Subject: Re: [Ping] [PATCH, 5/10] aarch64: add ccmp operand predicate > > On 09/22/2014 11:44 PM, Zhenqiang Chen wrote: > > +/* Return true if val can be encoded as a 5-bit unsigned immediate. > > +*/ bool > > +aarch64_uimm5 (HOST_WIDE_INT val) > > +{ > > + return (val & (HOST_WIDE_INT) 0x1f) == val; } > > This is just silly. > > > +(define_constraint "Usn" > > + "A constant that can be used with a CCMN operation (once negated)." > > + (and (match_code "const_int") > > + (match_test "aarch64_uimm5 (-ival)"))) > > (match_test "IN_RANGE (ival, -31, 0)") > > > +(define_predicate "aarch64_ccmp_immediate" > > + (and (match_code "const_int") > > + (ior (match_test "aarch64_uimm5 (INTVAL (op))") > > + (match_test "aarch64_uimm5 (-INTVAL (op))" > > (and (match_code "const_int") > (match_test "IN_RANGE (INTVAL (op), -31, 31)")) > > > r~ 5-ccmp-operants.patch Description: Binary data
RE: [Ping] [PATCH, 2/10] prepare ccmp
Thanks for the comments. Patch is updated. -Zhenqiang > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Richard Henderson > Sent: Saturday, October 11, 2014 11:03 PM > To: Zhenqiang Chen; gcc-patches@gcc.gnu.org > Subject: Re: [Ping] [PATCH, 2/10] prepare ccmp > > On 09/22/2014 11:43 PM, Zhenqiang Chen wrote: > > + /* If jumps are cheap and the target does not support conditional > > +compare, turn some more codes into jumpy sequences. */ > > + else if (BRANCH_COST (optimize_insn_for_speed_p (), false) < 4 > > + && (targetm.gen_ccmp_first == NULL)) > > Don't add unnecessary parenthesis around the == expression. > > Otherwise ok. > > > r~ 2-prepare.patch Description: Binary data
RE: [Ping] [PATCH, 6/10] aarch64: add ccmp CC mode
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Saturday, October 11, 2014 11:32 PM > To: Zhenqiang Chen; gcc-patches@gcc.gnu.org > Subject: Re: [Ping] [PATCH, 6/10] aarch64: add ccmp CC mode > > On 09/22/2014 11:44 PM, Zhenqiang Chen wrote: > > +case CC_DNEmode: > > + return comp_code == NE ? AARCH64_NE : AARCH64_EQ; > > +case CC_DEQmode: > > + return comp_code == NE ? AARCH64_EQ : AARCH64_NE; > > +case CC_DGEmode: > > + return comp_code == NE ? AARCH64_GE : AARCH64_LT; > > +case CC_DLTmode: > > + return comp_code == NE ? AARCH64_LT : AARCH64_GE; > > +case CC_DGTmode: > > + return comp_code == NE ? AARCH64_GT : AARCH64_LE; > > +case CC_DLEmode: > > + return comp_code == NE ? AARCH64_LE : AARCH64_GT; > > +case CC_DGEUmode: > > + return comp_code == NE ? AARCH64_CS : AARCH64_CC; > > +case CC_DLTUmode: > > + return comp_code == NE ? AARCH64_CC : AARCH64_CS; > > +case CC_DGTUmode: > > + return comp_code == NE ? AARCH64_HI : AARCH64_LS; > > +case CC_DLEUmode: > > + return comp_code == NE ? AARCH64_LS : AARCH64_HI; > > I think these should return -1 if comp_code is not EQ. Like the CC_Zmode > case below. Since the code can not guarantee that the CC is used in cbranchcc insns when expand, it maybe in a tmp register. After some optimizations the CC is forwarded in cbranchcc insn. So the comp_code might be any legal COMPARE. And ccmp CC mode has the flag based on NE, so the patch reverts the flag if not NE. > Perhaps you can share some code to make the whole thing less bulky. E.g. > > ... > case CC_DLEUmode: > ne = AARCH64_LS; > eq = AARCH64_HI; > break; > case CC_Zmode: > ne = AARCH64_NE; > eq = AARCH64_EQ; > break; > } > if (code == NE) > return ne; > if (code == EQ) > return eq; > return -1; Thanks for the comments. Patch is updated. > This does beg the question of whether you need both CC_Zmode and > CC_DNEmode. > I'll leave it to an ARM maintainer to say which one of the two should be kept. > > > r~ 6-ccmp-cc-mode.patch Description: Binary data
RE: [Ping] [PATCH, 7/10] aarch64: add function to output ccmp insn
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Sunday, October 12, 2014 12:52 PM > To: Zhenqiang Chen; gcc-patches@gcc.gnu.org > Subject: Re: [Ping] [PATCH, 7/10] aarch64: add function to output ccmp insn > > On 10/11/2014 09:11 AM, Richard Henderson wrote: > > On 09/22/2014 11:45 PM, Zhenqiang Chen wrote: > >> +static unsigned int > >> +aarch64_code_to_nzcv (enum rtx_code code, bool inverse) { > >> + switch (code) > >> +{ > >> +case NE: /* NE, Z == 0. */ > >> + return inverse ? AARCH64_CC_Z : 0; > >> +case EQ: /* EQ, Z == 1. */ > >> + return inverse ? 0 : AARCH64_CC_Z; > >> +case LE: /* LE, !(Z == 0 && N == V). */ > >> + return inverse ? AARCH64_CC_N | AARCH64_CC_V : AARCH64_CC_Z; > >> +case GT: /* GT, Z == 0 && N == V. */ > >> + return inverse ? AARCH64_CC_Z : AARCH64_CC_N | AARCH64_CC_V; > >> +case LT: /* LT, N != V. */ > >> + return inverse ? AARCH64_CC_N | AARCH64_CC_V : AARCH64_CC_N; > >> +case GE: /* GE, N == V. */ > >> + return inverse ? AARCH64_CC_N : AARCH64_CC_N | AARCH64_CC_V; > >> +case LEU: /* LS, !(C == 1 && Z == 0). */ > >> + return inverse ? AARCH64_CC_C: AARCH64_CC_Z; > >> +case GTU: /* HI, C ==1 && Z == 0. */ > >> + return inverse ? AARCH64_CC_Z : AARCH64_CC_C; > >> +case LTU: /* CC, C == 0. */ > >> + return inverse ? AARCH64_CC_C : 0; > >> +case GEU: /* CS, C == 1. */ > >> + return inverse ? 0 : AARCH64_CC_C; > >> +default: > >> + gcc_unreachable (); > >> + return 0; > >> +} > >> +} > >> + > > > > I'm not overly fond of this, since "code" doesn't map 1-1. It needs > > the context of a mode to provide a unique mapping. > > > > I think it would be better to rearrange the existing aarch64_cond_code > > enum such that AARCH64_NE et al are meaningful wrt NZCV. Then you can > > use > > aarch64_get_condition_code_1 to get this mapping. > > Slight mistake in the advice here. I think you should use > aarch64_get_conditional_code_1 to get an aarch64_cond_code, and use that > to index an array to get the nzcv bits. Thanks for the comments. Patch is updated. > Further, does it actually make sense to store both nzcv and its inverse, or > does it work to use nzcv and ~nzcv? Flag GE checks N == V and LT checks N != V. We can no distinguish them with nzcv and ~nzcv Since aarch64_output_ccmp is removed as your comments, the patch is combined with [PATCH, 8/10] aarch64: ccmp insn patterns. Please find the updated patch in later email. Thanks! -Zhenqiang
RE: [Ping] [PATCH, 8/10] aarch64: ccmp insn patterns
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Sunday, October 12, 2014 4:12 AM > To: Zhenqiang Chen; gcc-patches@gcc.gnu.org > Subject: Re: [Ping] [PATCH, 8/10] aarch64: ccmp insn patterns > > On 09/22/2014 11:45 PM, Zhenqiang Chen wrote: > > +(define_expand "cbranchcc4" > > + [(set (pc) (if_then_else > > + (match_operator 0 "aarch64_comparison_operator" > > + [(match_operand 1 "cc_register" "") > > + (const_int 0)]) > > + (label_ref (match_operand 3 "" "")) > > + (pc)))] > > + "" > > + " ") > > Extra space. Updated. > > +(define_insn "*ccmp_and" > > + [(set (match_operand 6 "ccmp_cc_register" "") > > + (compare > > +(and:SI > > + (match_operator 4 "aarch64_comparison_operator" > > + [(match_operand 0 "ccmp_cc_register" "") > > + (match_operand 1 "aarch64_plus_operand" "")]) > > + (match_operator 5 "aarch64_comparison_operator" > > + [(match_operand:GPI 2 "register_operand" "r,r,r") > > + (match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn")])) > > +(const_int 0)))] > > + "" > > + { > > +return aarch64_output_ccmp (operands, true, which_alternative); > > + } > > + [(set_attr "type" "alus_sreg,alus_imm,alus_imm")] > > +) > > + > > +(define_insn "*ccmp_ior" > > + [(set (match_operand 6 "ccmp_cc_register" "") > > + (compare > > +(ior:SI > > + (match_operator 4 "aarch64_comparison_operator" > > + [(match_operand 0 "ccmp_cc_register" "") > > + (match_operand 1 "aarch64_plus_operand" "")]) > > + (match_operator 5 "aarch64_comparison_operator" > > + [(match_operand:GPI 2 "register_operand" "r,r,r") > > + (match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn")])) > > +(const_int 0)))] > > + "" > > + { > > +return aarch64_output_ccmp (operands, false, which_alternative); > > + } > > + [(set_attr "type" "alus_sreg,alus_imm,alus_imm")] > > Surely not aarch64_plus_operand for operand 1. That's a comparison with > the flags register. Surely (const_int 0) is the only valid operand there. > > These could be combined with a code iterator, and thus there would be > exactly one call to aarch64_output_ccmp, and thus inlined. Although... > > It seems to me that you don't need a function call at all. How about > > AND > "@ >ccmp\\t%2, %3, %K5, %m4 >ccmp\\t%2, %3, %K5, %m4 >ccmn\\t%2, #%n3, %K5, %m4" > > IOR > "@ >ccmp\\t%2, %3, %k5, %M4 >ccmp\\t%2, %3, %k5, %M4 >ccmn\\t%2, #%n3, %k5, %M4" > > where 'k' and 'K' are new print_operand codes that output the nzcv (or its > inverse) integer for the comparison, much like 'm' and 'M' print the name of > the comparison. Updated with a bit change. + ccmp\\t%2, %3, %z5, %k4 + ccmp\\t%2, %3, %z5, %k4 + ccmn\\t%2, #%n3, %z5, %k4" z/Z is the k/K as you suggested. I can not reuse m/M since they are special for cbranchcc (with CCMP). So I use new added k/K. Thanks! -Zhenqiang 7-8-ccmp-patterns.patch Description: Binary data
RE: [Ping] [PATCH, 9/10] aarch64: generate conditional compare instructions
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Sunday, October 12, 2014 4:46 AM > To: Zhenqiang Chen; gcc-patches@gcc.gnu.org > Subject: Re: [Ping] [PATCH, 9/10] aarch64: generate conditional compare > instructions > > On 09/22/2014 11:46 PM, Zhenqiang Chen wrote: > > +static bool > > +aarch64_convert_mode (rtx* op0, rtx* op1, int unsignedp) { > > + enum machine_mode mode; > > + > > + mode = GET_MODE (*op0); > > + if (mode == VOIDmode) > > +mode = GET_MODE (*op1); > > + > > + if (mode == QImode || mode == HImode) > > +{ > > + *op0 = convert_modes (SImode, mode, *op0, unsignedp); > > + *op1 = convert_modes (SImode, mode, *op1, unsignedp); > > +} > > + else if (mode != SImode && mode != DImode) > > +return false; > > + > > + return true; > > +} > > Hum. I'd rather not replicate too much of the expander logic here. > > We could avoid that by using struct expand_operand, create_input_operand > et al, then expand_insn. That does require that the target hooks be given > trees rather than rtl as input. I had tried to use tree/gimple as input. But the codes was more complexity than current one. And comments in https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02027.html " I suspect it might be better to just hoist any preparation operations above the entire conditional compare sequence, so that by the time we start the ccmp expansion we're dealing with operands that are in the 'natural' sizes for the machine (breaking up the conditional compare sequence for what are almost side-effect operations sounds like a source of potential bugs). This would also ensure that the back-end can safely re-order at least some comparison operations if this leads a workable conditional compare sequence. " I think the mode conversion (to SImode or DImode) is target dependent. Thanks! -Zhenqiang
RE: [Ping] [PATCH, 10/10] aarch64: Handle ccmp in ifcvt to make it work with cmov
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Sunday, October 12, 2014 5:40 AM > To: Zhenqiang Chen; gcc-patches@gcc.gnu.org > Subject: Re: [Ping] [PATCH, 10/10] aarch64: Handle ccmp in ifcvt to make it > work with cmov > > On 09/22/2014 11:46 PM, Zhenqiang Chen wrote: > > @@ -2375,10 +2387,21 @@ noce_get_condition (rtx_insn *jump, rtx_insn > **earliest, bool then_else_reversed > >return cond; > > } > > > > + /* For conditional compare, set ALLOW_CC_MODE to TRUE. */ > > + if (targetm.gen_ccmp_first) > > +{ > > + rtx prev = prev_nonnote_nondebug_insn (jump); > > + if (prev > > + && NONJUMP_INSN_P (prev) > > + && BLOCK_FOR_INSN (prev) == BLOCK_FOR_INSN (jump) > > + && ccmp_insn_p (prev)) > > + allow_cc_mode = true; > > +} > > + > >/* Otherwise, fall back on canonicalize_condition to do the dirty > > work of manipulating MODE_CC values and COMPARE rtx codes. */ > >tmp = canonicalize_condition (jump, cond, reverse, earliest, > > - NULL_RTX, false, true); > > + NULL_RTX, allow_cc_mode, true); > > This needs a lot more explanation. Why it it ok to allow a cc_mode when the > source is a ccmp, and not for any other comparison? > > The issue is going to be how we use the comparison once we've finished with > the transformation. Is it going to be able to be properly handled by > emit_conditional_move? > > If the target doesn't have cbranchcc4, I think that prep_cmp_insn will fail. > But as you show from Good point. It is not ccmp special. It is cbranchcc4 related. If I understand correct, without cbranchcc4, we need put the result to a tmp register and generate additional compares, which is not good for performance. I update the patch to check +#if HAVE_cbranchcc4 + allow_cc_mode = true; +#endif Thanks! -Zhenqiang > > +++ b/gcc/config/aarch64/aarch64.md > > @@ -2589,15 +2589,19 @@ > >(match_operand:ALLI 3 "register_operand" "")))] > >"" > >{ > > -rtx ccreg; > > enum rtx_code code = GET_CODE (operands[1]); > > > > if (code == UNEQ || code == LTGT) > >FAIL; > > > > -ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0), > > - XEXP (operands[1], 1)); > > -operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx); > > +if (!ccmp_cc_register (XEXP (operands[1], 0), > > + GET_MODE (XEXP (operands[1], 0 > > + { > > + rtx ccreg; > > + ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0), > > +XEXP (operands[1], 1)); > > + operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx); > > + } > > this change, even more than that may be required. > > > r~ 10-ifcvt.patch Description: Binary data
[Ping ^ 2] [PATCH] Fix PR 61225
Ping ^2? Thanks! -Zhenqiang On 28 May 2014 15:02, Zhenqiang Chen wrote: > Ping? > > Thanks! > -Zhenqiang > > On 22 May 2014 17:52, Zhenqiang Chen wrote: >> On 21 May 2014 20:43, Steven Bosscher wrote: >>> On Wed, May 21, 2014 at 11:58 AM, Zhenqiang Chen wrote: >>>> Hi, >>>> >>>> The patch fixes the gcc.target/i386/pr49095.c FAIL in PR61225. The >>>> test case tends to check a peephole2 optimization, which optimizes the >>>> following sequence >>>> >>>> 2: bx:SI=ax:SI >>>> 25: ax:SI=[bx:SI] >>>> 7: {ax:SI=ax:SI-0x1;clobber flags:CC;} >>>> 8: [bx:SI]=ax:SI >>>> 9: flags:CCZ=cmp(ax:SI,0) >>>> to >>>>2: bx:SI=ax:SI >>>>41: {flags:CCZ=cmp([bx:SI]-0x1,0);[bx:SI]=[bx:SI]-0x1;} >>>> >>>> The enhanced shrink-wrapping, which calls copyprop_hardreg_forward >>>> changes the INSN 25 to >>>> >>>> 25: ax:SI=[ax:SI] >>>> >>>> Then peephole2 can not optimize it since two memory_operands look like >>>> different. >>>> >>>> To fix it, the patch adds another peephole2 rule to read one more >>>> insn. From the register copy, it knows the address is the same. >>> >>> That is one complex peephole2 to deal with a transformation like this. >>> It seems to be like it's a too specific solution for a bigger problem. >>> >>> Could you please try one of the following solutions instead: >>> >>> 1. Track register values for peephole2 and try different alternatives >>> based on known register equivalences? E.g. in your example, perhaps >>> there is already a REG_EQUAL/REG_EQUIV note available on insn 25 after >>> copyprop_hardreg_forward, to annotate that [ax:SI] is equivalent to >>> [bx:SI] at that point (or if that information is not available, it is >>> not very difficult to make it available). Then you could try applying >>> peephole2 on the original pattern but also on patterns modified with >>> the known equivalences (i.e. try peephole2 on multiple equivalent >>> patterns for the same insn). This may expose other peephole2 >>> opportunities, not just the specific one your patch addresses. >> >> Patch is updated according to the comment. There is no REG_EQUAL. So I >> add it when replace_oldest_value_reg. >> >> ChangeLog: >> 2014-05-22 Zhenqiang Chen >> >> Part of PR rtl-optimization/61225 >> * config/i386/i386-protos.h (ix86_peephole2_rtx_equal_p): New proto. >> * config/i386/i386.c (ix86_peephole2_rtx_equal_p): New function. >> * regcprop.c (replace_oldest_value_reg): Add REG_EQUAL note when >> propagating to SET. >> >> diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h >> index 39462bd..0c4a2b9 100644 >> --- a/gcc/config/i386/i386-protos.h >> +++ b/gcc/config/i386/i386-protos.h >> @@ -42,6 +42,7 @@ extern enum calling_abi ix86_function_type_abi >> (const_tree); >> >> extern void ix86_reset_previous_fndecl (void); >> >> +extern bool ix86_peephole2_rtx_equal_p (rtx, rtx, rtx, rtx); >> #ifdef RTX_CODE >> extern int standard_80387_constant_p (rtx); >> extern const char *standard_80387_constant_opcode (rtx); >> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >> index 6ffb788..583ebe8 100644 >> --- a/gcc/config/i386/i386.c >> +++ b/gcc/config/i386/i386.c >> @@ -46856,6 +46856,29 @@ ix86_atomic_assign_expand_fenv (tree *hold, >> tree *clear, tree *update) >> atomic_feraiseexcept_call); >> } >> >> +/* OP0 is the SET_DEST of INSN and OP1 is the SET_SRC of INSN. >> + Check whether OP1 and OP6 is equal. 
*/ >> + >> +bool >> +ix86_peephole2_rtx_equal_p (rtx insn, rtx op0, rtx op1, rtx op6) >> +{ >> + rtx note; >> + >> + if (!reg_overlap_mentioned_p (op0, op1) && rtx_equal_p (op1, op6)) >> +return true; >> + >> + gcc_assert (single_set (insn) >> + && op0 == SET_DEST (single_set (insn)) >> + && op1 == SET_SRC (single_set (insn))); >> + >> + note = find_reg_note (insn, REG_EQUAL, NULL_RTX); >> + if (note >> + && !reg_overlap_mentioned_p (op0, XEXP (note, 0)) >> + && rtx_equal_p (XEXP (note, 0), op6)) >> +return true; >> + >> + return false; >> +} >> /* Initial
[PATCH, loop2_invariant, 1/2] Check only one register class
Hi, For loop2-invariant pass, when flag_ira_loop_pressure is enabled, function gain_for_invariant checks the pressures of all register classes. This does not make sense since one invariant might impact only one register class. The patch enhances functions get_inv_cost and gain_for_invariant to check only the register pressure of the invariant if possible. Bootstrap and no make check regression on X86-64. Bootstrap and no make check regression on X86-64 with flag_ira_loop_pressure = true. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-06-10 Zhenqiang Chen * loop-invariant.c (get_inv_cost): Handle register class. (gain_for_invariant): Check the register pressure of the inv, other than all. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 100a2c1..e822bb6 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -1058,16 +1058,22 @@ get_pressure_class_and_nregs (rtx insn, int *nregs) } /* Calculates cost and number of registers needed for moving invariant INV - out of the loop and stores them to *COST and *REGS_NEEDED. */ + out of the loop and stores them to *COST and *REGS_NEEDED. *CL will be + the REG_CLASS of INV. Return + 0: if INV is invalid. + 1: if INV and its depends_on have same reg_class + > 1: if INV and its depends_on have different reg_classes. */ -static void -get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed) +static int +get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed, + enum reg_class *cl) { int i, acomp_cost; unsigned aregs_needed[N_REG_CLASSES]; unsigned depno; struct invariant *dep; bitmap_iterator bi; + int ret = 2; /* Find the representative of the class of the equivalent invariants. */ inv = invariants[inv->eqto]; @@ -1083,7 +1089,7 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed) if (inv->move || inv->stamp == actual_stamp) -return; +return 0; inv->stamp = actual_stamp; if (! flag_ira_loop_pressure) @@ -1095,6 +1101,8 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed) pressure_class = get_pressure_class_and_nregs (inv->insn, &nregs); regs_needed[pressure_class] += nregs; + *cl = pressure_class; + ret = 1; } if (!inv->cheap_address @@ -1135,10 +1143,12 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed) EXECUTE_IF_SET_IN_BITMAP (inv->depends_on, 0, depno, bi) { bool check_p; + enum reg_class dep_cl = NO_REGS; + int dep_ret; dep = invariants[depno]; - get_inv_cost (dep, &acomp_cost, aregs_needed); + dep_ret = get_inv_cost (dep, &acomp_cost, aregs_needed, &dep_cl); if (! flag_ira_loop_pressure) check_p = aregs_needed[0] != 0; @@ -1148,6 +1158,11 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed) if (aregs_needed[ira_pressure_classes[i]] != 0) break; check_p = i < ira_pressure_classes_num; + + if (dep_ret > 1) + ret += dep_ret; + else if ((dep_ret == 1) && (*cl != dep_cl)) + ret++; } if (check_p /* We need to check always_executed, since if the original value of @@ -1181,6 +1196,7 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed) } (*comp_cost) += acomp_cost; } + return ret; } /* Calculates gain for eliminating invariant INV. REGS_USED is the number @@ -1195,10 +1211,12 @@ gain_for_invariant (struct invariant *inv, unsigned *regs_needed, bool speed, bool call_p) { int comp_cost, size_cost; + enum reg_class cl; + int ret; actual_stamp++; - get_inv_cost (inv, &comp_cost, regs_needed); + ret = get_inv_cost (inv, &comp_cost, regs_needed, &cl); if (! 
flag_ira_loop_pressure) { @@ -1207,6 +1225,24 @@ gain_for_invariant (struct invariant *inv, unsigned *regs_needed, - estimate_reg_pressure_cost (new_regs[0], regs_used, speed, call_p)); } + else if (ret == 0) +return -1; + else if (ret == 1) +{ + /* Hoist it anyway since it does not impact register pressure. */ + if (cl == NO_REGS) + return 1; + + size_cost = (int) new_regs[cl] ++ (int) regs_needed[cl] ++ LOOP_DATA (curr_loop)->max_reg_pressure[cl] ++ IRA_LOOP_RESERVED_REGS +- ira_class_hard_regs_num[cl]; + if (size_cost > 0) + return -1; + else + size_cost = 0; +} else { int i;
[PATCH, loop2_invariant, 2/2] Change heuristics for identical invariants
Hi, When analysing logs of loop2-invariant of eembc, I found the same invariant occurred lots of times in a loop. But it was not selected since its cost was not high and register pressure was high. Logs show performance improvement by giving them higher priority to move. The patch changes the heuristics to move identical invariants: * add a new member eqno, which records the number of invariants eqto the inv. * set its cost to: inv->cost * inv->eqno; * compare with spill_cost if register pressure is high. Bootstrap and no make check regression on X86-64. Bootstrap and no make check regression on X86-64 with flag_ira_loop_pressure = true. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-06-10 Zhenqiang Chen * loop-invariant.c (struct invariant): Add a new member: eqno; (find_identical_invariants): Update eqno; (create_new_invariant): Init eqno; (get_inv_cost): Compute comp_cost wiht eqno; (gain_for_invariant): Take spill cost into account. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 92388f5..c43206a 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -104,6 +104,9 @@ struct invariant /* The number of the invariant with the same value. */ unsigned eqto; + /* The number of invariants which eqto this. */ + unsigned eqno; + /* If we moved the invariant out of the loop, the register that contains its value. */ rtx reg; @@ -498,6 +501,7 @@ find_identical_invariants (invariant_htab_type eq, struct invariant *inv) struct invariant *dep; rtx expr, set; enum machine_mode mode; + struct invariant *tmp; if (inv->eqto != ~0u) return; @@ -513,7 +517,12 @@ find_identical_invariants (invariant_htab_type eq, struct invariant *inv) mode = GET_MODE (expr); if (mode == VOIDmode) mode = GET_MODE (SET_DEST (set)); - inv->eqto = find_or_insert_inv (eq, expr, mode, inv)->invno; + + tmp = find_or_insert_inv (eq, expr, mode, inv); + inv->eqto = tmp->invno; + + if (tmp->invno != inv->invno && inv->always_executed) +tmp->eqno++; if (dump_file && inv->eqto != inv->invno) fprintf (dump_file, @@ -725,6 +734,10 @@ create_new_invariant (struct def *def, rtx insn, bitmap depends_on, inv->invno = invariants.length (); inv->eqto = ~0u; + + /* Itself. */ + inv->eqno = 1; + if (def) def->invno = inv->invno; invariants.safe_push (inv); @@ -1107,7 +1120,7 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed, if (!inv->cheap_address || inv->def->n_addr_uses < inv->def->n_uses) -(*comp_cost) += inv->cost; +(*comp_cost) += inv->cost * inv->eqno; #ifdef STACK_REGS { @@ -1243,7 +1256,13 @@ gain_for_invariant (struct invariant *inv, unsigned *regs_needed, + IRA_LOOP_RESERVED_REGS - ira_class_hard_regs_num[cl]; if (size_cost > 0) - return -1; + { + int spill_cost = target_spill_cost [speed] * (int) regs_needed[cl]; + if (comp_cost <= spill_cost) + return -1; + + return 2; + } else size_cost = 0; }
[PATCH, loop2_invariant] Skip inv (marked as move) from depends_on
Hi, The patch skips an invariant from depends_on if it has been marked as "move" since its register pressure and cost had been taken into account in previous iterations. Bootstrap and no make check regression on X86-64. Bootstrap and no make check regression on X86-64 with flag_ira_loop_pressure = true. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-06-10 Zhenqiang Chen * loop-invariant.c (get_inv_cost): Skip invariants, which are marked as "move", from depends_on. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index e822bb6..fca9c2f 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -1148,6 +1148,10 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed, dep = invariants[depno]; + /* If DEP is moved out of the loop, it is not a depends_on any more. */ + if (dep->move) + continue; + dep_ret = get_inv_cost (dep, &acomp_cost, aregs_needed, &dep_cl); if (! flag_ira_loop_pressure)
[PATCH, loop2_invariant] Pre-check invariants
Hi, During tests, I found some invariants could not be replaced at the last stage. If we can identify such invariants earlier, we can skip them and give the chance to other invariants. So the patch pre-checks candidates to skip the one which can not make a valid insn during replacement in move_invariant_reg. Bootstrap and no make check regression on X86-64. Bootstrap and no make check regression on X86-64 with flag_ira_loop_pressure = true. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-06-10 Zhenqiang Chen * loop-invariant.c (find_invariant_insn): Skip invariants, which can not make a valid insn during replacement in move_invariant_reg. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index c43206a..7be4b29 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -881,6 +881,35 @@ find_invariant_insn (rtx insn, bool always_reached, bool always_executed) || HARD_REGISTER_P (dest)) simple = false; + /* Pre-check candidate to skip the one which can not make a valid insn + during move_invariant_reg. */ + if (flag_ira_loop_pressure && df_live && simple + && REG_P (dest) && DF_REG_DEF_COUNT (REGNO (dest)) > 1) +{ + df_ref use; + rtx ref; + unsigned int i = REGNO (dest); + struct df_insn_info *insn_info; + df_ref *def_rec; + + for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use)) + { + ref = DF_REF_INSN (use); + insn_info = DF_INSN_INFO_GET (ref); + + for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++) + if (DF_REF_REGNO (*def_rec) == i) + { + /* Multi definitions at this stage, most likely are due to + instruction constrain, which requires both read and write + on the same register. Since move_invariant_reg is not + powerful enough to handle such cases, just ignore the INV + and leave the chance to others. */ + return; + } + } +} + if (!may_assign_reg_p (SET_DEST (set)) || !check_maybe_invariant (SET_SRC (set))) return;
Re: [PATCH, loop2_invariant] Pre-check invariants
On 10 June 2014 19:01, Steven Bosscher wrote: > On Tue, Jun 10, 2014 at 11:55 AM, Zhenqiang Chen wrote: >> >> * loop-invariant.c (find_invariant_insn): Skip invariants, which >> can not make a valid insn during replacement in move_invariant_reg. >> >> --- a/gcc/loop-invariant.c >> +++ b/gcc/loop-invariant.c >> @@ -881,6 +881,35 @@ find_invariant_insn (rtx insn, bool >> always_reached, bool always_executed) >>|| HARD_REGISTER_P (dest)) >> simple = false; >> >> + /* Pre-check candidate to skip the one which can not make a valid insn >> + during move_invariant_reg. */ >> + if (flag_ira_loop_pressure && df_live && simple >> + && REG_P (dest) && DF_REG_DEF_COUNT (REGNO (dest)) > 1) > > Why only do this with (flag_ira_loop_pressure && df_live)? If the > invariant can't be moved, we should ignore it regardless of whether > register pressure is taken into account. Thanks for the comments. df_live seams redundant. With flag_ira_loop_pressure, the pass will call df_analyze () at the beginning, which can make sure all the DF info are correct. Can we guarantee all DF_... correct without df_analyze ()? >> +{ >> + df_ref use; >> + rtx ref; >> + unsigned int i = REGNO (dest); >> + struct df_insn_info *insn_info; >> + df_ref *def_rec; >> + >> + for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use)) >> + { >> + ref = DF_REF_INSN (use); >> + insn_info = DF_INSN_INFO_GET (ref); >> + >> + for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++) >> + if (DF_REF_REGNO (*def_rec) == i) >> + { >> + /* Multi definitions at this stage, most likely are due to >> + instruction constrain, which requires both read and write >> + on the same register. Since move_invariant_reg is not >> + powerful enough to handle such cases, just ignore the INV >> + and leave the chance to others. */ >> + return; >> + } >> + } >> +} >> + >>if (!may_assign_reg_p (SET_DEST (set)) >>|| !check_maybe_invariant (SET_SRC (set))) >> return; > > > Can you put your new check between "may_assign_reg_p (dest)" and > "check_maybe_invariant"? The may_assign_reg_p check is cheap and > triggers quite often. Updated and also removed the "flag_ira_loop_pressure && df_live". To make it easy, I move the codes to a new function. OK for trunk? diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index c6bf19b..d19f3c8 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -839,6 +852,39 @@ check_dependencies (rtx insn, bitmap depends_on) return true; } +/* Pre-check candidate DEST to skip the one which can not make a valid insn + during move_invariant_reg. SIMPlE is to skip HARD_REGISTER. */ +static bool +pre_check_invariant_p (bool simple, rtx dest) +{ + if (simple && REG_P (dest) && DF_REG_DEF_COUNT (REGNO (dest)) > 1) +{ + df_ref use; + rtx ref; + unsigned int i = REGNO (dest); + struct df_insn_info *insn_info; + df_ref *def_rec; + + for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use)) + { + ref = DF_REF_INSN (use); + insn_info = DF_INSN_INFO_GET (ref); + + for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++) + if (DF_REF_REGNO (*def_rec) == i) + { + /* Multi definitions at this stage, most likely are due to + instruction constrain, which requires both read and write + on the same register. Since move_invariant_reg is not + powerful enough to handle such cases, just ignore the INV + and leave the chance to others. */ + return false; + } + } +} + return true; +} + /* Finds invariant in INSN. ALWAYS_REACHED is true if the insn is always executed. 
ALWAYS_EXECUTED is true if the insn is always executed, unless the program ends due to a function call. */ @@ -868,7 +914,8 @@ find_invariant_insn (rtx insn, bool always_reached, bool always_executed) || HARD_REGISTER_P (dest)) simple = false; - if (!may_assign_reg_p (SET_DEST (set)) + if (!may_assign_reg_p (dest) + || !pre_check_invariant_p (simple, dest) || !check_maybe_invariant (SET_SRC (set))) return;
Re: [PATCH, loop2_invariant, 1/2] Check only one register class
On 10 June 2014 19:06, Steven Bosscher wrote: > On Tue, Jun 10, 2014 at 11:22 AM, Zhenqiang Chen wrote: >> Hi, >> >> For loop2-invariant pass, when flag_ira_loop_pressure is enabled, >> function gain_for_invariant checks the pressures of all register >> classes. This does not make sense since one invariant might impact >> only one register class. >> >> The patch enhances functions get_inv_cost and gain_for_invariant to >> check only the register pressure of the invariant if possible. > > This patch may work for targets with more-or-less orthogonal reg > classes, but not if there is a lot of overlap between reg classes. Yes. I need check the overlap between reg classes. Patch is updated to check all overlap reg classes by reg_classes_intersect_p: diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index c76a2a0..6e43b49 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -1092,16 +1092,22 @@ get_pressure_class_and_nregs (rtx insn, int *nregs) } /* Calculates cost and number of registers needed for moving invariant INV - out of the loop and stores them to *COST and *REGS_NEEDED. */ + out of the loop and stores them to *COST and *REGS_NEEDED. *CL will be + the REG_CLASS of INV. Return + 0: if INV is invalid. + 1: if INV and its depends_on have same reg_class + > 1: if INV and its depends_on have different reg_classes. */ -static void -get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed) +static int +get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed, + enum reg_class *cl) { int i, acomp_cost; unsigned aregs_needed[N_REG_CLASSES]; unsigned depno; struct invariant *dep; bitmap_iterator bi; + int ret = 2; /* Find the representative of the class of the equivalent invariants. */ inv = invariants[inv->eqto]; @@ -1117,7 +1123,7 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed) if (inv->move || inv->stamp == actual_stamp) -return; +return 0; inv->stamp = actual_stamp; if (! flag_ira_loop_pressure) @@ -1129,6 +1135,8 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed) pressure_class = get_pressure_class_and_nregs (inv->insn, &nregs); regs_needed[pressure_class] += nregs; + *cl = pressure_class; + ret = 1; } if (!inv->cheap_address @@ -1169,6 +1177,8 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed) EXECUTE_IF_SET_IN_BITMAP (inv->depends_on, 0, depno, bi) { bool check_p; + enum reg_class dep_cl = ALL_REGS; + int dep_ret; dep = invariants[depno]; @@ -1176,7 +1186,7 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed) if (dep->move) continue; - get_inv_cost (dep, &acomp_cost, aregs_needed); + dep_ret = get_inv_cost (dep, &acomp_cost, aregs_needed, &dep_cl); if (! flag_ira_loop_pressure) check_p = aregs_needed[0] != 0; @@ -1186,6 +1196,12 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed) if (aregs_needed[ira_pressure_classes[i]] != 0) break; check_p = i < ira_pressure_classes_num; + + if ((dep_ret > 1) || ((dep_ret == 1) && (*cl != dep_cl))) + { + *cl = ALL_REGS; + ret ++; + } } if (check_p /* We need to check always_executed, since if the original value of @@ -1219,6 +1235,7 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed) } (*comp_cost) += acomp_cost; } + return ret; } /* Calculates gain for eliminating invariant INV. 
REGS_USED is the number @@ -1233,10 +1250,12 @@ gain_for_invariant (struct invariant *inv, unsigned *regs_needed, bool speed, bool call_p) { int comp_cost, size_cost; + enum reg_class cl; + int ret; actual_stamp++; - get_inv_cost (inv, &comp_cost, regs_needed); + ret = get_inv_cost (inv, &comp_cost, regs_needed, &cl); if (! flag_ira_loop_pressure) { @@ -1245,6 +1264,11 @@ gain_for_invariant (struct invariant *inv, unsigned *regs_needed, - estimate_reg_pressure_cost (new_regs[0], regs_used, speed, call_p)); } + else if (ret == 0) +return -1; + else if ((ret == 1) && (cl == NO_REGS)) +/* Hoist it anyway since it does not impact register pressure. */ +return 1; else { int i; @@ -1253,6 +1277,10 @@ gain_for_invariant (struct invariant *inv, unsigned *regs_needed, for (i = 0; i < ira_pressure_classes_num; i++) { pressure_class = ira_pressure_classes[i]; + + if (!reg_classes_intersect_p (pressure_class, cl)) +
[PATCH, cprop] Check rtx_cost when propagating constant
Hi, For some large constant, ports like ARM, need one more instructions to operate it. e.g #define MASK 0xfe00ff void maskdata (int * data, int len) { int i = len; for (; i > 0; i -= 2) { data[i] &= MASK; data[i + 1] &= MASK; } } Need two instructions for each AND operation: andr3, r3, #16711935 bicr3, r3, #65536 If we keep the MASK in a register, loop2_invariant pass can hoist it out the loop. And it can be shared by different references. So the patch skips constant propagation if it makes INSN's cost higher. Bootstrap and no make check regression on X86-64 and ARM Chrome book. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-06-17 Zhenqiang Chen * cprop.c (try_replace_reg): Check cost for constants. diff --git a/gcc/cprop.c b/gcc/cprop.c index aef3ee8..c9cf02a 100644 --- a/gcc/cprop.c +++ b/gcc/cprop.c @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn) rtx src = 0; int success = 0; rtx set = single_set (insn); + int old_cost = 0; + bool copy_p = false; + bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn)); + + if (set && SET_SRC (set) && REG_P (SET_SRC (set))) +copy_p = true; + else +old_cost = set_rtx_cost (set, speed); /* Usually we substitute easy stuff, so we won't copy everything. We however need to take care to not duplicate non-trivial CONST @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn) to = copy_rtx (to); validate_replace_src_group (from, to, insn); + + /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop. + And it can be shared by different references. So skip propagation if + it makes INSN's rtx cost higher. */ + if (set && !copy_p && CONSTANT_P (to)) +{ + int new_cost = set_rtx_cost (set, speed); + if (new_cost > old_cost) + { + cancel_changes (0); + return false; + } +} + if (num_changes_pending () && apply_change_group ()) success = 1;
[Ping ^ 3] [PATCH] Fix PR 61225
Ping? Thanks! -Zhenqiang On 9 June 2014 17:08, Zhenqiang Chen wrote: > Ping ^2? > > Thanks! > -Zhenqiang > > On 28 May 2014 15:02, Zhenqiang Chen wrote: >> Ping? >> >> Thanks! >> -Zhenqiang >> >> On 22 May 2014 17:52, Zhenqiang Chen wrote: >>> On 21 May 2014 20:43, Steven Bosscher wrote: >>>> On Wed, May 21, 2014 at 11:58 AM, Zhenqiang Chen wrote: >>>>> Hi, >>>>> >>>>> The patch fixes the gcc.target/i386/pr49095.c FAIL in PR61225. The >>>>> test case tends to check a peephole2 optimization, which optimizes the >>>>> following sequence >>>>> >>>>> 2: bx:SI=ax:SI >>>>> 25: ax:SI=[bx:SI] >>>>> 7: {ax:SI=ax:SI-0x1;clobber flags:CC;} >>>>> 8: [bx:SI]=ax:SI >>>>> 9: flags:CCZ=cmp(ax:SI,0) >>>>> to >>>>>2: bx:SI=ax:SI >>>>>41: {flags:CCZ=cmp([bx:SI]-0x1,0);[bx:SI]=[bx:SI]-0x1;} >>>>> >>>>> The enhanced shrink-wrapping, which calls copyprop_hardreg_forward >>>>> changes the INSN 25 to >>>>> >>>>> 25: ax:SI=[ax:SI] >>>>> >>>>> Then peephole2 can not optimize it since two memory_operands look like >>>>> different. >>>>> >>>>> To fix it, the patch adds another peephole2 rule to read one more >>>>> insn. From the register copy, it knows the address is the same. >>>> >>>> That is one complex peephole2 to deal with a transformation like this. >>>> It seems to be like it's a too specific solution for a bigger problem. >>>> >>>> Could you please try one of the following solutions instead: >>>> >>>> 1. Track register values for peephole2 and try different alternatives >>>> based on known register equivalences? E.g. in your example, perhaps >>>> there is already a REG_EQUAL/REG_EQUIV note available on insn 25 after >>>> copyprop_hardreg_forward, to annotate that [ax:SI] is equivalent to >>>> [bx:SI] at that point (or if that information is not available, it is >>>> not very difficult to make it available). Then you could try applying >>>> peephole2 on the original pattern but also on patterns modified with >>>> the known equivalences (i.e. try peephole2 on multiple equivalent >>>> patterns for the same insn). This may expose other peephole2 >>>> opportunities, not just the specific one your patch addresses. >>> >>> Patch is updated according to the comment. There is no REG_EQUAL. So I >>> add it when replace_oldest_value_reg. >>> >>> ChangeLog: >>> 2014-05-22 Zhenqiang Chen >>> >>> Part of PR rtl-optimization/61225 >>> * config/i386/i386-protos.h (ix86_peephole2_rtx_equal_p): New proto. >>> * config/i386/i386.c (ix86_peephole2_rtx_equal_p): New function. >>> * regcprop.c (replace_oldest_value_reg): Add REG_EQUAL note when >>> propagating to SET. >>> >>> diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h >>> index 39462bd..0c4a2b9 100644 >>> --- a/gcc/config/i386/i386-protos.h >>> +++ b/gcc/config/i386/i386-protos.h >>> @@ -42,6 +42,7 @@ extern enum calling_abi ix86_function_type_abi >>> (const_tree); >>> >>> extern void ix86_reset_previous_fndecl (void); >>> >>> +extern bool ix86_peephole2_rtx_equal_p (rtx, rtx, rtx, rtx); >>> #ifdef RTX_CODE >>> extern int standard_80387_constant_p (rtx); >>> extern const char *standard_80387_constant_opcode (rtx); >>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >>> index 6ffb788..583ebe8 100644 >>> --- a/gcc/config/i386/i386.c >>> +++ b/gcc/config/i386/i386.c >>> @@ -46856,6 +46856,29 @@ ix86_atomic_assign_expand_fenv (tree *hold, >>> tree *clear, tree *update) >>> atomic_feraiseexcept_call); >>> } >>> >>> +/* OP0 is the SET_DEST of INSN and OP1 is the SET_SRC of INSN. >>> + Check whether OP1 and OP6 is equal. 
*/ >>> + >>> +bool >>> +ix86_peephole2_rtx_equal_p (rtx insn, rtx op0, rtx op1, rtx op6) >>> +{ >>> + rtx note; >>> + >>> + if (!reg_overlap_mentioned_p (op0, op1) && rtx_equal_p (op1, op6)) >>> +return true; >>> + >>> + gcc_assert (single_set (insn) >>
Re: [PATCH, cprop] Check rtx_cost when propagating constant
On 17 June 2014 16:15, Richard Biener wrote: > On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen > wrote: >> Hi, >> >> For some large constant, ports like ARM, need one more instructions to >> operate it. e.g >> >> #define MASK 0xfe00ff >> void maskdata (int * data, int len) >> { >>int i = len; >>for (; i > 0; i -= 2) >> { >> data[i] &= MASK; >> data[i + 1] &= MASK; >> } >> } >> >> Need two instructions for each AND operation: >> >> andr3, r3, #16711935 >> bicr3, r3, #65536 >> >> If we keep the MASK in a register, loop2_invariant pass can hoist it >> out the loop. And it can be shared by different references. >> >> So the patch skips constant propagation if it makes INSN's cost higher. > > So cprop undos invariant motions work here? Yes. GLOBAL CONST-PROP will undo invariant motions. > Should we make sure we add a REG_EQUAL note when not propagating? Logs show there already has REG_EQUAL note. >> Bootstrap and no make check regression on X86-64 and ARM Chrome book. >> >> OK for trunk? >> >> Thanks! >> -Zhenqiang >> >> ChangeLog: >> 2014-06-17 Zhenqiang Chen >> >> * cprop.c (try_replace_reg): Check cost for constants. >> >> diff --git a/gcc/cprop.c b/gcc/cprop.c >> index aef3ee8..c9cf02a 100644 >> --- a/gcc/cprop.c >> +++ b/gcc/cprop.c >> @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn) >>rtx src = 0; >>int success = 0; >>rtx set = single_set (insn); >> + int old_cost = 0; >> + bool copy_p = false; >> + bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn)); >> + >> + if (set && SET_SRC (set) && REG_P (SET_SRC (set))) >> +copy_p = true; >> + else >> +old_cost = set_rtx_cost (set, speed); > > Looks bogus for set == NULL? set_rtx_cost has checked it. If it is NULL, the function will return 0; > Also what about register pressure? Do you think it has big register pressure impact? I think it does not increase register pressure. > I think this kind of change needs wider testing as RTX costs are > usually not fully implemented and you introduce a new use kind > (or is it already used elsewhere in this way to compute cost > difference of a set with s/reg/const?). Passes like fwprop, cse, auto_inc_dec, uses RTX costs to make the decision. e.g. in function attempt_change of auto-inc-dec.c, it has code segments like: old_cost = (set_src_cost (mem, speed) + set_rtx_cost (PATTERN (inc_insn.insn), speed)); new_cost = set_src_cost (mem_tmp, speed); ... if (old_cost < new_cost) { ... return false; } The usage of RTX costs in this patch is similar. I had run X86-64 bootstrap and regression tests with --enable-languages=c,c++,lto,fortran,go,ada,objc,obj-c++,java And ARM bootstrap and regression tests with --enable-languages=c,c++,fortran,lto,objc,obj-c++ I will run tests on i686. What other tests do you think I have to run? > What kind of performance difference do you see? I had run coremark, dhrystone, eembc on ARM Cortex-M4 (with some arm backend changes). Coremark with some options show >10% performance improvement. dhrystone is a little better. Some wave in eembc, but overall result is better. I will run spec2000 on X86-64 and ARM, and back to you about the performance changes. Thanks! -Zhenqiang > Thanks, > Richard. > >>/* Usually we substitute easy stuff, so we won't copy everything. >> We however need to take care to not duplicate non-trivial CONST >> @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn) >>to = copy_rtx (to); >> >>validate_replace_src_group (from, to, insn); >> + >> + /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop. 
>> + And it can be shared by different references. So skip propagation if >> + it makes INSN's rtx cost higher. */ >> + if (set && !copy_p && CONSTANT_P (to)) >> +{ >> + int new_cost = set_rtx_cost (set, speed); >> + if (new_cost > old_cost) >> + { >> + cancel_changes (0); >> + return false; >> + } >> +} >> + >>if (num_changes_pending () && apply_change_group ()) >> success = 1;
Re: [PATCH, loop2_invariant, 1/2] Check only one register class
On 18 June 2014 05:49, Jeff Law wrote: > On 06/11/14 04:05, Zhenqiang Chen wrote: >> >> On 10 June 2014 19:06, Steven Bosscher wrote: >>> >>> On Tue, Jun 10, 2014 at 11:22 AM, Zhenqiang Chen wrote: >>>> >>>> Hi, >>>> >>>> For loop2-invariant pass, when flag_ira_loop_pressure is enabled, >>>> function gain_for_invariant checks the pressures of all register >>>> classes. This does not make sense since one invariant might impact >>>> only one register class. >>>> >>>> The patch enhances functions get_inv_cost and gain_for_invariant to >>>> check only the register pressure of the invariant if possible. >>> >>> >>> This patch may work for targets with more-or-less orthogonal reg >>> classes, but not if there is a lot of overlap between reg classes. >> >> >> Yes. I need check the overlap between reg classes. >> >> Patch is updated to check all overlap reg classes by >> reg_classes_intersect_p: > > Just so I'm sure I know what you're trying to do. > > You want to map the pseudo back to its likely class(es) then look at how > those classes (and only those classes) would be impacted from a register > pressure standpoint if the pseudo was hoisted as an invariant? Yes. > This is primarily achieved by returning the class of the invariant, then > filtering out any non-intersecting classes in gain_for_invariant, right? Yes. This is what I want to do since I found some invariant which register class is NO_REGS (memory write) or SSE_REGS is blocked by GENERAL_REGS' register pressure. Thanks! -Zhenqiang
Re: [PATCH, loop2_invariant, 2/2] Change heuristics for identical invariants
On 10 June 2014 19:16, Steven Bosscher wrote: > On Tue, Jun 10, 2014 at 11:23 AM, Zhenqiang Chen wrote: >> * loop-invariant.c (struct invariant): Add a new member: eqno; >> (find_identical_invariants): Update eqno; >> (create_new_invariant): Init eqno; >> (get_inv_cost): Compute comp_cost wiht eqno; >> (gain_for_invariant): Take spill cost into account. > > Look OK except ... > >> @@ -1243,7 +1256,13 @@ gain_for_invariant (struct invariant *inv, >> unsigned *regs_needed, >> + IRA_LOOP_RESERVED_REGS >> - ira_class_hard_regs_num[cl]; >>if (size_cost > 0) >> - return -1; >> + { >> + int spill_cost = target_spill_cost [speed] * (int) regs_needed[cl]; >> + if (comp_cost <= spill_cost) >> + return -1; >> + >> + return 2; >> + } >>else >> size_cost = 0; >> } > > ... why "return 2", instead of just falling through to "return > comp_cost - size_cost;"? Thanks for the comments. Updated. As your comments for the previous patch, I should also check the overlap between reg classes. So I change the logic to check spill cost. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 6e43b49..af0c95b 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -104,6 +104,9 @@ struct invariant /* The number of the invariant with the same value. */ unsigned eqto; + /* The number of invariants which eqto this. */ + unsigned eqno; + /* If we moved the invariant out of the loop, the register that contains its value. */ rtx reg; @@ -498,6 +501,7 @@ find_identical_invariants (invariant_htab_type eq, struct invariant *inv) struct invariant *dep; rtx expr, set; enum machine_mode mode; + struct invariant *tmp; if (inv->eqto != ~0u) return; @@ -513,7 +517,12 @@ find_identical_invariants (invariant_htab_type eq, struct invariant *inv) mode = GET_MODE (expr); if (mode == VOIDmode) mode = GET_MODE (SET_DEST (set)); - inv->eqto = find_or_insert_inv (eq, expr, mode, inv)->invno; + + tmp = find_or_insert_inv (eq, expr, mode, inv); + inv->eqto = tmp->invno; + + if (tmp->invno != inv->invno && inv->always_executed) +tmp->eqno++; if (dump_file && inv->eqto != inv->invno) fprintf (dump_file, @@ -725,6 +734,10 @@ create_new_invariant (struct def *def, rtx insn, bitmap depends_on, inv->invno = invariants.length (); inv->eqto = ~0u; + + /* Itself. */ + inv->eqno = 1; + if (def) def->invno = inv->invno; invariants.safe_push (inv); @@ -1141,7 +1154,7 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed, if (!inv->cheap_address || inv->def->n_addr_uses < inv->def->n_uses) -(*comp_cost) += inv->cost; +(*comp_cost) += inv->cost * inv->eqno; #ifdef STACK_REGS { @@ -1249,7 +1262,7 @@ gain_for_invariant (struct invariant *inv, unsigned *regs_needed, unsigned *new_regs, unsigned regs_used, bool speed, bool call_p) { - int comp_cost, size_cost; + int comp_cost, size_cost = 0; enum reg_class cl; int ret; @@ -1273,6 +1286,8 @@ gain_for_invariant (struct invariant *inv, unsigned *regs_needed, { int i; enum reg_class pressure_class; + int spill_cost = 0; + int base_cost = target_spill_cost [speed]; for (i = 0; i < ira_pressure_classes_num; i++) { @@ -1286,30 +1301,13 @@ gain_for_invariant (struct invariant *inv, unsigned *regs_needed, + LOOP_DATA (curr_loop)->max_reg_pressure[pressure_class] + IRA_LOOP_RESERVED_REGS > ira_class_hard_regs_num[pressure_class]) - break; + { + spill_cost += base_cost * (int) regs_needed[pressure_class]; + size_cost = -1; + } } - if (i < ira_pressure_classes_num) - /* There will be register pressure excess and we want not to - make this loop invariant motion. 
All loop invariants with - non-positive gains will be rejected in function - find_invariants_to_move. Therefore we return the negative - number here. - - One could think that this rejects also expensive loop - invariant motions and this will hurt code performance. - However numerous experiments with different heuristics - taking invariant cost into account did not confirm this - assumption. There are possible explanations for this - result: - o probably all expensive invariants were already moved out - of the loop by PRE and gimple invariant motion pass. - o expensive
Re: [PATCH, cprop] Check rtx_cost when propagating constant
On 17 June 2014 17:42, Zhenqiang Chen wrote: > On 17 June 2014 16:15, Richard Biener wrote: >> On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen >> wrote: >>> Hi, >>> >>> For some large constant, ports like ARM, need one more instructions to >>> operate it. e.g >>> >>> #define MASK 0xfe00ff >>> void maskdata (int * data, int len) >>> { >>>int i = len; >>>for (; i > 0; i -= 2) >>> { >>> data[i] &= MASK; >>> data[i + 1] &= MASK; >>> } >>> } >>> >>> Need two instructions for each AND operation: >>> >>> andr3, r3, #16711935 >>> bicr3, r3, #65536 >>> >>> If we keep the MASK in a register, loop2_invariant pass can hoist it >>> out the loop. And it can be shared by different references. >>> >>> So the patch skips constant propagation if it makes INSN's cost higher. >> >> So cprop undos invariant motions work here? > > Yes. GLOBAL CONST-PROP will undo invariant motions. > >> Should we make sure we add a REG_EQUAL note when not propagating? > > Logs show there already has REG_EQUAL note. > >>> Bootstrap and no make check regression on X86-64 and ARM Chrome book. >>> >>> OK for trunk? >>> >>> Thanks! >>> -Zhenqiang >>> >>> ChangeLog: >>> 2014-06-17 Zhenqiang Chen >>> >>> * cprop.c (try_replace_reg): Check cost for constants. >>> >>> diff --git a/gcc/cprop.c b/gcc/cprop.c >>> index aef3ee8..c9cf02a 100644 >>> --- a/gcc/cprop.c >>> +++ b/gcc/cprop.c >>> @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn) >>>rtx src = 0; >>>int success = 0; >>>rtx set = single_set (insn); >>> + int old_cost = 0; >>> + bool copy_p = false; >>> + bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn)); >>> + >>> + if (set && SET_SRC (set) && REG_P (SET_SRC (set))) >>> +copy_p = true; >>> + else >>> +old_cost = set_rtx_cost (set, speed); >> >> Looks bogus for set == NULL? > > set_rtx_cost has checked it. If it is NULL, the function will return 0; > >> Also what about register pressure? > > Do you think it has big register pressure impact? I think it does not > increase register pressure. > >> I think this kind of change needs wider testing as RTX costs are >> usually not fully implemented and you introduce a new use kind >> (or is it already used elsewhere in this way to compute cost >> difference of a set with s/reg/const?). > > Passes like fwprop, cse, auto_inc_dec, uses RTX costs to make the > decision. e.g. in function attempt_change of auto-inc-dec.c, it has > code segments like: > > old_cost = (set_src_cost (mem, speed) > + set_rtx_cost (PATTERN (inc_insn.insn), speed)); > new_cost = set_src_cost (mem_tmp, speed); > ... > if (old_cost < new_cost) > { > ... > return false; > } > > The usage of RTX costs in this patch is similar. > > I had run X86-64 bootstrap and regression tests with > --enable-languages=c,c++,lto,fortran,go,ada,objc,obj-c++,java > > And ARM bootstrap and regression tests with > --enable-languages=c,c++,fortran,lto,objc,obj-c++ > > I will run tests on i686. What other tests do you think I have to run? > >> What kind of performance difference do you see? > > I had run coremark, dhrystone, eembc on ARM Cortex-M4 (with some arm > backend changes). Coremark with some options show >10% performance > improvement. dhrystone is a little better. Some wave in eembc, but > overall result is better. > > I will run spec2000 on X86-64 and ARM, and back to you about the > performance changes. Please ignore my previous comments about Cortex-M4 performance since it does not base on clean codes. Here is a summary for performance result on X86-64 and ARM. For X86-64, I run SPEC2000 INT and FP (-O3). 
There is no improvement or regression. As tests, I moved the code segment to end of function try_replace_reg and check insns which meet "success && new_cost > old_cost". Logs show only 52 occurrences for all SPEC2000 build and the only one instruction pattern: *adddi_1 is impacted. For *adddi_1, rtx_cost increases from 8 to 10 when changing a register operand to a constant. For ARM Cortex-M4, minimal changes for Coremark, Dhrystone and EEMBC. For ARM Chrome book (Cortex-A15), some wave in SPEC2000 INT test. But the final result does not show impro
[Committed] [PATCH, loop2_invariant] Pre-check invariants
On 18 June 2014 05:32, Jeff Law wrote: > On 06/11/14 03:35, Zhenqiang Chen wrote: >> >> >> Thanks for the comments. df_live seams redundant. >> >> With flag_ira_loop_pressure, the pass will call df_analyze () at the >> beginning, which can make sure all the DF info are correct. >> >> Can we guarantee all DF_... correct without df_analyze ()? > > They should be fine in this context. > > > >> +/* Pre-check candidate DEST to skip the one which can not make a valid >> insn >> + during move_invariant_reg. SIMPlE is to skip HARD_REGISTER. */ > > s/SIMPlE/SIMPLE/ > > > >> + { >> + /* Multi definitions at this stage, most likely are due to >> + instruction constrain, which requires both read and >> write > > s/constrain/constraints/ > > Though that doesn't make sense. Constraints don't come into play until much > later in the pipeline. Certainly there's been code in the expanders and > elsewhere to try and make the code we generate more acceptable to 2-address > targets and that's probably what you're really running into. I think the > code is fine, but that you need to improve the comment. > > ISTM that if your primary focus is to filter out read/write operands, then > just say that and ignore the constraints or other mechanisms by which we got > a read/write pseudo. > > So I think with those two small comment changes, this patch is OK for the > trunk. Please post the final version for archival purposes before checking > it in. Thanks. Patch was installed @r211885 with the two comment changes and one more change to use FOR_EACH_INSN_INFO_DEF other than for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++) Bootstrap and no make check regression on X86-64. Index: gcc/loop-invariant.c === --- gcc/loop-invariant.c(revision 211832) +++ gcc/loop-invariant.c(working copy) @@ -839,6 +839,39 @@ return true; } +/* Pre-check candidate DEST to skip the one which can not make a valid insn + during move_invariant_reg. SIMPLE is to skip HARD_REGISTER. */ +static bool +pre_check_invariant_p (bool simple, rtx dest) +{ + if (simple && REG_P (dest) && DF_REG_DEF_COUNT (REGNO (dest)) > 1) +{ + df_ref use; + rtx ref; + unsigned int i = REGNO (dest); + struct df_insn_info *insn_info; + df_ref def_rec; + + for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use)) + { + ref = DF_REF_INSN (use); + insn_info = DF_INSN_INFO_GET (ref); + + FOR_EACH_INSN_INFO_DEF (def_rec, insn_info) + if (DF_REF_REGNO (def_rec) == i) + { + /* Multi definitions at this stage, most likely are due to + instruction constraints, which requires both read and write + on the same register. Since move_invariant_reg is not + powerful enough to handle such cases, just ignore the INV + and leave the chance to others. */ + return false; + } + } +} + return true; +} + /* Finds invariant in INSN. ALWAYS_REACHED is true if the insn is always executed. ALWAYS_EXECUTED is true if the insn is always executed, unless the program ends due to a function call. */ @@ -868,7 +901,8 @@ || HARD_REGISTER_P (dest)) simple = false; - if (!may_assign_reg_p (SET_DEST (set)) + if (!may_assign_reg_p (dest) + || !pre_check_invariant_p (simple, dest) || !check_maybe_invariant (SET_SRC (set))) return;
[PATCH, 1/10] two hooks for conditional compare (ccmp)
Hi, The patch adds two hooks for backends to generate conditional compare instructions. * gen_ccmp_first is for the first compare. * gen_ccmp_next is for for the following compares. The patch is separated from https://gcc.gnu.org/ml/gcc-patches/2014-02/msg01407.html. And the original discussion about the hooks was in thread: https://gcc.gnu.org/ml/gcc-patches/2013-10/msg02601.html OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-06-23 Zhenqiang Chen * doc/md.texi (ccmp): Add description about conditional compare instruction pattern. (TARGET_GEN_CCMP_FIRST): Define. (TARGET_GEN_CCMP_NEXT): Define. * doc/tm.texi.in (TARGET_GEN_CCMP_FIRST, TARGET_GEN_CCMP_NEXT): New. * target.def (gen_ccmp_first, gen_ccmp_first): Add two new hooks. diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index e17ffca..988c288 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -6216,6 +6216,42 @@ A typical @code{ctrap} pattern looks like "@dots{}") @end smallexample +@cindex @code{ccmp} instruction pattern +@item @samp{ccmp} +Conditional compare instruction. Operand 2 and 5 are RTLs which perform +two comparisons. Operand 1 is AND or IOR, which operates on the result of +operand 2 and 5. +It uses recursive method to support more than two compares. e.g. + + CC0 = CMP (a, b); + CC1 = CCMP (NE (CC0, 0), CMP (e, f)); + ... + CCn = CCMP (NE (CCn-1, 0), CMP (...)); + +Two target hooks are used to generate conditional compares. GEN_CCMP_FISRT +is used to generate the first CMP. And GEN_CCMP_NEXT is used to generate the +following CCMPs. Operand 1 is AND or IOR. Operand 3 is the result of +GEN_CCMP_FISRT or a previous GEN_CCMP_NEXT. Operand 2 is NE. +Operand 4, 5 and 6 is another compare expression. + +A typical CCMP pattern looks like + +@smallexample +(define_insn "*ccmp_and_ior" + [(set (match_operand 0 "dominant_cc_register" "") +(compare + (match_operator 1 + (match_operator 2 "comparison_operator" + [(match_operand 3 "dominant_cc_register") +(const_int 0)]) + (match_operator 4 "comparison_operator" + [(match_operand 5 "register_operand") +(match_operand 6 "compare_operand"])) + (const_int 0)))] + "" + "@dots{}") +@end smallexample + @cindex @code{prefetch} instruction pattern @item @samp{prefetch} diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index c272630..93f7c74 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11021,6 +11021,23 @@ This target hook is required only when the target has several different modes and they have different conditional execution capability, such as ARM. @end deftypefn +@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_FIRST (int @var{code}, rtx @var{op0}, rtx @var{op1}) +This function emits a comparison insn for the first of a sequence of + conditional comparisions. It returns a comparison expression appropriate + for passing to @code{gen_ccmp_next} or to @code{cbranch_optab}. + @code{unsignedp} is used when converting @code{op0} and @code{op1}'s mode. +@end deftypefn + +@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_NEXT (rtx @var{prev}, int @var{cmp_code}, rtx @var{op0}, rtx @var{op1}, int @var{bit_code}) +This function emits a conditional comparison within a sequence of + conditional comparisons. The @code{prev} expression is the result of a + prior call to @code{gen_ccmp_first} or @code{gen_ccmp_next}. It may return + @code{NULL} if the combination of @code{prev} and this comparison is + not supported, otherwise the result must be appropriate for passing to + @code{gen_ccmp_next} or @code{cbranch_optab}. @code{bit_code} + is AND or IOR, which is the op on the two compares. 
+@end deftypefn + @deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST (unsigned @var{nunroll}, struct loop *@var{loop}) This target hook returns a new value for the number of times @var{loop} should be unrolled. The parameter @var{nunroll} is the number of times diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index dd72b98..e49f8f5 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -8107,6 +8107,10 @@ build_type_attribute_variant (@var{mdecl}, @hook TARGET_HAVE_CONDITIONAL_EXECUTION +@hook TARGET_GEN_CCMP_FIRST + +@hook TARGET_GEN_CCMP_NEXT + @hook TARGET_LOOP_UNROLL_ADJUST @defmac POWI_MAX_MULTS diff --git a/gcc/target.def b/gcc/target.def index e455211..6bbb907 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -2320,6 +2320,27 @@ modes and they have different conditional execution capability, such as ARM.", bool, (void), default_have_conditional_execution) +DEFHOOK +(gen_ccmp_first, + "This function emits a comparison insn for the first of a sequence of\n\ + conditional comparisions. It returns a comparison expression appropriate\n\ + for passing to @code{gen_ccmp_next} or to @code{cbranch_optab}.\
[PATCH, 2/10] prepare ccmp
Hi, The patch makes several functions global, which will be used when expanding ccmp instructions. The other change in this patch is to check CCMP when turning code into jumpy sequence. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-06-23 Zhenqiang Chen * cfgexpand.c (expand_gimple_cond): Check conditional compare. * expmed.c (emit_cstore): Make it global. * expmed.h: #include "insn-codes.h". (emit_cstore): New prototype. * expr.c (expand_operands): Make it global. * expr.h (expand_operands): New prototype. * optabs.c (get_rtx_code): Make it global and return CODE for BIT_AND_EXPR and BIT_IOR_EXPR. * optabs.h (get_rtx_code): New prototype. diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index e8cd87f..a32e1b3 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -2095,9 +2095,10 @@ expand_gimple_cond (basic_block bb, gimple stmt) op0 = gimple_assign_rhs1 (second); op1 = gimple_assign_rhs2 (second); } - /* If jumps are cheap turn some more codes into -jumpy sequences. */ - else if (BRANCH_COST (optimize_insn_for_speed_p (), false) < 4) + /* If jumps are cheap and the target does not support conditional +compare, turn some more codes into jumpy sequences. */ + else if (BRANCH_COST (optimize_insn_for_speed_p (), false) < 4 + && (targetm.gen_ccmp_first == NULL)) { if ((code2 == BIT_AND_EXPR && TYPE_PRECISION (TREE_TYPE (op0)) == 1 diff --git a/gcc/expmed.c b/gcc/expmed.c index e76b6fc..c8d63a9 100644 --- a/gcc/expmed.c +++ b/gcc/expmed.c @@ -5105,7 +5105,7 @@ expand_and (enum machine_mode mode, rtx op0, rtx op1, rtx target) } /* Helper function for emit_store_flag. */ -static rtx +rtx emit_cstore (rtx target, enum insn_code icode, enum rtx_code code, enum machine_mode mode, enum machine_mode compare_mode, int unsignedp, rtx x, rtx y, int normalizep, diff --git a/gcc/expmed.h b/gcc/expmed.h index 4d01d1f..a567bad 100644 --- a/gcc/expmed.h +++ b/gcc/expmed.h @@ -20,6 +20,8 @@ along with GCC; see the file COPYING3. If not see #ifndef EXPMED_H #define EXPMED_H 1 +#include "insn-codes.h" + enum alg_code { alg_unknown, alg_zero, @@ -665,4 +667,9 @@ convert_cost (enum machine_mode to_mode, enum machine_mode from_mode, } extern int mult_by_coeff_cost (HOST_WIDE_INT, enum machine_mode, bool); + +extern rtx emit_cstore (rtx target, enum insn_code icode, enum rtx_code code, +enum machine_mode mode, enum machine_mode compare_mode, +int unsignedp, rtx x, rtx y, int normalizep, +enum machine_mode target_mode); #endif diff --git a/gcc/expr.c b/gcc/expr.c index 512c024..04cf56e 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -146,8 +146,6 @@ static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT, static unsigned HOST_WIDE_INT highest_pow2_factor_for_target (const_tree, const_tree); static int is_aligning_offset (const_tree, const_tree); -static void expand_operands (tree, tree, rtx, rtx*, rtx*, -enum expand_modifier); static rtx reduce_to_bit_field_precision (rtx, rtx, tree); static rtx do_store_flag (sepops, rtx, enum machine_mode); #ifdef PUSH_ROUNDING @@ -7496,7 +7494,7 @@ convert_tree_comp_to_rtx (enum tree_code tcode, int unsignedp) The value may be stored in TARGET if TARGET is nonzero. The MODIFIER argument is as documented by expand_expr. */ -static void +void expand_operands (tree exp0, tree exp1, rtx target, rtx *op0, rtx *op1, enum expand_modifier modifier) { diff --git a/gcc/expr.h b/gcc/expr.h index 6a1d3ab..66ca82f 100644 --- a/gcc/expr.h +++ b/gcc/expr.h @@ -787,4 +787,6 @@ extern bool categorize_ctor_elements (const_tree, HOST_WIDE_INT *, by EXP. 
This does not include any offset in DECL_FIELD_BIT_OFFSET. */ extern tree component_ref_field_offset (tree); +extern void expand_operands (tree, tree, rtx, rtx*, rtx*, +enum expand_modifier); #endif /* GCC_EXPR_H */ diff --git a/gcc/optabs.c b/gcc/optabs.c index ca1c194..25aff1a 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -6453,7 +6453,7 @@ gen_cond_trap (enum rtx_code code, rtx op1, rtx op2, rtx tcode) /* Return rtx code for TCODE. Use UNSIGNEDP to select signed or unsigned operation code. */ -static enum rtx_code +enum rtx_code get_rtx_code (enum tree_code tcode, bool unsignedp) { enum rtx_code code; @@ -6503,6 +6503,12 @@ get_rtx_code (enum tree_code tcode, bool unsignedp) code = LTGT; break; +case BIT_AND_EXPR: + code = AND; + break; +case BIT_IOR_EXPR: + code = IOR; + break; default: gcc_unreachable (); } diff --git a/gcc/optabs.h b/gcc/optabs.h index 089b15a..61be4e2 100644 --- a/gcc/
[PATCH, 3/10] skip swapping operands used in ccmp
Hi, Swapping operands in a ccmp will lead to illegal instructions. So the patch disables it in simplify_while_replacing. The patch is separated from https://gcc.gnu.org/ml/gcc-patches/2014-02/msg01407.html. To make it clean. The patch adds two files: ccmp.{c,h} to hold all new ccmp related functions. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-06-23 Zhenqiang Chen * Makefile.in: Add ccmp.o * ccmp.c: New file. * ccmp.h: New file. * recog.c (simplify_while_replacing): Check ccmp_insn_p. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 5587b75..8757a30 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1169,6 +1169,7 @@ OBJS = \ builtins.o \ caller-save.o \ calls.o \ + ccmp.o \ cfg.o \ cfganal.o \ cfgbuild.o \ diff --git a/gcc/ccmp.c b/gcc/ccmp.c new file mode 100644 index 000..665c2a5 --- /dev/null +++ b/gcc/ccmp.c @@ -0,0 +1,62 @@ +/* Conditional compare related functions + Copyright (C) 2014-2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tm.h" +#include "rtl.h" +#include "tree.h" +#include "stringpool.h" +#include "regs.h" +#include "expr.h" +#include "optabs.h" +#include "tree-iterator.h" +#include "basic-block.h" +#include "tree-ssa-alias.h" +#include "internal-fn.h" +#include "gimple-expr.h" +#include "is-a.h" +#include "gimple.h" +#include "gimple-ssa.h" +#include "tree-ssanames.h" +#include "target.h" +#include "common/common-target.h" +#include "df.h" +#include "tree-ssa-live.h" +#include "tree-outof-ssa.h" +#include "cfgexpand.h" +#include "tree-phinodes.h" +#include "ssa-iterators.h" +#include "expmed.h" +#include "ccmp.h" + +bool +ccmp_insn_p (rtx object) +{ + rtx x = PATTERN (object); + if (targetm.gen_ccmp_first + && GET_CODE (x) == SET + && GET_CODE (XEXP (x, 1)) == COMPARE + && (GET_CODE (XEXP (XEXP (x, 1), 0)) == IOR + || GET_CODE (XEXP (XEXP (x, 1), 0)) == AND)) +return true; + return false; +} + diff --git a/gcc/ccmp.h b/gcc/ccmp.h new file mode 100644 index 000..7e139aa --- /dev/null +++ b/gcc/ccmp.h @@ -0,0 +1,25 @@ +/* Conditional comapre related functions. + Copyright (C) 2014-2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version.: + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. 
*/ + +#ifndef GCC_CCMP_H +#define GCC_CCMP_H + +extern bool ccmp_insn_p (rtx); + +#endif /* GCC_CCMP_H */ diff --git a/gcc/recog.c b/gcc/recog.c index 8d10a4f..b53a28c 100644 --- a/gcc/recog.c +++ b/gcc/recog.c @@ -40,6 +40,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-pass.h" #include "df.h" #include "insn-codes.h" +#include "ccmp.h" #ifndef STACK_PUSH_CODE #ifdef STACK_GROWS_DOWNWARD @@ -577,7 +578,8 @@ simplify_while_replacing (rtx *loc, rtx to, rtx object, enum rtx_code code = GET_CODE (x); rtx new_rtx = NULL_RTX; - if (SWAPPABLE_OPERANDS_P (x) + /* Do not swap compares in conditional compare instruction. */ + if (SWAPPABLE_OPERANDS_P (x) && !ccmp_insn_p (object) && swap_commutative_operands_p (XEXP (x, 0), XEXP (x, 1))) { validate_unshare_change (object, loc,
[PATCH, 4/10] expand ccmp
Hi, This patch includes the main logic to expand ccmp instructions. In the patch, * ccmp_candidate_p is used to identify the CCMP candidate * expand_ccmp_expr is the main entry, which calls expand_ccmp_expr_1 to expand CCMP. * expand_ccmp_expr_1 uses a recursive algorithm to expand CCMP. It calls gen_ccmp_first and gen_ccmp_next to generate CCMP instructions. During expanding, we must make sure that no instruction can clobber the CC reg except the compares. So clobber_cc_p and check_clobber_cc are introduced to do the check. * If the final result is not used in a COND_EXPR (checked by function used_in_cond_stmt_p), it calls cstorecc4 pattern to store the CC to a general register. Bootstrap and no make check regression on X86-64. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-06-23 Zhenqiang Chen * ccmp.c (ccmp_candidate_p, used_in_cond_stmt_p, check_clobber_cc, clobber_cc_p, expand_ccmp_next, expand_ccmp_expr_1, expand_ccmp_expr): New functions to expand ccmp. * ccmp.h (expand_ccmp_expr): New prototype. * expr.c: #include "ccmp.h" (expand_expr_real_1): Try to expand ccmp. diff --git a/gcc/ccmp.c b/gcc/ccmp.c index 665c2a5..97b3910 100644 --- a/gcc/ccmp.c +++ b/gcc/ccmp.c @@ -47,6 +47,262 @@ along with GCC; see the file COPYING3. If not see #include "expmed.h" #include "ccmp.h" +/* The following functions expand conditional compare (CCMP) instructions. + Here is a short description about the over all algorithm: + * ccmp_candidate_p is used to identify the CCMP candidate + + * expand_ccmp_expr is the main entry, which calls expand_ccmp_expr_1 + to expand CCMP. + + * expand_ccmp_expr_1 uses a recursive algorithm to expand CCMP. + It calls two target hooks gen_ccmp_first and gen_ccmp_next to generate + CCMP instructions. +- gen_ccmp_first expands the first compare in CCMP. +- gen_ccmp_next expands the following compares. + + During expanding, we must make sure that no instruction can clobber the + CC reg except the compares. So clobber_cc_p and check_clobber_cc are + introduced to do the check. + + * If the final result is not used in a COND_EXPR (checked by function + used_in_cond_stmt_p), it calls cstorecc4 pattern to store the CC to a + general register. */ + +/* Check whether G is a potential conditional compare candidate. */ +static bool +ccmp_candidate_p (gimple g) +{ + tree rhs = gimple_assign_rhs_to_tree (g); + tree lhs, op0, op1; + gimple gs0, gs1; + enum tree_code tcode, tcode0, tcode1; + tcode = TREE_CODE (rhs); + + if (tcode != BIT_AND_EXPR && tcode != BIT_IOR_EXPR) +return false; + + lhs = gimple_assign_lhs (g); + op0 = TREE_OPERAND (rhs, 0); + op1 = TREE_OPERAND (rhs, 1); + + if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME) + || !has_single_use (lhs)) +return false; + + gs0 = get_gimple_for_ssa_name (op0); + gs1 = get_gimple_for_ssa_name (op1); + if (!gs0 || !gs1 || !is_gimple_assign (gs0) || !is_gimple_assign (gs1) + /* g, gs0 and gs1 must be in the same basic block, since current stage +is out-of-ssa. We can not guarantee the correctness when forwording +the gs0 and gs1 into g whithout DATAFLOW analysis. 
*/ + || gimple_bb (gs0) != gimple_bb (gs1) + || gimple_bb (gs0) != gimple_bb (g)) +return false; + + if (!(INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0))) + || POINTER_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0 + || !(INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1))) + || POINTER_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1) +return false; + + tcode0 = gimple_assign_rhs_code (gs0); + tcode1 = gimple_assign_rhs_code (gs1); + if (TREE_CODE_CLASS (tcode0) == tcc_comparison + && TREE_CODE_CLASS (tcode1) == tcc_comparison) +return true; + if (TREE_CODE_CLASS (tcode0) == tcc_comparison + && ccmp_candidate_p (gs1)) +return true; + else if (TREE_CODE_CLASS (tcode1) == tcc_comparison + && ccmp_candidate_p (gs0)) +return true; + /* We skip ccmp_candidate_p (gs1) && ccmp_candidate_p (gs0) since + there is no way to set the CC flag. */ + return false; +} + +/* Check whether EXP is used in a GIMPLE_COND statement or not. */ +static bool +used_in_cond_stmt_p (tree exp) +{ + bool expand_cond = false; + imm_use_iterator ui; + gimple use_stmt; + FOR_EACH_IMM_USE_STMT (use_stmt, ui, exp) +if (gimple_code (use_stmt) == GIMPLE_COND) + { + tree op1 = gimple_cond_rhs (use_stmt); + if (integer_zerop (op1)) + expand_cond = true; + BREAK_FROM_IMM_USE_STMT (ui); + } + return expand_cond; +} + +/* If SETTER clobber CC reg, set DATA to TRUE. */ +static void +check_clobber_cc (rtx reg, const_rtx setter, void *data) +{ + if (GET_CODE (setter) == CLOBBER
[PATCH, 5/10] aarch64: add ccmp operand predicate
Hi, The patches defines ccmp operand predicate for AARCH64. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-06-23 Zhenqiang Chen * config/aarch64/aarch64-protos.h (aarch64_uimm5): New prototype. * config/aarch64/constraints.md (Usn): Immediate for ccmn. * config/aarch64/predicates.md (aarch64_ccmp_immediate): New. (aarch64_ccmp_operand): New. * config/aarch64/aarch64.c (aarch64_uimm5): New function. diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index c4f75b3..997ff50 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -246,6 +246,8 @@ void aarch64_init_expanders (void); void aarch64_print_operand (FILE *, rtx, char); void aarch64_print_operand_address (FILE *, rtx); +bool aarch64_uimm5 (HOST_WIDE_INT); + /* Initialize builtins for SIMD intrinsics. */ void init_aarch64_simd_builtins (void); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index f2968ff..ecf88f9 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -9566,6 +9566,13 @@ aarch64_expand_movmem (rtx *operands) return true; } +/* Return true if val can be encoded as a 5-bit unsigned immediate. */ +bool +aarch64_uimm5 (HOST_WIDE_INT val) +{ + return (val & (HOST_WIDE_INT) 0x1f) == val; +} + #undef TARGET_ADDRESS_COST #define TARGET_ADDRESS_COST aarch64_address_cost diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md index 807d0b1..bb6a8a1 100644 --- a/gcc/config/aarch64/constraints.md +++ b/gcc/config/aarch64/constraints.md @@ -89,6 +89,11 @@ (and (match_code "const_int") (match_test "(unsigned HOST_WIDE_INT) ival < 32"))) +(define_constraint "Usn" + "A constant that can be used with a CCMN operation (once negated)." + (and (match_code "const_int") + (match_test "aarch64_uimm5 (-ival)"))) + (define_constraint "Usd" "@internal A constraint that matches an immediate shift constant in DImode." diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 2702a3c..dd35714 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -30,6 +30,15 @@ (ior (match_code "symbol_ref") (match_operand 0 "register_operand"))) +(define_predicate "aarch64_ccmp_immediate" + (and (match_code "const_int") + (ior (match_test "aarch64_uimm5 (INTVAL (op))") + (match_test "aarch64_uimm5 (-INTVAL (op))" + +(define_predicate "aarch64_ccmp_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "aarch64_ccmp_immediate"))) + (define_predicate "aarch64_simd_register" (and (match_code "reg") (ior (match_test "REGNO_REG_CLASS (REGNO (op)) == FP_LO_REGS")
[PATCH, 6/10] aarch64: add ccmp CC mode
Hi, The patches add a set of CC mode for AARCH64, which is similar as them for ARM. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-06-23 Zhenqiang Chen * config/aarch64/aarch64-modes.def: Define new CC modes for ccmp. * config/aarch64/aarch64.c (aarch64_get_condition_code_1): New prototype. (aarch64_get_condition_code): Call aarch64_get_condition_code_1. (aarch64_get_condition_code_1): New function to handle ccmp CC mode. * config/aarch64/predicates.md (ccmp_cc_register): New. diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def index 1d2cc76..71fd2f0 100644 --- a/gcc/config/aarch64/aarch64-modes.def +++ b/gcc/config/aarch64/aarch64-modes.def @@ -25,6 +25,16 @@ CC_MODE (CC_ZESWP); /* zero-extend LHS (but swap to make it RHS). */ CC_MODE (CC_SESWP); /* sign-extend LHS (but swap to make it RHS). */ CC_MODE (CC_NZ);/* Only N and Z bits of condition flags are valid. */ CC_MODE (CC_Z); /* Only Z bit of condition flags is valid. */ +CC_MODE (CC_DNE); +CC_MODE (CC_DEQ); +CC_MODE (CC_DLE); +CC_MODE (CC_DLT); +CC_MODE (CC_DGE); +CC_MODE (CC_DGT); +CC_MODE (CC_DLEU); +CC_MODE (CC_DLTU); +CC_MODE (CC_DGEU); +CC_MODE (CC_DGTU); /* Vector modes. */ VECTOR_MODES (INT, 8);/* V8QI V4HI V2SI. */ diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ecf88f9..e5ede6e 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -3460,6 +3460,9 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y) } static unsigned +aarch64_get_condition_code_1 (enum machine_mode, enum rtx_code); + +static unsigned aarch64_get_condition_code (rtx x) { enum machine_mode mode = GET_MODE (XEXP (x, 0)); @@ -3467,7 +3470,12 @@ aarch64_get_condition_code (rtx x) if (GET_MODE_CLASS (mode) != MODE_CC) mode = SELECT_CC_MODE (comp_code, XEXP (x, 0), XEXP (x, 1)); + return aarch64_get_condition_code_1 (mode, comp_code); +} +static unsigned +aarch64_get_condition_code_1 (enum machine_mode mode, enum rtx_code comp_code) +{ switch (mode) { case CCFPmode: @@ -3490,6 +3498,27 @@ aarch64_get_condition_code (rtx x) } break; +case CC_DNEmode: + return comp_code == NE ? AARCH64_NE : AARCH64_EQ; +case CC_DEQmode: + return comp_code == NE ? AARCH64_EQ : AARCH64_NE; +case CC_DGEmode: + return comp_code == NE ? AARCH64_GE : AARCH64_LT; +case CC_DLTmode: + return comp_code == NE ? AARCH64_LT : AARCH64_GE; +case CC_DGTmode: + return comp_code == NE ? AARCH64_GT : AARCH64_LE; +case CC_DLEmode: + return comp_code == NE ? AARCH64_LE : AARCH64_GT; +case CC_DGEUmode: + return comp_code == NE ? AARCH64_CS : AARCH64_CC; +case CC_DLTUmode: + return comp_code == NE ? AARCH64_CC : AARCH64_CS; +case CC_DGTUmode: + return comp_code == NE ? AARCH64_HI : AARCH64_LS; +case CC_DLEUmode: + return comp_code == NE ? 
AARCH64_LS : AARCH64_HI; + case CCmode: switch (comp_code) { diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index dd35714..ab02fd0 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -39,6 +39,23 @@ (ior (match_operand 0 "register_operand") (match_operand 0 "aarch64_ccmp_immediate"))) +(define_special_predicate "ccmp_cc_register" + (and (match_code "reg") + (and (match_test "REGNO (op) == CC_REGNUM") + (ior (match_test "mode == GET_MODE (op)") +(match_test "mode == VOIDmode + && (GET_MODE (op) == CC_DNEmode + || GET_MODE (op) == CC_DEQmode + || GET_MODE (op) == CC_DLEmode + || GET_MODE (op) == CC_DLTmode + || GET_MODE (op) == CC_DGEmode + || GET_MODE (op) == CC_DGTmode + || GET_MODE (op) == CC_DLEUmode + || GET_MODE (op) == CC_DLTUmode + || GET_MODE (op) == CC_DGEUmode + || GET_MODE (op) == CC_DGTUmode)" +) + (define_predicate "aarch64_simd_register" (and (match_code "reg") (ior (match_test "REGNO_REG_CLASS (REGNO (op)) == FP_LO_REGS")
[PATCH, 7/10] aarch64: add function to output ccmp insn
Hi, The patch adds three help functions to output ccmp instructions. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-06-23 Zhenqiang Chen * config/aarch64/aarch64-protos.h (aarch64_output_ccmp): New prototype. * config/aarch64/aarch64.c (aarch64_code_to_nzcv): New function. (aarch64_mode_to_condition_code): New function. (aarch64_output_ccmp): New function. diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 997ff50..ff1a0f4 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -247,6 +247,7 @@ void aarch64_print_operand (FILE *, rtx, char); void aarch64_print_operand_address (FILE *, rtx); bool aarch64_uimm5 (HOST_WIDE_INT); +const char* aarch64_output_ccmp (rtx *, bool, int); /* Initialize builtins for SIMD intrinsics. */ void init_aarch64_simd_builtins (void); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index e5ede6e..5fe4826 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -9602,6 +9602,137 @@ aarch64_uimm5 (HOST_WIDE_INT val) return (val & (HOST_WIDE_INT) 0x1f) == val; } +/* N Z C V. */ +#define AARCH64_CC_V 1 +#define AARCH64_CC_C (1 << 1) +#define AARCH64_CC_Z (1 << 2) +#define AARCH64_CC_N (1 << 3) + +static unsigned int +aarch64_code_to_nzcv (enum rtx_code code, bool inverse) +{ + switch (code) +{ +case NE: /* NE, Z == 0. */ + return inverse ? AARCH64_CC_Z : 0; +case EQ: /* EQ, Z == 1. */ + return inverse ? 0 : AARCH64_CC_Z; +case LE: /* LE, !(Z == 0 && N == V). */ + return inverse ? AARCH64_CC_N | AARCH64_CC_V : AARCH64_CC_Z; +case GT: /* GT, Z == 0 && N == V. */ + return inverse ? AARCH64_CC_Z : AARCH64_CC_N | AARCH64_CC_V; +case LT: /* LT, N != V. */ + return inverse ? AARCH64_CC_N | AARCH64_CC_V : AARCH64_CC_N; +case GE: /* GE, N == V. */ + return inverse ? AARCH64_CC_N : AARCH64_CC_N | AARCH64_CC_V; +case LEU: /* LS, !(C == 1 && Z == 0). */ + return inverse ? AARCH64_CC_C: AARCH64_CC_Z; +case GTU: /* HI, C ==1 && Z == 0. */ + return inverse ? AARCH64_CC_Z : AARCH64_CC_C; +case LTU: /* CC, C == 0. */ + return inverse ? AARCH64_CC_C : 0; +case GEU: /* CS, C == 1. */ + return inverse ? 0 : AARCH64_CC_C; +default: + gcc_unreachable (); + return 0; +} +} + +static unsigned +aarch64_mode_to_condition_code (enum machine_mode mode, bool inverse) +{ + switch (mode) +{ +case CC_DNEmode: + return inverse ? aarch64_get_condition_code_1 (CCmode, EQ) + : aarch64_get_condition_code_1 (CCmode, NE); +case CC_DEQmode: + return inverse ? aarch64_get_condition_code_1 (CCmode, NE) + : aarch64_get_condition_code_1 (CCmode, EQ); +case CC_DLEmode: + return inverse ? aarch64_get_condition_code_1 (CCmode, GT) + : aarch64_get_condition_code_1 (CCmode, LE); +case CC_DGTmode: + return inverse ? aarch64_get_condition_code_1 (CCmode, LE) + : aarch64_get_condition_code_1 (CCmode, GT); +case CC_DLTmode: + return inverse ? aarch64_get_condition_code_1 (CCmode, GE) + : aarch64_get_condition_code_1 (CCmode, LT); +case CC_DGEmode: + return inverse ? aarch64_get_condition_code_1 (CCmode, LT) + : aarch64_get_condition_code_1 (CCmode, GE); +case CC_DLEUmode: + return inverse ? aarch64_get_condition_code_1 (CCmode, GTU) + : aarch64_get_condition_code_1 (CCmode, LEU); +case CC_DGTUmode: + return inverse ? aarch64_get_condition_code_1 (CCmode, LEU) + : aarch64_get_condition_code_1 (CCmode, GTU); +case CC_DLTUmode: + return inverse ? aarch64_get_condition_code_1 (CCmode, GEU) + : aarch64_get_condition_code_1 (CCmode, LTU); +case CC_DGEUmode: + return inverse ? 
aarch64_get_condition_code_1 (CCmode, LTU) + : aarch64_get_condition_code_1 (CCmode, GEU); +default: + gcc_unreachable (); +} +} + +const char * +aarch64_output_ccmp (rtx *operands, bool is_and, int which_alternative) +{ + char buf[32]; + rtx cc = operands[0]; + enum rtx_code code = GET_CODE (operands[5]); + unsigned char nzcv = aarch64_code_to_nzcv (code, is_and); + enum machine_mode mode = GET_MODE (cc); + unsigned int cond_code = aarch64_mode_to_condition_code (mode, !is_and); + + gcc_assert (GET_MODE (operands[2]) == SImode + || GET_MODE (operands[2]) == DImode); + + if (GET_MODE (operands[2]) == SImode) +switch (which_alternative) + { + case 0: + snprintf (buf, sizeof (buf), "ccmp\t%%w2, %%w3, #%u, %s", + nzcv, aarch64_condition_codes[cond_code]); + break; + case 1: + snprintf (buf, sizeof (buf), "ccmp\t
[PATCH, 9/10] aarch64: generate conditional compare instructions
Hi, The patches implements the two hooks for AARCH64 to generate ccmp instructions. Bootstrap and no make check regression on qemu. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-06-23 Zhenqiang Chen * config/aarch64/aarch64.c (aarch64_code_to_ccmode): New function. (aarch64_convert_mode, aarch64_convert_mode): New functions. (aarch64_gen_ccmp_first, aarch64_gen_ccmp_next): New functions. (TARGET_GEN_CCMP_FIRST, TARGET_GEN_CCMP_NEXT): Define the two hooks. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 4e8d55b..6f08e38 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -9601,6 +9601,137 @@ aarch64_uimm5 (HOST_WIDE_INT val) return (val & (HOST_WIDE_INT) 0x1f) == val; } +static enum machine_mode +aarch64_code_to_ccmode (enum rtx_code code) +{ + switch (code) +{ +case NE: + return CC_DNEmode; +case EQ: + return CC_DEQmode; +case LE: + return CC_DLEmode; +case LT: + return CC_DLTmode; +case GE: + return CC_DGEmode; +case GT: + return CC_DGTmode; +case LEU: + return CC_DLEUmode; +case LTU: + return CC_DLTUmode; +case GEU: + return CC_DGEUmode; +case GTU: + return CC_DGTUmode; +default: + return CCmode; +} +} + +static bool +aarch64_convert_mode (rtx* op0, rtx* op1, int unsignedp) +{ + enum machine_mode mode; + + mode = GET_MODE (*op0); + if (mode == VOIDmode) +mode = GET_MODE (*op1); + + if (mode == QImode || mode == HImode) +{ + *op0 = convert_modes (SImode, mode, *op0, unsignedp); + *op1 = convert_modes (SImode, mode, *op1, unsignedp); +} + else if (mode != SImode && mode != DImode) +return false; + + return true; +} + +static rtx +aarch64_gen_ccmp_first (int code, rtx op0, rtx op1) +{ + enum machine_mode mode; + rtx cmp, target; + int unsignedp = code == LTU || code == LEU || code == GTU || code == GEU; + + mode = GET_MODE (op0); + if (mode == VOIDmode) +mode = GET_MODE (op1); + + if (mode == VOIDmode) +return NULL_RTX; + + /* Make op0 and op1 are legal operands for cmp. */ + if (!register_operand (op0, GET_MODE (op0))) +op0 = force_reg (mode, op0); + if (!aarch64_plus_operand (op1, GET_MODE (op1))) +op1 = force_reg (mode, op1); + + if (!aarch64_convert_mode (&op0, &op1, unsignedp)) +return NULL_RTX; + + mode = aarch64_code_to_ccmode ((enum rtx_code) code); + if (mode == CCmode) +return NULL_RTX; + + cmp = gen_rtx_fmt_ee (COMPARE, CCmode, op0, op1); + target = gen_rtx_REG (mode, CC_REGNUM); + emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (CCmode, CC_REGNUM), cmp)); + return target; +} + +static rtx +aarch64_gen_ccmp_next (rtx prev, int cmp_code, rtx op0, rtx op1, int bit_code) +{ + rtx cmp0, cmp1, target, bit_op; + enum machine_mode mode; + int unsignedp = cmp_code == LTU || cmp_code == LEU + || cmp_code == GTU || cmp_code == GEU; + + mode = GET_MODE (op0); + if (mode == VOIDmode) +mode = GET_MODE (op1); + + if (mode == VOIDmode) +return NULL_RTX; + + /* Give up if the operand is illegal since force_reg will introduce + additional overhead. */ + if (!register_operand (op0, GET_MODE (op0)) + || !aarch64_ccmp_operand (op1, GET_MODE (op1))) +return NULL_RTX; + + if (!aarch64_convert_mode (&op0, &op1, unsignedp)) +return NULL_RTX; + + mode = aarch64_code_to_ccmode ((enum rtx_code) cmp_code); + if (mode == CCmode) +return NULL_RTX; + + cmp1 = gen_rtx_fmt_ee ((enum rtx_code) cmp_code, SImode, op0, op1); + + cmp0 = gen_rtx_fmt_ee (NE, SImode, prev, const0_rtx); + + bit_op = gen_rtx_fmt_ee ((enum rtx_code) bit_code, SImode, cmp0, cmp1); + + /* Generate insn to match ccmp_and/ccmp_ior. 
*/ + target = gen_rtx_REG (mode, CC_REGNUM); + emit_insn (gen_rtx_SET (VOIDmode, target, + gen_rtx_fmt_ee (COMPARE, VOIDmode, + bit_op, const0_rtx))); + return target; +} + +#undef TARGET_GEN_CCMP_FIRST +#define TARGET_GEN_CCMP_FIRST aarch64_gen_ccmp_first + +#undef TARGET_GEN_CCMP_NEXT +#define TARGET_GEN_CCMP_NEXT aarch64_gen_ccmp_next + #undef TARGET_ADDRESS_COST #define TARGET_ADDRESS_COST aarch64_address_cost
[PATCH, 10/10] aarch64: Handle ccmp in ifcvt to make it work with cmov
Hi, The patch enhances ifcvt to handle conditional compare instruction (ccmp) to make it work with cmov. For ccmp, ALLOW_CC_MODE is set to TRUE when calling canonicalize_condition. And the backend does not need to generate additional "compare (CC, 0)" for it. Bootstrap and no check regression on qemu. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-06-23 Zhenqiang Chen * config/aarch64/aarch64.md (movcc): Handle ccmp_cc. * ifcvt.c: #include "ccmp.h". (struct noce_if_info): Add a new field ccmp_p. (noce_emit_cmove): Allow ccmp condition. (noce_get_alt_condition): Call canonicalize_condition with ccmp_p. (noce_get_condition): Set ALLOW_CC_MODE to TRUE for ccmp. (noce_process_if_block): Set ccmp_p for ccmp. testsuite/ChangeLog: 2014-06-23 Zhenqiang Chen * gcc.target/aarch64/ccmn-csel-1.c: New testcase. * gcc.target/aarch64/ccmn-csel-2.c: New testcase. * gcc.target/aarch64/ccmn-csel-3.c: New testcase. * gcc.target/aarch64/ccmp-csel-1.c: New testcase. * gcc.target/aarch64/ccmp-csel-2.c: New testcase. * gcc.target/aarch64/ccmp-csel-3.c: New testcase. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index fcc5559..82cc561 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2459,15 +2459,19 @@ (match_operand:ALLI 3 "register_operand" "")))] "" { -rtx ccreg; enum rtx_code code = GET_CODE (operands[1]); if (code == UNEQ || code == LTGT) FAIL; -ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0), - XEXP (operands[1], 1)); -operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx); +if (!ccmp_cc_register (XEXP (operands[1], 0), + GET_MODE (XEXP (operands[1], 0 + { + rtx ccreg; + ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0), +XEXP (operands[1], 1)); + operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx); + } } ) diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 2ca2278..8ee1266 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -43,6 +43,7 @@ #include "vec.h" #include "pointer-set.h" #include "dbgcnt.h" +#include "ccmp.h" #ifndef HAVE_conditional_move #define HAVE_conditional_move 0 @@ -786,6 +787,9 @@ struct noce_if_info /* Estimated cost of the particular branch instruction. */ int branch_cost; + + /* The COND is a conditional compare or not. */ + bool ccmp_p; }; static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int); @@ -1407,9 +1411,16 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code, end_sequence (); } - /* Don't even try if the comparison operands are weird. */ - if (! general_operand (cmp_a, GET_MODE (cmp_a)) - || ! general_operand (cmp_b, GET_MODE (cmp_b))) + /* Don't even try if the comparison operands are weird + except conditional compare. */ + if (if_info->ccmp_p) +{ + if (!(GET_MODE_CLASS (GET_MODE (cmp_a)) == MODE_CC + || GET_MODE_CLASS (GET_MODE (cmp_b)) == MODE_CC)) + return NULL_RTX; +} + else if (! general_operand (cmp_a, GET_MODE (cmp_a)) + || ! general_operand (cmp_b, GET_MODE (cmp_b))) return NULL_RTX; #if HAVE_conditional_move @@ -1849,7 +1860,7 @@ noce_get_alt_condition (struct noce_if_info *if_info, rtx target, } cond = canonicalize_condition (if_info->jump, cond, reverse, -earliest, target, false, true); +earliest, target, if_info->ccmp_p, true); if (! cond || ! reg_mentioned_p (target, cond)) return NULL; @@ -2300,6 +2311,7 @@ noce_get_condition (rtx jump, rtx *earliest, bool then_else_reversed) { rtx cond, set, tmp; bool reverse; + int allow_cc_mode = false; if (! 
any_condjump_p (jump)) return NULL_RTX; @@ -2333,10 +2345,21 @@ noce_get_condition (rtx jump, rtx *earliest, bool then_else_reversed) return cond; } + /* For conditional compare, set ALLOW_CC_MODE to TRUE. */ + if (targetm.gen_ccmp_first) +{ + rtx prev = prev_nonnote_nondebug_insn (jump); + if (prev + && NONJUMP_INSN_P (prev) + && BLOCK_FOR_INSN (prev) == BLOCK_FOR_INSN (jump) + && ccmp_insn_p (prev)) + allow_cc_mode = true; +} + /* Otherwise, fall back on canonicalize_condition to do the dirty work of manipulating MODE_CC values and COMPARE rtx codes. */ tmp = canonicalize_condition (jump, cond, reverse, earliest, - NULL_RTX, false, true); + NULL_RTX, allow_cc_mode, true); /* We don't handle side-effects in the conditio
[PATCH, 8/10] aarch64: ccmp insn patterns
Hi, The patch adds two insn patterns for ccmp instructions. cbranchcc4 is introduced to generate optimized conditional branch without an additional compare against the result of ccmp. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-06-23 Zhenqiang Chen * config/aarch64/aarch64.md (cbranchcc4): New. (*ccmp_and, *ccmp_ior): New. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a4d8887..c25d940 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -230,6 +230,52 @@ " ) +(define_expand "cbranchcc4" + [(set (pc) (if_then_else + (match_operator 0 "aarch64_comparison_operator" + [(match_operand 1 "cc_register" "") + (const_int 0)]) + (label_ref (match_operand 3 "" "")) + (pc)))] + "" + " ") + +(define_insn "*ccmp_and" + [(set (match_operand 6 "ccmp_cc_register" "") + (compare +(and:SI + (match_operator 4 "aarch64_comparison_operator" + [(match_operand 0 "ccmp_cc_register" "") + (match_operand 1 "aarch64_plus_operand" "")]) + (match_operator 5 "aarch64_comparison_operator" + [(match_operand:GPI 2 "register_operand" "r,r,r") + (match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn")])) +(const_int 0)))] + "" + { +return aarch64_output_ccmp (operands, true, which_alternative); + } + [(set_attr "type" "alus_reg,alus_imm,alus_imm")] +) + +(define_insn "*ccmp_ior" + [(set (match_operand 6 "ccmp_cc_register" "") + (compare +(ior:SI + (match_operator 4 "aarch64_comparison_operator" + [(match_operand 0 "ccmp_cc_register" "") + (match_operand 1 "aarch64_plus_operand" "")]) + (match_operator 5 "aarch64_comparison_operator" + [(match_operand:GPI 2 "register_operand" "r,r,r") + (match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn")])) +(const_int 0)))] + "" + { +return aarch64_output_ccmp (operands, false, which_alternative); + } + [(set_attr "type" "alus_reg,alus_imm,alus_imm")] +) + (define_insn "*condjump" [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" [(match_operand 1 "cc_register" "") (const_int 0)]) 57,1
Re: [PATCH, 10/10] aarch64: Handle ccmp in ifcvt to make it work with cmov
On 23 June 2014 15:09, Andrew Pinski wrote: > On Mon, Jun 23, 2014 at 12:01 AM, Zhenqiang Chen > wrote: >> Hi, >> >> The patch enhances ifcvt to handle conditional compare instruction >> (ccmp) to make it work with cmov. For ccmp, ALLOW_CC_MODE is set to >> TRUE when calling canonicalize_condition. And the backend does not >> need to generate additional "compare (CC, 0)" for it. >> >> Bootstrap and no check regression on qemu. >> >> OK for trunk? >> >> Thanks! >> -Zhenqiang >> >> ChangeLog: >> 2014-06-23 Zhenqiang Chen >> >> * config/aarch64/aarch64.md (movcc): Handle ccmp_cc. >> * ifcvt.c: #include "ccmp.h". >> (struct noce_if_info): Add a new field ccmp_p. >> (noce_emit_cmove): Allow ccmp condition. >> (noce_get_alt_condition): Call canonicalize_condition with ccmp_p. >> (noce_get_condition): Set ALLOW_CC_MODE to TRUE for ccmp. >> (noce_process_if_block): Set ccmp_p for ccmp. >> >> testsuite/ChangeLog: >> 2014-06-23 Zhenqiang Chen >> >> * gcc.target/aarch64/ccmn-csel-1.c: New testcase. >> * gcc.target/aarch64/ccmn-csel-2.c: New testcase. >> * gcc.target/aarch64/ccmn-csel-3.c: New testcase. >> * gcc.target/aarch64/ccmp-csel-1.c: New testcase. >> * gcc.target/aarch64/ccmp-csel-2.c: New testcase. >> * gcc.target/aarch64/ccmp-csel-3.c: New testcase. >> >> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md >> index fcc5559..82cc561 100644 >> --- a/gcc/config/aarch64/aarch64.md >> +++ b/gcc/config/aarch64/aarch64.md >> @@ -2459,15 +2459,19 @@ >>(match_operand:ALLI 3 "register_operand" "")))] >>"" >>{ >> -rtx ccreg; >> enum rtx_code code = GET_CODE (operands[1]); >> >> if (code == UNEQ || code == LTGT) >>FAIL; >> >> -ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0), >> - XEXP (operands[1], 1)); >> -operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx); >> +if (!ccmp_cc_register (XEXP (operands[1], 0), >> + GET_MODE (XEXP (operands[1], 0 >> + { >> + rtx ccreg; >> + ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0), >> +XEXP (operands[1], 1)); >> + operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx); >> + } >>} >> ) >> > > > You should do the same thing for the FP one. The change to aarch64.md > is exactly the same patch which I came up with. Thanks for the comments. For AARCH64, we can mix INT and FP compares. But FP compare would be slower than INT compare. CMP FCCMP FCMP CCMP FCMP FCCMP I have no enough resource to collect benchmark results to approve them valuable. So the patches did not handle FP at all. If you had approved CCMP for FP valuable, I will work out a separate patch to support it. Or you can share your patches. Thanks! -Zhenqiang > For the rest I actually I have a late phi-opt pass which does the > conversion into COND_EXPR. That is I don't change ifcvt at all. > > And then I needed two more patches after that to get conditional > compares to work with cmov's. Thanks. Any patch to improve ccmp is welcome. 
-Zhenqiang > The following patch which fixes up expand_cond_expr_using_cmove to > handle CCmode correctly: > --- a/gcc/expr.c > +++ b/gcc/expr.c > @@ -7989,7 +7989,9 @@ expand_cond_expr_using_cmove (tree treeop0 > ATTRIBUTE_UNUSED, >op00 = expand_normal (treeop0); >op01 = const0_rtx; >comparison_code = NE; > - comparison_mode = TYPE_MODE (TREE_TYPE (treeop0)); > + comparison_mode = GET_MODE (op00); > + if (comparison_mode == VOIDmode) > + comparison_mode = TYPE_MODE (TREE_TYPE (treeop0)); > } > >if (GET_MODE (op1) != mode) > > > --- CUT --- > And then this one to have ccmp to be expanded from the tree level: > index cfc4a16..056e9b0 100644 (file) > --- a/gcc/expr.c > +++ b/gcc/expr.c > @@ -9300,26 +9300,36 @@ used_in_cond_stmt_p (tree exp) >imm_use_iterator ui; >gimple use_stmt; >FOR_EACH_IMM_USE_STMT (use_stmt, ui, exp) > -if (gimple_code (use_stmt) == GIMPLE_COND) > - { > - tree op1 = gimple_cond_rhs (use_stmt); > - /* TBD: If we can convert all > - _Bool t; > +{ > + if (gimple_code (use_stmt) == GIMPLE_COND) > + { > +
Re: [PATCH, 3/10] skip swapping operands used in ccmp
On 26 June 2014 05:03, Jeff Law wrote: > On 06/25/14 08:44, Richard Earnshaw wrote: >> >> On 23/06/14 07:58, Zhenqiang Chen wrote: >>> >>> Hi, >>> >>> Swapping operands in a ccmp will lead to illegal instructions. So the >>> patch disables it in simplify_while_replacing. >>> >>> The patch is separated from >>> https://gcc.gnu.org/ml/gcc-patches/2014-02/msg01407.html. >>> >>> To make it clean. The patch adds two files: ccmp.{c,h} to hold all new >>> ccmp related functions. >>> >>> OK for trunk? >>> >>> Thanks! >>> -Zhenqiang >>> >>> ChangeLog: >>> 2014-06-23 Zhenqiang Chen >>> >>> * Makefile.in: Add ccmp.o >>> * ccmp.c: New file. >> >> >> Do we really need a new file for one 10-line function? Seems like >> overkill. I think it would be better to just drop this function into >> recog.c. > > Right. And if we did want a new file, clearly the #includes need to be > trimmed :-) Yes. It is not necessary for this patch itself. The file is in need for "[PATCH, 4/10] expand ccmp". Overall it will have more than 300 lines of codes. And #includes are trimmed. Previously all codes were in expr.c. It always conflicted when rebasing. So I move them in separate files. Thanks! -Zhenqiang
Re: [PATCH, 3/10] skip swapping operands used in ccmp
On 25 June 2014 22:44, Richard Earnshaw wrote: > On 23/06/14 07:58, Zhenqiang Chen wrote: >> Hi, >> >> Swapping operands in a ccmp will lead to illegal instructions. So the >> patch disables it in simplify_while_replacing. >> >> The patch is separated from >> https://gcc.gnu.org/ml/gcc-patches/2014-02/msg01407.html. >> >> To make it clean. The patch adds two files: ccmp.{c,h} to hold all new >> ccmp related functions. >> >> OK for trunk? >> >> Thanks! >> -Zhenqiang >> >> ChangeLog: >> 2014-06-23 Zhenqiang Chen >> >> * Makefile.in: Add ccmp.o >> * ccmp.c: New file. > > Do we really need a new file for one 10-line function? Seems like > overkill. I think it would be better to just drop this function into > recog.c. > > Also, can you explain more clearly what the problem is with swapping the > operands? If this can't be done, then SWAPPABLE_OPERANDS is arguably > doing the wrong thing; and that might mean that rtx class you've applied > to your new code is incorrect. Thanks for the comments. In previous tests, I got several new fails if the operands were swapped. I will try to reproduce it and back to you. Thanks! -Zhenqiang > R. > >> * ccmp.h: New file. >> * recog.c (simplify_while_replacing): Check ccmp_insn_p. >> >> diff --git a/gcc/Makefile.in b/gcc/Makefile.in >> index 5587b75..8757a30 100644 >> --- a/gcc/Makefile.in >> +++ b/gcc/Makefile.in >> @@ -1169,6 +1169,7 @@ OBJS = \ >> builtins.o \ >> caller-save.o \ >> calls.o \ >> + ccmp.o \ >> cfg.o \ >> cfganal.o \ >> cfgbuild.o \ >> diff --git a/gcc/ccmp.c b/gcc/ccmp.c >> new file mode 100644 >> index 000..665c2a5 >> --- /dev/null >> +++ b/gcc/ccmp.c >> @@ -0,0 +1,62 @@ >> +/* Conditional compare related functions >> + Copyright (C) 2014-2014 Free Software Foundation, Inc. >> + >> +This file is part of GCC. >> + >> +GCC is free software; you can redistribute it and/or modify it under >> +the terms of the GNU General Public License as published by the Free >> +Software Foundation; either version 3, or (at your option) any later >> +version. >> + >> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY >> +WARRANTY; without even the implied warranty of MERCHANTABILITY or >> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License >> +for more details. >> + >> +You should have received a copy of the GNU General Public License >> +along with GCC; see the file COPYING3. If not see >> +<http://www.gnu.org/licenses/>. 
*/ >> + >> +#include "config.h" >> +#include "system.h" >> +#include "coretypes.h" >> +#include "tm.h" >> +#include "rtl.h" >> +#include "tree.h" >> +#include "stringpool.h" >> +#include "regs.h" >> +#include "expr.h" >> +#include "optabs.h" >> +#include "tree-iterator.h" >> +#include "basic-block.h" >> +#include "tree-ssa-alias.h" >> +#include "internal-fn.h" >> +#include "gimple-expr.h" >> +#include "is-a.h" >> +#include "gimple.h" >> +#include "gimple-ssa.h" >> +#include "tree-ssanames.h" >> +#include "target.h" >> +#include "common/common-target.h" >> +#include "df.h" >> +#include "tree-ssa-live.h" >> +#include "tree-outof-ssa.h" >> +#include "cfgexpand.h" >> +#include "tree-phinodes.h" >> +#include "ssa-iterators.h" >> +#include "expmed.h" >> +#include "ccmp.h" >> + >> +bool >> +ccmp_insn_p (rtx object) >> +{ >> + rtx x = PATTERN (object); >> + if (targetm.gen_ccmp_first >> + && GET_CODE (x) == SET >> + && GET_CODE (XEXP (x, 1)) == COMPARE >> + && (GET_CODE (XEXP (XEXP (x, 1), 0)) == IOR >> + || GET_CODE (XEXP (XEXP (x, 1), 0)) == AND)) >> +return true; >> + return false; >> +} >> + >> diff --git a/gcc/ccmp.h b/gcc/ccmp.h >> new file mode 100644 >> index 000..7e139aa >> --- /dev/null >> +++ b/gcc/ccmp.h >> @@ -0,0 +1,25 @@ >> +/* Conditional comapre related functions. >> + Copyright (C) 2014-2014 Free Software Foundation, Inc. >> + >> +This
[Committed] [PATCH, loop2_invariant, 1/2] Check only one register class
On 26 June 2014 05:30, Jeff Law wrote: > On 06/11/14 04:05, Zhenqiang Chen wrote: >> >> On 10 June 2014 19:06, Steven Bosscher wrote: >>> >>> On Tue, Jun 10, 2014 at 11:22 AM, Zhenqiang Chen wrote: >>>> >>>> Hi, >>>> >>>> For loop2-invariant pass, when flag_ira_loop_pressure is enabled, >>>> function gain_for_invariant checks the pressures of all register >>>> classes. This does not make sense since one invariant might impact >>>> only one register class. >>>> >>>> The patch enhances functions get_inv_cost and gain_for_invariant to >>>> check only the register pressure of the invariant if possible. >>> >>> >>> This patch may work for targets with more-or-less orthogonal reg >>> classes, but not if there is a lot of overlap between reg classes. >> >> >> Yes. I need check the overlap between reg classes. >> >> Patch is updated to check all overlap reg classes by >> reg_classes_intersect_p: > > So you need a new ChangeLog, of course :-) > > > >> >> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c >> index c76a2a0..6e43b49 100644 >> --- a/gcc/loop-invariant.c >> +++ b/gcc/loop-invariant.c >> @@ -1092,16 +1092,22 @@ get_pressure_class_and_nregs (rtx insn, int >> *nregs) >> } >> >> /* Calculates cost and number of registers needed for moving invariant >> INV >> - out of the loop and stores them to *COST and *REGS_NEEDED. */ >> + out of the loop and stores them to *COST and *REGS_NEEDED. *CL will >> be >> + the REG_CLASS of INV. Return >> + 0: if INV is invalid. >> + 1: if INV and its depends_on have same reg_class >> + > 1: if INV and its depends_on have different reg_classes. */ > > Nit/bikeshedding. I tend to prefer < 0, 0, > 0 (or -1, 0, 1) for > tri-states. It's not a big deal though. Thanks! Update the tri-state values to ( -1, 0, 1). > >>check_p = i < ira_pressure_classes_num; >> + >> + if ((dep_ret > 1) || ((dep_ret == 1) && (*cl != dep_cl))) >> + { >> + *cl = ALL_REGS; >> + ret ++; > > Whitespace nit -- no space in this statement. use "ret++;" > > You should add a testcase if at all possible. Perhaps two, one which runs > on an ARM variant and one for x86_64. The former because that's obviously > what Linaro cares about, the latter for wider testing. > > > Definitely add a ChangeLog entry, fix the whitespace nit & add testcase. OK > with those fixes. Your choice on the tri-state values. Updated with a testcase for X86-64 and committed @212135. ChangeLog: 2014-06-30 Zhenqiang Chen * loop-invariant.c (get_inv_cost): Handle register class. (gain_for_invariant): Check the register pressure of the inv and its overlapped register class, other than all. testsuite/ChangeLog: 2014-06-30 Zhenqiang Chen * ira-loop-pressure.c: New test. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 25e63e4..ef5c59b 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -1090,16 +1090,22 @@ get_pressure_class_and_nregs (rtx insn, int *nregs) } /* Calculates cost and number of registers needed for moving invariant INV - out of the loop and stores them to *COST and *REGS_NEEDED. */ + out of the loop and stores them to *COST and *REGS_NEEDED. *CL will be + the REG_CLASS of INV. Return + -1: if INV is invalid. + 0: if INV and its depends_on have same reg_class + 1: if INV and its depends_on have different reg_classes. 
*/ -static void -get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed) +static int +get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed, + enum reg_class *cl) { int i, acomp_cost; unsigned aregs_needed[N_REG_CLASSES]; unsigned depno; struct invariant *dep; bitmap_iterator bi; + int ret = 1; /* Find the representative of the class of the equivalent invariants. */ inv = invariants[inv->eqto]; @@ -1115,7 +1121,7 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed) if (inv->move || inv->stamp == actual_stamp) -return; +return -1; inv->stamp = actual_stamp; if (! flag_ira_loop_pressure) @@ -1127,6 +1133,8 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed) pressure_class = get_pressure_class_and_nregs (inv->insn, &nregs); regs_needed[pressure_class] += nregs; + *cl = pressure_class; + ret = 0; } if (!inv->cheap_address @@ -1167,6 +1175,8 @@ get_inv_cost (stru
Re: [PATCH, 2/10] prepare ccmp
On 25 June 2014 22:41, Richard Earnshaw wrote: > On 23/06/14 07:57, Zhenqiang Chen wrote: >> Hi, >> >> The patch makes several functions global, which will be used when >> expanding ccmp instructions. >> >> The other change in this patch is to check CCMP when turning code into >> jumpy sequence. >> >> OK for trunk? >> > > This isn't a complete review. In particular, I'd like one of the gimple > experts to go over this again. > > However, some general issues do crop up. I'll deal with each patch as I > spot something. > > > enum rtx_code code; >> @@ -6503,6 +6503,12 @@ get_rtx_code (enum tree_code tcode, bool unsignedp) >>code = LTGT; >>break; >> >> +case BIT_AND_EXPR: >> + code = AND; >> + break; >> +case BIT_IOR_EXPR: >> + code = IOR; >> + break; > > Blank lines between case alternatives. Thanks. Patch is updated. diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index e8cd87f..a32e1b3 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -2095,9 +2095,10 @@ expand_gimple_cond (basic_block bb, gimple stmt) op0 = gimple_assign_rhs1 (second); op1 = gimple_assign_rhs2 (second); } - /* If jumps are cheap turn some more codes into -jumpy sequences. */ - else if (BRANCH_COST (optimize_insn_for_speed_p (), false) < 4) + /* If jumps are cheap and the target does not support conditional +compare, turn some more codes into jumpy sequences. */ + else if (BRANCH_COST (optimize_insn_for_speed_p (), false) < 4 + && (targetm.gen_ccmp_first == NULL)) { if ((code2 == BIT_AND_EXPR && TYPE_PRECISION (TREE_TYPE (op0)) == 1 diff --git a/gcc/expmed.c b/gcc/expmed.c index e76b6fc..c8d63a9 100644 --- a/gcc/expmed.c +++ b/gcc/expmed.c @@ -5105,7 +5105,7 @@ expand_and (enum machine_mode mode, rtx op0, rtx op1, rtx target) } /* Helper function for emit_store_flag. */ -static rtx +rtx emit_cstore (rtx target, enum insn_code icode, enum rtx_code code, enum machine_mode mode, enum machine_mode compare_mode, int unsignedp, rtx x, rtx y, int normalizep, diff --git a/gcc/expmed.h b/gcc/expmed.h index 4d01d1f..a567bad 100644 --- a/gcc/expmed.h +++ b/gcc/expmed.h @@ -20,6 +20,8 @@ along with GCC; see the file COPYING3. If not see #ifndef EXPMED_H #define EXPMED_H 1 +#include "insn-codes.h" + enum alg_code { alg_unknown, alg_zero, @@ -665,4 +667,9 @@ convert_cost (enum machine_mode to_mode, enum machine_mode from_mode, } extern int mult_by_coeff_cost (HOST_WIDE_INT, enum machine_mode, bool); + +extern rtx emit_cstore (rtx target, enum insn_code icode, enum rtx_code code, +enum machine_mode mode, enum machine_mode compare_mode, +int unsignedp, rtx x, rtx y, int normalizep, +enum machine_mode target_mode); #endif diff --git a/gcc/expr.c b/gcc/expr.c index 512c024..04cf56e 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -146,8 +146,6 @@ static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT, static unsigned HOST_WIDE_INT highest_pow2_factor_for_target (const_tree, const_tree); static int is_aligning_offset (const_tree, const_tree); -static void expand_operands (tree, tree, rtx, rtx*, rtx*, -enum expand_modifier); static rtx reduce_to_bit_field_precision (rtx, rtx, tree); static rtx do_store_flag (sepops, rtx, enum machine_mode); #ifdef PUSH_ROUNDING @@ -7496,7 +7494,7 @@ convert_tree_comp_to_rtx (enum tree_code tcode, int unsignedp) The value may be stored in TARGET if TARGET is nonzero. The MODIFIER argument is as documented by expand_expr. 
*/ -static void +void expand_operands (tree exp0, tree exp1, rtx target, rtx *op0, rtx *op1, enum expand_modifier modifier) { diff --git a/gcc/expr.h b/gcc/expr.h index 6a1d3ab..66ca82f 100644 --- a/gcc/expr.h +++ b/gcc/expr.h @@ -787,4 +787,6 @@ extern bool categorize_ctor_elements (const_tree, HOST_WIDE_INT *, by EXP. This does not include any offset in DECL_FIELD_BIT_OFFSET. */ extern tree component_ref_field_offset (tree); +extern void expand_operands (tree, tree, rtx, rtx*, rtx*, +enum expand_modifier); #endif /* GCC_EXPR_H */ diff --git a/gcc/optabs.c b/gcc/optabs.c index ca1c194..0c3dae1 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -6453,7 +6453,7 @@ gen_cond_trap (enum rtx_code code, rtx op1, rtx op2, rtx tcode) /* Return rtx code for TCODE. Use UNSIGNEDP to select signed or unsigned operation code. */ -static enum rtx_code +enum rtx_code get_rtx_code (enum tree_code tcode, bool unsignedp) { enum rtx_code code; @@ -6503,6 +6503,14 @@ get_rtx_code (
Re: [PATCH, loop2_invariant, 2/2] Change heuristics for identical invariants
On 10 June 2014 19:16, Steven Bosscher wrote: > On Tue, Jun 10, 2014 at 11:23 AM, Zhenqiang Chen wrote: >> * loop-invariant.c (struct invariant): Add a new member: eqno; >> (find_identical_invariants): Update eqno; >> (create_new_invariant): Init eqno; >> (get_inv_cost): Compute comp_cost wiht eqno; >> (gain_for_invariant): Take spill cost into account. > > Look OK except ... > >> @@ -1243,7 +1256,13 @@ gain_for_invariant (struct invariant *inv, >> unsigned *regs_needed, >> + IRA_LOOP_RESERVED_REGS >> - ira_class_hard_regs_num[cl]; >>if (size_cost > 0) >> - return -1; >> + { >> + int spill_cost = target_spill_cost [speed] * (int) regs_needed[cl]; >> + if (comp_cost <= spill_cost) >> + return -1; >> + >> + return 2; >> + } >>else >> size_cost = 0; >> } > > ... why "return 2", instead of just falling through to "return > comp_cost - size_cost;"? The updated patch removes the check on spill cost since the check seams not sound and tests show it does not help much on the result. Bootstrap and no make check regression on X86-64. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-07-01 Zhenqiang Chen * loop-invariant.c (struct invariant): Add a new member: eqno; (find_identical_invariants): Update eqno; (create_new_invariant): Init eqno; (get_inv_cost): Compute comp_cost wiht eqno; diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index d47d461..bd67eb9 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -104,6 +104,9 @@ struct invariant /* The number of the invariant with the same value. */ unsigned eqto; + /* The number of invariants which eqto this. */ + unsigned eqno; + /* If we moved the invariant out of the loop, the register that contains its value. */ rtx reg; @@ -498,6 +501,7 @@ find_identical_invariants (invariant_htab_type *eq, struct invariant *inv) struct invariant *dep; rtx expr, set; enum machine_mode mode; + struct invariant *tmp; if (inv->eqto != ~0u) return; @@ -513,7 +517,12 @@ find_identical_invariants (invariant_htab_type *eq, struct invariant *inv) mode = GET_MODE (expr); if (mode == VOIDmode) mode = GET_MODE (SET_DEST (set)); - inv->eqto = find_or_insert_inv (eq, expr, mode, inv)->invno; + + tmp = find_or_insert_inv (eq, expr, mode, inv); + inv->eqto = tmp->invno; + + if (tmp->invno != inv->invno && inv->always_executed) +tmp->eqno++; if (dump_file && inv->eqto != inv->invno) fprintf (dump_file, @@ -722,6 +731,10 @@ create_new_invariant (struct def *def, rtx insn, bitmap depends_on, inv->invno = invariants.length (); inv->eqto = ~0u; + + /* Itself. */ + inv->eqno = 1; + if (def) def->invno = inv->invno; invariants.safe_push (inv); @@ -1136,7 +1149,7 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed, if (!inv->cheap_address || inv->def->n_addr_uses < inv->def->n_uses) -(*comp_cost) += inv->cost; +(*comp_cost) += inv->cost * inv->eqno; #ifdef STACK_REGS {
Re: [PATCH, 4/10] expand ccmp
On 25 June 2014 23:16, Richard Earnshaw wrote: > On 23/06/14 07:59, Zhenqiang Chen wrote: >> Hi, >> >> This patch includes the main logic to expand ccmp instructions. >> >> In the patch, >> * ccmp_candidate_p is used to identify the CCMP candidate >> * expand_ccmp_expr is the main entry, which calls expand_ccmp_expr_1 >> to expand CCMP. >> * expand_ccmp_expr_1 uses a recursive algorithm to expand CCMP. >> It calls gen_ccmp_first and gen_ccmp_next to generate CCMP instructions. >> >> During expanding, we must make sure that no instruction can clobber the >> CC reg except the compares. So clobber_cc_p and check_clobber_cc are >> introduced to do the check. >> >> * If the final result is not used in a COND_EXPR (checked by function >> used_in_cond_stmt_p), it calls cstorecc4 pattern to store the CC to a >> general register. >> >> Bootstrap and no make check regression on X86-64. >> >> OK for trunk? >> >> Thanks! >> -Zhenqiang >> >> ChangeLog: >> 2014-06-23 Zhenqiang Chen >> >> * ccmp.c (ccmp_candidate_p, used_in_cond_stmt_p, check_clobber_cc, >> clobber_cc_p, expand_ccmp_next, expand_ccmp_expr_1, >> expand_ccmp_expr): >> New functions to expand ccmp. >> * ccmp.h (expand_ccmp_expr): New prototype. >> * expr.c: #include "ccmp.h" >> (expand_expr_real_1): Try to expand ccmp. >> >> diff --git a/gcc/ccmp.c b/gcc/ccmp.c >> index 665c2a5..97b3910 100644 >> --- a/gcc/ccmp.c >> +++ b/gcc/ccmp.c >> @@ -47,6 +47,262 @@ along with GCC; see the file COPYING3. If not see >> #include "expmed.h" >> #include "ccmp.h" >> >> +/* The following functions expand conditional compare (CCMP) instructions. >> + Here is a short description about the over all algorithm: >> + * ccmp_candidate_p is used to identify the CCMP candidate >> + >> + * expand_ccmp_expr is the main entry, which calls expand_ccmp_expr_1 >> + to expand CCMP. >> + >> + * expand_ccmp_expr_1 uses a recursive algorithm to expand CCMP. >> + It calls two target hooks gen_ccmp_first and gen_ccmp_next to >> generate >> + CCMP instructions. >> +- gen_ccmp_first expands the first compare in CCMP. >> +- gen_ccmp_next expands the following compares. >> + >> + During expanding, we must make sure that no instruction can clobber >> the >> + CC reg except the compares. So clobber_cc_p and check_clobber_cc are >> + introduced to do the check. >> + >> + * If the final result is not used in a COND_EXPR (checked by function >> + used_in_cond_stmt_p), it calls cstorecc4 pattern to store the CC to a >> + general register. */ >> + >> +/* Check whether G is a potential conditional compare candidate. */ >> +static bool >> +ccmp_candidate_p (gimple g) >> +{ >> + tree rhs = gimple_assign_rhs_to_tree (g); >> + tree lhs, op0, op1; >> + gimple gs0, gs1; >> + enum tree_code tcode, tcode0, tcode1; >> + tcode = TREE_CODE (rhs); >> + >> + if (tcode != BIT_AND_EXPR && tcode != BIT_IOR_EXPR) >> +return false; >> + >> + lhs = gimple_assign_lhs (g); >> + op0 = TREE_OPERAND (rhs, 0); >> + op1 = TREE_OPERAND (rhs, 1); >> + >> + if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME) >> + || !has_single_use (lhs)) >> +return false; >> + >> + gs0 = get_gimple_for_ssa_name (op0); >> + gs1 = get_gimple_for_ssa_name (op1); >> + if (!gs0 || !gs1 || !is_gimple_assign (gs0) || !is_gimple_assign (gs1) >> + /* g, gs0 and gs1 must be in the same basic block, since current stage >> +is out-of-ssa. We can not guarantee the correctness when forwording >> +the gs0 and gs1 into g whithout DATAFLOW analysis. 
*/ >> + || gimple_bb (gs0) != gimple_bb (gs1) >> + || gimple_bb (gs0) != gimple_bb (g)) >> +return false; >> + >> + if (!(INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0))) >> + || POINTER_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0 >> + || !(INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1))) >> + || POINTER_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1) >> +return false; >> + >> + tcode0 = gimple_assign_rhs_code (gs0); >> + tcode1 = gimple_assign_rhs_code (gs1); >> + if (TREE_CODE_CLASS (tcode0) == tcc_comparison >> + && TR
[Committed]: [PATCH, loop2_invariant, 2/2] Change heuristics for identical invariants
On 2 July 2014 03:54, Jeff Law wrote: > On 07/01/14 01:16, Zhenqiang Chen wrote: > >> >> ChangeLog: >> 2014-07-01 Zhenqiang Chen >> >> * loop-invariant.c (struct invariant): Add a new member: eqno; >> (find_identical_invariants): Update eqno; >> (create_new_invariant): Init eqno; >> (get_inv_cost): Compute comp_cost wiht eqno; > > s/wiht/with/ > > Do you have a testcase for this? If at all possible I'd like to see some > kind of test to verify that this tracking results in better invariant > selection. Since the patch bases on the result of rtx_cost, it is hard to design a common test case. A testcase for ARM is added. > With some kind of test and the ChangeLog typo fixed, this is OK. Thanks. Updated and committed @r212256. ChangeLog: 2014-07-03 Zhenqiang Chen * loop-invariant.c (struct invariant): Add a new member: eqno; (find_identical_invariants): Update eqno; (create_new_invariant): Init eqno; (get_inv_cost): Compute comp_cost with eqno; testsuite/ChangeLog: 2014-07-03 Zhenqiang Chen * gcc.target/arm/identical-invariants.c: New test. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index d47d461..bd67eb9 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -104,6 +104,9 @@ struct invariant /* The number of the invariant with the same value. */ unsigned eqto; + /* The number of invariants which eqto this. */ + unsigned eqno; + /* If we moved the invariant out of the loop, the register that contains its value. */ rtx reg; @@ -498,6 +501,7 @@ find_identical_invariants (invariant_htab_type *eq, struct invariant *inv) struct invariant *dep; rtx expr, set; enum machine_mode mode; + struct invariant *tmp; if (inv->eqto != ~0u) return; @@ -513,7 +517,12 @@ find_identical_invariants (invariant_htab_type *eq, struct invariant *inv) mode = GET_MODE (expr); if (mode == VOIDmode) mode = GET_MODE (SET_DEST (set)); - inv->eqto = find_or_insert_inv (eq, expr, mode, inv)->invno; + + tmp = find_or_insert_inv (eq, expr, mode, inv); + inv->eqto = tmp->invno; + + if (tmp->invno != inv->invno && inv->always_executed) +tmp->eqno++; if (dump_file && inv->eqto != inv->invno) fprintf (dump_file, @@ -1136,7 +1149,7 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed, if (!inv->cheap_address || inv->def->n_addr_uses < inv->def->n_uses) -(*comp_cost) += inv->cost; +(*comp_cost) += inv->cost * inv->eqno; #ifdef STACK_REGS { diff --git a/gcc/testsuite/gcc.target/arm/identical-invariants.c b/gcc/testsuite/gcc.target/arm/identical-invariants.c new file mode 100644 index 000..7762ce6 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/identical-invariants.c @@ -0,0 +1,29 @@ +/* { dg-do compile { target { arm_thumb2_ok } } } */ +/* { dg-options "-O2 -fdump-rtl-loop2_invariant " } */ + +int t1, t2, t3, t4, t5, t6, t7, t8, t9, t10; +extern void foo2 (int *, int *, int *, int *, int *, int *); +extern int foo3 (int, int, int, int, int, int); +int foo (int a, int b, int c, int d) +{ + int i = a; + + for (; i > 0; i += b) +{ + if (a > 0x1234567) + foo2 (&t1, &t2, &t3, &t4, &t5, &t6); + foo2 (&t1, &t2, &t3, &t4, &t5, &t6); + if (b > 0x1234567) + foo2 (&t7, &t2, &t8, &t4, &t5, &t6); + foo2 (&t1, &t2, &t3, &t4, &t5, &t6); + if (c > 0x1234567) + foo2 (&t1, &t9, &t10, &t4, &t5, &t6); + t2 = t5 - d; +} + + return foo3 (t1, t2, t3, t4, t5, t6); +} + +/* { dg-final { scan-rtl-dump "Decided to move invariant 0" "loop2_invariant" } } */ +/* { dg-final { cleanup-rtl-dump "loop2_invariant" } } */ +
Re: [PATCH, 10/10] aarch64: Handle ccmp in ifcvt to make it work with cmov
Andrew,

Thanks for the input.  The patch is updated with your code.

In addition, it adds "cstorecc4" and 4 test cases, so it can optimize the
following example

int foo (int a, int b)
{
  return a > 0 && b < 7;
}

to:

	cmp	w0, wzr
	ccmp	w1, #6, #9, gt
	cset	w0, le

Bootstrap and no make check regression on qemu.

OK for trunk?

Thanks!
-Zhenqiang

2014-07-04  Zhenqiang Chen
	    Andrew Pinski

	* config/aarch64/aarch64.md (cstorecc4): New.
	(movcc): Handle ccmp_cc.
	* ccmp.c (used_in_cond_stmt_p): Handle COND_EXPR.
	(expand_ccmp_expr): Set mode to the MODE of gimple_assign_lhs (g).
	* expr.c (expand_cond_expr_using_cmove): Handle CCmode.
	* ifcvt.c: #include "ccmp.h".
	(struct noce_if_info): Add a new field ccmp_p.
	(noce_emit_cmove): Allow ccmp condition.
	(noce_get_alt_condition): Call canonicalize_condition with ccmp_p.
	(noce_get_condition): Set ALLOW_CC_MODE to TRUE for ccmp.
	(noce_process_if_block): Set ccmp_p for ccmp.

testsuite/ChangeLog:
2014-07-04  Zhenqiang Chen

	* gcc.target/aarch64/ccmn-csel-1.c: New testcase.
	* gcc.target/aarch64/ccmn-csel-2.c: New testcase.
	* gcc.target/aarch64/ccmn-csel-3.c: New testcase.
	* gcc.target/aarch64/ccmp-csel-1.c: New testcase.
	* gcc.target/aarch64/ccmp-csel-2.c: New testcase.
	* gcc.target/aarch64/ccmp-csel-3.c: New testcase.
	* gcc.target/aarch64/ccmp-csel-4.c: New testcase.
	* gcc.target/aarch64/ccmp-csel-5.c: New testcase.
	* gcc.target/aarch64/ccmp-cset-1.c: New testcase.
	* gcc.target/aarch64/ccmp-cset-2.c: New testcase.

diff --git a/gcc/ccmp.c b/gcc/ccmp.c
index e0670f1..6382974 100644
--- a/gcc/ccmp.c
+++ b/gcc/ccmp.c
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm_p.h"
 #include "tree.h"
 #include "stringpool.h"
+#include "stor-layout.h"
 #include "regs.h"
 #include "expr.h"
 #include "optabs.h"
@@ -133,6 +134,13 @@ used_in_cond_stmt_p (tree exp)
	  expand_cond = true;
	  BREAK_FROM_IMM_USE_STMT (ui);
	}
+      else if (gimple_code (use_stmt) == GIMPLE_ASSIGN
+	       && gimple_expr_code (use_stmt) == COND_EXPR)
+	{
+	  if (gimple_assign_rhs1 (use_stmt) == exp)
+	    expand_cond = true;
+	}
+
   return expand_cond;
 }
@@ -274,9 +282,11 @@ expand_ccmp_expr (gimple g)
       icode = optab_handler (cstore_optab, cc_mode);
       if (icode != CODE_FOR_nothing)
	{
-	  rtx target = gen_reg_rtx (word_mode);
-	  tmp = emit_cstore (target, icode, NE, CCmode, CCmode,
-			     0, tmp, const0_rtx, 1, word_mode);
+	  tree lhs = gimple_assign_lhs (g);
+	  enum machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+	  rtx target = gen_reg_rtx (mode);
+	  tmp = emit_cstore (target, icode, NE, cc_mode, cc_mode,
+			     0, tmp, const0_rtx, 1, mode);
	  if (tmp)
	    return tmp;
	}
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c25d940..88733f4 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2337,6 +2337,19 @@
   "
 )
+(define_expand "cstorecc4"
+  [(set (match_operand:SI 0 "register_operand")
+	(match_operator 1 "aarch64_comparison_operator"
+	 [(match_operand 2 "ccmp_cc_register")
+	  (match_operand 3 "aarch64_plus_operand")]))]
+  ""
+"{
+  operands[3] = const0_rtx;
+  emit_insn (gen_rtx_SET (SImode, operands[0], operands[1]));
+  DONE;
+}")
+
+
 (define_expand "cstore4"
   [(set (match_operand:SI 0 "register_operand" "")
	(match_operator:SI 1 "aarch64_comparison_operator"
@@ -2485,15 +2498,19 @@
	 (match_operand:ALLI 3 "register_operand" "")))]
   ""
   {
-    rtx ccreg;
     enum rtx_code code = GET_CODE (operands[1]);
 
     if (code == UNEQ || code == LTGT)
       FAIL;
 
-    ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
-				     XEXP (operands[1], 1));
-    operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
+    if (!ccmp_cc_register (XEXP (operands[1], 0),
+			   GET_MODE (XEXP (operands[1], 0))))
+      {
+	rtx ccreg;
+	ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
+					 XEXP (operands[1], 1));
+	operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
+      }
   }
 )
diff --git a/gcc/expr.c b/gcc/expr.c
index 4c31521..88928aa 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7986,7 +7986,9 @@ expand_cond_expr_using_cmove (tree treeop0 ATTRIBUTE_UNUSED,
   op00 = expand_normal (treeop0);
   op0
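[For reference, a test in the spirit of the new ccmp-csel cases might look
like the sketch below; the dg directives and scan patterns are assumptions,
not copied from the committed tests.]

/* { dg-do compile } */
/* { dg-options "-O2" } */

/* A conditional select fed by a short-circuit condition: with the
   ccmp/ifcvt support this should become cmp + ccmp + csel instead of
   a branch.  */
int
foo (int a, int b, int c, int d)
{
  return (a > 0 && b < 7) ? c : d;
}

/* { dg-final { scan-assembler "ccmp" } } */
/* { dg-final { scan-assembler "csel" } } */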
[Ping ^ 4] [PATCH] Fix PR 61225
Ping?

Thanks!
-Zhenqiang

On 17 June 2014 12:53, Zhenqiang Chen wrote:
> Ping?
>
> Thanks!
> -Zhenqiang
>
> On 9 June 2014 17:08, Zhenqiang Chen wrote:
>> Ping ^2?
>>
>> Thanks!
>> -Zhenqiang
>>
>> On 28 May 2014 15:02, Zhenqiang Chen wrote:
>>> Ping?
>>>
>>> Thanks!
>>> -Zhenqiang
>>>
>>> On 22 May 2014 17:52, Zhenqiang Chen wrote:
>>>> On 21 May 2014 20:43, Steven Bosscher wrote:
>>>>> On Wed, May 21, 2014 at 11:58 AM, Zhenqiang Chen wrote:
>>>>>> Hi,
>>>>>>
>>>>>> The patch fixes the gcc.target/i386/pr49095.c FAIL in PR61225.  The
>>>>>> test case tends to check a peephole2 optimization, which optimizes the
>>>>>> following sequence
>>>>>>
>>>>>>     2: bx:SI=ax:SI
>>>>>>    25: ax:SI=[bx:SI]
>>>>>>     7: {ax:SI=ax:SI-0x1;clobber flags:CC;}
>>>>>>     8: [bx:SI]=ax:SI
>>>>>>     9: flags:CCZ=cmp(ax:SI,0)
>>>>>> to
>>>>>>     2: bx:SI=ax:SI
>>>>>>    41: {flags:CCZ=cmp([bx:SI]-0x1,0);[bx:SI]=[bx:SI]-0x1;}
>>>>>>
>>>>>> The enhanced shrink-wrapping, which calls copyprop_hardreg_forward,
>>>>>> changes INSN 25 to
>>>>>>
>>>>>>    25: ax:SI=[ax:SI]
>>>>>>
>>>>>> Then peephole2 can not optimize it, since the two memory_operands look
>>>>>> different.
>>>>>>
>>>>>> To fix it, the patch adds another peephole2 rule to read one more
>>>>>> insn.  From the register copy, it knows the address is the same.
>>>>>
>>>>> That is one complex peephole2 to deal with a transformation like this.
>>>>> It seems to me like it's a too specific solution for a bigger problem.
>>>>>
>>>>> Could you please try one of the following solutions instead:
>>>>>
>>>>> 1. Track register values for peephole2 and try different alternatives
>>>>> based on known register equivalences?  E.g. in your example, perhaps
>>>>> there is already a REG_EQUAL/REG_EQUIV note available on insn 25 after
>>>>> copyprop_hardreg_forward, to annotate that [ax:SI] is equivalent to
>>>>> [bx:SI] at that point (or if that information is not available, it is
>>>>> not very difficult to make it available).  Then you could try applying
>>>>> peephole2 on the original pattern but also on patterns modified with
>>>>> the known equivalences (i.e. try peephole2 on multiple equivalent
>>>>> patterns for the same insn).  This may expose other peephole2
>>>>> opportunities, not just the specific one your patch addresses.
>>>>
>>>> The patch is updated according to the comment.  There is no REG_EQUAL
>>>> note, so I add one in replace_oldest_value_reg.
>>>>
>>>> ChangeLog:
>>>> 2014-05-22  Zhenqiang Chen
>>>>
>>>>     Part of PR rtl-optimization/61225
>>>>     * config/i386/i386-protos.h (ix86_peephole2_rtx_equal_p): New proto.
>>>>     * config/i386/i386.c (ix86_peephole2_rtx_equal_p): New function.
>>>>     * regcprop.c (replace_oldest_value_reg): Add REG_EQUAL note when
>>>>     propagating to SET.
>>>>
>>>> diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
>>>> index 39462bd..0c4a2b9 100644
>>>> --- a/gcc/config/i386/i386-protos.h
>>>> +++ b/gcc/config/i386/i386-protos.h
>>>> @@ -42,6 +42,7 @@ extern enum calling_abi ix86_function_type_abi (const_tree);
>>>>
>>>>  extern void ix86_reset_previous_fndecl (void);
>>>>
>>>> +extern bool ix86_peephole2_rtx_equal_p (rtx, rtx, rtx, rtx);
>>>>  #ifdef RTX_CODE
>>>>  extern int standard_80387_constant_p (rtx);
>>>>  extern const char *standard_80387_constant_opcode (rtx);
>>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>>> index 6ffb788..583ebe8 100644
>>>> --- a/gcc/config/i386/i386.c
>>>> +++ b/gcc/config/i386/i386.c
>>>> @@ -46856,6 +46856,29 @@ ix86_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
>>>>    atomic_feraiseexcept_call);
>>>>  }
>>>>
>>>> +/* OP0 is
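[For reference, the kind of source that exercises this peephole, modeled on
the dec-and-test insn sequence quoted above; this is a sketch, not a copy of
gcc.target/i386/pr49095.c, and foo/bar are placeholder names.]

extern void bar (void);

/* Decrement a value in memory and test the result: the peephole2 is
   expected to fuse load/dec/store/compare into a single
   memory-destination decrement that also sets the flags.  */
void
foo (int *p)
{
  if (--*p == 0)
    bar ();
}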
[PATCH, NLS] try to search relative dir
Hi,

LOCALEDIR seems to be a build-time constant, which is an absolute path.  But

* When releasing gcc binaries as a tarball, users can unpack it in any
  directory.
* For a "Canadian" build, the directory layout on the build machine might be
  different from the one on the host machine.

In such cases, gcc most likely can not find gcc.mo.

The patch tries the relative directory "../share/locale" next to the gcc
binary.  Although it can not cover all cases, I think it can cover most of
them.

Bootstrap and no make check regression on X86-64.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-07-07  Zhenqiang Chen

	* gcc.c (main): Call gcc_init_libintl_program.
	* intl.c (gcc_init_libintl): Likewise.
	(gcc_init_libintl_program): New function.
	* intl.h (gcc_init_libintl_program): New prototype.

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 6cd08ea..0addc0c5 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -6409,7 +6409,7 @@ main (int argc, char **argv)
   /* Unlock the stdio streams.  */
   unlock_std_streams ();
 
-  gcc_init_libintl ();
+  gcc_init_libintl_program (argv[0]);
 
   diagnostic_initialize (global_dc, 0);
diff --git a/gcc/intl.c b/gcc/intl.c
index 2509c1a..6d1a45c 100644
--- a/gcc/intl.c
+++ b/gcc/intl.c
@@ -48,6 +48,14 @@ bool locale_utf8 = false;
 void
 gcc_init_libintl (void)
 {
+  gcc_init_libintl_program ("gcc");
+}
+
+#define LIBINTL_RELATIVE_DIR "../share/locale"
+
+void
+gcc_init_libintl_program (const char *program_name)
+{
 #ifdef HAVE_LC_MESSAGES
   setlocale (LC_CTYPE, "");
   setlocale (LC_MESSAGES, "");
@@ -55,7 +63,32 @@ gcc_init_libintl (void)
   setlocale (LC_ALL, "");
 #endif
 
-  (void) bindtextdomain ("gcc", LOCALEDIR);
+  if (!access (LOCALEDIR, X_OK))
+    {
+      /* If LOCALEDIR exists, use LOCALEDIR.  */
+      (void) bindtextdomain ("gcc", LOCALEDIR);
+    }
+  else
+    {
+      /* Try relative dir, i.e. .../bin/../share/locale.  */
+      int len1, len2;
+      char *prefix_dir, *locale_dir;
+      prefix_dir = make_relative_prefix (program_name, ".", ".");
+      len1 = strlen (prefix_dir);
+      len2 = strlen (LIBINTL_RELATIVE_DIR);
+      locale_dir = (char *) xmalloc (len1 + len2 + 1);
+      if (locale_dir != NULL)
+	{
+	  strcpy (locale_dir, prefix_dir);
+	  strcpy (locale_dir + len1, LIBINTL_RELATIVE_DIR);
+	  (void) bindtextdomain ("gcc", locale_dir);
+	}
+      else
+	(void) bindtextdomain ("gcc", LOCALEDIR);
+
+      free (prefix_dir);
+    }
+
   (void) textdomain ("gcc");
 
   /* Opening quotation mark.  */
diff --git a/gcc/intl.h b/gcc/intl.h
index 91e7440..84b9d58 100644
--- a/gcc/intl.h
+++ b/gcc/intl.h
@@ -30,6 +30,7 @@
 #include <libintl.h>
 extern void gcc_init_libintl (void);
 extern size_t gcc_gettext_width (const char *);
+extern void gcc_init_libintl_program (const char *);
 #else
 /* Stubs.  */
 # undef textdomain
@@ -41,6 +42,7 @@ extern size_t gcc_gettext_width (const char *);
 # define ngettext(singular,plural,n) fake_ngettext (singular, plural, n)
 # define gcc_init_libintl()	/* nothing */
 # define gcc_gettext_width(s) strlen (s)
+# define gcc_init_libintl_program(s)	/* Nothing.  */
 extern const char *fake_ngettext (const char *singular, const char *plural,
 				  unsigned long int n);
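[To illustrate the fallback path computation, here is a toy sketch; the
paths and the toy_locale_dir helper are hypothetical, and the real patch
uses make_relative_prefix from libiberty to obtain the prefix directory.]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: given the directory of the gcc binary (assumed to end in '/',
   as the concatenation in the patch implies), derive bin/../share/locale.  */
static char *
toy_locale_dir (const char *prefix_dir)
{
  static const char rel[] = "../share/locale";
  size_t len = strlen (prefix_dir);
  char *dir = (char *) malloc (len + sizeof (rel));
  if (!dir)
    return NULL;
  memcpy (dir, prefix_dir, len);
  memcpy (dir + len, rel, sizeof (rel)); /* includes the trailing NUL */
  return dir;
}

int
main (void)
{
  /* E.g. a tarball unpacked under /opt/gcc-4.10: argv[0] is
     /opt/gcc-4.10/bin/gcc, so the bin/ prefix directory is:  */
  char *dir = toy_locale_dir ("/opt/gcc-4.10/bin/");
  printf ("%s\n", dir); /* /opt/gcc-4.10/bin/../share/locale */
  free (dir);
  return 0;
}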