[PATCH] Optimize __sync_fetch_and_add (x, -N) == N and __sync_add_and_fetch (x, N) == 0 (PR target/48986)
Hi!

This patch optimizes using peephole2 __sync_fetch_and_add (x, -N) == N
and __sync_add_and_fetch (x, N) == 0 by just doing lock {add,sub,inc,dec}
and testing flags, instead of lock xadd plus comparison.
The sync_old_add<mode> predicate change makes it possible to optimize
__sync_add_and_fetch with constant second argument to the same code as
__sync_fetch_and_add.  Doing it in peephole2 has disadvantages though:
the 3 instructions need to be consecutive, and e.g. the xadd insn has to
be supported by the CPU.  The other alternative would be to come up with
a new bool builtin that would represent the whole
__sync_fetch_and_add (x, -N) == N operation (perhaps with a dot or space
in its name to make it inaccessible), try to match it during some
folding and expand it using a special optab.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
this way?

2011-05-16  Jakub Jelinek

        PR target/48986
        * config/i386/sync.md (sync_old_add<mode>): Relax operand 2
        predicate to allow CONST_INT.
        (*sync_old_add_cmp<mode>): New insn and peephole2 for it.

--- gcc/config/i386/sync.md.jj	2010-05-21 11:46:29.0 +0200
+++ gcc/config/i386/sync.md	2011-05-16 14:42:08.0 +0200
@@ -170,11 +170,62 @@ (define_insn "sync_old_add<mode>"
 	  [(match_operand:SWI 1 "memory_operand" "+m")] UNSPECV_XCHG))
    (set (match_dup 1)
 	(plus:SWI (match_dup 1)
-		  (match_operand:SWI 2 "register_operand" "0")))
+		  (match_operand:SWI 2 "nonmemory_operand" "0")))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_XADD"
   "lock{%;} xadd{<imodesuffix>}\t{%0, %1|%1, %0}")
 
+(define_peephole2
+  [(set (match_operand:SWI 0 "register_operand" "")
+	(match_operand:SWI 2 "const_int_operand" ""))
+   (parallel [(set (match_dup 0)
+		   (unspec_volatile:SWI
+		     [(match_operand:SWI 1 "memory_operand" "")] UNSPECV_XCHG))
+	      (set (match_dup 1)
+		   (plus:SWI (match_dup 1)
+			     (match_dup 0)))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (reg:CCZ FLAGS_REG)
+	(compare:CCZ (match_dup 0)
+		     (match_operand:SWI 3 "const_int_operand" "")))]
+  "peep2_reg_dead_p (3, operands[0])
+   && (unsigned HOST_WIDE_INT) INTVAL (operands[2])
+      == -(unsigned HOST_WIDE_INT) INTVAL (operands[3])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])"
+  [(parallel [(set (reg:CCZ FLAGS_REG)
+		   (compare:CCZ (unspec_volatile:SWI [(match_dup 1)]
+						     UNSPECV_XCHG)
+				(match_dup 3)))
+	      (set (match_dup 1)
+		   (plus:SWI (match_dup 1)
+			     (match_dup 2)))])])
+
+(define_insn "*sync_old_add_cmp<mode>"
+  [(set (reg:CCZ FLAGS_REG)
+	(compare:CCZ (unspec_volatile:SWI
+		       [(match_operand:SWI 0 "memory_operand" "+m")]
+		       UNSPECV_XCHG)
+		     (match_operand:SWI 2 "const_int_operand" "i")))
+   (set (match_dup 0)
+	(plus:SWI (match_dup 0)
+		  (match_operand:SWI 1 "const_int_operand" "i")))]
+  "(unsigned HOST_WIDE_INT) INTVAL (operands[1])
+   == -(unsigned HOST_WIDE_INT) INTVAL (operands[2])"
+{
+  if (TARGET_USE_INCDEC)
+    {
+      if (operands[1] == const1_rtx)
+	return "lock{%;} inc{<imodesuffix>}\t%0";
+      if (operands[1] == constm1_rtx)
+	return "lock{%;} dec{<imodesuffix>}\t%0";
+    }
+
+  if (x86_maybe_negate_const_int (&operands[1], <MODE>mode))
+    return "lock{%;} sub{<imodesuffix>}\t{%1, %0|%0, %1}";
+
+  return "lock{%;} add{<imodesuffix>}\t{%1, %0|%0, %1}";
+})
+
 ;; Recall that xchg implicitly sets LOCK#, so adding it again wastes space.
 (define_insn "sync_lock_test_and_set<mode>"
   [(set (match_operand:SWI 0 "register_operand" "=<r>")

	Jakub
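To make the transformation concrete, here is a minimal C example of the
kind of source the peephole2 improves (the function name is hypothetical
and the assembly in the comment is an illustrative sketch, not output
quoted from the patch):

/* Illustrative only: a reference-count style decrement-and-test.  */
int
dec_and_test (int *counter)
{
  /* Before (sketch): lock xadd of -1, then a cmp of the returned old
     value against 1.
     After (sketch):  lock sub $1 on the memory operand, with the Z
     flag of the subtraction tested directly.  */
  return __sync_fetch_and_add (counter, -1) == 1;
}

The old value is 1 exactly when the new value is 0, which is exactly
when the lock sub sets the Z flag, so the comparison can be folded into
the flags test.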
Re: [patch, ARM] Fix PR42017, LR not used in leaf functions
On 2011/5/13 04:26 PM, Richard Sandiford wrote:
> Richard Sandiford writes:
>> Chung-Lin Tang writes:
>>> My fix here simply adds 'reload_completed' as an additional condition
>>> for EPILOGUE_USES to return true for LR_REGNUM.  I think this should be
>>> valid, as correct LR save/restoring is handled by the epilogue/prologue
>>> code; it should be safe for IRA to treat it as a normal call-used register.
>>
>> FWIW, epilogue_completed might be a more accurate choice.
>
> I still stand by this, although I realise no other target does it.

Did a re-test of the patch just to be sure; as expected, the test
results were clean.  Attached is the updated patch.

>> It seems a lot of other ports suffer from the same problem though.
>> I wonder which targets really do want to make a register live throughout
>> the function?  If none do, perhaps we should say that this macro is
>> only meaningful once the epilogue has been generated.
>
> To answer my own question, I suppose VRSAVE is one.  So I was wrong
> about the target-independent "fix".
>
> Richard

To rehash what I remember we discussed at LDS, registers like VRSAVE
might be more appropriately placed in global regs.  It looks like the
intended use of EPILOGUE_USES could be clarified further...

To Richard Earnshaw and Ramana: is the patch okay for trunk?  This
should be a not-so-insignificant performance regression fix/improvement.

Thanks,
Chung-Lin

Index: config/arm/arm.h
===================================================================
--- config/arm/arm.h	(revision 173814)
+++ config/arm/arm.h	(working copy)
@@ -1627,7 +1627,7 @@
    frame.  */
 #define EXIT_IGNORE_STACK 1
 
-#define EPILOGUE_USES(REGNO) ((REGNO) == LR_REGNUM)
+#define EPILOGUE_USES(REGNO) (epilogue_completed && (REGNO) == LR_REGNUM)
 
 /* Determine if the epilogue should be output as RTL.
    You should override this if you define FUNCTION_EXTRA_EPILOGUE.  */
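As a sketch of what the change buys: in a leaf function LR is not
clobbered by any call, so once EPILOGUE_USES stops forcing it live
before the epilogue exists, IRA can allocate it like any other
call-clobbered register.  A hypothetical example:

/* Hypothetical leaf function: with the patch, lr becomes available
   as an ordinary scratch register here, potentially avoiding a
   stack spill under register pressure.  */
int
leaf (int a, int b, int c, int d)
{
  return a * b + c * d;
}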
[PATCH, PR45098]
Hi Zdenek,

I have a patch set for PR45098.

01_object-size-target.patch
02_pr45098-rtx-cost-set.patch
03_pr45098-computation-cost.patch
04_pr45098-iv-init-cost.patch
05_pr45098-bound-cost.patch
06_pr45098-bound-cost.test.patch
07_pr45098-nowrap-limits-iterations.patch
08_pr45098-nowrap-limits-iterations.test.patch
09_pr45098-shift-add-cost.patch
10_pr45098-shift-add-cost.test.patch

I will send out the patches individually.

The patch set has been bootstrapped and reg-tested on x86_64, and
reg-tested on ARM.

The effect of the patch set on examples is the removal of 1 iterator,
demonstrated below for '-Os -mthumb -march=armv7-a' on example tr4.

tr4.c:
...
extern void foo2 (short*);

void
tr4 (short array[], int n)
{
  int i;
  if (n > 0)
    for (i = 0; i < n; i++)
      foo2 (&array[i]);
}
...

tr4.s diff (left without, right with patch):
...
push    {r4, r5, r6, lr}     |   cmp     r1, #0
subs    r6, r1, #0           |   push    {r3, r4, r5, lr}
ble     .L1                      ble     .L1
mov     r5, r0               |   mov     r4, r0
movs    r4, #0               |   add     r5, r0, r1, lsl #1
.L3:                             .L3:
mov     r0, r5               |   mov     r0, r4
adds    r4, r4, #1           |   adds    r4, r4, #2
bl      foo2                     bl      foo2
adds    r5, r5, #2           |   cmp     r4, r5
cmp     r4, r6               <
bne     .L3                      bne     .L3
.L1:                             .L1:
pop     {r4, r5, r6, pc}     |   pop     {r3, r4, r5, pc}
...

The effect of the patch set on the test cases in terms of size is listed
in the following 2 tables.

--------------------------------
   -Os -mthumb -march=armv7-a
--------------------------------
       without   with   delta
--------------------------------
tr1      32       30     -2
tr2      36       36      0
tr3      32       30     -2
tr4      26       26      0
tr5      20       20      0
--------------------------------

--------------------------------
       -Os -march=armv7-a
--------------------------------
       without   with   delta
--------------------------------
tr1      60       52     -8
tr2      64       60     -4
tr3      60       52     -8
tr4      48       44     -4
tr5      36       32     -4
--------------------------------

The size impact on several benchmarks is shown in the following table
(%, lower is better).

                 none              pic
             thumb1  thumb2   thumb1  thumb2
spec2000       99.9    99.9     99.9    99.9
eembc          99.9   100.0     99.9   100.1
dhrystone     100.0   100.0    100.0   100.0
coremark       99.3    99.9     99.3   100.0

Thanks,
- Tom
[PATCH, PR45098, 1/10]
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
>
> I have a patch set for PR45098.
>
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
>
> I will send out the patches individually.

OK for trunk?

Thanks,
- Tom

2011-05-05  Tom de Vries

        * lib/scanasm.exp (object-size): Fix target selector handling.

Index: gcc/testsuite/lib/scanasm.exp
===================================================================
--- gcc/testsuite/lib/scanasm.exp	(revision 173734)
+++ gcc/testsuite/lib/scanasm.exp	(working copy)
@@ -330,7 +330,7 @@ proc object-size { args } {
 	return
     }
     if { [llength $args] >= 4 } {
-	switch [dg-process-target [lindex $args 1]] {
+	switch [dg-process-target [lindex $args 3]] {
 	    "S" { }
 	    "N" { return }
 	    "F" { setup_xfail "*-*-*" }
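For reference, the directive being fixed is written like the line below
(the same style is used by the ivopts tests later in this series);
splitting it into Tcl arguments shows why element 3, not element 1,
holds the optional target selector:

/* { dg-final { object-size text <= 20 { target arm_thumb2_ok } } } */

Inside the proc, args is { "text" "<=" "20" "{ target arm_thumb2_ok }" }:
index 0 is the section name, index 1 the comparison operator, index 2
the expected size, and index 3 the selector that dg-process-target must
be handed.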
Re: [PATCH] Misc debug info improvements
On Mon, May 16, 2011 at 02:13:22PM +0200, Jakub Jelinek wrote: > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Actually, I've missed one regression in libjava testing on x86_64-linux, FAIL: pr26390 -O3 compilation from source The problem is when cselib_subst_to_values substs ENTRY_VALUE for corresponding VALUE, we should treat ENTRY_VALUE like REG in replace_expr_with_values, otherwise we can end up with var-tracking noting equivalence between a VALUE and the same VALUE, which causes ICE in set_slot_part. Updated patch below, bootstrapped/regtested on x86_64-linux and i686-linux. Ok? 2011-05-17 Jakub Jelinek * cselib.c (promote_debug_loc): Allow l->next non-NULL for cselib_preserve_constants. (cselib_lookup_1): If cselib_preserve_constants, a new VALUE is being created for REG and there is a VALUE for the same register in wider mode, add another loc with lowpart SUBREG of the wider VALUE. (cselib_subst_to_values): Handle ENTRY_VALUE. * var-tracking.c (vt_expand_loc, vt_expand_loc_dummy): Increase max_depth from 8 to 10. (replace_expr_with_values): Return NULL for ENTRY_VALUE too. * dwarf2out.c (convert_descriptor_to_signed): New function. (mem_loc_descriptor) : Optimize using DW_OP_and instead of two shifts. (mem_loc_descriptor) : ZERO_EXTEND second argument to the right mode if needed. (mem_loc_descriptor) : For typed ops just use DW_OP_mod. (mem_loc_descriptor) : Use convert_descriptor_to_signed. (mem_loc_descriptor) : Handle these rtls. * gcc.dg/guality/bswaptest.c: New test. * gcc.dg/guality/clztest.c: New test. * gcc.dg/guality/ctztest.c: New test. * gcc.dg/guality/rotatetest.c: New test. --- gcc/cselib.c.jj 2011-05-02 18:39:28.0 +0200 +++ gcc/cselib.c2011-05-13 17:55:24.0 +0200 @@ -257,7 +257,7 @@ promote_debug_loc (struct elt_loc_list * { n_debug_values--; l->setting_insn = cselib_current_insn; - gcc_assert (!l->next); + gcc_assert (!l->next || cselib_preserve_constants); } } @@ -1719,6 +1719,12 @@ cselib_subst_to_values (rtx x, enum mach } return e->val_rtx; +case ENTRY_VALUE: + e = cselib_lookup (x, GET_MODE (x), 0, memmode); + if (! e) + break; + return e->val_rtx; + case CONST_DOUBLE: case CONST_VECTOR: case CONST_INT: @@ -1843,6 +1849,43 @@ cselib_lookup_1 (rtx x, enum machine_mod used_regs[n_used_regs++] = i; REG_VALUES (i) = new_elt_list (REG_VALUES (i), NULL); } + else if (cselib_preserve_constants + && GET_MODE_CLASS (mode) == MODE_INT) + { + /* During var-tracking, try harder to find equivalences +for SUBREGs. If a setter sets say a DImode register +and user uses that register only in SImode, add a lowpart +subreg location. 
*/ + struct elt_list *lwider = NULL; + l = REG_VALUES (i); + if (l && l->elt == NULL) + l = l->next; + for (; l; l = l->next) + if (GET_MODE_CLASS (GET_MODE (l->elt->val_rtx)) == MODE_INT + && GET_MODE_SIZE (GET_MODE (l->elt->val_rtx)) + > GET_MODE_SIZE (mode) + && (lwider == NULL + || GET_MODE_SIZE (GET_MODE (l->elt->val_rtx)) + < GET_MODE_SIZE (GET_MODE (lwider->elt->val_rtx + { + struct elt_loc_list *el; + if (i < FIRST_PSEUDO_REGISTER + && hard_regno_nregs[i][GET_MODE (l->elt->val_rtx)] != 1) + continue; + for (el = l->elt->locs; el; el = el->next) + if (!REG_P (el->loc)) + break; + if (el) + lwider = l; + } + if (lwider) + { + rtx sub = lowpart_subreg (mode, lwider->elt->val_rtx, + GET_MODE (lwider->elt->val_rtx)); + if (sub) + e->locs->next = new_elt_loc_list (e->locs->next, sub); + } + } REG_VALUES (i)->next = new_elt_list (REG_VALUES (i)->next, e); slot = cselib_find_slot (x, e->hash, INSERT, memmode); *slot = e; --- gcc/var-tracking.c.jj 2011-05-11 19:51:48.0 +0200 +++ gcc/var-tracking.c 2011-05-13 10:52:25.0 +0200 @@ -4836,7 +4836,7 @@ get_address_mode (rtx mem) static rtx replace_expr_with_values (rtx loc) { - if (REG_P (loc)) + if (REG_P (loc) || GET_CODE (loc) == ENTRY_VALUE) return NULL; else if (MEM_P (loc)) { @@ -7415,7 +7415,7 @@ vt_expand_loc (rtx loc, htab_t vars, boo data.dummy = false; data.cur_loc_changed = false; data.ignore_cur_loc = ignore_cur_loc; - loc = cselib_expand_value_rtx_cb (loc, scratch_regs, 8, + loc = cselib_expand_value_rtx_cb
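A hedged sketch of the situation the new cselib code targets (names
hypothetical): a register is set in DImode but the debug uses only
reference its low SImode part, so the added lowpart SUBREG loc lets
var-tracking connect the two VALUEs.

extern long long get_ll (void);

void
use_low_part (void)
{
  long long wide = get_ll ();   /* setter records a DImode VALUE  */
  int narrow = (int) wide;      /* user only reads the low SImode part  */
  __builtin_printf ("%d\n", narrow);
}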
[PATCH, PR45098, 2/10]
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
>
> I have a patch set for PR45098.
>
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
>
> I will send out the patches individually.

OK for trunk?

Thanks,
- Tom

2011-05-05  Tom de Vries

        PR target/45098
        * tree-ssa-loop-ivopts.c (seq_cost): Fix call to rtx_cost.

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -2745,7 +2745,7 @@ seq_cost (rtx seq, bool speed)
     {
       set = single_set (seq);
       if (set)
-	cost += rtx_cost (set, SET,speed);
+	cost += rtx_cost (SET_SRC (set), SET, speed);
       else
	cost++;
     }
[PATCH, PR45098, 3/10]
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
>
> I have a patch set for PR45098.
>
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
>
> I will send out the patches individually.

OK for trunk?

Thanks,
- Tom

2011-05-05  Tom de Vries

        PR target/45098
        * tree-ssa-loop-ivopts.c (computation_cost): Prevent cost of 0.

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -2862,7 +2862,9 @@ computation_cost (tree expr, bool speed)
   default_rtl_profile ();
   node->frequency = real_frequency;
 
-  cost = seq_cost (seq, speed);
+  cost = (seq != NULL_RTX
+	  ? seq_cost (seq, speed)
+	  : (unsigned) rtx_cost (rslt, SET, speed));
   if (MEM_P (rslt))
     cost += address_cost (XEXP (rslt, 0), TYPE_MODE (type),
			   TYPE_ADDR_SPACE (type), speed);
[PATCH, PR45098, 4/10]
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
>
> I have a patch set for PR45098.
>
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
>
> I will send out the patches individually.

OK for trunk?

Thanks,
- Tom

2011-05-05  Tom de Vries

        * tree-ssa-loop-ivopts.c (determine_iv_cost): Prevent
        cost_base.cost == 0.

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -4688,6 +4688,8 @@ determine_iv_cost (struct ivopts_data *d
   base = cand->iv->base;
   cost_base = force_var_cost (data, base, NULL);
+  if (cost_base.cost == 0)
+    cost_base.cost = COSTS_N_INSNS (1);
   cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
 
   cost = cost_step + adjust_setup_cost (data, cost_base.cost);
[PATCH, PR45098, 5/10]
On 05/17/2011 09:10 AM, Tom de Vries wrote: > Hi Zdenek, > > I have a patch set for for PR45098. > > 01_object-size-target.patch > 02_pr45098-rtx-cost-set.patch > 03_pr45098-computation-cost.patch > 04_pr45098-iv-init-cost.patch > 05_pr45098-bound-cost.patch > 06_pr45098-bound-cost.test.patch > 07_pr45098-nowrap-limits-iterations.patch > 08_pr45098-nowrap-limits-iterations.test.patch > 09_pr45098-shift-add-cost.patch > 10_pr45098-shift-add-cost.test.patch > > I will sent out the patches individually. > OK for trunk? Thanks, - Tom 2011-05-05 Tom de Vries PR target/45098 * tree-ssa-loop-ivopts.c (get_expr_id): Factored new function out of get_loop_invariant_expr_id. (get_loop_invariant_expr_id): Use get_expr_id. (parm_decl_cost): New function. (determine_use_iv_cost_condition): Use get_expr_id and parm_decl_cost. Improve bound cost estimation. Use different inv_expr_id for elim and express cases. Index: gcc/tree-ssa-loop-ivopts.c === --- gcc/tree-ssa-loop-ivopts.c (revision 173380) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -3835,6 +3835,28 @@ compare_aff_trees (aff_tree *aff1, aff_t return true; } +/* Stores EXPR in DATA->inv_expr_tab, and assigns it an inv_expr_id. */ + +static int +get_expr_id (struct ivopts_data *data, tree expr) +{ + struct iv_inv_expr_ent ent; + struct iv_inv_expr_ent **slot; + + ent.expr = expr; + ent.hash = iterative_hash_expr (expr, 0); + slot = (struct iv_inv_expr_ent **) htab_find_slot (data->inv_expr_tab, + &ent, INSERT); + if (*slot) +return (*slot)->id; + + *slot = XNEW (struct iv_inv_expr_ent); + (*slot)->expr = expr; + (*slot)->hash = ent.hash; + (*slot)->id = data->inv_expr_id++; + return (*slot)->id; +} + /* Returns the pseudo expr id if expression UBASE - RATIO * CBASE requires a new compiler generated temporary. Returns -1 otherwise. ADDRESS_P is a flag indicating if the expression is for address @@ -3847,8 +3869,6 @@ get_loop_invariant_expr_id (struct ivopt { aff_tree ubase_aff, cbase_aff; tree expr, ub, cb; - struct iv_inv_expr_ent ent; - struct iv_inv_expr_ent **slot; STRIP_NOPS (ubase); STRIP_NOPS (cbase); @@ -3936,18 +3956,7 @@ get_loop_invariant_expr_id (struct ivopt aff_combination_scale (&cbase_aff, shwi_to_double_int (-1 * ratio)); aff_combination_add (&ubase_aff, &cbase_aff); expr = aff_combination_to_tree (&ubase_aff); - ent.expr = expr; - ent.hash = iterative_hash_expr (expr, 0); - slot = (struct iv_inv_expr_ent **) htab_find_slot (data->inv_expr_tab, - &ent, INSERT); - if (*slot) -return (*slot)->id; - - *slot = XNEW (struct iv_inv_expr_ent); - (*slot)->expr = expr; - (*slot)->hash = ent.hash; - (*slot)->id = data->inv_expr_id++; - return (*slot)->id; + return get_expr_id (data, expr); } @@ -4412,6 +4421,23 @@ may_eliminate_iv (struct ivopts_data *da return true; } + /* Calculates the cost of BOUND, if it is a PARM_DECL. A PARM_DECL must +be copied, if is is used in the loop body and DATA->body_includes_call. */ + +static int +parm_decl_cost (struct ivopts_data *data, tree bound) +{ + tree sbound = bound; + STRIP_NOPS (sbound); + + if (TREE_CODE (sbound) == SSA_NAME + && TREE_CODE (SSA_NAME_VAR (sbound)) == PARM_DECL + && gimple_nop_p (SSA_NAME_DEF_STMT (sbound)) + && data->body_includes_call) +return COSTS_N_INSNS (1); + + return 0; +} /* Determines cost of basing replacement of USE on CAND in a condition. 
*/ @@ -4422,9 +4448,9 @@ determine_use_iv_cost_condition (struct tree bound = NULL_TREE; struct iv *cmp_iv; bitmap depends_on_elim = NULL, depends_on_express = NULL, depends_on; - comp_cost elim_cost, express_cost, cost; + comp_cost elim_cost, express_cost, cost, bound_cost; bool ok; - int inv_expr_id = -1; + int elim_inv_expr_id = -1, express_inv_expr_id = -1, inv_expr_id; tree *control_var, *bound_cst; /* Only consider real candidates. */ @@ -4438,6 +4464,21 @@ determine_use_iv_cost_condition (struct if (may_eliminate_iv (data, use, cand, &bound)) { elim_cost = force_var_cost (data, bound, &depends_on_elim); + if (elim_cost.cost == 0) +elim_cost.cost = parm_decl_cost (data, bound); + else if (TREE_CODE (bound) == INTEGER_CST) +elim_cost.cost = 0; + /* If we replace a loop condition 'i < n' with 'p < base + n', + depends_on_elim will have 'base' and 'n' set, which implies + that both 'base' and 'n' will be live during the loop. More likely, + 'base + n' will be loop invariant, resulting in only one live value + during the loop. So in that case we clear depends_on_elim and set +elim_inv_expr_id instead. */ + if (depends_on_elim && bitmap_count_bits (depends_on_elim) > 1) + { + elim_inv_expr_id = get_expr_id (data, bound); + bitmap_clear (depends_on_elim)
[PATCH, PR45098, 6/10]
On 05/17/2011 09:10 AM, Tom de Vries wrote: > Hi Zdenek, > > I have a patch set for for PR45098. > > 01_object-size-target.patch > 02_pr45098-rtx-cost-set.patch > 03_pr45098-computation-cost.patch > 04_pr45098-iv-init-cost.patch > 05_pr45098-bound-cost.patch > 06_pr45098-bound-cost.test.patch > 07_pr45098-nowrap-limits-iterations.patch > 08_pr45098-nowrap-limits-iterations.test.patch > 09_pr45098-shift-add-cost.patch > 10_pr45098-shift-add-cost.test.patch > > I will sent out the patches individually. > OK for trunk? Thanks, - Tom 2011-05-05 Tom de Vries PR target/45098 * gcc.target/arm/ivopts.c: New test. * gcc.target/arm/ivopts-2.c: New test. Index: gcc/testsuite/gcc.target/arm/ivopts-2.c === --- /dev/null (new file) +++ gcc/testsuite/gcc.target/arm/ivopts-2.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do assemble } */ +/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */ + +extern void foo2 (short*); + +void +tr4 (short array[], int n) +{ + int x; + if (n > 0) +for (x = 0; x < n; x++) + foo2 (&array[x]); +} + +/* { dg-final { scan-tree-dump-times "PHI 0) +for (x = 0; x < n; x++) + array[x] = 0; +} + +/* { dg-final { scan-tree-dump-times "PHI <" 1 "ivopts"} } */ +/* { dg-final { object-size text <= 20 { target arm_thumb2_ok } } } */ +/* { dg-final { cleanup-tree-dump "ivopts" } } */
[PATCH, PR45098, 7/10]
On 05/17/2011 09:10 AM, Tom de Vries wrote: > Hi Zdenek, > > I have a patch set for for PR45098. > > 01_object-size-target.patch > 02_pr45098-rtx-cost-set.patch > 03_pr45098-computation-cost.patch > 04_pr45098-iv-init-cost.patch > 05_pr45098-bound-cost.patch > 06_pr45098-bound-cost.test.patch > 07_pr45098-nowrap-limits-iterations.patch > 08_pr45098-nowrap-limits-iterations.test.patch > 09_pr45098-shift-add-cost.patch > 10_pr45098-shift-add-cost.test.patch > > I will sent out the patches individually. > OK for trunk? Thanks, - Tom 2011-05-05 Tom de Vries PR target/45098 * tree-ssa-loop-ivopts.c (struct ivopts_data): Add fields max_iterations_p and max_iterations. (is_nonwrap_use, max_loop_iterations, set_max_iterations): New function. (may_eliminate_iv): Use max_iterations_p and max_iterations. (tree_ssa_iv_optimize_loop): Use set_max_iterations. Index: gcc/tree-ssa-loop-ivopts.c === --- gcc/tree-ssa-loop-ivopts.c (revision 173355) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -291,6 +291,12 @@ struct ivopts_data /* Whether the loop body includes any function calls. */ bool body_includes_call; + + /* Whether max_iterations is valid. */ + bool max_iterations_p; + + /* Maximum number of iterations of current_loop. */ + double_int max_iterations; }; /* An assignment of iv candidates to uses. */ @@ -4319,6 +4325,108 @@ iv_elimination_compare (struct ivopts_da return (exit->flags & EDGE_TRUE_VALUE ? EQ_EXPR : NE_EXPR); } +/* Determine if USE contains non-wrapping arithmetic. */ + +static bool +is_nonwrap_use (struct ivopts_data *data, struct iv_use *use) +{ + gimple stmt = use->stmt; + tree var, ptr, ptr_type; + + if (!is_gimple_assign (stmt)) +return false; + + switch (gimple_assign_rhs_code (stmt)) +{ +case POINTER_PLUS_EXPR: + ptr = gimple_assign_rhs1 (stmt); + ptr_type = TREE_TYPE (ptr); + var = gimple_assign_rhs2 (stmt); + if (!expr_invariant_in_loop_p (data->current_loop, ptr)) +return false; + break; +case ARRAY_REF: + ptr = TREE_OPERAND ((gimple_assign_rhs1 (stmt)), 0); + ptr_type = build_pointer_type (TREE_TYPE (gimple_assign_rhs1 (stmt))); + var = TREE_OPERAND ((gimple_assign_rhs1 (stmt)), 1); + break; +default: + return false; +} + + if (!nowrap_type_p (ptr_type)) +return false; + + if (TYPE_PRECISION (ptr_type) != TYPE_PRECISION (TREE_TYPE (var))) +return false; + + return true; +} + +/* Attempt to infer maximum number of loop iterations of DATA->current_loop + from uses in loop containing non-wrapping arithmetic. If successful, + return true, and return maximum iterations in MAX_NITER. */ + +static bool +max_loop_iterations (struct ivopts_data *data, double_int *max_niter) +{ + struct iv_use *use; + struct iv *iv; + bool found = false; + double_int period; + gimple stmt; + unsigned i; + + for (i = 0; i < n_iv_uses (data); i++) +{ + use = iv_use (data, i); + + stmt = use->stmt; + if (!just_once_each_iteration_p (data->current_loop, gimple_bb (stmt))) + continue; + + if (!is_nonwrap_use (data, use)) +continue; + + iv = use->iv; + if (iv->step == NULL_TREE || TREE_CODE (iv->step) != INTEGER_CST) + continue; + period = tree_to_double_int (iv_period (iv)); + + if (found) +*max_niter = double_int_umin (*max_niter, period); + else +{ + found = true; + *max_niter = period; +} +} + + return found; +} + +/* Initializes DATA->max_iterations and DATA->max_iterations_p. 
*/ + +static void +set_max_iterations (struct ivopts_data *data) +{ + double_int max_niter, max_niter2; + bool estimate1, estimate2; + + data->max_iterations_p = false; + estimate1 = estimated_loop_iterations (data->current_loop, true, &max_niter); + estimate2 = max_loop_iterations (data, &max_niter2); + if (!(estimate1 || estimate2)) +return; + if (estimate1 && estimate2) +data->max_iterations = double_int_umin (max_niter, max_niter2); + else if (estimate1) +data->max_iterations = max_niter; + else +data->max_iterations = max_niter2; + data->max_iterations_p = true; +} + /* Check whether it is possible to express the condition in USE by comparison of candidate CAND. If so, store the value compared with to BOUND. */ @@ -4391,10 +4499,10 @@ may_eliminate_iv (struct ivopts_data *da /* See if we can take advantage of infered loop bound information. */ if (loop_only_exit_p (loop, exit)) { - if (!estimated_loop_iterations (loop, true, &max_niter)) + if (!data->max_iterations_p) return false; /* The loop bound is already adjusted by adding 1. */ - if (double_int_ucmp (max_niter, period_value) > 0) + if (double_int_ucmp (data->max_iterations, period_value) > 0) re
[PATCH, PR45098, 8/10]
On 05/17/2011 09:10 AM, Tom de Vries wrote: > Hi Zdenek, > > I have a patch set for for PR45098. > > 01_object-size-target.patch > 02_pr45098-rtx-cost-set.patch > 03_pr45098-computation-cost.patch > 04_pr45098-iv-init-cost.patch > 05_pr45098-bound-cost.patch > 06_pr45098-bound-cost.test.patch > 07_pr45098-nowrap-limits-iterations.patch > 08_pr45098-nowrap-limits-iterations.test.patch > 09_pr45098-shift-add-cost.patch > 10_pr45098-shift-add-cost.test.patch > > I will sent out the patches individually. > OK for trunk? Thanks, - Tom 2011-05-05 Tom de Vries PR target/45098 * gcc.target/arm/ivopts-3.c: New test. * gcc.target/arm/ivopts-4.c: New test. * gcc.target/arm/ivopts-5.c: New test. * gcc.dg/tree-ssa/ivopt_infer_2.c: Adapt test. Index: gcc/testsuite/gcc.target/arm/ivopts-3.c === --- /dev/null (new file) +++ gcc/testsuite/gcc.target/arm/ivopts-3.c (revision 0) @@ -0,0 +1,20 @@ +/* { dg-do assemble } */ +/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */ + +extern unsigned int foo2 (short*) __attribute__((pure)); + +unsigned int +tr3 (short array[], unsigned int n) +{ + unsigned sum = 0; + unsigned int x; + for (x = 0; x < n; x++) +sum += foo2 (&array[x]); + return sum; +} + +/* { dg-final { scan-tree-dump-times "PHI 0) +for (x = 0; x < n; x++) + sum += foo (&array[x]); + return sum; +} + +/* { dg-final { scan-tree-dump-times "PHI
[PATCH, PR45098, 9/10]
On 05/17/2011 09:10 AM, Tom de Vries wrote: > Hi Zdenek, > > I have a patch set for for PR45098. > > 01_object-size-target.patch > 02_pr45098-rtx-cost-set.patch > 03_pr45098-computation-cost.patch > 04_pr45098-iv-init-cost.patch > 05_pr45098-bound-cost.patch > 06_pr45098-bound-cost.test.patch > 07_pr45098-nowrap-limits-iterations.patch > 08_pr45098-nowrap-limits-iterations.test.patch > 09_pr45098-shift-add-cost.patch > 10_pr45098-shift-add-cost.test.patch > > I will sent out the patches individually. > OK for trunk? Thanks, - Tom 2011-05-05 Tom de Vries PR target/45098 * tree-ssa-loop-ivopts.c: Include expmed.h. (get_shiftadd_cost): New function. (force_expr_to_var_cost): Use get_shiftadd_cost. Index: gcc/tree-ssa-loop-ivopts.c === --- gcc/tree-ssa-loop-ivopts.c (revision 173380) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -92,6 +92,12 @@ along with GCC; see the file COPYING3. #include "tree-inline.h" #include "tree-ssa-propagate.h" +/* FIXME: add_cost and zero_cost defined in exprmed.h conflict with local uses. + */ +#include "expmed.h" +#undef add_cost +#undef zero_cost + /* FIXME: Expressions are expanded to RTL in this pass to determine the cost of different addressing modes. This should be moved to a TBD interface between the GIMPLE and RTL worlds. */ @@ -3504,6 +3510,37 @@ get_address_cost (bool symbol_present, b return new_cost (cost + acost, complexity); } + /* Calculate the SPEED or size cost of shiftadd EXPR in MODE. MULT is the +the EXPR operand holding the shift. COST0 and COST1 are the costs for +calculating the operands of EXPR. Returns true if successful, and returns +the cost in COST. */ + +static bool +get_shiftadd_cost (tree expr, enum machine_mode mode, comp_cost cost0, + comp_cost cost1, tree mult, bool speed, comp_cost *cost) +{ + comp_cost res; + tree op1 = TREE_OPERAND (expr, 1); + tree cst = TREE_OPERAND (mult, 1); + int m = exact_log2 (int_cst_value (cst)); + int maxm = MIN (BITS_PER_WORD, GET_MODE_BITSIZE (mode)); + int sa_cost; + + if (!(m >= 0 && m < maxm)) +return false; + + sa_cost = (TREE_CODE (expr) != MINUS_EXPR + ? shiftadd_cost[speed][mode][m] + : (mult == op1 +? shiftsub1_cost[speed][mode][m] +: shiftsub0_cost[speed][mode][m])); + res = new_cost (sa_cost, 0); + res = add_costs (res, mult == op1 ? cost0 : cost1); + + *cost = res; + return true; +} + /* Estimates cost of forcing expression EXPR into a variable. */ static comp_cost @@ -3629,6 +3666,21 @@ force_expr_to_var_cost (tree expr, bool case MINUS_EXPR: case NEGATE_EXPR: cost = new_cost (add_cost (mode, speed), 0); + if (TREE_CODE (expr) != NEGATE_EXPR) +{ + tree mult = NULL_TREE; + comp_cost sa_cost; + if (TREE_CODE (op1) == MULT_EXPR) +mult = op1; + else if (TREE_CODE (op0) == MULT_EXPR) +mult = op0; + + if (mult != NULL_TREE + && TREE_CODE (TREE_OPERAND (mult, 1)) == INTEGER_CST + && get_shiftadd_cost (expr, mode, cost0, cost1, mult, speed, +&sa_cost)) +return sa_cost; +} break; case MULT_EXPR:
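A hedged example of the shape of expression the new costing covers: a
multiply by a power of two feeding an addition maps to a single
shift-add instruction on targets that have one (for instance ARM's
'add r0, r1, r2, lsl #2'), which is what shiftadd_cost[] reflects;
costing it as a full multiply plus add would overestimate it.

long
addr_calc (long base, long i)
{
  return base + i * 4;   /* one shift-add on ARM, not mul plus add  */
}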
[PATCH, PR45098, 10/10]
On 05/17/2011 09:10 AM, Tom de Vries wrote: > Hi Zdenek, > > I have a patch set for for PR45098. > > 01_object-size-target.patch > 02_pr45098-rtx-cost-set.patch > 03_pr45098-computation-cost.patch > 04_pr45098-iv-init-cost.patch > 05_pr45098-bound-cost.patch > 06_pr45098-bound-cost.test.patch > 07_pr45098-nowrap-limits-iterations.patch > 08_pr45098-nowrap-limits-iterations.test.patch > 09_pr45098-shift-add-cost.patch > 10_pr45098-shift-add-cost.test.patch > > I will sent out the patches individually. > OK for trunk? Thanks, - Tom 2011-05-05 Tom de Vries PR target/45098 * gcc.target/arm/ivopts-6.c: New test. Index: gcc/testsuite/gcc.target/arm/ivopts-6.c === --- /dev/null (new file) +++ gcc/testsuite/gcc.target/arm/ivopts-6.c (revision 0) @@ -0,0 +1,15 @@ +/* { dg-do assemble } */ +/* { dg-options "-Os -fdump-tree-ivopts -save-temps -marm" } */ + +void +tr5 (short array[], int n) +{ + int x; + if (n > 0) +for (x = 0; x < n; x++) + array[x] = 0; +} + +/* { dg-final { scan-tree-dump-times "PHI <" 1 "ivopts"} } */ +/* { dg-final { object-size text <= 32 } } */ +/* { dg-final { cleanup-tree-dump "ivopts" } } */
Re: [PATCH] Optimize __sync_fetch_and_add (x, -N) == N and __sync_add_and_fetch (x, N) == 0 (PR target/48986)
On Tue, May 17, 2011 at 9:02 AM, Jakub Jelinek wrote: > Hi! > > This patch optimizes using peephole2 __sync_fetch_and_add (x, -N) == N > and __sync_add_and_fetch (x, N) == 0 by just doing lock {add,sub,inc,dec} > and testing flags, instead of lock xadd plus comparison. > The sync_old_add predicate change makes it possible to optimize > __sync_add_and_fetch with constant second argument to same > code as __sync_fetch_and_add. Doing it in peephole2 has a disadvantage > though, both that the 3 instructions need to be consecutive and > e.g. that xadd insn has to be supported by the CPU. Other alternative > would be to come up with a new bool builtin that would represent the > whole __sync_fetch_and_add (x, -N) == N operation (perhaps with dot or space > in its name to make it inaccessible), try to match it during some folding > and expand it using special optab. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk > this way? > > 2011-05-16 Jakub Jelinek > > PR target/48986 > * config/i386/sync.md (sync_old_add): Relax operand 2 > predicate to allow CONST_INT. > (*sync_old_add_cmp): New insn and peephole2 for it. OK, but please add a comment explaining why we have matched constraint with non-matched predicate. These operands are otherwise targets for cleanups ;) Also, a comment explaining the purpose of the added peephole would be nice. IMO, the change to sync_old_add is also appropriate to release branches. Thanks, Uros.
Re: Don't let search bots look at buglist.cgi
On Mon, May 16, 2011 at 10:27:44PM -0700, Ian Lance Taylor wrote:
> On Mon, May 16, 2011 at 6:42 AM, Richard Guenther wrote:
> >>> httpd being in the top-10 always, fiddling with bugzilla URLs?
> >>> (Note, I don't have access to gcc.gnu.org, I'm relaying info from multiple
> >>> instances of discussion on #gcc and richi poking on it; that said, it
> >>> still might not be web crawlers, that's right, but I'll happily accept
> >>> _any_ load improvement on gcc.gnu.org, however unfounded they might seem)
>
> I think that simply blocking buglist.cgi has dropped bugzilla off the
> immediate radar.  It also seems to have lowered the load, although I'm
> not sure if we are still keeping historical data.
>
> > I for example see also
> >
> > 66.249.71.59 - - [16/May/2011:13:37:58 +] "GET
> > /viewcvs?view=revision&revision=169814 HTTP/1.1" 200 1334 "-"
> > "Mozilla/5.0 (compatible; Googlebot/2.1;
> > +http://www.google.com/bot.html)" (35%) 2060117us
> >
> > and viewvc is certainly even worse (from an I/O perspective).  I thought
> > we blocked all bot traffic from the viewvc stuff ...
>
> This is only happening at top level.  I committed this patch to fix this.

You probably know this much better than I do, but would it be possible
to allow only some of Google's crawlers (if all of them try to crawl
bugzilla)?  As I read
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=1061943
it would be possible to block the crawlers Googlebot-Mobile,
Mediapartners-Google and AdsBot-Google (which seem to be independent
crawlers?) while allowing the main Googlebot.  (I don't know how often
each crawler actually shows up on bugzilla, though...)

Axel
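A sketch of what such selective blocking might look like in robots.txt,
assuming the crawler names from the page cited above actually match
what hits bugzilla (untested, illustrative only):

# Block the secondary Google crawlers from bugzilla entirely.
User-agent: Googlebot-Mobile
Disallow: /bugzilla/

User-agent: Mediapartners-Google
Disallow: /bugzilla/

User-agent: AdsBot-Google
Disallow: /bugzilla/

# Keep the main Googlebot, but away from the expensive query page.
User-agent: Googlebot
Disallow: /bugzilla/buglist.cgi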
Commit: RX: Add peepholes for move followed by compare
Hi Guys,

I am applying the patch below to add a peephole optimization to the RX
backend.  It was suggested by Kazuhio Inaoka at Renesas Japan, and
adapted by me to use the peephole2 system.  It finds a register move
followed by a comparison of the moved register against zero, and
replaces the two instructions with a single addition.  The addition
does not change any value, since the amount being added is zero, but
as a side effect it copies the register and sets the condition flags.

Cheers
Nick

gcc/ChangeLog
2011-05-17  Kazuhio Inaoka
            Nick Clifton

        * config/rx/rx.md: Add peepholes to match a register move followed
        by a comparison of the moved register.  Replace these with an
        addition of zero that does both actions in one instruction.

Index: gcc/config/rx/rx.md
===================================================================
--- gcc/config/rx/rx.md	(revision 173815)
+++ gcc/config/rx/rx.md	(working copy)
@@ -904,6 +904,39 @@
    (set_attr "length" "3,4,5,6,7,6")]
 )
 
+;; Peepholes to match:
+;;   (set (reg A) (reg B))
+;;   (set (CC) (compare:CC (reg A/reg B) (const_int 0)))
+;; and replace them with the addsi3_flags pattern, using an add
+;; of zero to copy the register and set the condition code bits.
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+	(match_operand:SI 1 "register_operand"))
+   (set (reg:CC CC_REG)
+	(compare:CC (match_dup 0)
+		    (const_int 0)))]
+  ""
+  [(parallel [(set (match_dup 0)
+		   (plus:SI (match_dup 1) (const_int 0)))
+	      (set (reg:CC_ZSC CC_REG)
+		   (compare:CC_ZSC (plus:SI (match_dup 1) (const_int 0))
+				   (const_int 0)))])]
+)
+
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+	(match_operand:SI 1 "register_operand"))
+   (set (reg:CC CC_REG)
+	(compare:CC (match_dup 1)
+		    (const_int 0)))]
+  ""
+  [(parallel [(set (match_dup 0)
+		   (plus:SI (match_dup 1) (const_int 0)))
+	      (set (reg:CC_ZSC CC_REG)
+		   (compare:CC_ZSC (plus:SI (match_dup 1) (const_int 0))
+				   (const_int 0)))])]
+)
+
 (define_expand "adddi3"
   [(set (match_operand:DI 0 "register_operand")
	(plus:DI (match_operand:DI 1 "register_operand")
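A hedged C-level sketch of code that can produce the two-instruction
sequence the peepholes look for (whether it actually does depends on
register allocation):

int
copy_and_test (int x, int *out)
{
  *out = x;          /* x gets copied around; one copy can end up as
                        (set (reg A) (reg B))                         */
  return x != 0;     /* (set (CC) (compare:CC (reg) (const_int 0)))   */
}

With the peephole, the copy and the flag-setting compare are expressed
by one add-of-zero instruction instead of two.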
Re: [PATCH][?/n] Cleanup LTO type merging
On Mon, 16 May 2011, Jan Hubicka wrote: > > > > I've seen us merge different named structs which happen to reside > > on the same variant list. That's bogus, not only because we are > > adjusting TYPE_MAIN_VARIANT during incremental type-merging and > > fixup, so computing a persistent hash by looking at it looks > > fishy as well. > > Hi, > as reported on IRC earlier, I get the segfault while building libxul > duea to infinite recursion problem. > > I now however also get a lot more of the following ICEs: > In function '__unguarded_insertion_sort': > lto1: internal compiler error: in splice_child_die, at dwarf2out.c:8274 > previously it reported once during Mozilla build (and I put testcase into > bugzilla), now it reproduces on many libraries. I did not see this problem > when applying only the SCC hasing change. This change causes us to preserve more TYPE_DECLs I think, so we might run more often into pre-existing debuginfo issues. Previously most of the types were merged into their nameless variant which probably didn't get output into debug info. Do you by chance have small testcases for your problems? ;) Richard. > perhaps another unrelated thing, but I now also get undefined symbols during > the builds. > > Honza > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > > > Richard. > > > > 2011-05-16 Richard Guenther > > > > * gimple.c (gimple_types_compatible_p_1): Use names of the > > type itself, not its main variant. > > (iterative_hash_gimple_type): Likewise. > > > > Index: gcc/gimple.c > > === > > *** gcc/gimple.c(revision 173794) > > --- gcc/gimple.c(working copy) > > *** gimple_types_compatible_p_1 (tree t1, tr > > *** 3817,3824 > > tree f1, f2; > > > > /* The struct tags shall compare equal. */ > > ! if (!compare_type_names_p (TYPE_MAIN_VARIANT (t1), > > ! TYPE_MAIN_VARIANT (t2), false)) > > goto different_types; > > > > /* For aggregate types, all the fields must be the same. */ > > --- 3817,3823 > > tree f1, f2; > > > > /* The struct tags shall compare equal. */ > > ! if (!compare_type_names_p (t1, t2, false)) > > goto different_types; > > > > /* For aggregate types, all the fields must be the same. */ > > *** iterative_hash_gimple_type (tree type, h > > *** 4193,4199 > > unsigned nf; > > tree f; > > > > ! v = iterative_hash_name (TYPE_NAME (TYPE_MAIN_VARIANT (type)), v); > > > > for (f = TYPE_FIELDS (type), nf = 0; f; f = TREE_CHAIN (f)) > > { > > --- 4192,4198 > > unsigned nf; > > tree f; > > > > ! v = iterative_hash_name (TYPE_NAME (type), v); > > > > for (f = TYPE_FIELDS (type), nf = 0; f; f = TREE_CHAIN (f)) > > { > > -- Richard Guenther Novell / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
Commit: RX: Add peepholes to remove redundant extensions
Hi Guys,

I am applying the patch below to add a couple of peephole optimizations
to the RX backend.  It seems that GCC does not cope very well with the
RX's ability to perform either sign-extending or zero-extending loads,
and so it sometimes generates an extending load followed by a
register-to-register extension.  The peepholes match these cases and
delete the unnecessary extension where possible.

Cheers
Nick

gcc/ChangeLog
2011-05-17  Nick Clifton

        * config/rx/rx.md: Add peepholes to remove redundant extensions
        after loads.

Index: gcc/config/rx/rx.md
===================================================================
--- gcc/config/rx/rx.md	(revision 173819)
+++ gcc/config/rx/rx.md	(working copy)
@@ -1701,6 +1701,35 @@
 	(extend_types:SI (match_dup 1)))]
 )
 
+;; Convert:
+;;   (set (reg1) (sign_extend (mem)))
+;;   (set (reg2) (zero_extend (reg1)))
+;; into
+;;   (set (reg2) (zero_extend (mem)))
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+	(sign_extend:SI (match_operand:small_int_modes 1 "memory_operand")))
+   (set (match_operand:SI 2 "register_operand")
+	(zero_extend:SI (match_operand:small_int_modes 3 "register_operand")))]
+  "REGNO (operands[0]) == REGNO (operands[3])
+   && (REGNO (operands[0]) == REGNO (operands[2])
+       || peep2_regno_dead_p (2, REGNO (operands[0])))"
+  [(set (match_dup 2)
+	(zero_extend:SI (match_dup 1)))]
+)
+
+;; Remove the redundant sign extension from:
+;;   (set (reg) (extend (mem)))
+;;   (set (reg) (extend (reg)))
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+	(extend_types:SI (match_operand:small_int_modes 1 "memory_operand")))
+   (set (match_dup 0)
+	(extend_types:SI (match_operand:small_int_modes 2 "register_operand")))]
+  "REGNO (operands[0]) == REGNO (operands[2])"
+  [(set (match_dup 0) (extend_types:SI (match_dup 1)))]
+)
+
 (define_insn "comparesi3_"
   [(set (reg:CC CC_REG)
	(compare:CC (match_operand:SI 0 "register_operand" "=r")
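A hedged sketch of source that can expose the redundant pair (the
exact RTL depends on how the load is expanded):

unsigned int
load_and_mask (signed char *p)
{
  signed char c = *p;        /* may expand as (set (reg1) (sign_extend (mem)))  */
  return (unsigned char) c;  /* then (set (reg2) (zero_extend (reg1)))          */
}

Since the QImode lowpart of the sign-extended load is just the loaded
byte, zero-extending it is equivalent to a single zero-extending load,
which is what the first peephole emits.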
Commit: RX: Fix predicates for restricted memory patterns
Hi Guys,

I am applying the patch below to fix a minor discrepancy in the rx.md
file.  Several patterns can only use restricted memory addresses.  They
have the correct Q constraint, but they were using the more permissive
memory_operand predicate.  The patch fixes these patterns by replacing
memory_operand with rx_restricted_mem_operand.

Cheers
Nick

gcc/ChangeLog
2011-05-17  Nick Clifton

        * config/rx/rx.md (bitset_in_memory): Use rx_restricted_mem_operand.
        (bitinvert_in_memory): Likewise.
        (bitclr_in_memory): Likewise.

Index: gcc/config/rx/rx.md
===================================================================
--- gcc/config/rx/rx.md	(revision 173820)
+++ gcc/config/rx/rx.md	(working copy)
@@ -1831,7 +1831,7 @@
 )
 
 (define_insn "*bitset_in_memory"
-  [(set (match_operand:QI 0 "memory_operand" "+Q")
+  [(set (match_operand:QI 0 "rx_restricted_mem_operand" "+Q")
 	(ior:QI (ashift:QI (const_int 1)
 			   (match_operand:QI 1 "nonmemory_operand" "ri"))
 		(match_dup 0)))]
@@ -1852,7 +1852,7 @@
 )
 
 (define_insn "*bitinvert_in_memory"
-  [(set (match_operand:QI 0 "memory_operand" "+Q")
+  [(set (match_operand:QI 0 "rx_restricted_mem_operand" "+Q")
 	(xor:QI (ashift:QI (const_int 1)
			   (match_operand:QI 1 "nonmemory_operand" "ri"))
		(match_dup 0)))]
@@ -1875,7 +1875,7 @@
 )
 
 (define_insn "*bitclr_in_memory"
-  [(set (match_operand:QI 0 "memory_operand" "+Q")
+  [(set (match_operand:QI 0 "rx_restricted_mem_operand" "+Q")
	(and:QI (not:QI
		  (ashift:QI
		    (const_int 1)
Commit: RX: Include cost of register moving in the cost of register loading.
Hi Guys,

I am applying the patch below to fix a bug in the rx_memory_move_cost
function.  The problem was that the costs are meant to be relative to
the cost of moving a value between registers, but the existing
definition was making stores cheaper than register moves, and loads the
same cost as register moves.  Thus gcc was sometimes choosing to store
values in memory when actually it was better to keep them in registers.

The patch fixes the problem by adding the register move cost into the
memory move cost.  It also removes the call to
memory_move_secondary_cost, since there is no secondary cost.

Cheers
Nick

gcc/ChangeLog
2011-05-17  Nick Clifton

        * config/rx/rx.c (rx_memory_move_cost): Include cost of register
        moves.

Index: gcc/config/rx/rx.c
===================================================================
--- gcc/config/rx/rx.c	(revision 173815)
+++ gcc/config/rx/rx.c	(working copy)
@@ -2638,7 +2638,7 @@
 static int
 rx_memory_move_cost (enum machine_mode mode, reg_class_t regclass, bool in)
 {
-  return (in ? 2 : 0) + memory_move_secondary_cost (mode, regclass, in);
+  return (in ? 2 : 0) + REGISTER_MOVE_COST (mode, regclass, regclass);
 }
 
 /* Convert a CC_MODE to the set of flags that it represents.  */
Re: [PATCH] Fix PR46728 (move pow/powi folds to tree phases)
On Mon, May 16, 2011 at 7:30 PM, William J. Schmidt wrote: > Richi, thank you for the detailed review! > > I'll plan to move the power-series expansion into the existing IL walk > during pass_cse_sincos. As part of this, I'll move > tree_expand_builtin_powi and its subfunctions from builtins.c into > tree-ssa-math-opts.c. I'll submit this as a separate patch. > > I will also stop attempting to make code generation match completely at > -O0. If there are tests in the test suite that fail only at -O0 due to > these changes, I'll modify the tests to require -O1 or higher. > > I understand that you'd prefer that I leave the existing > canonicalization folds in place, and only un-canonicalize them during my > new pass (now, during cse_sincos). Actually, that was my first approach > to this issue. The problem that I ran into is that the various folds > are not performed just by the front end, but can pop up later, after my > pass is done. In particular, pass_fold_builtins will undo my changes, > turning expressions involving roots back into expressions involving > pow/powi. It wasn't clear to me whether the folds could kick in > elsewhere as well, so I took the approach of shutting them down. I see > now that this does lose some optimizations such as > pow(sqrt(cbrx(x)),6.0), as you pointed out. Yeah, it's always a delicate balance between canonicalization and optimal form for further optimization. Did you really see sqrt(cbrt(x)) being transformed back to pow()? I would doubt that, as on gimple the foldings that require two function calls to match shouldn't trigger. Or do you see sqrt(x) turned into pow(x,0.5)? I see that the vectorizer for example handles both pow(x,0.5) and pow(x,2), so indeed that might happen. I think what we might want to do is limit what the generic gimple fold_stmt folding does to function calls, also to avoid building regular generic call statements again. But that might be a bigger project and certainly should be done separately. So I'd say don't worry about this issue for the initial patch but instead deal with it separately. We also repeatedly thought about whether canonicalizing everything to pow is a good idea or not, especially our canonicalizing of x * x to pow (x, 2) leads to interesting effects in some cases, as several passes do not handle function calls very well. So I also thought about introducing a POW_EXPR tree code that would be easier in this regard and would be a more IL friendly canonical form of the power-related functions. > Should I attempt to leave the folds in place, and screen out the > particular cases that are causing trouble in pass_fold_builtins? Or is > it too fragile to try to catch all places where folds occur? If there's > a flag that indicates parsing is complete, I suppose I could disable > individual folds once we're into the optimizer. I'd appreciate your > guidance. Indeed restricting canonicalization to earlier passes would be the way to go I think. I will think of the best way to achieve this. Richard. > Thanks, > Bill > > >
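For readers following the thread, a small illustration of the
canonicalizations under discussion (hedged: which folds actually fire
depends on the math flags in effect):

#include <math.h>

double square (double x) { return x * x; }    /* -> pow (x, 2.0)  */
double root (double x)   { return sqrt (x); } /* -> pow (x, 0.5)  */

Both can be rewritten to __builtin_pow calls early, which makes the
folds easy to match but leaves later passes that handle calls poorly
with a pessimized IL.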
[PATCH][?/n] LTO type merging cleanup
This avoids the odd cases where gimple_register_canonical_type could end
up running in cycles.  I was able to reproduce this issue with an
intermediate tree and LTO bootstrap.  While the following patch is not
the "real" fix (that one runs into a known cache-preloading issue
again ...) it certainly makes a lot of sense and avoids the issue by
design.

LTO bootstrapped on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2011-05-17  Richard Guenther

        * gimple.c (gimple_register_canonical_type): Use the main-variant
        leader for computing the canonical type.

Index: gcc/gimple.c
===================================================================
*** gcc/gimple.c	(revision 173825)
--- gcc/gimple.c	(working copy)
*************** gimple_register_canonical_type (tree t)
*** 4856,4874 ****
    if (TYPE_CANONICAL (t))
      return TYPE_CANONICAL (t);
  
!   /* Always register the type itself first so that if it turns out
!      to be the canonical type it will be the one we merge to as well.  */
!   t = gimple_register_type (t);
  
    if (TYPE_CANONICAL (t))
      return TYPE_CANONICAL (t);
  
-   /* Always register the main variant first.  This is important so we
-      pick up the non-typedef variants as canonical, otherwise we'll end
-      up taking typedef ids for structure tags during comparison.  */
-   if (TYPE_MAIN_VARIANT (t) != t)
-     gimple_register_canonical_type (TYPE_MAIN_VARIANT (t));
- 
    if (gimple_canonical_types == NULL)
      gimple_canonical_types = htab_create_ggc (16381, gimple_canonical_type_hash,
  					      gimple_canonical_type_eq, 0);
--- 4856,4869 ----
    if (TYPE_CANONICAL (t))
      return TYPE_CANONICAL (t);
  
!   /* Use the leader of our main variant for determining our canonical
!      type.  The main variant leader is a type that will always
!      prevail.  */
!   t = gimple_register_type (TYPE_MAIN_VARIANT (t));
  
    if (TYPE_CANONICAL (t))
      return TYPE_CANONICAL (t);
  
    if (gimple_canonical_types == NULL)
      gimple_canonical_types = htab_create_ggc (16381, gimple_canonical_type_hash,
  					      gimple_canonical_type_eq, 0);
Re: [PATCH][?/n] Cleanup LTO type merging
On Mon, 16 May 2011, H.J. Lu wrote: > On Mon, May 16, 2011 at 7:17 AM, Richard Guenther wrote: > > > > The following patch improves hashing types by re-instantiating the > > patch that makes us visit aggregate target types of pointers and > > function return and argument types. This halves the collision > > rate on the type hash table for a linux-kernel build and improves > > WPA compile-time from 3mins to 1mins and reduces memory usage by > > 1GB for that testcase. > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC2k6 > > build-tested. > > > > Richard. > > > > (patch is reversed) > > > > 2011-05-16 Richard Guenther > > > > * gimple.c (iterative_hash_gimple_type): Re-instantiate > > change to always visit pointer target and function result > > and argument types. > > > > This caused: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013 I have reverted the patch for now. Richard.
Re: Reintroduce -mflat option on SPARC
> Right, -mflat option should only be for 32-bit SPARC target.

OK, let's keep it that way for now.

Another question: why does the model hijack %i7 to use it as frame
pointer, instead of just using %fp?  AFAICS both are kept as fixed
registers by the code, so the model seems to be wasting 1 register
(2 without frame pointer).

-- 
Eric Botcazou
Re: Don't let search bots look at buglist.cgi
Hi,

On Mon, 16 May 2011, Ian Lance Taylor wrote:

> >>> httpd being in the top-10 always, fiddling with bugzilla URLs?
> >>> (Note, I don't have access to gcc.gnu.org, I'm relaying info from
> >>> multiple instances of discussion on #gcc and richi poking on it;
> >>> that said, it still might not be web crawlers, that's right, but
> >>> I'll happily accept _any_ load improvement on gcc.gnu.org, however
> >>> unfounded they might seem)
>
> I think that simply blocking buglist.cgi has dropped bugzilla off the
> immediate radar.  It also seems to have lowered the load, although I'm
> not sure if we are still keeping historical data.

Btw, FWIW, I had a quick look at one of the httpd log files, and in
seven hours last Saturday (from 5:30 to 12:30) there were 435203 GET
requests overall, and 391319 of them came from our own MnoGoSearch
engine; that's 90%.  Granted, many of those then in fact get 304 (not
modified) responses, but still, perhaps the eagerness of our own
crawler can be turned down a bit.

Ciao,
Michael.
Re: [PATCH][?/n] Cleanup LTO type merging
> On Mon, 16 May 2011, Jan Hubicka wrote: > > > > > > > I've seen us merge different named structs which happen to reside > > > on the same variant list. That's bogus, not only because we are > > > adjusting TYPE_MAIN_VARIANT during incremental type-merging and > > > fixup, so computing a persistent hash by looking at it looks > > > fishy as well. > > > > Hi, > > as reported on IRC earlier, I get the segfault while building libxul > > duea to infinite recursion problem. > > > > I now however also get a lot more of the following ICEs: > > In function '__unguarded_insertion_sort': > > lto1: internal compiler error: in splice_child_die, at dwarf2out.c:8274 > > previously it reported once during Mozilla build (and I put testcase into > > bugzilla), now it reproduces on many libraries. I did not see this problem > > when applying only the SCC hasing change. > > This change causes us to preserve more TYPE_DECLs I think, so we might > run more often into pre-existing debuginfo issues. Previously most > of the types were merged into their nameless variant which probably > didn't get output into debug info. > > Do you by chance have small testcases for your problems? ;) I think you might just look into one at http://gcc.gnu.org/bugzilla/show _bug.cgi?id=48354 Honza
Re: [PATCH] Fix PR46728 (move pow/powi folds to tree phases)
On Tue, 2011-05-17 at 11:03 +0200, Richard Guenther wrote: > On Mon, May 16, 2011 at 7:30 PM, William J. Schmidt > wrote: > > Richi, thank you for the detailed review! > > > > I'll plan to move the power-series expansion into the existing IL walk > > during pass_cse_sincos. As part of this, I'll move > > tree_expand_builtin_powi and its subfunctions from builtins.c into > > tree-ssa-math-opts.c. I'll submit this as a separate patch. > > > > I will also stop attempting to make code generation match completely at > > -O0. If there are tests in the test suite that fail only at -O0 due to > > these changes, I'll modify the tests to require -O1 or higher. > > > > I understand that you'd prefer that I leave the existing > > canonicalization folds in place, and only un-canonicalize them during my > > new pass (now, during cse_sincos). Actually, that was my first approach > > to this issue. The problem that I ran into is that the various folds > > are not performed just by the front end, but can pop up later, after my > > pass is done. In particular, pass_fold_builtins will undo my changes, > > turning expressions involving roots back into expressions involving > > pow/powi. It wasn't clear to me whether the folds could kick in > > elsewhere as well, so I took the approach of shutting them down. I see > > now that this does lose some optimizations such as > > pow(sqrt(cbrx(x)),6.0), as you pointed out. > > Yeah, it's always a delicate balance between canonicalization > and optimal form for further optimization. Did you really see > sqrt(cbrt(x)) being transformed back to pow()? I would doubt that, > as on gimple the foldings that require two function calls to match > shouldn't trigger. Or do you see sqrt(x) turned into pow(x,0.5)? > I see that the vectorizer for example handles both pow(x,0.5) and > pow(x,2), so indeed that might happen. Yes, I was seeing sqrt(x) turned back to pow(x,0.5), and even x*x turning back into pow(x,2.0). I don't specifically recall the sqrt(cbrt(x)) case; you're probably right about that one. But I had several test cases break because of this. > > I think what we might want to do is limit what the generic > gimple fold_stmt folding does to function calls, also to avoid > building regular generic call statements again. But that might > be a bigger project and certainly should be done separately. > > So I'd say don't worry about this issue for the initial patch but > instead deal with it separately. Agreed... > > We also repeatedly thought about whether canonicalizing > everything to pow is a good idea or not, especially our > canonicalizing of x * x to pow (x, 2) leads to interesting > effects in some cases, as several passes do not handle > function calls very well. So I also thought about introducing > a POW_EXPR tree code that would be easier in this > regard and would be a more IL friendly canonical form > of the power-related functions. > > > Should I attempt to leave the folds in place, and screen out the > > particular cases that are causing trouble in pass_fold_builtins? Or is > > it too fragile to try to catch all places where folds occur? If there's > > a flag that indicates parsing is complete, I suppose I could disable > > individual folds once we're into the optimizer. I'd appreciate your > > guidance. > > Indeed restricting canonicalization to earlier passes would be the > way to go I think. I will think of the best way to achieve this. Thanks. 
I think we need to address this as part of this patch, unless you're willing to live with a number of broken test cases in the meanwhile. If I only do the un-canonicalization in the new pass and let some of the folds be re-done later, some will fail. I'll start experimenting and see how many. Bill > > Richard. > > > Thanks, > > Bill > > > > > >
Re: [patch ada]: Fix boolean_type_node setup and some cleanup for boolean use
> 2011-05-16  Kai Tietz
>
>        PR middle-end/48989
>        * gcc-interface/trans.c (Exception_Handler_to_gnu_sjlj): Use
>        boolean_false_node instead of integer_zero_node.
>        (convert_with_check): Likewise.
>        * gcc-interface/decl.c (choices_to_gnu): Likewise.

OK for this part.

>        * gcc-interface/misc.c (gnat_init): Set precision for
>        generated boolean_type_node and initialize
>        boolean_false_node.

Not OK, you cannot set the precision of boolean_type_node to 1 in Ada.

-- 
Eric Botcazou
[PATCH][?/n] LTO type merging cleanup
This fixes an oversight in the new SCC hash mixing code - we of course need to return the adjusted hash of our type, not the purely local one. There's still something weird going on, hash values somehow depend on the order we feed it types ... Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2011-05-17 Richard Guenther * gimple.c (iterative_hash_gimple_type): Simplify singleton case some more, fix final hash value of the non-singleton case. Index: gcc/gimple.c === --- gcc/gimple.c(revision 173827) +++ gcc/gimple.c(working copy) @@ -4213,25 +4213,24 @@ iterative_hash_gimple_type (tree type, h if (state->low == state->dfsnum) { tree x; - struct sccs *cstate; struct tree_int_map *m; /* Pop off the SCC and set its hash values. */ x = VEC_pop (tree, *sccstack); - cstate = (struct sccs *)*pointer_map_contains (sccstate, x); - cstate->on_sccstack = false; /* Optimize SCC size one. */ if (x == type) { + state->on_sccstack = false; m = ggc_alloc_cleared_tree_int_map (); m->base.from = x; - m->to = cstate->u.hash; + m->to = v; slot = htab_find_slot (type_hash_cache, m, INSERT); gcc_assert (!*slot); *slot = (void *) m; } else { + struct sccs *cstate; unsigned first, i, size, j; struct type_hash_pair *pairs; /* Pop off the SCC and build an array of type, hash pairs. */ @@ -4241,6 +4240,8 @@ iterative_hash_gimple_type (tree type, h size = VEC_length (tree, *sccstack) - first + 1; pairs = XALLOCAVEC (struct type_hash_pair, size); i = 0; + cstate = (struct sccs *)*pointer_map_contains (sccstate, x); + cstate->on_sccstack = false; pairs[i].type = x; pairs[i].hash = cstate->u.hash; do @@ -4275,6 +4276,8 @@ iterative_hash_gimple_type (tree type, h for (j = 0; pairs[j].hash != pairs[i].hash; ++j) hash = iterative_hash_hashval_t (pairs[j].hash, hash); m->to = hash; + if (pairs[i].type == type) + v = hash; slot = htab_find_slot (type_hash_cache, m, INSERT); gcc_assert (!*slot); *slot = (void *) m;
Re: [patch ada]: Fix boolean_type_node setup and some cleanup for boolean use
2011/5/17 Eric Botcazou :
>> 2011-05-16  Kai Tietz
>>
>> PR middle-end/48989
>> * gcc-interface/trans.c (Exception_Handler_to_gnu_sjlj): Use
>> boolean_false_node instead of integer_zero_node.
>> (convert_with_check): Likewise.
>> * gcc-interface/decl.c (choices_to_gnu): Likewise.
>
> OK for this part.
>
>> * gcc-interface/misc.c (gnat_init): Set precision for
>> generated boolean_type_node and initialize
>> boolean_false_node.
>
> Not OK, you cannot set the precision of boolean_type_node to 1 in Ada.
>
> --
> Eric Botcazou

Hmm, sad. A check in tree-cfg that truth expressions have a type
precision of 1 would have been a good way to verify them. What is
actually the reason for not setting the type precision here? At least in
the testcases I didn't find a regression caused by this.

Regards,
Kai
Re: [PATCH][?/n] Cleanup LTO type merging
On Tue, May 17, 2011 at 3:29 AM, Richard Guenther wrote: > On Mon, 16 May 2011, H.J. Lu wrote: > >> On Mon, May 16, 2011 at 7:17 AM, Richard Guenther wrote: >> > >> > The following patch improves hashing types by re-instantiating the >> > patch that makes us visit aggregate target types of pointers and >> > function return and argument types. This halves the collision >> > rate on the type hash table for a linux-kernel build and improves >> > WPA compile-time from 3mins to 1mins and reduces memory usage by >> > 1GB for that testcase. >> > >> > Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC2k6 >> > build-tested. >> > >> > Richard. >> > >> > (patch is reversed) >> > >> > 2011-05-16 Richard Guenther >> > >> > * gimple.c (iterative_hash_gimple_type): Re-instantiate >> > change to always visit pointer target and function result >> > and argument types. >> > >> >> This caused: >> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013 > > I have reverted the patch for now. > It doesn't solve the problem and I reopened: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013 Your followup patches may have similar issues. -- H.J.
Re: [PATCH][?/n] Cleanup LTO type merging
On Tue, May 17, 2011 at 5:59 AM, H.J. Lu wrote: > On Tue, May 17, 2011 at 3:29 AM, Richard Guenther wrote: >> On Mon, 16 May 2011, H.J. Lu wrote: >> >>> On Mon, May 16, 2011 at 7:17 AM, Richard Guenther wrote: >>> > >>> > The following patch improves hashing types by re-instantiating the >>> > patch that makes us visit aggregate target types of pointers and >>> > function return and argument types. This halves the collision >>> > rate on the type hash table for a linux-kernel build and improves >>> > WPA compile-time from 3mins to 1mins and reduces memory usage by >>> > 1GB for that testcase. >>> > >>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC2k6 >>> > build-tested. >>> > >>> > Richard. >>> > >>> > (patch is reversed) >>> > >>> > 2011-05-16 Richard Guenther >>> > >>> > * gimple.c (iterative_hash_gimple_type): Re-instantiate >>> > change to always visit pointer target and function result >>> > and argument types. >>> > >>> >>> This caused: >>> >>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013 >> >> I have reverted the patch for now. >> > > It doesn't solve the problem and I reopened: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013 > > Your followup patches may have similar issues. > I think you reverted the WRONG patch: http://gcc.gnu.org/viewcvs?view=revision&revision=173827 -- H.J.
Re: [PATCH][?/n] Cleanup LTO type merging
On Tue, May 17, 2011 at 3:01 PM, H.J. Lu wrote: > On Tue, May 17, 2011 at 5:59 AM, H.J. Lu wrote: >> On Tue, May 17, 2011 at 3:29 AM, Richard Guenther wrote: >>> On Mon, 16 May 2011, H.J. Lu wrote: >>> On Mon, May 16, 2011 at 7:17 AM, Richard Guenther wrote: > > The following patch improves hashing types by re-instantiating the > patch that makes us visit aggregate target types of pointers and > function return and argument types. This halves the collision > rate on the type hash table for a linux-kernel build and improves > WPA compile-time from 3mins to 1mins and reduces memory usage by > 1GB for that testcase. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC2k6 > build-tested. > > Richard. > > (patch is reversed) > > 2011-05-16 Richard Guenther > > * gimple.c (iterative_hash_gimple_type): Re-instantiate > change to always visit pointer target and function result > and argument types. > This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013 >>> >>> I have reverted the patch for now. >>> >> >> It doesn't solve the problem and I reopened: >> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013 >> >> Your followup patches may have similar issues. >> > > I think you reverted the WRONG patch: > > http://gcc.gnu.org/viewcvs?view=revision&revision=173827 No, that was on purpose. > -- > H.J. >
Re: [PATCH][?/n] Cleanup LTO type merging
On Tue, May 17, 2011 at 6:03 AM, Richard Guenther wrote: > On Tue, May 17, 2011 at 3:01 PM, H.J. Lu wrote: >> On Tue, May 17, 2011 at 5:59 AM, H.J. Lu wrote: >>> On Tue, May 17, 2011 at 3:29 AM, Richard Guenther wrote: On Mon, 16 May 2011, H.J. Lu wrote: > On Mon, May 16, 2011 at 7:17 AM, Richard Guenther > wrote: > > > > The following patch improves hashing types by re-instantiating the > > patch that makes us visit aggregate target types of pointers and > > function return and argument types. This halves the collision > > rate on the type hash table for a linux-kernel build and improves > > WPA compile-time from 3mins to 1mins and reduces memory usage by > > 1GB for that testcase. > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC2k6 > > build-tested. > > > > Richard. > > > > (patch is reversed) > > > > 2011-05-16 Richard Guenther > > > > * gimple.c (iterative_hash_gimple_type): Re-instantiate > > change to always visit pointer target and function result > > and argument types. > > > > This caused: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013 I have reverted the patch for now. >>> >>> It doesn't solve the problem and I reopened: >>> >>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013 >>> >>> Your followup patches may have similar issues. >>> >> >> I think you reverted the WRONG patch: >> >> http://gcc.gnu.org/viewcvs?view=revision&revision=173827 > > No, that was on purpose. > But it doesn't fix the problem. -- H.J.
Re: FDO patch -- make ic related vars TLS if target allows
On Wed, Apr 27, 2011 at 10:54 AM, Xinliang David Li wrote: > Hi please review the trivial patch below. It reduces race conditions > in value profiling. Another trivial change (to initialize > function_list struct) is also included. > > Bootstrapped and regression tested on x86-64/linux. > > Thanks, > > David > > > 2011-04-27 Xinliang David Li > > * tree-profile.c (init_ic_make_global_vars): Set > tls attribute on ic vars. > * coverage.c (coverage_end_function): Initialize > function_list with zero. > This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49014 -- H.J.
Re: [patch ada]: Fix boolean_type_node setup and some cleanup for boolean use
> Hmm, sad. A check in tree-cfg that truth expressions have a type
> precision of 1 would have been a good way to verify them. What is
> actually the reason for not setting the type precision here?

But we are setting it:

  /* In Ada, we use an unsigned 8-bit type for the default boolean type. */
  boolean_type_node = make_unsigned_type (8);
  TREE_SET_CODE (boolean_type_node, BOOLEAN_TYPE);

See make_unsigned_type:

/* Create and return a type for unsigned integers of PRECISION bits. */

tree
make_unsigned_type (int precision)
{
  tree type = make_node (INTEGER_TYPE);

  TYPE_PRECISION (type) = precision;

  fixup_unsigned_type (type);
  return type;
}

The other languages are changing the precision, but in Ada we need a
standard scalar (precision == mode size) in order to support invalid
values.

> At least in the testcases I didn't find a regression caused by this.

Right, I've just installed the attached testcase, it passes with the
unmodified compiler but fails with your gcc-interface/misc.c change.

2011-05-17  Eric Botcazou

	* gnat.dg/invalid1.adb: New test.

-- 
Eric Botcazou

-- { dg-do run }
-- { dg-options "-gnatws -gnatVa" }

pragma Initialize_Scalars;

procedure Invalid1 is

  X : Boolean;
  A : Boolean := False;

  procedure Uninit (B : out Boolean) is
  begin
    if A then
      B := True;
      raise Program_Error;
    end if;
  end;

begin
  -- first, check that initialize_scalars is enabled
  begin
    if X then
      A := False;
    end if;
    raise Program_Error;
  exception
    when Constraint_Error => null;
  end;

  -- second, check if copyback of an invalid value raises constraint error
  begin
    Uninit (A);
    if A then
      -- we expect constraint error in the 'if' above according to gnat ug:
      --
      -- call. Note that there is no specific option to test `out'
      -- parameters, but any reference within the subprogram will be tested
      -- in the usual manner, and if an invalid value is copied back, any
      -- reference to it will be subject to validity checking.
      -- ...
      raise Program_Error;
    end if;
    raise Program_Error;
  exception
    when Constraint_Error => null;
  end;
end;
Re: [PATCH] comment clarifying need to use free_dominance_info
So maybe this patch, which adds a comment to calculate_dominance_info, is
more appropriate.

ChangeLog:

2011-05-17  Pierre Vittet

	* dominance.c (calculate_dominance_info): Add comment clarifying
	when to free with free_dominance_info.

contributor number: 634276

Index: gcc/dominance.c
===
--- gcc/dominance.c	(revision 173830)
+++ gcc/dominance.c	(working copy)
@@ -628,8 +628,15 @@ compute_dom_fast_query (enum cdi_direction dir)
 }
 
 /* The main entry point into this module. DIR is set depending on whether
-   we want to compute dominators or postdominators. */
+   we want to compute dominators or postdominators.
 
+   We try to keep dominance info alive as long as possible (to avoid
+   recomputing it often). It has to be freed with free_dominance_info when
+   a CFG transformation makes it invalid.
+
+   Post-dominance info is less often used, and should be freed after each
+   use.  */
+
 void
 calculate_dominance_info (enum cdi_direction dir)
 {
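As an illustration of the usage pattern the new comment documents, here
is a sketch of a hypothetical pass body (my_pass_execute and
my_transform_cfg are invented for the example; the dominance calls are
the real API):

/* Sketch only: intended use of the dominance API inside GCC.  */
static unsigned int
my_pass_execute (void)
{
  /* Cheap if dominator info is still alive from an earlier pass.  */
  calculate_dominance_info (CDI_DOMINATORS);
  calculate_dominance_info (CDI_POST_DOMINATORS);

  my_transform_cfg ();  /* hypothetical transformation changing the CFG */

  /* The CFG changed, so the cached info is now invalid.  */
  free_dominance_info (CDI_DOMINATORS);
  /* Post-dominator info is rarely reused; free it after each use.  */
  free_dominance_info (CDI_POST_DOMINATORS);
  return 0;
}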
RFA: MN10300: Add TLS support
Hi Richard, Hi Jeff, Hi Alex, Here is another MN10300 patch. This ones adds support for TLS. I must confess that I did not actually write this code - DJ did - but I have been asked to submit it upstream, so here goes: OK to apply ? Cheers Nick gcc/ChangeLog 2011-05-17 DJ Delorie Nick Clifton * config/mn10300/mn10300.c (mn10300_unspec_int_label_counter): New variable. (mn10300_option_override): Disable TLS for the MN10300. (tls_symbolic_operand_kind): New function. (get_some_local_dynamic_name_1): New function. (get_some_local_dynamic_name): New function. (mn10300_print_operand): Handle %&. (mn10300_legitimize_address): Legitimize TLS addresses. (is_legitimate_tls_operand): New function. (mn10300_legitimate_pic_operand_p): TLS operands are legitimate. (mn10300_legitimate_address_p): TLS symbols do not make legitimate addresses. Allow TLS operands under some circumstances. (mn10300_legitimate_constant_p): Handle TLS UNSPECs. (mn10300_init_machine_status): New function. (mn10300_init_expanders): New function. (pic_nonpic_got_ptr): New function. (mn10300_tls_get_addr): New function. (mn10300_legitimize_tls_address): New function. (mn10300_constant_address_p): New function. (TARGET_HAVE_TLS): Define. * config/mn10300/predicates.md (tls_symbolic_operand): New. (nontls_general_operand): New. * config/mn10300/mn10300.h (enum reg_class): Add D0_REGS, A0_REGS. (REG_CLASS_NAMES): Likewise. (REG_CLASS_CONTENTS): Likewise. (struct machine_function): New structure. (INIT_EXPANDERS): Define. (mn10300_unspec_int_label_counter): New variable. (PRINT_OPERAND_PUNCT_VALID_P): Define. (CONSTANT_ADDRESS_P): Define. * config/mn10300/constraints (B): New constraint. (C): New constraint. * config/mn10300/mn10300-protos.h: Alpha sort. (mn10300_init_expanders): Prototype. (mn10300_tls_get_addr): Prototype. (mn10300_legitimize_tls_address): Prototype. (mn10300_constant_address_p): Prototype. * config/mn10300/mn10300.md (TLS_REG): New constant. (UNSPEC_INT_LABEL): New constant. (UNSPEC_TLSGD): New constant. (UNSPEC_TLSLDM): New constant. (UNSPEC_DTPOFF): New constant. (UNSPEC_GOTNTPOFF): New constant. (UNSPEC_INDNTPOFF): New constant. (UNSPEC_TPOFF): New constant. (UNSPEC_TLS_GD): New constant. (UNSPEC_TLS_LD_BASE): New constant. (movsi): Add TLS code. (tls_global_dynamic_i): New pattern. (tls_global_dynamic): New pattern. (tls_local_dynamic_base_i): New pattern. (tls_local_dynamic_base): New pattern. (tls_initial_exec): New pattern. (tls_initial_exec_1): New pattern. (tls_initial_exec_2): New pattern. (am33_set_got): New pattern. (int_label): New pattern. (am33_loadPC_anyreg): New pattern. (add_GOT_to_any_reg): New pattern. Index: gcc/config/mn10300/mn10300.c === --- gcc/config/mn10300/mn10300.c (revision 173815) +++ gcc/config/mn10300/mn10300.c (working copy) @@ -46,7 +46,12 @@ #include "df.h" #include "opts.h" #include "cfgloop.h" +#include "ggc.h" +/* This is used by GOTaddr2picreg to uniquely identify + UNSPEC_INT_LABELs. */ +int mn10300_unspec_int_label_counter; + /* This is used in the am33_2.0-linux-gnu port, in which global symbol names are not prefixed by underscores, to tell whether to prefix a label with a plus sign or not, so that the assembler can tell @@ -124,6 +129,9 @@ target_flags &= ~MASK_MULT_BUG; else { + /* We can't do TLS if we don't have the TLS register. */ + targetm.have_tls = false; + /* Disable scheduling for the MN10300 as we do not have timing information available for it. 
*/ flag_schedule_insns = 0; @@ -162,6 +170,51 @@ fprintf (asm_out_file, "\t.am33\n"); } +/* Returns non-zero if OP has the KIND tls model. */ + +static inline bool +tls_symbolic_operand_kind (rtx op, enum tls_model kind) +{ + if (GET_CODE (op) != SYMBOL_REF) +return false; + return SYMBOL_REF_TLS_MODEL (op) == kind; +} + +/* Locate some local-dynamic symbol still in use by this function + so that we can print its name in some tls_local_dynamic_base + pattern. This is used by "%&" in print_operand(). */ + +static int +get_some_local_dynamic_name_1 (rtx *px, void *data ATTRIBUTE_UNUSED) +{ + rtx x = *px; + + if (GET_CODE (x) == SYMBOL_REF + && tls_symbolic_operand_kind (x, TLS_MODEL_LOCAL_DYNAMIC)) +{ + cfun->machine->some_ld_name = XSTR (x, 0); + return 1; +} + + return 0; +} + +static const char * +get_some_local_dynamic_name (void) +{ + rtx insn;
[PATCH] Fixup LTO SCC hash comparison fn
Quite obvious if you look at it for the 100th time...

Richard.

2011-05-17  Richard Guenther

	* gimple.c (type_hash_pair_compare): Fix comparison.

Index: gcc/gimple.c
===
--- gcc/gimple.c	(revision 173830)
+++ gcc/gimple.c	(working copy)
@@ -4070,9 +4070,11 @@ type_hash_pair_compare (const void *p1_,
 {
   const struct type_hash_pair *p1 = (const struct type_hash_pair *) p1_;
   const struct type_hash_pair *p2 = (const struct type_hash_pair *) p2_;
-  if (p1->hash == p2->hash)
-    return TYPE_UID (p1->type) - TYPE_UID (p2->type);
-  return p1->hash - p2->hash;
+  if (p1->hash < p2->hash)
+    return -1;
+  else if (p1->hash > p2->hash)
+    return 1;
+  return 0;
 }
 
 /* Returning a hash value for gimple type TYPE combined with VAL.
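For the record, here is why the subtraction idiom fails here, as a
self-contained sketch (the values are invented for illustration;
hashval_t is unsigned):

/* Sketch, not part of the patch: `return p1->hash - p2->hash;' is not
   a valid qsort comparator for unsigned hash values.  */
#include <stdio.h>

int
main (void)
{
  unsigned int h1 = 1;            /* logically the smaller hash */
  unsigned int h2 = 0x90000000u;  /* logically the larger hash */

  /* The difference is computed in unsigned arithmetic and wraps:
     1 - 0x90000000 == 0x70000001, which converts to a positive int,
     so the comparator would claim h1 > h2.  */
  printf ("%d\n", (int) (h1 - h2) > 0);  /* prints 1 */
  return 0;
}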
Clean up ARM string option handling
This patch cleans up ARM handling of various options, making enumerated options that were handled in arm_option_override use Enum instead (except for -mfpu=, to be handled in a subsequent patch) and using UInteger for -mstructure-size-boundary=. -mfp= and -mfpe= (legacy aliases) are converted into actual .opt Alias entries for each valid argument to those options. (Thus, the last of a sequence of interspersed -mfpu= and -mfp=/-mfpe= options now wins, which I think is correct, whereas previously an earlier -mfpu= option would override a later -mfp=/-mfpe= option.) -mstructure-size-boundary= using UInteger means that its arguments are now required to be decimal integers without leading whitespace or ignored trailing text, whereas previously such variants (and octal and hexadecimal values) would be accepted; I think this stricter checking, consistent with other integer-valued options, is also the desired semantics. Tested building cc1 and xgcc for cross to arm-eabi. Will commit to trunk in the absence of target maintainer objections. Remarks on oddities in the configuration of various ARM targets for the consideration of the ARM target maintainers (I don't plan to work on these issues): * The default float-abi (TARGET_DEFAULT_FLOAT_ABI) is "hard" only for old-ABI GNU/Linux. Although it's also defined to ARM_FLOAT_ABI_HARD in arm/semi.h, that definition gets overridden in arm/coff.h. * The header arm/aout.h has a comment saying it is for "ARM with a.out", which is misleading since there are no such targets in GCC. Actually, all ARM targets include it either immediately before arm/arm.h, or in the case of FreeBSD with arm/freebsd.h inbetween - but arm/freebsd.h does not test or modify any macros from arm/aout.h, so actually arm/aout.h could safely be merged into arm/arm.h, making the port simpler and less confusing. (I haven't checked whether arm/arm.h tests or modifies any macros from arm/aout.h, which would be relevant to determining the best way to merge each macro into arm/arm.h.) * The header arm/semi.h says "ARM on semi-hosted platform" - which is also misleading since it's only used for WinCE. As noted above at least one target macro in this header is dead. As the four headers arm/semi.h arm/coff.h arm/pe.h arm/wince-pe.h are all only used for this one target, and I doubt the distinctions between them genuinely follow any proper abstraction levels between four different sets of targets that simply happen to be the same right now, merging them into one header might improve things. * arm*-*-uclinux* are legacy Linux-based targets not using the common gnu-user.h and linux.h headers as they should be. * There were previous discussions of deprecating at least some non-AAPCS targets. Since for some target OSes (such as WinCE) the ABI is what it is, it may not be possible to eliminate the old ABI support completely. But maybe some other OS maintainers (VxWorks, FreeBSD, NetBSD, eCos, RTEMS) are moving their OSes to use AAPCS, in which case some older target variants could be deprecated and removed? (Maybe old-ABI bare-metal, GNU/Linux and uClinux could be deprecated in any case.) Closely tied into old-ABI use are defaults of -mfpu= based on the selected CPU - but NetBSD and VxWorks at least default to VFP (define FPUTYPE_DEFAULT appropriately) even though using pre-AAPCS ABIs. * There was also a previous suggestion that -mwords-little-endian is long-obsolete and should be removed as well. 
2011-05-17 Joseph Myers * config/arm/arm-opts.h (enum arm_fp16_format_type, enum arm_abi_type, enum float_abi_type, enum arm_tp_type): Move from arm.h. * config/arm/arm.c (arm_float_abi, arm_fp16_format, arm_abi, target_thread_pointer, arm_structure_size_boundary, struct float_abi, all_float_abis, struct fp16_format, all_fp16_formats, struct abi_name, arm_all_abis): Remove. (arm_option_override) Don't process most enumerated option values here. Don't process target_fpe_name here. Work with integer not string for structure size boundary; use separate diagnostics for each case. * config/arm/arm.h (enum float_abi_type, enum arm_fp16_format_type, enum arm_abi_type, enum arm_tp_type): Move to arm-opts.h. (arm_float_abi, arm_fp16_format, arm_abi, target_thread_pointer, arm_structure_size_boundary): Remove. * config/arm/arm.opt (mabi=): Use Enum and Init. (arm_abi_type): New Enum and EnumValue entries. (mfloat-abi=): Use Enum and Init. (float_abi_type): New Enum and EnumValue entries. (mfp=, mfpe=): Replace by separate Alias entries for each argument. (mfp16-format=): Use Enum and Init. (arm_fp16_format_type): New Enum and EnumValue entries. (mstructure-size-boundary=): Use UInteger and Init. (mtp=): Use Enum and Init. (arm_tp_type): New
[PING][PATCH 13/18] move TS_EXP to be a substructure of TS_TYPED
On 05/10/2011 04:18 PM, Nathan Froyd wrote:
> On 03/10/2011 11:23 PM, Nathan Froyd wrote:
>> After all that, we can finally make tree_exp inherit from typed_tree.
>> Quite anticlimactic.
>
> Ping.  http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00559.html

Ping^2.

-Nathan
Re: Libiberty: POSIXify psignal definition
On Thu, 2011-05-05 at 09:30 +0200, Corinna Vinschen wrote: > [Please keep me CCed, I'm not subscribed to gcc-patches. Thank you] > > Hi, > > the definition of psignal in libiberty is > >void psignal (int, char *); > > The correct definition per POSIX is > >void psignal (int, const char *); > > The below patch fixes that. > > > Thanks, > Corinna > > > * strsignal.c (psignal): Change second parameter to const char *. > Fix comment accordingly. > OK. R.
Re: Libiberty: POSIXify psignal definition
> > * strsignal.c (psignal): Change second parameter to const char *. > > Fix comment accordingly. > > > > OK. I had argued against this patch: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00439.html The newlib change broke ALL released versions of gcc, and the above patch does NOT fix the problem, but merely hides it until the next time we trip over it.
Re: [patch ada]: Fix boolean_type_node setup and some cleanup for boolean use
2011/5/17 Eric Botcazou :
>> Hmm, sad. A check in tree-cfg that truth expressions have a type
>> precision of 1 would have been a good way to verify them. What is
>> actually the reason for not setting the type precision here?
>
> But we are setting it:
>
>   /* In Ada, we use an unsigned 8-bit type for the default boolean type. */
>   boolean_type_node = make_unsigned_type (8);
>   TREE_SET_CODE (boolean_type_node, BOOLEAN_TYPE);
>
> See make_unsigned_type:
>
> /* Create and return a type for unsigned integers of PRECISION bits. */
>
> tree
> make_unsigned_type (int precision)
> {
>   tree type = make_node (INTEGER_TYPE);
>
>   TYPE_PRECISION (type) = precision;
>
>   fixup_unsigned_type (type);
>   return type;
> }
>
> The other languages are changing the precision, but in Ada we need a
> standard scalar (precision == mode size) in order to support invalid
> values.
>
>> At least in the testcases I didn't find a regression caused by this.
>
> Right, I've just installed the attached testcase, it passes with the
> unmodified compiler but fails with your gcc-interface/misc.c change.
>
> 2011-05-17  Eric Botcazou
>
> 	* gnat.dg/invalid1.adb: New test.
>
> --
> Eric Botcazou

Ok, thanks for explaining it. So would the patch be OK to apply without
the precision setting?

Regards,
Kai
Re: Libiberty: POSIXify psignal definition
On Tue, 2011-05-17 at 11:52 -0400, DJ Delorie wrote: > > > * strsignal.c (psignal): Change second parameter to const char *. > > > Fix comment accordingly. > > > > > > > OK. > > I had argued against this patch: > > http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00439.html > > The newlib change broke ALL released versions of gcc, and the above > patch does NOT fix the problem, but merely hides it until the next > time we trip over it. > So regardless of whether the changes to newlib are a good idea or not, I think the fix to libiberty is still right. Posix says that psignal takes a const char *, and libiberty's implementation doesn't. That's just silly. I do agree that the newlib code should be tightened up, particularly in order to support older compilers; but that doesn't mean we shouldn't fix libiberty as well. R.
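To make the prototype question concrete, a minimal sketch (not from
either tree; SIG and the message are invented) of what the POSIX
signature allows and why mismatched declarations break builds:

/* Sketch only: POSIX declares void psignal (int, const char *); in
   <signal.h>.  A second, mismatched `void psignal (int, char *);'
   declaration -- as old libiberty had -- gives "conflicting types for
   'psignal'" when both are visible, which is how a build breaks.  */
#include <signal.h>

int
main (void)
{
  /* Fine with the const-correct form; with a `char *' parameter a
     string literal argument defeats const-correctness.  */
  psignal (2, "caught signal");  /* 2 is usually SIGINT */
  return 0;
}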
Re: Libiberty: POSIXify psignal definition
On May 17 16:33, Richard Earnshaw wrote: > > On Thu, 2011-05-05 at 09:30 +0200, Corinna Vinschen wrote: > > [Please keep me CCed, I'm not subscribed to gcc-patches. Thank you] > > > > Hi, > > > > the definition of psignal in libiberty is > > > >void psignal (int, char *); > > > > The correct definition per POSIX is > > > >void psignal (int, const char *); > > > > The below patch fixes that. > > > > > > Thanks, > > Corinna > > > > > > * strsignal.c (psignal): Change second parameter to const char *. > > Fix comment accordingly. > > > > OK. > > R. Thanks. I just have no check in rights to the gcc repository. I applied the change to the sourceware CVS repository but for gcc I need a proxy. Thanks, Corinna -- Corinna Vinschen Cygwin Project Co-Leader Red Hat
Re: Libiberty: POSIXify psignal definition
On May 17 17:07, Richard Earnshaw wrote:
> On Tue, 2011-05-17 at 11:52 -0400, DJ Delorie wrote:
> > > > * strsignal.c (psignal): Change second parameter to const char *.
> > > > Fix comment accordingly.
> > >
> > > OK.
> >
> > I had argued against this patch:
> >
> > http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00439.html
> >
> > The newlib change broke ALL released versions of gcc, and the above
> > patch does NOT fix the problem, but merely hides it until the next
> > time we trip over it.
>
> So regardless of whether the changes to newlib are a good idea or not, I
> think the fix to libiberty is still right. Posix says that psignal takes
> a const char *, and libiberty's implementation doesn't. That's just
> silly.
>
> I do agree that the newlib code should be tightened up, particularly in
> order to support older compilers;

What I don't understand is why the newlib change broke older compilers.
The function has been added to newlib and the definitions in newlib are
correct.

If this is referring to the fact that libiberty doesn't grok
automatically if a symbol has been added to newlib, then that's a
problem in libiberty, not in newlib.

Otherwise, if you're building an older compiler, just use an older
newlib as well.

Corinna

-- 
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat
Re: Libiberty: POSIXify psignal definition
> So regardless of whether the changes to newlib are a good idea or not, I
> think the fix to libiberty is still right.

Irrelevant. I said I'd accept that change *after* the real problem is
fixed. The real problem hasn't been fixed.

The real problem is that libiberty should NOT INCLUDE PSIGNAL AT ALL if
newlib has it.

What *should* have happened is that libiberty should have been fixed
*first*, and newlib waited until a gcc/binutils release cycle happened,
so that at least ONE version of those could build with newlib.
Re: Libiberty: POSIXify psignal definition
> Thanks. I just have no check in rights to the gcc repository. I > applied the change to the sourceware CVS repository but for gcc I > need a proxy. Please, never apply libiberty patches only to src. They're likely to get deleted by the robomerge. The rule is: gcc only, or both at the same time.
Re: Libiberty: POSIXify psignal definition
> What I don't understand is why the newlib change broke older compilers.

Older compilers have the older libiberty. At the moment, libiberty
cannot be built by *any* released gcc, because you cannot *build* any
released gcc, because it cannot build its target libiberty.

> The function has been added to newlib and the definitions in newlib
> are correct.

"Correct" is irrelevant. They don't match libiberty, so the build breaks.

> If this is referring to the fact that libiberty doesn't grok
> automatically if a symbol has been added to newlib, then that's a
> problem in libiberty, not in newlib.

It's a problem in every released gcc at the moment, so no released gcc
can be built for a newlib target, without hacking the sources.

> Otherwise, if you're building an older compiler, just use an older
> newlib as well.

The only option here is to not release a newlib at all until a fixed gcc
release happens, and then require that fixed gcc for that version of
newlib onward.
Re: [Patch, libfortran] PR 48931 Async-signal-safety of backtrace signal handler
On 05/14/2011 09:40 PM, Janne Blomqvist wrote: Hi, the current version of showing the backtrace is not async-signal-safe as it uses backtrace_symbols() which, in turn, uses malloc(). The attached patch changes the backtrace printing functionality to instead use backtrace_symbols_fd() and pipes. Great - this would solve a problem I filed a bugzilla report for years ago (unfortunately, I do not know the number of it). I closed it WONTFIX, because neither FX nor I could come up with an alternative way *not* using malloc. [ The problem was getting a traceback after corruption of the malloc arena, which just hangs under the current implementation. ] -- Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/ Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
[PATCH, i386]: Trivial, use bool some more.
Hello! 2011-05-16 Uros Bizjak * config/i386/i386-protos.h (output_fix_trunc): Change arg 3 to bool. (output_fp_compare): Change args 3 and 4 to bool. (ix86_expand_call): Change arg 6 to bool. (ix86_attr_length_immediate_default): Change arg 2 to bool. (ix86_attr_length_vex_default): Change arg 3 to bool. * config/i386/i386.md: Update all uses. * config/i386/i386.c: Ditto. (ix86_flags_dependent): Change return type to bool. Patch was tested on x86_64-pc-linux-gnu {,-m32}, also with --enable-build-with-cxx (additional patch is needed to bootstrap without errors ATM). Committed to mainline SVN. Uros. Index: config/i386/i386.md === --- config/i386/i386.md (revision 173832) +++ config/i386/i386.md (working copy) @@ -414,9 +414,9 @@ (const_int 0) (eq_attr "type" "alu,alu1,negnot,imovx,ishift,rotate,ishift1,rotate1, imul,icmp,push,pop") - (symbol_ref "ix86_attr_length_immediate_default(insn,1)") + (symbol_ref "ix86_attr_length_immediate_default (insn, true)") (eq_attr "type" "imov,test") - (symbol_ref "ix86_attr_length_immediate_default(insn,0)") + (symbol_ref "ix86_attr_length_immediate_default (insn, false)") (eq_attr "type" "call") (if_then_else (match_operand 0 "constant_call_address_operand" "") (const_int 4) @@ -524,11 +524,11 @@ (if_then_else (and (eq_attr "prefix_0f" "1") (eq_attr "prefix_extra" "0")) (if_then_else (eq_attr "prefix_vex_w" "1") - (symbol_ref "ix86_attr_length_vex_default (insn, 1, 1)") - (symbol_ref "ix86_attr_length_vex_default (insn, 1, 0)")) + (symbol_ref "ix86_attr_length_vex_default (insn, true, true)") + (symbol_ref "ix86_attr_length_vex_default (insn, true, false)")) (if_then_else (eq_attr "prefix_vex_w" "1") - (symbol_ref "ix86_attr_length_vex_default (insn, 0, 1)") - (symbol_ref "ix86_attr_length_vex_default (insn, 0, 0)" + (symbol_ref "ix86_attr_length_vex_default (insn, false, true)") + (symbol_ref "ix86_attr_length_vex_default (insn, false, false)" ;; Set when modrm byte is used. 
(define_attr "modrm" "" @@ -1262,7 +1262,7 @@ UNSPEC_FNSTSW))] "X87_FLOAT_MODE_P (GET_MODE (operands[1])) && GET_MODE (operands[1]) == GET_MODE (operands[2])" - "* return output_fp_compare (insn, operands, 0, 0);" + "* return output_fp_compare (insn, operands, false, false);" [(set_attr "type" "multi") (set_attr "unit" "i387") (set (attr "mode") @@ -1309,7 +1309,7 @@ (match_operand:XF 2 "register_operand" "f"))] UNSPEC_FNSTSW))] "TARGET_80387" - "* return output_fp_compare (insn, operands, 0, 0);" + "* return output_fp_compare (insn, operands, false, false);" [(set_attr "type" "multi") (set_attr "unit" "i387") (set_attr "mode" "XF")]) @@ -1343,7 +1343,7 @@ (match_operand:MODEF 2 "nonimmediate_operand" "fm"))] UNSPEC_FNSTSW))] "TARGET_80387" - "* return output_fp_compare (insn, operands, 0, 0);" + "* return output_fp_compare (insn, operands, false, false);" [(set_attr "type" "multi") (set_attr "unit" "i387") (set_attr "mode" "")]) @@ -1378,7 +1378,7 @@ UNSPEC_FNSTSW))] "X87_FLOAT_MODE_P (GET_MODE (operands[1])) && GET_MODE (operands[1]) == GET_MODE (operands[2])" - "* return output_fp_compare (insn, operands, 0, 1);" + "* return output_fp_compare (insn, operands, false, true);" [(set_attr "type" "multi") (set_attr "unit" "i387") (set (attr "mode") @@ -1428,7 +1428,7 @@ "X87_FLOAT_MODE_P (GET_MODE (operands[1])) && (TARGET_USE_MODE_FIOP || optimize_function_for_size_p (cfun)) && (GET_MODE (operands [3]) == GET_MODE (operands[1]))" - "* return output_fp_compare (insn, operands, 0, 0);" + "* return output_fp_compare (insn, operands, false, false);" [(set_attr "type" "multi") (set_attr "unit" "i387") (set_attr "fp_int_src" "true") @@ -1504,7 +1504,7 @@ "TARGET_MIX_SSE_I387 && SSE_FLOAT_MODE_P (GET_MODE (operands[0])) && GET_MODE (operands[0]) == GET_MODE (operands[1])" - "* return output_fp_compare (insn, operands, 1, 0);" + "* return output_fp_compare (insn, operands, true, false);" [(set_attr "type" "fcmp,ssecomi") (set_attr "prefix" "orig,maybe_vex") (set (attr "mode") @@ -1533,7 +1533,7 @@ "TARGET_SSE_MATH && SSE_FLOAT_MODE_P (GET_MODE (operands[0])) && GET_MODE (operands[0]) == GET_MODE (operands[1])" - "* return output_fp_compare (insn, operands, 1, 0);" + "* return output_fp_compare (insn, operands, true, false);" [(set_attr "type" "ssecomi") (set_attr "prefix" "maybe_vex") (set (attr "mode") @@ -1557,7 +1557,7 @@ && TARGET_CMOVE && !(SSE_FLOAT_MODE_P (GET_MODE (operands[0])) && TARGET_SSE_MATH) && GET_MODE (operands[0]) == GET_MODE (operands[1]
[PATCH]: Restore bootstrap with --enable-build-with-cxx
Hello!

2011-05-17  Uros Bizjak

	* ipa-inline-analysis.c (inline_node_duplication_hook): Initialize
	info->entry with 0.
	* tree-inline.c (maybe_inline_call_in_expr): Initialize
	id.transform_lang_insert_block with NULL.

Tested on x86_64-pc-linux-gnu {,-m32} with --enable-build-with-cxx.
Committed to mainline SVN as obvious.

Uros.

Index: ipa-inline-analysis.c
===
--- ipa-inline-analysis.c	(revision 173832)
+++ ipa-inline-analysis.c	(working copy)
@@ -702,7 +702,7 @@ inline_node_duplication_hook (struct cgr
       bool inlined_to_p = false;
       struct cgraph_edge *edge;
 
-      info->entry = false;
+      info->entry = 0;
       VEC_safe_grow_cleared (tree, heap, known_vals, count);
       for (i = 0; i < count; i++)
 	{
Index: tree-inline.c
===
--- tree-inline.c	(revision 173832)
+++ tree-inline.c	(working copy)
@@ -5232,7 +5232,7 @@ maybe_inline_call_in_expr (tree exp)
       id.transform_call_graph_edges = CB_CGE_DUPLICATE;
       id.transform_new_cfg = false;
       id.transform_return_to_modify = true;
-      id.transform_lang_insert_block = false;
+      id.transform_lang_insert_block = NULL;
 
       /* Make sure not to unshare trees behind the front-end's back
	 since front-end specific mechanisms may rely on sharing. */
[PATCH, MELT] correcting path error in the Makefile.in
This patch corrects a bug in the current revision of MELT which was
preventing MELT from running correctly.

This was a path problem in gcc/Makefile.in (melt-modules/ and
melt-module.mk were not found).

My contributor number is 634276.

changelog:

2011-05-17  Pierre Vittet

	* Makefile.in : Correct path errors for melt_module_dir and for
	install-melt-mk target

Index: gcc/Makefile.in
===
--- gcc/Makefile.in (revision 173832)
+++ gcc/Makefile.in (working copy)
@@ -5352,7 +5352,7 @@
 melt_default_modules_list=melt-default-modules
 melt_source_dir=$(libexecsubdir)/melt-source/
 ## this is the installation directory of melt dynamic modules (*.so)
-melt_module_dir=$(libexecsubdir)/melt-module/
+melt_module_dir=$(libexecsubdir)/melt-modules/
 ## this is the installed path of the MELT module makefile
 melt_installed_module_makefile=$(libexecsubdir)/melt-module.mk
@@ -5416,8 +5416,8 @@
 ## install the makefile for MELT modules
 install-melt-mk: melt-module.mk
-	$(mkinstalldirs) $(DESTDIR)$(plugin_includedir)
-	$(INSTALL_DATA) $< $(DESTDIR)/$(plugin_includedir)/
+	$(mkinstalldirs) $(DESTDIR)$(libexecsubdir)
+	$(INSTALL_DATA) $< $(DESTDIR)/$(libexecsubdir)/
 ## install the default modules list
 install-melt-default-modules-list: $(melt_default_modules_list).modlis
Re: [Patch, libfortran] PR 48931 Async-signal-safety of backtrace signal handler
On 05/17/2011 07:50 PM, Toon Moene wrote: On 05/14/2011 09:40 PM, Janne Blomqvist wrote: Hi, the current version of showing the backtrace is not async-signal-safe as it uses backtrace_symbols() which, in turn, uses malloc(). The attached patch changes the backtrace printing functionality to instead use backtrace_symbols_fd() and pipes. Great - this would solve a problem I filed a bugzilla report for years ago (unfortunately, I do not know the number of it). It was 33905 (2007-10-26). -- Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/ Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
Re: [PATCH, MELT] correcting path error in the Makefile.in
On Tue, 17 May 2011 21:30:44 +0200 Pierre Vittet wrote:
> This patch corrects a bug in the current revision of MELT which was
> preventing MELT from running correctly.
>
> This was a path problem in gcc/Makefile.in (melt-modules/ and
> melt-module.mk were not found).
>
> My contributor number is 634276.
>
> changelog:
>
> 2011-05-17  Pierre Vittet
>
> 	* Makefile.in : Correct path errors for melt_module_dir and for
> 	install-melt-mk target

The ChangeLog.MELT entry should mention the Makefile targets the way a
ChangeLog entry mentions functions. And the colon shouldn't have any
space before it. So I applied the patch with the following entry:

2011-05-17  Pierre Vittet

	* Makefile.in (melt_module_dir,install-melt-mk): Correct path
	errors.

Committed revision 173835.

Thanks.

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***
Re: [PATCH]: Restore bootstrap with --enable-build-with-cxx
On 05/17/2011 08:32 PM, Uros Bizjak wrote: Tested on x86_64-pc-linux-gnu {, m32} with --enable-build-with-cxx. Committed to mainline SVN as obvious. Does that mean that I can now remove the --disable-werror from my daily C++ bootstrap run ? It's great that some people understand the intricacies of the infight^H^H^H^H^H^H differences between the C and C++ type model. OK: 1/2 :-) -- Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/ Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
Restore MIPS builds
I've applied the patch below to restore -Werror MIPS builds. Tested on
mips64-linux-gnu.

Richard

gcc/
	* config/mips/mips.c (mips_handle_option): Remove unused variable.

Index: gcc/config/mips/mips.c
===
--- gcc/config/mips/mips.c	2011-05-15 08:37:21.0 +0100
+++ gcc/config/mips/mips.c	2011-05-15 08:37:28.0 +0100
@@ -15287,7 +15287,6 @@ mips_handle_option (struct gcc_options *
 		    location_t loc ATTRIBUTE_UNUSED)
 {
   size_t code = decoded->opt_index;
-  const char *arg = decoded->arg;
 
   switch (code)
     {
[PATCH] fix vfmsubaddpd/vfmaddsubpd generation
This patch fixes an obvious problem: the fma4_fmsubadd/fma4_fmaddsub
instruction templates don't generate vfmsubaddpd/vfmaddsubpd because
they don't use .

This passes bootstrap on x86_64 on trunk. Okay to commit?

BTW, I'm testing on gcc-4_6-branch. Should I post a different patch
thread, or just use this one?

-- 
Quentin

From aa70d4f6180f1c6712888b7328723232b5da8bdc Mon Sep 17 00:00:00 2001
From: Quentin Neill
Date: Tue, 17 May 2011 10:24:17 -0500
Subject: [PATCH] 2011-05-17  Harsha Jagasia

	* config/i386/sse.md (fma4_fmsubadd): Use .
	(fma4_fmaddsub): Likewise

---
 gcc/ChangeLog          |    5 +
 gcc/config/i386/sse.md |    4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3625d9b..e86ea4e 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2011-05-17  Harsha Jagasia
+
+	* config/i386/sse.md (fma4_fmsubadd): Use .
+	(fma4_fmaddsub): Likewise
+
 2011-05-17  Richard Guenther
 
 	* gimple.c (iterative_hash_gimple_type): Simplify singleton
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 291bffb..7c4e6dd 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1663,7 +1663,7 @@
 	   (match_operand:VF 3 "nonimmediate_operand" "xm,x")]
 	  UNSPEC_FMADDSUB))]
   "TARGET_FMA4"
-  "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vfmaddsubp\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "")])
 
@@ -1676,7 +1676,7 @@
 	    (match_operand:VF 3 "nonimmediate_operand" "xm,x"))]
 	  UNSPEC_FMADDSUB))]
   "TARGET_FMA4"
-  "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vfmsubaddp\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "")])
-- 
1.7.1
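For reference, a scalar C model of what the double-precision forms
compute, per my reading of the FMA4 documentation (sketch only; the
alternating lane convention is assumed to match SSE3 addsubpd, with
element 0 the low lane):

/* Sketch: vfmaddsubpd subtracts in the even lane and adds in the odd
   lane; vfmsubaddpd does the opposite.  */
void
fmaddsub_pd (double d[2], const double a[2], const double b[2],
	     const double c[2])
{
  d[0] = a[0] * b[0] - c[0];  /* even lane: subtract */
  d[1] = a[1] * b[1] + c[1];  /* odd lane: add */
}

void
fmsubadd_pd (double d[2], const double a[2], const double b[2],
	     const double c[2])
{
  d[0] = a[0] * b[0] + c[0];  /* even lane: add */
  d[1] = a[1] * b[1] - c[1];  /* odd lane: subtract */
}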
[v3] vs noexcept
Hi, this time too, took the occasion to add the get(tuple&&) bits. Tested x86_64-linux, committed. Paolo. /// 2011-05-17 Paolo Carlini * include/std/tuple: Use noexcept where appropriate. (tuple<>::swap): Rework implementation. (_Head_base<>::_M_swap_impl): Remove. (get(std::tuple<>&&)): Add. * testsuite/20_util/tuple/element_access/get2.cc: New. * testsuite/20_util/weak_ptr/comparison/cmp_neg.cc: Adjust dg-error line number. Index: include/std/tuple === --- include/std/tuple (revision 173832) +++ include/std/tuple (working copy) @@ -59,6 +59,15 @@ struct __add_ref<_Tp&> { typedef _Tp& type; }; + // Adds an rvalue reference to a non-reference type. + template +struct __add_r_ref +{ typedef _Tp&& type; }; + + template +struct __add_r_ref<_Tp&> +{ typedef _Tp& type; }; + template struct _Head_base; @@ -78,13 +87,6 @@ _Head& _M_head() { return *this; } const _Head& _M_head() const { return *this; } - - void - _M_swap_impl(_Head& __h) - { - using std::swap; - swap(__h, _M_head()); - } }; template @@ -103,13 +105,6 @@ _Head& _M_head() { return _M_head_impl; } const _Head& _M_head() const { return _M_head_impl; } - void - _M_swap_impl(_Head& __h) - { - using std::swap; - swap(__h, _M_head()); - } - _Head _M_head_impl; }; @@ -130,9 +125,11 @@ */ template struct _Tuple_impl<_Idx> -{ +{ + template friend class _Tuple_impl; + protected: - void _M_swap_impl(_Tuple_impl&) { /* no-op */ } + void _M_swap(_Tuple_impl&) noexcept { /* no-op */ } }; /** @@ -145,6 +142,8 @@ : public _Tuple_impl<_Idx + 1, _Tail...>, private _Head_base<_Idx, _Head, std::is_empty<_Head>::value> { + template friend class _Tuple_impl; + typedef _Tuple_impl<_Idx + 1, _Tail...> _Inherited; typedef _Head_base<_Idx, _Head, std::is_empty<_Head>::value> _Base; @@ -218,10 +217,14 @@ protected: void - _M_swap_impl(_Tuple_impl& __in) + _M_swap(_Tuple_impl& __in) + noexcept(noexcept(swap(std::declval<_Head&>(), +std::declval<_Head&>())) + && noexcept(__in._M_tail()._M_swap(__in._M_tail( { - _Base::_M_swap_impl(__in._M_head()); - _Inherited::_M_swap_impl(__in._M_tail()); + using std::swap; + swap(this->_M_head(), __in._M_head()); + _Inherited::_M_swap(__in._M_tail()); } }; @@ -300,14 +303,15 @@ void swap(tuple& __in) - { _Inherited::_M_swap_impl(__in); } + noexcept(noexcept(__in._M_swap(__in))) + { _Inherited::_M_swap(__in); } }; template<> class tuple<> { public: - void swap(tuple&) { /* no-op */ } + void swap(tuple&) noexcept { /* no-op */ } }; /// tuple (2-element), with construction and assignment from a pair. @@ -360,6 +364,7 @@ tuple& operator=(tuple&& __in) + // noexcept has to wait is_nothrow_move_assignable { static_cast<_Inherited&>(*this) = std::move(__in); return *this; @@ -392,7 +397,7 @@ template tuple& -operator=(pair<_U1, _U2>&& __in) +operator=(pair<_U1, _U2>&& __in) noexcept { this->_M_head() = std::forward<_U1>(__in.first); this->_M_tail()._M_head() = std::forward<_U2>(__in.second); @@ -401,11 +406,8 @@ void swap(tuple& __in) - { - using std::swap; - swap(this->_M_head(), __in._M_head()); - swap(this->_M_tail()._M_head(), __in._M_tail()._M_head()); - } + noexcept(noexcept(__in._M_swap(__in))) + { _Inherited::_M_swap(__in); } }; /// tuple (1-element). @@ -473,7 +475,8 @@ void swap(tuple& __in) - { _Inherited::_M_swap_impl(__in); } + noexcept(noexcept(__in._M_swap(__in))) + { _Inherited::_M_swap(__in); } }; @@ -522,22 +525,31 @@ __get_helper(const _Tuple_impl<__i, _Head, _Tail...>& __t) { return __t._M_head(); } - // Return a reference (const reference) to the ith element of a tuple. 
- // Any const or non-const ref elements are returned with their original type. + // Return a reference (const reference, rvalue reference) to the ith element + // of a tuple. Any const or non-const ref elements are returned with their + // original type. template inline typename __add_ref< - typename tuple_element<__i, tuple<_Elements...> >::type + typename tuple_element<__i, tuple<_Elements...>>::type >::type -get(tuple<_Elements...>& __t) +get(tuple<_Elements...>& __t) noexcept { return __get_helper<__i>(__t); }
Fix PR 49026 (-mfpmath= attribute bug)
PR 49026 identified testsuite regressions when mfpmath= is set by target
attributes, that for some reason appear on x86_64-darwin but not
x86_64-linux. This patch fixes one place where I failed to preserve the
logic of this attribute handling, and restores the code generated for
the testcase to the code attached to that PR as being generated before
my previous patch.

Bootstrapped with no regressions on x86_64-unknown-linux-gnu. Applied to
mainline.

2011-05-17  Joseph Myers

	* config/i386/i386.c (ix86_valid_target_attribute_tree): Use
	enum_opts_set when testing if attributes have set -mfpmath=.

Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c	(revision 173809)
+++ gcc/config/i386/i386.c	(working copy)
@@ -4692,7 +4692,7 @@ ix86_valid_target_attribute_tree (tree a
       || target_flags != def->x_target_flags
       || option_strings[IX86_FUNCTION_SPECIFIC_ARCH]
       || option_strings[IX86_FUNCTION_SPECIFIC_TUNE]
-      || ix86_fpmath != def->x_ix86_fpmath)
+      || enum_opts_set.x_ix86_fpmath)
     {
       /* If we are using the default tune= or arch=, undo the string assigned,
	 and use the default. */

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: [PATCH]: Restore bootstrap with --enable-build-with-cxx
On Tue, May 17, 2011 at 2:46 PM, Toon Moene wrote: > On 05/17/2011 08:32 PM, Uros Bizjak wrote: > >> Tested on x86_64-pc-linux-gnu {, m32} with --enable-build-with-cxx. >> Committed to mainline SVN as obvious. > > Does that mean that I can now remove the --disable-werror from my daily C++ > bootstrap run ? > > It's great that some people understand the intricacies of the > infight^H^H^H^H^H^H differences between the C and C++ type model. > > OK: 1/2 :-) I suspect this infight would vanish if we just switched, as we discussed in the past. -- Gaby
Re: [google] Parameterize function overhead estimate for inlining
You will have a followup patch to override arm defaults, right? Ok for google/main. Thanks, David On Tue, May 17, 2011 at 9:29 PM, Mark Heffernan wrote: > This tiny change improves the size estimation for inlining and results in an > average 1% size reduction and a small (maybe 0.25% geomean) performance > increase on internal benchmarks on x86-64. I parameterized the value rather > than changing it directly because previous exploration with x86 and ARM > arches indicated that it varies significantly with architecture. Default > value is tuned for x86-64. > Bootstrapped and tested on x86-64. Will explore relevance and effectiveness > for trunk and SPEC later. > Ok for google/main? > Mark > 2011-05-17 Mark Heffernan > > > > > > * ipa-inline.c (estimate_function_body_sizes): Parameterize static > > > function static overhead. > > > * params.def (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE): New parameter. > Index: ipa-inline.c > === > --- ipa-inline.c (revision 173845) > +++ ipa-inline.c (working copy) > @@ -1979,10 +1979,11 @@ estimate_function_body_sizes (struct cgr > gcov_type time = 0; > gcov_type time_inlining_benefit = 0; > /* Estimate static overhead for function prologue/epilogue and alignment. > */ > - int size = 2; > + int size = PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE); > /* Benefits are scaled by probability of elimination that is in range > <0,2>. */ > - int size_inlining_benefit = 2 * 2; > + int size_inlining_benefit = > + PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE) * 2; > basic_block bb; > gimple_stmt_iterator bsi; > struct function *my_function = DECL_STRUCT_FUNCTION (node->decl); > Index: params.def > === > --- params.def (revision 173845) > +++ params.def (working copy) > @@ -110,6 +110,11 @@ DEFPARAM (PARAM_MIN_INLINE_RECURSIVE_PRO > "Inline recursively only when the probability of call being executed > exceeds the parameter", > 10, 0, 0) > > +DEFPARAM (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE, > + "inline-function-overhead-size", > + "Size estimate of function overhead (prologue and epilogue) for inlining > purposes", > + 7, 0, 0) > + > /* Limit of iterations of early inliner. This basically bounds number of > nested indirect calls early inliner can resolve. Deeper chains are > still > handled by late inlining. */ >
[google] Increase inlining limits with FDO/LIPO
This small patch greatly expands the function size limits for inlining with FDO/LIPO. With profile information, the inliner is much more selective and precise and so the limits can be increased with less worry that functions and total code size will blow up. This speeds up x86-64 internal benchmarks by about geomean 1.5% to 3% with LIPO (depending on microarch), and 1% to 1.5% with FDO. Size increase is negligible (0.1% mean). Bootstrapped and regression tested on x86-64. Trunk testing to follow. Ok for google/main? Mark 2011-05-17 Mark Heffernan * opts.c (finish_options): Increase inlining limits with profile generate and use. Index: opts.c === --- opts.c (revision 173666) +++ opts.c (working copy) @@ -828,6 +828,22 @@ finish_options (struct gcc_options *opts opts->x_flag_split_stack = 0; } } + + if (opts->x_flag_profile_use + || opts->x_profile_arc_flag + || opts->x_flag_profile_values) +{ + /* With accurate profile information, inlining is much more +selective and makes better decisions, so increase the +inlining function size limits. Changes must be added to both +the generate and use builds to avoid profile mismatches. */ + maybe_set_param_value + (PARAM_MAX_INLINE_INSNS_SINGLE, 1000, +opts->x_param_values, opts_set->x_param_values); + maybe_set_param_value + (PARAM_MAX_INLINE_INSNS_AUTO, 1000, +opts->x_param_values, opts_set->x_param_values); +} }
[patch] [1/2] Support reduction in loop SLP
Hi, This is the first part of reduction support in loop-aware SLP. The purpose of the patch is to handle "unrolled" reductions such as: #a1 = phi ... a2 = a1 + x ... a3 = a2 + y ... a5 = a4 + z Such sequence of statements is gathered into a reduction chain and serves as a root for an SLP instance (similar to a group of strided stores in the existing loop SLP implementation). The patch also fixes PR tree-optimization/41881. Since reduction chains use the same data structure as strided data accesses, this part of the patch renames these data structures, removing data-ref and interleaving references. Bootstrapped and tested on powerpc64-suse-linux. I am going to apply it later today. Ira ChangeLog: * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): Use new names for group elements access. * tree-vectorizer.h (struct _stmt_vec_info): Use interleaving info for reduction chains as well. Remove data reference and interleaving related words from the fields names. * tree-vect-loop.c (vect_transform_loop): Use new names for group elements access. * tree-vect-data-refs.c (vect_get_place_in_interleaving_chain, vect_insert_into_interleaving_chain, vect_update_interleaving_chain, vect_update_interleaving_chain, vect_same_range_drs, vect_analyze_data_ref_dependence, vect_update_misalignment_for_peel, vect_verify_datarefs_alignment, vector_alignment_reachable_p, vect_peeling_hash_get_lowest_cost, vect_enhance_data_refs_alignment, vect_analyze_group_access, vect_analyze_data_ref_access, vect_create_data_ref_ptr, vect_transform_strided_load, vect_record_strided_load_vectors): Likewise. * tree-vect-stmts.c (vect_model_simple_cost, vect_model_store_cost, vect_model_load_cost, vectorizable_store, vectorizable_load, vect_remove_stores, new_stmt_vec_info): Likewise. * tree-vect-slp.c (vect_build_slp_tree, vect_supported_slp_permutation_p, vect_analyze_slp_instance): Likewise. Index: tree-vect-loop-manip.c === --- tree-vect-loop-manip.c (revision 173814) +++ tree-vect-loop-manip.c (working copy) @@ -2437,7 +2437,7 @@ vect_create_cond_for_alias_checks (loop_vec_info l dr_a = DDR_A (ddr); stmt_a = DR_STMT (DDR_A (ddr)); - dr_group_first_a = DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt_a)); + dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); if (dr_group_first_a) { stmt_a = dr_group_first_a; @@ -2446,7 +2446,7 @@ vect_create_cond_for_alias_checks (loop_vec_info l dr_b = DDR_B (ddr); stmt_b = DR_STMT (DDR_B (ddr)); - dr_group_first_b = DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt_b)); + dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b)); if (dr_group_first_b) { stmt_b = dr_group_first_b; Index: tree-vectorizer.h === --- tree-vectorizer.h (revision 173814) +++ tree-vectorizer.h (working copy) @@ -468,15 +473,15 @@ typedef struct _stmt_vec_info { /* Whether the stmt is SLPed, loop-based vectorized, or both. */ enum slp_vect_type slp_type; - /* Interleaving info. */ - /* First data-ref in the interleaving group. */ - gimple first_dr; - /* Pointer to the next data-ref in the group. */ - gimple next_dr; - /* In case that two or more stmts share data-ref, this is the pointer to the - previously detected stmt with the same dr. */ + /* Interleaving and reduction chains info. */ + /* First element in the group. */ + gimple first_element; + /* Pointer to the next element in the group. */ + gimple next_element; + /* For data-refs, in case that two or more stmts share data-ref, this is the + pointer to the previously detected stmt with the same dr. 
*/ gimple same_dr_stmt; - /* The size of the interleaving group. */ + /* The size of the group. */ unsigned int size; /* For stores, number of stores from this group seen. We vectorize the last one. */ @@ -527,22 +532,22 @@ typedef struct _stmt_vec_info { #define STMT_VINFO_RELATED_STMT(S) (S)->related_stmt #define STMT_VINFO_SAME_ALIGN_REFS(S) (S)->same_align_refs #define STMT_VINFO_DEF_TYPE(S) (S)->def_type -#define STMT_VINFO_DR_GROUP_FIRST_DR(S)(S)->first_dr -#define STMT_VINFO_DR_GROUP_NEXT_DR(S) (S)->next_dr -#define STMT_VINFO_DR_GROUP_SIZE(S)(S)->size -#define STMT_VINFO_DR_GROUP_STORE_COUNT(S) (S)->store_count -#define STMT_VINFO_DR_GROUP_GAP(S) (S)->gap -#define STMT_VINFO_DR_GROUP_SAME_DR_STMT(S)(S)->same_dr_stmt -#define STMT_VINFO_DR_GROUP_READ_WRITE_DEPENDENCE(S) (S)->read_write_dep -#define STMT_VINFO_STRIDED_ACCESS(S) ((S)->first_dr != NULL) +#define STMT_VINFO_GROUP_FIRST_ELEMENT(S) (S)->first_element +#define STMT_VINFO
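To make the shape of the input concrete, here is a minimal C sketch of a
loop whose unrolled adds form such a reduction chain (illustrative only;
the dg testcases in the second part below are the authoritative
examples):

/* Sketch: a manually unrolled dot product.  After gimplification the
   adds form a chain a2 = a1 + ..., a3 = a2 + ..., and so on, which the
   new code gathers as the root of an SLP instance.  Assumes n is a
   multiple of 4.  */
int
dot4 (const int *x, const int *y, int n)
{
  int sum = 0;
  int i;
  for (i = 0; i < n; i += 4)
    {
      sum += x[i] * y[i];
      sum += x[i + 1] * y[i + 1];
      sum += x[i + 2] * y[i + 2];
      sum += x[i + 3] * y[i + 3];
    }
  return sum;
}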
[patch] [2/2] Support reduction in loop SLP
This part adds the actual code for reduction support.  Bootstrapped and
tested on powerpc64-suse-linux.  I am planning to apply it later today.

Ira

ChangeLog:

        PR tree-optimization/41881
        * tree-vectorizer.h (struct _loop_vec_info): Add new field
        reduction_chains along with a macro for its access.
        * tree-vect-loop.c (new_loop_vec_info): Initialize reduction chains.
        (destroy_loop_vec_info): Free reduction chains.
        (vect_analyze_loop_2): Return false if vect_analyze_slp() returns
        false.
        (vect_is_slp_reduction): New function.
        (vect_is_simple_reduction_1): Call vect_is_slp_reduction.
        (vect_create_epilog_for_reduction): Support SLP reduction chains.
        * tree-vect-slp.c (vect_get_and_check_slp_defs): Allow different
        definition types for reduction chains.
        (vect_supported_load_permutation_p): Don't allow permutations for
        reduction chains.
        (vect_analyze_slp_instance): Support reduction chains.
        (vect_analyze_slp): Try to build SLP instance from reduction chains.
        (vect_get_constant_vectors): Handle reduction chains.
        (vect_schedule_slp_instance): Mark the first statement of the
        reduction chain as reduction.

testsuite/ChangeLog:

        PR tree-optimization/41881
        * gcc.dg/vect/O3-pr41881.c: New test.
        * gcc.dg/vect/O3-slp-reduc-10.c: New test.

Index: testsuite/gcc.dg/vect/O3-slp-reduc-10.c
===
--- testsuite/gcc.dg/vect/O3-slp-reduc-10.c    (revision 0)
+++ testsuite/gcc.dg/vect/O3-slp-reduc-10.c    (revision 0)
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include
+#include "tree-vect.h"
+
+#define N 128
+#define TYPE int
+#define RESULT 755918
+
+__attribute__ ((noinline)) TYPE fun2 (TYPE *x, TYPE *y, unsigned int n)
+{
+  int i, j;
+  TYPE dot = 14;
+
+  for (i = 0; i < n / 2; i++)
+    for (j = 0; j < 2; j++)
+      dot += *(x++) * *(y++);
+
+  return dot;
+}
+
+int main (void)
+{
+  TYPE a[N], b[N], dot;
+  int i;
+
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+    {
+      a[i] = i;
+      b[i] = i+8;
+    }
+
+  dot = fun2 (a, b, N);
+  if (dot != RESULT)
+    abort();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { vect_int_mult && {! vect_no_align } } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */

Index: testsuite/gcc.dg/vect/O3-pr41881.c
===
--- testsuite/gcc.dg/vect/O3-pr41881.c    (revision 0)
+++ testsuite/gcc.dg/vect/O3-pr41881.c    (revision 0)
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+
+#define TYPE int
+
+TYPE fun1(TYPE *x, TYPE *y, unsigned int n)
+{
+  int i, j;
+  TYPE dot = 0;
+
+  for (i = 0; i < n; i++)
+    dot += *(x++) * *(y++);
+
+  return dot;
+}
+
+TYPE fun2(TYPE *x, TYPE *y, unsigned int n)
+{
+  int i, j;
+  TYPE dot = 0;
+
+  for (i = 0; i < n / 8; i++)
+    for (j = 0; j < 8; j++)
+      dot += *(x++) * *(y++);
+
+  return dot;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { vect_int_mult && {! vect_no_align } } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */

Index: tree-vectorizer.h
===
--- tree-vectorizer.h    (revision 173814)
+++ tree-vectorizer.h    (working copy)
@@ -248,6 +248,10 @@ typedef struct _loop_vec_info {
   /* Reduction cycles detected in the loop.  Used in loop-aware SLP.  */
   VEC (gimple, heap) *reductions;

+  /* All reduction chains in the loop, represented by the first
+     stmt in the chain.  */
+  VEC (gimple, heap) *reduction_chains;
+
   /* Hash table used to choose the best peeling option.  */
   htab_t peeling_htab;

@@ -277,6 +281,7 @@ typedef struct _loop_vec_info {
 #define LOOP_VINFO_SLP_INSTANCES(L)        (L)->slp_instances
 #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor
 #define LOOP_VINFO_REDUCTIONS(L)           (L)->reductions
+#define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_HTAB(L)         (L)->peeling_htab

 #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \
Index: tree-vect-loop.c
===
--- tree-vect-loop.c    (revision 173814)
+++ tree-vect-loop.c    (working copy)
@@ -757,6 +757,7 @@ new_loop_vec_info (struct loop *loop)
     PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS));
   LOOP_VINFO_STRIDED_STORES (res) = VEC_alloc (gimple, heap, 10);
   LOOP_VINFO_REDUCTIONS (res) = VEC_alloc (gimple, heap, 10);
+  LOOP_VINFO_REDUCTION_CHAINS (res) = VEC_alloc (gimple, heap, 10);
   LOOP_VINFO_SLP_INSTANCES (res) = VEC_alloc (slp_instance, heap, 10);
   LOOP_VINFO_SLP_UNROLLING_FACTOR (res) = 1;
   LOOP_VINFO_PEELING_HTAB (res) = NULL;
@@ -852,6 +853,7 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo, b
   VEC_free (slp_instance, heap,
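[Editor's note: as a reader's aid (my illustration, not part of the patch),
after the inner loop of fun2 above is completely unrolled, the statements the
new code recognizes as a "reduction chain" look roughly like this in GIMPLE
terms, with invented SSA names:]

    dot_1 = x0 * y0 + dot_0;   /* first stmt: represents the whole chain */
    dot_2 = x1 * y1 + dot_1;   /* each stmt consumes the previous result */

[Per the ChangeLog, the chain is recorded by its first statement in
LOOP_VINFO_REDUCTION_CHAINS, and vect_analyze_slp then tries to build an SLP
instance from it.]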
[PATCH] Fix up execute_update_addresses_taken for debug stmts (PR tree-optimization/49000)
Hi!

When an addressable var is optimized into a non-addressable one, we didn't
clean up MEM_REFs containing an ADDR_EXPR of such vars in debug stmts.  These
were later folded into the var itself and caused SSA verification errors.
Fixed by trying to rewrite the reference and, if that fails, resetting the
debug stmt.

Bootstrapped/regtested on x86_64-linux and i686-linux, no change in cc1plus
.debug_info/.debug_loc, implicitptr.c testcase still works too.
Ok for trunk/4.6?

2011-05-18  Jakub Jelinek

        PR tree-optimization/49000
        * tree-ssa.c (execute_update_addresses_taken): Call
        maybe_rewrite_mem_ref_base on debug stmt value.  If it couldn't
        be rewritten and decl has been marked for renaming, reset
        the debug stmt.

        * gcc.dg/pr49000.c: New test.

--- gcc/tree-ssa.c.jj    2011-05-11 19:39:04.0 +0200
+++ gcc/tree-ssa.c    2011-05-17 18:20:10.0 +0200
@@ -2230,6 +2230,17 @@ execute_update_addresses_taken (void)
                 }
             }
+          else if (gimple_debug_bind_p (stmt)
+                   && gimple_debug_bind_has_value_p (stmt))
+            {
+              tree *valuep = gimple_debug_bind_get_value_ptr (stmt);
+              tree decl;
+              maybe_rewrite_mem_ref_base (valuep);
+              decl = non_rewritable_mem_ref_base (*valuep);
+              if (decl && symbol_marked_for_renaming (decl))
+                gimple_debug_bind_reset_value (stmt);
+            }
+
           if (gimple_references_memory_p (stmt)
               || is_gimple_debug (stmt))
             update_stmt (stmt);
--- gcc/testsuite/gcc.dg/pr49000.c.jj    2011-05-17 18:30:10.0 +0200
+++ gcc/testsuite/gcc.dg/pr49000.c    2011-05-17 18:23:16.0 +0200
@@ -0,0 +1,29 @@
+/* PR tree-optimization/49000 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -g" } */
+
+static int
+foo (int x, int y)
+{
+  return x * y;
+}
+
+static int
+bar (int *z)
+{
+  return *z;
+}
+
+void
+baz (void)
+{
+  int a = 42;
+  int *b = &a;
+  foo (bar (&a), 3);
+}
+
+void
+test (void)
+{
+  baz ();
+}

Jakub
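[Editor's note: to make the failure mode concrete, a hedged sketch in
GIMPLE-dump style, invented for illustration and not taken from the PR, of
what happens to the debug bind for 'b' in the testcase above:]

    # DEBUG b => &a      /* debug stmt keeps an ADDR_EXPR of 'a' */

[Once 'a' stops being addressable and is rewritten into SSA form, &a is no
longer valid.  With the patch, maybe_rewrite_mem_ref_base first tries to
rewrite the value; if non_rewritable_mem_ref_base still finds a decl marked
for renaming, the bind is reset instead:]

    # DEBUG b => NULL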
[PATCH] Small typed DWARF improvement
Hi!

This patch optimizes away unneeded DW_OP_GNU_converts.  mem_loc_descriptor
attempts to keep the operands signed when it returns; if the next op needs
them unsigned again with the same size, there may be useless converts.  The
patch won't change a DW_OP_GNU_convert to an integral type from a
non-integral one (so that, say, a float to {un,}signed conversion is done
with the right sign); for other converts it will, where possible, change the
preceding typed op's base type, provided the size is the same and either both
the typed op and the following DW_OP_GNU_convert are integral or they have
the same encoding.

An example testcase that is improved:

/* { dg-do run } */
/* { dg-options "-g" } */

volatile int vv;

__attribute__((noclone, noinline)) void
foo (double d)
{
  unsigned long f = ((unsigned long) d) / 33UL;
  vv++;        /* { dg-final { gdb-test 10 "f" "7" } } */
}

int
main ()
{
  foo (231.0);
  return 0;
}

where previously we emitted

  DW_OP_GNU_regval_type
  DW_OP_GNU_convert
  DW_OP_GNU_convert
  DW_OP_GNU_convert
  DW_OP_const1u <33>
  DW_OP_GNU_convert
  DW_OP_div
  DW_OP_GNU_convert

while with this patch the DW_OP_GNU_convert DW_OP_GNU_convert pair can go
away.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-05-17  Jakub Jelinek

        * dwarf2out.c (resolve_addr_in_expr): Optimize away redundant
        DW_OP_GNU_convert ops.

--- gcc/dwarf2out.c.jj    2011-05-17 13:35:26.0 +0200
+++ gcc/dwarf2out.c    2011-05-17 14:41:21.0 +0200
@@ -24092,23 +24092,84 @@ resolve_one_addr (rtx *addr, void *data
 static bool
 resolve_addr_in_expr (dw_loc_descr_ref loc)
 {
+  dw_loc_descr_ref keep = NULL;
   for (; loc; loc = loc->dw_loc_next)
-    if (((loc->dw_loc_opc == DW_OP_addr || loc->dtprel)
-         && resolve_one_addr (&loc->dw_loc_oprnd1.v.val_addr, NULL))
-        || (loc->dw_loc_opc == DW_OP_implicit_value
-            && loc->dw_loc_oprnd2.val_class == dw_val_class_addr
-            && resolve_one_addr (&loc->dw_loc_oprnd2.v.val_addr, NULL)))
-      return false;
-    else if (loc->dw_loc_opc == DW_OP_GNU_implicit_pointer
-             && loc->dw_loc_oprnd1.val_class == dw_val_class_decl_ref)
+    switch (loc->dw_loc_opc)
       {
-        dw_die_ref ref
-          = lookup_decl_die (loc->dw_loc_oprnd1.v.val_decl_ref);
-        if (ref == NULL)
+      case DW_OP_addr:
+        if (resolve_one_addr (&loc->dw_loc_oprnd1.v.val_addr, NULL))
           return false;
-        loc->dw_loc_oprnd1.val_class = dw_val_class_die_ref;
-        loc->dw_loc_oprnd1.v.val_die_ref.die = ref;
-        loc->dw_loc_oprnd1.v.val_die_ref.external = 0;
+        break;
+      case DW_OP_const4u:
+      case DW_OP_const8u:
+        if (loc->dtprel
+            && resolve_one_addr (&loc->dw_loc_oprnd1.v.val_addr, NULL))
+          return false;
+        break;
+      case DW_OP_implicit_value:
+        if (loc->dw_loc_oprnd2.val_class == dw_val_class_addr
+            && resolve_one_addr (&loc->dw_loc_oprnd2.v.val_addr, NULL))
+          return false;
+        break;
+      case DW_OP_GNU_implicit_pointer:
+        if (loc->dw_loc_oprnd1.val_class == dw_val_class_decl_ref)
+          {
+            dw_die_ref ref
+              = lookup_decl_die (loc->dw_loc_oprnd1.v.val_decl_ref);
+            if (ref == NULL)
+              return false;
+            loc->dw_loc_oprnd1.val_class = dw_val_class_die_ref;
+            loc->dw_loc_oprnd1.v.val_die_ref.die = ref;
+            loc->dw_loc_oprnd1.v.val_die_ref.external = 0;
+          }
+        break;
+      case DW_OP_GNU_const_type:
+      case DW_OP_GNU_regval_type:
+      case DW_OP_GNU_deref_type:
+      case DW_OP_GNU_convert:
+      case DW_OP_GNU_reinterpret:
+        while (loc->dw_loc_next
+               && loc->dw_loc_next->dw_loc_opc == DW_OP_GNU_convert)
+          {
+            dw_die_ref base1, base2;
+            unsigned enc1, enc2, size1, size2;
+            if (loc->dw_loc_opc == DW_OP_GNU_regval_type
+                || loc->dw_loc_opc == DW_OP_GNU_deref_type)
+              base1 = loc->dw_loc_oprnd2.v.val_die_ref.die;
+            else
+              base1 = loc->dw_loc_oprnd1.v.val_die_ref.die;
+            base2 = loc->dw_loc_next->dw_loc_oprnd1.v.val_die_ref.die;
+            gcc_assert (base1->die_tag == DW_TAG_base_type
+                        && base2->die_tag == DW_TAG_base_type);
+            enc1 = get_AT_unsigned (base1, DW_AT_encoding);
+            enc2 = get_AT_unsigned (base2, DW_AT_encoding);
+            size1 = get_AT_unsigned (base1, DW_AT_byte_size);
+            size2 = get_AT_unsigned (base2, DW_AT_byte_size);
+            if (size1 == size2
+                && (((enc1 == DW_ATE_unsigned || enc1 == DW_ATE_signed)
+                     && (enc2 == DW_ATE_unsigned || enc2 == DW_ATE_signed)
+                     && loc != keep)
+                    || enc1 == enc2))
+              {
+                /* Optimize away next DW_OP_GNU_convert after
+                   adjusting LOC's base type die reference.  */
+                if (loc->dw_loc_opc == DW_OP_GNU_regval_type
+                    || loc->dw_loc_opc == DW_O
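[Editor's note: a small standalone model of the condition under which the
patch drops a following DW_OP_GNU_convert.  This is my restatement, not GCC
code, and it deliberately ignores the 'loc != keep' special case visible in
the hunk above; the enum constants are stand-ins for the real DW_ATE_*
values.]

    #include <stdbool.h>

    enum ate { ATE_SIGNED, ATE_UNSIGNED, ATE_FLOAT };  /* stand-ins */

    static bool
    integral_p (enum ate e)
    {
      return e == ATE_SIGNED || e == ATE_UNSIGNED;
    }

    /* The next convert is redundant when the byte sizes match and either
       both base types are integral (sign alone stops mattering once the
       preceding typed op is retyped) or the encodings are identical.  */
    static bool
    convert_redundant_p (unsigned size1, enum ate enc1,
                         unsigned size2, enum ate enc2)
    {
      return size1 == size2
             && ((integral_p (enc1) && integral_p (enc2)) || enc1 == enc2);
    }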
Re: [google] Increase inlining limits with FDO/LIPO
To make consistent inline decisions between profile-gen and profile-use, it
is probably better to check these two: flag_profile_arcs and
flag_branch_probabilities.  -fprofile-use enables profile-arcs, and value
profiling is enabled only when edge/branch profiling is enabled (so it need
not be checked).

David

On Tue, May 17, 2011 at 10:50 PM, Mark Heffernan wrote:
> This small patch greatly expands the function size limits for inlining
> with FDO/LIPO.  With profile information, the inliner is much more
> selective and precise, so the limits can be increased with less worry
> that functions and total code size will blow up.  This speeds up x86-64
> internal benchmarks by about geomean 1.5% to 3% with LIPO (depending on
> microarch), and 1% to 1.5% with FDO.  Size increase is negligible (0.1%
> mean).
> Bootstrapped and regression tested on x86-64.
> Trunk testing to follow.
> Ok for google/main?
> Mark
>
> 2011-05-17  Mark Heffernan
>
>         * opts.c (finish_options): Increase inlining limits with profile
>         generate and use.
>
> Index: opts.c
> ===
> --- opts.c      (revision 173666)
> +++ opts.c      (working copy)
> @@ -828,6 +828,22 @@ finish_options (struct gcc_options *opts
>           opts->x_flag_split_stack = 0;
>         }
>     }
> +
> +  if (opts->x_flag_profile_use
> +      || opts->x_profile_arc_flag
> +      || opts->x_flag_profile_values)
> +    {
> +      /* With accurate profile information, inlining is much more
> +         selective and makes better decisions, so increase the
> +         inlining function size limits.  Changes must be added to both
> +         the generate and use builds to avoid profile mismatches.  */
> +      maybe_set_param_value
> +        (PARAM_MAX_INLINE_INSNS_SINGLE, 1000,
> +         opts->x_param_values, opts_set->x_param_values);
> +      maybe_set_param_value
> +        (PARAM_MAX_INLINE_INSNS_AUTO, 1000,
> +         opts->x_param_values, opts_set->x_param_values);
> +    }
>  }
>
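[Editor's note: for reference, the raised limits apply in the usual FDO cycle
sketched below (file names are hypothetical).  -fprofile-generate and
-fprofile-use set the flags the patch checks, so the instrumented and
profile-driven builds see identical param values and hence matching inline
decisions.]

    gcc -O2 -fprofile-generate bench.c -o bench   # instrumented build
    ./bench                                       # writes bench.gcda data
    gcc -O2 -fprofile-use bench.c -o bench        # profile-driven build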
Re: [patch gimplifier]: Make sure TRUTH_NOT_EXPR has boolean_type_node type and argument
2011/5/16 Richard Guenther :
> On Mon, May 16, 2011 at 3:45 PM, Michael Matz wrote:
>> Hi,
>>
>> On Mon, 16 May 2011, Richard Guenther wrote:
>>
>>> > I think conversion _to_ BOOLEAN_TYPE shouldn't be useless, on the
>>> > grounds that it requires booleanization (at least conceptually), i.e.
>>> > conversion to a set of two values (no matter the precision or size)
>>> > based on the outcome of comparing the RHS value with
>>> > false_pre_image(TREE_TYPE(RHS)).
>>> >
>>> > Conversion _from_ BOOLEAN_TYPE can be regarded as useless, as the
>>> > conversions from false or true into false_pre_image or true_pre_image
>>> > always is simply an embedding of 0 or 1/-1 (depending on target type
>>> > signedness).  And if the BOOLEAN_TYPE and the LHS have same signedness
>>> > the bit representation of boolean_true_type is (or should be) the same
>>> > as the one converted to LHS (namely either 1 or -1).
>>>
>>> Sure, that would probably be enough to prevent non-BOOLEAN_TYPEs be used
>>> where BOOLEAN_TYPE nodes were used before.  It still will cause an
>>> artificial conversion from a single-bit bitfield read to a bool.
>>
>> Not if you're special casing single-bit conversions (on the grounds that a
>> booleanization from two-valued set to a different two-valued set of
>> the same signedness will not actually require a comparison).  I think it's
>> better to be very precise in our base predicates than to add various hacks
>> over the place to care for imprecision.
>
> Or require a 1-bit integral type for TRUTH_* operands only (which ensures
> the two-valueness which is what we really want).  That can be done
> by either fixing the frontends to make boolean_type_node have 1-bit
> precision or to build a middle-end private type with that constraints
> (though that's the more difficult route as we still do not have a strong
> FE - middle-end hand-off point, and it certainly is not the gimplifier).
>
> Long term all the global trees should be FE private and the middle-end
> should have its own set.
>
> Richard.
>
>>
>> Ciao,
>> Michael.
>

Hello,

The initial idea was to check, for logical operations, that the conversion to
boolean_type_node is useless.  This assumption was flawed by the fact that
boolean_type_node gets re-defined in free_lang_decl to a 1-bit precision type
of size BOOL_TYPE_SIZE if the FE's boolean_type_node is incompatible with it.
Through this back door, the FE's boolean_type_node becomes incompatible in
the tree-cfg checks.  So for all languages but Ada, logical types have
precision one.  Just for the Ada case, which requires a different kind of
boolean_type_node, we need to inspect the inner type to see whether it is a
boolean.  As Fortran also has integer-typed boolean-compatible types, we
can't simply check for BOOLEAN_TYPE here and need to check for precision
first.

ChangeLog

2011-05-18  Kai Tietz

        PR middle-end/48989
        * tree-cfg.c (verify_gimple_assign_binary): Check lhs type
        for being compatible to boolean for logical operations.
        (verify_gimple_assign_unary): Likewise.
        (compatible_boolean_type_p): New helper.

Bootstrapped on x86_64-pc-linux-gnu and regression-tested for Ada and
Fortran.  OK to apply?

Regards,
Kai

Index: gcc/gcc/tree-cfg.c
===
--- gcc.orig/gcc/tree-cfg.c    2011-05-16 14:26:12.369031500 +0200
+++ gcc/gcc/tree-cfg.c    2011-05-18 08:20:34.935819100 +0200
@@ -3220,6 +3220,31 @@ verify_gimple_comparison (tree type, tre
   return false;
 }

+/* Checks TYPE for being compatible to boolean.  Returns
+   FALSE, if type is not compatible, otherwise TRUE.
+
+   A type is compatible if
+   a) TYPE_PRECISION is one.
+   b) The type - or the inner type - is of kind BOOLEAN_TYPE.  */
+
+static bool
+compatible_boolean_type_p (tree type)
+{
+  if (!type)
+    return false;
+  if (TYPE_PRECISION (type) == 1)
+    return true;
+
+  /* We try to look here into inner type, as ADA uses
+     boolean_type_node with type precision != 1.  */
+  while (TREE_TYPE (type)
+         && (TREE_CODE (type) == INTEGER_TYPE
+             || TREE_CODE (type) == REAL_TYPE))
+    type = TREE_TYPE (type);
+
+  return TYPE_PRECISION (type) == 1 || TREE_CODE (type) == BOOLEAN_TYPE;
+}
+
 /* Verify a gimple assignment statement STMT with an unary rhs.
    Returns true if anything is wrong.  */

@@ -3350,15 +3375,16 @@ verify_gimple_assign_unary (gimple stmt)
       return false;

     case TRUTH_NOT_EXPR:
-      if (!useless_type_conversion_p (boolean_type_node, rhs1_type))
+
+      if (!useless_type_conversion_p (lhs_type, rhs1_type)
+          || !compatible_boolean_type_p (lhs_type))
         {
-          error ("invalid types in truth not");
-          debug_generic_expr (lhs_type);
-          debug_generic_expr (rhs1_type);
-          return true;
+          error ("invalid types in truth not");
+          debug_generic_expr (lhs_type);
+          debug_generic_expr (rhs1_type);
+          return true;
         }
       break;
-
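[Editor's note: to summarize the effect of the relaxed TRUTH_NOT_EXPR check,
here is my hedged reading of which lhs types compatible_boolean_type_p now
accepts.  This table is illustrative, inferred from the patch, not taken from
the thread.]

    /* _Bool / C++ bool     BOOLEAN_TYPE, TYPE_PRECISION == 1  -> accepted
       1-bit INTEGER_TYPE   e.g. a single-bit bitfield type    -> accepted
       Ada Boolean          BOOLEAN_TYPE with precision != 1   -> accepted
                            (the final TREE_CODE check fires)
       Ada integer subtype  INTEGER_TYPE wrapping an inner
                            BOOLEAN_TYPE (found by the walk)   -> accepted
       plain 32-bit int     precision 32, not boolean          -> rejected  */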