Re: [PATCH] Fix PR54826
On Fri, Oct 05, 2012 at 07:24:53PM -0700, Dehao Chen wrote: > This patch fixes PR54826. When lowering the gimple, the block for call > arg also need to be reset. > > Bootstrapped and passed gcc regression test on x86. > > Okay for trunk? > > Thanks, > Dehao > > 2012-10-05 Dehao Chen > > * gimple-low.c (lower_stmt): Set the block for call args. Yes, thanks. > --- gcc/gimple-low.c (revision 192147) > +++ gcc/gimple-low.c (working copy) > @@ -425,7 +425,15 @@ lower_stmt (gimple_stmt_iterator *gsi, struct lowe > case GIMPLE_CALL: >{ > tree decl = gimple_call_fndecl (stmt); > + unsigned i; > > + for (i = 0; i < gimple_call_num_args (stmt); i++) > + { > + tree arg = gimple_call_arg (stmt, i); > + if (EXPR_P (arg)) > + TREE_SET_BLOCK (arg, data->block); > + } > + > if (decl > && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL > && DECL_FUNCTION_CODE (decl) == BUILT_IN_SETJMP) Jakub
Re: [RFC] Make vectorizer to skip loops with small iteration estimate
Hi, I benchmarked the patch moving loop header copying and it is quite noticeable win. Some testsuite updating is needed. In many cases it is just because the optimizations are now happening earlier. There are however few testusite failures I have torubles to deal with: ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-ssa/pr21559.c scan-tree-dump-times vrp1 "Threaded jump" 3 ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-ssa/ssa-dom-thread-2.c scan-tree-dump-times vrp1 "Jumps threaded: 1" 1 ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/vect/O3-slp-reduc-10.c scan-tree-dump-times vect "vectorized 1 loops" 2 ./testsuite/g++/g++.sum:FAIL: g++.dg/tree-ssa/pr18178.C -std=gnu++98 scan-tree-dump-times vrp1 "if " 1 ./testsuite/g++/g++.sum:FAIL: g++.dg/tree-ssa/pr18178.C -std=gnu++11 scan-tree-dump-times vrp1 "if " 1 This is mostly about VRP losing its ability to thread some jumps from the duplicated loop header out of the loop across the loopback edge. This seems to be due to loop updating logic. Do we care about these? Honza Index: tree-ssa-threadupdate.c === *** tree-ssa-threadupdate.c (revision 192123) --- tree-ssa-threadupdate.c (working copy) *** static bool *** 846,854 def_split_header_continue_p (const_basic_block bb, const void *data) { const_basic_block new_header = (const_basic_block) data; ! return (bb != new_header ! && (loop_depth (bb->loop_father) ! >= loop_depth (new_header->loop_father))); } /* Thread jumps through the header of LOOP. Returns true if cfg changes. --- 846,860 def_split_header_continue_p (const_basic_block bb, const void *data) { const_basic_block new_header = (const_basic_block) data; ! const struct loop *l; ! ! if (bb == new_header ! || loop_depth (bb->loop_father) < loop_depth (new_header->loop_father)) ! return false; ! for (l = bb->loop_father; l; l = loop_outer (l)) ! if (l == new_header->loop_father) ! return true; ! return false; } /* Thread jumps through the header of LOOP. Returns true if cfg changes. Index: testsuite/gcc.dg/unroll_2.c === *** testsuite/gcc.dg/unroll_2.c (revision 192123) --- testsuite/gcc.dg/unroll_2.c (working copy) *** *** 1,5 /* { dg-do compile { target i?86-*-linux* x86_64-*-linux* } } */ ! /* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo -fenable-rtl-loop2_unroll" } */ unsigned a[100], b[100]; inline void bar() --- 1,5 /* { dg-do compile { target i?86-*-linux* x86_64-*-linux* } } */ ! /* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo -fenable-rtl-loop2_unroll -fno-tree-dominator-opts" } */ unsigned a[100], b[100]; inline void bar() Index: testsuite/gcc.dg/unroll_3.c === *** testsuite/gcc.dg/unroll_3.c (revision 192123) --- testsuite/gcc.dg/unroll_3.c (working copy) *** *** 1,5 /* { dg-do compile { target i?86-*-linux* x86_64-*-linux* } } */ ! /* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" } */ unsigned a[100], b[100]; inline void bar() --- 1,5 /* { dg-do compile { target i?86-*-linux* x86_64-*-linux* } } */ ! /* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo -fno-tree-dominator-opts" } */ unsigned a[100], b[100]; inline void bar() Index: testsuite/gcc.dg/torture/pr23821.c === *** testsuite/gcc.dg/torture/pr23821.c (revision 192123) --- testsuite/gcc.dg/torture/pr23821.c (working copy) *** *** 1,9 /* { dg-do compile } */ /* { dg-skip-if "" { *-*-* } { "-O0" "-fno-fat-lto-objects" } { "" } } */ ! /* At -O1 DOM threads a jump in a non-optimal way which leads to the bogus propagation. */ ! /* { dg-skip-if "" { *-*-* } { "-O1" } { "" } } */ ! /* { dg-options "-fdump-tree-ivcanon-details" } */ int a[199]; --- 1,8 /* { dg-do compile } */ /* { dg-skip-if "" { *-*-* } { "-O0" "-fno-fat-lto-objects" } { "" } } */ ! /* DOM threads a jump in a non-optimal way which leads to the bogus propagation. */ ! /* { dg-options "-fdump-tree-ivcanon-details -fno-tree-dominator-opts" } */ int a[199]; Index: testsuite/gcc.dg/tree-ssa/ivopt_1.c === *** testsuite/gcc.dg/tree-ssa/ivopt_1.c (revision 192123) --- testsuite/gcc.dg/tree-ssa/ivopt_1.c (working copy) *** void foo (int i_width, TYPE dst, TYPE sr *** 14,18 } ! /* { dg-final { scan-tree-dump-times "PHI
[patch, fortran] PR 54833 Don't wrap calls to free(a) in if (a != NULL)
Hello world, the attached patch removes wrapping calls to free(a) by if (a != NULL) for some cases. It is not complete, because automatic deallocation of allocatable structure components is not yet covered. OK for trunk? Thomas 2012-10-06 Thomas König PR fortran/54833 * trans.c (gfc_call_free): Do not wrap the call to __builtin_free in check for NULL. (gfc_deallocate_with_status): For automatic deallocation without status, don't wrap call to __builtin_free in check for NULL. 2012-10-06 Thomas König PR fortran/54833 * gfortran.dg/auto_dealloc_3.f90: New test. Index: trans.c === --- trans.c (Revision 191857) +++ trans.c (Arbeitskopie) @@ -814,26 +814,23 @@ gfc_allocate_allocatable (stmtblock_t * block, tre } -/* Free a given variable, if it's not NULL. */ +/* Free a given variable. If it is NULL, free takes care of this + automatically. */ tree gfc_call_free (tree var) { stmtblock_t block; - tree tmp, cond, call; + tree call; if (TREE_TYPE (var) != TREE_TYPE (pvoid_type_node)) var = fold_convert (pvoid_type_node, var); gfc_start_block (&block); var = gfc_evaluate_now (var, &block); - cond = fold_build2_loc (input_location, NE_EXPR, boolean_type_node, var, - build_int_cst (pvoid_type_node, 0)); call = build_call_expr_loc (input_location, builtin_decl_explicit (BUILT_IN_FREE), 1, var); - tmp = fold_build3_loc (input_location, COND_EXPR, void_type_node, cond, call, - build_empty_stmt (input_location)); - gfc_add_expr_to_block (&block, tmp); + gfc_add_expr_to_block (&block, call); return gfc_finish_block (&block); } @@ -861,11 +858,10 @@ gfc_call_free (tree var) } } - In this front-end version, status doesn't have to be GFC_INTEGER_4. - Moreover, if CAN_FAIL is true, then we will not emit a runtime error, - even when no status variable is passed to us (this is used for - unconditional deallocation generated by the front-end at end of - each procedure). + In this front-end version, status doesn't have to be GFC_INTEGER_4. If + CAN_FAIL is true and no status variable is passed, we will simply call + free(). This is used for unconditional deallocation generated by the + front-end at end of each procedure. If a runtime-message is possible, `expr' must point to the original expression being deallocated for its locus and variable name. @@ -890,6 +886,14 @@ gfc_deallocate_with_status (tree pointer, tree sta STRIP_NOPS (pointer); } + if (can_fail && status == NULL_TREE) +{ + tmp = build_call_expr_loc (input_location, + builtin_decl_explicit (BUILT_IN_FREE), 1, + fold_convert (pvoid_type_node, pointer)); + return tmp; +} + cond = fold_build2_loc (input_location, EQ_EXPR, boolean_type_node, pointer, build_int_cst (TREE_TYPE (pointer), 0)); ! { dg-do compile } ! { dg-options "-fdump-tree-original" } ! PR fortran/54833 ! Make sure we don't wrap a free() in an if(a.data != NULL) by ! counting the ifs. subroutine foo real, allocatable :: a(:) allocate(a(10)) end subroutine foo ! { dg-final { scan-tree-dump-times "if" 3 "original" } } ! { dg-final { cleanup-tree-dump "original" } }
RFA: Simplifying truncation and integer lowpart subregs
[cc:ing sh, spu and tilegx maintainers] Richard Sandiford writes: > Andrew Pinski writes: >> On Thu, Sep 27, 2012 at 11:13 AM, Uros Bizjak wrote: >>> 2012-09-27 Uros Bizjak >>> >>> PR rtl-optimization/54457 >>> * simplify-rtx.c (simplify_subreg): >>> Simplify (subreg:M (op:N ((x:N) (y:N)), 0) >>> to (op:M (subreg:M (x:N) 0) (subreg:M (x:N) 0)), where >>> the outer subreg is effectively a truncation to the original mode M. >> >> >> When I was doing something similar on our internal toolchain at >> Cavium. I found doing this caused a regression on MIPS64 n32 in >> gcc.c-torture/execute/20040709-1.c Where: >> >> >> (insn 15 14 16 2 (set (reg/v:DI 200 [ y ]) >> (reg:DI 2 $2)) t.c:16 301 {*movdi_64bit} >> (expr_list:REG_DEAD (reg:DI 2 $2) >> (nil))) >> >> (insn 16 15 17 2 (set (reg:DI 210) >> (zero_extract:DI (reg/v:DI 200 [ y ]) >> (const_int 29 [0x1d]) >> (const_int 0 [0]))) t.c:16 249 {extzvdi} >> (expr_list:REG_DEAD (reg/v:DI 200 [ y ]) >> (nil))) >> >> (insn 17 16 23 2 (set (reg:SI 211) >> (truncate:SI (reg:DI 210))) t.c:16 175 {truncdisi2} >> (expr_list:REG_DEAD (reg:DI 210) >> (nil))) >> >> Gets converted to: >> (insn 23 17 26 2 (set (reg/i:SI 2 $2) >> (and:SI (reg:SI 2 $2 [+4 ]) >> (const_int 536870911 [0x1fff]))) t.c:18 156 {*andsi3} >> (nil)) >> >> Which is considered an ext instruction >> >> And with the Octeon simulator which causes undefined arguments to >> 32bit word operations to come out as 0xDEADBEEF which showed the >> regression. I fixed it by changing it to produce TRUNCATE instead of >> the subreg. >> >> I did the simplification on ior/and rather than plus/minus/mult so the >> issue is only when expanding to this to and/ior. > > Hmm, hadn't thought of that. I think some of the existing subreg > optimisations suffer the same problem. I.e. we can't assume that > subreg truncations of nested operands are OK just because the outer > subreg is OK. > > I've got a patch I'm testing. The idea is to split most of the lowpart subreg handling out of simplify_subreg and apply it to TRUNCATE too. There are three reasons: - I wanted to make the !TRULY_NOOP_TRUNCATION truncation simplifications as similar to subreg truncation simplifications as possible. - Some of the current lowpart subreg simplifications are also correct for vector truncations. - Ideally, using simplify_gen_unary (TRUNCATE, ...) instead of simplify_gen_subreg shouldn't penalise TRULY_NOOP_TRUNCATION targets. There is already code to use gen_lowpart_no_emit for truncations that reduce to subregs, but as things stand, gen_lowpart_no_emit only passes objects like SUBREG, REG, MEM, etc., to simplify_gen_subreg; others go through gen_lowpart_SUBREG and get no recursive simplification. We inherited this code from a 1996 patch (r13058): if ((TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op)) ? (num_sign_bit_copies (op, GET_MODE (op)) > (unsigned int) (GET_MODE_PRECISION (GET_MODE (op)) - GET_MODE_PRECISION (mode))) ... return rtl_hooks.gen_lowpart_no_emit (mode, op); I don't see any reason for the sign-bit check. If truncations are noops, we should be able to use a subreg regardless. Other than removing that check, the patch just moves simplifications around. I've not tried to match new patterns. The other !TRULY_NOOP_TRUNCATION targets are sh64, spu and tilegx. I don't think sh64 has any patterns that would be adversely affected, although the patch ought to make these patterns redundant: (define_insn_and_split "*logical_sidisi3" [(set (match_operand:SI 0 "arith_reg_dest" "=r,r") (truncate:SI (sign_extend:DI (match_operator:SI 3 "logical_operator" [(match_operand:SI 1 "arith_reg_operand" "%r,r") (match_operand:SI 2 "logical_operand" "r,I10")]] "TARGET_SHMEDIA" "#" "&& 1" [(set (match_dup 0) (match_dup 3))]) (define_insn_and_split "*logical_sidi3_2" [(set (match_operand:DI 0 "arith_reg_dest" "=r,r") (sign_extend:DI (truncate:SI (sign_extend:DI (match_operator:SI 3 "logical_operator" [(match_operand:SI 1 "arith_reg_operand" "%r,r") (match_operand:SI 2 "logical_operand" "r,I10")])] "TARGET_SHMEDIA" "#" "&& 1" [(set (match_dup 0) (sign_extend:DI (match_dup 3)))]) combine should now simplify the first to the normal SI logical op and the second to *logical_sidisi3. I don't think any spu or tilegx patterns are affected either way. Tested on x86_64-linux-gnu, mipsisa32-elf and mipsisa64-elf. Also tested by making sure that there were no code differences for a set of gcc .ii files on gcc20 (-O2 -march=native). OK to install? Richard gcc/ * machmode.h (GET_MODE_UNIT_PRECISIO
Re: [Patch, Fortran, F08] PR 45521: GENERIC resolution with ALLOCATABLE/POINTER and PROCEDURE
Hello, Le 04/10/2012 00:06, Janus Weil a écrit : > Hi all, > > the attached patch implements an F08 feature, which allows to > distinguish two specific procedures in a generic interface, based on > the POINTER and ALLOCATABLE attribute of their arguments. > > In addition to this, the patch fixes a bug in rejecting data actual > arguments passed to procedure formal arguments. > > The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk? > Looks good, yes. Thanks. Mikael
Re: [Patch,avr]: Remove -mshort-calls option
2012/10/5 Georg-Johann Lay : > As already discussed, this patch removes the -mshort-calls command option from > avr-gcc. > > Ok to apply? > > If the change is on order, changes to wwwdocs will follow, i.e. deprecate the > option in 4.7 and tell it is removed in the 4.8 caveats. > > Johann > > > * doc/invoke.texi (AVR Options): Remove -mshort-calls. > * config/avr/avr.opt (-mshort-calls): Remove option. > * config/avr/avr.h (AVR_HAVE_JMP_CALL): Don't depend on > TARGET_SHORT_CALLS. > Ok. Approved. Denis.
Re: RFA: Simplifying truncation and integer lowpart subregs
> Tested on x86_64-linux-gnu, mipsisa32-elf and mipsisa64-elf. Also tested > by making sure that there were no code differences for a set of gcc .ii > files on gcc20 (-O2 -march=native). OK to install? Are you sure that generating TRUNCATEs out of nowhere in simplify_subreg is always correct? > gcc/ > * machmode.h (GET_MODE_UNIT_PRECISION): New macro. > * simplify-rtx.c (simplify_truncation): New function. You should say where it comes from. > (simplify_unary_operation_1): Use it. Remove sign bit test > for !TRULY_NOOP_TRUNCATION_MODES_P. (simplify_unary_operation_1) : ... > (simplify_subreg): Use simplify_int_lowpart for TRUNCATE. This function doesn't exist. And this is misleading, it's not just for TRUNCATE, it's for a truncation to the lowpart. > /* Try to simplify a unary operation CODE whose output mode is to be > MODE with input operand OP whose mode was originally OP_MODE. > Return zero if no simplification can be made. */ > @@ -689,12 +850,6 @@ simplify_unary_operation_1 (enum rtx_cod > op_mode = mode; > in2 = simplify_gen_unary (NOT, op_mode, in2, op_mode); > > - if (GET_CODE (in2) == NOT && GET_CODE (in1) != NOT) > - { > - rtx tem = in2; > - in2 = in1; in1 = tem; > - } > - > return gen_rtx_fmt_ee (GET_CODE (op) == IOR ? AND : IOR, >mode, in1, in2); > } Why is this hunk here? > @@ -5595,14 +5730,6 @@ simplify_subreg (enum machine_mode outer >return NULL_RTX; > } > > - /* Merge implicit and explicit truncations. */ > - > - if (GET_CODE (op) == TRUNCATE > - && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode) > - && subreg_lowpart_offset (outermode, innermode) == byte) > -return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0), > -GET_MODE (XEXP (op, 0))); > - >/* SUBREG of a hard register => just change the register number > and/or mode. If the hard register is not valid in that mode, > suppress this simplification. If the hard register is the stack, Likewise. -- Eric Botcazou
[SH] PR 54685 - unsigned int comparison with 0x7FFFFFFF
Hello, The attached patch improves comparisons such as 'unsigned int <= 0x7FFF' on SH. As mentioned in the PR, for some reason, those comparisons do not go through the cstore expander. As a consequence the comparison doesn't get the chance to be canonicalized by the target code and ends up as '(~x) >> 31'. I've not investigated this further and just fixed the symptoms on SH. I don't know whether it's also an issue on other targets. Tested on rev 192142 with make -k check RUNTESTFLAGS="--target_board=sh-sim \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}" and no new failures. OK? Cheers, Oleg gcc/ChangeLog: PR target/54685 * config/sh/sh.md (one_cmplsi2): Make insn_and_split. Add manual combine matching for an insn sequence where a ge:SI pattern can be used. testsuite/ChangeLog: PR target/54685 * gcc.target/sh/pr54685.c: New. Index: gcc/testsuite/gcc.target/sh/pr54685.c === --- gcc/testsuite/gcc.target/sh/pr54685.c (revision 0) +++ gcc/testsuite/gcc.target/sh/pr54685.c (revision 0) @@ -0,0 +1,58 @@ +/* Check that a comparison 'unsigned int <= 0x7FFF' results in code + utilizing the cmp/pz instruction. */ +/* { dg-do compile { target "sh*-*-*" } } */ +/* { dg-options "-O1" } */ +/* { dg-skip-if "" { "sh*-*-*" } { "-m5*"} { "" } } */ +/* { dg-final { scan-assembler-not "not" } } */ +/* { dg-final { scan-assembler-times "cmp/pz" 7 } } */ +/* { dg-final { scan-assembler-times "shll" 1 } } */ +/* { dg-final { scan-assembler-times "movt" 4 } } */ + +int +test_00 (unsigned int a) +{ + return !(a > 0x7FFF); +} + +int +test_01 (unsigned int a) +{ + return !(a > 0x7FFF) ? -5 : 10; +} + +int +test_02 (unsigned int a) +{ + /* 1x shll, 1x movt */ + return a >= 0x8000; +} + +int +test_03 (unsigned int a) +{ + return a >= 0x8000 ? -5 : 10; +} + +int +test_04 (unsigned int a) +{ + return a <= 0x7FFF; +} + +int +test_05 (unsigned int a) +{ + return a <= 0x7FFF ? -5 : 10; +} + +int +test_06 (unsigned int a) +{ + return a < 0x8000; +} + +int +test_07 (unsigned int a) +{ + return a < 0x8000 ? -5 : 10; +} Index: gcc/config/sh/sh.md === --- gcc/config/sh/sh.md (revision 192142) +++ gcc/config/sh/sh.md (working copy) @@ -5188,11 +5188,61 @@ "neg %1,%0" [(set_attr "type" "arith")]) -(define_insn "one_cmplsi2" +(define_insn_and_split "one_cmplsi2" [(set (match_operand:SI 0 "arith_reg_dest" "=r") (not:SI (match_operand:SI 1 "arith_reg_operand" "r")))] "TARGET_SH1" "not %1,%0" + "&& can_create_pseudo_p ()" + [(set (reg:SI T_REG) (ge:SI (match_dup 1) (const_int 0))) + (set (match_dup 0) (reg:SI T_REG))] +{ +/* PR 54685 + If the result of 'unsigned int <= 0x7FFF' ends up as the following + sequence: + + (set (reg0) (not:SI (reg0) (reg1))) + (parallel [(set (reg2) (lshiftrt:SI (reg0) (const_int 31))) + (clobber (reg:SI T_REG))]) + + ... match and combine the sequence manually in the split pass after the + combine pass. Notice that combine does try the target pattern of this + split, but if the pattern is added it interferes with other patterns, in + particular with the div0s comparisons. + This could also be done with a peephole but doing it here before register + allocation can save one temporary. + When we're here, the not:SI pattern obviously has been matched already + and we only have to see whether the following insn is the left shift. */ + + rtx i = next_nonnote_insn_bb (curr_insn); + if (i == NULL_RTX || !NONJUMP_INSN_P (i)) +FAIL; + + rtx p = PATTERN (i); + if (GET_CODE (p) != PARALLEL || XVECLEN (p, 0) != 2) +FAIL; + + rtx p0 = XVECEXP (p, 0, 0); + rtx p1 = XVECEXP (p, 0, 1); + + if (/* (set (reg2) (lshiftrt:SI (reg0) (const_int 31))) */ + GET_CODE (p0) == SET + && GET_CODE (XEXP (p0, 1)) == LSHIFTRT + && REG_P (XEXP (XEXP (p0, 1), 0)) + && REGNO (XEXP (XEXP (p0, 1), 0)) == REGNO (operands[0]) + && CONST_INT_P (XEXP (XEXP (p0, 1), 1)) + && INTVAL (XEXP (XEXP (p0, 1), 1)) == 31 + + /* (clobber (reg:SI T_REG)) */ + && GET_CODE (p1) == CLOBBER && REG_P (XEXP (p1, 0)) + && REGNO (XEXP (p1, 0)) == T_REG) +{ + operands[0] = XEXP (p0, 0); + set_insn_deleted (i); +} + else +FAIL; +} [(set_attr "type" "arith")]) (define_expand "one_cmpldi2"
Re: Propagate profile counts during switch expansion
> > > >> > >> > + > >> > + default_edge->count = default_count; > >> > + if (count) > >> > +{ > >> > + edge e; > >> > + edge_iterator ei; > >> > + FOR_EACH_EDGE (e, ei, stmt_bb->succs) > >> > +e->probability = e->count * REG_BR_PROB_BASE / count; > >> > +} > >> > >> Hmm, this updates origina BB containing the switch statement? > > The out_edges of the original BB. > >> Of course, > >> modulo roundoff errors, this should hold. I wonder where did you got the > >> diferences and why do you need this? > > The count of the outgoing edge of BB that jumps to the default label > needs to be updated (to 0 or original_default_count/2, depending on > whether default label is a jump table target). So I need to > redistribute the probabilities of the rest of the edges. That's why I > do this here. Hmm, I see, OK then. Please add a comment. Honza > > > Originally, I obtained e->probability as e->count * REG_BR_PROB_BASE / > > bb->count. During profile bootstrap, I noticed BBs whose sum of > > outgoing edge counts exceeded the BB's count and hence I normalized > > with the sum of edge counts. > > > >> You are going to output new control flow > >> and find_many_sub_basic_blocks will recompute all the counts/frequencies > >> inside > >> anyway? > > Sorry, I am lost here. I am just updating the probability of > > stmt_bb->succs right? > > > >
Re: [patch, fortran] PR 54833 Don't wrap calls to free(a) in if (a != NULL)
I wrote: the attached patch removes wrapping calls to free(a) by if (a != NULL) for some cases. It is not complete, because automatic deallocation of allocatable structure components is not yet covered. I accidentally posted an old version, which had a bug in coarrays (basically was just missing an "else"). Regression-tested. OK for trunk? Thomas 2012-10-06 Thomas König PR fortran/54833 * trans.c (gfc_call_free): Do not wrap the call to __builtin_free in check for NULL. (gfc_deallocate_with_status): For automatic deallocation without status for non-coarrays, don't wrap call to __builtin_free in check for NULL. 2012-10-06 Thomas König PR fortran/54833 * gfortran.dg/auto_dealloc_3.f90: New test. Index: trans.c === --- trans.c (Revision 191857) +++ trans.c (Arbeitskopie) @@ -814,26 +814,23 @@ gfc_allocate_allocatable (stmtblock_t * block, tre } -/* Free a given variable, if it's not NULL. */ +/* Free a given variable. If it is NULL, free takes care of this + automatically. */ tree gfc_call_free (tree var) { stmtblock_t block; - tree tmp, cond, call; + tree call; if (TREE_TYPE (var) != TREE_TYPE (pvoid_type_node)) var = fold_convert (pvoid_type_node, var); gfc_start_block (&block); var = gfc_evaluate_now (var, &block); - cond = fold_build2_loc (input_location, NE_EXPR, boolean_type_node, var, - build_int_cst (pvoid_type_node, 0)); call = build_call_expr_loc (input_location, builtin_decl_explicit (BUILT_IN_FREE), 1, var); - tmp = fold_build3_loc (input_location, COND_EXPR, void_type_node, cond, call, - build_empty_stmt (input_location)); - gfc_add_expr_to_block (&block, tmp); + gfc_add_expr_to_block (&block, call); return gfc_finish_block (&block); } @@ -861,11 +858,10 @@ gfc_call_free (tree var) } } - In this front-end version, status doesn't have to be GFC_INTEGER_4. - Moreover, if CAN_FAIL is true, then we will not emit a runtime error, - even when no status variable is passed to us (this is used for - unconditional deallocation generated by the front-end at end of - each procedure). + In this front-end version, status doesn't have to be GFC_INTEGER_4. If + CAN_FAIL is true, no status variable is passed and we are not dealing with + a coarray, we will simply call free(). This is used for unconditional + deallocation generated by the front-end at end of each procedure. If a runtime-message is possible, `expr' must point to the original expression being deallocated for its locus and variable name. @@ -890,6 +886,14 @@ gfc_deallocate_with_status (tree pointer, tree sta STRIP_NOPS (pointer); } + else if (can_fail && status == NULL_TREE) +{ + tmp = build_call_expr_loc (input_location, + builtin_decl_explicit (BUILT_IN_FREE), 1, + fold_convert (pvoid_type_node, pointer)); + return tmp; +} + cond = fold_build2_loc (input_location, EQ_EXPR, boolean_type_node, pointer, build_int_cst (TREE_TYPE (pointer), 0)); ! { dg-do compile } ! { dg-options "-fdump-tree-original" } ! PR fortran/54833 ! Make sure we don't wrap a free() in an if(a.data != NULL) by ! counting the ifs. subroutine foo real, allocatable :: a(:) allocate(a(10)) end subroutine foo ! { dg-final { scan-tree-dump-times "if" 3 "original" } } ! { dg-final { cleanup-tree-dump "original" } }
*ping* [patch, libfortran] Fix PR 54736, memory corruption with GFORTRAN_CONVERT_UNIT
Am 01.10.2012 20:34, schrieb Thomas Koenig: Hello world, the previous version of the patch has an issue that Shane pointed out in the PR. This version should work; at least it survived all the test cases I could come up with. Regression-tested (again). OK for trunk? Also for 4.6 and 4.7? Ping? I would like to start committing patches so I don't have too many of them in my tree at the same time :-) Thomas
[m68k] Remove anddi3, iordi3, xordi3, one_cmpldi2 patterns
Letting the DImode logical ops being split by lower subreg tends to produce better code. Tested on m68k-linux and committed. Andreas. PR rtl-optimization/54739 * config/m68k/m68k.md (anddi3, iordi3, xordi3, one_cmpldi2): Remove. Index: config/m68k/m68k.md === --- config/m68k/m68k.md (revision 192155) +++ config/m68k/m68k.md (working copy) @@ -3597,72 +3597,6 @@ ;; logical-and instructions -;; "anddi3" is mainly here to help combine(). -(define_insn "anddi3" - [(set (match_operand:DI 0 "nonimmediate_operand" "=o,d") - (and:DI (match_operand:DI 1 "general_operand" "%0,0") - (match_operand:DI 2 "general_operand" "dn,don")))] - "!TARGET_COLDFIRE" -{ - CC_STATUS_INIT; - /* We can get CONST_DOUBLE, but also const1_rtx etc. */ - if (CONSTANT_P (operands[2])) -{ - rtx hi, lo; - - split_double (operands[2], &hi, &lo); - - switch (INTVAL (hi)) - { - case 0 : - output_asm_insn ("clr%.l %0", operands); - break; - case -1 : - break; - default : - { - rtx xoperands[3]; - - xoperands[0] = operands[0]; - xoperands[2] = hi; - output_asm_insn (output_andsi3 (xoperands), xoperands); - } - } - if (GET_CODE (operands[0]) == REG) - operands[0] = gen_rtx_REG (SImode, REGNO (operands[0]) + 1); - else - operands[0] = adjust_address (operands[0], SImode, 4); - switch (INTVAL (lo)) - { - case 0 : - output_asm_insn ("clr%.l %0", operands); - break; - case -1 : - break; - default : - { - rtx xoperands[3]; - - xoperands[0] = operands[0]; - xoperands[2] = lo; - output_asm_insn (output_andsi3 (xoperands), xoperands); - } - } - return ""; -} - if (GET_CODE (operands[0]) != REG) -{ - operands[1] = adjust_address (operands[0], SImode, 4); - return "and%.l %2,%0\;and%.l %R2,%1"; -} - if (GET_CODE (operands[2]) != REG) -{ - operands[1] = adjust_address (operands[2], SImode, 4); - return "and%.l %2,%0\;and%.l %1,%R0"; -} - return "and%.l %2,%0\;and%.l %R2,%R0"; -}) - ;; Prevent AND from being made with sp. This doesn't exist in the machine ;; and reload will cause inefficient code. Since sp is a FIXED_REG, we ;; can't allocate pseudos into it. @@ -3780,76 +3714,6 @@ return "or%.w %1,%0"; }) -;; "iordi3" is mainly here to help combine(). -(define_insn "iordi3" - [(set (match_operand:DI 0 "nonimmediate_operand" "=o,d") - (ior:DI (match_operand:DI 1 "general_operand" "%0,0") - (match_operand:DI 2 "general_operand" "dn,don")))] - "!TARGET_COLDFIRE" -{ - CC_STATUS_INIT; - /* We can get CONST_DOUBLE, but also const1_rtx etc. */ - if (CONSTANT_P (operands[2])) -{ - rtx hi, lo; - - split_double (operands[2], &hi, &lo); - - switch (INTVAL (hi)) - { - case 0 : - break; - case -1 : - /* FIXME : a scratch register would be welcome here if operand[0] - is not a register */ - output_asm_insn ("move%.l #-1,%0", operands); - break; - default : - { - rtx xoperands[3]; - - xoperands[0] = operands[0]; - xoperands[2] = hi; - output_asm_insn (output_iorsi3 (xoperands), xoperands); - } - } - if (GET_CODE (operands[0]) == REG) - operands[0] = gen_rtx_REG (SImode, REGNO (operands[0]) + 1); - else - operands[0] = adjust_address (operands[0], SImode, 4); - switch (INTVAL (lo)) - { - case 0 : - break; - case -1 : - /* FIXME : a scratch register would be welcome here if operand[0] - is not a register */ - output_asm_insn ("move%.l #-1,%0", operands); - break; - default : - { - rtx xoperands[3]; - - xoperands[0] = operands[0]; - xoperands[2] = lo; - output_asm_insn (output_iorsi3 (xoperands), xoperands); - } - } - return ""; -} - if (GET_CODE (operands[0]) != REG) -{ - operands[1] = adjust_address (operands[0], SImode, 4); - return "or%.l %2,%0\;or%.l %R2,%1"; -} - if (GET_CODE (operands[2]) != REG) -{ - operands[1] = adjust_address (operands[2], SImode, 4); - return "or%.l %2,%0\;or%.l %1,%R0"; -} - return "or%.l %2,%0\;or%.l %R2,%R0"; -}) - (define_expand "iorsi3" [(set (match_operand:SI 0 "nonimmediate_operand" "") (ior:SI (match_operand:SI 1 "general_operand" "") @@ -3957,79 +3821,6 @@ ;; xor instructions -;; "xordi3" is mainly here to help combine(). -(define_insn "xordi3" - [(set (match_operand:DI 0 "nonimmediate_operand" "=od") - (xor:DI (match_operand:DI 1
Re: *ping* [patch, libfortran] Fix PR 54736, memory corruption with GFORTRAN_CONVERT_UNIT
On Sat, Oct 6, 2012 at 1:31 PM, Thomas Koenig wrote: > Am 01.10.2012 20:34, schrieb Thomas Koenig: >> >> Hello world, >> >> the previous version of the patch has an issue that Shane pointed >> out in the PR. This version should work; at least it survived >> all the test cases I could come up with. >> >> Regression-tested (again). OK for trunk? Also for 4.6 and 4.7? > > > Ping? > > I would like to start committing patches so I don't have too many > of them in my tree at the same time :-) This looks OK to me. Ciao! Steven
Re: [Patch, Fortran, F08] PR 45521: GENERIC resolution with ALLOCATABLE/POINTER and PROCEDURE
Hi, >> the attached patch implements an F08 feature, which allows to >> distinguish two specific procedures in a generic interface, based on >> the POINTER and ALLOCATABLE attribute of their arguments. >> >> In addition to this, the patch fixes a bug in rejecting data actual >> arguments passed to procedure formal arguments. >> >> The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk? >> > Looks good, yes. Thanks. thanks, Mikael. Committed as r192157. Cheers, Janus
Re: RFA: Simplifying truncation and integer lowpart subregs
Thanks for the review. Eric Botcazou writes: >> Tested on x86_64-linux-gnu, mipsisa32-elf and mipsisa64-elf. Also tested >> by making sure that there were no code differences for a set of gcc .ii >> files on gcc20 (-O2 -march=native). OK to install? > > Are you sure that generating TRUNCATEs out of nowhere in simplify_subreg is > always correct? We only use TRUNCATEs when truncating the operands of an existing operation. We don't convert the SUBREG itself to a TRUNCATE. I don't think it matters whether the narrowing of the operation was brought about by a TRUNCATE or by a SUBREG. I think modelling it as a TRUNCATE operation is correct for !TRULY_NOOP_TRUNCATION (it's the bug that Andrew pointed out). And we shouldn't generate an actual TRUNCATE rtx for TRULY_NOOP_TRUNCATION (the thing about making simplify_gen_unary (TRUNCATE, ...) no worse than simplify_gen_subreg for those targets). I suppose: /* We can't handle truncation to a partial integer mode here because we don't know the real bitsize of the partial integer mode. */ if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT) break; might be a problem though; we should still allow a subreg to be generated. Is that what you were thinking of, or something else? I'm certainly open to other ideas though. Or do you think that I was wrong about doing this narrowing in simplify_subreg to begin with? >> gcc/ >> * machmode.h (GET_MODE_UNIT_PRECISION): New macro. >> * simplify-rtx.c (simplify_truncation): New function. > > You should say where it comes from. OK. >> (simplify_unary_operation_1): Use it. Remove sign bit test >> for !TRULY_NOOP_TRUNCATION_MODES_P. > > (simplify_unary_operation_1) : ... > >> (simplify_subreg): Use simplify_int_lowpart for TRUNCATE. > > This function doesn't exist. And this is misleading, it's not just for > TRUNCATE, it's for a truncation to the lowpart. Curses, forgot to update that part. Thanks. >> /* Try to simplify a unary operation CODE whose output mode is to be >> MODE with input operand OP whose mode was originally OP_MODE. >> Return zero if no simplification can be made. */ >> @@ -689,12 +850,6 @@ simplify_unary_operation_1 (enum rtx_cod >> op_mode = mode; >>in2 = simplify_gen_unary (NOT, op_mode, in2, op_mode); >> >> - if (GET_CODE (in2) == NOT && GET_CODE (in1) != NOT) >> -{ >> - rtx tem = in2; >> - in2 = in1; in1 = tem; >> -} >> - >>return gen_rtx_fmt_ee (GET_CODE (op) == IOR ? AND : IOR, >> mode, in1, in2); >> } > > Why is this hunk here? Sorry, I've no idea :-( >> @@ -5595,14 +5730,6 @@ simplify_subreg (enum machine_mode outer >>return NULL_RTX; >> } >> >> - /* Merge implicit and explicit truncations. */ >> - >> - if (GET_CODE (op) == TRUNCATE >> - && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode) >> - && subreg_lowpart_offset (outermode, innermode) == byte) >> -return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0), >> - GET_MODE (XEXP (op, 0))); >> - >>/* SUBREG of a hard register => just change the register number >> and/or mode. If the hard register is not valid in that mode, >> suppress this simplification. If the hard register is the stack, > > Likewise. This moved to simplify_truncation: /* (truncate:A (truncate:B X)) is (truncate:A X). */ if (GET_CODE (op) == TRUNCATE) return simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0), GET_MODE (XEXP (op, 0))); I haven't included an updated patch because of your general concern above. Richard
Re: RFA: Simplifying truncation and integer lowpart subregs
> I think modelling it as a TRUNCATE operation is correct for > !TRULY_NOOP_TRUNCATION (it's the bug that Andrew pointed out). > And we shouldn't generate an actual TRUNCATE rtx for > TRULY_NOOP_TRUNCATION (the thing about making > simplify_gen_unary (TRUNCATE, ...) no worse than simplify_gen_subreg > for those targets). I suppose: > > /* We can't handle truncation to a partial integer mode here > because we don't know the real bitsize of the partial > integer mode. */ > if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT) > break; > > might be a problem though; we should still allow a subreg to be > generated. Is that what you were thinking of, or something else? I was thinking of the !TRULY_NOOP_TRUNCATION case, where the two operations aren't equivalent. Generating TRUNCATE in simplify_subreg seems suspicious to me in this case but, if not doing it is the source of the bug, I guess I need to do some homework on this TRULY_NOOP_TRUNCATION stuff. :-) Maybe add a blurb to the head comment of simplify_truncation, explaining that it is valid to call the function both for TRUNCATEs and truncations to the lowpart, and why it is correct to generate new TRUNCATEs in the latter case. -- Eric Botcazou
Re: [SH] PR 54760 - Add thread pointer built-ins and GBR displacement addressing
On Sat, 2012-10-06 at 12:31 +0900, Kaz Kojima wrote: > Oleg Endo wrote: > > The attached patch is the next step that adds the thread pointer > > builtins. The GBR address mode stuff will follow afterwards separately. > > Tested on rev 192142 with 'make all' and > > 'make -k check-gcc RUNTESTFLAGS="sh.exp --target_board=sh-sim > > \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"' > > The patch is OK. Installed. Attached patch adds the GBR addressing mode stuff. I've added the '!NONJUMP_INSN_P (i)' check which should fix the crash. Could you please test again? Tested on rev 192154 with 'make all' and make -k check-gcc RUNTESTFLAGS="sh.exp=pr54760* --target_board=sh-sim \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}" Cheers, Oleg gcc/ChangeLog: PR target/54760 * config/sh/sh.md (*mov_gbr_load, *mov_gbr_store): New insns and accompanying unnamed splits. * config/sh/predicates.md (general_movsrc_operand, general_movdst_operand): Reject GBR addresses. * config/sh/sh-protos.h (sh_find_equiv_gbr_addr): New declaration. * config/sh/sh.c (sh_address_cost, sh_legitimate_address_p, sh_secondary_reload): Handle GBR addresses. (base_reg_disp): New class. (sh_find_base_reg_disp, sh_find_equiv_gbr_addr): New functions. testsuite/ChangeLog: PR target/54760 * gcc.target/sh/pr54760-2.c: New. * gcc.target/sh/pr54760-3.c: New. Index: gcc/testsuite/gcc.target/sh/pr54760-2.c === --- gcc/testsuite/gcc.target/sh/pr54760-2.c (revision 0) +++ gcc/testsuite/gcc.target/sh/pr54760-2.c (revision 0) @@ -0,0 +1,223 @@ +/* Check that thread pointer relative memory accesses are converted to + gbr displacement address modes. If we see a gbr register store + instruction something is not working properly. */ +/* { dg-do compile { target "sh*-*-*" } } */ +/* { dg-options "-O1" } */ +/* { dg-skip-if "" { "sh*-*-*" } { "-m5*"} { "" } } */ +/* { dg-final { scan-assembler-times "stc\tgbr" 0 } } */ + +/* --- + Simple GBR load. +*/ +#define func(name, type, disp)\ + int \ + name ## _tp_load (void) \ + { \ +type* tp = (type*)__builtin_thread_pointer (); \ +return tp[disp]; \ + } + +func (test00, int, 0) +func (test01, int, 5) +func (test02, int, 255) + +func (test03, short, 0) +func (test04, short, 5) +func (test05, short, 255) + +func (test06, char, 0) +func (test07, char, 5) +func (test08, char, 255) + +func (test09, unsigned int, 0) +func (test10, unsigned int, 5) +func (test11, unsigned int, 255) + +func (test12, unsigned short, 0) +func (test13, unsigned short, 5) +func (test14, unsigned short, 255) + +func (test15, unsigned char, 0) +func (test16, unsigned char, 5) +func (test17, unsigned char, 255) + +#undef func + +/* --- + Simple GBR store. +*/ +#define func(name, type, disp)\ + void \ + name ## _tp_store (int a) \ + { \ +type* tp = (type*)__builtin_thread_pointer (); \ +tp[disp] = (type)a; \ + } + +func (test00, int, 0) +func (test01, int, 5) +func (test02, int, 255) + +func (test03, short, 0) +func (test04, short, 5) +func (test05, short, 255) + +func (test06, char, 0) +func (test07, char, 5) +func (test08, char, 255) + +func (test09, unsigned int, 0) +func (test10, unsigned int, 5) +func (test11, unsigned int, 255) + +func (test12, unsigned short, 0) +func (test13, unsigned short, 5) +func (test14, unsigned short, 255) + +func (test15, unsigned char, 0) +func (test16, unsigned char, 5) +func (test17, unsigned char, 255) + +#undef func + +/* --- + Arithmetic on the result of a GBR load. +*/ +#define func(name, type, disp, op, opname)\ + int \ + name ## _tp_load_arith_ ##opname (int a) \ + { \ +type* tp = (type*)__builtin_thread_pointer (); \ +return tp[disp] op a; \ + } + +#define funcs(op, opname) \ + func (test00, int, 0, op, opname) \ + func (test01, int, 5, op, opname) \ + func (test02, int, 255, op, opname) \ + func (test03, short, 0, op, opname) \ + func (test04, short, 5, op, opname) \ + func (test05, short, 255, op, opname) \ + func (test06, char, 0, op, opname) \ + func (test07, char, 5, op, opname) \ + func (test08, char, 255, op, opname) \ + func (test09, unsigned int, 0, op, opname) \ + func (test10, unsigned int, 5, op, opname) \ + func (test11, unsigned int, 255, op, opname) \ + func (test12, unsigned short, 0, op, opname) \ + func (test13, unsigned short, 5, op, opname) \ + func (test14, unsigned short, 255, op, opname) \ + func (test15, unsigned char, 0, op, opname) \ + func (test16, unsigned char, 5, op, opname) \ + func (test17, unsigned char, 255, op, opname) \ + +funcs (+, plus) +funcs (-, minus) +funcs (*, mul) +funcs (&,
Re: [C++ Patch] PR 52764
OK. Jason
Re: [C++ Patch] PR 54249
OK. Jason
Re: [C++ Patch / RFC] PR 51422
On 09/27/2012 07:08 AM, Paolo Carlini wrote: Then checking error_operand_p (decl) in is_capture_proxy solves the problem but now the question is: do we have reasons to believe that such VAR_DECLs should never ever reach is_normal_capture_proxy? That depends on our error recovery strategy for an invalid capture. It seems that we currently build a VAR_DECL that has an error_mark_node DECL_VALUE_EXPR; if we're going to do that, we need to deal with it. I guess we should return true in that case, if it doesn't cause more problems later on. Jason
[Patch, Fortran, committed] PR 54832
> This is now: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54832 ... which I have just fixed by an (obvious and regtested) patch: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192160 Thanks again for reporting this, Janus > 2012/10/6 Rouson, Damian : >> The Gfortran 4.8.0 20120930 ICE shown below appears to stem from the >> "result" keyword used in the first interface body. The code works fine >> with gfortran 4.7.2. This is a simplified version of code from my book: >> >> $ cat integrand.F90 >> module integrand_module >> >> type ,abstract :: integrand >> contains >> procedure(t_interface), deferred :: t >> procedure(assign_interface), deferred :: assign >> procedure(times_interface), deferred :: times >> generic :: operator(*) => times >> generic :: assignment(=) => assign >> end type >> >> abstract interface >> function t_interface(this) result(dState_dt) >> import :: integrand >> class(integrand) ,intent(in) :: this >> class(integrand) ,allocatable :: dState_dt >> end function >> function times_interface(lhs,rhs) >> import :: integrand >> class(integrand) ,intent(in) :: lhs >> class(integrand) ,allocatable :: times_interface >> real, intent(in) :: rhs >> end function >> subroutine assign_interface(lhs,rhs) >> import :: integrand >> class(integrand) ,intent(in):: rhs >> class(integrand) ,intent(inout) :: lhs >> end subroutine >> end interface >> >> contains >> subroutine integrate(model,dt) >> class(integrand) :: model >> real dt >> model = model%t()*dt >>end subroutine >> end module >> $ gfortran -c integrand.F90 >> integrand.F90: In function 'integrate': >> integrand.F90:35:0: internal compiler error: in gfc_conv_expr, at >> fortran/trans-expr.c:5862 >> model = model%t()*dt >> ^ >> >> integrand.F90:35:0: internal compiler error: Abort trap: 6 >> gfortran: internal compiler error: Abort trap: 6 (program f951) >> Abort trap: 6 >> remote-ca41:gnu-regression rouson$ gfortran --version >> GNU Fortran (MacPorts gcc48 4.8-20120930_1) 4.8.0 20120930 (experimental) >> Copyright (C) 2012 Free Software Foundation, Inc. >> >> >>
Re: Profile housekeeping 6/n (-fprofile-consistency-report)
> > Index: passes.c > > +/* Hold statistic about profile consistency. */ > ... > > I don't see why this should live in passes.c, can you please put it in > a more logical place (profile.c, perhaps)? Hmm, the problem here is that the code is using passmanager's dumping bits to order the passes and assign them names. So moving it elsewhere requires exporting it that is not nice. passes.c does similar stuff already, so I decided to keep it there. Here is patch I comitted that prints more detailed report. It reports also changes in overall unit time/size estimates and when pass did nothing it reports it as uneffective. For tramp3d it looks this way: Pass name|mismatch in |mismated out|Overall |freq count |freq count |size time cfg (after TODO)||| -1.5474% ssa ||| -2.0148% inline_param ||| einline ||| -0.4991% einline (after TODO)||| -0.0129% early_optimizations ||| copyrename ||| ccp ||| -0.2273% forwprop ||| -0.0688% ealias ||| esra ||| -0.1892% fre ||| -7.9369% copyprop (after TODO)||| -0.0187% mergephi ||| cddce||| -0.1655% eipa_sra ||| -0.0237% tailr||| -0.5305% switchconv ||| +0.0190% profile_estimate |+1 || local-pure-const ||| fnsplit | +20 || +0.2333% +0.8267% fnsplit (after TODO)| -20 || -1.8146% -1.3300% release_ssa ||| inline_param ||| inline | +229 || +14.5138% -10.6020% inline (after TODO)| -225 || -0.2662% -10.2195% copyrename ||| -2.7026% -2.9448% cunrolli ||| ccp ||| -0.0174% -0.0142% ccp (after TODO)|+8 || -0.3571% -0.1213% forwprop ||| -0.1442% -0.5343% alias||| retslot ||| phiprop ||| fre ||| -0.2801% -0.2814% fre (after TODO)||| -0.2897% -0.0362% copyprop ||| -0.0264% -0.0251% mergephi ||| vrp | +80 || +0.1233% -1.0271% vrp (after TODO)|+3 || -0.8618% -0.2421% dce ||| -0.0089% -0.0021% dce (after TODO)|-7 || -0.0089% -0.0347% cdce ||| cselim ||| -0.0177% +0.% ifcombine||| +0.0044% +0.0001% ifcombine(after TODO)||| -0.0089% +0.% phiopt ||| -0.1730% -0.1203% tailr||| ch |+2 || +2.0535% +0.% ch (after TODO)|||+0.0002% cplxlower||| sra ||| +0.0087% copyrename ||| dom | +86 || +1.1802% -0.1366% dom (after TODO)|+5 || -0.1894% -0.3042% phicprop ||| -0.0043% phicprop (after TODO)|-6 ||+0.2117% dse ||| -0.0086% -0.0009% re
Re: Profile housekeeping 6/n (-fprofile-consistency-report)
On Sat, Oct 6, 2012 at 4:10 PM, Jan Hubicka wrote: >> > Index: passes.c >> > +/* Hold statistic about profile consistency. */ >> ... >> >> I don't see why this should live in passes.c, can you please put it in >> a more logical place (profile.c, perhaps)? > > Hmm, the problem here is that the code is using passmanager's dumping bits > to order the passes and assign them names. So moving it elsewhere requires > exporting it that is not nice. passes.c does similar stuff already, so I > decided > to keep it there. I think this is not the right decision. We can also throw _all_ code on one pile because there are inter-dependencies. Or they can be fixed with a proper interface. > If there are no complains I will commit the patch tomorrow. +1 complaint. You're putting profile stuff and even RTL stuff in the pass manager. That is Just Wrong. Ciao! Steven
[patch] fix libbacktrace build failure on arm-linux
current trunk fails to build on arm-linux with: In file included from ../../../../src/libbacktrace/backtrace.c:35:0: ../libgcc/unwind.h: In function '_Unwind_decode_typeinfo_ptr': ../libgcc/unwind.h:42:45: error: unused parameter 'base' [-Werror=unused-parameter] _Unwind_decode_typeinfo_ptr (_Unwind_Word base, _Unwind_Word ptr) ^ ../libgcc/unwind.h: In function '__gnu_unwind_24bit': ../libgcc/unwind.h:68:41: error: unused parameter 'context' [-Werror=unused-parameter] __gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact) ^ ../libgcc/unwind.h:68:54: error: unused parameter 'data' [-Werror=unused-parameter] __gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact) ^ ../libgcc/unwind.h:68:64: error: unused parameter 'compact' [-Werror=unused-parameter] __gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact) ^ cc1: all warnings being treated as errors make[8]: *** [backtrace.lo] Error 1 the immediate fix is to mark all arguments as unused, however I don't know if this function should be used by libbacktrace, if it returns _URC_FAILURE unconditionally. * config/arm/unwind-arm.h (__gnu_unwind_24bit): Mark parameters as unused. --- libgcc/config/arm/unwind-arm.h (revision 192162) +++ libgcc/config/arm/unwind-arm.h (working copy) @@ -64,8 +64,11 @@ return tmp; } +#define __unused __attribute__((unused)) + static inline _Unwind_Reason_Code - __gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact) + __gnu_unwind_24bit (_Unwind_Context * context __unused, _uw data __unused, + int compact __unused) { return _URC_FAILURE; }
Re: Profile housekeeping 6/n (-fprofile-consistency-report)
Jan. This patch also breaks bootstrap due compilation errors reported for pases.c and toplev.c Graham
Re: Profile housekeeping 6/n (-fprofile-consistency-report)
> Jan. > > This patch also breaks bootstrap due compilation errors reported for > pases.c and toplev.c Sorry for that. I swapped files and used old version of the patch. It should be fixed now. Honza > > Graham
Re: Profile housekeeping 6/n (-fprofile-consistency-report)
On Sat, Oct 6, 2012 at 4:39 PM, Steven Bosscher wrote: >> If there are no complains I will commit the patch tomorrow. > > +1 complaint. > You're putting profile stuff and even RTL stuff in the pass manager. > That is Just Wrong. You already committed the patch. Your tomorrow started early? ;-) Ciao! Steven
Re: Profile housekeeping 6/n (-fprofile-consistency-report)
> On Sat, Oct 6, 2012 at 4:10 PM, Jan Hubicka wrote: > >> > Index: passes.c > >> > +/* Hold statistic about profile consistency. */ > >> ... > >> > >> I don't see why this should live in passes.c, can you please put it in > >> a more logical place (profile.c, perhaps)? > > > > Hmm, the problem here is that the code is using passmanager's dumping bits > > to order the passes and assign them names. So moving it elsewhere requires > > exporting it that is not nice. passes.c does similar stuff already, so I > > decided > > to keep it there. > > I think this is not the right decision. We can also throw _all_ code > on one pile because there are inter-dependencies. Or they can be fixed > with a proper interface. > > > If there are no complains I will commit the patch tomorrow. > > +1 complaint. > You're putting profile stuff and even RTL stuff in the pass manager. > That is Just Wrong. Hmm, the code really is about collecting statistics of individual passes. I do not think it is that different from rest of TODO handling logic, but I do not care much. It is not supposed to be something that will grow over the time and become essential part of the compiler. The code uses the CFG/Gimple/RTL interfaces that are both used by many parts of the compiler (including the passmanager) + passmanager's passes_by_id and passes_by_id_size that are not exported. profile.c is not good place because that is only about reading profile feedback. cfg.c or predict.c seems better because that is where most of cfg/profile API lives. I will probably move it to cfg.c, because it should be in sync with the BB dumping code that should report the same inconsistencies as the statistics loop. Would you preffer exporting profile_record structure and putting the actual walk of CFG into cfg.c or exporting the passmanager's passes_by_id used by the dumping? (actually passes_by_id misses "static" in the declaratoin, so it is matter of putting it to the header) I will implement one of those. Honza > > Ciao! > Steven
Re: Profile housekeeping 6/n (-fprofile-consistency-report)
> On Sat, Oct 6, 2012 at 4:39 PM, Steven Bosscher wrote: > > >> If there are no complains I will commit the patch tomorrow. > > > > +1 complaint. > > You're putting profile stuff and even RTL stuff in the pass manager. > > That is Just Wrong. > > You already committed the patch. Your tomorrow started early? ;-) The tomorrow you cite is from Thursday :) Honza
Re: patch to fix constant math - third small patch
This is the third patch in the series of patches to fix constant math. this one changes some predicates at the rtl level to use the new predicate CONST_SCALAR_INT_P. I did not include a few that were tightly intertwined with other changes. Not all of these changes are strictly mechanical. Richard, when reviewing this had me make additional changes to remove what he thought were latent bugs at the rtl level. However, it appears that the bugs were not latent.I do not know what is going on here but i am smart enough to not look a gift horse in the mouth. All of this was done on the same machine with no changes and identical configs. It is an x86-64 with ubuntu 12-4. ok for commit? in the logs below, gbBaseline is a trunk from friday and the gbWide is the same revision but with my patches. Some of this like gfortran.dg/pr32627 is obviously flutter, but the rest does not appear to be. = heracles:~/gcc(13) gccBaseline/contrib/compare_tests gbBaseline/gcc/testsuite/gcc/gcc.log gbWide/gcc/testsuite/gcc/gcc.log New tests that PASS: gcc.dg/builtins-85.c scan-assembler mysnprintf gcc.dg/builtins-85.c scan-assembler-not __chk_fail gcc.dg/builtins-85.c (test for excess errors) heracles:~/gcc(14) gccBaseline/contrib/compare_tests gbBaseline/gcc/testsuite/gfortran/gfortran.log gbWide/gcc/testsuite/gfortran/gfortran.log New tests that PASS: gfortran.dg/pr32627.f03 -O3 -fomit-frame-pointer -funroll-loops (test for excess errors) gfortran.dg/pr32627.f03 -O3 -fomit-frame-pointer (test for excess errors) gfortran.dg/pr32627.f03 -Os (test for excess errors) gfortran.dg/pr32635.f -O0 execution test gfortran.dg/pr32635.f -O0 (test for excess errors) gfortran.dg/substr_6.f90 -O2 (test for excess errors) Old tests that passed, that have disappeared: (Eeek!) gfortran.dg/pr32627.f03 -O1 (test for excess errors) gfortran.dg/pr32627.f03 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions (test for excess errors) gfortran.dg/pr32627.f03 -O3 -g (test for excess errors) gfortran.dg/substring_equivalence.f90 -O (test for excess errors) Using /home/zadeck/gcc/gccBaseline/gcc/testsuite/config/default.exp as tool-and-target-specific interface file. === g++ Summary === # of expected passes49793 # of expected failures284 # of unsupported tests601 runtest completed at Fri Oct 5 16:10:20 2012 heracles:~/gcc(16) tail gbWide/gcc/testsuite/g++/g++.log Using /usr/share/dejagnu/config/unix.exp as generic interface file for target. Using /home/zadeck/gcc/gccWide/gcc/testsuite/config/default.exp as tool-and-target-specific interface file. === g++ Summary === # of expected passes50472 # of expected failures284 # of unsupported tests613 runtest completed at Fri Oct 5 19:51:50 2012
Re: Profile housekeeping 6/n (-fprofile-consistency-report)
Hi, does this look better? Moving to cfg.c would importing tree-pass.h and rtl.h that is not cool either. predict.c does all of these. Obviously can also go to a separate file, if preferred. If it looks fine to you, I will commit it after testing. Honza Index: tree-pass.h === *** tree-pass.h (revision 192158) --- tree-pass.h (working copy) *** extern void do_per_function_toporder (vo *** 547,551 --- 547,553 extern void disable_pass (const char *); extern void enable_pass (const char *); extern void dump_passes (void); + extern struct opt_pass **passes_by_id; + extern int passes_by_id_size; #endif /* GCC_TREE_PASS_H */ Index: predict.c === *** predict.c (revision 192158) --- predict.c (working copy) *** rebuild_frequencies (void) *** 2799,2801 --- 2799,3005 gcc_unreachable (); timevar_pop (TV_REBUILD_FREQUENCIES); } + + /* Make statistic about profile consistency. */ + + struct profile_record + { + int num_mismatched_freq_in[2]; + int num_mismatched_freq_out[2]; + int num_mismatched_count_in[2]; + int num_mismatched_count_out[2]; + bool run; + gcov_type time[2]; + int size[2]; + }; + + static struct profile_record *profile_record; + + static void + check_profile_consistency (int index, int subpass, bool run) + { + basic_block bb; + edge_iterator ei; + edge e; + int sum; + gcov_type lsum; + + if (index == -1) + return; + if (!profile_record) + profile_record = XCNEWVEC (struct profile_record, + passes_by_id_size); + gcc_assert (index < passes_by_id_size && index >= 0); + gcc_assert (subpass < 2); + profile_record[index].run |= run; + + FOR_ALL_BB (bb) +{ + if (bb != EXIT_BLOCK_PTR_FOR_FUNCTION (cfun) + && profile_status != PROFILE_ABSENT) + { + sum = 0; + FOR_EACH_EDGE (e, ei, bb->succs) + sum += e->probability; + if (EDGE_COUNT (bb->succs) && abs (sum - REG_BR_PROB_BASE) > 100) + profile_record[index].num_mismatched_freq_out[subpass]++; + lsum = 0; + FOR_EACH_EDGE (e, ei, bb->succs) + lsum += e->count; + if (EDGE_COUNT (bb->succs) + && (lsum - bb->count > 100 || lsum - bb->count < -100)) + profile_record[index].num_mismatched_count_out[subpass]++; + } + if (bb != ENTRY_BLOCK_PTR_FOR_FUNCTION (cfun) + && profile_status != PROFILE_ABSENT) + { + sum = 0; + FOR_EACH_EDGE (e, ei, bb->preds) + sum += EDGE_FREQUENCY (e); + if (abs (sum - bb->frequency) > 100 + || (MAX (sum, bb->frequency) > 10 + && abs ((sum - bb->frequency) * 100 / (MAX (sum, bb->frequency) + 1)) > 10)) + profile_record[index].num_mismatched_freq_in[subpass]++; + lsum = 0; + FOR_EACH_EDGE (e, ei, bb->preds) + lsum += e->count; + if (lsum - bb->count > 100 || lsum - bb->count < -100) + profile_record[index].num_mismatched_count_in[subpass]++; + } + if (bb == ENTRY_BLOCK_PTR_FOR_FUNCTION (cfun) + || bb == EXIT_BLOCK_PTR_FOR_FUNCTION (cfun)) + continue; + if ((cfun && (cfun->curr_properties & PROP_trees))) + { + gimple_stmt_iterator i; + + for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next (&i)) + { + profile_record[index].size[subpass] ++= estimate_num_insns (gsi_stmt (i), &eni_size_weights); + if (profile_status == PROFILE_READ) + profile_record[index].time[subpass] + += estimate_num_insns (gsi_stmt (i), + &eni_time_weights) * bb->count; + else if (profile_status == PROFILE_GUESSED) + profile_record[index].time[subpass] + += estimate_num_insns (gsi_stmt (i), + &eni_time_weights) * bb->frequency; + } + } + else if (cfun && (cfun->curr_properties & PROP_rtl)) + { + rtx insn; + for (insn = NEXT_INSN (BB_HEAD (bb)); insn && insn != NEXT_INSN (BB_END (bb)); + insn = NEXT_INSN (insn)) + if (INSN_P (insn)) + { + profile_record[index].size[subpass] + += insn_rtx_cost (PATTERN (insn), false); + if (profile_status == PROFILE_READ) + profile_record[index].time[subpass] ++= insn_rtx_cost (PATTERN (insn), true) * bb->count; + else if (profile_status == PROFILE_GUESSED) + profile_record[index].time[subpass] ++= insn_rtx_cost (PATTERN (insn), true) * bb->frequency; + } + } +} + } + + /* Output profile consistency. */ + + void + dump_pro
Re: [wwwdocs] SH 4.8 changes update
Hi Oleg, have you considered also documenting the new __builtin_thread_pointer and __builtin_set_thread_pointer built-ins? (I just noticed this by chance.) On Thu, 4 Oct 2012, Oleg Endo wrote: > The atomic options of SH have been changed recently. The attached patch > updates the 4.8 changes.html accordingly, plus some minor wording fixes. Looking at the list of parameters to -matomic-model=, it appears that a definition list () is better suitable than a regular list, so I went ahead and made the change below. Gerald Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v retrieving revision 1.37 diff -u -3 -p -r1.37 changes.html --- changes.html5 Oct 2012 19:30:26 - 1.37 +++ changes.html6 Oct 2012 15:32:05 - @@ -258,30 +258,30 @@ by this change. A new option -matomic-model=model selects the model for the generated atomic sequences. The following models are supported: - -soft-gusa + +soft-gusa Software gUSA sequences (SH3* and SH4* only). On SH4A targets this will now also partially utilize the movco.l and movli.l instructions. This is the default when the target -is sh3*-*-linux* or sh4*-*-linux*. +is sh3*-*-linux* or sh4*-*-linux*. -hard-llcs +hard-llcs Hardware movco.l / movli.l sequences -(SH4A only). +(SH4A only). -soft-tcb -Software thread control block sequences. +soft-tcb +Software thread control block sequences. -soft-imask -Software interrupt flipping sequences (privileged mode only). This is -the default when the target is sh1*-*-linux* or -sh2*-*-linux*. +soft-imask +Software interrupt flipping sequences (privileged mode only). This +is the default when the target is sh1*-*-linux* or +sh2*-*-linux*. -none +none Generates function calls to the respective __atomic -built-in functions. This is the default for SH64 targets or when the -target is not sh*-*-linux*. - +built-in functions. This is the default for SH64 targets or when +the target is not sh*-*-linux*. + The option -msoft-atomic has been deprecated. It is now an alias for -matomic-model=soft-gusa.
Re: [ping patch] Predict for loop exits in short-circuit conditions
ping^2 Honza, do you think this patch can make into 4.8 stage 1? Thanks, Dehao On Wed, Sep 26, 2012 at 2:34 PM, Dehao Chen wrote: > http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01975.html > > Thanks, > Dehao
Re: [ping patch] Predict for loop exits in short-circuit conditions
> ping^2 > > Honza, do you think this patch can make into 4.8 stage 1? + if (check_value_one ^ integer_onep (val)) Probably better as != (especially because GNU coding standard allows predicates to return more than just boolean) +{ + edge e1; + edge_iterator ei; + tree val = gimple_phi_arg_def (phi_stmt, i); + edge e = gimple_phi_arg_edge (phi_stmt, i); + + if (!TREE_CONSTANT (val) || !(integer_zerop (val) || integer_onep (val))) + continue; + if (check_value_one ^ integer_onep (val)) + continue; + if (VEC_length (edge, e->src->succs) != 1) + { + if (!predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS_GUESSED) + && !predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS) + && !predicted_by_p (exit_edge->src, PRED_LOOP_EXIT)) + predict_edge_def (e, PRED_LOOP_EXIT, NOT_TAKEN); + continue; + } + + FOR_EACH_EDGE (e1, ei, e->src->preds) + if (!predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS_GUESSED) + && !predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS) + && !predicted_by_p (exit_edge->src, PRED_LOOP_EXIT)) + predict_edge_def (e1, PRED_LOOP_EXIT, NOT_TAKEN); Here you found an edge that you know is going to terminate the loop and you want to predict all paths to this edge as unlikely. Perhaps you want to use predict paths leading_to_edge for edge? You do not need to check PRED_LOOP_ITERATIONS and PRED_LOOP_ITERATIONS_GUESSED because those never go to the non-exit edges. The nature of predict_paths_for_bb type heuristic is that they are not really additive: if the path leads to two different aborts it does not make it more sure that it will be unlikely. So perhaps you can add !predicted_by_p (e, pred) prior predict_edge_def call in the function? I wonder if we did VRP just before branch predction to jump thread the shortcut condtions into loopback edges, would be there still cases where this optimization will match? Honza
Drop V1 plugin API hack
Hi, this patch removes V1 API hack, making us to require V2 plugin api aware linker to get sane code quality on anything using COMDAT. Given that the V1 API linkers are quite old now, I think it is safe. I added a testcase that should fail with old linker and will update changes.html. The patch also cleans up the code for new symbol table APIs. Nice effect is that inliner is now more aggressive about inlining comdats and thus tramp3d scoes are improved with -flto once again. lto-bootstrapped/regtested x86_64, comitted. Honza PR lto/53831 PR lto/54776 * g++.dg/lto/v1-plugin-api-not-supported.C: New testcase. * lto-streamer-out.c (produce_symtab): Cleanup; drop v1 API hack. Index: testsuite/g++.dg/lto/v1-plugin-api-not-supported.C === --- testsuite/g++.dg/lto/v1-plugin-api-not-supported.C (revision 0) +++ testsuite/g++.dg/lto/v1-plugin-api-not-supported.C (revision 0) @@ -0,0 +1,59 @@ +// { dg-lto-do run } +// { dg-require-linker-plugin "" } +// { dg-lto-options {{-O2 -fuse-linker-plugin -fno-early-inlining}} + +extern "C" void abort (void); +extern "C" void linker_error (); + +class A +{ +public: + int data; + virtual int foo (int i) +{ + return i + 1; +} +}; + +class B : public A +{ +public: + virtual int foo (int i) +{ + return i + 2; +} +}; + +class C : public A +{ +public: + virtual int foo (int i) +{ + linker_error (); + return i + 3; +} +}; + + +static int middleman (class A *obj, int i) +{ + return obj->foo (i); +} + +int __attribute__ ((noinline,noclone)) get_input(void) +{ + return 1; +} + +int main (int argc, char *argv[]) +{ + class B b; + if (middleman (&b, get_input ()) != 3) +abort (); + return 0; +} + +/* { dg-final { scan-ipa-dump "Discovered a virtual call to a known target.*B::foo" "cp" } } */ +/* { dg-final { scan-tree-dump-times "OBJ_TYPE_REF" 0 "optimized"} } */ +/* { dg-final { cleanup-ipa-dump "cp" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: lto-streamer-out.c === --- lto-streamer-out.c (revision 192154) +++ lto-streamer-out.c (working copy) @@ -1285,11 +1287,9 @@ produce_symtab (struct output_block *ob) struct streamer_tree_cache_d *cache = ob->writer_cache; char *section_name = lto_get_section_name (LTO_section_symtab, NULL, NULL); struct pointer_set_t *seen; - struct cgraph_node *node; - struct varpool_node *vnode; struct lto_output_stream stream; lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder; - int i; + lto_symtab_encoder_iterator lsei; lto_begin_section (section_name, false); free (section_name); @@ -1297,78 +1297,26 @@ produce_symtab (struct output_block *ob) seen = pointer_set_create (); memset (&stream, 0, sizeof (stream)); - /* Write all functions. - First write all defined functions and then write all used functions. - This is done so only to handle duplicated symbols in cgraph. */ - for (i = 0; i < lto_symtab_encoder_size (encoder); i++) + /* Write the symbol table. + First write everything defined and then all declarations. + This is neccesary to handle cases where we have duplicated symbols. */ + for (lsei = lsei_start (encoder); + !lsei_end_p (lsei); lsei_next (&lsei)) { - if (!symtab_function_p (lto_symtab_encoder_deref (encoder, i))) - continue; - node = cgraph (lto_symtab_encoder_deref (encoder, i)); - if (DECL_EXTERNAL (node->symbol.decl)) - continue; - if (DECL_COMDAT (node->symbol.decl) - && cgraph_comdat_can_be_unshared_p (node)) - continue; - if ((node->alias && !node->thunk.alias) || node->global.inlined_to) + symtab_node node = lsei_node (lsei); + + if (!symtab_real_symbol_p (node) || DECL_EXTERNAL (node->symbol.decl)) continue; write_symbol (cache, &stream, node->symbol.decl, seen, false); } - for (i = 0; i < lto_symtab_encoder_size (encoder); i++) + for (lsei = lsei_start (encoder); + !lsei_end_p (lsei); lsei_next (&lsei)) { - if (!symtab_function_p (lto_symtab_encoder_deref (encoder, i))) - continue; - node = cgraph (lto_symtab_encoder_deref (encoder, i)); - if (!DECL_EXTERNAL (node->symbol.decl)) - continue; - /* We keep around unused extern inlines in order to be able to inline -them indirectly or via vtables. Do not output them to symbol -table: they end up being undefined and just consume space. */ - if (!node->symbol.address_taken && !node->callers) - continue; - if (DECL_COMDAT (node->symbol.decl) - && cgraph_comdat_can_be_unshared_p (node)) - continue; - if ((node->alias && !node->thunk.alias) || node->global.inlined_to) - continue; - write_symbol (cache, &stream, node->symbol.decl, seen, false); -} + symtab_node node
Re: [wwwdocs] SH 4.8 changes update
On Sat, 2012-10-06 at 17:57 +0200, Gerald Pfeifer wrote: > Hi Oleg, > > have you considered also documenting the new __builtin_thread_pointer > and __builtin_set_thread_pointer built-ins? (I just noticed this by > chance.) Yes, sure. The documentation and www changes updates will follow soon. > > On Thu, 4 Oct 2012, Oleg Endo wrote: > > The atomic options of SH have been changed recently. The attached patch > > updates the 4.8 changes.html accordingly, plus some minor wording fixes. > > Looking at the list of parameters to -matomic-model=, it appears > that a definition list () is better suitable than a regular > list, so I went ahead and made the change below. > Thanks! Looks way better now. Cheers, Oleg
[RFA] Support common C++ declarations inside GTY'd structures
This patch combines the changes from http://gcc.gnu.org/ml/gcc-patches/2012-08/msg02016.html with other additions to support C++ inside GTY'd structures. The main changes wrt Aaron's original patch are: - Support for function declarations inside classes. - Support scoping in identifiers. This does not mean that gengtype supports scopes, it just knows that 'Foo::id' is a single entity. - Explicit non-support for typedef and enum inside class/struct. Since gengtype does not really know about scopes, it cannot understand these types, but it knows enough to recognize and reject them. GTY'd struct/class that need to typedef their own types should use GTY((user)). - Documentation on what is and is not supported. There is one check I needed to remove that gave me some trouble. When a ctor is detected, we have already parsed the name of the ctor as a type, which is then registered in the list of structures. We go on to recognize it as a ctor *after* the type has been registered. We reject the field in declarator() and it is never added to the list of fields for the class. However, when we reach the end of the class, we find that the type we created while parsing the ctor has line number information in it (the line where the ctor was) and gengtype thinks that it is a duplicate structure definition. I took out this check for two reasons: (a) It is actually unnecessary because if there were really duplicate definitions of this structure, the code would not compile, and (b) all the other alternatives required making the parser much more convoluted and I'm trying hard not to make gengtype parser too smart. Tested on x86_64. OK for trunk? 2012-10-05 Aaron Gray Diego Novillo * gengtype-lex.l: Support for C++ single line comments. Support for classes. (CXX_KEYWORD): New. Support C++ keywords inline, public, protected, private, template, operator, friend, &, ~. (TYPEDEF): New. Support typedef. * gengtype-parser.c: updated 'token_names[]' (direct_declarator): Add support for parsing functions and ctors. 2012-10-05 Diego Novillo * doc/gty.texi: Document C++ limitations in gengtype. * gengtype-lex.l (CID): Rename from ID. (ID): Include scoping '::' as part of the identifier name. * gengtype-parse.c (token_names): Update. (token_value_format): Update. (consume_until_eos): Rename from consume_until_semi. Remove unused argument IMMEDIATE. Update all callers. Also consider '}' as a finalizer. (consume_until_comma_or_eos): Rename from consume_until_comma_or_semi. Remove unused argument IMMEDIATE. Update all callers. Also consider '}' as a finalizer. (direct_declarator): Add documentation on ctor support. Add argument IN_STRUCT. If the token following ID is a '(', consider ID a function and return NULL. If the token following '(' is not a '*', and IN_STRUCT is true, conclude that this is a ctor and return NULL. If the token is IGNORABLE_CXX_KEYWORD, return NULL. (inner_declarator): Add argument IN_STRUCT. Update all callers. (declarator): Add argument IN_STRUCT with default value false. Update all callers. (type): Document argument NESTED. Skip over C++ inheritance specifiers. If a token TYPEDEF is found, emit an error. If an enum is found inside a class/structure, emit an error. (new_structure): Do not complain about duplicate structures if S has a line location set. * gengtype-state.c (write_state_type): Remove default handler. Add handler for TYPE_NONE. (read_state_scalar_char_type): * gengtype.c: Fix spacing. * gengtype.h (enum gty_token): Add name. Add token IGNORABLE_CXX_KEYWORD. diff --git a/gcc/doc/gty.texi b/gcc/doc/gty.texi index b2214b8..ea1a928 100644 --- a/gcc/doc/gty.texi +++ b/gcc/doc/gty.texi @@ -65,6 +65,27 @@ The parser understands simple typedefs such as @code{typedef int @var{name};}. These don't need to be marked. +Since @code{gengtype}'s understanding of C++ is limited, there are +several constructs and declarations that are not supported inside +classes/structures marked for automatic GC code generation. The +following C++ constructs produce a @code{gengtype} error on +structures/classes marked for automatic GC code generation: + +@itemize @bullet +@item +Type definitions inside classes/structures are not supported. +@item +Enumerations inside classes/structures are not supported. +@end itemize + +If you have a class or structure using any of the above constructs, +you need to mark that class as @code{GTY ((user))} and provide your +own marking routines (see section @ref{User GC} for details). + +It is always valid to include function definitions inside classes. +Those are always ignored by @code{gengtype}
Re: [google] Emit relative addresses to function patch sections instead of absolute addresses. (issue6572065)
xray feature is not in trunk yet. David On Fri, Oct 5, 2012 at 3:53 PM, Diego Novillo wrote: > Harshit, why didn't you propose this patch for trunk? Why should we > make it a google-local patch? > > > Diego. > > On Fri, Sep 28, 2012 at 5:24 AM, Harshit Chopra wrote: >> commit fc3a55ccec9bc770c79f8a221f5abd397befc8f6 >> Author: Harshit Chopra >> Date: Thu Sep 20 17:49:59 2012 -0700 >> >> Instead of emitting absolute addresses to the function patch sections, >> emit relative addresses. Absolute addresses might require relocation, which >> is time consuming and fraught with other issues. >> >> M gcc/config/i386/i386.c >> >> Tested: >> Ran make check-gcc and manually confirmed that the affected tests pass. >> >> ChangeLog: >> >> 2012-09-28 Harshit Chopra >> >> * gcc/config/i386/i386.c >> (ix86_output_function_nops_prologue_epilogue): Emit relative address to >> function patch sections. >> >> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >> index f72b0b5..8c9334f 100644 >> --- a/gcc/config/i386/i386.c >> +++ b/gcc/config/i386/i386.c >> @@ -11098,7 +11098,7 @@ ix86_output_function_nops_prologue_epilogue (FILE >> *file, >> $LFPEL0: >> >> 0x90 (repeated num_actual_nops times) >> - .quad $LFPESL0 >> + .quad $LFPESL0 - . >> followed by section 'section_name' which contains the address >> of instruction at 'label'. >> */ >> @@ -0,7 +0,10 @@ ix86_output_function_nops_prologue_epilogue (FILE >> *file, >> asm_fprintf (file, ASM_BYTE"0x90\n"); >> >>fprintf (file, ASM_QUAD); >> + /* Output "section_label - ." for the relative address of the entry in >> + the section 'section_name'. */ >>assemble_name_raw (file, section_label); >> + fprintf (file, " - ."); >>fprintf (file, "\n"); >> >>/* Emit the backpointer section. For functions belonging to comdat group, >> @@ -11144,7 +11147,7 @@ ix86_output_function_nops_prologue_epilogue (FILE >> *file, >> .quad $LFPEL0 >> */ >>ASM_OUTPUT_INTERNAL_LABEL (file, section_label); >> - fprintf(file, ASM_QUAD"\t"); >> + fprintf(file, ASM_QUAD); >>assemble_name_raw (file, label); >>fprintf (file, "\n"); >> >> >> -- >> This patch is available for review at http://codereview.appspot.com/6572065
Re: *ping* [patch, libfortran] Fix PR 54736, memory corruption with GFORTRAN_CONVERT_UNIT
Hi Steven, Ping? I would like to start committing patches so I don't have too many of them in my tree at the same time :-) This looks OK to me. Committed as rev. 192158. I'll commit to 4.7 and after that to 4.6 in a few days. Thanks a lot for the review! Thomas
Re: [google] Emit relative addresses to function patch sections instead of absolute addresses. (issue6572065)
Ok for google branches. Please consider resend the original xray patch to trunk (gcc-4_8) You need to make the runtime bits available publicly though. thanks, David On Fri, Sep 28, 2012 at 2:24 AM, Harshit Chopra wrote: > commit fc3a55ccec9bc770c79f8a221f5abd397befc8f6 > Author: Harshit Chopra > Date: Thu Sep 20 17:49:59 2012 -0700 > > Instead of emitting absolute addresses to the function patch sections, > emit relative addresses. Absolute addresses might require relocation, which > is time consuming and fraught with other issues. > > M gcc/config/i386/i386.c > > Tested: > Ran make check-gcc and manually confirmed that the affected tests pass. > > ChangeLog: > > 2012-09-28 Harshit Chopra > > * gcc/config/i386/i386.c > (ix86_output_function_nops_prologue_epilogue): Emit relative address to > function patch sections. > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index f72b0b5..8c9334f 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -11098,7 +11098,7 @@ ix86_output_function_nops_prologue_epilogue (FILE > *file, > $LFPEL0: > > 0x90 (repeated num_actual_nops times) > - .quad $LFPESL0 > + .quad $LFPESL0 - . > followed by section 'section_name' which contains the address > of instruction at 'label'. > */ > @@ -0,7 +0,10 @@ ix86_output_function_nops_prologue_epilogue (FILE > *file, > asm_fprintf (file, ASM_BYTE"0x90\n"); > >fprintf (file, ASM_QUAD); > + /* Output "section_label - ." for the relative address of the entry in > + the section 'section_name'. */ >assemble_name_raw (file, section_label); > + fprintf (file, " - ."); >fprintf (file, "\n"); > >/* Emit the backpointer section. For functions belonging to comdat group, > @@ -11144,7 +11147,7 @@ ix86_output_function_nops_prologue_epilogue (FILE > *file, > .quad $LFPEL0 > */ >ASM_OUTPUT_INTERNAL_LABEL (file, section_label); > - fprintf(file, ASM_QUAD"\t"); > + fprintf(file, ASM_QUAD); >assemble_name_raw (file, label); >fprintf (file, "\n"); > > > -- > This patch is available for review at http://codereview.appspot.com/6572065
handle isl and cloog in contrib/download_prerequisites
Hi, GCC now requires ISL and a very new CLOOG but download_prerequisites does not download those. Also, there is only one sensible place to call this script from, so check for that. Also, factorize a bit the code. Tested by using it in the the GCC Compile Farm. I tried to build and install CLOOG and ISL by myself into /opt/cfarm/ but that breaks in some strange ways. Using the script, bootstrap works just fine. In some dream universe, configure would detect that the prerequisites are missing or incorrect very early, give a sensible message and suggest to download them automatically, and then just do that. OK for trunk? Cheers, Manuel. downpre.changelog Description: Binary data downpre.diff Description: Binary data
Re: [google] AutoFDO implementation
> Hi, > > This patch implements the fine-graind AutoFDO optimizations for GCC. > It uses linux perf to collect sample profiles, and uses debug info to > represent the profile. In GCC, it uses the profile to annotate CFG to > drive FDO. This can bring 50% to 110% of the speedup derived by > traditional instrumentation based FDO. (Average is between 70% to 80% > for many CPU intensive applications). Comparing with traditional FDO, > AutoFDO does not require instrumentation. It just need to have an > optimized binary with debug info to collect the profile. > > This patch has passed bootstrap and gcc regression tests as well as > tested with crosstool. Okay for google branches? > > If people in up-stream find this feature interesting, I'll spend some > time to port this to trunk and try to opensource the tool to generate > profile data file. I think it is useful feature, yes (and was in my TODO list for quite some time). Unlike edge profiles, these profiles should be also more independent of source code/configuration changes. Just few quick questions from first glance over the patch... > > Dehao > > The patch can also be viewed from: > > http://codereview.appspot.com/6567079 > > gcc/ChangeLog.google-4_7: > 2012-09-28 Dehao Chen > > * cgraphbuild.c (build_cgraph_edges): Handle AutoFDO profile. > (rebuild_cgraph_edges): Likewise. > * cgraph.c (cgraph_clone_node): Likewise. > (clone_function_name): Likewise. > * cgraph.h (cgraph_node): New field. > * tree-pass.h (pass_ipa_auto_profile): New pass. > * cfghooks.c (make_forwarder_block): Handle AutoFDO profile. > * ipa-inline-transform.c (clone_inlined_nodes): Likewise. > * toplev.c (compile_file): Likewise. > (process_options): Likewise. > * debug.h (auto_profile_debug_hooks): New. > * cgraphunit.c (cgraph_finalize_compilation_unit): Handle AutoFDO > profile. > (cgraph_copy_node_for_versioning): Likewise. > * regs.h (REG_FREQ_FROM_BB): Likewise. > * gcov-io.h: (GCOV_TAG_AFDO_FILE_NAMES): New. > (GCOV_TAG_AFDO_FUNCTION): New. > (GCOV_TAG_AFDO_MODULE_GROUPING): New. > * ira-int.h (REG_FREQ_FROM_EDGE_FREQ): Handle AutoFDO profile. > * ipa-inline.c (edge_hot_enough_p): Likewise. > (edge_badness): Likewise. > (inline_small_functions): Likewise. > * dwarf2out.c (auto_profile_debug_hooks): New. > * opts.c (common_handle_option): Handle AutoFDO profile. > * timevar.def (TV_IPA_AUTOFDO): New. > * predict.c (compute_function_frequency): Handle AutoFDO profile. > (rebuild_frequencies): Handle AutoFDO profile. > * auto-profile.c (struct gcov_callsite_pos): New. > (struct gcov_callsite): New. > (struct gcov_stack): New. > (struct gcov_function): New. > (struct afdo_bfd_name): New. > (struct afdo_module): New. > (afdo_get_filename): New. > (afdo_get_original_name_size): New. > (afdo_get_bfd_name): New. > (afdo_read_bfd_names): New. > (afdo_stack_hash): New. > (afdo_stack_eq): New. > (afdo_function_hash): New. > (afdo_function_eq): New. > (afdo_bfd_name_hash): New. > (afdo_bfd_name_eq): New. > (afdo_bfd_name_del): New. > (afdo_module_hash): New. > (afdo_module_eq): New. > (afdo_module_num_strings): New. > (afdo_add_module): New. > (read_aux_modules): New. > (get_inline_stack_size_by_stmt): New. > (get_inline_stack_size_by_edge): New. > (get_function_name_from_block): New. > (get_inline_stack_by_stmt): New. > (get_inline_stack_by_edge): New. > (afdo_get_function_count): New. > (afdo_set_current_function_count): New. > (afdo_add_bfd_name_mapping): New. > (afdo_add_copy_scale): New. > (get_stack_count): New. > (get_stmt_count): New. > (afdo_get_callsite_count): New. > (afdo_get_bb_count): New. > (afdo_annotate_cfg): New. > (read_profile): New. > (process_auto_profile): New. > (init_auto_profile): New. > (end_auto_profile): New. > (afdo_find_equiv_class): New. > (afdo_propagate_single_edge): New. > (afdo_propagate_multi_edge): New. > (afdo_propagate_circuit): New. > (afdo_propagate): New. > (afdo_calculate_branch_prob): New. > (auto_profile): New. > (gate_auto_profile_ipa): New. > (struct simple_ipa_opt_pass): New. > * auto-profile.h (init_auto_profile): New. > (end_auto_profile): New. > (process_auto_profile): New. > (afdo_set_current_function_count): New. > (afdo_add_bfd_name_mapping): New. > (afdo_add_copy_scale): New. > (afdo_calculate_branch_prob): New. > (afdo_get_callsite_count): New. > (afdo_get_bb_count): New. > * profile.c (compute_branch_probabilities): Handle AutoFDO profile. > (branch_prob): Likeise. > * loop-unroll.c (decide_unroll_runtime_iterations): Likewise. > * coverage.c (coverage_init): Likewise. > * tree-ssa-live.c (remove_unused_scope_block_p): Likewise. > * common.opt (fauto-profile): New. > * tree-inline.c (copy_bb): Handle AutoFDO profile. > (copy_edges_for_bb): Likewise. > (copy_cfg_body): Likewise. > * tree-profile.c (direct_call_profiling): Likewise. > (gate_tree_profile_ipa): Likewise. > * basic-block.h (EDGE_ANNOTATED): New field. > (BB_ANNOTATED): New field. > * tree-cfg.c (gimple_merge_blocks): Handle AutoFDO profile. > * passes.c (init_o
Re: handle isl and cloog in contrib/download_prerequisites
On 2012-10-06 13:30 , Manuel López-Ibáñez wrote: +download_prerequisite() { +WHAT=$1 +VERSION=$2 +PACK=$3 +case $PACK in +tar.bz2) TARX="j";; +tar.gz) TARX="z";; +esac GNU tar has not needed 'j' or 'z' for a while. Not sure whether other tars have the same feature, though. The patch is OK with a minor change: +wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$WHAT-${VERSION}.${PACK} || exit 1 +tar x${TARX}f $WHAT-${VERSION}.${PACK} || exit 1 +ln -sf $WHAT-${VERSION} $WHAT || exit 1 +rm $WHAT-${VERSION}.${PACK} || exit 1 +} + +if [ ! -e ./contrib/download_prerequisites ]; then Better use -x here. Diego.
Re: [PATCH] Rs6000 infrastructure cleanup (switches), revised patch #3
Michael Meissner wrote: > @@ -10326,7 +10352,7 @@ static rtx > altivec_expand_ld_builtin (tree exp, rtx target, bool *expandedp) > { >tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0); > - unsigned int fcode = DECL_FUNCTION_CODE (fndecl); > + enum rs6000_builtins fcode = (enum rs6000_builtins) DECL_FUNCTION_CODE (fndecl); >tree arg0; >enum machine_mode tmode, mode0; >rtx pat, op0; There are several changes like that in your patch. Shouldn't these changes be done in a separate patch? Regards, Gunther
Re: [PATCH] Rs6000 infrastructure cleanup (switches), revised patch #2c
Michael Meissner wrote: > On Thu, Oct 04, 2012 at 06:33:33PM +0200, Gunther Nikl wrote: >> Michael Meissner schrieb: >>> On Tue, Oct 02, 2012 at 10:13:25AM +0200, Gunther Nikl wrote: Michael Meissner wrote: > Segher Boessenkool asked me on IRC to break out the fix in the last > change. > This patch is just the change to set the default options if the user did > not > use -mcpu= and the compiler was not configured with --with-cpu=. > Here are the patches. Which GCC releases are affected by this bug? >>> All of them. >> So this bug is as old as the rs6000 port has PowerPC support? Then GCC >> 2.95 is also affected? I took a closer look. AFAICT, the above mentioned bug is a result of the rewritten option parsing for GCC 4.6. And that was what I wanted to know. Since this is a regression, should this be fixed in 4.6/4.7? > Well as I said, it is pretty latent, and most people never have noticed it. > It really depends on what the options are whether you run into the problem. Yes, I understood that but this did not address my question. Regards, Gunther
Re: [patch] fix libbacktrace build failure on arm-linux
On Sat, Oct 6, 2012 at 8:09 AM, Matthias Klose wrote: > current trunk fails to build on arm-linux with: > > In file included from ../../../../src/libbacktrace/backtrace.c:35:0: > ../libgcc/unwind.h: In function '_Unwind_decode_typeinfo_ptr': > ../libgcc/unwind.h:42:45: error: unused parameter 'base' > [-Werror=unused-parameter] >_Unwind_decode_typeinfo_ptr (_Unwind_Word base, _Unwind_Word ptr) > ^ > ../libgcc/unwind.h: In function '__gnu_unwind_24bit': > ../libgcc/unwind.h:68:41: error: unused parameter 'context' > [-Werror=unused-parameter] >__gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact) > ^ > ../libgcc/unwind.h:68:54: error: unused parameter 'data' > [-Werror=unused-parameter] >__gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact) > ^ > ../libgcc/unwind.h:68:64: error: unused parameter 'compact' > [-Werror=unused-parameter] >__gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact) > ^ > cc1: all warnings being treated as errors > make[8]: *** [backtrace.lo] Error 1 > > the immediate fix is to mark all arguments as unused, however I don't know if > this function should be used by libbacktrace, if it returns _URC_FAILURE > unconditionally. The function is not used by libbacktrace. It's an inline function defined in the header file, and the warning is about the inline function definition. That is, this is a libgcc bug, not a libbacktrace bug, it just happens to show up when compiling libbacktrace for ARM > * config/arm/unwind-arm.h (__gnu_unwind_24bit): Mark parameters > as unused. > > > --- libgcc/config/arm/unwind-arm.h (revision 192162) > +++ libgcc/config/arm/unwind-arm.h (working copy) > @@ -64,8 +64,11 @@ >return tmp; > } > > +#define __unused __attribute__((unused)) > + >static inline _Unwind_Reason_Code > - __gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact) > + __gnu_unwind_24bit (_Unwind_Context * context __unused, _uw data __unused, > + int compact __unused) > { >return _URC_FAILURE; > } Don't #define __unused. Just write __attribute__ ((unused)) on the parameters. Break the lines as needed. This is OK with that change. Thanks. Ian
[Patch, libfortran] Fix usage of secure_getenv with glibc 2.17
Hi, glibc 2.17 has renamed the exported but ostensibly private symbol __secure_getenv to secure_getenv. The attached patch updates libgfortran to first check for the presence of this new symbol. See http://sourceware.org/glibc/wiki/Tips_and_Tricks/secure_getenv for more details. Committed as obvious. 2012-10-06 Janne Blomqvist * configure.ac: Check for presence of secure_getenv. * libgfortran.h: Use HAVE_SECURE_GETENV. * Makefile.in: Regenerated. * aclocal.m4: Regenerated. * config.h.in: Regenerated. * configure: Regenerated. -- Janne Blomqvist secure_getenv.diff Description: Binary data
[Patch, Fortran] PR 40453: Enhanced (recursive) argument checking
Hi all, here is a rather straightforward patch, which does 'recursive' checking of dummy procedures. Regtested on x86_64-unknown-linux-gnu. Ok for trunk? Cheers, Janus 2012-10-06 Janus Weil PR fortran/40453 * interface.c (check_dummy_characteristics): Recursively check dummy procedures. 2012-10-06 Janus Weil PR fortran/40453 * gfortran.dg/dummy_procedure_9.f90: New. pr40453.diff Description: Binary data dummy_procedure_9.f90 Description: Binary data
Re: [TILE-Gx, committed] support -mcmodel=MODEL
On Fri, 14 Sep 2012, Walter Lee wrote: > How does this look: Good. Just, for some reason, the mail did not appear in my inbox and I was backlogged on gcc-patches; sorry for missing it. Here is a small markup change I applied on the wwwdocs side on top of your patch. Gerald Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v retrieving revision 1.38 diff -u -3 -p -r1.38 changes.html --- changes.html6 Oct 2012 15:47:44 - 1.38 +++ changes.html6 Oct 2012 22:19:27 - @@ -348,8 +348,9 @@ by this change. TILE-Gx -Added support for the -mcmodel=MODEL command-line option. The -models supported are small and large. +Added support for the -mcmodel=MODEL +command-line option. The models supported are small +and large. XStormy16
Re: handle isl and cloog in contrib/download_prerequisites
On Sat, Oct 6, 2012 at 7:30 AM, Manuel López-Ibáñez wrote: > Hi, > > GCC now requires ISL and a very new CLOOG but download_prerequisites > does not download those. Also, there is only one sensible place to As of what version is isl/cloog no longer optional?
Re: [google] AutoFDO implementation
Jan Hubicka writes: > > I think it is useful feature, yes (and was in my TODO list for quite some > time). Unlike edge profiles, these profiles should be also more independent of > source code/configuration changes. It would be good to look at the tool, but: - Does it use the perf LBR support to estimate the last 16 basic blocks for each sample? - It doesn't seem to use callgraphs for estimating common indirect calls for devirtualization? I always disliked the limitation of the current indirect call profiling to only support a single translation unit. Using callgraphs around that would be quite nice. This would only work for non tail calls and with frame pointers, or using LBRs again. -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: [google] AutoFDO implementation
On Sat, Oct 6, 2012 at 10:55 AM, Jan Hubicka wrote: >> Hi, >> >> This patch implements the fine-graind AutoFDO optimizations for GCC. >> It uses linux perf to collect sample profiles, and uses debug info to >> represent the profile. In GCC, it uses the profile to annotate CFG to >> drive FDO. This can bring 50% to 110% of the speedup derived by >> traditional instrumentation based FDO. (Average is between 70% to 80% >> for many CPU intensive applications). Comparing with traditional FDO, >> AutoFDO does not require instrumentation. It just need to have an >> optimized binary with debug info to collect the profile. >> >> This patch has passed bootstrap and gcc regression tests as well as >> tested with crosstool. Okay for google branches? >> >> If people in up-stream find this feature interesting, I'll spend some >> time to port this to trunk and try to opensource the tool to generate >> profile data file. > > I think it is useful feature, yes (and was in my TODO list for quite some > time). Unlike edge profiles, these profiles should be also more independent of > source code/configuration changes. Thanks for your feedback and interest. Yes, in AutoFDO the coupling between the profiling build and fdo build are much loosen. > > Just few quick questions from first glance over the patch... >> >> Dehao >> >> The patch can also be viewed from: >> >> http://codereview.appspot.com/6567079 >> >> gcc/ChangeLog.google-4_7: >> 2012-09-28 Dehao Chen >> >> * cgraphbuild.c (build_cgraph_edges): Handle AutoFDO profile. >> (rebuild_cgraph_edges): Likewise. >> * cgraph.c (cgraph_clone_node): Likewise. >> (clone_function_name): Likewise. >> * cgraph.h (cgraph_node): New field. >> * tree-pass.h (pass_ipa_auto_profile): New pass. >> * cfghooks.c (make_forwarder_block): Handle AutoFDO profile. >> * ipa-inline-transform.c (clone_inlined_nodes): Likewise. >> * toplev.c (compile_file): Likewise. >> (process_options): Likewise. >> * debug.h (auto_profile_debug_hooks): New. >> * cgraphunit.c (cgraph_finalize_compilation_unit): Handle AutoFDO >> profile. >> (cgraph_copy_node_for_versioning): Likewise. >> * regs.h (REG_FREQ_FROM_BB): Likewise. >> * gcov-io.h: (GCOV_TAG_AFDO_FILE_NAMES): New. >> (GCOV_TAG_AFDO_FUNCTION): New. >> (GCOV_TAG_AFDO_MODULE_GROUPING): New. >> * ira-int.h (REG_FREQ_FROM_EDGE_FREQ): Handle AutoFDO profile. >> * ipa-inline.c (edge_hot_enough_p): Likewise. >> (edge_badness): Likewise. >> (inline_small_functions): Likewise. >> * dwarf2out.c (auto_profile_debug_hooks): New. >> * opts.c (common_handle_option): Handle AutoFDO profile. >> * timevar.def (TV_IPA_AUTOFDO): New. >> * predict.c (compute_function_frequency): Handle AutoFDO profile. >> (rebuild_frequencies): Handle AutoFDO profile. >> * auto-profile.c (struct gcov_callsite_pos): New. >> (struct gcov_callsite): New. >> (struct gcov_stack): New. >> (struct gcov_function): New. >> (struct afdo_bfd_name): New. >> (struct afdo_module): New. >> (afdo_get_filename): New. >> (afdo_get_original_name_size): New. >> (afdo_get_bfd_name): New. >> (afdo_read_bfd_names): New. >> (afdo_stack_hash): New. >> (afdo_stack_eq): New. >> (afdo_function_hash): New. >> (afdo_function_eq): New. >> (afdo_bfd_name_hash): New. >> (afdo_bfd_name_eq): New. >> (afdo_bfd_name_del): New. >> (afdo_module_hash): New. >> (afdo_module_eq): New. >> (afdo_module_num_strings): New. >> (afdo_add_module): New. >> (read_aux_modules): New. >> (get_inline_stack_size_by_stmt): New. >> (get_inline_stack_size_by_edge): New. >> (get_function_name_from_block): New. >> (get_inline_stack_by_stmt): New. >> (get_inline_stack_by_edge): New. >> (afdo_get_function_count): New. >> (afdo_set_current_function_count): New. >> (afdo_add_bfd_name_mapping): New. >> (afdo_add_copy_scale): New. >> (get_stack_count): New. >> (get_stmt_count): New. >> (afdo_get_callsite_count): New. >> (afdo_get_bb_count): New. >> (afdo_annotate_cfg): New. >> (read_profile): New. >> (process_auto_profile): New. >> (init_auto_profile): New. >> (end_auto_profile): New. >> (afdo_find_equiv_class): New. >> (afdo_propagate_single_edge): New. >> (afdo_propagate_multi_edge): New. >> (afdo_propagate_circuit): New. >> (afdo_propagate): New. >> (afdo_calculate_branch_prob): New. >> (auto_profile): New. >> (gate_auto_profile_ipa): New. >> (struct simple_ipa_opt_pass): New. >> * auto-profile.h (init_auto_profile): New. >> (end_auto_profile): New. >> (process_auto_profile): New. >> (afdo_set_current_function_count): New. >> (afdo_add_bfd_name_mapping): New. >> (afdo_add_copy_scale): New. >> (afdo_calculate_branch_prob): New. >> (afdo_get_callsite_count): New. >> (afdo_get_bb_count): New. >> * profile.c (compute_branch_probabilities): Handle AutoFDO profile. >> (branch_prob): Likeise. >> * loop-unroll.c (decide_unroll_runtime_iterations): Likewise. >> * coverage.c (coverage_init): Likewise. >> * tree-ssa-live.c (remove_unused_scope_block_p): Likewise. >> * common.opt (fauto-profile): New. >> * tree-inline.c (copy_bb): Handle AutoFDO profile. >> (copy
Re: [google] AutoFDO implementation
On Sat, Oct 6, 2012 at 6:13 PM, Andi Kleen wrote: > Jan Hubicka writes: >> >> I think it is useful feature, yes (and was in my TODO list for quite some >> time). Unlike edge profiles, these profiles should be also more independent >> of >> source code/configuration changes. > > It would be good to look at the tool, but: Thanks a lot for the interests and feedback. > > - Does it use the perf LBR support to estimate the last 16 basic blocks > for each sample? Yes, it uses LBR to calculate binary level profile. > > - It doesn't seem to use callgraphs for estimating common indirect calls > for devirtualization? At this phase, VPT is not supported, but will definitely support later. > > I always disliked the limitation of the current indirect call profiling > to only support a single translation unit. Using callgraphs around > that would be quite nice. You can take a look at LIPO, which supports multiple indirect call targets. But yes, LBR can give us complete indirect call histogram for each callsite, and almost for free. We'll use that to drive indirect call promotion. Currently, open-sourcing the tool to generate profile data file seemed more work than rewriting it because it has some internal dependencies. My plan is to integrate it as part of perf, i.e. one can use perf to get .afdo file directly. We'll try to make this happen soon and make into the gcc 4.9 release schedule. Cheers, Dehao > > This would only work for non tail calls and with frame pointers, or > using LBRs again. > > -Andi > > -- > a...@linux.intel.com -- Speaking for myself only
[PATCH] Add TF support for OpenBSD/i386 and OpenBSD/amd64
Adds the necessary support bits to libgcc. All other mainstream i386/amd64 targets already have this. Tested on i386-unknown-openbsd5.2 and x86_64-unknown-openbsd5.2. Fixes a couple of testcases. libgcc/: 2012-10-06 Mark Kettenis * config.host (i[34567]86-*-openbsd* and x86_64-*-openbsd*): Add to list of i[34567]86-*-* and x86_64-*-* soft-fp targets. gcc/: 2012-10-06 Mark Kettenis * config/i386/openbsdelf.h (LIBGCC2_HAS_TF_MODE, LIBGCC2_TF_CEXT) (TF_SIZE): Define. Index: libgcc/config.host === --- libgcc/config.host (revision 192154) +++ libgcc/config.host (working copy) @@ -1156,7 +1156,8 @@ i[34567]86-*-gnu* | \ i[34567]86-*-solaris2* | x86_64-*-solaris2.1[0-9]* | \ i[34567]86-*-cygwin* | i[34567]86-*-mingw* | x86_64-*-mingw* | \ - i[34567]86-*-freebsd* | x86_64-*-freebsd*) + i[34567]86-*-freebsd* | x86_64-*-freebsd* | \ + i[34567]86-*-openbsd* | x86_64-*-openbsd*) tmake_file="${tmake_file} t-softfp-tf" if test "${host_address}" = 32; then tmake_file="${tmake_file} i386/${host_address}/t-softfp" Index: gcc/config/i386/openbsdelf.h === --- gcc/config/i386/openbsdelf.h(revision 192154) +++ gcc/config/i386/openbsdelf.h(working copy) @@ -111,3 +111,9 @@ #define OBSD_HAS_CORRECT_SPECS #define HAVE_ENABLE_EXECUTE_STACK + +/* Put all *tf routines in libgcc. */ +#undef LIBGCC2_HAS_TF_MODE +#define LIBGCC2_HAS_TF_MODE 1 +#define LIBGCC2_TF_CEXT q +#define TF_SIZE 113