RFA: Use "m_foo" rather than "foo_" for member variables
Michael Matz writes:
> Trevor Saunders writes:
>> Richard Biener writes:
>> > Btw, I've come around multiple coding-styles in the past and I
>> > definitely would prefer m_mode / m_count to mark members vs. mode_
>> > and count_.  (and s_XXX for static members IIRC).
>>
>> I'd prefer m_/s_foo for members / static things too fwiw.
>
> Me as well.  It's still ugly, but not so unsymmetric as the trailing
> underscore.

Well, I'm not sure how I came to be the one writing these patches, but I
suppose I prefer m_foo too.  So how about the attached?

The first patch has changes to the coding conventions.  I added some
missing spaces while there.  The second patch has the mechanical code
changes.  The reason for yesterday's mass adding of spaces was that the
second patch would have been pretty inconsistent otherwise.

Tested on x86_64-linux-gnu.

Thanks,
Richard

Index: htdocs/codingconventions.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/codingconventions.html,v
retrieving revision 1.67
diff -u -p -r1.67 codingconventions.html
--- htdocs/codingconventions.html	16 Jul 2012 19:51:44 -	1.67
+++ htdocs/codingconventions.html	29 Sep 2013 08:55:10 -
@@ -685,10 +685,10 @@ The compiler must build cleanly with
 Assertions
 
-Code should use gcc_assert(EXPR) to check invariants.
-Use gcc_unreachable() to mark places that should never be
+Code should use gcc_assert (EXPR) to check invariants.
+Use gcc_unreachable () to mark places that should never be
 reachable (such as an unreachable default case of a
-switch).  Do not use gcc_assert(0) for such purposes, as
+switch).  Do not use gcc_assert (0) for such purposes, as
 gcc_unreachable gives the compiler more information.
 The assertions are enabled unless explicitly configured off with
 --enable-checking=none.  Do not use abort.
@@ -1087,7 +1087,8 @@ rather than
 
 When structs and/or classes have member functions,
-prefer to name data members with a trailing underscore.
+prefer to name data members with a leading m_
+and static data members with a leading s_.
 
@@ -1201,7 +1202,7 @@ Prefer to put the entire member head on
 
-gnuclass::gnuclass() : base_class()
+gnuclass::gnuclass () : base_class ()
 {
   ...
 };
@@ -1213,8 +1214,8 @@ place the colon of the initializer claus
 
-gnuclass::gnuclass()
-: base1(), base2(), member1(), member2(), member3(), member4()
+gnuclass::gnuclass ()
+: base1 (), base2 (), member1 (), member2 (), member3 (), member4 ()
 {
   ...
 };
@@ -1226,9 +1227,9 @@ move overflowing initializers to the nex
 
-gnuclass::gnuclass()
-: base1(some_expression), base2(another_expression),
-  member1(my_expressions_everywhere)
+gnuclass::gnuclass ()
+: base1 (some_expression), base2 (another_expression),
+  member1 (my_expressions_everywhere)
 {
   ...
 };
@@ -1242,7 +1243,7 @@ it should appear on the next line indent
 
 void
-very_long_class_name::very_long_function_name(
+very_long_class_name::very_long_function_name (
     very_long_type_name arg)
 {
 
@@ -1256,7 +1257,7 @@ We may wish to do so pre-emptively for a
 void
 very_long_template_class_name
-::very_long_function_name(
+::very_long_function_name (
     very_long_type_name arg)
 {
 
Index: htdocs/codingrationale.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/codingrationale.html,v
retrieving revision 1.2
diff -u -p -r1.2 codingrationale.html
--- htdocs/codingrationale.html	6 Sep 2012 02:54:40 -	1.2
+++ htdocs/codingrationale.html	29 Sep 2013 08:55:10 -
@@ -351,10 +351,10 @@ but the clarity in layout persists.
 Names
 
-Naming data members with a trailing underscore
+Prefixing data members with m_
 highlights the extra overhead of access to fields over local variables.
-Think of the trailing underscore
-like you would Pascal's postfix ^ dereference operator.
+Think of the leading m_ as being similar to the
+* dereference operator.
 
gcc/
	* basic-block.h (edge_list): Prefix member names with "m_".
	* context.h (context): Likewise.
	* domwalk.h (dom_walker): Likewise.
	* gengtype-state.c (s_expr_writer, state_writer): Likewise.
	* graphite-sese-to-poly.c (sese_dom_walker): Likewise.
	* hash-table.h (hash_table): Likewise.
	* machmode.h (bit_field_mode_iterator): Likewise.
	* pass_manager.h (pass_list): Likewise.
	* tree-into-ssa.c (mark_def_dom_walker): Likewise.
	* tree-pass.h (pass_data): Likewise.
	* tree-ssa-dom.c (dom_opt_dom_walker): Likewise.
	* tree-ssa-phiopt.c (nontrapping_dom_walker): Likewise.
	* tree-ssa-uncprop.c (uncprop_dom_walker): Likewise.
	* asan.c (pass_data_asan): Update accordingly.
	* cfganal.c (control_dependences::find_control_dependence): Likewise.
	(control_dependences::control_dependences): Likewise.
	(control_dependences::~control_dependences): Likewise.
	(control_dependences::get_edges_dependent_on):
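As a concrete illustration of the convention the first patch documents (a
hypothetical C++ class; all names here are invented for the example):

    /* "m_" marks ordinary data members, "s_" marks static data members.  */
    class walker_example
    {
    public:
      walker_example (int depth) : m_depth (depth), m_visited (0) {}

    private:
      int m_depth;               /* data member: leading "m_" */
      int m_visited;
      static int s_num_walkers;  /* static data member: leading "s_" */
    };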
Re: Add value range support into memcpy/memset expansion
> > We now produce:
> >     movq    b(%rip), %rsi
> >     movq    a(%rip), %rcx
> >     movq    (%rsi), %rax      <- first 8 bytes are moved
> >     leaq    8(%rcx), %rdi
> >     andq    $-8, %rdi         <- dest is aligned
> >     movq    %rax, (%rcx)
> >     movq    132(%rsi), %rax   <- last 8 bytes are moved
> >     movq    %rax, 132(%rcx)
> >     subq    %rdi, %rcx        <- alignment is subtracted from count
> >     subq    %rcx, %rsi        <- source is aligned
>
> This (source aligned) is not always true, but nevertheless the
> sequence is very tight.

Yep, sure, it is aligned only if source-dest is aligned, but that is the
best we can ask for.

> > Unfortunately the following testcase:
> >
> > char *p, *q;
> > t (int a)
> > {
> >   if (a < 100)
> >     memcpy (q, p, a);
> > }
> >
> > won't get inlined.  This is because 'a' is known to be smaller than
> > 100, which results in an anti-range after the conversion to size_t.
> > This anti-range allows very large values (above INT_MAX) and thus we
> > do not know the block size.  I am not sure if the sane range can be
> > recovered somehow.  If not, maybe this is common enough to add
> > support for a "probable" upper bound parameter to the template.
>
> Do we know if there is real code that intentionally does that, other
> than security flaws as a result of an improperly done range check?

I do not think so.

> I think by default GCC should assume the memcpy size range is (0, 100)
> here, with perhaps an option to override it.

Indeed, this is what I was suggesting.  The problem is what to pass down
to the expanders as a value range.  We would otherwise need to update
the documentation of the expanders to say that the ranges are just
highly probable - and I do not want to do that, since I want to use the
ranges for move_by_pieces, too.  So I think we will have to introduce
two upper bound parameters - one sure and the other very likely - if
there is no other solution.  We play similar tricks in the niter code.

Honza
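To spell out the anti-range problem described above (an illustrative
sketch; the range bounds in the comment are approximate, not taken from
a VRP dump):

    #include <string.h>

    extern char *p, *q;

    void
    t (int a)
    {
      if (a < 100)
        /* VRP knows a is in [INT_MIN, 99] here.  memcpy takes a size_t,
           so 'a' is implicitly converted: negative values wrap to huge
           unsigned numbers, and the length's range becomes an anti-range,
           roughly ~[100, (size_t) INT_MIN - 1].  That still permits values
           above INT_MAX, so the expander sees no useful upper bound.  */
        memcpy (q, p, a);
    }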
Re: Add value range support into memcpy/memset expansion
Hi Jan,

> I also think the misaligned move trick can/should be performed by
> move_by_pieces and we ought to consider sane use of SSE - the current
> vector_loop with an unrolling factor of 4 seems a bit extreme.  At
> least Bulldozer is happy with 2, and I would expect SSE moves to be
> especially useful for moving blocks with known size where they are
> not used at all.
>
> Currently I disabled misaligned move prologues/epilogues for Michael's
> vector loop path since they end up longer than the traditional code
> (that uses a loop for the epilogue)

Prologues could use this technique even with vector_loop, as they
actually don't have a loop.

As for epilogues - have you tried to use misaligned vector moves
(movdqu)?  It looks to me that we need approximately the same number of
instructions in vector-loop and in usual-loop epilogues, if we use
vector instructions in the vector-loop epilogue.

> Comments are welcome.

BTW, maybe we could generalize expand_small_movmem a bit and make a
separate expanding strategy out of it?  It would expand a memmove with
no loop (and probably no alignment prologue) - just with the biggest
available moves.  Also, a cost model could be added here to make
decisions on when we actually want to align the moves.  Here is a
couple of examples of that:

memcpy (a, b, 32);  // alignment is unknown
will expand to
    movdqu a, %xmm0
    movdqu a+16, %xmm1
    movdqu %xmm0, b
    movdqu %xmm1, b+16

memcpy (a, b, 32);  // alignment is known and equals 64 bits
will expand to
a)
    movdqu a, %xmm0
    movdqu a+16, %xmm1
    movdqu %xmm0, b
    movdqu %xmm1, b+16
or b)
    movq   a, %xmm0
    movdqa a+8, %xmm1
    movq   a+24, %xmm2
    movq   %xmm0, b
    movdqa %xmm1, b+8
    movq   %xmm2, b+24

We would compute the total cost of both variants and choose the best -
for the computation we need just the cost of aligned and misaligned
moves.

This strategy is actually pretty similar to move_by_pieces, but as it
has much more target-specific information, it would be able to produce
much more effective code.

And one more question - in the work on vector_loop for memset I tried
to merge many of the movmem and setmem routines (e.g. instead of
expand_movmem_prologue and expand_setmem_prologue I made a single
routine expand_movmem_or_setmem_prologue).  What do you think, is it a
good idea?  It reduces code size in i386.c, but slightly complicates
the code.  I'll send a patch shortly, as soon as the testing completes.

Thanks,
Michael
Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (1).
Chung-Ju Wu writes:
>>> +/* Permitting tail calls.  */
>>> +
>>> +static bool
>>> +nds32_warn_func_return (tree decl)
>>> +{
>>> +  /* Naked functions are implemented entirely in assembly, including
>>> +     the return sequence, so suppress warnings about this.  */
>>> +  return !nds32_naked_function_p (decl);
>>> +}
>>
>> The comment above the function doesn't seem to match the function.
>>
>
> The TARGET_WARN_FUNC_RETURN hook is located at "17.10.13 Permitting
> tail calls" in the GCC Internals manual, 4.9.0 pre-release.

Hmm, good point.  That seems kind-of misplaced.

> I should have added a comment to describe nds32_warn_func_return ()
> to avoid such confusion.  Like this:
>
> /* Permitting tail calls.  */
>
> /* Determine whether we need to enable the warning for the function
>    return check.  */
> static bool
> nds32_warn_func_return (tree decl)

Looks good, thanks.

>>> +  /* If the function is 'naked', we do not have to generate an
>>> +     epilogue code fragment, BUT we still need a 'ret' instruction.  */
>>> +  if (cfun->machine->naked_p)
>>> +    {
>>> +      /* Generate the return instruction by using the
>>> +         unspec_func_return pattern.  Make sure this instruction is
>>> +         after gen_blockage ().  NOTE that $lp will become 'live'
>>> +         after this instruction has been emitted.  */
>>> +      emit_insn (gen_unspec_func_return ());
>>> +      return;
>>> +    }
>>
>> FWIW, this is different from AVR (for one), which explicitly doesn't
>> emit a return instruction.  One of the uses for AVR is to chain code
>> together into an .init-like section.
>>
>
> Yes.
> Personally I prefer not to generate a 'ret' instruction for a naked
> function.  Programmers have the responsibility to take on all the
> jobs and do everything they need with C code, attributes, and inline
> asm.
>
> But some of our existing customers also expect that a naked function
> is supposed to imply "no push/pop" only.  Eventually we took their
> requirement into account, having 'ret' produced by the compiler
> automatically.

That's certainly OK with me FWIW.  I don't think there's any realistic
hope of having the attribute behave the same way everywhere, because
it's already been used in incompatible ways on different targets.

>> Redundant brackets around the ==.  There are a few other cases in the
>> patch too, although most of it seems OK.
>>
>
> Sorry, I didn't see such a requirement in the GNU coding standards or
> the GCC coding conventions.  If it is not defined in the coding
> conventions, may I keep these redundant brackets for readability? :)

Now you mention it, I can't find it written down anywhere either.
Maybe it's one of those unwritten things that I've been taking for
granted, or maybe I just imagined it :-) So I can't push back on this.

> +  /* We need to provide a customized rtx which contains
> +     necessary information for data analysis,
> +     so we create a parallel rtx like this:
> +     (parallel [(unspec [(reg: Rb)
> +                         (reg: Re)
> +                         (const_int En4)] UNSPEC_STACK_PUSH_MULTIPLE)
> +                (use (reg:SI Rb))
> +                (use (reg:SI Re))
> +                (set (mem (plus (reg:SI SP_REGNUM) (const_int -4)))
> +                     (reg:SI LP_REGNUM))
> +                (set (mem (plus (reg:SI SP_REGNUM) (const_int -8)))
> +                     (reg:SI GP_REGNUM))
> +                (set (mem (plus (reg:SI SP_REGNUM) (const_int -12)))
> +                     (reg:SI FP_REGNUM))
> +                (set (mem (plus (reg:SI SP_REGNUM) (const_int -16)))
> +                     (reg:SI Re))
> +                ...
> +                (set (mem (plus (reg:SI SP_REGNUM) (const_int -28)))
> +                     (reg:SI Rb+1))
> +                (set (mem (plus (reg:SI SP_REGNUM) (const_int -32)))
> +                     (reg:SI Rb))
> +                (set (reg:SI SP_REGNUM)
> +                     (plus (reg:SI SP_REGNUM) (const_int -32)))])  */

The "use"s here should be redundant, since the "set"s say that the
registers are used.
Same comment for:

> +  /* We need to provide a customized rtx which contains
> +     necessary information for data analysis,
> +     so we create a parallel rtx like this:
> +     (NOTE: We have to clearly claim that we are using/clobbering
> +            Rb and Re, otherwise they may be renamed by optimization.)
> +     (parallel [(unspec [(reg: Rb)
> +                         (reg: Re)
> +                         (const_int En4)] UNSPEC_STACK_POP_MULTIPLE)
> +                (use (reg:SI Rb))
> +                (use (reg:SI Re))
> +                (clobber (reg:SI Rb))
> +                (clobber (reg:SI Re))
> +                (set (reg:SI Rb)
> +                     (mem (plus (reg:SI SP_REGNUM) (const_int 0))))
> +                (set (reg:SI Rb+1)
> +                     (mem (plus (reg:SI SP_REGNUM) (const_int 4))))
> +                ...
> +                (set (reg:SI Re)
> +                     (mem (plus (reg:SI SP_REGNUM) (const_int 16))))
> +                (set (r
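For readers unfamiliar with the attribute under discussion, a minimal
sketch of the naked-function model described above (the behaviour noted
in the comment follows the mail; the body is a placeholder):

    /* Under the nds32 behaviour described above, the compiler emits no
       push/pop prologue/epilogue for this function but still appends
       the final 'ret' automatically; everything else is up to the
       programmer's inline asm.  */
    void __attribute__ ((naked))
    bare_handler (void)
    {
      __asm__ ("nop");  /* placeholder; real handlers manage registers
                           themselves here */
    }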
Document x86-tune options
Hi,
this patch adds documentation to the individual options and also adds
vertical whitespace.

Committed as obvious,
Honza

	* config/i386/x86-tune.def: Add documentation for each of the
	options; add whitespace.

Index: config/i386/x86-tune.def
===================================================================
--- config/i386/x86-tune.def	(revision 203012)
+++ config/i386/x86-tune.def	(working copy)
@@ -24,207 +24,378 @@ see the files COPYING3 and COPYING.RUNTI
    work well with PPro base chips.  */
 DEF_TUNE (X86_TUNE_USE_LEAVE, "use_leave", m_386 | m_CORE_ALL
          | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC)
+
+/* X86_TUNE_PUSH_MEMORY: Enable generation of "push mem" instructions.
+   Some chips, like the 486 and Pentium, have problems with these
+   sequences.  */
 DEF_TUNE (X86_TUNE_PUSH_MEMORY, "push_memory", m_386 | m_P4_NOCONA
          | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC)
+
+/* X86_TUNE_ZERO_EXTEND_WITH_AND: Use AND instruction instead
+   of movzbl/movzwl.  */
 DEF_TUNE (X86_TUNE_ZERO_EXTEND_WITH_AND, "zero_extend_with_and",
          m_486 | m_PENT)
+
+/* X86_TUNE_UNROLL_STRLEN: Produce a (quite lame) unrolled sequence for
+   inline strlen.  This affects only -minline-all-stringops mode.  By
+   default we always dispatch to a library, since our internal strlen
+   is bad.  */
 DEF_TUNE (X86_TUNE_UNROLL_STRLEN, "unroll_strlen", m_486 | m_PENT
          | m_PPRO | m_ATOM | m_SLM | m_CORE_ALL | m_K6 | m_AMD_MULTIPLE
          | m_GENERIC)
+
 /* X86_TUNE_BRANCH_PREDICTION_HINTS: Branch hints were put in P4 based
    on simulation result.  But after P4 was made, no performance benefit
    was observed with branch hints.  It also increases the code size.
    As a result, icc never generates branch hints.  */
 DEF_TUNE (X86_TUNE_BRANCH_PREDICTION_HINTS, "branch_prediction_hints", 0)
+
+/* X86_TUNE_DOUBLE_WITH_ADD: Use add instead of sal to double a value
+   in an integer register.  */
 DEF_TUNE (X86_TUNE_DOUBLE_WITH_ADD, "double_with_add", ~m_386)
+
+/* X86_TUNE_USE_SAHF: Controls use of SAHF.  */
 DEF_TUNE (X86_TUNE_USE_SAHF, "use_sahf", m_PPRO | m_P4_NOCONA
          | m_CORE_ALL | m_ATOM | m_SLM | m_K6_GEODE | m_K8 | m_AMDFAM10
          | m_BDVER | m_BTVER | m_GENERIC)
+
 /* X86_TUNE_MOVX: Enable to zero extend integer registers to avoid
    partial dependencies.  */
 DEF_TUNE (X86_TUNE_MOVX, "movx", m_PPRO | m_P4_NOCONA | m_CORE_ALL
          | m_ATOM | m_SLM | m_GEODE | m_AMD_MULTIPLE | m_GENERIC)
-/* X86_TUNE_PARTIAL_REG_STALL: We probably ought to watch for partial
-   register stalls on Generic32 compilation setting as well.  However
-   in current implementation the partial register stalls are not eliminated
+
+/* X86_TUNE_PARTIAL_REG_STALL: The Pentium Pro, unlike later chips,
+   handled use of partial registers by renaming.  This improved
+   performance of 16bit code where the upper halves of registers are
+   not used.  It also leads to a penalty whenever a 16bit store is
+   followed by a 32bit use.  This flag disables production of such
+   sequences in common cases.  See also X86_TUNE_HIMODE_MATH.
+
+   In the current implementation the partial register stalls are not
+   eliminated
    very well - they can be introduced via subregs synthesized by combine
    and can happen in caller/callee saving sequences.  */
 DEF_TUNE (X86_TUNE_PARTIAL_REG_STALL, "partial_reg_stall", m_PPRO)
+
+/* X86_TUNE_PARTIAL_FLAG_REG_STALL: this flag disables use of flags
+   set by instructions affecting just some flags (in particular shifts).
+   This is because Core2 resolves dependencies on the whole flags
+   register and such sequences introduce a false dependency on the
+   previous instruction setting full flags.
+
+   The flag does not affect generation of INC and DEC, which is
+   controlled by X86_TUNE_USE_INCDEC.
+
+   This flag may be dropped from generic once core2-corei5 machines are
+   rare enough.  */
 DEF_TUNE (X86_TUNE_PARTIAL_FLAG_REG_STALL, "partial_flag_reg_stall",
          m_CORE2 | m_GENERIC)
+
 /* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
-   * on 16-bit immediate moves into memory on Core2 and Corei7.  */
+   on 16-bit immediate moves into memory on Core2 and Corei7.  */
 DEF_TUNE (X86_TUNE_LCP_STALL, "lcp_stall", m_CORE_ALL | m_GENERIC)
+
+/* X86_TUNE_USE_HIMODE_FIOP: Enables use of x87 instructions with 16bit
+   integer operand.
+   FIXME: Why is this disabled for modern chips?  */
 DEF_TUNE (X86_TUNE_USE_HIMODE_FIOP, "use_himode_fiop",
          m_386 | m_486 | m_K6_GEODE)
+
+/* X86_TUNE_USE_SIMODE_FIOP: Enables use of x87 instructions with 32bit
+   integer operand.  */
 DEF_TUNE (X86_TUNE_USE_SIMODE_FIOP, "use_simode_fiop",
          ~(m_PENT | m_PPRO | m_CORE_ALL | m_ATOM | m_SLM | m_AMD_MULTIPLE
           | m_GENERIC))
+
+/* X86_TUNE_USE_MOV0: Use "mov $0, reg" instead of "xor reg, reg" to
+   clear an integer register.  */
 DEF_TUNE (X86_TUNE_USE_MOV0, "use_mov0", m_K6)
+
+/* X86_TUNE_USE_CLTD: Controls use of CLTD and CTQO
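For context, a sketch of how such DEF_TUNE entries are consulted from
the i386 back end (the TARGET_* wrapper spelling follows the
conventional pattern; treat the exact names as an assumption, not a
quote from i386.h):

    /* Each DEF_TUNE entry becomes a bit in ix86_tune_features[],
       normally wrapped in a TARGET_* macro.  */
    #define TARGET_USE_LEAVE ix86_tune_features[X86_TUNE_USE_LEAVE]

    static void
    epilogue_sketch (void)
    {
      if (TARGET_USE_LEAVE)
        ;  /* emit "leave" instead of "movl %ebp, %esp; popl %ebp" */
    }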
Re: Add value range support into memcpy/memset expansion
> Hi Jan,
>
>> I also think the misaligned move trick can/should be performed by
>> move_by_pieces and we ought to consider sane use of SSE - the current
>> vector_loop with an unrolling factor of 4 seems a bit extreme.  At
>> least Bulldozer is happy with 2, and I would expect SSE moves to be
>> especially useful for moving blocks with known size where they are
>> not used at all.
>>
>> Currently I disabled misaligned move prologues/epilogues for
>> Michael's vector loop path since they end up longer than the
>> traditional code (that uses a loop for the epilogue)
> Prologues could use this technique even with vector_loop, as they
> actually don't have a loop.

Where the new prologues lose is the fact that we need to handle all
sizes smaller than SIZE_NEEDED.  This is 64 bytes, which leads to a
variant for 32..64, 16..32, 8..16, 4..8 and the tail.  It is quite a
lot of code.  When the block is known to be greater than 64, this is
also a win, but my current patch does not fine-tune this yet.

Similarly, misaligned moves are a win when the size is known, alignment
is not performed, and the normal fixed-size epilogue needs more than
one move, or when the alignment is known but the offset is non-zero.

It will need a bit of tweaking to handle all the paths well - it is the
usual problem with the stringops, they get way too complex as the
number of factors increases.

It is why I think we may consider a vector loop with fewer than 4
unrollings.  The AMD optimization manual recommends two for
Bulldozer...  Is there a difference between 4 and 2 for Atom?

To be honest, I am not quite sure where the constant of 4 comes from.
I think I introduced it long ago for K8, where it apparently got some
extra % of performance.  It is used for larger blocks only for PPro.
AMD chips prefer it for small blocks, apparently because they prefer a
loop-less sequence.

> As for epilogues - have you tried to use misaligned vector moves
> (movdqu)?  It looks to me that we need approximately the same number
> of instructions in vector-loop and in usual-loop epilogues, if we use
> vector instructions in the vector-loop epilogue.

Yes, the code is in place for this.  You can just remove the check for
size_needed being smaller than 32 and it will produce the movdqu
sequence for you (I tested it on the vector loop testcase in the
testsuite).  The code will also use SSE for unrolled_loop prologue
expansion, at least for memcpy (for memset it does not have a broadcast
value, so it should skip it).

>> Comments are welcome.
> BTW, maybe we could generalize expand_small_movmem a bit and make a
> separate expanding strategy out of it?  It would expand a memmove with
> no loop (and probably no alignment prologue) - just with the biggest
> available moves.  Also, a cost model could be added here to make
> decisions on when we actually want to align the moves.  Here is a
> couple of examples of that:
>
> memcpy (a, b, 32);  // alignment is unknown
> will expand to
>     movdqu a, %xmm0
>     movdqu a+16, %xmm1
>     movdqu %xmm0, b
>     movdqu %xmm1, b+16
> memcpy (a, b, 32);  // alignment is known and equals 64 bits
> will expand to
> a)
>     movdqu a, %xmm0
>     movdqu a+16, %xmm1
>     movdqu %xmm0, b
>     movdqu %xmm1, b+16
> or b)
>     movq   a, %xmm0
>     movdqa a+8, %xmm1
>     movq   a+24, %xmm2
>     movq   %xmm0, b
>     movdqa %xmm1, b+8
>     movq   %xmm2, b+24
>
> We would compute the total cost of both variants and choose the best -
> for the computation we need just the cost of aligned and misaligned
> moves.
>
> This strategy is actually pretty similar to move_by_pieces, but as it
> has much more target-specific information, it would be able to produce
> much more effective code.

I was actually thinking more along the lines of teaching move_by_pieces
to do the tricks.  It seems there is not that much x86-specific
knowledge in here, and other architectures will also benefit from it.
We can also enable it when the value range is close enough.  I plan to
look into it today or tomorrow - revisit your old patch to
move_by_pieces and see how much extra API I need to get move_by_pieces
to do what expand_small_movmem does.

> And one more question - in the work on vector_loop for memset I tried
> to merge many of the movmem and setmem routines (e.g. instead of
> expand_movmem_prologue and expand_setmem_prologue I made a single
> routine expand_movmem_or_setmem_prologue).  What do you think, is it a
> good idea?  It reduces code size in i386.c, but slightly complicates
> the code.  I'll send a patch shortly, as soon as the testing
> completes.

I would like to see it.  I am not too thrilled by the duplication.  My
original motivation for that was to keep the number of code paths
through the expanders under control.  We already have many of them (and
it is easy to get wrong code), as different variants of
prologues/epilogues and main loops are not exactly the same and thus
the code is not as modular as I would like.  I am not sure if adding
differences between memset and memmove is not going to
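A plain-C sketch of the loop-less "biggest available moves" idea
discussed above (illustrative only: the real expanders emit RTL and
would use 16-byte SSE moves rather than the 8-byte scalar moves shown):

    #include <stdint.h>
    #include <string.h>
    #include <stddef.h>

    /* Copy n bytes, 8 <= n <= 16, with two possibly-overlapping 8-byte
       moves and no loop: the tail move overlaps the head when n < 16.
       This is what makes short fixed-size prologues/epilogues cheap
       without aligning anything first.  */
    static void
    copy_8_to_16 (char *dst, const char *src, size_t n)
    {
      uint64_t head, tail;
      memcpy (&head, src, 8);          /* first 8 bytes */
      memcpy (&tail, src + n - 8, 8);  /* last 8 bytes, may overlap head */
      memcpy (dst, &head, 8);
      memcpy (dst + n - 8, &tail, 8);
    }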
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
On Fri, Sep 27, 2013 at 7:15 AM, Teresa Johnson wrote:
> On Thu, Sep 26, 2013 at 3:02 PM, Jan Hubicka wrote:
>>> Why not just have probably_never_executed_bb_p simply return false
>>> if bb->frequency is non-zero (right now it does the opposite -
>>
>> We want to have frequencies guessed for functions that were not
>> trained in the profiling run (that was the patch I posted earlier
>> that I think did not go in yet).
>
> Right, but for splitting and bb layout purposes, for these statically
> guessed unprofiled functions we in fact don't want to do any splitting
> or treat the bbs as never executed (which shouldn't be a change from
> the status quo, since all the bbs in these functions are currently
> 0 weight; it's only when we inline in the case of comdats that they
> appear colder than the surrounding code, but in fact we don't want
> this).
>
> The only other caller of probably_never_executed_bb_p is
> compute_function_frequency, but in the case of statically guessed
> functions they will have profile_status != PROFILE_READ and won't
> invoke probably_never_executed_bb_p.  But re-reading our most recent
> exchange on the comdat profile issue, it sounds like you were
> suggesting guessing profiles for all 0-weight functions early, then
> dropping them from PROFILE_READ to PROFILE_GUESSED only once we
> determine in ipa-inline that there is a potentially non-zero call path
> to them.  In that case, with the change I describe above to
> probably_never_executed_bb_p, the 0-weight functions with 0 calls to
> them will incorrectly be marked as NODE_FREQUENCY_NORMAL, which would
> be bad as they would not be size-optimized or moved into the cold
> section.
>
> So it seems like we want different handling of these guessed
> frequencies in compute_function_frequency and bb-reorder.c.  Actually
> I think we can handle this by checking if the function entry block has
> a 0 count.  If so, then we just look at the bb counts and not the
> frequencies for determining bb hotness, as the frequencies would
> presumably have been statically guessed.  This will ensure that the
> cgraph node continues to be marked unlikely and size-optimized.  If
> the function entry block has a non-zero count, then we look at both
> the bb count and the bb frequency - if they are both zero then the bb
> is probably never executed, but if either is non-zero then we should
> treat the block as possibly executed (which will come into play for
> splitting and bb layout).

Here is a patch to handle the profile insanities conservatively during
splitting.  It also simplifies the probably_never_executed* code to
treat missing counts within a profiled function differently
(conservatively, based on frequency) from the case where the whole
function has a guessed profile.  That way, once a patch to guess
profiles for non-executed functions is added, they will continue to
have their nodes marked as unlikely.  I also pulled the guts of the
probably_never_executed_bb_p code out to a helper that is then invoked
by both the bb and edge versions of this function, so they stay in
sync.

This gets rid of a number of the failures with splitting + the linker
script to make the unlikely section non-executable.  I have a patch to
fix some jump threading insanities that I will send separately.

Bootstrapped and regression tested on x86_64.  Also tested with an LTO
profiledbootstrap.  Ok for trunk?

Thanks,
Teresa

2013-09-29  Teresa Johnson

	* bb-reorder.c (find_rarely_executed_basic_blocks_and_crossing_edges):
	Treat profile insanities conservatively.
	* predict.c (probably_never_executed): New function.
	Treat profile insanities conservatively.
	(probably_never_executed_bb_p): Invoke probably_never_executed.
	(probably_never_executed_edge_p): Invoke probably_never_executed.

Index: bb-reorder.c
===================================================================
--- bb-reorder.c	(revision 202947)
+++ bb-reorder.c	(working copy)
@@ -1564,8 +1564,25 @@ find_rarely_executed_basic_blocks_and_crossing_edg
   /* Mark which partition (hot/cold) each basic block belongs in.  */
   FOR_EACH_BB (bb)
     {
+      bool cold_bb = false;
       if (probably_never_executed_bb_p (cfun, bb))
 	{
+	  /* Handle profile insanities created by upstream optimizations
+	     by also checking the incoming edge weights.  If there is a
+	     non-cold incoming edge, conservatively prevent this block
+	     from being split into the cold section.  */
+	  cold_bb = true;
+	  FOR_EACH_EDGE (e, ei, bb->preds)
+	    {
+	      if (!probably_never_executed_edge_p (cfun, e))
+		{
+		  cold_bb = false;
+		  break;
+		}
+	    }
+	}
+      if (cold_bb)
+	{
 	  BB_SET_PARTITION (bb, BB_COLD_PARTITION);
 	  cold_bb_count++;
 	}
Index: predict.c
===================================================================
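The predict.c hunk is cut off above; purely as a reading aid, here is a
hedged sketch of the helper's intended shape, reconstructed from the
description in the mail rather than from the committed patch:

    /* Sketch only - NOT the committed code.  Shared guts of
       probably_never_executed_bb_p / probably_never_executed_edge_p:
       within a real (read) profile, a zero count alone is not trusted;
       the guessed frequency must agree before the block is called
       cold.  */
    static bool
    probably_never_executed_sketch (struct function *fun,
                                    gcov_type count, int frequency)
    {
      if (profile_status_for_function (fun) == PROFILE_READ)
        {
          if (count != 0)
            return false;
          /* Zero count: believe it only if the frequency is also cold,
             treating profile insanities conservatively.  */
          return frequency == 0;
        }
      /* Guessed profiles are handled by the callers.  */
      return false;
    }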
[PATCH, doc]: Fix "@anchor should not appear in @heading" warning
Hello! Rather trivial fix - put @anchor before @heading, as texi manual suggests. 2013-09-29 Uros Bizjak * doc/install.texi (Host/target specific installation notes for GCC): Put @anchor before @heading. Tested by "make doc" with texinfo 5.1 on Fedora 19. OK for mainline? Uros. Index: doc/install.texi === --- doc/install.texi(revision 203011) +++ doc/install.texi(working copy) @@ -3173,8 +3173,8 @@ @end html -@heading @anchor{alpha-x-x}alpha*-*-* - +@anchor{alpha-x-x} +@heading alpha*-*-* This section contains general configuration information for all alpha-based platforms using ELF (in particular, ignore this section for DEC OSF/1, Digital UNIX and Tru64 UNIX)@. In addition to reading this @@ -3188,7 +3188,8 @@ @html @end html -@heading @anchor{alpha-dec-osf51}alpha*-dec-osf5.1 +@anchor{alpha-dec-osf51} +@heading alpha*-dec-osf5.1 Systems using processors that implement the DEC Alpha architecture and are running the DEC/Compaq/HP Unix (DEC OSF/1, Digital UNIX, or Compaq/HP Tru64 UNIX) operating system, for example the DEC Alpha AXP systems. @@ -3201,14 +3202,15 @@ @html @end html -@heading @anchor{amd64-x-solaris210}amd64-*-solaris2.1[0-9]* - +@anchor{amd64-x-solaris210} +@heading amd64-*-solaris2.1[0-9]* This is a synonym for @samp{x86_64-*-solaris2.1[0-9]*}. @html @end html -@heading @anchor{arm-x-eabi}arm-*-eabi +@anchor{arm-x-eabi} +@heading arm-*-eabi ARM-family processors. Subtargets that use the ELF object format require GNU binutils 2.13 or newer. Such subtargets include: @code{arm-*-netbsdelf}, @code{arm-*-*linux-*} @@ -3217,8 +3219,8 @@ @html @end html -@heading @anchor{avr}avr - +@anchor{avr} +@heading avr ATMEL AVR-family micro controllers. These are used in embedded applications. There are no standard Unix configurations. @ifnothtml @@ -3254,8 +3256,8 @@ @html @end html -@heading @anchor{bfin}Blackfin - +@anchor{bfin} +@heading Blackfin The Blackfin processor, an Analog Devices DSP. @ifnothtml @xref{Blackfin Options,, Blackfin Options, gcc, Using the GNU Compiler @@ -3271,11 +3273,11 @@ @html @end html -@heading @anchor{cr16}CR16 +@anchor{cr16} +@heading CR16 +The CR16 CompactRISC architecture is a 16-bit architecture. This +architecture is used in embedded applications. -The CR16 CompactRISC architecture is a 16-bit architecture. This architecture is -used in embedded applications. - @ifnothtml @xref{CR16 Options,, CR16 Options, gcc, Using and Porting the GNU Compiler Collection (GCC)}, @@ -3288,14 +3290,14 @@ Use @samp{configure --target=cr16-elf --enable-languages=c,c++} to configure GCC@ for building a CR16 elf cross-compiler. -Use @samp{configure --target=cr16-uclinux --enable-languages=c,c++} to configure -GCC@ for building a CR16 uclinux cross-compiler. +Use @samp{configure --target=cr16-uclinux --enable-languages=c,c++} to +configure GCC@ for building a CR16 uclinux cross-compiler. @html @end html -@heading @anchor{cris}CRIS - +@anchor{cris} +@heading CRIS CRIS is the CPU architecture in Axis Communications ETRAX system-on-a-chip series. These are used in embedded applications. @@ -3329,8 +3331,8 @@ @html @end html -@heading @anchor{dos}DOS - +@anchor{dos} +@heading DOS Please have a look at the @uref{binaries.html,,binaries page}. You cannot install GCC by itself on MSDOS; it will not compile under @@ -3341,15 +3343,16 @@ @html @end html -@heading @anchor{epiphany-x-elf}epiphany-*-elf +@anchor{epiphany-x-elf} +@heading epiphany-*-elf Adapteva Epiphany. This configuration is intended for embedded systems. 
@html @end html -@heading @anchor{x-x-freebsd}*-*-freebsd* - +@anchor{x-x-freebsd} +@heading *-*-freebsd* Support for FreeBSD 1 was discontinued in GCC 3.2. Support for FreeBSD 2 (and any mutant a.out variants of FreeBSD 3) was discontinued in GCC 4.0. @@ -3386,7 +3389,8 @@ @html @end html -@heading @anchor{h8300-hms}h8300-hms +@anchor{h8300-hms} +@heading h8300-hms Renesas H8/300 series of processors. Please have a look at the @uref{binaries.html,,binaries page}. @@ -3399,7 +3403,8 @@ @html @end html -@heading @anchor{hppa-hp-hpux}hppa*-hp-hpux* +@anchor{hppa-hp-hpux} +@heading hppa*-hp-hpux* Support for HP-UX version 9 and older was discontinued in GCC 3.4. We require using gas/binutils on all hppa platforms. Version 2.19 or @@ -3451,8 +3456,8 @@ @html @end html -@heading @anchor{hppa-hp-hpux10}hppa*-hp-hpux10 - +@anchor{hppa-hp-hpux10} +@heading hppa*-hp-hpux10 For hpux10.20, we @emph{highly} recommend you pick up the latest sed patch @code{PHCO_19798} from HP@. @@ -3464,8 +3469,8 @@ @html @end html -@heading @anchor{hppa-hp-hpux11}hppa*-hp-hpux11 - +@anchor{hppa-hp-hpux11} +@heading hppa*-hp-hpux11 GCC 3.0 and up support HP-UX 11. GCC 2.95.x is not supported and cannot be used to compile GCC 3.0 and up. @@ -3575,8 +3580,8 @@ @html @end html -@heading @anchor
Re: [Patch, Darwin] update t-* and x-* fragments after switch to auto-deps.
On Sep 28, 2013, at 2:21 AM, Iain Sandoe wrote: > This updates the Darwin port {t,x}-* fragments after the switch to auto-deps > (thanks Tom!). > OK? Ok.
Re: [Patch, Darwin] improve cross-compiles.
On Sep 28, 2013, at 2:35 AM, Iain Sandoe wrote:
> I've been experimenting with the idea of building native crosses on my
> most capable machine, for the many variants of darwin we now have, and
> then using the older/slower hardware for test only.
>
> This has uncovered a few issues with cross/native cross flags etc.
> (i) that PIE is disabled for gcc exes on Darwin hosts since it is
> incompatible with the current PCH implementation.

Yup.

> (ii) that -mdynamic-no-pic is used for m32 hosts.

Right.

> OK for trunk?

Ok.
Re: [Patch, Darwin/ppc] Fix altivec dwarf reg sizes.
On Sep 28, 2013, at 3:03 AM, Iain Sandoe wrote: > Anyway, after considerable debate about this and several approaches, here is > a patch that just ensures we set the altivec register size to its correct > value. > OK for trunk and open branches? Ok.
Re: [Patch, Darwin/PPC] fix PR10901
On Sep 28, 2013, at 3:26 AM, Iain Sandoe wrote: > We currently generate wrong code for non-local gotos which breaks, amongst > other things, nested functions. > I fixed this a while ago for x86 Darwin and here is a version to fix it on > PPC. > > OK for trunk? (and open branches?) Ok.
[SPARC] Add peephole for memory barriers
This is a bit gross, but it prevents consecutive memory barriers from
being generated, for example on accesses to atomic objects in Ada.

Tested on SPARC/Solaris, applied on the mainline.

2013-09-29  Eric Botcazou

	* config/sparc/sync.md: Add peephole for consecutive memory barriers.

-- 
Eric Botcazou

Index: config/sparc/sync.md
===================================================================
--- config/sparc/sync.md	(revision 202912)
+++ config/sparc/sync.md	(working copy)
@@ -93,6 +93,18 @@ (define_insn "*membar"
   "membar\t%1"
   [(set_attr "type" "multi")])
 
+(define_peephole2
+  [(set (match_operand:BLK 0 "" "")
+	(unspec:BLK [(match_dup 0) (match_operand:SI 1 "const_int_operand")]
+		    UNSPEC_MEMBAR))
+   (set (match_operand:BLK 2 "" "")
+	(unspec:BLK [(match_dup 2) (match_operand:SI 3 "const_int_operand")]
+		    UNSPEC_MEMBAR))]
+  ""
+  [(set (match_operand:BLK 0 "" "")
+	(unspec:BLK [(match_dup 0) (match_dup 1)] UNSPEC_MEMBAR))]
+{ operands[1] = GEN_INT (UINTVAL (operands[1]) | UINTVAL (operands[3])); })
+
 (define_expand "atomic_load"
   [(match_operand:I 0 "register_operand" "")
    (match_operand:I 1 "memory_operand" "")
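An illustration of the kind of source that can produce back-to-back
barriers (shown with GCC's __atomic builtins for brevity - the report
concerned Ada atomic objects - and the exact barrier placement is
target- and memory-model-dependent):

    extern int x, y;

    void
    store_then_load (void)
    {
      /* On SPARC the trailing membar of the first access and the
         leading membar of the second can end up adjacent; the new
         peephole merges them into a single membar whose mask is the
         OR of the two.  */
      __atomic_store_n (&x, 1, __ATOMIC_SEQ_CST);
      (void) __atomic_load_n (&y, __ATOMIC_SEQ_CST);
    }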
Re: RFA [testsuite]: New ARC target specific tests
On Sep 28, 2013, at 5:40 AM, Joern Rennecke wrote:
> This patch adds a number of tests for ARC target specific options.
>
> I'm a bit uncertain here if I still need approval for this patch.

No, it is not required.  Target maintainers can approve the usual test
suite patches.  Everything in gcc.target/ is certainly in this set.
The typical changes to .exp files for target supports, for example, are
fine, as are the typical patches in existing test cases to declare
things like small stacks, low memory, no common support, 16-bit
porting, 32-bit porting and so on… if in doubt, ask; we endeavor to be
responsive to patches.

> On the one hand the changes are all in an area that is normally within
> the remit of a target maintainer, and the patch to add the
> gcc.target/arc directory has already been accepted.

Though, technically, it should go in as the port goes in.  :-)

I did review the patch anyway.  Ok.
[PATCH, doc]: Fix usage of @tie{} command
Hello!

2013-09-29  Uros Bizjak

	* doc/invoke.texi: Fix usage of @tie{} command.

Tested with texinfo-5.1, committed to mainline SVN.

Uros.

Index: invoke.texi
===================================================================
--- invoke.texi	(revision 203011)
+++ invoke.texi	(working copy)
@@ -12047,7 +12047,7 @@ as needed before the operation.
 
 @item
-If the device supports RAM larger than 64@tie{KiB} and the compiler
+If the device supports RAM larger than 64@tie{}KiB and the compiler
 needs to change @code{RAMPZ} to accomplish an operation, @code{RAMPZ}
 is reset to zero after the operation.
 
@@ -12057,7 +12057,7 @@ zero in case the ISR code might (implicitly) use it.
 
 @item
-RAM larger than 64@tie{KiB} is not supported by GCC for AVR targets.
+RAM larger than 64@tie{}KiB is not supported by GCC for AVR targets.
 If you use inline assembler to read from locations outside the 16-bit
 address range and change one of the @code{RAMP} registers, you must
 reset it to zero after the access.
Re: Remove algo logic duplication Round 3
Hi,

"François Dumont" wrote:
> I also get your remark about the open round bracket; I didn't know
> that round bracket was the other name for parentheses!  I also fixed
> the one you pointed out to me; I will be more careful next time.

No problem ;) For your linguistic curiosity, I often find myself using
"round bracket": I find it more precise, less vague.  Not 100% sure
about French, but in Italian we of course have the equivalent of
"parentheses", though it's generic; kids soon learn about the round,
square and curly variants.  English is probably more precise: it
usefully provides three different words - parentheses, brackets and
braces - which often, outside a technical context, should be clear
enough without further qualification.

Thanks!
Paolo
[PATCH, doc]: Fix "@tex should only appear at a line beginning" warnings
Hello!

The attached patch decorates mail addresses with @email{} to avoid
"@tex should only appear at a line beginning" warnings.  The patch also
adds @uref and an http:// prefix for a webpage.

2013-09-29  Uros Bizjak

	* doc/gcc.texi (titlepage): Use @uref and http:// prefix for
	website.  Use @email for email addresses.

Tested with "make doc" on texinfo 5.1, Fedora 19.

Uros.

Index: gcc.texi
===================================================================
--- gcc.texi	(revision 203015)
+++ gcc.texi	(working copy)
@@ -83,11 +83,11 @@ Published by:
 @multitable @columnfractions 0.5 0.5
 @item GNU Press
-@tab Website: www.gnupress.org
+@tab Website: @uref{http://www.gnupress.org}
 @item a division of the
-@tab General: @tex press@@gnu.org @end tex
+@tab General: @email{press@@gnu.org}
 @item Free Software Foundation
-@tab Orders: @tex sales@@gnu.org @end tex
+@tab Orders: @email{sales@@gnu.org}
 @item 51 Franklin Street, Fifth Floor
 @tab Tel 617-542-5942
 @item Boston, MA 02110-1301 USA
Re: [PATCH] Invalid unpoisoning of stack redzones on ARM
> Can you please be more verbose

Right, I should have been.

So as you can see from the asm log in the bug description, the prologue
writes shadow bytes corresponding to words at frame_shadow_base +
{0, 4, 8, 12, 16, 24, 28}.  The epilogue should clear those, but
instead it zeros out frame_shadow_base + {0, 4, 8, 12, 16, 40, 44},
thus causing the words at frame_shadow_base + {24, 28} to remain
poisoned and causing false Asan errors later.

The reason, as I see it, is that we change the address of shadow_mem in
asan_emit_stack_protection twice: once in asan_clear_shadow

  tmp = expand_simple_binop (Pmode, PLUS, addr, gen_int_mode (4, Pmode),
			     addr, true, OPTAB_LIB_WIDEN);
  if (tmp != addr)
    emit_move_insn (addr, tmp);

and once in asan_emit_stack_protection itself:

  if (last_size)
    {
      shadow_mem = adjust_address (shadow_mem, VOIDmode,
				   (last_offset - prev_offset)
				   >> ASAN_SHADOW_SHIFT);

This would translate into

  add r4, r4, #4

and

  add r3, r4, #24

in the asm.  The shadow_mem will thus have the next block offset added
to it _twice_ and will point to an invalid position.

My simple fix uses a temp register in asan_clear_shadow to avoid
spoiling the shadow_mem address inside the loop.  I'm not yet a GCC
guru, so I wanted some experienced person to say whether I'm doing
something completely wrong here.

BTW I forgot to mention that the Asan tests pass both on ARM and on
x86_64.

> Also, you are missing a ChangeLog entry.

Attached.

-Y

2013-09-30  Yury Gribov

	PR sanitizer/58543
	* asan.c (asan_clear_shadow): Use a new temporary for iteration.
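A sketch of the shape of the fix (the real patch is the attachment
referenced above; the copy_addr_to_reg detail is an assumption about
how the temporary would be created):

    /* In asan_clear_shadow (sketch): iterate over a scratch pseudo made
       from the shadow address instead of incrementing the address inside
       shadow_mem itself, so the caller's later
       adjust_address (shadow_mem, ...) starts from an unmodified base.  */
    rtx addr = copy_addr_to_reg (XEXP (shadow_mem, 0));
    /* ... the loop body then stores zeros and advances only the
       scratch register:  */
    tmp = expand_simple_binop (Pmode, PLUS, addr, gen_int_mode (4, Pmode),
                               addr, true, OPTAB_LIB_WIDEN);
    if (tmp != addr)
      emit_move_insn (addr, tmp);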
RE: [PATCH]Fix computation of offset in ivopt
> -----Original Message-----
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Friday, September 27, 2013 4:30 PM
> To: Bin Cheng
> Cc: GCC Patches
> Subject: Re: [PATCH]Fix computation of offset in ivopt
>
> On Fri, Sep 27, 2013 at 7:07 AM, bin.cheng wrote:
>>
>>     case INTEGER_CST:
>>       //...
>>       *offset = int_cst_value (expr);
>>
>> change to
>>
>>     case INTEGER_CST:
>>       //...
>>       *offset = sext_hwi (int_cst_value (expr), type);
>>
>> and
>>
>>     case MULT_EXPR:
>>       //...
>>       *offset = sext_hwi (int_cst_value (expr), type);
>>
>> to
>>
>>     case MULT_EXPR:
>>       //...
>>       HOST_WIDE_INT xxx = (HOST_WIDE_INT)off0 * int_cst_value (op1);
>>       *offset = sext_hwi (xxx, type);
>>
>> Any comments?
>
> The issue is of course that we end up converting offsets to sizetype
> at some point, which makes them all appear unsigned.  The fix for this
> is to simply interpret them as signed ... but it's really a mess ;)

Hi, this is the updated patch, which calculates a signed offset in
strip_offset_1 and then sign-extends it in strip_offset.

Bootstrapped and tested on x86_64/x86/arm.  Is it OK?

Thanks.
bin

2013-09-30  Bin Cheng

	* tree-ssa-loop-ivopts.c (strip_offset_1): Change parameter type.
	Count DECL_FIELD_BIT_OFFSET when computing offset for COMPONENT_REF.
	(strip_offset): Sign extend before return.

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 202599)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -2037,12 +2037,12 @@ find_interesting_uses (struct ivopts_data *data)
 
 static tree
 strip_offset_1 (tree expr, bool inside_addr, bool top_compref,
-		unsigned HOST_WIDE_INT *offset)
+		HOST_WIDE_INT *offset)
 {
   tree op0 = NULL_TREE, op1 = NULL_TREE, tmp, step;
   enum tree_code code;
   tree type, orig_type = TREE_TYPE (expr);
-  unsigned HOST_WIDE_INT off0, off1, st;
+  HOST_WIDE_INT off0, off1, st;
   tree orig_expr = expr;
 
   STRIP_NOPS (expr);
@@ -2133,19 +2133,26 @@ strip_offset_1 (tree expr, bool inside_addr, bool
       break;
 
     case COMPONENT_REF:
-      if (!inside_addr)
-	return orig_expr;
+      {
+	tree field;
+	HOST_WIDE_INT boffset = 0;
 
-      tmp = component_ref_field_offset (expr);
-      if (top_compref
-	  && cst_and_fits_in_hwi (tmp))
-	{
-	  /* Strip the component reference completely.  */
-	  op0 = TREE_OPERAND (expr, 0);
-	  op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0);
-	  *offset = off0 + int_cst_value (tmp);
-	  return op0;
-	}
+	if (!inside_addr)
+	  return orig_expr;
+
+	tmp = component_ref_field_offset (expr);
+	field = TREE_OPERAND (expr, 1);
+	if (top_compref
+	    && cst_and_fits_in_hwi (tmp)
+	    && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field)))
+	  {
+	    /* Strip the component reference completely.  */
+	    op0 = TREE_OPERAND (expr, 0);
+	    op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0);
+	    boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field));
+	    *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT;
+	    return op0;
+	  }
+      }
       break;
 
     case ADDR_EXPR:
@@ -2196,7 +2203,16 @@ strip_offset_1 (tree expr, bool inside_addr, bool
 static tree
 strip_offset (tree expr, unsigned HOST_WIDE_INT *offset)
 {
-  return strip_offset_1 (expr, false, false, offset);
+  HOST_WIDE_INT off;
+  tree core = strip_offset_1 (expr, false, false, &off);
+
+  /* Sign extend off if expr is in a type which has lower precision
+     than HOST_WIDE_INT.  */
+  if (TYPE_PRECISION (TREE_TYPE (expr)) <= HOST_BITS_PER_WIDE_INT)
+    off = sext_hwi (off, TYPE_PRECISION (TREE_TYPE (expr)));
+
+  *offset = off;
+  return core;
 }

 /* Returns variant of TYPE that can be used as base for different uses.
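A tiny illustration (made-up values) of what the sext_hwi call in the
new strip_offset buys on a 64-bit host:

    /* Suppose the expression's type is 32 bits wide and the collected
       offset has the bit pattern 0xfffffffc, i.e. -4 in that type.
       Interpreted as an unsigned HOST_WIDE_INT it would be the huge
       positive offset 0xfffffffc; sign extension restores the intended
       value.  */
    HOST_WIDE_INT off = 0xfffffffc;
    off = sext_hwi (off, 32);  /* now -4 */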
RE: [PATCH]Fix computation of offset in ivopt
> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org
> [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Oleg Endo
> Sent: Wednesday, September 25, 2013 1:41 AM
> To: Richard Biener
> Cc: Bin Cheng; GCC Patches
> Subject: Re: [PATCH]Fix computation of offset in ivopt
>
> After reading "overflow" and "ivopt", I was wondering whether
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55190 is somehow related.

No, this patch is irrelevant to that.  But I do have some thoughts on
PR 55190 and will follow up in the bug entry.

Thanks.
bin