Re: [PATCH] Add an intermediate coverage format for gcov
Sorry about the badly formatted mail. Here is another version with a different mailer. -Sharad This patch was earlier submitted to google/main, but I propose it for the trunk as well. This patch adds an intermediate coverage format (enabled via 'gcov -i'). This is a compact format as it does not require source files. The new option ('gcov -i') outputs .gcov files in an intermediate text format that can be postprocessed by lcov or other applications. It will output a single *.gcov file per *.gcda file. No source code is required. The format of the intermediate .gcov file is plain text with one entry per line SF:source_file_name FN:line_number,function_name FNDA:execution_count,function_name BA:line_num,branch_coverage_type DA:line number,execution_count Where the branch_coverage_type is 0 (Branch not executed) 1 (Branch executed, but not taken) 2 (Branch executed and taken) There can be multiple SF entries in an intermediate gcov file. All entries following SF pertain to that source file until the next SF entry. A concrete example looks like this: SF:array.c FN:4,sum FNDA:0,sum FN:13,main FNDA:1,main DA:4,2 BA:8,2 DA:7,1 I have bootstrapped and tested this patch on x86_64-linux. No new test failures were observed. Okay for trunk? Thanks, Sharad 2011-10-04 Sharad Singhai * doc/gcov.texi: Document gcov intermediate format. * gcov.c (print_usage): Handle new option. (process_args): Handle new option. (get_gcov_file_intermediate_name): New function. (output_intermediate_file): New function. (generate_results): Handle new option. * testsuite/lib/gcov.exp: Handle intermediate format. * testsuite/g++.dg/gcov/gcov-8.C: New testcase. Index: doc/gcov.texi === --- doc/gcov.texi (revision 179475) +++ doc/gcov.texi (working copy) @@ -130,6 +130,7 @@ gcov [@option{-v}|@option{--version}] [@ [@option{-f}|@option{--function-summaries}] [@option{-o}|@option{--object-directory} @var{directory|file}] @var{sourcefiles} [@option{-u}|@option{--unconditional-branches}] + [@option{-i}|@option{--intermediate-format}] [@option{-d}|@option{--display-progress}] @c man end @c man begin SEEALSO @@ -216,6 +217,32 @@ Unconditional branches are normally not @itemx --display-progress Display the progress on the standard output. +@item -i +@itemx --intermediate-format +Output gcov file in an intermediate text format that can be used by +@command{lcov} or other applications. It will output a single *.gcov file per +*.gcda file. No source code is required. + +The format of the intermediate @file{.gcov} file is plain text with +one entry per line + +@smallexample +SF:@var{source_file_name} +FN:@var{line_number},@var{function_name} +FNDA:@var{execution_count},@var{function_name} +BA:@var{line_num},@var{branch_coverage_type} +DA:@var{line number},@var{execution_count} + +Where the @var{branch_coverage_type} is + 0 (Branch not executed) + 1 (Branch executed, but not taken) + 2 (Branch executed and taken) + +There can be multiple SF entries in an intermediate gcov file. All +entries following SF pertain to that source file until the next SF +entry. +@end smallexample + @end table @command{gcov} should be run with the current directory the same as that Index: gcov.c === --- gcov.c (revision 179475) +++ gcov.c (working copy) @@ -39,6 +39,7 @@ along with Gcov; see the file COPYING3. #include "intl.h" #include "diagnostic.h" #include "version.h" +#include "demangle.h" #include @@ -311,6 +312,9 @@ static int flag_gcov_file = 1; static int flag_display_progress = 0; +/* Output *.gcov file in intermediate format used by 'lcov'. */ +static int flag_intermediate_format = 0; + /* For included files, make the gcov output file name include the name of the input source file. For example, if x.h is included in a.c, then the output file name is a.c##x.h.gcov instead of x.h.gcov. */ @@ -436,6 +440,11 @@ print_usage (int error_p) fnotice (file, " -o, --object-directory DIR|FILE Search for object files in DIR or called FILE\n"); fnotice (file, " -p, --preserve-pathsPreserve all pathname components\n"); fnotice (file, " -u, --unconditional-branchesShow unconditional branch counts too\n"); + fnotice (file, " -i, --intermediate-format Output .gcov file in an intermediate text\n\ +format that can be used by 'lcov' or other\n\ +applications. It will output a single\n\ +.gcov file per .gcda file. No source file\n\ +is required.\n"); fnotice (file, " -d, --display-progress Display progress information\n");
[PATCH, i386]: Introduce ix86_emit_binop
Hello! No functional change. 2011-10-05 Uros Bizjak * config/i386/i386.c (ix86_emit_binop): New static function. (ix86_split_lea_for_addr): Use ix86_emit_binop to emit add and shl instructions. (x86_output_mi_thunk): Use ix86_emit_binop to emit add instructions. Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN. Uros. Index: i386.c === --- i386.c (revision 179535) +++ i386.c (working copy) @@ -16470,6 +16470,21 @@ ix86_avoid_lea_for_addr (rtx insn, rtx operands[]) return !ix86_lea_outperforms (insn, regno0, regno1, regno2, split_cost); } +/* Emit x86 binary operand CODE in mode MODE, where the first operand + matches destination. RTX includes clobber of FLAGS_REG. */ + +static void +ix86_emit_binop (enum rtx_code code, enum machine_mode mode, +rtx dst, rtx src) +{ + rtx op, clob; + + op = gen_rtx_SET (VOIDmode, dst, gen_rtx_fmt_ee (code, mode, dst, src)); + clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG)); + + emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, op, clob))); +} + /* Split lea instructions into a sequence of instructions which are executed on ALU to avoid AGU stalls. It is assumed that it is allowed to clobber flags register @@ -16482,8 +16497,7 @@ ix86_split_lea_for_addr (rtx operands[], enum mach unsigned int regno1 = INVALID_REGNUM; unsigned int regno2 = INVALID_REGNUM; struct ix86_address parts; - rtx tmp, clob; - rtvec par; + rtx tmp; int ok, adds; ok = ix86_decompose_address (operands[1], &parts); @@ -16515,14 +16529,7 @@ ix86_split_lea_for_addr (rtx operands[], enum mach gcc_assert (regno2 != regno0); for (adds = parts.scale; adds > 0; adds--) - { - tmp = gen_rtx_PLUS (mode, operands[0], parts.index); - tmp = gen_rtx_SET (VOIDmode, operands[0], tmp); - clob = gen_rtx_CLOBBER (VOIDmode, - gen_rtx_REG (CCmode, FLAGS_REG)); - par = gen_rtvec (2, tmp, clob); - emit_insn (gen_rtx_PARALLEL (VOIDmode, par)); - } + ix86_emit_binop (PLUS, mode, operands[0], parts.index); } else { @@ -16531,30 +16538,14 @@ ix86_split_lea_for_addr (rtx operands[], enum mach emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.index)); /* Use shift for scaling. */ - tmp = gen_rtx_ASHIFT (mode, operands[0], - GEN_INT (exact_log2 (parts.scale))); - tmp = gen_rtx_SET (VOIDmode, operands[0], tmp); - clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG)); - par = gen_rtvec (2, tmp, clob); - emit_insn (gen_rtx_PARALLEL (VOIDmode, par)); + ix86_emit_binop (ASHIFT, mode, operands[0], + GEN_INT (exact_log2 (parts.scale))); if (parts.base) - { - tmp = gen_rtx_PLUS (mode, operands[0], parts.base); - tmp = gen_rtx_SET (VOIDmode, operands[0], tmp); - clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG)); - par = gen_rtvec (2, tmp, clob); - emit_insn (gen_rtx_PARALLEL (VOIDmode, par)); - } + ix86_emit_binop (PLUS, mode, operands[0], parts.base); if (parts.disp && parts.disp != const0_rtx) - { - tmp = gen_rtx_PLUS (mode, operands[0], parts.disp); - tmp = gen_rtx_SET (VOIDmode, operands[0], tmp); - clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG)); - par = gen_rtvec (2, tmp, clob); - emit_insn (gen_rtx_PARALLEL (VOIDmode, par)); - } + ix86_emit_binop (PLUS, mode, operands[0], parts.disp); } } else if (!parts.base && !parts.index) @@ -16565,41 +16556,32 @@ ix86_split_lea_for_addr (rtx operands[], enum mach else { if (!parts.base) - { -if (regno0 != regno2) - emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.index)); - } + { + if (regno0 != regno2) + emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.index)); + } else if (!parts.index) - { -if (regno0 != regno1) - emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.base)); - } - else - { - if (regno0 == regno1) - tmp = gen_rtx_PLUS (mode, operands[0], parts.index); - else if (regno0 == regno2) - tmp = gen_rtx_PLUS (mode, operands[0], parts.base); - else - { + { + if (regno0 != regno1) emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.base)); - tmp = gen_rtx_PLUS (mode, operands[0], parts.index); - } + } + else + { + if (regno0 == regno1) + tmp = parts.inde
Re: [PATCH] Handle side-effects of EH optimizations of callees
On Tue, Oct 4, 2011 at 9:02 PM, Maxim Kuvyrkov wrote: > On 5/10/2011, at 1:49 AM, Richard Guenther wrote: > >> On Tue, Oct 4, 2011 at 9:17 AM, Maxim Kuvyrkov >> wrote: >>> Richard, >>> >>> The following patch fixes a CFG consistency problem. >>> >>> When early IPA optimizations (e.g., early SRA) create a version of a >>> function that no longer throws, versioning machinery deletes EH landings >>> pads in _callers_ of the function [*]. However, the EH cfg edges are not >>> updated in the callers, which eventually leads to an ICE in verify_eh_edges. >> >> It needs to update EH edges in the callers via the common idiom >> >> if (maybe_clean_or_replace_eh_stmt (stmt, stmt) >> && gimple_purge_dead_eh_edges (gimple_bb (stmt))) >> ... arrange for cfg-cleanup to be run ... >> >> where the cfg-cleanup part probably explains why it doesn't. So one option >> is to not delete the landing pads. > > There are several places in the compiler where code does not the above idiom > and calls maybe_clean[_or_replace]_eh_stmt without following it up with > cfg-cleanup. > >> Where does it do that? > > The cases that I investigated are: > - cgraphunit.c: update_call_expr(). The right thing to do /seems/ to be to > remove call to maybe_clean_eh_stmt_fn(). Ok, that simply replaces the fndecl of a call statement, so yes, removing the maybe_clean_eh_stmt_fn call should fix this case. But then you might run into IL verification issues that you have a stmt with EH info that cannot throw ... Basically these kind of local IPA passes are bad :/ > - ipa-prop.c: ipa_modify_call_arguments() -> gsi_replace(). The right thing > to do /seems/ to be to add follow up call to gimple_purge_dead_eh_edges(). I'd say call gsi_replace with , false and copy EH info manually and unconditionally here. > Both cases above are triggered from early SRA. But maybe also used by other real IPA passes? > The other places that appear not to update CFG after calling > maybe_clean_or_replace_eh_stmt are: > - tree-cfg.c: replace_uses_by() > - tree-ssa-dce.c: eliminate_unnecessary_stmts(). > >> I suppose it >> merely fails to properly transfer the EH info from the old to the new call >> (thus, fails to forcefully do what maybe_duplicate_eh_stmt does). > > I don't follow you here. The EH CFG edges that get out-of-date are in > _caller_ function. It is the _callee_ function that is duplicated. Yes, what I mean is if a stmt is replaced in the caller then it will not copy EH info if the new call cannot throw. But as we are not updating the CFG we have to preserve EH info nevertheless (and allow "dead" EH info in the verifiers). The alternative is to, for example in update_call_expr (), remove dead EH edges, switch cfun to the caller context, call cfg_cleanup, switch back. Expensive (but maybe doesn't happen often enough to worry?). I wonder why we end up, at IPA SRA time, knowing the cloned function won't throw anymore. I suppose we don't. Do we? Thanks, Richard. > Thank you, > > -- > Maxim Kuvyrkov > CodeSourcery / Mentor Graphics > >
[PATCH] Use widening_optab_handler when expanding highpart mults
Hi, the optab_handler uses in expand_mult_highpart_optab haven't been replaced with the widening_optab_handler yet. Fixed with attached patch. Tested on s390x and x86_64. Bye, -Andreas- 2011-10-05 Andreas Krebbel * expmed.c (expand_mult_highpart_optab): Replace optab_handler with the new widening_optab_handler. Index: gcc/expmed.c === *** gcc/expmed.c.orig --- gcc/expmed.c *** expand_mult_highpart_optab (enum machine *** 3467,3473 /* Try widening multiplication. */ moptab = unsignedp ? umul_widen_optab : smul_widen_optab; ! if (optab_handler (moptab, wider_mode) != CODE_FOR_nothing && mul_widen_cost[speed][wider_mode] < max_cost) { tem = expand_binop (wider_mode, moptab, op0, narrow_op1, 0, --- 3467,3473 /* Try widening multiplication. */ moptab = unsignedp ? umul_widen_optab : smul_widen_optab; ! if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing && mul_widen_cost[speed][wider_mode] < max_cost) { tem = expand_binop (wider_mode, moptab, op0, narrow_op1, 0, *** expand_mult_highpart_optab (enum machine *** 3504,3510 /* Try widening multiplication of opposite signedness, and adjust. */ moptab = unsignedp ? smul_widen_optab : umul_widen_optab; ! if (optab_handler (moptab, wider_mode) != CODE_FOR_nothing && size - 1 < BITS_PER_WORD && (mul_widen_cost[speed][wider_mode] + 2 * shift_cost[speed][mode][size-1] + 4 * add_cost[speed][mode] < max_cost)) --- 3504,3510 /* Try widening multiplication of opposite signedness, and adjust. */ moptab = unsignedp ? smul_widen_optab : umul_widen_optab; ! if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing && size - 1 < BITS_PER_WORD && (mul_widen_cost[speed][wider_mode] + 2 * shift_cost[speed][mode][size-1] + 4 * add_cost[speed][mode] < max_cost))
Re: New warning for expanded vector operations
On Wed, Oct 5, 2011 at 12:18 AM, Artem Shinkarov wrote: > Hi > > Here is a patch to inform a programmer about the expanded vector operation. > Bootstrapped on x86-unknown-linux-gnu. > > ChangeLog: > > * gcc/tree-vect-generic.c (expand_vector_piecewise): Adjust to > produce the warning. > (expand_vector_parallel): Adjust to produce the warning. Entries start without gcc/, they are relative to the gcc/ChangeLog file. > (lower_vec_shuffle): Adjust to produce the warning. > * gcc/common.opt: New warning Wvector-operation-expanded. > * gcc/doc/invoke.texi: Document the wawning. > > > Ok? I don't like the name -Wvector-operation-expanded. We emit a similar warning for missed inline expansions with -Winline, so maybe -Wvector-extensions (that's the name that appears in the C extension documentation). + location_t loc = gimple_location (gsi_stmt (*gsi)); + + warning_at (loc, OPT_Wvector_operation_expanded, + "vector operation will be expanded piecewise"); v = VEC_alloc(constructor_elt, gc, (nunits + delta - 1) / delta); for (i = 0; i < nunits; @@ -260,6 +264,10 @@ expand_vector_parallel (gimple_stmt_iter tree result, compute_type; enum machine_mode mode; int n_words = tree_low_cst (TYPE_SIZE_UNIT (type), 1) / UNITS_PER_WORD; + location_t loc = gimple_location (gsi_stmt (*gsi)); + + warning_at (loc, OPT_Wvector_operation_expanded, + "vector operation will be expanded in parallel"); what's the difference between 'piecewise' and 'in parallel'? @@ -301,16 +309,15 @@ expand_vector_addition (gimple_stmt_iter { int parts_per_word = UNITS_PER_WORD / tree_low_cst (TYPE_SIZE_UNIT (TREE_TYPE (type)), 1); + location_t loc = gimple_location (gsi_stmt (*gsi)); if (INTEGRAL_TYPE_P (TREE_TYPE (type)) && parts_per_word >= 4 && TYPE_VECTOR_SUBPARTS (type) >= 4) -return expand_vector_parallel (gsi, f_parallel, - type, a, b, code); +return expand_vector_parallel (gsi, f_parallel, type, a, b, code); else -return expand_vector_piecewise (gsi, f, - type, TREE_TYPE (type), - a, b, code); +return expand_vector_piecewise (gsi, f, type, + TREE_TYPE (type), a, b, code); } /* Check if vector VEC consists of all the equal elements and unless i miss something loc is unused here. Please avoid random whitespace changes (just review your patch yourself before posting and revert pieces that do nothing). +@item -Wvector-operation-expanded +@opindex Wvector-operation-expanded +@opindex Wno-vector-operation-expanded +Warn if vector operation is not implemented via SIMD capabilities of the +architecture. Mainly useful for the performance tuning. I'd mention that this is for vector operations as of the C extension documented in "Vector Extensions". The vectorizer can produce some operations that will need further lowering - we probably should make sure to _not_ warn about those. Try running the vect.exp testsuite with the new warning turned on (eventually disabling SSE), like with obj/gcc> make check-gcc RUNTESTFLAGS="--target_board=unix/-Wvector-extensions/-mno-sse vect.exp" > P.S. It is hard to write a reasonable testcase for the patch, because > one needs to guess which architecture would expand a given vector > operation. But the patch is trivial. You can create an aritificial large vector type for example, or put a testcase under gcc.target/i386 and disable SSE. We should have a testcase for this. Thanks, Richard.
Re: Avoid optimized out references to appear in lto symbol table
> Jan Hubicka writes: > > > Hi, > > GNU LD has bug about reporting references to hidden symbol sthat has been > > optimized out. > > This however made me notice that we do output into LTO symbol tables things > > that do > > not belong there. In partiuclar we often output extern inline functions > > that has been > > fully inlined by early opts. > > > > This patch solves the problem by running cgraph unreachable function > > rmeoval before > > IPA (currently it is done only before early opts, not after). > > Will this solve the case of > > if (expression known at compile time) (but not constant expression) > this_is_a_static_assert_that_failed() > > ? Only if the expression is known at compile time early enough (i.e. known to the early optimizers). I would say that most of such expressions should be. I would be interested in seeing testcases where we still have problems. Honza > I had a lot of problems with that and had to add various dummy stubs > just work around this problem. > > -Andi > -- > a...@linux.intel.com -- Speaking for myself only
Re: [PATCH] Use widening_optab_handler when expanding highpart mults
On Wed 05 Oct 2011 09:33:09 BST, Andreas Krebbel wrote: Hi, the optab_handler uses in expand_mult_highpart_optab haven't been replaced with the widening_optab_handler yet. Apologies, I don't know how I missed that one? :( Andrew
Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
On Tue, Oct 4, 2011 at 10:58 PM, Richard Henderson wrote: > On 10/04/2011 01:17 PM, Tom de Vries wrote: >> In general, to fold vlas (which are lowered to allocas) to normal >> declarations, >> if the alloca argument is constant. > > Ah. Ok, I suppose. How often are you seeing this happening? I can imagine > a few instances via inlining, but even there not so much... > >> Any guidance on what to do if we have to expand the >> __builtin_alloca_with_align >> to a function call, specifically, with the second argument? This is currently >> handled like this, which is not very nice: > > Don't do anything special. Just let it fall through as with alloca. This > should > never happen, except for stupid user tricks like > -fno-builtin-alloca_with_align. > And if the user does that, they get what they deserve wrt needing to implement > that function in their runtime. Yes, especially as we only use builtin_alloca_with_align for VLAs. Setting the assembler name to alloca might be nice to users at least, the excess argument shouldn't be any problem (well, hopefully targets don't have a special ABI for alloca ...) Richard. > > r~ >
Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
On Tue, Oct 4, 2011 at 6:28 PM, Tom de Vries wrote: > On 10/04/2011 03:03 PM, Richard Guenther wrote: >> On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries wrote: >>> On 10/01/2011 05:46 PM, Tom de Vries wrote: On 09/30/2011 03:29 PM, Richard Guenther wrote: > On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries > wrote: >> On 09/28/2011 11:53 AM, Richard Guenther wrote: >>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries >>> wrote: Richard, I got a patch for PR50527. The patch prevents the alignment of vla-related allocas to be set to BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding the alloca. Bootstrapped and regtested on x86_64. OK for trunk? >>> >>> Hmm. As gfortran with -fstack-arrays uses VLAs it's probably bad that >>> the vectorizer then will no longer see that the arrays are properly >>> aligned. >>> >>> I'm not sure what the best thing to do is here, other than trying to >>> record >>> the alignment requirement of the VLA somewhere. >>> >>> Forcing the alignment of the alloca replacement decl to >>> BIGGEST_ALIGNMENT >>> has the issue that it will force stack-realignment which isn't free >>> (and the >>> point was to make the decl cheaper than the alloca). But that might >>> possibly be the better choice. >>> >>> Any other thoughts? >> >> How about the approach in this (untested) patch? Using the DECL_ALIGN of >> the vla >> for the new array prevents stack realignment for folded vla-allocas, >> also for >> large vlas. >> >> This will not help in vectorizing large folded vla-allocas, but I think >> it's not >> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that >> has >> been the case up until we started to fold). If you want to trigger >> vectorization >> for a vla, you can still use the aligned attribute on the declaration. >> >> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also >> without using >> an attribute on the decl. This patch exploits this by setting it at the >> end of >> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in >> propagation though, because although the ptr_info of the lhs is >> propagated via >> copy_prop afterwards, it's not propagated anymore via ccp. >> >> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of >> ccp2 and >> not fold during ccp3. > > Ugh, somehow I like this the least ;) > > How about lowering VLAs to > > p = __builtin_alloca (...); > p = __builtin_assume_aligned (p, DECL_ALIGN (vla)); > > and not assume anything for alloca itself if it feeds a > __builtin_assume_aligned? > > Or rather introduce a __builtin_alloca_with_align () and for VLAs do > > p = __builtin_alloca_with_align (..., DECL_ALIGN (vla)); > > that's less awkward to use? > > Sorry for not having a clear plan here ;) > Using assume_aligned is a more orthogonal way to represent this in gimple, but indeed harder to use. Another possibility is to add a 'tree vla_decl' field to struct gimple_statement_call, which is probably the easiest to implement. But I think __builtin_alloca_with_align might have a use beyond vlas, so I decided to try this one. Attached patch implements my first stab at this (now testing on x86_64). Is this an acceptable approach? >>> >>> bootstrapped and reg-tested (including ada) on x86_64. >>> >>> Ok for trunk? >> >> The idea is ok I think. But >> >> case BUILT_IN_ALLOCA: >> + case BUILT_IN_ALLOCA_WITH_ALIGN: >> /* If the allocation stems from the declaration of a variable-sized >> object, it cannot accumulate. */ >> target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp)); >> if (target) >> return target; >> + if (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) >> + == BUILT_IN_ALLOCA_WITH_ALIGN) >> + { >> + tree new_call = build_call_expr_loc (EXPR_LOCATION (exp), >> + >> built_in_decls[BUILT_IN_ALLOCA], >> + 1, CALL_EXPR_ARG (exp, 0)); >> + CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp); >> + exp = new_call; >> + } >> >> Ick. Why can't the rest of the compiler not handle >> BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA? >> (thus, arrange things so the assembler name of alloca-with-align is alloca?) >> > > We can set the assembler name in the local_define_builtin call. But that will > still create a call alloca (12, 4). How do we deal with the second argument? > >> I don't see why you
Re: [PATCH 2/2] allow certain kinds of inputs to top level asm()-s
we have, like specifying the set of symbols _defined_ by a toplevel asm, right? I might misremember but sth like extern void foo (void); asm("" "foo"); was supposed to do the trick. Or should we treat those as outputs (given you use inputs for symbol uses)? I don't recall any discussion of how to deal with symbols defined by a top level asm - I was just asked to follow the "normal" asm syntax in having two colons in the middle instead of one as I had originally (not expecting any use for outputs here). Honza, do you remember if we decided on anything here? To be honest, I don't think we actually dicussed some particular syntax or my memory is even worse ;). We will need to make one. In the above example, I don't think quotes about foo is needed. We actualy provide definition of the declaration, so it should be just the decl itself, not a string. Honza Thanks, Richard.
Re: [PATCH] Handle side-effects of EH optimizations of callees
> On Tue, Oct 4, 2011 at 9:02 PM, Maxim Kuvyrkov wrote: > > On 5/10/2011, at 1:49 AM, Richard Guenther wrote: > > > >> On Tue, Oct 4, 2011 at 9:17 AM, Maxim Kuvyrkov > >> wrote: > >>> Richard, > >>> > >>> The following patch fixes a CFG consistency problem. > >>> > >>> When early IPA optimizations (e.g., early SRA) create a version of a > >>> function that no longer throws, versioning machinery deletes EH landings > >>> pads in _callers_ of the function [*]. However, the EH cfg edges are not > >>> updated in the callers, which eventually leads to an ICE in > >>> verify_eh_edges. > >> > >> It needs to update EH edges in the callers via the common idiom > >> > >> if (maybe_clean_or_replace_eh_stmt (stmt, stmt) > >> && gimple_purge_dead_eh_edges (gimple_bb (stmt))) > >> ... arrange for cfg-cleanup to be run ... > >> > >> where the cfg-cleanup part probably explains why it doesn't. So one option > >> is to not delete the landing pads. > > > > There are several places in the compiler where code does not the above > > idiom and calls maybe_clean[_or_replace]_eh_stmt without following it up > > with cfg-cleanup. > > > >> Where does it do that? > > > > The cases that I investigated are: > > - cgraphunit.c: update_call_expr(). The right thing to do /seems/ to be to > > remove call to maybe_clean_eh_stmt_fn(). > > Ok, that simply replaces the fndecl of a call statement, so yes, removing > the maybe_clean_eh_stmt_fn call should fix this case. But then you might > run into IL verification issues that you have a stmt with EH info that cannot > throw ... > > Basically these kind of local IPA passes are bad :/ This happens at materialization. Basically we should 1) materialize, 2) redirect callers and render some blocks unreachable, 3) inline, 4) cleanup cfg and get rid of them before updating SSA form. So it seems safe to me. Honza > > > - ipa-prop.c: ipa_modify_call_arguments() -> gsi_replace(). The right > > thing to do /seems/ to be to add follow up call to > > gimple_purge_dead_eh_edges(). > > I'd say call gsi_replace with , false and copy EH info manually and > unconditionally here. > > > Both cases above are triggered from early SRA. > > But maybe also used by other real IPA passes? > > > The other places that appear not to update CFG after calling > > maybe_clean_or_replace_eh_stmt are: > > - tree-cfg.c: replace_uses_by() > > - tree-ssa-dce.c: eliminate_unnecessary_stmts(). > > > >> I suppose it > >> merely fails to properly transfer the EH info from the old to the new call > >> (thus, fails to forcefully do what maybe_duplicate_eh_stmt does). > > > > I don't follow you here. The EH CFG edges that get out-of-date are in > > _caller_ function. It is the _callee_ function that is duplicated. > > Yes, what I mean is if a stmt is replaced in the caller then it will not > copy EH info if the new call cannot throw. But as we are not updating > the CFG we have to preserve EH info nevertheless (and allow "dead" > EH info in the verifiers). > > The alternative is to, for example in update_call_expr (), remove dead > EH edges, switch cfun to the caller context, call cfg_cleanup, switch > back. Expensive (but maybe doesn't happen often enough to worry?). > > I wonder why we end up, at IPA SRA time, knowing the cloned function > won't throw anymore. I suppose we don't. Do we? > > Thanks, > Richard. > > > Thank you, > > > > -- > > Maxim Kuvyrkov > > CodeSourcery / Mentor Graphics > > > >
Re: Ping #1: [Patch,AVR]: Clean-up some SP insns
Ping #1: http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01690.html Georg-Johann Lay wrote: > This is just a code clean-up. > > The bulky code from *addhi3_sp_R_pc2 and *addhi3_sp_R_pc3 is done by a small C > function that does the same (except that it prints some comment depending on > -dp or -fverbose-asm). > > *movhi_sp is an insn that should not be there and go away because it is a move > insn and there should be at most one move insn per mode (HImode in this case). > > stack_register_operand, i.e. (REG:HI 32), is in register class STACK_REG, thus > not in NO_REGS, thus element of register_operand, general_regs and > nonimmediate_operand which are the predicates/condition of *movhi. *movhi > already knows to handle "q,r" and "r,q" moves, same applies to the output > function output_movhi. > > The patch passes the test suite, of course. > > Ok? > > Moreover, I'd like to remove constraint "R" which is just used in one insn now > and replace it by 3-letter constraint C.. so that prefix R is free. > > R is of absolutely no use in inline assembly and the constraint can be > renamed/removed from documentation, IMO. > > Johann > > * config/avr/avr-protos.h (avr_out_addto_sp): New prototype. > * config/avr/avr.c (avr_out_addto_sp): New function. > (adjust_insn_length): Handle ADJUST_LEN_ADDTO_SP. > * config/avr/avr.md (adjust_len): Add "addto_sp". > (*movhi_sp): Remove insn. > (*addhi3_sp_R_pc2, *addhi3_sp_R_pc3): Merge to *addhi3_sp_R.
Re: [wwwdocs] IA-32/x86-64 Changes for upcoming 4.7.0 series
Thank you guys for your support! K
Commit: RX: Codegen bug fixes
Hi Guys, I am applying the patch below to fix a couple of bugs in the RX's machine description patterns. The first concerns the tablejump pattern, which needs to include a label to be referenced by the ASM_OUTPUT_ADDR_DIFF_ELT macro. The second problem is the ADDDI3 spiltter which was not marking its first output operand (operand 0) as early clobbered. This meant that it could be used as one of the inputs to the second part of the pattern (operands 4,5), causing chaos. The final fix was pointed out by Richard Henderson. The recently added support for narrow mode min and max instructions did not work for the SMAX insn, as the RX does not have narrow mode versions of this insn. Cheers Nick gcc/ChangeLog 2011-10-05 Nick Clifton * config/rx/rx.md (tablejump): Add missing label. (adddi3_internal): Mark operand 0 as early-clobbered. (smaxsi3): Revert previous delta. (adc_internal): Fix whitespace in generated asm. (adc_flags): Likewise. Index: gcc/config/rx/rx.md === --- gcc/config/rx/rx.md (revision 179540) +++ gcc/config/rx/rx.md (working copy) @@ -332,7 +332,7 @@ "" { return flag_pic ? (TARGET_AS100_SYNTAX ? "\n?:\tbra\t%0" : "\n1:\tbra\t%0") - : "jmp\t%0"; + : "\n1:jmp\t%0"; } [(set_attr "timings" "33") (set_attr "length" "2")] @@ -901,7 +901,7 @@ (match_operand:SI 2 "rx_source_operand" "r,Sint08,Sint16,Sint24,i,Q"))) (clobber (reg:CC CC_REG))] "reload_completed" - "adc %2,%0" + "adc\t%2, %0" [(set_attr "timings" "11,11,11,11,11,33") (set_attr "length" "3,4,5,6,7,6")] ) @@ -922,7 +922,7 @@ (match_dup 2)) (const_int 0)))] "reload_completed && rx_match_ccmode (insn, CC_ZSCmode)" - "adc %2,%0" + "adc\t%2, %0" [(set_attr "timings" "11,11,11,11,11,33") (set_attr "length" "3,4,5,6,7,6")] ) @@ -980,7 +980,7 @@ }) (define_insn_and_split "adddi3_internal" - [(set (match_operand:SI 0 "register_operand" "=r") + [(set (match_operand:SI 0 "register_operand" "=&r") (plus:SI (match_operand:SI 2 "register_operand" "r") (match_operand:SI 3 "rx_source_operand" "riQ"))) (set (match_operand:SI 1 "register_operand" "=r") @@ -1163,11 +1163,11 @@ (set_attr "timings" "22,44")] ) -(define_insn "smax3" - [(set (match_operand:int_modes 0 "register_operand" "=r,r,r,r,r,r") - (smax:int_modes (match_operand:int_modes 1 "register_operand" "%0,0,0,0,0,0") - (match_operand:int_modes 2 "rx_source_operand" - "r,Sint08,Sint16,Sint24,i,Q")))] +(define_insn "smaxsi3" + [(set (match_operand:SI 0 "register_operand" "=r,r,r,r,r,r") + (smax:SI (match_operand:SI 1 "register_operand" "%0,0,0,0,0,0") +(match_operand:SI 2 "rx_source_operand" + "r,Sint08,Sint16,Sint24,i,Q")))] "" "max\t%Q2, %0" [(set_attr "length" "3,4,5,6,7,6")
Re: Ping #1: [Patch,AVR]: Clean-up some SP insns
2011/10/5 Georg-Johann Lay : > > Ping #1: http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01690.html > > Georg-Johann Lay wrote: > >> This is just a code clean-up. >> >> The bulky code from *addhi3_sp_R_pc2 and *addhi3_sp_R_pc3 is done by a small >> C >> function that does the same (except that it prints some comment depending on >> -dp or -fverbose-asm). >> >> *movhi_sp is an insn that should not be there and go away because it is a >> move >> insn and there should be at most one move insn per mode (HImode in this >> case). >> >> stack_register_operand, i.e. (REG:HI 32), is in register class STACK_REG, >> thus >> not in NO_REGS, thus element of register_operand, general_regs and >> nonimmediate_operand which are the predicates/condition of *movhi. *movhi >> already knows to handle "q,r" and "r,q" moves, same applies to the output >> function output_movhi. >> >> The patch passes the test suite, of course. >> >> Ok? >> >> Moreover, I'd like to remove constraint "R" which is just used in one insn >> now >> and replace it by 3-letter constraint C.. so that prefix R is free. >> >> R is of absolutely no use in inline assembly and the constraint can be >> renamed/removed from documentation, IMO. >> >> Johann >> >> * config/avr/avr-protos.h (avr_out_addto_sp): New prototype. >> * config/avr/avr.c (avr_out_addto_sp): New function. >> (adjust_insn_length): Handle ADJUST_LEN_ADDTO_SP. >> * config/avr/avr.md (adjust_len): Add "addto_sp". >> (*movhi_sp): Remove insn. >> (*addhi3_sp_R_pc2, *addhi3_sp_R_pc3): Merge to *addhi3_sp_R. Please commit. Denis.
[PATCH] Fix COND_EXPR valueization
When making COND_EXPR/VEC_COND_EXPR a gimple ternary op I forgot to update ternary handling in gimple_fold_stmt_to_constant_1 to also valueize and fold the embedded comparison. Done so. This also adjusts SCCVN to accept a SSA name result from that folder, when simplifying A > 1 ? 2 : C to C, which is certainly useful (observed with a testcase which triggers vector lowering). Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2011-10-05 Richard Guenther * gimple-fold.c (gimple_fold_stmt_to_constant_1): For ternary ops with an embedded expression valueize and fold that as well. * tree-ssa-sccvn.c (try_to_simplify): Also allow SSA name results from gimple_fold_stmt_to_constant_1. Index: gcc/gimple-fold.c === *** gcc/gimple-fold.c (revision 179540) --- gcc/gimple-fold.c (working copy) *** gimple_fold_stmt_to_constant_1 (gimple s *** 2569,2574 --- 2569,2587 tree op1 = (*valueize) (gimple_assign_rhs2 (stmt)); tree op2 = (*valueize) (gimple_assign_rhs3 (stmt)); + /* Fold embedded expressions in ternary codes. */ + if ((subcode == COND_EXPR + || subcode == VEC_COND_EXPR) + && COMPARISON_CLASS_P (op0)) + { + tree op00 = (*valueize) (TREE_OPERAND (op0, 0)); + tree op01 = (*valueize) (TREE_OPERAND (op0, 1)); + tree tem = fold_binary_loc (loc, TREE_CODE (op0), + TREE_TYPE (op0), op00, op01); + if (tem) + op0 = tem; + } + return fold_ternary_loc (loc, subcode, gimple_expr_type (stmt), op0, op1, op2); } Index: gcc/tree-ssa-sccvn.c === *** gcc/tree-ssa-sccvn.c(revision 179539) --- gcc/tree-ssa-sccvn.c(working copy) *** simplify_unary_expression (gimple stmt) *** 2967,2993 static tree try_to_simplify (gimple stmt) { tree tem; /* For stores we can end up simplifying a SSA_NAME rhs. Just return in this case, there is no point in doing extra work. */ ! if (gimple_assign_copy_p (stmt) ! && TREE_CODE (gimple_assign_rhs1 (stmt)) == SSA_NAME) return NULL_TREE; /* First try constant folding based on our current lattice. */ ! tem = gimple_fold_stmt_to_constant (stmt, vn_valueize); ! if (tem) return tem; /* If that didn't work try combining multiple statements. */ ! switch (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))) { case tcc_reference: ! /* Fallthrough for some codes that can operate on registers. */ ! if (!(TREE_CODE (gimple_assign_rhs1 (stmt)) == REALPART_EXPR ! || TREE_CODE (gimple_assign_rhs1 (stmt)) == IMAGPART_EXPR ! || TREE_CODE (gimple_assign_rhs1 (stmt)) == VIEW_CONVERT_EXPR)) break; /* We could do a little more with unary ops, if they expand into binary ops, but it's debatable whether it is worth it. */ --- 2967,2995 static tree try_to_simplify (gimple stmt) { + enum tree_code code = gimple_assign_rhs_code (stmt); tree tem; /* For stores we can end up simplifying a SSA_NAME rhs. Just return in this case, there is no point in doing extra work. */ ! if (code == SSA_NAME) return NULL_TREE; /* First try constant folding based on our current lattice. */ ! tem = gimple_fold_stmt_to_constant_1 (stmt, vn_valueize); ! if (tem ! && (TREE_CODE (tem) == SSA_NAME ! || is_gimple_min_invariant (tem))) return tem; /* If that didn't work try combining multiple statements. */ ! switch (TREE_CODE_CLASS (code)) { case tcc_reference: ! /* Fallthrough for some unary codes that can operate on registers. */ ! if (!(code == REALPART_EXPR ! || code == IMAGPART_EXPR ! || code == VIEW_CONVERT_EXPR)) break; /* We could do a little more with unary ops, if they expand into binary ops, but it's debatable whether it is worth it. */
[C++ Patch] PR 38980
Hi, as analyzed by Jakub in the audit trail, after changes made by Mark back in 2005, constant_value_1 doesn't return aggregate constants for fear of inadvertent copies (in general their addresses must be the same everywhere). However, for the purpose of format checking in check_format_arg it is safe to do that, and necessary, thus Jakub suggested adding a parameter to decl_constant_value controlling that behavior. Which is what I tried to do with the below, tested x86_64-linux. Ok for mainline? Thanks, Paolo. / /c-family 2011-10-05 Paolo Carlini PR c++/38980 * c-common.h (decl_constant_value): Add bool parameter. * c-common.c (decl_constant_value_for_optimization): Adjust call. * c-format.c (check_format_arg): Likewise. 2011-10-05 Paolo Carlini PR c++/38980 * c-typeck.c (decl_constant_value): Adjust definition. /cp 2011-10-05 Paolo Carlini PR c++/38980 * typeck.c (decay_conversion): Adjust decl_constant_value call. * call.c (convert_like_real): Likewise. * init.c (constant_value_1): Add bool parameter. (integral_constant_value): Adjust call. (decl_constant_value): Adjust definition. /testsuite 2011-10-05 Paolo Carlini PR c++/38980 * g++.dg/warn/format5.C: New. Index: c-family/c-format.c === --- c-family/c-format.c (revision 179540) +++ c-family/c-format.c (working copy) @@ -1529,7 +1529,7 @@ check_format_arg (void *ctx, tree format_tree, format_tree = TREE_OPERAND (format_tree, 0); if (TREE_CODE (format_tree) == VAR_DECL && TREE_CODE (TREE_TYPE (format_tree)) == ARRAY_TYPE - && (array_init = decl_constant_value (format_tree)) != format_tree + && (array_init = decl_constant_value (format_tree, true)) != format_tree && TREE_CODE (array_init) == STRING_CST) { /* Extract the string constant initializer. Note that this may include Index: c-family/c-common.c === --- c-family/c-common.c (revision 179540) +++ c-family/c-common.c (working copy) @@ -1443,7 +1443,7 @@ decl_constant_value_for_optimization (tree exp) || DECL_MODE (exp) == BLKmode) return exp; - ret = decl_constant_value (exp); + ret = decl_constant_value (exp, false); /* Avoid unwanted tree sharing between the initializer and current function's body where the tree can be modified e.g. by the gimplifier. */ Index: c-family/c-common.h === --- c-family/c-common.h (revision 179540) +++ c-family/c-common.h (working copy) @@ -886,7 +886,7 @@ extern tree default_conversion (tree); extern tree common_type (tree, tree); -extern tree decl_constant_value (tree); +extern tree decl_constant_value (tree, bool); /* Handle increment and decrement of boolean types. */ extern tree boolean_increment (enum tree_code, tree); Index: testsuite/g++.dg/warn/format5.C === --- testsuite/g++.dg/warn/format5.C (revision 0) +++ testsuite/g++.dg/warn/format5.C (revision 0) @@ -0,0 +1,12 @@ +// PR c++/38980 +// { dg-options "-Wformat" } + +extern "C" +int printf(const char *format, ...) __attribute__((format(printf, 1, 2) )); + +const char fmt1[] = "Hello, %s"; + +void f() +{ + printf(fmt1, 3); // { dg-warning "expects argument" } +} Index: cp/typeck.c === --- cp/typeck.c (revision 179540) +++ cp/typeck.c (working copy) @@ -1827,7 +1827,7 @@ decay_conversion (tree exp) /* FIXME remove? at least need to remember that this isn't really a constant expression if EXP isn't decl_constant_var_p, like with C_MAYBE_CONST_EXPR. */ - exp = decl_constant_value (exp); + exp = decl_constant_value (exp, /*return_aggregate_cst_ok_p=*/false); if (error_operand_p (exp)) return error_mark_node; Index: cp/init.c === --- cp/init.c (revision 179540) +++ cp/init.c (working copy) @@ -1794,10 +1794,11 @@ build_offset_ref (tree type, tree member, bool add constant initializer, return the initializer (or, its initializers, recursively); otherwise, return DECL. If INTEGRAL_P, the initializer is only returned if DECL is an integral - constant-expression. */ + constant-expression. If RETURN_AGGREGATE_CST_OK_P, it is ok to + return an aggregate constant. */ static tree -constant_value_1 (tree decl, bool integral_p) +constant_value_1 (tree decl, bool integral_p, bool return_aggregate_cst_ok_p) { while (TREE_CODE (decl) == CONST_DECL || (integral_p @@ -1834,11 +1835,12 @@ static tree if (!init || !TREE_TYPE (init) || !TREE_CONSTANT (init) - || (!integral_p - /* Do not r
Re: New warning for expanded vector operations
On Wed, Oct 5, 2011 at 9:40 AM, Richard Guenther wrote: > On Wed, Oct 5, 2011 at 12:18 AM, Artem Shinkarov > wrote: >> Hi >> >> Here is a patch to inform a programmer about the expanded vector operation. >> Bootstrapped on x86-unknown-linux-gnu. >> >> ChangeLog: >> >> * gcc/tree-vect-generic.c (expand_vector_piecewise): Adjust to >> produce the warning. >> (expand_vector_parallel): Adjust to produce the warning. > > Entries start without gcc/, they are relative to the gcc/ChangeLog file. Sure, sorry. >> (lower_vec_shuffle): Adjust to produce the warning. >> * gcc/common.opt: New warning Wvector-operation-expanded. >> * gcc/doc/invoke.texi: Document the wawning. >> >> >> Ok? > > I don't like the name -Wvector-operation-expanded. We emit a > similar warning for missed inline expansions with -Winline, so > maybe -Wvector-extensions (that's the name that appears > in the C extension documentation). Hm, I don't care much about the name, unless it gets clear what the warning is used for. I am not really sure that Wvector-extensions makes it clear. Also, I don't see anything bad if the warning will pop up during the vectorisation. Any vector operation performed outside the SIMD accelerator looks suspicious, because it actually doesn't improve performance. Such a warning during the vectorisation could mean that a programmer forgot some flag, or the constant propagation failed to deliver a constant, or something else. Conceptually the text I am producing is not really a warning, it is more like an information, but I am not aware of the mechanisms that would allow me to introduce a flag triggering inform () or something similar. What I think we really need to avoid is including this warning in the standard Ox. > + location_t loc = gimple_location (gsi_stmt (*gsi)); > + > + warning_at (loc, OPT_Wvector_operation_expanded, > + "vector operation will be expanded piecewise"); > > v = VEC_alloc(constructor_elt, gc, (nunits + delta - 1) / delta); > for (i = 0; i < nunits; > @@ -260,6 +264,10 @@ expand_vector_parallel (gimple_stmt_iter > tree result, compute_type; > enum machine_mode mode; > int n_words = tree_low_cst (TYPE_SIZE_UNIT (type), 1) / UNITS_PER_WORD; > + location_t loc = gimple_location (gsi_stmt (*gsi)); > + > + warning_at (loc, OPT_Wvector_operation_expanded, > + "vector operation will be expanded in parallel"); > > what's the difference between 'piecewise' and 'in parallel'? Parallel is a little bit better for performance than piecewise. > @@ -301,16 +309,15 @@ expand_vector_addition (gimple_stmt_iter > { > int parts_per_word = UNITS_PER_WORD > / tree_low_cst (TYPE_SIZE_UNIT (TREE_TYPE (type)), 1); > + location_t loc = gimple_location (gsi_stmt (*gsi)); > > if (INTEGRAL_TYPE_P (TREE_TYPE (type)) > && parts_per_word >= 4 > && TYPE_VECTOR_SUBPARTS (type) >= 4) > - return expand_vector_parallel (gsi, f_parallel, > - type, a, b, code); > + return expand_vector_parallel (gsi, f_parallel, type, a, b, code); > else > - return expand_vector_piecewise (gsi, f, > - type, TREE_TYPE (type), > - a, b, code); > + return expand_vector_piecewise (gsi, f, type, > + TREE_TYPE (type), a, b, code); > } > > /* Check if vector VEC consists of all the equal elements and > > unless i miss something loc is unused here. Please avoid random > whitespace changes (just review your patch yourself before posting > and revert pieces that do nothing). Yes you are right, sorry. > +@item -Wvector-operation-expanded > +@opindex Wvector-operation-expanded > +@opindex Wno-vector-operation-expanded > +Warn if vector operation is not implemented via SIMD capabilities of the > +architecture. Mainly useful for the performance tuning. > > I'd mention that this is for vector operations as of the C extension > documented in "Vector Extensions". > > The vectorizer can produce some operations that will need further > lowering - we probably should make sure to _not_ warn about those. > Try running the vect.exp testsuite with the new warning turned on > (eventually disabling SSE), like with > > obj/gcc> make check-gcc > RUNTESTFLAGS="--target_board=unix/-Wvector-extensions/-mno-sse > vect.exp" Again, see the comment above. I think, if the warning can be triggered only manually, then we are fine. But I'll check anyway how many warnings I'll get from vect.exp. >> P.S. It is hard to write a reasonable testcase for the patch, because >> one needs to guess which architecture would expand a given vector >> operation. But the patch is trivial. > > You can create an aritificial large vector type for example, or put a > testcase under gcc.target/i386 and disable SSE. We should have > a testcase for this. Yeah, disabling SSE should help. Thanks, Artem. > Thanks, > Richard
[PATCH] Fix ICE in find_equal_ptrs (PR tree-optimization/50613)
Hi! I didn't consider that the rhs1 of a gimple cast might be something other than SSA_NAME. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-10-05 Jakub Jelinek PR tree-optimization/50613 * tree-ssa-strlen.c (find_equal_ptrs): If CASE_CONVERT operand is ADDR_EXPR, fallthru into ADDR_EXPR handling, and if it is neither that not SSA_NAME, give up. * gcc.dg/pr50613.c: New test. --- gcc/tree-ssa-strlen.c.jj2011-10-05 08:13:55.0 +0200 +++ gcc/tree-ssa-strlen.c 2011-10-05 08:19:16.0 +0200 @@ -692,6 +692,14 @@ find_equal_ptrs (tree ptr, int idx) { case SSA_NAME: break; + CASE_CONVERT: + if (!POINTER_TYPE_P (TREE_TYPE (ptr))) + return; + if (TREE_CODE (ptr) == SSA_NAME) + break; + if (TREE_CODE (ptr) != ADDR_EXPR) + return; + /* FALLTHRU */ case ADDR_EXPR: { int *pidx = addr_stridxptr (TREE_OPERAND (ptr, 0)); @@ -699,10 +707,6 @@ find_equal_ptrs (tree ptr, int idx) *pidx = idx; return; } - CASE_CONVERT: - if (POINTER_TYPE_P (TREE_TYPE (ptr))) - break; - return; default: return; } --- gcc/testsuite/gcc.dg/pr50613.c.jj 2011-10-05 08:22:32.0 +0200 +++ gcc/testsuite/gcc.dg/pr50613.c 2011-10-05 08:21:31.0 +0200 @@ -0,0 +1,20 @@ +/* PR tree-optimization/50613 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-tree-ccp" } */ + +#include "strlenopt.h" + +char buf[26]; + +static inline void +bar (char *__restrict dest, const char *__restrict src) +{ + strcpy (dest, src); +} + +void +foo (char *p) +{ + if (strlen (p) < 50) +bar (buf, p); +} Jakub
Re: New warning for expanded vector operations
On Wed, Oct 5, 2011 at 1:28 PM, Artem Shinkarov wrote: > On Wed, Oct 5, 2011 at 9:40 AM, Richard Guenther > wrote: >> On Wed, Oct 5, 2011 at 12:18 AM, Artem Shinkarov >> wrote: >>> Hi >>> >>> Here is a patch to inform a programmer about the expanded vector operation. >>> Bootstrapped on x86-unknown-linux-gnu. >>> >>> ChangeLog: >>> >>> * gcc/tree-vect-generic.c (expand_vector_piecewise): Adjust to >>> produce the warning. >>> (expand_vector_parallel): Adjust to produce the warning. >> >> Entries start without gcc/, they are relative to the gcc/ChangeLog file. > > Sure, sorry. > >>> (lower_vec_shuffle): Adjust to produce the warning. >>> * gcc/common.opt: New warning Wvector-operation-expanded. >>> * gcc/doc/invoke.texi: Document the wawning. >>> >>> >>> Ok? >> >> I don't like the name -Wvector-operation-expanded. We emit a >> similar warning for missed inline expansions with -Winline, so >> maybe -Wvector-extensions (that's the name that appears >> in the C extension documentation). > > Hm, I don't care much about the name, unless it gets clear what the > warning is used for. I am not really sure that Wvector-extensions > makes it clear. Also, I don't see anything bad if the warning will > pop up during the vectorisation. Any vector operation performed > outside the SIMD accelerator looks suspicious, because it actually > doesn't improve performance. Such a warning during the vectorisation > could mean that a programmer forgot some flag, or the constant > propagation failed to deliver a constant, or something else. > > Conceptually the text I am producing is not really a warning, it is > more like an information, but I am not aware of the mechanisms that > would allow me to introduce a flag triggering inform () or something > similar. > > What I think we really need to avoid is including this warning in the > standard Ox. > >> + location_t loc = gimple_location (gsi_stmt (*gsi)); >> + >> + warning_at (loc, OPT_Wvector_operation_expanded, >> + "vector operation will be expanded piecewise"); >> >> v = VEC_alloc(constructor_elt, gc, (nunits + delta - 1) / delta); >> for (i = 0; i < nunits; >> @@ -260,6 +264,10 @@ expand_vector_parallel (gimple_stmt_iter >> tree result, compute_type; >> enum machine_mode mode; >> int n_words = tree_low_cst (TYPE_SIZE_UNIT (type), 1) / UNITS_PER_WORD; >> + location_t loc = gimple_location (gsi_stmt (*gsi)); >> + >> + warning_at (loc, OPT_Wvector_operation_expanded, >> + "vector operation will be expanded in parallel"); >> >> what's the difference between 'piecewise' and 'in parallel'? > > Parallel is a little bit better for performance than piecewise. I see. That difference should probably be documented, maybe with an example. Richard. >> @@ -301,16 +309,15 @@ expand_vector_addition (gimple_stmt_iter >> { >> int parts_per_word = UNITS_PER_WORD >> / tree_low_cst (TYPE_SIZE_UNIT (TREE_TYPE (type)), 1); >> + location_t loc = gimple_location (gsi_stmt (*gsi)); >> >> if (INTEGRAL_TYPE_P (TREE_TYPE (type)) >> && parts_per_word >= 4 >> && TYPE_VECTOR_SUBPARTS (type) >= 4) >> - return expand_vector_parallel (gsi, f_parallel, >> - type, a, b, code); >> + return expand_vector_parallel (gsi, f_parallel, type, a, b, code); >> else >> - return expand_vector_piecewise (gsi, f, >> - type, TREE_TYPE (type), >> - a, b, code); >> + return expand_vector_piecewise (gsi, f, type, >> + TREE_TYPE (type), a, b, code); >> } >> >> /* Check if vector VEC consists of all the equal elements and >> >> unless i miss something loc is unused here. Please avoid random >> whitespace changes (just review your patch yourself before posting >> and revert pieces that do nothing). > > Yes you are right, sorry. > >> +@item -Wvector-operation-expanded >> +@opindex Wvector-operation-expanded >> +@opindex Wno-vector-operation-expanded >> +Warn if vector operation is not implemented via SIMD capabilities of the >> +architecture. Mainly useful for the performance tuning. >> >> I'd mention that this is for vector operations as of the C extension >> documented in "Vector Extensions". >> >> The vectorizer can produce some operations that will need further >> lowering - we probably should make sure to _not_ warn about those. >> Try running the vect.exp testsuite with the new warning turned on >> (eventually disabling SSE), like with >> >> obj/gcc> make check-gcc >> RUNTESTFLAGS="--target_board=unix/-Wvector-extensions/-mno-sse >> vect.exp" > > Again, see the comment above. I think, if the warning can be triggered > only manually, then we are fine. But I'll check anyway how many > warnings I'll get from vect.exp. > >>> P.S. It is hard to write a reasonable testcase for the patch, because >>> one needs to guess which architecture would expand a
Re: [PATCH] Fix ICE in find_equal_ptrs (PR tree-optimization/50613)
On Wed, Oct 5, 2011 at 1:31 PM, Jakub Jelinek wrote: > Hi! > > I didn't consider that the rhs1 of a gimple cast might be something > other than SSA_NAME. Fixed thusly, bootstrapped/regtested on x86_64-linux > and i686-linux, ok for trunk? Ok. Thanks, Richard. > 2011-10-05 Jakub Jelinek > > PR tree-optimization/50613 > * tree-ssa-strlen.c (find_equal_ptrs): If CASE_CONVERT > operand is ADDR_EXPR, fallthru into ADDR_EXPR handling, > and if it is neither that not SSA_NAME, give up. > > * gcc.dg/pr50613.c: New test. > > --- gcc/tree-ssa-strlen.c.jj 2011-10-05 08:13:55.0 +0200 > +++ gcc/tree-ssa-strlen.c 2011-10-05 08:19:16.0 +0200 > @@ -692,6 +692,14 @@ find_equal_ptrs (tree ptr, int idx) > { > case SSA_NAME: > break; > + CASE_CONVERT: > + if (!POINTER_TYPE_P (TREE_TYPE (ptr))) > + return; > + if (TREE_CODE (ptr) == SSA_NAME) > + break; > + if (TREE_CODE (ptr) != ADDR_EXPR) > + return; > + /* FALLTHRU */ > case ADDR_EXPR: > { > int *pidx = addr_stridxptr (TREE_OPERAND (ptr, 0)); > @@ -699,10 +707,6 @@ find_equal_ptrs (tree ptr, int idx) > *pidx = idx; > return; > } > - CASE_CONVERT: > - if (POINTER_TYPE_P (TREE_TYPE (ptr))) > - break; > - return; > default: > return; > } > --- gcc/testsuite/gcc.dg/pr50613.c.jj 2011-10-05 08:22:32.0 +0200 > +++ gcc/testsuite/gcc.dg/pr50613.c 2011-10-05 08:21:31.0 +0200 > @@ -0,0 +1,20 @@ > +/* PR tree-optimization/50613 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-tree-ccp" } */ > + > +#include "strlenopt.h" > + > +char buf[26]; > + > +static inline void > +bar (char *__restrict dest, const char *__restrict src) > +{ > + strcpy (dest, src); > +} > + > +void > +foo (char *p) > +{ > + if (strlen (p) < 50) > + bar (buf, p); > +} > > Jakub >
[PATCH] Propagate vector CONSTRUCTOR and element extract in SCCVN
Noticed from vector lowering testcases. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2011-10-05 Richard Guenther * tree-ssa-sccvn.c (vn_get_expr_for): Handle CONSTRUCTOR of vector type. (simplify_unary_expression): Handle BIT_FIELD_REFs. (try_to_simplify): Handle BIT_FIELD_REFs. Index: gcc/tree-ssa-sccvn.c === --- gcc/tree-ssa-sccvn.c(revision 179543) +++ gcc/tree-ssa-sccvn.c(working copy) @@ -275,6 +275,13 @@ vn_get_expr_for (tree name) gimple_assign_rhs2 (def_stmt)); break; +case tcc_exceptional: + if (code == CONSTRUCTOR + && TREE_CODE + (TREE_TYPE (gimple_assign_rhs1 (def_stmt))) == VECTOR_TYPE) + expr = gimple_assign_rhs1 (def_stmt); + break; + default:; } if (expr == NULL_TREE) @@ -2922,7 +2929,8 @@ simplify_unary_expression (gimple stmt) GIMPLE_ASSIGN_SINGLE codes. */ if (code == REALPART_EXPR || code == IMAGPART_EXPR - || code == VIEW_CONVERT_EXPR) + || code == VIEW_CONVERT_EXPR + || code == BIT_FIELD_REF) op0 = TREE_OPERAND (op0, 0); if (TREE_CODE (op0) != SSA_NAME) @@ -2934,7 +2942,8 @@ simplify_unary_expression (gimple stmt) else if (CONVERT_EXPR_CODE_P (code) || code == REALPART_EXPR || code == IMAGPART_EXPR - || code == VIEW_CONVERT_EXPR) + || code == VIEW_CONVERT_EXPR + || code == BIT_FIELD_REF) { /* We want to do tree-combining on conversion-like expressions. Make sure we feed only SSA_NAMEs or constants to fold though. */ @@ -2943,6 +2952,7 @@ simplify_unary_expression (gimple stmt) || BINARY_CLASS_P (tem) || TREE_CODE (tem) == VIEW_CONVERT_EXPR || TREE_CODE (tem) == SSA_NAME + || TREE_CODE (tem) == CONSTRUCTOR || is_gimple_min_invariant (tem)) op0 = tem; } @@ -2951,7 +2961,14 @@ simplify_unary_expression (gimple stmt) if (op0 == orig_op0) return NULL_TREE; - result = fold_unary_ignore_overflow (code, gimple_expr_type (stmt), op0); + if (code == BIT_FIELD_REF) +{ + tree rhs = gimple_assign_rhs1 (stmt); + result = fold_ternary (BIT_FIELD_REF, TREE_TYPE (rhs), +op0, TREE_OPERAND (rhs, 1), TREE_OPERAND (rhs, 2)); +} + else +result = fold_unary_ignore_overflow (code, gimple_expr_type (stmt), op0); if (result) { STRIP_USELESS_TYPE_CONVERSION (result); @@ -2989,7 +3006,8 @@ try_to_simplify (gimple stmt) /* Fallthrough for some unary codes that can operate on registers. */ if (!(code == REALPART_EXPR || code == IMAGPART_EXPR - || code == VIEW_CONVERT_EXPR)) + || code == VIEW_CONVERT_EXPR + || code == BIT_FIELD_REF)) break; /* We could do a little more with unary ops, if they expand into binary ops, but it's debatable whether it is worth it. */ @@ -3159,7 +3177,8 @@ visit_use (tree use) /* VOP-less references can go through unary case. */ if ((code == REALPART_EXPR || code == IMAGPART_EXPR - || code == VIEW_CONVERT_EXPR) + || code == VIEW_CONVERT_EXPR + || code == BIT_FIELD_REF) && TREE_CODE (TREE_OPERAND (rhs1, 0)) == SSA_NAME) { changed = visit_nary_op (lhs, stmt);
[PATCH, testsuite]: "Fix" avx256-unaligned-{load,store}-3.c scan-assembler failures
Hello! Atom does not vectorize DFmode arrays by default, so add -mtune=generic to dg-options to fix scan-assembler failures [1]. 2011-10-05 Uros Bizjak * gcc.target/i386/avx256-unaligned-load-3.c (dg-options): Add -mtune=generic. * gcc.target/i386/avx256-unaligned-store-3.c (dg-options): Ditto. Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN. [1] http://gcc.gnu.org/ml/gcc-testresults/2011-10/msg00530.html Uros. Index: gcc.target/i386/avx256-unaligned-store-3.c === --- gcc.target/i386/avx256-unaligned-store-3.c (revision 179535) +++ gcc.target/i386/avx256-unaligned-store-3.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-store" } */ +/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-store -mtune=generic" } */ #define N 1024 Index: gcc.target/i386/avx256-unaligned-load-3.c === --- gcc.target/i386/avx256-unaligned-load-3.c (revision 179535) +++ gcc.target/i386/avx256-unaligned-load-3.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-load" } */ +/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-load -mtune=generic" } */ #define N 1024
Re: Modify gcc for use with gdb (issue5132047)
On 11-09-27 03:37 , Richard Guenther wrote: On Tue, Sep 27, 2011 at 9:35 AM, Richard Guenther wrote: On Tue, Sep 27, 2011 at 9:14 AM, Jakub Jelinek wrote: On Mon, Sep 26, 2011 at 03:05:00PM -0700, Lawrence Crowl wrote: There a non-transparent change in behavior that may affect some users. The inline functions will introduce additional lines in a sequence of gdb 'step' commands. Use 'next' instead. That is IMHO a serious obstackle. If anything, the inlines should use always_inline, artificial attributes, but don't know if GDB treats them already specially and doesn't step into them with step. Also, I'm afraid it will significantly grow cc1plus/cc1 debug info. The .gdbinit macro redefinition Paolo posted sounds to be a better approach. Yeah, I already see me typing sfinish gazillion of times when trying to step into a function call that produces a function argument ... there is already the very very very annoying tree_code_length function that you get for each TREE_OPERAND (...) macro on function arguments. I'd be very opposed to any change that makes this situation worse. tree_operand_length actually. Please produce a patch that makes this function transparent to gdb, then I might be convinced converting the other macros to such function might be worthwhile. Thanks, Richard. I think we need to find a solution for this situation. The suggestions I see in this thread are nothing but workarounds to cope with current debugging limitations. We have learned to live with them and hack around them, but debugging GCC is already a daunting task, and I think we can improve it. Richard, yes, stepping into one-liner inline functions can be aggravating (particularly with some of the 2-3 embedded function calls we have in some places). However, the ability to easily inspect state of data structures using the standard accessors is fundamental. Particularly, for developers that may not have the extensive knowledge of internals that you do. Jakub, is size of a debuggable cc1/cc1plus really all that important? Yes, more debug info makes for bigger binaries. But when debugging, that hardly matters. Unless we were talking about a multi-Gb binary, which we aren't. I would like to solve this problem for all inline functions that we may not care to step into. There is a whole bunch of one-liners that generally annoy me: tree_operand_length, gimple_op, ... in fact, all the gimple accessors, etc. How about one of these two ideas? 1- Add -g to the list of supported settings in #pragma GCC optimize (or create a new #pragma GCC no_debug). This way, we can brace all these one liners with: #pragma push_options #pragma GCC optimize ("-g0") [ ... inline functions ... ] #pragma pop_options 2- Similar to #1, but with __attribute__ in each function declaration. I think I prefer #1 since it's simpler for the user to specify. This would also let us generate smaller debug binaries, since the bracketed functions would not get any debug info at all. Any reason why that scheme couldn't work? It works well for separate TUs. Thanks. Diego.
Re: Modify gcc for use with gdb (issue5132047)
On Wed, Oct 5, 2011 at 3:18 PM, Diego Novillo wrote: > On 11-09-27 03:37 , Richard Guenther wrote: >> >> On Tue, Sep 27, 2011 at 9:35 AM, Richard Guenther >> wrote: >>> >>> On Tue, Sep 27, 2011 at 9:14 AM, Jakub Jelinek wrote: On Mon, Sep 26, 2011 at 03:05:00PM -0700, Lawrence Crowl wrote: > > There a non-transparent change in behavior that may affect some users. > The inline functions will introduce additional lines in a sequence of > gdb 'step' commands. Use 'next' instead. That is IMHO a serious obstackle. If anything, the inlines should use always_inline, artificial attributes, but don't know if GDB treats them already specially and doesn't step into them with step. Also, I'm afraid it will significantly grow cc1plus/cc1 debug info. The .gdbinit macro redefinition Paolo posted sounds to be a better approach. >>> >>> Yeah, I already see me typing sfinish gazillion of times >>> when >>> trying to step into a function call that produces a function argument >>> ... there is >>> already the very very very annoying tree_code_length function that you >>> get for >>> each TREE_OPERAND (...) macro on function arguments. I'd be very opposed >>> to any change that makes this situation worse. >> >> tree_operand_length actually. Please produce a patch that makes this >> function >> transparent to gdb, then I might be convinced converting the other macros >> to >> such function might be worthwhile. >> >> Thanks, >> Richard. > > I think we need to find a solution for this situation. The suggestions I > see in this thread are nothing but workarounds to cope with current > debugging limitations. We have learned to live with them and hack around > them, but debugging GCC is already a daunting task, and I think we can > improve it. > > Richard, yes, stepping into one-liner inline functions can be aggravating > (particularly with some of the 2-3 embedded function calls we have in some > places). However, the ability to easily inspect state of data structures > using the standard accessors is fundamental. Particularly, for developers > that may not have the extensive knowledge of internals that you do. It is much more important to optimize my debugging time as experienced developer resources are more scarce than some random unexperienced guy that happens to dig into GCC. ;) or not really ;). > Jakub, is size of a debuggable cc1/cc1plus really all that important? Yes, > more debug info makes for bigger binaries. But when debugging, that hardly > matters. Unless we were talking about a multi-Gb binary, which we aren't. > > I would like to solve this problem for all inline functions that we may not > care to step into. There is a whole bunch of one-liners that generally > annoy me: tree_operand_length, gimple_op, ... in fact, all the gimple > accessors, etc. > > How about one of these two ideas? > > 1- Add -g to the list of supported settings in #pragma GCC optimize (or > create a new #pragma GCC no_debug). This way, we can brace all these one > liners with: > > #pragma push_options > #pragma GCC optimize ("-g0") > [ ... inline functions ... ] > #pragma pop_options > > > 2- Similar to #1, but with __attribute__ in each function declaration. I > think I prefer #1 since it's simpler for the user to specify. > > > This would also let us generate smaller debug binaries, since the bracketed > functions would not get any debug info at all. > > Any reason why that scheme couldn't work? It works well for separate TUs. If you crash inside those -g0 marked functions, what happens? Why not use the artificial attribute on them instead? At least what is documented is exactly what we want (well, at least it seems to me). Thus, produce a patch that improves tree_operand_length with either approach so we can test it. Richard. > > Thanks. Diego. >
[PATCH] Fix PR38885
I'm testing a pair of patches to fix PR38885 (for constants) and PR38884 (for non-constants) stores to complex/vector memory and CSE of component accesses from SCCVN. This is the piece that handles stores from constants and partial reads of it. We can conveniently re-use fold-const native encode/interpret code for this. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2011-10-05 Richard Guenther PR tree-optimization/38885 * tree-ssa-sccvn.c (vn_reference_lookup_3): Handle partial reads from constants. * gcc.dg/tree-ssa/ssa-fre-33.c: New testcase. Index: gcc/tree-ssa-sccvn.c === *** gcc/tree-ssa-sccvn.c(revision 179549) --- gcc/tree-ssa-sccvn.c(working copy) *** vn_reference_lookup_3 (ao_ref *ref, tree *** 1442,1448 } } ! /* 3) For aggregate copies translate the reference through them if the copy kills ref. */ else if (vn_walk_kind == VN_WALKREWRITE && gimple_assign_single_p (def_stmt) --- 1442,1495 } } ! /* 3) Assignment from a constant. We can use folds native encode/interpret ! routines to extract the assigned bits. */ ! else if (CHAR_BIT == 8 && BITS_PER_UNIT == 8 ! && ref->size == maxsize ! && maxsize % BITS_PER_UNIT == 0 ! && offset % BITS_PER_UNIT == 0 ! && is_gimple_reg_type (vr->type) ! && gimple_assign_single_p (def_stmt) ! && is_gimple_min_invariant (gimple_assign_rhs1 (def_stmt))) ! { ! tree base2; ! HOST_WIDE_INT offset2, size2, maxsize2; ! base2 = get_ref_base_and_extent (gimple_assign_lhs (def_stmt), ! &offset2, &size2, &maxsize2); ! if (maxsize2 != -1 ! && maxsize2 == size2 ! && size2 % BITS_PER_UNIT == 0 ! && offset2 % BITS_PER_UNIT == 0 ! && operand_equal_p (base, base2, 0) ! && offset2 <= offset ! && offset2 + size2 >= offset + maxsize) ! { ! /* We support up to 512-bit values (for V8DFmode). */ ! unsigned char buffer[64]; ! int len; ! ! len = native_encode_expr (gimple_assign_rhs1 (def_stmt), ! buffer, sizeof (buffer)); ! if (len > 0) ! { ! tree val = native_interpret_expr (vr->type, ! buffer ! + ((offset - offset2) ! / BITS_PER_UNIT), ! ref->size / BITS_PER_UNIT); ! if (val) ! { ! unsigned int value_id = get_or_alloc_constant_value_id (val); ! return vn_reference_insert_pieces ! (vuse, vr->set, vr->type, ! VEC_copy (vn_reference_op_s, heap, vr->operands), ! val, value_id); ! } ! } ! } ! } ! ! /* 4) For aggregate copies translate the reference through them if the copy kills ref. */ else if (vn_walk_kind == VN_WALKREWRITE && gimple_assign_single_p (def_stmt) *** vn_reference_lookup_3 (ao_ref *ref, tree *** 1540,1546 return NULL; } ! /* 4) For memcpy copies translate the reference through them if the copy kills ref. */ else if (vn_walk_kind == VN_WALKREWRITE && is_gimple_reg_type (vr->type) --- 1587,1593 return NULL; } ! /* 5) For memcpy copies translate the reference through them if the copy kills ref. */ else if (vn_walk_kind == VN_WALKREWRITE && is_gimple_reg_type (vr->type) Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-33.c === *** gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-33.c (revision 0) --- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-33.c (revision 0) *** *** 0 --- 1,21 + /* { dg-do compile } */ + /* { dg-options "-O -fdump-tree-fre1-details" } */ + + #define vector __attribute__((vector_size(16) )) + + struct { + float i; + vector float global_res; + } s; + float x; + int main(int argc) + { + vector float res = (vector float){0.0f,0.0f,0.0f,1.0f}; + res += (vector float){1.0f,2.0f,3.0f,4.0f}; + s.global_res = res; + x = *((float*)&s.global_res + 1); + return 0; + } + + /* { dg-final { scan-tree-dump "Replaced BIT_FIELD_REF.*with 2" "fre1" } } */ + /* { dg-final { cleanup-tree-dump "fre1" } } */
Re: Modify gcc for use with gdb (issue5132047)
On Wed, Oct 5, 2011 at 09:45, Richard Guenther wrote: > On Wed, Oct 5, 2011 at 3:18 PM, Diego Novillo wrote: >> On 11-09-27 03:37 , Richard Guenther wrote: >>> >>> On Tue, Sep 27, 2011 at 9:35 AM, Richard Guenther >>> wrote: On Tue, Sep 27, 2011 at 9:14 AM, Jakub Jelinek wrote: > > On Mon, Sep 26, 2011 at 03:05:00PM -0700, Lawrence Crowl wrote: >> >> There a non-transparent change in behavior that may affect some users. >> The inline functions will introduce additional lines in a sequence of >> gdb 'step' commands. Use 'next' instead. > > That is IMHO a serious obstackle. If anything, the inlines should > use always_inline, artificial attributes, but don't know if GDB treats > them > already specially and doesn't step into them with step. > Also, I'm afraid it will significantly grow cc1plus/cc1 debug info. > > The .gdbinit macro redefinition Paolo posted sounds to be a better > approach. Yeah, I already see me typing sfinish gazillion of times when trying to step into a function call that produces a function argument ... there is already the very very very annoying tree_code_length function that you get for each TREE_OPERAND (...) macro on function arguments. I'd be very opposed to any change that makes this situation worse. >>> >>> tree_operand_length actually. Please produce a patch that makes this >>> function >>> transparent to gdb, then I might be convinced converting the other macros >>> to >>> such function might be worthwhile. >>> >>> Thanks, >>> Richard. >> >> I think we need to find a solution for this situation. The suggestions I >> see in this thread are nothing but workarounds to cope with current >> debugging limitations. We have learned to live with them and hack around >> them, but debugging GCC is already a daunting task, and I think we can >> improve it. >> >> Richard, yes, stepping into one-liner inline functions can be aggravating >> (particularly with some of the 2-3 embedded function calls we have in some >> places). However, the ability to easily inspect state of data structures >> using the standard accessors is fundamental. Particularly, for developers >> that may not have the extensive knowledge of internals that you do. > > It is much more important to optimize my debugging time as experienced > developer resources are more scarce than some random unexperienced > guy that happens to dig into GCC. > > ;) > > or not really ;). You are being facetious, I hope. Part of the reason that experienced developers are scarce is precisely because dealing with GCC's code base is so daunting. We should be trying to attract those random inexperienced developers, not scare them away. The experienced developers will retire, eventually. Who is going to replace them? >> Jakub, is size of a debuggable cc1/cc1plus really all that important? Yes, >> more debug info makes for bigger binaries. But when debugging, that hardly >> matters. Unless we were talking about a multi-Gb binary, which we aren't. >> >> I would like to solve this problem for all inline functions that we may not >> care to step into. There is a whole bunch of one-liners that generally >> annoy me: tree_operand_length, gimple_op, ... in fact, all the gimple >> accessors, etc. >> >> How about one of these two ideas? >> >> 1- Add -g to the list of supported settings in #pragma GCC optimize (or >> create a new #pragma GCC no_debug). This way, we can brace all these one >> liners with: >> >> #pragma push_options >> #pragma GCC optimize ("-g0") >> [ ... inline functions ... ] >> #pragma pop_options >> >> >> 2- Similar to #1, but with __attribute__ in each function declaration. I >> think I prefer #1 since it's simpler for the user to specify. >> >> >> This would also let us generate smaller debug binaries, since the bracketed >> functions would not get any debug info at all. >> >> Any reason why that scheme couldn't work? It works well for separate TUs. > > If you crash inside those -g0 marked functions, what happens? You don't see the body of the function, but you can go up into the exact call site. > Why > not use the artificial attribute on them instead? At least what is documented > is exactly what we want (well, at least it seems to me). Yes, I forgot to mention in my reply. I tried it, but you still step into the functions. If this is a bug with the attribute, then it could be fixed. > Thus, produce a patch that improves tree_operand_length with either > approach so we can test it. I have a slight preference for making these functions totally opaque to the debugger. But may be there is a setting we can use that will not affect step and still give us some debug info in the presence of crashes. Tom, Cary, Ian, any suggestions? We are trying to figure out a compromise for tiny inline functions that are generally a nuisance when debugging. The scenario is a call like this: big_function_fo
Re: Modify gcc for use with gdb (issue5132047)
On Wed, Oct 05, 2011 at 10:10:44AM -0400, Diego Novillo wrote: > > Why > > not use the artificial attribute on them instead? At least what is > > documented > > is exactly what we want (well, at least it seems to me). > > Yes, I forgot to mention in my reply. I tried it, but you still step > into the functions. If this is a bug with the attribute, then it > could be fixed. If it doesn't work, then something should be done about it in gdb. Hiding the inlines altogether from the debugger is undesirable, it is better if the debugger has choice what to do, still the default should be that it ignores the artificial inlines in the backtraces as well as when stepping through, e.g. glibc headers use artificial attribute heavily for inlines which are of no interest to users to step through too. Jakub
Re: Modify gcc for use with gdb (issue5132047)
Diego Novillo writes: >> It is much more important to optimize my debugging time as experienced >> developer resources are more scarce than some random unexperienced >> guy that happens to dig into GCC. >> >> ;) >> >> or not really ;). > > You are being facetious, I hope. Part of the reason that experienced > developers are scarce is precisely because dealing with GCC's code > base is so daunting. We should be trying to attract those random > inexperienced developers, not scare them away. > > The experienced developers will retire, eventually. Who is going to > replace them? Yes. We experienced gcc developers can adapt to the tools, and we can modify the tools to work better. We should not hold up code improvements because of tool deficiencies. We should fix both problems, but we don't have to fix them in sequence. > Tom, Cary, Ian, any suggestions? We are trying to figure out a > compromise for tiny inline functions that are generally a nuisance > when debugging. The scenario is a call like this: big_function_foo > (inlined_f (x), inlined_g (y)); > We want to use 's' to step inside the call to big_function_foo(), but > we don't want to step into either inlined_f() or inlined_g(). This is a general problem when debugging C++ programs, so any solution to this gcc-specific issue will be of general use. However, I don't know of a way to make it work today without changing gdb. There is a lot I don't know about gdb, so it is possible that there is some approach that will work. Basically, gdb's relatively new support for debugging inline functions is sometimes nice, but sometimes it just gets in the way. I guess the most general solution is a way to mark the inline function in the source as uninteresting. I don't really understand the doc for the "artificial" function attribute, but it looks like it was intended to serve this purpose. So it seems to me that there is a bug in gdb: "step" should not step into a function with the "artificial" attribute. I agree that it does not currently work that way. Looking at the gdb sources, it seems to me that it currently ignores DW_AT_artificial on an ordinary function. Ian
Re: Modify gcc for use with gdb (issue5132047)
On Wed, Oct 5, 2011 at 4:10 PM, Diego Novillo wrote: > On Wed, Oct 5, 2011 at 09:45, Richard Guenther > wrote: >> On Wed, Oct 5, 2011 at 3:18 PM, Diego Novillo wrote: >>> On 11-09-27 03:37 , Richard Guenther wrote: On Tue, Sep 27, 2011 at 9:35 AM, Richard Guenther wrote: > > On Tue, Sep 27, 2011 at 9:14 AM, Jakub Jelinek wrote: >> >> On Mon, Sep 26, 2011 at 03:05:00PM -0700, Lawrence Crowl wrote: >>> >>> There a non-transparent change in behavior that may affect some users. >>> The inline functions will introduce additional lines in a sequence of >>> gdb 'step' commands. Use 'next' instead. >> >> That is IMHO a serious obstackle. If anything, the inlines should >> use always_inline, artificial attributes, but don't know if GDB treats >> them >> already specially and doesn't step into them with step. >> Also, I'm afraid it will significantly grow cc1plus/cc1 debug info. >> >> The .gdbinit macro redefinition Paolo posted sounds to be a better >> approach. > > Yeah, I already see me typing sfinish gazillion of times > when > trying to step into a function call that produces a function argument > ... there is > already the very very very annoying tree_code_length function that you > get for > each TREE_OPERAND (...) macro on function arguments. I'd be very opposed > to any change that makes this situation worse. tree_operand_length actually. Please produce a patch that makes this function transparent to gdb, then I might be convinced converting the other macros to such function might be worthwhile. Thanks, Richard. >>> >>> I think we need to find a solution for this situation. The suggestions I >>> see in this thread are nothing but workarounds to cope with current >>> debugging limitations. We have learned to live with them and hack around >>> them, but debugging GCC is already a daunting task, and I think we can >>> improve it. >>> >>> Richard, yes, stepping into one-liner inline functions can be aggravating >>> (particularly with some of the 2-3 embedded function calls we have in some >>> places). However, the ability to easily inspect state of data structures >>> using the standard accessors is fundamental. Particularly, for developers >>> that may not have the extensive knowledge of internals that you do. >> >> It is much more important to optimize my debugging time as experienced >> developer resources are more scarce than some random unexperienced >> guy that happens to dig into GCC. >> >> ;) >> >> or not really ;). > > You are being facetious, I hope. Part of the reason that experienced > developers are scarce is precisely because dealing with GCC's code > base is so daunting. We should be trying to attract those random > inexperienced developers, not scare them away. > > The experienced developers will retire, eventually. Who is going to > replace them? > > >>> Jakub, is size of a debuggable cc1/cc1plus really all that important? Yes, >>> more debug info makes for bigger binaries. But when debugging, that hardly >>> matters. Unless we were talking about a multi-Gb binary, which we aren't. >>> >>> I would like to solve this problem for all inline functions that we may not >>> care to step into. There is a whole bunch of one-liners that generally >>> annoy me: tree_operand_length, gimple_op, ... in fact, all the gimple >>> accessors, etc. >>> >>> How about one of these two ideas? >>> >>> 1- Add -g to the list of supported settings in #pragma GCC optimize (or >>> create a new #pragma GCC no_debug). This way, we can brace all these one >>> liners with: >>> >>> #pragma push_options >>> #pragma GCC optimize ("-g0") >>> [ ... inline functions ... ] >>> #pragma pop_options >>> >>> >>> 2- Similar to #1, but with __attribute__ in each function declaration. I >>> think I prefer #1 since it's simpler for the user to specify. >>> >>> >>> This would also let us generate smaller debug binaries, since the bracketed >>> functions would not get any debug info at all. >>> >>> Any reason why that scheme couldn't work? It works well for separate TUs. >> >> If you crash inside those -g0 marked functions, what happens? > > You don't see the body of the function, but you can go up into the > exact call site. > >> Why >> not use the artificial attribute on them instead? At least what is >> documented >> is exactly what we want (well, at least it seems to me). > > Yes, I forgot to mention in my reply. I tried it, but you still step > into the functions. If this is a bug with the attribute, then it > could be fixed. Did you also mark the function with always_inline? That's a requirement as artificial only works for inlined function bodies. >> Thus, produce a patch that improves tree_operand_length with either >> approach so we can test it. > > I have a slight preference for making these functions totally opaque > to the debugger. But may
Re: [PATCH] Add support for more sparc VIS 3.0 instructions.
On Tue, 4 Oct 2011, David Miller wrote: There are only a few VIS 3.0 instructions left after this, Hello, I am happy to see all this work, but I was wondering: are these instructions documented somewhere? It makes sense for gcc to only provide a list in extend.texi, but I couldn't find the VIS 3.0 documentation on the Oracle website, which makes it rather hard to use those builtins. +int64_t __builtin_vis_umulxhi (int64_t, int64_t); 'u' looks like "unsigned", I guess that doesn't matter for a builtin? -- Marc Glisse
Commit: RX: Add PID support
Hi Guys, I am applying the attached patch to add support for position independent data to the RX backend. When this mode is enabled constant data is referenced via an offset from a base address held in a fixed register. This allows the position of this data to be determined at run-time, rather than link-time, and without the overhead of storing relocations in the executable image. The code was written by DJ Delorie for Renesas and it is now being contributed back to the FSF. Cheers Nick gcc/ChangeLog 2011-10-05 DJ Delorie Nick Clifton * config/rx/rx.opt (mpid): Define. * config/rx/t-rx (MULTILIB_OPTIONS): Add -mpid (MULTILIB_DIRNAMES): Add pid. * config/rx/rx.c (rx_gp_base_regnum_val, rx_pid_base_regnum_val) (rx_num_interrupt_regs): New variable. (rx_gp_base_regnum): New function. Returns the number of the small data area register. (rx_pid_base_regnum): New function. Returns the number of the pid base register. (rx_decl_for_addr): New function. Returns the symbolic part of a MEM. (rx_pid_data_operand): New function. Returns whether an object is in the position independent data area. (rx_legitimize_address): New function. Puts undecided PID objects in the PID data area. (rx_is_legitimate_address): Add support for PID operands. (rx_print_operand_address): Likewise. (rx_print_operand): Likewise. (rx_maybe_pidify_operand): New function. Determine if an operand is suitable for PID addressing. (rx_gen_move_template): Add PID support. (rx_conditional_register_usage): Likewise. (rx_option_override): Initialise rx_num_interrupt_regs. (rx_is_legitimate_constant): Add support for PID constants. (TARGET_LEGITIMIZE_ADDRESS): Define. * config/rx/constraints.md (Rpid): Define. (Rpda): Define. * config/rx/rx.md (UNSPEC_PID_ADDR): Define. (tablejump): Add PID support. (mov<>): Likewise. (mov<>_internal): Likewise. (addsi3): Convert to an expander. Add PID support. (pid_addr): New pattern. * config/rx/rx.h (CPP_SPEC): Define. (ASM_SPEC): Pass -mpid and -mint-register on to assembler. (CASE_VECTOR_PC_RELATIVE): Define. (JUMP_TABLES_IN_TEXT_SECTION): Enable for PID mode. * config/rx/rx-protos.h (rx_maybe_pidify_operand): Prototype. * doc/invoke.texi (RX Options): Document -mpid command line option. rx.gcc.pid.patch.bz2 Description: BZip2 compressed data
Re: Initial shrink-wrapping patch
On Wed, Oct 5, 2011 at 12:29 AM, Richard Henderson wrote: > On 10/04/2011 03:10 PM, Bernd Schmidt wrote: >> * doc/invoke.texi (-fshrink-wrap): Document. >> * opts.c (default_options_table): Add it. >> * common.opt (fshrink-wrap): Add. >> * function.c (emit_return_into_block): Remove useless declaration. >> (record_hard_reg_uses_1, record_hard_reg_uses, frame_required_for_rtx, >> requires_stack_frame_p, gen_return_pattern): New static functions. >> (emit_return_into_block): New arg simple_p. All callers changed. >> Use gen_return_pattern. >> (thread_prologue_and_epilogue_insns): Implement shrink-wrapping. >> * config/i386/i386.md (return): Expand into a simple_return. >> (simple_return): New expander): >> (simple_return_internal, simple_return_internal_long, >> simple_return_pop_internal_long, simple_return_indirect_internal): >> Renamed from return_internal, return_internal_long, >> return_pop_internal_long and return_indirect_internal; changed to use >> simple_return. >> * config/i386/i386.c (ix86_expand_epilogue): Adjust to expand >> simple returns. >> (ix86_pad_returns): Likewise. >> * function.h (struct rtl_data): Add member shrink_wrapped. >> * cfgcleanup.c (outgoing_edges_match): If shrink-wrapped, edges that >> are not jumps or sibcalls can't be compared. >> >> * gcc.target/i386/sw-1.c: New test. > > Ok. > > As a followup, I think this option needs to be disabled for profiling > and profile_after_prologue. Should be a mere matter of frobbing the > options at startup. This breaks bootstrap on x86_64-linux. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50621 Richard. > > r~ >
Re: Fix pr50607 bconstp-3.c failure
On Wed, 5 Oct 2011, Artem Shinkarov wrote: > On Tue, Oct 4, 2011 at 11:51 PM, Joseph S. Myers > wrote: > > On Tue, 4 Oct 2011, Artem Shinkarov wrote: > > > >> Hi > >> > >> Here is the patch tho fix bconstp-3.c failure in the bug 50607. The > >> failure was cause because the new parser routine did not consider > >> original_tree_code of the expression. > >> > >> The patch is bootstrapped on x86-unknown-linux-gnu and is being tested. > > > > Please repost the patch for review without the unrelated whitespace > > changes. > > > > -- > > Joseph S. Myers > > jos...@codesourcery.com > > > > Sure. The new version is in the attachment. This version is OK, provided that it has passed bootstrap with no regressions and that it fixes the failures in question. -- Joseph S. Myers jos...@codesourcery.com
Re: Modify gcc for use with gdb (issue5132047)
> "Diego" == Diego Novillo writes: Diego> Tom, Cary, Ian, any suggestions? We are trying to figure out a Diego> compromise for tiny inline functions that are generally a nuisance Diego> when debugging. The scenario is a call like this: big_function_foo Diego> (inlined_f (x), inlined_g (y)); Diego> We want to use 's' to step inside the call to big_function_foo(), but Diego> we don't want to step into either inlined_f() or inlined_g(). FWIW, I wrote the "macro define" stuff that Paolo posted back when I was actively hacking on gcc. I consider it to be mildly superior to the inline approach, because it circumvents the runtime type checking -- this is a plus because it means that if I type the wrong thing to gdb it doesn't cause cc1 to abort. At least, this is a plus for me, since I make mistakes like that with reasonable frequency. Diego> I proposed extending #pragma GCC options to bracket these functions Diego> with -g0. This would help reduce the impact of debug info size. I think this is fixing the wrong component: it means making a one-size-fits-all decision in the gcc build, instead of just making the debugger be more flexible. If you want to pursue the inline function approach, I suggest resurrecting this gdb patch: http://sourceware.org/bugzilla/show_bug.cgi?id=8287 http://sourceware.org/ml/gdb-patches/2010-06/msg00417.html Then you could add the appropriate blacklisting commands to gcc's .gdbinit by default. Tom
Re: Modify gcc for use with gdb (issue5132047)
On Wed, Oct 5, 2011 at 10:51, Richard Guenther wrote: > Did you also mark the function with always_inline? That's a requirement > as artificial only works for inlined function bodies. Yeah. It doesn't quite work as I expect it to. It steps into the function at odd places. Diego.
Re: Modify gcc for use with gdb (issue5132047)
On 10/05/2011 05:27 PM, Tom Tromey wrote: FWIW, I wrote the "macro define" stuff that Paolo posted back when I was actively hacking on gcc. Yes, thanks Tom! Actually, I suspected that, but couldn't remember where I actually got it from, maybe you posted it on a public discussion thread somewhere... Paolo.
Re: Modify gcc for use with gdb (issue5132047)
On Wed, Oct 5, 2011 at 11:27, Tom Tromey wrote: > Diego> I proposed extending #pragma GCC options to bracket these functions > Diego> with -g0. This would help reduce the impact of debug info size. > > I think this is fixing the wrong component: it means making a > one-size-fits-all decision in the gcc build, instead of just making the > debugger be more flexible. That's true. > If you want to pursue the inline function approach, I suggest > resurrecting this gdb patch: > > http://sourceware.org/bugzilla/show_bug.cgi?id=8287 > http://sourceware.org/ml/gdb-patches/2010-06/msg00417.html > > Then you could add the appropriate blacklisting commands to gcc's > .gdbinit by default. I think this could work. I'm not sure I like the idea of having to specify all these blacklist commands, but I appreciate how it can make debugging more flexible. Is this patch stalled? the last I see is Justin's reply to review feedback, but no indication of whether it will be accepted. Richi, Jakub, Lawrence, would you be OK with this approach? IIUC, this means we'd have to add a bunch of blacklist commands to gcc/gdbinit.in. This does ties us to gdb, but I don't think that's a problem. How does this work with C++? Do the functions need to be specified with their mangled names? Diego.
RFC: ARM: Add comments to emitted .eabi_attribute directives
Hi Richard, Hi Paul, Hi Ramana, Well my idea to have a header file containing all of the possible binary file attributes does not seem to have taken off, so here is a much simpler patch for the GCC ARM backend. It just adds comments to the .eabi_directives (when -dA or -fverbose-asm are in effect) emitted by GCC. It does not have any dependencies upon header files, nor does it attempt to make use of the symbolic names in the actual .eabi_directives. Any objections to this version of the patch ? Cheers Nick gcc/ChangeLog 2011-10-05 Nick Clifton * config/arm/arm.c (EMIT_EABI_ATTRIBUTE): New macro. Used to emit a .eabi_attribute assembler directive, possibly with a comment attached. (asm_file_start): Use the new macro. Index: gcc/config/arm/arm.c === --- gcc/config/arm/arm.c(revision 179554) +++ gcc/config/arm/arm.c(working copy) @@ -22243,6 +22243,21 @@ asm_fprintf (stream, "%U%s", name); } +/* This macro is used to emit an EABI tag and its associated value. + We emit the numerical value of the tag in case the assembler does not + support textual tags. (Eg gas prior to 2.20). If requested we include + the tag name in a comment so that anyone reading the assembler output + will know which tag is being set. */ +#define EMIT_EABI_ATTRIBUTE(NAME,NUM,VAL) \ + do \ +{ \ + asm_fprintf (asm_out_file, "\t.eabi_attribute %d, %d", NUM, VAL); \ + if (flag_verbose_asm || flag_debug_asm) \ + asm_fprintf (asm_out_file, "\t%s " #NAME, ASM_COMMENT_START); \ + asm_fprintf (asm_out_file, "\n"); \ +} \ + while (0) + static void arm_file_start (void) { @@ -22274,9 +22289,9 @@ if (arm_fpu_desc->model == ARM_FP_MODEL_VFP) { if (TARGET_HARD_FLOAT) - asm_fprintf (asm_out_file, "\t.eabi_attribute 27, 3\n"); + EMIT_EABI_ATTRIBUTE (Tag_ABI_HardFP_use, 27, 3); if (TARGET_HARD_FLOAT_ABI) - asm_fprintf (asm_out_file, "\t.eabi_attribute 28, 1\n"); + EMIT_EABI_ATTRIBUTE (Tag_ABI_VFP_args, 28, 1); } } asm_fprintf (asm_out_file, "\t.fpu %s\n", fpu_name); @@ -22285,31 +22300,24 @@ are used. However we don't have any easy way of figuring this out. Conservatively record the setting that would have been used. */ - /* Tag_ABI_FP_rounding. */ if (flag_rounding_math) - asm_fprintf (asm_out_file, "\t.eabi_attribute 19, 1\n"); + EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_rounding, 19, 1); + if (!flag_unsafe_math_optimizations) { - /* Tag_ABI_FP_denomal. */ - asm_fprintf (asm_out_file, "\t.eabi_attribute 20, 1\n"); - /* Tag_ABI_FP_exceptions. */ - asm_fprintf (asm_out_file, "\t.eabi_attribute 21, 1\n"); + EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_denormal, 20, 1); + EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_exceptions, 21, 1); } - /* Tag_ABI_FP_user_exceptions. */ if (flag_signaling_nans) - asm_fprintf (asm_out_file, "\t.eabi_attribute 22, 1\n"); - /* Tag_ABI_FP_number_model. */ - asm_fprintf (asm_out_file, "\t.eabi_attribute 23, %d\n", - flag_finite_math_only ? 1 : 3); + EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_user_exceptions, 22, 1); - /* Tag_ABI_align8_needed. */ - asm_fprintf (asm_out_file, "\t.eabi_attribute 24, 1\n"); - /* Tag_ABI_align8_preserved. */ - asm_fprintf (asm_out_file, "\t.eabi_attribute 25, 1\n"); - /* Tag_ABI_enum_size. */ - asm_fprintf (asm_out_file, "\t.eabi_attribute 26, %d\n", - flag_short_enums ? 1 : 2); + EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_number_model, 23, + flag_finite_math_only ? 1 : 3); + EMIT_EABI_ATTRIBUTE (Tag_ABI_align8_needed, 24, 1); + EMIT_EABI_ATTRIBUTE (Tag_ABI_align8_preserved, 25, 1); + EMIT_EABI_ATTRIBUTE (Tag_ABI_enum_size, 26, flag_short_enums ? 1 : 2); + /* Tag_ABI_optimization_goals. */ if (optimize_size) val = 4; @@ -22319,16 +22327,12 @@ val = 1; else val = 6; - asm_fprintf (asm_out_file, "\t.eabi_attribute 30, %d\n", val); + EMIT_EABI_ATTRIBUTE (Tag_ABI_optimization_goals, 30, val); - /* Tag_CPU_unaligned_access. */ - asm_fprintf (asm_out_file, "\t.eabi_attribute 34, %d\n", - unaligned_access); + EMIT_EABI_ATTRIBUTE (Tag_CPU_unaligned_access, 34, unaligned_access); - /* Tag_ABI_FP_16bit_format. */ if (arm_fp16_format) - asm_fprintf (asm_out_file, "\t.eabi_att
Re: Modify gcc for use with gdb (issue5132047)
On Wed, Oct 05, 2011 at 11:42:51AM -0400, Diego Novillo wrote: > Richi, Jakub, Lawrence, would you be OK with this approach? IIUC, > this means we'd have to add a bunch of blacklist commands to > gcc/gdbinit.in. I don't mind if it goes into gdb, but IMHO the blacklisting should definitely default to blacklisting DW_AT_artificial inline functions (and allowing to unblacklist them), because the artificial attribute has been designed for that purpose already more than 4 years ago and many headers use it. Jakub
Re: Initial shrink-wrapping patch
On 10/05/11 17:13, Richard Guenther wrote: > On Wed, Oct 5, 2011 at 12:29 AM, Richard Henderson wrote: >> On 10/04/2011 03:10 PM, Bernd Schmidt wrote: >>> * doc/invoke.texi (-fshrink-wrap): Document. >>> * opts.c (default_options_table): Add it. >>> * common.opt (fshrink-wrap): Add. >>> * function.c (emit_return_into_block): Remove useless declaration. >>> (record_hard_reg_uses_1, record_hard_reg_uses, frame_required_for_rtx, >>> requires_stack_frame_p, gen_return_pattern): New static functions. >>> (emit_return_into_block): New arg simple_p. All callers changed. >>> Use gen_return_pattern. >>> (thread_prologue_and_epilogue_insns): Implement shrink-wrapping. >>> * config/i386/i386.md (return): Expand into a simple_return. >>> (simple_return): New expander): >>> (simple_return_internal, simple_return_internal_long, >>> simple_return_pop_internal_long, simple_return_indirect_internal): >>> Renamed from return_internal, return_internal_long, >>> return_pop_internal_long and return_indirect_internal; changed to use >>> simple_return. >>> * config/i386/i386.c (ix86_expand_epilogue): Adjust to expand >>> simple returns. >>> (ix86_pad_returns): Likewise. >>> * function.h (struct rtl_data): Add member shrink_wrapped. >>> * cfgcleanup.c (outgoing_edges_match): If shrink-wrapped, edges that >>> are not jumps or sibcalls can't be compared. >>> >>> * gcc.target/i386/sw-1.c: New test. >> >> Ok. >> >> As a followup, I think this option needs to be disabled for profiling >> and profile_after_prologue. Should be a mere matter of frobbing the >> options at startup. > > This breaks bootstrap on x86_64-linux. > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50621 Caused by the x86_64 expand_epilogue not generating REG_CFA_RESTORE notes, and in another case by queuing but not emitting them. Bootstrapping the following now. Ok? (Alternatively, could keep the redzone logic, but make it depend on !flag_shrink_wrap). Bernd * config/i386/i386.c (ix86_add_cfa_restore_note): Lose CFA_OFFSET argument. All callers changed. Always emit a note. (ix86_expand_epilogue): Ensure queued cfa_adjust notes are attached to an insn. Index: gcc/config/i386/i386.c === --- gcc/config/i386/i386.c (revision 179553) +++ gcc/config/i386/i386.c (working copy) @@ -9126,17 +9126,11 @@ ix86_emit_save_sse_regs_using_mov (HOST_ static GTY(()) rtx queued_cfa_restores; /* Add a REG_CFA_RESTORE REG note to INSN or queue them until next stack - manipulation insn. The value is on the stack at CFA - CFA_OFFSET. - Don't add the note if the previously saved value will be left untouched - within stack red-zone till return, as unwinders can find the same value - in the register and on the stack. */ + manipulation insn. */ static void -ix86_add_cfa_restore_note (rtx insn, rtx reg, HOST_WIDE_INT cfa_offset) +ix86_add_cfa_restore_note (rtx insn, rtx reg) { - if (cfa_offset <= cfun->machine->fs.red_zone_offset) -return; - if (insn) { add_reg_note (insn, REG_CFA_RESTORE, reg); @@ -10298,7 +10292,7 @@ ix86_emit_restore_reg_using_pop (rtx reg struct machine_function *m = cfun->machine; rtx insn = emit_insn (gen_pop (reg)); - ix86_add_cfa_restore_note (insn, reg, m->fs.sp_offset); + ix86_add_cfa_restore_note (insn, reg); m->fs.sp_offset -= UNITS_PER_WORD; if (m->fs.cfa_reg == crtl->drap_reg @@ -10383,8 +10377,7 @@ ix86_emit_leave (void) add_reg_note (insn, REG_CFA_DEF_CFA, plus_constant (stack_pointer_rtx, m->fs.sp_offset)); RTX_FRAME_RELATED_P (insn) = 1; - ix86_add_cfa_restore_note (insn, hard_frame_pointer_rtx, -m->fs.fp_offset); + ix86_add_cfa_restore_note (insn, hard_frame_pointer_rtx); } } @@ -10421,7 +10414,7 @@ ix86_emit_restore_regs_using_mov (HOST_W m->fs.drap_valid = true; } else - ix86_add_cfa_restore_note (NULL_RTX, reg, cfa_offset); + ix86_add_cfa_restore_note (NULL_RTX, reg); cfa_offset -= UNITS_PER_WORD; } @@ -10446,7 +10439,7 @@ ix86_emit_restore_sse_regs_using_mov (HO set_mem_align (mem, 128); emit_move_insn (reg, mem); - ix86_add_cfa_restore_note (NULL_RTX, reg, cfa_offset); + ix86_add_cfa_restore_note (NULL_RTX, reg); cfa_offset -= 16; } @@ -10738,6 +10731,8 @@ ix86_expand_epilogue (int style) GEN_INT (m->fs.sp_offset - UNITS_PER_WORD), style, true); } + else +ix86_add_queued_cfa_restore_notes (get_last_insn ()); /* Sibcall epilogues don't want a return instruction. */ if (style == 0)
Ping! Re: [RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL
Ping! http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01854.html On Wed, 2011-09-28 at 17:15 +0100, Sameera Deshpande wrote: > Hi! > > This patch generates Thumb2 epilogues in RTL form. > > The work involves defining new functions, predicates and patterns along with > few changes in existing code: > * The load_multiple_operation predicate was found to be too restrictive for > integer loads as it required consecutive destination regs, so this > restriction was lifted. > * Variations of load_multiple_operation were required to handle cases >- where SP must be the base register >- where FP values were being loaded (which do require consecutive > destination registers) >- where PC can be in register-list (which requires return pattern along > with register loads). > Hence, the common code was factored out into a new function in arm.c and > parameterised to show >- whether consecutive destination regs are needed >- the data type being loaded >- whether the base register has to be SP >- whether PC is in register-list > > The patch is tested with arm-eabi with no regressions. > > ChangeLog: > > 2011-09-28 Ian Bolton > Sameera Deshpande > >* config/arm/arm-protos.h (load_multiple_operation_p): New > declaration. > (thumb2_expand_epilogue): Likewise. > (thumb2_output_return): Likewise > (thumb2_expand_return): Likewise. > (thumb_unexpanded_epilogue): Rename to... > (thumb1_unexpanded_epilogue): ...this >* config/arm/arm.c (load_multiple_operation_p): New function. > (thumb2_emit_multi_reg_pop): Likewise. > (thumb2_emit_vfp_multi_reg_pop): Likewise. > (thumb2_expand_return): Likewise. > (thumb2_expand_epilogue): Likewise. > (thumb2_output_return): Likewise > (thumb_unexpanded_epilogue): Rename to... > ( thumb1_unexpanded_epilogue): ...this >* config/arm/arm.md (pop_multiple_with_stack_update): New pattern. > (pop_multiple_with_stack_update_and_return): Likewise. > (thumb2_ldr_with_return): Likewise. > (floating_point_pop_multiple_with_stack_update): Likewise. > (return): Update condition and code for pattern. > (arm_return): Likewise. > (epilogue_insns): Likewise. >* config/arm/predicates.md (load_multiple_operation): Update > predicate. > (load_multiple_operation_stack_and_return): New predicate. > (load_multiple_operation_stack): Likewise. > (load_multiple_operation_stack_fp): Likewise. >* config/arm/thumb2.md (thumb2_return): Remove. > (thumb2_rtl_epilogue_return): New pattern. > > > - Thanks and regards, > Sameera D. --
Ping! Re: [RFA/ARM][Patch 02/02]: ARM epilogues in RTL
Ping! http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01855.html On Wed, 2011-09-28 at 17:15 +0100, Sameera Deshpande wrote: > Hi! > > This patch generates ARM epilogue in RTL form. > > The work defines new functions and reuses most of the static functions and > patterns defined in the previous patch (Thumb2 epilogues in RTL) with minor > changes to handle mode specific details. > Hence, this patch depends completely on previous patch. > > It is tested with arm-eabi with no regression. > > ChangeLog: > > 2011-09-28 Sameera Deshpande > > >* config/arm/arm-protos.h (arm_expand_epilogue): New declarations. > (arm_expand_return): Likewise. > (thumb2_expand_epilogue): Add new boolean argument. >* config/arm/arm.c (print_multi_reg): Remove. > (vfp_output_fldmd): Likewise. > > (arm_output_epilogue): Likewise. > (output_return_instruction): Update the function. > (thumb2_emit_multi_reg_pop): Rename to... > (arm_emit_multi_reg_pop): ...this > (thumb2_emit_vfp_multi_reg_pop): Rename to... > (arm_emit_vfp_multi_reg_pop): ...this > (arm_emit_vfp_multi_reg_pop): Add new argument base_reg. > (arm_expand_return): New function. > > (arm_expand_epilogue): Likewise. > (thumb2_expand_epilogue): Add new argument is_sibling. >* config/arm/arm.md (pop_multiple_with_stack_update): Update > condition and code for pattern. > (arm_return): Likewise. > (pop_multiple_with_stack_update_and_return): Likewise. > (floating_point_pop_multiple_with_stack_update): Likewise. > (thumb2_ldr_with_return): Rename to... > (ldr_with_return): ...this > (ldr_with_return): Update condition. > (cond_return): Remove. > (cond_return_inverted): Likewise. > (return): Update code. > (epilogue): Likewise. > (sibcall_epilogue): Likewise. > (epilogue_insns): Update condition and code. > > > - Thanks and regards, > Sameera D. --
[PATCH] Fix PR46556 (poor address generation)
This patch addresses the poor code generation in PR46556 for the following code: struct x { int a[16]; int b[16]; int c[16]; }; extern void foo (int, int, int); void f (struct x *p, unsigned int n) { foo (p->a[n], p->c[n], p->b[n]); } Prior to the fix for PR32698, gcc calculated the offset for accessing the array elements as: n*4; 64+n*4; 128+n*4. Following that fix, the offsets are calculated as: n*4; (n+16)*4; (n +32)*4. This led to poor code generation on powerpc64 targets, among others. The poor code generation was observed to not occur in loops, as the IVOPTS code does a good job of lowering these expressions to MEM_REFs. It was previously suggested that perhaps a general pass to lower memory accesses to MEM_REFs in GIMPLE would solve not only this, but other similar problems. I spent some time looking into various approaches to this, and reviewing some previous attempts to do similar things. In the end, I've concluded that this is a bad idea in practice because of the loss of useful aliasing information. In particular, early lowering of component references causes us to lose the ability to disambiguate non-overlapping references in the same structure, and there is no simple way to carry the necessary aliasing information along with the replacement MEM_REFs to avoid this. While some performance gains are available with GIMPLE lowering of memory accesses, there are also offsetting performance losses, and I suspect this would just be a continuous source of bug reports into the future. Therefore the current patch is a much simpler approach to solve the specific problem noted in the PR. There are two pieces to the patch: * The offending addressing pattern is matched in GIMPLE and transformed into a restructured MEM_REF that distributes the multiply, so that (n +32)*4 becomes 4*n+128 as before. This is done during the reassociation pass, for reasons described below. The transformation only occurs in non-loop blocks, since IVOPTS does a good job on such things within loops. * A tweak is added to the RTL forward-propagator to avoid propagating into memory references based on a single base register with no offset, under certain circumstances. This improves sharing of base registers for accesses within the same structure and slightly lowers register pressure. It would be possible to separate these into two patches if that's preferred. I chose to combine them because together they provide the ideal code generation that the new test cases test for. I initially implemented the pattern matcher during expand, but I found that the expanded code for two accesses to the same structure was often not being CSEd well. So I moved it back into the GIMPLE phases prior to DOM to take advantage of its CSE. To avoid an additional complete pass over the IL, I chose to piggyback on the reassociation pass. This transformation is not technically a reassociation, but it is related enough to not be a complete wart. One noob question about this: It would probably be preferable to have this transformation only take place during the second reassociation pass, so the ARRAY_REFs are seen by earlier optimization phases. Is there an easy way to detect that it's the second pass without having to generate a separate pass entry point? One other general question about the pattern-match transformation: Is this an appropriate transformation for all targets, or should it be somehow gated on available addressing modes on the target processor? Bootstrapped and regression tested on powerpc64-linux-gnu. Verified no performance degradations on that target for SPEC CPU2000 and CPU2006. I'm looking for eventual approval for trunk after any comments are resolved. Thanks! Bill 2011-10-05 Bill Schmidt gcc: PR rtl-optimization/46556 * fwprop.c (fwprop_bb_aux_d): New struct. (MEM_PLUS_REGS): New macro. (record_mem_plus_reg): New function. (record_mem_plus_regs): Likewise. (single_def_use_enter_block): Record mem-plus-reg patterns. (build_single_def_use_links): Allocate aux storage. (locally_poor_mem_replacement): New function. (forward_propagate_and_simplify): Call locally_poor_mem_replacement. (fwprop_init): Free storage. * tree.h (copy_ref_info): Expose existing function. * tree-ssa-loop-ivopts.c (copy_ref_info): Remove static token. * tree-ssa-reassoc.c (restructure_base_and_offset): New function. (restructure_mem_ref): Likewise. (reassociate_bb): Look for opportunities to call restructure_mem_ref; clean up immediate use lists. gcc/testsuite: PR rtl-optimization/46556 * gcc.target/powerpc/ppc-pr46556-1.c: New testcase. * gcc.target/powerpc/ppc-pr46556-2.c: Likewise. * gcc.target/powerpc/ppc-pr46556-3.c: Likewise. * gcc.target/powerpc/ppc-pr46556-4.c: Likewise. * gcc.dg/tree-ssa/pr46556-1.c: Likewise.
Re: Modify gcc for use with gdb (issue5132047)
> "Diego" == Diego Novillo writes: Tom> http://sourceware.org/bugzilla/show_bug.cgi?id=8287 Diego> I think this could work. I'm not sure I like the idea of having to Diego> specify all these blacklist commands, but I appreciate how it can make Diego> debugging more flexible. Yeah, that's a drawback, since you have to remember to update .gdbinit. Diego> Is this patch stalled? the last I see is Justin's reply to review Diego> feedback, but no indication of whether it will be accepted. The thread continues in the next month: http://sourceware.org/ml/gdb-patches/2010-07/threads.html#00318 The patch needed some updates. I don't know what happened. Diego> How does this work with C++? Do the functions need to be specified Diego> with their mangled names? I would hope not, but I don't remember any more. Tom
Re: [RFC] Split -mrecip
Hi, On Mon, 19 Sep 2011, Uros Bizjak wrote: > Looking at this topic again, I'd propose that x86 adopts approach from > rs6000. The rs6000 approach is more extensible, and offers the same > flexibility, due to "!". > > So, x86 could have "-mrecip=", with all, default, none, div, > vec-div, divf, vec-divf, rsqrt, etc ... combinations, Like so. Currently regstrapping on x86_64-linux. Okay if that succeeds? Ciao, Michael. * i386/i386.opt (recip_mask, recip_mask_explicit, x_recip_mask_explicit): New variables and cl_target member. (mrecip=): New option. * i386/i386.h (RECIP_MASK_DIV, RECIP_MASK_SQRT, RECIP_MASK_VEC_DIV, RECIP_MASK_VEC_SQRT, RECIP_MASK_ALL): New bitmasks. (TARGET_RECIP_DIV, TARGET_RECIP_SQRT, TARGET_RECIP_VEC_DIV, TARGET_RECIP_VEC_SQRT): New tests. * i386/i386.md (divsf3): Check TARGET_RECIP_DIV. (sqrt2): Check TARGET_RECIP_SQRT. * i386/sse.md (div3): Check TARGET_RECIP_VEC_DIV. (sqrt2): Check TARGET_RECIP_VEC_SQRT. * i386/i386.c (ix86_option_override_internal): Set recip_mask for -mrecip and -mrecip=options. (ix86_function_specific_save): Save recip_mask_explicit. (ix86_function_specific_restore): Restore recip_mask_explicit. * doc/invoke.texi (ix86 Options): Document the new option. Index: config/i386/i386.h === --- config/i386/i386.h (revision 179558) +++ config/i386/i386.h (working copy) @@ -2315,6 +2315,18 @@ extern void debug_dispatch_window (int); ((FLAGS) & (IX86_CALLCVT_CDECL | IX86_CALLCVT_STDCALL \ | IX86_CALLCVT_FASTCALL | IX86_CALLCVT_THISCALL)) +#define RECIP_MASK_DIV 0x01 +#define RECIP_MASK_SQRT0x02 +#define RECIP_MASK_VEC_DIV 0x04 +#define RECIP_MASK_VEC_SQRT0x08 +#define RECIP_MASK_ALL (RECIP_MASK_DIV | RECIP_MASK_SQRT \ +| RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT) + +#define TARGET_RECIP_DIV ((recip_mask & RECIP_MASK_DIV) != 0) +#define TARGET_RECIP_SQRT ((recip_mask & RECIP_MASK_SQRT) != 0) +#define TARGET_RECIP_VEC_DIV ((recip_mask & RECIP_MASK_VEC_DIV) != 0) +#define TARGET_RECIP_VEC_SQRT ((recip_mask & RECIP_MASK_VEC_SQRT) != 0) + /* Local variables: version-control: t Index: config/i386/i386.md === --- config/i386/i386.md (revision 179558) +++ config/i386/i386.md (working copy) @@ -7062,7 +7062,9 @@ (define_expand "divsf3" "(TARGET_80387 && X87_ENABLE_ARITH (SFmode)) || TARGET_SSE_MATH" { - if (TARGET_SSE_MATH && TARGET_RECIP && optimize_insn_for_speed_p () + if (TARGET_SSE_MATH + && TARGET_RECIP_DIV + && optimize_insn_for_speed_p () && flag_finite_math_only && !flag_trapping_math && flag_unsafe_math_optimizations) { @@ -13438,7 +13440,9 @@ (define_expand "sqrt2" || (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)" { if (mode == SFmode - && TARGET_SSE_MATH && TARGET_RECIP && !optimize_function_for_size_p (cfun) + && TARGET_SSE_MATH + && TARGET_RECIP_SQRT + && !optimize_function_for_size_p (cfun) && flag_finite_math_only && !flag_trapping_math && flag_unsafe_math_optimizations) { Index: config/i386/sse.md === --- config/i386/sse.md (revision 179558) +++ config/i386/sse.md (working copy) @@ -779,7 +779,9 @@ (define_expand "div3" { ix86_fixup_binary_operands_no_copy (DIV, mode, operands); - if (TARGET_SSE_MATH && TARGET_RECIP && !optimize_insn_for_size_p () + if (TARGET_SSE_MATH + && TARGET_RECIP_VEC_DIV + && !optimize_insn_for_size_p () && flag_finite_math_only && !flag_trapping_math && flag_unsafe_math_optimizations) { @@ -857,7 +859,9 @@ (define_expand "sqrt2" (sqrt:VF1 (match_operand:VF1 1 "nonimmediate_operand" "")))] "TARGET_SSE" { - if (TARGET_SSE_MATH && TARGET_RECIP && !optimize_insn_for_size_p () + if (TARGET_SSE_MATH + && TARGET_RECIP_VEC_SQRT + && !optimize_insn_for_size_p () && flag_finite_math_only && !flag_trapping_math && flag_unsafe_math_optimizations) { Index: config/i386/i386.opt === --- config/i386/i386.opt(revision 179558) +++ config/i386/i386.opt(working copy) @@ -31,6 +31,15 @@ HOST_WIDE_INT ix86_isa_flags = TARGET_64 Variable HOST_WIDE_INT ix86_isa_flags_explicit +TargetVariable +int recip_mask + +Variable +int recip_mask_explicit + +TargetSave +int x_recip_mask_explicit + ;; Definitions to add to the cl_target_option structure ;; -march= processor TargetSave @@ -373,6 +382,10 @@ mrecip Target Report Mask(RECIP) Save Generate reciprocals instead of divss and sqrtss. +mrecip= +Target Report RejectNegative Joined Var(ix86_recip_name) Save +Control gene
Re: Modify gcc for use with gdb (issue5132047)
> "Jakub" == Jakub Jelinek writes: Jakub> I don't mind if it goes into gdb, but IMHO the blacklisting should Jakub> definitely default to blacklisting DW_AT_artificial inline functions Jakub> (and allowing to unblacklist them), because the artificial attribute Jakub> has been designed for that purpose already more than 4 years ago Jakub> and many headers use it. Could you file a gdb bug report for this? In general I'd appreciate it if folks making debug changes to gcc filed bug reports against gdb. I do follow the gcc list looking for things like this, but I miss things sometimes, and this isn't really the best way. Tom
Re: Initial shrink-wrapping patch
On 10/05/2011 08:59 AM, Bernd Schmidt wrote: > Bootstrapping the following now. Ok? (Alternatively, could keep the > redzone logic, but make it depend on !flag_shrink_wrap). It's a good space-saving optimization, that redzone logic. We should be able to keep it based on crtl->shrink_wrapped; that does appear to get set before we actually emit the epilogue. r~
Re: [PATCH] Fix PR46556 (poor address generation)
On 10/05/2011 06:13 PM, William J. Schmidt wrote: One other general question about the pattern-match transformation: Is this an appropriate transformation for all targets, or should it be somehow gated on available addressing modes on the target processor? Bootstrapped and regression tested on powerpc64-linux-gnu. Verified no performance degradations on that target for SPEC CPU2000 and CPU2006. I'm looking for eventual approval for trunk after any comments are resolved. Thanks! How do the costs look like for the two transforms you mention in the head comment of locally_poor_mem_replacement? Paolo
Re: Fix pr50607 bconstp-3.c failure
On Wed, Oct 5, 2011 at 4:22 PM, Joseph S. Myers wrote: > On Wed, 5 Oct 2011, Artem Shinkarov wrote: > >> On Tue, Oct 4, 2011 at 11:51 PM, Joseph S. Myers >> wrote: >> > On Tue, 4 Oct 2011, Artem Shinkarov wrote: >> > >> >> Hi >> >> >> >> Here is the patch tho fix bconstp-3.c failure in the bug 50607. The >> >> failure was cause because the new parser routine did not consider >> >> original_tree_code of the expression. >> >> >> >> The patch is bootstrapped on x86-unknown-linux-gnu and is being tested. >> > >> > Please repost the patch for review without the unrelated whitespace >> > changes. >> > >> > -- >> > Joseph S. Myers >> > jos...@codesourcery.com >> > >> >> Sure. The new version is in the attachment. > > This version is OK, provided that it has passed bootstrap with no > regressions and that it fixes the failures in question. > > -- > Joseph S. Myers > jos...@codesourcery.com > Thanks, I am going to apply it then, when the regtesting is over in case it would be successfull. Joseph, is it possible to commit the patch together with the space fixes? Thanks, Artem.
Re: Fix pr50607 bconstp-3.c failure
On Wed, 5 Oct 2011, Artem Shinkarov wrote: > Joseph, is it possible to commit the patch together with the space fixes? You should not commit whitespace fixes to lines not otherwise modified by a patch, except in a separate commit that *only* has whitespace or other formatting fixes, so that "svn blame" results are more meaningful. That is: commit whitespace fixes (assuming they are genuinely fixes rather than making things worse) *separately* in a commit whose commit message makes clear it is just whitespace fixes. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Fix PR46556 (poor address generation)
On Wed, Oct 5, 2011 at 6:13 PM, William J. Schmidt wrote: > * tree-ssa-loop-ivopts.c (copy_ref_info): Remove static token. Rather than this, why not move the function to common code somewhere? Ciao! Steven
Re: Fix pr50607 bconstp-3.c failure
On Wed, Oct 5, 2011 at 5:28 PM, Joseph S. Myers wrote: > On Wed, 5 Oct 2011, Artem Shinkarov wrote: > >> Joseph, is it possible to commit the patch together with the space fixes? > > You should not commit whitespace fixes to lines not otherwise modified by > a patch, except in a separate commit that *only* has whitespace or other > formatting fixes, so that "svn blame" results are more meaningful. That > is: commit whitespace fixes (assuming they are genuinely fixes rather than > making things worse) *separately* in a commit whose commit message makes > clear it is just whitespace fixes. Ok, I see. In the given patch all the fixes are mainly auto-generated and remove the trailing spaces. But I see your point. Thanks, Artem. > -- > Joseph S. Myers > jos...@codesourcery.com >
Re: fix for c++/44473, mangling of decimal types, checked in
On Fri, 2011-09-30 at 10:37 -0700, Janis Johnson wrote: > Patch http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00625.html was > approved by Jason last December but I never got around to checking > it in. Paolo Carlini said in PR44473 that it was already approved > and doesn't need a new approval, so I checked it in after a > bootstrap and regtest of c,c++ for i686-pc-linux-gnu. Hi Janis, Thanks for committing this to mainline! One note though, you asked for permission to commit this also to the 4.5 release branch and Jason approved that. So do you have plans for committing it there too (and to the 4.6 branch since that was created between when you submitted that patch and committed it)? I'd like to see this fixed on both of those FSF release branches. Peter
Re: Modify gcc for use with gdb (issue5132047)
On Wed, Oct 5, 2011 at 11:28, Diego Novillo wrote: > On Wed, Oct 5, 2011 at 10:51, Richard Guenther > wrote: > >> Did you also mark the function with always_inline? That's a requirement >> as artificial only works for inlined function bodies. > > Yeah. It doesn't quite work as I expect it to. It steps into the > function at odd places. So, I played with this some more with this, and there seems to be some inconsistency in how these attributes get handled. http://sourceware.org/bugzilla/show_bug.cgi?id=13263 static inline int foo (int) __attribute__((always_inline,artificial)); static inline int foo (int x) { int y = x - 3; return y; } int bar (int y) { return y == 0; } main () { foo (10); return bar (foo (3)); } With GCC 4.7, the stand alone call foo(10) is not ignored by 'step'. However, the embedded call bar(foo(3)) is ignored as I was expecting. Diego.
Re: [PATCH] Add an intermediate coverage format for gcov
On Oct 5, 2011, at 12:47 AM, Sharad Singhai wrote: > This patch adds an intermediate coverage format (enabled via 'gcov > -i'). This is a compact format as it does not require source files. I don't like any of the tags, I think if you showed them to 100 people that had used gcov before, very few of them would be able to figure out the meaning without reading the manual. Why make it completely cryptic? A binary encoded compress format is smaller and just as readable. SF, sounds like single float to me, I'd propose file. FN, sounds like filename, I'd propose line. FNDA, can't even guess at what it means, I'd propose fcount. BA, I'd propose branch, for 0, notexe, for 1, nottaken, for 2 taken. DA, I'd propose lcount I'd propose fcount be listed as fname,ecount, to match the ordering of lcount. Also, implicit in the name, is the ordering f, then count, l, then count. I think if you showed the above to 100 people that had used gcov before, I think we'd be up past 90% of the people able to figure it out, without reading the doc.
Re: [RFC] Split -mrecip
On Wed, Oct 5, 2011 at 6:19 PM, Michael Matz wrote: >> Looking at this topic again, I'd propose that x86 adopts approach from >> rs6000. The rs6000 approach is more extensible, and offers the same >> flexibility, due to "!". >> >> So, x86 could have "-mrecip=", with all, default, none, div, >> vec-div, divf, vec-divf, rsqrt, etc ... combinations, > > Like so. Currently regstrapping on x86_64-linux. Okay if that succeeds? OK, with a nit - I'd introduce RECIP_MASK_NONE and use it in place of 0 in a couple of places. Thanks, Uros.
Re: [PATCH] Fix PR46556 (poor address generation)
On Wed, 2011-10-05 at 18:29 +0200, Steven Bosscher wrote: > On Wed, Oct 5, 2011 at 6:13 PM, William J. Schmidt > wrote: > >* tree-ssa-loop-ivopts.c (copy_ref_info): Remove static token. > > Rather than this, why not move the function to common code somewhere? > > Ciao! > Steven An alternative would be to move it into tree-ssa-address.c, where there is already a simpler version called copy_mem_ref_info. I'm open to that if it's preferable. Bill
Re: Initial shrink-wrapping patch
On 10/05/11 18:21, Richard Henderson wrote: > On 10/05/2011 08:59 AM, Bernd Schmidt wrote: >> Bootstrapping the following now. Ok? (Alternatively, could keep the >> redzone logic, but make it depend on !flag_shrink_wrap). > > It's a good space-saving optimization, that redzone logic. We > should be able to keep it based on crtl->shrink_wrapped; that > does appear to get set before we actually emit the epilogue. I've committed the following after a x86_64-linux bootstrap. Bernd PR bootstrap/50621 * config/i386/i386.c (ix86_add_cfa_restore_note): Omit notes only if the function was not shrink-wrapped. (ix86_expand_epilogue): Ensure queued cfa_adjust notes are attached to an insn. * function.c (thread_prologue_and_epilogue_insns): Make sure the shrink_wrapped flag is set even if there is no dump file. Index: gcc/config/i386/i386.c === --- gcc/config/i386/i386.c (revision 179553) +++ gcc/config/i386/i386.c (working copy) @@ -9134,7 +9134,8 @@ static GTY(()) rtx queued_cfa_restores; static void ix86_add_cfa_restore_note (rtx insn, rtx reg, HOST_WIDE_INT cfa_offset) { - if (cfa_offset <= cfun->machine->fs.red_zone_offset) + if (!crtl->shrink_wrapped + && cfa_offset <= cfun->machine->fs.red_zone_offset) return; if (insn) @@ -10738,6 +10739,8 @@ ix86_expand_epilogue (int style) GEN_INT (m->fs.sp_offset - UNITS_PER_WORD), style, true); } + else +ix86_add_queued_cfa_restore_notes (get_last_insn ()); /* Sibcall epilogues don't want a return instruction. */ if (style == 0) Index: gcc/function.c === --- gcc/function.c (revision 179553) +++ gcc/function.c (working copy) @@ -5741,10 +5741,11 @@ thread_prologue_and_epilogue_insns (void if (dump_file) fprintf (dump_file, "Shrink-wrapping aborted due to clobber.\n"); } - else if (dump_file && entry_edge != orig_entry_edge) + else if (entry_edge != orig_entry_edge) { crtl->shrink_wrapped = true; - fprintf (dump_file, "Performing shrink-wrapping.\n"); + if (dump_file) + fprintf (dump_file, "Performing shrink-wrapping.\n"); } fail_shrinkwrap:
Re: Initial shrink-wrapping patch
On 10/05/11 00:29, Richard Henderson wrote: > As a followup, I think this option needs to be disabled for profiling > and profile_after_prologue. Should be a mere matter of frobbing the > options at startup. The other code seems to test crtl->profile rather than an option flag, so how's this? Bootstrapped along with the x86_64 fix. Bernd * function.c (thread_prologue_and_epilogue_insns): Don't shrink-wrap if profiling after the prologue. Index: gcc/function.c === --- gcc/function.c (revision 179553) +++ gcc/function.c (working copy) @@ -5571,6 +5571,7 @@ thread_prologue_and_epilogue_insns (void } if (flag_shrink_wrap && HAVE_simple_return + && (targetm.profile_before_prologue () || !crtl->profile) && nonempty_prologue && !crtl->calls_eh_return) { HARD_REG_SET prologue_clobbered, prologue_used, live_on_edge;
[PATCH, i386]: A couple of cleanups.
Hello! No functional change. 2011-10-05 Uros Bizjak * config/i386/i386.c (distance_non_agu_define): Simplify calculation of "found". Simplify return value calculation. (distance_agu_use): Ditto. Tested on x86_64-pc-linux-gnu {,-m32} with shrink-wrapping disabled, committed to mainline SVN. Uros. Index: i386.c === --- i386.c (revision 179558) +++ i386.c (working copy) @@ -16139,19 +16139,20 @@ distance_non_agu_define (unsigned int regno1, unsi FOR_EACH_EDGE (e, ei, bb->preds) { - int bb_dist = distance_non_agu_define_in_bb (regno1, regno2, - insn, distance, - BB_END (e->src), - &found_in_bb); + int bb_dist + = distance_non_agu_define_in_bb (regno1, regno2, +insn, distance, +BB_END (e->src), +&found_in_bb); if (found_in_bb) { if (shortest_dist < 0) shortest_dist = bb_dist; else if (bb_dist > 0) shortest_dist = MIN (bb_dist, shortest_dist); - } - found = found || found_in_bb; + found = true; + } } distance = shortest_dist; @@ -16164,11 +16165,9 @@ distance_non_agu_define (unsigned int regno1, unsi extract_insn_cached (insn); if (!found) -distance = -1; - else -distance = distance >> 1; +return -1; - return distance; + return distance >> 1; } /* Return the distance in half-cycles between INSN and the next @@ -16181,9 +16180,9 @@ distance_non_agu_define (unsigned int regno1, unsi found and false otherwise. */ static int -distance_agu_use_in_bb(unsigned int regno, - rtx insn, int distance, rtx start, - bool *found, bool *redefined) +distance_agu_use_in_bb (unsigned int regno, + rtx insn, int distance, rtx start, + bool *found, bool *redefined) { basic_block bb = start ? BLOCK_FOR_INSN (start) : NULL; rtx next = start; @@ -16268,18 +16267,19 @@ distance_agu_use (unsigned int regno0, rtx insn) FOR_EACH_EDGE (e, ei, bb->succs) { - int bb_dist = distance_agu_use_in_bb (regno0, insn, - distance, BB_HEAD (e->dest), - &found_in_bb, &redefined_in_bb); + int bb_dist + = distance_agu_use_in_bb (regno0, insn, + distance, BB_HEAD (e->dest), + &found_in_bb, &redefined_in_bb); if (found_in_bb) { if (shortest_dist < 0) shortest_dist = bb_dist; else if (bb_dist > 0) shortest_dist = MIN (bb_dist, shortest_dist); - } - found = found || found_in_bb; + found = true; + } } distance = shortest_dist; @@ -16287,11 +16287,9 @@ distance_agu_use (unsigned int regno0, rtx insn) } if (!found || redefined) -distance = -1; - else -distance = distance >> 1; +return -1; - return distance; + return distance >> 1; } /* Define this macro to tune LEA priority vs ADD, it take effect when @@ -16346,7 +16344,7 @@ ix86_lea_outperforms (rtx insn, unsigned int regno false otherwise. */ static bool -ix86_ok_to_clobber_flags(rtx insn) +ix86_ok_to_clobber_flags (rtx insn) { basic_block bb = BLOCK_FOR_INSN (insn); df_ref *use;
Re: [PATCH] Fix PR46556 (poor address generation)
On Wed, 2011-10-05 at 18:21 +0200, Paolo Bonzini wrote: > On 10/05/2011 06:13 PM, William J. Schmidt wrote: > > One other general question about the pattern-match transformation: Is > > this an appropriate transformation for all targets, or should it be > > somehow gated on available addressing modes on the target processor? > > > > Bootstrapped and regression tested on powerpc64-linux-gnu. Verified no > > performance degradations on that target for SPEC CPU2000 and CPU2006. > > > > I'm looking for eventual approval for trunk after any comments are > > resolved. Thanks! > > How do the costs look like for the two transforms you mention in the > head comment of locally_poor_mem_replacement? > > Paolo > I don't know off the top of my head -- I'll have to gather that information. The issue is that the profitability is really context-sensitive, so just the isolated costs of insns aren't enough. The forward propagation of the add into (mem (reg REG)) looks like a slam dunk in the absence of other information, but if there are other nearby references using nonzero offsets from REG, this just extends the lifetimes of X and Y without eliminating the need for REG.
[PATCH, testsuite]: Fix UNRESOLVED: gcc.dg/vect/vec-scal-opt*.c scan-tree-dump-times veclower
Hello! 2011-10-05 Uros Bizjak * gcc.dg/vect/vect.exp (VEC_CFLAGS): Move initialization after DEFAULT_VECTFLAGS initialization. Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN. Uros. Index: gcc.dg/vect/vect.exp === --- gcc.dg/vect/vect.exp(revision 179561) +++ gcc.dg/vect/vect.exp(working copy) @@ -39,15 +39,15 @@ return } -global VEC_FLAGS -set VEC_FLAGS $DEFAULT_VECTCFLAGS - # These flags are used for all targets. lappend DEFAULT_VECTCFLAGS "-ftree-vectorize" "-fno-vect-cost-model" # Initialize `dg'. dg-init +global VEC_FLAGS +set VEC_FLAGS $DEFAULT_VECTCFLAGS + global O1_VECTCFLAGS set O1_VECTCFLAGS $DEFAULT_VECTCFLAGS lappend O1_VECTCFLAGS "-O1"
[PATCH 0/3] Fix vector shuffle problems
The first problem is that the generic lowering to scalar implementation didn't match the documentation in that the shuffle indicies are to be masked to the range of the inputs. Or, perhaps more exactly, the generic lowering didn't match the SSE implementation which does do the masking. The OpenCL spec from whence this feature is taken does explicitly say that the masking should happen; our current documentation in extend.texi isn't quite as clear on this point. The second problem is that the SSE implementation was broken. It simply didn't work much of the time. This was masked by... The third problem is that the test cases weren't testing what they were intended to test. In particular, with optimization the compiler was able to constant-propagate away many of the shuffling and tests. Which might be an interesting missed-optimization test, had they actually been constructed differently. But what actually needed testing first is that the operation actually works at all. Tested on x86_64 with check-gcc//unix/{,-mssse3,-msse4} Hopefully one of the AMD guys can test on a bulldozer with -mxop? Committed. r~ Richard Henderson (3): Fix lower_vec_shuffle. i386: Rewrite ix86_expand_vshuffle. Fix vect-shuffle-* test cases. gcc/ChangeLog | 14 + gcc/config/i386/i386-protos.h |2 +- gcc/config/i386/i386.c | 208 +++ gcc/config/i386/sse.md |4 +- gcc/testsuite/ChangeLog| 11 + .../gcc.c-torture/execute/vect-shuffle-1.c | 98 +--- .../gcc.c-torture/execute/vect-shuffle-2.c | 96 +--- .../gcc.c-torture/execute/vect-shuffle-3.c | 90 --- .../gcc.c-torture/execute/vect-shuffle-4.c | 99 .../gcc.c-torture/execute/vect-shuffle-5.c | 113 .../gcc.c-torture/execute/vect-shuffle-6.c | 64 + .../gcc.c-torture/execute/vect-shuffle-7.c | 70 + .../gcc.c-torture/execute/vect-shuffle-8.c | 55 gcc/tree-vect-generic.c| 276 +++- 14 files changed, 698 insertions(+), 502 deletions(-) create mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-6.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-7.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-8.c -- 1.7.6.4
[PATCH 1/3] Fix lower_vec_shuffle.
1: It can never fail. 2: It should mask the input indicies. --- gcc/ChangeLog |8 ++ gcc/tree-vect-generic.c | 276 +-- 2 files changed, 107 insertions(+), 177 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 4bdda3d..a88854d 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,11 @@ +2011-10-05 Richard Henderson + + * tree-vect-generic.c (vector_element): Never fail. Use + build_zero_cst. Tidy up type references. + (lower_vec_shuffle): Never fail. Mask shuffle indicies. Reduce + code duplication. Do update_stmt here ... + (expand_vector_operations_1): ... not here. + 2011-10-05 DJ Delorie Nick Clifton diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c index 1a6d1f1..ef03b35 100644 --- a/gcc/tree-vect-generic.c +++ b/gcc/tree-vect-generic.c @@ -512,22 +512,29 @@ type_for_widest_vector_mode (enum machine_mode inner_mode, optab op, int satp) static tree vector_element (gimple_stmt_iterator *gsi, tree vect, tree idx, tree *ptmpvec) { - tree type; + tree vect_type, vect_elt_type; gimple asgn; tree tmpvec; tree arraytype; bool need_asgn = true; + unsigned int elements; - gcc_assert (TREE_CODE (TREE_TYPE (vect)) == VECTOR_TYPE); + vect_type = TREE_TYPE (vect); + vect_elt_type = TREE_TYPE (vect_type); + elements = TYPE_VECTOR_SUBPARTS (vect_type); - type = TREE_TYPE (vect); if (TREE_CODE (idx) == INTEGER_CST) { unsigned HOST_WIDE_INT index; - if (!host_integerp (idx, 1) - || (index = tree_low_cst (idx, 1)) > TYPE_VECTOR_SUBPARTS (type)-1) -return error_mark_node; + /* Given that we're about to compute a binary modulus, +we don't care about the high bits of the value. */ + index = TREE_INT_CST_LOW (idx); + if (!host_integerp (idx, 1) || index >= elements) + { + index &= elements - 1; + idx = build_int_cst (TREE_TYPE (idx), index); + } if (TREE_CODE (vect) == VECTOR_CST) { @@ -536,33 +543,30 @@ vector_element (gimple_stmt_iterator *gsi, tree vect, tree idx, tree *ptmpvec) for (i = 0; vals; vals = TREE_CHAIN (vals), ++i) if (i == index) return TREE_VALUE (vals); - return error_mark_node; + return build_zero_cst (vect_elt_type); } else if (TREE_CODE (vect) == CONSTRUCTOR) { unsigned i; - VEC (constructor_elt, gc) *vals = CONSTRUCTOR_ELTS (vect); - constructor_elt *elt; + tree elt_i, elt_v; - for (i = 0; VEC_iterate (constructor_elt, vals, i, elt); i++) -if (operand_equal_p (elt->index, idx, 0)) - return elt->value; - return fold_convert (TREE_TYPE (type), integer_zero_node); + FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (vect), i, elt_i, elt_v) +if (operand_equal_p (elt_i, idx, 0)) + return elt_v; + return build_zero_cst (vect_elt_type); } - else if (TREE_CODE (vect) == SSA_NAME) + else { - tree size = TYPE_SIZE (TREE_TYPE (type)); + tree size = TYPE_SIZE (vect_elt_type); tree pos = fold_build2 (MULT_EXPR, TREE_TYPE (idx), idx, size); - return fold_build3 (BIT_FIELD_REF, TREE_TYPE (type), vect, size, pos); + return fold_build3 (BIT_FIELD_REF, vect_elt_type, vect, size, pos); } - else - return error_mark_node; } if (!ptmpvec) -tmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp"); +tmpvec = create_tmp_var (vect_type, "vectmp"); else if (!*ptmpvec) -tmpvec = *ptmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp"); +tmpvec = *ptmpvec = create_tmp_var (vect_type, "vectmp"); else { tmpvec = *ptmpvec; @@ -576,17 +580,14 @@ vector_element (gimple_stmt_iterator *gsi, tree vect, tree idx, tree *ptmpvec) gsi_insert_before (gsi, asgn, GSI_SAME_STMT); } - arraytype = build_array_type_nelts (TREE_TYPE (type), - TYPE_VECTOR_SUBPARTS (TREE_TYPE (vect))); - - return build4 (ARRAY_REF, TREE_TYPE (type), + arraytype = build_array_type_nelts (vect_elt_type, elements); + return build4 (ARRAY_REF, vect_elt_type, build1 (VIEW_CONVERT_EXPR, arraytype, tmpvec), idx, NULL_TREE, NULL_TREE); } /* Check if VEC_SHUFFLE_EXPR within the given setting is supported - by hardware, or lower it piecewise. Function returns false when - the expression must be replaced with TRAP_RETURN, true otherwise. + by hardware, or lower it piecewise. When VEC_SHUFFLE_EXPR has the same first and second operands: VEC_SHUFFLE_EXPR the lowered version would be @@ -597,162 +598,96 @@ vector_element (gimple_stmt_iterator *gsi, tree vect, tree idx, tree *ptmpvec) {mask[0] < len(v0) ? v0[mask[0]] : v1[mask[0]], ...} V0 and V1 must have the
[PATCH 2/3] i386: Rewrite ix86_expand_vshuffle.
1: Handle TARGET_XOP. 2: Reduce code duplication. 3: Use ASHIFT instead of MULT for scaling. 4: Fix errors in building convert-to-v16qi indicies. 5: Handle v2di without sse4.1. --- gcc/ChangeLog |6 + gcc/config/i386/i386-protos.h |2 +- gcc/config/i386/i386.c| 208 - gcc/config/i386/sse.md|4 +- 4 files changed, 109 insertions(+), 111 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index a88854d..4b5816d 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -6,6 +6,12 @@ code duplication. Do update_stmt here ... (expand_vector_operations_1): ... not here. + * config/i386/i386.c (ix86_expand_vshuffle): Never fail. Handle + TARGET_XOP. Fix pshufb constant vector creation. Reduce code + duplication. Handle V2DI without SSE4.1. + * config/i386/i386-protos.h (ix86_expand_vshuffle): Update decl. + * config/i386/i386.md (vshuffle): Remove assert for ok. + 2011-10-05 DJ Delorie Nick Clifton diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 99327ed..0bbfa9b 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -123,7 +123,7 @@ extern bool ix86_expand_int_movcc (rtx[]); extern bool ix86_expand_fp_movcc (rtx[]); extern bool ix86_expand_fp_vcond (rtx[]); extern bool ix86_expand_int_vcond (rtx[]); -extern bool ix86_expand_vshuffle (rtx[]); +extern void ix86_expand_vshuffle (rtx[]); extern void ix86_expand_sse_unpack (rtx[], bool, bool); extern bool ix86_expand_int_addcc (rtx[]); extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 4c1db3a..80a9e73 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -19236,145 +19236,139 @@ ix86_expand_int_vcond (rtx operands[]) return true; } -bool +void ix86_expand_vshuffle (rtx operands[]) { rtx target = operands[0]; rtx op0 = operands[1]; rtx op1 = operands[2]; rtx mask = operands[3]; - rtx new_mask, vt, t1, t2, w_vector; + rtx vt, vec[16]; enum machine_mode mode = GET_MODE (op0); enum machine_mode maskmode = GET_MODE (mask); - enum machine_mode maskinner = GET_MODE_INNER (mode); - rtx vec[16]; - int w, i, j; - bool one_operand_shuffle = op0 == op1; + int w, e, i; + bool one_operand_shuffle = rtx_equal_p (op0, op1); - gcc_assert ((TARGET_SSSE3 || TARGET_AVX) && GET_MODE_BITSIZE (mode) == 128); + gcc_checking_assert (GET_MODE_BITSIZE (mode) == 128); /* Number of elements in the vector. */ - w = GET_MODE_BITSIZE (maskmode) / GET_MODE_BITSIZE (maskinner); - - /* generate w_vector = {w, w, ...} */ - for (i = 0; i < w; i++) -vec[i] = GEN_INT (w); - w_vector = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec)); - - /* mask = mask & {w-1, w-1, w-1,...} */ - for (i = 0; i < w; i++) -vec[i] = GEN_INT (w - 1); + w = GET_MODE_NUNITS (mode); + e = GET_MODE_UNIT_SIZE (mode); - vt = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec)); - new_mask = expand_simple_binop (maskmode, AND, mask, vt, - NULL_RTX, 0, OPTAB_DIRECT); - - /* If the original vector mode is V16QImode, we can just - use pshufb directly. */ - if (mode == V16QImode && one_operand_shuffle) + if (TARGET_XOP) { - t1 = gen_reg_rtx (V16QImode); - emit_insn (gen_ssse3_pshufbv16qi3 (t1, op0, new_mask)); - emit_insn (gen_rtx_SET (VOIDmode, target, t1)); - return true; + /* The XOP VPPERM insn supports three inputs. By ignoring the +one_operand_shuffle special case, we avoid creating another +set of constant vectors in memory. */ + one_operand_shuffle = false; + + /* mask = mask & {2*w-1, ...} */ + vt = GEN_INT (2*w - 1); } - else if (mode == V16QImode) + else { - rtx xops[6]; - - t1 = gen_reg_rtx (V16QImode); - t2 = gen_reg_rtx (V16QImode); - emit_insn (gen_ssse3_pshufbv16qi3 (t1, op0, new_mask)); - emit_insn (gen_ssse3_pshufbv16qi3 (t2, op1, new_mask)); - - /* mask = mask & {w, w, ...} */ - mask = expand_simple_binop (V16QImode, AND, mask, w_vector, - NULL_RTX, 0, OPTAB_DIRECT); - xops[0] = target; - xops[1] = operands[1]; - xops[2] = operands[2]; - xops[3] = gen_rtx_EQ (mode, mask, w_vector); - xops[4] = t1; - xops[5] = t2; - - return ix86_expand_int_vcond (xops); + /* mask = mask & {w-1, ...} */ + vt = GEN_INT (w - 1); } - /* mask = mask * {w, w, ...} */ - new_mask = expand_simple_binop (maskmode, MULT, new_mask, w_vector, - NULL_RTX, 0, OPTAB_DIRECT); - - /* Convert mask to vector of chars. */ - new_mask = simplify_gen_subreg (V16QImode, new_mask, maskmode, 0); - new_mask = force_reg (V16QImode, new_mask); - - /* Build a helper mask wich we will use in pshufb - (v4si) --> {0,0
[PATCH 3/3] Fix vect-shuffle-* test cases.
--- gcc/testsuite/ChangeLog| 11 ++ .../gcc.c-torture/execute/vect-shuffle-1.c | 98 +++--- .../gcc.c-torture/execute/vect-shuffle-2.c | 96 +++-- .../gcc.c-torture/execute/vect-shuffle-3.c | 90 ++-- .../gcc.c-torture/execute/vect-shuffle-4.c | 99 +- .../gcc.c-torture/execute/vect-shuffle-5.c | 113 ++-- .../gcc.c-torture/execute/vect-shuffle-6.c | 64 +++ .../gcc.c-torture/execute/vect-shuffle-7.c | 70 .../gcc.c-torture/execute/vect-shuffle-8.c | 55 ++ 9 files changed, 482 insertions(+), 214 deletions(-) create mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-6.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-7.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-8.c diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 4a09d43..eff558f 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,14 @@ +2011-10-05 Richard Henderson + + * gcc.c-torture/execute/vect-shuffle-1.c: Rewrite. + * gcc.c-torture/execute/vect-shuffle-2.c: Rewrite. + * gcc.c-torture/execute/vect-shuffle-3.c: Rewrite. + * gcc.c-torture/execute/vect-shuffle-4.c: Rewrite. + * gcc.c-torture/execute/vect-shuffle-5.c: Rewrite. + * gcc.c-torture/execute/vect-shuffle-6.c: New test. + * gcc.c-torture/execute/vect-shuffle-7.c: New test. + * gcc.c-torture/execute/vect-shuffle-8.c: New test. + 2011-10-05 Richard Guenther PR tree-optimization/38885 diff --git a/gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c b/gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c index 20f0261..3b83636 100644 --- a/gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c +++ b/gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c @@ -1,46 +1,68 @@ -#define vector(elcount, type) \ -__attribute__((vector_size((elcount)*sizeof(type type +#if __SIZEOF_INT__ == 4 +typedef unsigned int V __attribute__((vector_size(16), may_alias)); -#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx)) +struct S +{ + V in, mask, out; +}; -#define shufcompare(type, count, vres, v0, mask) \ -do { \ -int __i; \ -for (__i = 0; __i < count; __i++) { \ -if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \ -__builtin_abort (); \ -} \ -} while (0) +struct S tests[] = { + { +{ 0x, 0x, 0x, 0x }, +{ 0, 1, 2, 3 }, +{ 0x, 0x, 0x, 0x }, + }, + { +{ 0x, 0x, 0x, 0x }, +{ 0+1*4, 1+2*4, 2+3*4, 3+4*4 }, +{ 0x, 0x, 0x, 0x }, + }, + { +{ 0x, 0x, 0x, 0x }, +{ 3, 2, 1, 0 }, +{ 0x, 0x, 0x, 0x }, + }, + { +{ 0x, 0x, 0x, 0x }, +{ 0, 3, 2, 1 }, +{ 0x, 0x, 0x, 0x }, + }, + { +{ 0x, 0x, 0x, 0x }, +{ 0, 2, 1, 3 }, +{ 0x, 0x, 0x, 0x }, + }, + { +{ 0x11223344, 0x55667788, 0x99aabbcc, 0xddeeff00 }, +{ 3, 1, 2, 0 }, +{ 0xddeeff00, 0x55667788, 0x99aabbcc, 0x11223344 }, + }, + { +{ 0x11223344, 0x55667788, 0x99aabbcc, 0xddeeff00 }, +{ 0, 0, 0, 0 }, +{ 0x11223344, 0x11223344, 0x11223344, 0x11223344 }, + }, + { +{ 0x11223344, 0x55667788, 0x99aabbcc, 0xddeeff00 }, +{ 1, 2, 1, 2 }, +{ 0x55667788, 0x99aabbcc, 0x55667788, 0x99aabbcc }, + } +}; +extern void abort(void); -int main (int argc, char *argv[]) { -/*vector (8, short) v0 = {argc, 1,2,3,4,5,6,7}; -vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7}; -vector (8, short) v2; +int main() +{ + int i; -vector (8, short) smask = {0,0,1,2,3,4,5,6}; + for (i = 0; i < sizeof(tests)/sizeof(tests[0]); ++i) +{ + V r = __builtin_shuffle(tests[i].in, tests[i].mask); + if (__builtin_memcmp(&r, &tests[i].out, sizeof(V)) != 0) + abort(); +} -v2 = __builtin_shuffle (v0, smask); -shufcompare (short, 8, v2, v0, smask); -v2 = __builtin_shuffle (v0, v1); -shufcompare (short, 8, v2, v0, v1); -v2 = __builtin_shuffle (smask, v0); -shufcompare (short, 8, v2, smask, v0);*/ - -vector (4, int) i0 = {argc, 1,2,3}; -vector (4, int) i1 = {argc, 1, argc, 3}; -vector (4, int) i2; - -vector (4, int) imask = {0,3,2,1}; - -/*i2 = __builtin_shuffle (i0, imask); -shufcompare (int, 4, i2, i0, imask);*/ -i2 = __builtin_shuffle (i0, i1); -shufcompare (int, 4, i2, i0, i1); - -i2 = __builtin_shuffle (imask, i0); -shufcompare (int, 4, i2, imask, i0); - -return 0; + return 0; } +#endif /* SIZEOF_INT */ diff --git a/gcc/testsuite/gcc.c-torture/execute/vect-
Re: Initial shrink-wrapping patch
On 10/05/2011 10:19 AM, Bernd Schmidt wrote: > * function.c (thread_prologue_and_epilogue_insns): Don't shrink-wrap > if profiling after the prologue. Ok. r~
[PATCH] Fix memory leak in vect_pattern_recog_1
Hi! If vect_recog_func fails (or the other spot where vect_pattern_recog_1 returns early), the vector allocated in the function isn't freed, leading to memory leak. But, more importantly, doing a VEC_alloc + VEC_free num_stmts_in_loop * NUM_PATTERNS times seems to be completely unnecessary, the following patch allocates just one vector for that purpose in the caller and only performs VEC_truncate in each call to make it independent from previous uses of the vector. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-10-05 Jakub Jelinek * tree-vect-patterns.c (vect_pattern_recog_1): Add stmts_to_replace argument, truncate it at the beginning instead of allocating there and freeing at the end. (vect_pattern_recog): Allocate stmts_to_replace here and free at end, pass its address to vect_pattern_recog_1. --- gcc/tree-vect-patterns.c.jj 2011-09-26 14:06:52.0 +0200 +++ gcc/tree-vect-patterns.c2011-10-05 15:57:38.0 +0200 @@ -1281,7 +1281,8 @@ vect_mark_pattern_stmts (gimple orig_stm static void vect_pattern_recog_1 ( gimple (* vect_recog_func) (VEC (gimple, heap) **, tree *, tree *), - gimple_stmt_iterator si) + gimple_stmt_iterator si, + VEC (gimple, heap) **stmts_to_replace) { gimple stmt = gsi_stmt (si), pattern_stmt; stmt_vec_info stmt_info; @@ -1291,14 +1292,14 @@ vect_pattern_recog_1 ( enum tree_code code; int i; gimple next; - VEC (gimple, heap) *stmts_to_replace = VEC_alloc (gimple, heap, 1); - VEC_quick_push (gimple, stmts_to_replace, stmt); - pattern_stmt = (* vect_recog_func) (&stmts_to_replace, &type_in, &type_out); + VEC_truncate (gimple, *stmts_to_replace, 0); + VEC_quick_push (gimple, *stmts_to_replace, stmt); + pattern_stmt = (* vect_recog_func) (stmts_to_replace, &type_in, &type_out); if (!pattern_stmt) return; - stmt = VEC_last (gimple, stmts_to_replace); + stmt = VEC_last (gimple, *stmts_to_replace); stmt_info = vinfo_for_stmt (stmt); loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); @@ -1363,8 +1364,8 @@ vect_pattern_recog_1 ( /* It is possible that additional pattern stmts are created and inserted in STMTS_TO_REPLACE. We create a stmt_info for each of them, and mark the relevant statements. */ - for (i = 0; VEC_iterate (gimple, stmts_to_replace, i, stmt) - && (unsigned) i < (VEC_length (gimple, stmts_to_replace) - 1); + for (i = 0; VEC_iterate (gimple, *stmts_to_replace, i, stmt) + && (unsigned) i < (VEC_length (gimple, *stmts_to_replace) - 1); i++) { stmt_info = vinfo_for_stmt (stmt); @@ -1377,8 +1378,6 @@ vect_pattern_recog_1 ( vect_mark_pattern_stmts (stmt, pattern_stmt, NULL_TREE); } - - VEC_free (gimple, heap, stmts_to_replace); } @@ -1468,6 +1467,7 @@ vect_pattern_recog (loop_vec_info loop_v gimple_stmt_iterator si; unsigned int i, j; gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *); + VEC (gimple, heap) *stmts_to_replace = VEC_alloc (gimple, heap, 1); if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "=== vect_pattern_recog ==="); @@ -1483,8 +1483,11 @@ vect_pattern_recog (loop_vec_info loop_v for (j = 0; j < NUM_PATTERNS; j++) { vect_recog_func_ptr = vect_vect_recog_func_ptrs[j]; - vect_pattern_recog_1 (vect_recog_func_ptr, si); + vect_pattern_recog_1 (vect_recog_func_ptr, si, + &stmts_to_replace); } } } + + VEC_free (gimple, heap, stmts_to_replace); } Jakub
Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD
On Wed, Oct 5, 2011 at 12:49 PM, Kirill Yukhin wrote: > We're prepared and bunch of tests which checks autogeneration of FMA3 > instructions family. > FMA3 typo in .md file is fixed as well (it was catched by tests). > > ChangeLog entry: > > 2011-10-04 Kirill Yukhin > Yakovlev Vladimir > > * config/i386/sse.md (fma_fnmsub_): Fix a typo. > > testsuite/ChangeLog entry: > > 2011-10-04 Kirill Yukhin > Yakovlev Vladimir > > * gcc.target/i386/fma_1.h: New test. > * gcc.target/i386/fma_2.h: Ditto. > * gcc.target/i386/fma_3.h: Ditto. > * gcc.target/i386/fma_4.h: Ditto. > * gcc.target/i386/fma_5.h: Ditto. > * gcc.target/i386/fma_6.h: Ditto. Just say "New file." for various headers. They are not standalone tests... > * gcc.target/i386/fma_double_1.c: Ditto. > * gcc.target/i386/fma_double_2.c: Ditto. > * gcc.target/i386/fma_double_3.c: Ditto. > * gcc.target/i386/fma_double_4.c: Ditto. > * gcc.target/i386/fma_double_5.c: Ditto. > * gcc.target/i386/fma_double_6.c: Ditto. > * gcc.target/i386/fma_float_1.c: Ditto. > * gcc.target/i386/fma_float_2.c: Ditto. > * gcc.target/i386/fma_float_3.c: Ditto. > * gcc.target/i386/fma_float_4.c: Ditto. > * gcc.target/i386/fma_float_5.c: Ditto. > * gcc.target/i386/fma_float_6.c: Ditto. > * gcc.target/i386/fma_main.h: Ditto. > * gcc.target/i386/fma_run_double_1.c: Ditto. > * gcc.target/i386/fma_run_double_2.c: Ditto. > * gcc.target/i386/fma_run_double_3.c: Ditto. > * gcc.target/i386/fma_run_double_4.c: Ditto. > * gcc.target/i386/fma_run_double_5.c: Ditto. > * gcc.target/i386/fma_run_double_6.c: Ditto. > * gcc.target/i386/fma_run_double_results_1.h: Ditto. > * gcc.target/i386/fma_run_double_results_2.h: Ditto. > * gcc.target/i386/fma_run_double_results_3.h: Ditto. > * gcc.target/i386/fma_run_double_results_4.h: Ditto. > * gcc.target/i386/fma_run_double_results_5.h: Ditto. > * gcc.target/i386/fma_run_double_results_6.h: Ditto. > * gcc.target/i386/fma_run_float_1.c: Ditto. > * gcc.target/i386/fma_run_float_2.c: Ditto. > * gcc.target/i386/fma_run_float_3.c: Ditto. > * gcc.target/i386/fma_run_float_4.c: Ditto. > * gcc.target/i386/fma_run_float_5.c: Ditto. > * gcc.target/i386/fma_run_float_6.c: Ditto. > * gcc.target/i386/fma_run_float_results_1.h: Ditto. > * gcc.target/i386/fma_run_float_results_2.h: Ditto. > * gcc.target/i386/fma_run_float_results_3.h: Ditto. > * gcc.target/i386/fma_run_float_results_4.h: Ditto. > * gcc.target/i386/fma_run_float_results_5.h: Ditto. > * gcc.target/i386/fma_run_float_results_6.h: Ditto. > * gcc.target/i386/l_fma_1.h: Ditto. > * gcc.target/i386/l_fma_2.h: Ditto. > * gcc.target/i386/l_fma_3.h: Ditto. > * gcc.target/i386/l_fma_4.h: Ditto. > * gcc.target/i386/l_fma_5.h: Ditto. > * gcc.target/i386/l_fma_6.h: Ditto. > * gcc.target/i386/l_fma_double_1.c: Ditto. > * gcc.target/i386/l_fma_double_2.c: Ditto. > * gcc.target/i386/l_fma_double_3.c: Ditto. > * gcc.target/i386/l_fma_double_4.c: Ditto. > * gcc.target/i386/l_fma_double_5.c: Ditto. > * gcc.target/i386/l_fma_double_6.c: Ditto. > * gcc.target/i386/l_fma_float_1.c: Ditto. > * gcc.target/i386/l_fma_float_2.c: Ditto. > * gcc.target/i386/l_fma_float_3.c: Ditto. > * gcc.target/i386/l_fma_float_4.c: Ditto. > * gcc.target/i386/l_fma_float_5.c: Ditto. > * gcc.target/i386/l_fma_float_6.c: Ditto. > * gcc.target/i386/l_fma_main.h: Ditto. > * gcc.target/i386/l_fma_run_double_1.c: Ditto. > * gcc.target/i386/l_fma_run_double_2.c: Ditto. > * gcc.target/i386/l_fma_run_double_3.c: Ditto. > * gcc.target/i386/l_fma_run_double_4.c: Ditto. > * gcc.target/i386/l_fma_run_double_5.c: Ditto. > * gcc.target/i386/l_fma_run_double_6.c: Ditto. > * gcc.target/i386/l_fma_run_float_1.c: Ditto. > * gcc.target/i386/l_fma_run_float_2.c: Ditto. > * gcc.target/i386/l_fma_run_float_3.c: Ditto. > * gcc.target/i386/l_fma_run_float_4.c: Ditto. > * gcc.target/i386/l_fma_run_float_5.c: Ditto. > * gcc.target/i386/l_fma_run_float_6.c: Ditto. > > Could you please have a look? +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-require-effective-target fma } */ +/* { dg-options "-O2 -mfma" } */ { dg-require-effective-target fma } directive is not needed for compile-only tests, as they stop at generating assembly file. Also, you have disabled all tests on ia32 - unconditionally use "-O2 -mfma -mfpmath=sse" for dg-options, and these instructions will magically appear on all targets. Otherwise, the patch looks OK. Uros.
Re: Modify gcc for use with gdb (issue5132047)
On Oct 5, 2011, at 6:18 AM, Diego Novillo wrote: > I think we need to find a solution for this situation. The solution Apple found and implemented is a __nodebug__ attribute, as can be seen in Apple's gcc. We use it like so: #define __always_inline__ __always_inline__, __nodebug__ #undef __always_inline__ in headers like mmintrn.h: __STATIC_INLINE void __attribute__((__always_inline__)) /* APPLE LOCAL end radar 5618945 */ _mm_empty (void) { __builtin_ia32_emms (); } to disappear the debug information for all the routines, so that the context is the context in which the routine is called (because the routine is always inlined). It is implemented and works great. Easy to use and understand. Since we use the #define, #undef around the functions, it is mostly equivalent to #pragma, though, it does attach a little closer to the function. You can strip #.* from the .i files, and not have it removed (which is nice).
Re: Modify gcc for use with gdb (issue5132047)
On Wed, Oct 5, 2011 at 14:20, Mike Stump wrote: > On Oct 5, 2011, at 6:18 AM, Diego Novillo wrote: >> I think we need to find a solution for this situation. > > The solution Apple found and implemented is a __nodebug__ attribute, as can > be seen in Apple's gcc. > > We use it like so: > > #define __always_inline__ __always_inline__, __nodebug__ > #undef __always_inline__ > > in headers like mmintrn.h: > > __STATIC_INLINE void __attribute__((__always_inline__)) > /* APPLE LOCAL end radar 5618945 */ > _mm_empty (void) > { > __builtin_ia32_emms (); > } Ah, nice. Though, one of the things I am liking more and more about the blacklist solution is that it (a) does not need any modifications to the source code, and (b) it works with no-inline functions as well. This gives total control to the developer. I would blacklist a bunch of functions I never care to go into, for instance. Others may choose to blacklist a different set. And you can change that from debug session to the next. I agree with Jakub that artificial functions should be blacklisted automatically, however. Richi, Jakub, if the blacklist solution was implemented in GCC would you agree with promoting these macros into inline functions? This is orthogonal to http://sourceware.org/bugzilla/show_bug.cgi?id=13263, of course. Thanks. Diego.
Re: [PATCH] Fix PR46556 (poor address generation)
On 10/05/2011 07:22 PM, William J. Schmidt wrote: I don't know off the top of my head -- I'll have to gather that information. The issue is that the profitability is really context-sensitive, so just the isolated costs of insns aren't enough. The forward propagation of the add into (mem (reg REG)) looks like a slam dunk in the absence of other information, but if there are other nearby references using nonzero offsets from REG, this just extends the lifetimes of X and Y without eliminating the need for REG. True, however there are other passes that do this kind of un-CSE and lifetime reduction. Paolo
Re: [PATCH, testsuite]: Fix UNRESOLVED: gcc.dg/vect/vec-scal-opt*.c scan-tree-dump-times veclower
On Wed, Oct 5, 2011 at 7:42 PM, Uros Bizjak wrote: > 2011-10-05 Uros Bizjak > > * gcc.dg/vect/vect.exp (VEC_CFLAGS): Move initialization after > DEFAULT_VECTFLAGS initialization. > Actually, these testcases need a bit more surgery... 2011-10-05 Uros Bizjak * gcc.dg/vect/vect.exp (VEC_CFLAGS): Move initialization after DEFAULT_VECTFLAGS initialization. Append "-fdump-tree-veclower2". * gcc.dg/vect/vec-scal-opt.c: Scan and cleanup veclower2 tree dump. * gcc.dg/vect/vec-scal-opt1.c: Ditto. * gcc.dg/vect/vec-scal-opt2.c: Ditto. Attached additional patch tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN. Uros. Index: gcc.dg/vect/vec-scal-opt1.c === --- gcc.dg/vect/vec-scal-opt1.c (revision 179565) +++ gcc.dg/vect/vec-scal-opt1.c (working copy) @@ -17,5 +17,5 @@ return vidx(short, r1, 0); } -/* { dg-final { scan-tree-dump-times ">> 2" 1 "veclower" { target vect_shift_scalar } } } */ -/* { dg-final { cleanup-tree-dump "veclower" } } */ +/* { dg-final { scan-tree-dump-times ">> 2" 1 "veclower2" { target vect_shift_scalar } } } */ +/* { dg-final { cleanup-tree-dump "veclower2" } } */ Index: gcc.dg/vect/vec-scal-opt2.c === --- gcc.dg/vect/vec-scal-opt2.c (revision 179565) +++ gcc.dg/vect/vec-scal-opt2.c (working copy) @@ -16,5 +16,5 @@ return vidx(short, r1, 0); } -/* { dg-final { scan-tree-dump-times ">> 2" 1 "veclower" { target vect_shift_scalar } } } */ -/* { dg-final { cleanup-tree-dump "veclower" } } */ +/* { dg-final { scan-tree-dump-times ">> 2" 1 "veclower2" { target vect_shift_scalar } } } */ +/* { dg-final { cleanup-tree-dump "veclower2" } } */ Index: gcc.dg/vect/vec-scal-opt.c === --- gcc.dg/vect/vec-scal-opt.c (revision 179565) +++ gcc.dg/vect/vec-scal-opt.c (working copy) @@ -19,5 +19,5 @@ return vidx(short, r1, 0); } -/* { dg-final { scan-tree-dump-times ">> k.\[0-9_\]*" 1 "veclower" { target vect_shift_scalar } } } */ -/* { dg-final { cleanup-tree-dump "veclower" } } */ +/* { dg-final { scan-tree-dump-times ">> k.\[0-9_\]*" 1 "veclower2" { target vect_shift_scalar } } } */ +/* { dg-final { cleanup-tree-dump "veclower2" } } */ Index: gcc.dg/vect/vect.exp === --- gcc.dg/vect/vect.exp(revision 179565) +++ gcc.dg/vect/vect.exp(working copy) @@ -64,8 +64,8 @@ dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/nodump-*.\[cS\]]] \ "" $DEFAULT_VECTCFLAGS -# "-O -fdump-tree-veclower" -lappend VEC_FLAGS "-O" "-fdump-tree-veclower" +# "-O -fdump-tree-veclower2" +lappend VEC_FLAGS "-O" "-fdump-tree-veclower2" dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vec-scal-*.\[cS\]]] \ "" $VEC_FLAGS
Fix htab lookup bug in reregister_specialization (issue5190046)
I found this bug while debugging a failure in pph images with template classes. When writing the decl_specializations table, the writer calls htab_elements() to output the length of the table. It then traverses the table with htab_traverse_noresize() to emit all the entries. The reader uses that value to know how many entries it needs to read. However, the table was out of sync wrt empty entries because reregister_specialization does a lookup using INSERT instead of NO_INSERT, so it was leaving a bunch of empty entries in it (thanks Jakub for helping me diagnose this). Since empty entries are never traversed by htab_traverse, they were never written out and the reader was trying to read more entries than there really were. Jason, I don't think this is affecting any bugs in trunk, but it looks worth fixing. OK for trunk? Diego. * pt.c (reregister_specialization): Call htab_find_slot with NO_INSERT. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 04e7767..2366dc9 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -1648,8 +1648,8 @@ reregister_specialization (tree spec, tree tinfo, tree new_spec) elt.args = TI_ARGS (tinfo); elt.spec = NULL_TREE; - slot = (spec_entry **) htab_find_slot (decl_specializations, &elt, INSERT); - if (*slot) + slot = (spec_entry **) htab_find_slot (decl_specializations, &elt, NO_INSERT); + if (slot && *slot) { gcc_assert ((*slot)->spec == spec || (*slot)->spec == new_spec); gcc_assert (new_spec != NULL_TREE); -- This patch is available for review at http://codereview.appspot.com/5190046
[pph] Fix template class tables streaming (issue5199041)
This fixes the problem I described in http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00376.html The tests are now failing assembly comparisons because the elements in these tables are emitted in different sequence in the pph and non-pph case. Other than that, the assembly produced is identical. Tested on x86_64. Committed to branch. Diego. cp/ChangeLog.pph * pt.c (reregister_specialization): Call htab_find_slot with NO_INSERT. testsuite/ChangeLog.pph * g++.dg/pph/x1tmplclass2.cc: Mark partially fixed. * g++.dg/pph/x4tmplclass2.cc: Likewise. * g++.dg/pph/z4tmplclass2.cc: Likewise. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 04e7767..2366dc9 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -1648,8 +1648,8 @@ reregister_specialization (tree spec, tree tinfo, tree new_spec) elt.args = TI_ARGS (tinfo); elt.spec = NULL_TREE; - slot = (spec_entry **) htab_find_slot (decl_specializations, &elt, INSERT); - if (*slot) + slot = (spec_entry **) htab_find_slot (decl_specializations, &elt, NO_INSERT); + if (slot && *slot) { gcc_assert ((*slot)->spec == spec || (*slot)->spec == new_spec); gcc_assert (new_spec != NULL_TREE); diff --git a/gcc/testsuite/g++.dg/pph/x1tmplclass2.cc b/gcc/testsuite/g++.dg/pph/x1tmplclass2.cc index f04335d..a254106 100644 --- a/gcc/testsuite/g++.dg/pph/x1tmplclass2.cc +++ b/gcc/testsuite/g++.dg/pph/x1tmplclass2.cc @@ -1,5 +1,3 @@ -// { dg-xfail-if "BOGUS" { "*-*-*" } { "-fpph-map=pph.map" } } -// { dg-bogus "x1tmplclass2.cc:1:0: internal compiler error: in pph_in_record_marker, at cp/pph-streamer.h" "" { xfail *-*-* } 0 } - +// pph asm xdiff 37711 #include "x0tmplclass23.h" #include "a0tmplclass2_u.h" diff --git a/gcc/testsuite/g++.dg/pph/x4tmplclass2.cc b/gcc/testsuite/g++.dg/pph/x4tmplclass2.cc index 585d4c0..e605e40 100644 --- a/gcc/testsuite/g++.dg/pph/x4tmplclass2.cc +++ b/gcc/testsuite/g++.dg/pph/x4tmplclass2.cc @@ -1,6 +1,4 @@ -// { dg-xfail-if "BOGUS" { "*-*-*" } { "-fpph-map=pph.map" } } -// { dg-bogus "x4tmplclass2.cc:1:0: internal compiler error: in pph_in_record_marker, at cp/pph-streamer.h" "" { xfail *-*-* } 0 } - +// pph asm xdiff 49533 #include "x0tmplclass21.h" #include "x0tmplclass22.h" #include "a0tmplclass2_u.h" diff --git a/gcc/testsuite/g++.dg/pph/z4tmplclass2.cc b/gcc/testsuite/g++.dg/pph/z4tmplclass2.cc index 0243829..d025942 100644 --- a/gcc/testsuite/g++.dg/pph/z4tmplclass2.cc +++ b/gcc/testsuite/g++.dg/pph/z4tmplclass2.cc @@ -1,6 +1,4 @@ -// { dg-xfail-if "BOGUS" { "*-*-*" } { "-fpph-map=pph.map" } } -// { dg-bogus "z4tmplclass2.cc:1:0: internal compiler error: in pph_in_record_marker, at cp/pph-streamer.h" "" { xfail *-*-* } 0 } - +// pph asm xdiff 59292 #include "x0tmplclass23.h" #include "x0tmplclass24.h" #include "a0tmplclass2_u.h" -- This patch is available for review at http://codereview.appspot.com/5199041
Re: Fix htab lookup bug in reregister_specialization (issue5190046)
On Wed, Oct 05, 2011 at 03:49:38PM -0400, Diego Novillo wrote: > Jason, I don't think this is affecting any bugs in trunk, but it looks > worth fixing. OK for trunk? Well, it can cause problems. Consider 3 entries in the hash table with the same hash. (1) inserted first, then (2), then (3), then (2) gets htab_remove_elt (decl_specializations has deletions on it too), so (2) in the hash table is overwritten with HTAB_DELETED_ENTRY. Next reregister_specialization is called with the same hash. It will find the slot (where (2) used to be stored), overwrites the HTAB_DELETED_ENTRY with HTAB_EMPTY_ENTRY (aka NULL) and return that slot, but as reregister_specialization doesn't store anything there, afterwards htab_find won't be able to find (3). BTW, register_specialization has the same problems. If slot != NULL and fn == NULL, it can still return without storing non-NULL into *slot (when optimize_specialization_lookup_p (tmpl) returns non-zero). You can do else if (slot != NULL && fn == NULL) htab_clear_slot (decl_specializations, slot); And, IMHO slot should be void **, otherwise there is aliasing violation. > --- a/gcc/cp/pt.c > +++ b/gcc/cp/pt.c > @@ -1648,8 +1648,8 @@ reregister_specialization (tree spec, tree tinfo, tree > new_spec) >elt.args = TI_ARGS (tinfo); >elt.spec = NULL_TREE; > > - slot = (spec_entry **) htab_find_slot (decl_specializations, &elt, INSERT); > - if (*slot) > + slot = (spec_entry **) htab_find_slot (decl_specializations, &elt, > NO_INSERT); > + if (slot && *slot) > { >gcc_assert ((*slot)->spec == spec || (*slot)->spec == new_spec); >gcc_assert (new_spec != NULL_TREE); If you never insert, it should be htab_find only (with spec_entry * instead of spec_entry ** variable). Jakub
Re: [PATCH] Fix PR46556 (poor address generation)
On Wed, 2011-10-05 at 21:01 +0200, Paolo Bonzini wrote: > On 10/05/2011 07:22 PM, William J. Schmidt wrote: > > I don't know off the top of my head -- I'll have to gather that > > information. The issue is that the profitability is really > > context-sensitive, so just the isolated costs of insns aren't enough. > > The forward propagation of the add into (mem (reg REG)) looks like a > > slam dunk in the absence of other information, but if there are other > > nearby references using nonzero offsets from REG, this just extends the > > lifetimes of X and Y without eliminating the need for REG. > > True, however there are other passes that do this kind of un-CSE and > lifetime reduction. > > Paolo OK, I see. If there's a better place downstream to make a swizzle, I'm certainly fine with that. I disabled locally_poor_mem_replacement and added some dump information in should_replace_address to show the costs for the replacement I'm trying to avoid: In should_replace_address: old_rtx = (reg/f:DI 125 [ D.2036 ]) new_rtx = (plus:DI (reg/v/f:DI 126 [ p ]) (reg:DI 128)) address_cost (old_rtx) = 0 address_cost (new_rtx) = 0 set_src_cost (old_rtx) = 0 set_src_cost (new_rtx) = 4 In insn 11, replacing (mem/s:SI (reg/f:DI 125 [ D.2036 ]) [2 p_1(D)->a S4 A32]) with (mem/s:SI (plus:DI (reg/v/f:DI 126 [ p ]) (reg:DI 128)) [2 p_1(D)->a S4 A32]) Changed insn 11 deferring rescan insn with uid = 11. deferring rescan insn with uid = 11. Hope this helps, Bill
Find more shrink-wrapping opportunities
This adds a little mini-pass to shrink-wrapping, to eliminate a common case that often makes shrink-wrapping unavailable. If a move insn copies an argument registers to a call-saved register, the prologue must be emitted before this insn. We should therefore try to delay such moves for as long as possible. A followup patch will add an extra cprop pass, which replaces uses of the call-saved register with uses of the incoming argument register, thereby allowing us to move the copies in even more cases. Bootstrapped and tested on i686-linux. Ok? Bernd * function.c (prepare_shrink_wrap, bb_active_p): New function. (thread_prologue_and_epilogue_insns): Use bb_active_p. Call prepare_shrink_wrap, then recompute bb_active_p for the last block. Index: gcc/function.c === --- gcc/function.c (revision 179577) +++ gcc/function.c (working copy) @@ -5352,6 +5352,127 @@ requires_stack_frame_p (rtx insn, HARD_R return true; return false; } + +/* Look for sets of call-saved registers in the first block of the + function, and move them down into successor blocks if the register + is used only on one path. This exposes more opportunities for + shrink-wrapping. + These kinds of sets often occur when incoming argument registers are + moved to call-saved registers because their values are live across + one or more calls during the function. */ + +static void +prepare_shrink_wrap (basic_block entry_block) +{ + rtx insn, curr; + FOR_BB_INSNS_SAFE (entry_block, insn, curr) +{ + basic_block next_bb; + edge e, live_edge; + edge_iterator ei; + rtx set, scan; + unsigned destreg, srcreg; + + if (!NONDEBUG_INSN_P (insn)) + continue; + set = single_set (insn); + if (!set) + continue; + + if (!REG_P (SET_SRC (set)) || !REG_P (SET_DEST (set))) + continue; + srcreg = REGNO (SET_SRC (set)); + destreg = REGNO (SET_DEST (set)); + if (hard_regno_nregs[srcreg][GET_MODE (SET_SRC (set))] > 1 + || hard_regno_nregs[destreg][GET_MODE (SET_DEST (set))] > 1) + continue; + + next_bb = entry_block; + scan = insn; + + for (;;) + { + live_edge = NULL; + FOR_EACH_EDGE (e, ei, next_bb->succs) + { + if (REGNO_REG_SET_P (df_get_live_in (e->dest), destreg)) + { + if (live_edge) + { + live_edge = NULL; + break; + } + live_edge = e; + } + } + if (!live_edge) + break; + /* We can sometimes encounter dead code. Don't try to move it +into the exit block. */ + if (live_edge->dest == EXIT_BLOCK_PTR) + break; + if (EDGE_COUNT (live_edge->dest->preds) > 1) + break; + while (scan != BB_END (next_bb)) + { + scan = NEXT_INSN (scan); + if (NONDEBUG_INSN_P (scan)) + { + rtx link; + HARD_REG_SET set_regs; + + CLEAR_HARD_REG_SET (set_regs); + note_stores (PATTERN (scan), record_hard_reg_sets, + &set_regs); + if (CALL_P (scan)) + IOR_HARD_REG_SET (set_regs, call_used_reg_set); + for (link = REG_NOTES (scan); link; link = XEXP (link, 1)) + if (REG_NOTE_KIND (link) == REG_INC) + record_hard_reg_sets (XEXP (link, 0), NULL, &set_regs); + + if (TEST_HARD_REG_BIT (set_regs, srcreg) + || reg_referenced_p (SET_DEST (set), + PATTERN (scan))) + { + scan = NULL_RTX; + break; + } + if (CALL_P (scan)) + { + rtx link = CALL_INSN_FUNCTION_USAGE (scan); + while (link) + { + rtx tmp = XEXP (link, 0); + if (GET_CODE (tmp) == USE + && reg_referenced_p (SET_DEST (set), tmp)) + break; + link = XEXP (link, 1); + } + if (link) + { + scan = NULL_RTX; + break; + } + } + } + } + if (!scan) + break; + next_bb = live_edge->dest; + } + + if (next_bb != entry_block) + { + rtx after = BB_HEAD (next_bb); + while (!NOTE_P (after) +|| NOTE_KIND (after) != NOTE_INSN_BASIC_BLOCK) + after = NEXT_INSN (after); + emit_insn_after (PATTERN (insn), af
Re: Fix htab lookup bug in reregister_specialization (issue5190046)
On 10/05/2011 04:05 PM, Jakub Jelinek wrote: BTW, register_specialization has the same problems. If slot != NULL and fn == NULL, it can still return without storing non-NULL into *slot (when optimize_specialization_lookup_p (tmpl) returns non-zero). You can do else if (slot != NULL&& fn == NULL) htab_clear_slot (decl_specializations, slot); I don't think there's a problem in that function; if fn == NULL, then we store spec in its place. If you never insert, it should be htab_find only (with spec_entry * instead of spec_entry ** variable). Makes sense. Jason
Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
On 10/05/2011 10:46 AM, Richard Guenther wrote: > On Tue, Oct 4, 2011 at 6:28 PM, Tom de Vries wrote: >> On 10/04/2011 03:03 PM, Richard Guenther wrote: >>> On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries wrote: On 10/01/2011 05:46 PM, Tom de Vries wrote: > On 09/30/2011 03:29 PM, Richard Guenther wrote: >> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries >> wrote: >>> On 09/28/2011 11:53 AM, Richard Guenther wrote: On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries wrote: > Richard, > > I got a patch for PR50527. > > The patch prevents the alignment of vla-related allocas to be set to > BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after > folding > the alloca. > > Bootstrapped and regtested on x86_64. > > OK for trunk? Hmm. As gfortran with -fstack-arrays uses VLAs it's probably bad that the vectorizer then will no longer see that the arrays are properly aligned. I'm not sure what the best thing to do is here, other than trying to record the alignment requirement of the VLA somewhere. Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT has the issue that it will force stack-realignment which isn't free (and the point was to make the decl cheaper than the alloca). But that might possibly be the better choice. Any other thoughts? >>> >>> How about the approach in this (untested) patch? Using the DECL_ALIGN >>> of the vla >>> for the new array prevents stack realignment for folded vla-allocas, >>> also for >>> large vlas. >>> >>> This will not help in vectorizing large folded vla-allocas, but I think >>> it's not >>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although >>> that has >>> been the case up until we started to fold). If you want to trigger >>> vectorization >>> for a vla, you can still use the aligned attribute on the declaration. >>> >>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also >>> without using >>> an attribute on the decl. This patch exploits this by setting it at the >>> end of >>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective >>> in >>> propagation though, because although the ptr_info of the lhs is >>> propagated via >>> copy_prop afterwards, it's not propagated anymore via ccp. >>> >>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of >>> ccp2 and >>> not fold during ccp3. >> >> Ugh, somehow I like this the least ;) >> >> How about lowering VLAs to >> >> p = __builtin_alloca (...); >> p = __builtin_assume_aligned (p, DECL_ALIGN (vla)); >> >> and not assume anything for alloca itself if it feeds a >> __builtin_assume_aligned? >> >> Or rather introduce a __builtin_alloca_with_align () and for VLAs do >> >> p = __builtin_alloca_with_align (..., DECL_ALIGN (vla)); >> >> that's less awkward to use? >> >> Sorry for not having a clear plan here ;) >> > > Using assume_aligned is a more orthogonal way to represent this in > gimple, but > indeed harder to use. > > Another possibility is to add a 'tree vla_decl' field to struct > gimple_statement_call, which is probably the easiest to implement. > > But I think __builtin_alloca_with_align might have a use beyond vlas, so I > decided to try this one. Attached patch implements my first stab at this > (now > testing on x86_64). > > Is this an acceptable approach? > bootstrapped and reg-tested (including ada) on x86_64. Ok for trunk? >>> >>> The idea is ok I think. But >>> >>> case BUILT_IN_ALLOCA: >>> +case BUILT_IN_ALLOCA_WITH_ALIGN: >>>/* If the allocation stems from the declaration of a variable-sized >>> object, it cannot accumulate. */ >>>target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp)); >>>if (target) >>> return target; >>> + if (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) >>> + == BUILT_IN_ALLOCA_WITH_ALIGN) >>> + { >>> + tree new_call = build_call_expr_loc (EXPR_LOCATION (exp), >>> + >>> built_in_decls[BUILT_IN_ALLOCA], >>> + 1, CALL_EXPR_ARG (exp, 0)); >>> + CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp); >>> + exp = new_call; >>> + } >>> >>> Ick. Why can't the rest of the compiler not handle >>> BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA? >>> (thus, arrange things so the assembler name of alloca-with-align is alloca?) >>>
Re: [C++-11] User defined literals
On 10/05/2011 11:55 AM, Ed Smith-Rowland wrote: + int num_parms = TREE_VEC_LENGTH (parameter_list); + if (num_parms != 1) + ok = false; + else + { + tree parm_list = TREE_VEC_ELT (parameter_list, 0); + tree parm = INNERMOST_TEMPLATE_PARMS (parm_list); This is backwards; you want to do INNERMOST_TEMPLATE_PARMS before looking at the number of parms or pulling out the one at index 0. +struct GTY(()) tree_userdef_literal { + struct tree_common common; I think you can use tree_base instead of tree_common here, since you aren't using TREE_TYPE or TREE_CHAIN. One thing doesn't work. Earlier I had said that friend declarations failed. Not true. What fails is defining the function body in the class definition. The function is simply not recorded. I'm trying to track this down but pointers would be helpful. If I write the function body outside the class definition friend works perfectly. About friends. The lookup for user-defined literal operators isn't ADL. They are a lot like normal functions because of the suffix-id as opposed to operator<< in this regard. Thus a friend literal operator should be accessible both as an explicit operator call and via a literal. Both are tested and except as noted above about inline defs the tests pass. A function that is only declared in a friend declaration can only be found by ADL, so that doesn't sound like a failure to me; it sounds like it's working the way it should. However, if you declare the function earlier and then define it in the friend declaration it ought to work. Jason
Re: Fix htab lookup bug in reregister_specialization (issue5190046)
On Wed, Oct 05, 2011 at 05:03:45PM -0400, Jason Merrill wrote: > On 10/05/2011 04:05 PM, Jakub Jelinek wrote: > >BTW, register_specialization has the same problems. If slot != NULL and fn > >== NULL, it can still return without storing non-NULL into *slot > >(when optimize_specialization_lookup_p (tmpl) returns non-zero). > >You can do else if (slot != NULL&& fn == NULL) htab_clear_slot > >(decl_specializations, slot); > > I don't think there's a problem in that function; if fn == NULL, > then we store spec in its place. If optimize_specialization_lookup_p (tmpl) doesn't change return value in between the two calls, then you are right. But perhaps in that case you could avoid the second call and use slot != NULL instead. Jakub
Re: Find more shrink-wrapping opportunities
On Wed, Oct 5, 2011 at 10:48 PM, Bernd Schmidt wrote: > Bootstrapped and tested on i686-linux. Ok? > +/* Return true if BB has any active insns. */ > +static bool > +bb_active_p (basic_block bb) > +{ > + rtx label; > + > + /* Test whether there are active instructions in the last block. */ > + label = BB_END (bb); > + while (label && !LABEL_P (label)) > +{ > + if (active_insn_p (label)) > + break; > + label = PREV_INSN (label); > +} > + return BB_HEAD (bb) != label || !LABEL_P (label); > +} This assumes that all basic blocks start with a label, I don't think that's true. It seems better to use FOR_BB_INSNS_REVERSE here. Ciao! Steven
[PATCH] Remove OUTPUT_ADDR_CONST_EXTRA macro
Hi. No one back end does not use OUTPUT_ADDR_CONST_EXTRA macro now, this patch remove it. The TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA target hook should be use instead. The patch has been bootstrapped on and regression tested on x86_64-unknown-linux-gnu for c and c++. This patch is pre-approved and should be committed within a week if no objections. * system.h (OUTPUT_ADDR_CONST_EXTRA): Poison. * doc/tm.texi.in (OUTPUT_ADDR_CONST_EXTRA): Remove documentation. * doc/tm.texi: Regenerate. * target.def (output_addr_const_extra): Use hook_bool_FILEptr_rtx_false. * targhooks.c (default_asm_output_addr_const_extra): Remove. * targhooks.h (default_asm_output_addr_const_extra): Remove. * hooks.c (hook_bool_FILEptr_rtx_false): New functions. * hooks.h (hook_bool_FILEptr_rtx_false): Declare. Index: gcc/doc/tm.texi === --- gcc/doc/tm.texi (revision 179476) +++ gcc/doc/tm.texi (working copy) @@ -7530,18 +7530,6 @@ return @code{true}. @end deftypefn -@defmac OUTPUT_ADDR_CONST_EXTRA (@var{stream}, @var{x}, @var{fail}) -A C statement to recognize @var{rtx} patterns that -@code{output_addr_const} can't deal with, and output assembly code to -@var{stream} corresponding to the pattern @var{x}. This may be used to -allow machine-dependent @code{UNSPEC}s to appear within constants. - -If @code{OUTPUT_ADDR_CONST_EXTRA} fails to recognize a pattern, it must -@code{goto fail}, so that a standard error message is printed. If it -prints an error message itself, by calling, for example, -@code{output_operand_lossage}, it may just complete normally. -@end defmac - @defmac ASM_OUTPUT_ASCII (@var{stream}, @var{ptr}, @var{len}) A C statement to output to the stdio stream @var{stream} an assembler instruction to assemble a string constant containing the @var{len} Index: gcc/doc/tm.texi.in === --- gcc/doc/tm.texi.in (revision 179476) +++ gcc/doc/tm.texi.in (working copy) @@ -7446,18 +7446,6 @@ return @code{true}. @end deftypefn -@defmac OUTPUT_ADDR_CONST_EXTRA (@var{stream}, @var{x}, @var{fail}) -A C statement to recognize @var{rtx} patterns that -@code{output_addr_const} can't deal with, and output assembly code to -@var{stream} corresponding to the pattern @var{x}. This may be used to -allow machine-dependent @code{UNSPEC}s to appear within constants. - -If @code{OUTPUT_ADDR_CONST_EXTRA} fails to recognize a pattern, it must -@code{goto fail}, so that a standard error message is printed. If it -prints an error message itself, by calling, for example, -@code{output_operand_lossage}, it may just complete normally. -@end defmac - @defmac ASM_OUTPUT_ASCII (@var{stream}, @var{ptr}, @var{len}) A C statement to output to the stdio stream @var{stream} an assembler instruction to assemble a string constant containing the @var{len} Index: gcc/targhooks.c === --- gcc/targhooks.c (revision 179476) +++ gcc/targhooks.c (working copy) @@ -371,21 +371,6 @@ return get_identifier (stripped); } -/* The default implementation of TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA. */ - -bool -default_asm_output_addr_const_extra (FILE *file ATTRIBUTE_UNUSED, -rtx x ATTRIBUTE_UNUSED) -{ -#ifdef OUTPUT_ADDR_CONST_EXTRA - OUTPUT_ADDR_CONST_EXTRA (file, x, fail); - return true; - -fail: -#endif - return false; -} - /* True if MODE is valid for the target. By "valid", we mean able to be manipulated in non-trivial ways. In particular, this means all the arithmetic is supported. Index: gcc/targhooks.h === --- gcc/targhooks.h (revision 179476) +++ gcc/targhooks.h (working copy) @@ -67,8 +67,6 @@ extern bool default_print_operand_punct_valid_p (unsigned char); extern tree default_mangle_assembler_name (const char *); -extern bool default_asm_output_addr_const_extra (FILE *, rtx); - extern bool default_scalar_mode_supported_p (enum machine_mode); extern bool targhook_words_big_endian (void); extern bool targhook_float_words_big_endian (void); Index: gcc/hooks.c === --- gcc/hooks.c (revision 179476) +++ gcc/hooks.c (working copy) @@ -132,6 +132,14 @@ { } +/* Generic hook that takes (FILE *, rtx) and returns false. */ +bool +hook_bool_FILEptr_rtx_false (FILE *a ATTRIBUTE_UNUSED, +rtx b ATTRIBUTE_UNUSED) +{ + return false; +} + /* Used for the TARGET_ASM_CAN_OUTPUT_MI_THUNK hook. */ bool hook_bool_const_tree_hwi_hwi_const_tree_false (const_tree a ATTRIBUTE_UNUSED, Index: gcc/hooks.h === --- gcc/hooks.h (revision 179476) +++ gcc/hooks.h (working copy) @@ -63,6 +63,7 @@ extern void hook_v
Re: Find more shrink-wrapping opportunities
On 10/05/11 23:23, Steven Bosscher wrote: > On Wed, Oct 5, 2011 at 10:48 PM, Bernd Schmidt > wrote: >> Bootstrapped and tested on i686-linux. Ok? > >> +/* Return true if BB has any active insns. */ >> +static bool >> +bb_active_p (basic_block bb) >> +{ >> + rtx label; >> + >> + /* Test whether there are active instructions in the last block. */ >> + label = BB_END (bb); >> + while (label && !LABEL_P (label)) >> +{ >> + if (active_insn_p (label)) >> +break; >> + label = PREV_INSN (label); >> +} >> + return BB_HEAD (bb) != label || !LABEL_P (label); >> +} > > This assumes that all basic blocks start with a label, I don't think > that's true. It seems better to use FOR_BB_INSNS_REVERSE here. It's the same code as before... and if you get a block without a label and without active insns, then cfgcleanup hasn't done its job. Bernd
Re: [Patch, Fortran] PR 35831: [F95] Shape mismatch check missing for dummy procedure argument
>> If you have a cute idea how to elegantly introduce warnings into this >> mechanism, I'm all ears. > I'm not sure that it qualifies as cute, but we could produce multi-line > diagnostics in the same way c++ does (for template candidates for example), > like: > error/warning: #the error/warning > note: in actual argument at <> # the context Maybe not the worst idea. The question is, how would we do this technically? Can it be handled with the existing infrastructure (gfc_error_... )? Alternatives might be: * simply throw the warning without context * have 'check_dummy_characteristics' not return SUCCESS/FAILURE, but something like SUCCESS/WARNING/ERROR and then react on this appropriately by throwing either an error or a warning (more tedious) Cheers, Janus
Re: Initial shrink-wrapping patch
On Wed, Oct 5, 2011 at 10:17 AM, Bernd Schmidt wrote: > > I've committed the following after a x86_64-linux bootstrap. This patch appears to have broken the Go bootstrap when compiling a C file in libgo. Here is a reduced test case: static struct { int n; unsigned int m; _Bool f; } g; _Bool f (int s) { unsigned int b, m; if (!g.f) return 0; b = 1 << s; while (1) { m = g.m; if (m & b) break; if (__sync_bool_compare_and_swap (&g.m, m, m|b)) { if (m == 0) f2 (&g.n); break; } } return 1; } Compiling this file with -O2 -fsplit-stack causes an ICE: foo.c:29:1: internal compiler error: in maybe_record_trace_start, at dwarf2cfi.c:2243 Compiling the file with -O2 -fsplit-stack -fno-shrink-wrap works. Ian
Re: [PATCH] Remove OUTPUT_ADDR_CONST_EXTRA macro
On 10/05/2011 02:28 PM, Anatoly Sokolov wrote: > * system.h (OUTPUT_ADDR_CONST_EXTRA): Poison. > * doc/tm.texi.in (OUTPUT_ADDR_CONST_EXTRA): Remove documentation. > * doc/tm.texi: Regenerate. > * target.def (output_addr_const_extra): Use > hook_bool_FILEptr_rtx_false. > * targhooks.c (default_asm_output_addr_const_extra): Remove. > * targhooks.h (default_asm_output_addr_const_extra): Remove. > * hooks.c (hook_bool_FILEptr_rtx_false): New functions. > * hooks.h (hook_bool_FILEptr_rtx_false): Declare. Ok. r~
Re: Commit: RX: Codegen bug fixes
On 10/05/2011 03:23 AM, Nick Clifton wrote: > The final fix was pointed out by Richard Henderson. The recently > added support for narrow mode min and max instructions did not work > for the SMAX insn, as the RX does not have narrow mode versions of > this insn. The SMIN pattern has the same problem. r~
Re: [v3] versioned-namespaces spelling/soname change
> I'm going to let this chill a bit on mainline and then check in to > 4.6.x. Seems fine so I dropped this into the 4_6-branch tested x86/linux -benjamin 2011-10-05 Benjamin Kosnik PR libstdc++/48698 * acinclude.m4 (GLIBCXX_ENABLE_SYMVERS): Set libtool_VERSION here. * configure.ac: Move AC_SUBST of libtool_VERSION past call to GLIBCXX_ENABLE_SYMVERS. * configure: Regenerate. * include/bits/c++config: Use __7 as versioned namespace name. * config/abi/pre/gnu-versioned-namespace.ver: Change mangling as per above. * include/c_global/cwchar: Adjust nested namespaces. * testsuite/20_util/bind/48698.cc: Add test case. * testsuite/ext/profile/mutex_extensions_neg.cc: Change line number. Index: configure.ac === --- configure.ac (revision 179579) +++ configure.ac (working copy) @@ -11,10 +11,6 @@ # exported. Only used at the end of this file. ### am handles this now? ORIGINAL_LD_FOR_MULTILIBS=$LD -# For libtool versioning info, format is CURRENT:REVISION:AGE -libtool_VERSION=6:16:0 -AC_SUBST(libtool_VERSION) - # Find the rest of the source tree framework. AM_ENABLE_MULTILIB(, ..) @@ -303,6 +299,8 @@ # This depends on GLIBCXX CHECK_LINKER_FEATURES, but without it assumes no. GLIBCXX_ENABLE_SYMVERS([yes]) +AC_SUBST(libtool_VERSION) + GLIBCXX_ENABLE_VISIBILITY([yes]) ac_ldbl_compat=no Index: include/bits/locale_facets.tcc === --- include/bits/locale_facets.tcc (revision 179579) +++ include/bits/locale_facets.tcc (working copy) @@ -635,15 +635,11 @@ const char_type __c = *__beg; - if (!__donef) - __testf = __c == __lc->_M_falsename[__n]; - + __testf = __c == __lc->_M_falsename[__n]; if (!__testf && __donet) break; - if (!__donet) - __testt = __c == __lc->_M_truename[__n]; - + __testt = __c == __lc->_M_truename[__n]; if (!__testt && __donef) break; Index: include/bits/c++config === --- include/bits/c++config (revision 179579) +++ include/bits/c++config (working copy) @@ -162,41 +162,42 @@ // Defined if inline namespaces are used for versioning. -#define _GLIBCXX_INLINE_VERSION +#define _GLIBCXX_INLINE_VERSION // Inline namespace for symbol versioning. #if _GLIBCXX_INLINE_VERSION + namespace std { - inline namespace _6 { } + inline namespace __7 { } - namespace rel_ops { inline namespace _6 { } } + namespace rel_ops { inline namespace __7 { } } namespace tr1 { -inline namespace _6 { } -namespace placeholders { inline namespace _6 { } } -namespace regex_constants { inline namespace _6 { } } -namespace __detail { inline namespace _6 { } } +inline namespace __7 { } +namespace placeholders { inline namespace __7 { } } +namespace regex_constants { inline namespace __7 { } } +namespace __detail { inline namespace __7 { } } } - namespace decimal { inline namespace _6 { } } + namespace decimal { inline namespace __7 { } } - namespace chrono { inline namespace _6 { } } - namespace placeholders { inline namespace _6 { } } - namespace regex_constants { inline namespace _6 { } } - namespace this_thread { inline namespace _6 { } } + namespace chrono { inline namespace __7 { } } + namespace placeholders { inline namespace __7 { } } + namespace regex_constants { inline namespace __7 { } } + namespace this_thread { inline namespace __7 { } } - namespace __detail { inline namespace _6 { } } - namespace __regex { inline namespace _6 { } } + namespace __detail { inline namespace __7 { } } + namespace __regex { inline namespace __7 { } } } namespace __gnu_cxx { - inline namespace _6 { } - namespace __detail { inline namespace _6 { } } + inline namespace __7 { } + namespace __detail { inline namespace __7 { } } } -# define _GLIBCXX_BEGIN_NAMESPACE_VERSION namespace _6 { +# define _GLIBCXX_BEGIN_NAMESPACE_VERSION namespace __7 { # define _GLIBCXX_END_NAMESPACE_VERSION } #else # define _GLIBCXX_BEGIN_NAMESPACE_VERSION @@ -213,7 +214,7 @@ namespace __cxx1998 { #if _GLIBCXX_INLINE_VERSION - inline namespace _6 { } + inline namespace __7 { } #endif } Index: include/c_global/cwchar === --- include/c_global/cwchar (revision 179579) +++ include/c_global/cwchar (working copy) @@ -136,6 +136,8 @@ namespace std _GLIBCXX_VISIBILITY(default) { +_GLIBCXX_BEGIN_NAMESPACE_VERSION + using ::wint_t; using ::btowc; @@ -207,8 +209,6 @@ using ::wcsstr; using ::wmemchr; -_GLIBCXX_BEGIN_NAMESPACE_VERSION - #ifndef __CORRECT_ISO_CPP_WCHAR_H_PROTO inline wchar_t* wcschr(wchar_t* __p, wchar_t __c) Index: testsuite/ext/profile/mutex_extensions_neg.cc ===
Re: Initial shrink-wrapping patch
On 10/06/11 01:04, Ian Lance Taylor wrote: > On Wed, Oct 5, 2011 at 10:17 AM, Bernd Schmidt > wrote: >> >> I've committed the following after a x86_64-linux bootstrap. > > This patch appears to have broken the Go bootstrap when compiling a C > file in libgo. Here is a reduced test case: > > static struct > { > int n; > unsigned int m; > _Bool f; > } g; > > _Bool > f (int s) > { > unsigned int b, m; > > if (!g.f) > return 0; > b = 1 << s; > while (1) > { > m = g.m; > if (m & b) > break; > if (__sync_bool_compare_and_swap (&g.m, m, m|b)) > { > if (m == 0) > f2 (&g.n); > break; > } > } > return 1; > } > > Compiling this file with -O2 -fsplit-stack causes an ICE: > > foo.c:29:1: internal compiler error: in maybe_record_trace_start, at > dwarf2cfi.c:2243 > > Compiling the file with -O2 -fsplit-stack -fno-shrink-wrap works. This appears to be because the split prologue contains a jump, which means the find_many_sub_blocks call reorders the block numbers, and our indices into bb_flags are off. I'm testing the following patch, but it won't finish today - feel free to test and check it in, or to just disable shrink-wrapping with split prologues. Bernd Index: gcc/function.c === --- gcc/function.c (revision 179577) +++ gcc/function.c (working copy) @@ -5455,7 +5455,7 @@ thread_prologue_and_epilogue_insns (void basic_block last_bb; bool last_bb_active; #ifdef HAVE_simple_return - bool unconverted_simple_returns = false; + VEC (basic_block, heap) *unconverted_simple_returns = NULL; basic_block simple_return_block_hot = NULL; basic_block simple_return_block_cold = NULL; bool nonempty_prologue; @@ -5876,7 +5876,8 @@ thread_prologue_and_epilogue_insns (void { #ifdef HAVE_simple_return if (simple_p) - unconverted_simple_returns = true; + VEC_safe_push (basic_block, heap, + unconverted_simple_returns, bb); #endif continue; } @@ -5894,7 +5895,8 @@ thread_prologue_and_epilogue_insns (void { #ifdef HAVE_simple_return if (simple_p) - unconverted_simple_returns = true; + VEC_safe_push (basic_block, heap, + unconverted_simple_returns, bb); #endif continue; } @@ -6042,10 +6044,11 @@ epilogue_done: convert to conditional simple_returns, but couldn't for some reason, create a block to hold a simple_return insn and redirect those remaining edges. */ - if (unconverted_simple_returns) + if (!VEC_empty (basic_block, unconverted_simple_returns)) { - edge_iterator ei2; basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; + basic_block src_bb; + int i; gcc_assert (entry_edge != orig_entry_edge); @@ -6062,19 +6065,12 @@ epilogue_done: simple_return_block_cold = e->dest; } -restart_scan: - for (ei2 = ei_start (last_bb->preds); (e = ei_safe_edge (ei2)); ) + FOR_EACH_VEC_ELT (basic_block, unconverted_simple_returns, i, src_bb) { - basic_block bb = e->src; + edge e = find_edge (src_bb, last_bb); basic_block *pdest_bb; - if (bb == ENTRY_BLOCK_PTR - || bitmap_bit_p (&bb_flags, bb->index)) - { - ei_next (&ei2); - continue; - } - if (BB_PARTITION (e->src) == BB_HOT_PARTITION) + if (BB_PARTITION (src_bb) == BB_HOT_PARTITION) pdest_bb = &simple_return_block_hot; else pdest_bb = &simple_return_block_cold; @@ -6094,8 +6090,8 @@ epilogue_done: make_edge (bb, EXIT_BLOCK_PTR, 0); } redirect_edge_and_branch_force (e, *pdest_bb); - goto restart_scan; } + VEC_free (basic_block, heap, unconverted_simple_returns); } #endif
Re: [v3] versioned-namespaces spelling/soname change
Hi, the below hunk seems spurious?!? Paolo. / Index: include/bits/locale_facets.tcc === --- include/bits/locale_facets.tcc (revision 179579) +++ include/bits/locale_facets.tcc (working copy) @@ -635,15 +635,11 @@ const char_type __c = *__beg; - if (!__donef) - __testf = __c == __lc->_M_falsename[__n]; - + __testf = __c == __lc->_M_falsename[__n]; if (!__testf&& __donet) break; - if (!__donet) - __testt = __c == __lc->_M_truename[__n]; - + __testt = __c == __lc->_M_truename[__n]; if (!__testt&& __donef) break;
[v3] Avoid pod_char_traits.h warnings in C++0x mode
Hi, committed to mainline. Paolo. 2011-10-05 Paolo Carlini * include/ext/pod_char_traits.h: Avoid warnings in C++0x mode when int_type is unsigned. Index: include/ext/pod_char_traits.h === --- include/ext/pod_char_traits.h (revision 179582) +++ include/ext/pod_char_traits.h (working copy) @@ -1,6 +1,7 @@ // POD character, std::char_traits specialization -*- C++ -*- -// Copyright (C) 2002, 2003, 2004, 2005, 2007, 2009 Free Software Foundation, Inc. +// Copyright (C) 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011 +// Free Software Foundation, Inc. // // This file is part of the GNU ISO C++ Library. This library is free // software; you can redistribute it and/or modify it under the @@ -32,6 +33,8 @@ #ifndef _POD_CHAR_TRAITS_H #define _POD_CHAR_TRAITS_H 1 +#pragma GCC system_header + #include namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
Re: Fix htab lookup bug in reregister_specialization (issue5190046)
On 10/05/2011 05:15 PM, Jakub Jelinek wrote: If optimize_specialization_lookup_p (tmpl) doesn't change return value in between the two calls, then you are right. But perhaps in that case you could avoid the second call and use slot != NULL instead. That makes sense too. Jason
Re: [v3] add max_size and rebind to __alloc_traits
Hi, 2011-10-04 Jonathan Wakely * include/ext/alloc_traits.h (__alloc_traits::max_size): Define. (__alloc_traits::rebind): Define. * include/bits/stl_vector.h: Use them. * testsuite/util/testsuite_allocator.h (SimpleAllocator): Define. * testsuite/23_containers/vector/allocator/minimal.cc: New. * testsuite/23_containers/vector/requirements/dr438/assign_neg.cc: Adjust dg-error line numbers. * testsuite/23_containers/vector/requirements/dr438/insert_neg.cc: Likewise. * testsuite/23_containers/vector/requirements/dr438/ constructor_1_neg.cc: Likewise. * testsuite/23_containers/vector/requirements/dr438/ constructor_2_neg.cc: Likewise. today I ran the whole testsuite in C++0x mode and I'm pretty sure that 23_containers/vector/modifiers/swap/3.cc, which is now failing, wasn't a couple of days ago (I ran the whole testsuite like that in order to validate my std::list changes). When you have time, could you please double check? (maybe after all we *do* want it to fail in C++0x mode, but I'd like to understand if the behavior changed inadvertently!) Thanks in advance, Paolo.
Re: Fix pr50607 bconstp-3.c failure
On Wed, Oct 5, 2011 at 5:32 PM, Artem Shinkarov wrote: > On Wed, Oct 5, 2011 at 5:28 PM, Joseph S. Myers > wrote: >> On Wed, 5 Oct 2011, Artem Shinkarov wrote: >> >>> Joseph, is it possible to commit the patch together with the space fixes? >> >> You should not commit whitespace fixes to lines not otherwise modified by >> a patch, except in a separate commit that *only* has whitespace or other >> formatting fixes, so that "svn blame" results are more meaningful. That >> is: commit whitespace fixes (assuming they are genuinely fixes rather than >> making things worse) *separately* in a commit whose commit message makes >> clear it is just whitespace fixes. > > Ok, I see. In the given patch all the fixes are mainly auto-generated > and remove the trailing spaces. But I see your point. > > Thanks, > Artem. >> -- >> Joseph S. Myers >> jos...@codesourcery.com >> > Successfully regtested on x86-unknown-linux-gnu. Committed to the mainline with the revision 179588. ChangeLog: 2011-10-06 Artjoms Sinkarovs * c-tree.h (c_expr_t): New typedef for struct c_expr. (C_EXPR_APPEND): New macro. * c-parser.c (c_parser_get_builtin_args): Preserve original_tree_code of c_expr structure. Fixes bconstp-3.c failure of PR50607. (c_parser_postfix_expression): Adjust to the new function. Artem.
Re: Fix pr50607 bconstp-3.c failure
On Thu, 6 Oct 2011, Artem Shinkarov wrote: > Successfully regtested on x86-unknown-linux-gnu. Committed to the > mainline with the revision 179588. > > ChangeLog: > 2011-10-06 Artjoms Sinkarovs > * c-tree.h (c_expr_t): New typedef for struct c_expr. > (C_EXPR_APPEND): New macro. > * c-parser.c (c_parser_get_builtin_args): Preserve > original_tree_code of c_expr structure. Fixes bconstp-3.c > failure of PR50607. > (c_parser_postfix_expression): Adjust to the new function. Write that changelog entry as: PR middle-end/50607 * c-tree.h (c_expr_t): New typedef for struct c_expr. (C_EXPR_APPEND): New macro. * c-parser.c (c_parser_get_builtin_args): Preserve original_tree_code of c_expr structure. (c_parser_postfix_expression): Adjust to the new function. (note top marker and without the "Fixes...") and there'll be an automatic entry in bugzilla, like for example PR50609. You still have to close the bugzilla entry manually (not all noteworthy fixes are reason to close a bug report). Mostly as a note for the future but you could fix that now, when adding the missing empty line after the date+email line. ;) brgds, H-P
Re: Fix pr50607 bconstp-3.c failure
On Thu, Oct 6, 2011 at 3:27 AM, Hans-Peter Nilsson wrote: > On Thu, 6 Oct 2011, Artem Shinkarov wrote: >> Successfully regtested on x86-unknown-linux-gnu. Committed to the >> mainline with the revision 179588. >> >> ChangeLog: >> 2011-10-06 Artjoms Sinkarovs >> * c-tree.h (c_expr_t): New typedef for struct c_expr. >> (C_EXPR_APPEND): New macro. >> * c-parser.c (c_parser_get_builtin_args): Preserve >> original_tree_code of c_expr structure. Fixes bconstp-3.c >> failure of PR50607. >> (c_parser_postfix_expression): Adjust to the new function. > > Write that changelog entry as: > > PR middle-end/50607 > * c-tree.h (c_expr_t): New typedef for struct c_expr. > (C_EXPR_APPEND): New macro. > * c-parser.c (c_parser_get_builtin_args): Preserve > original_tree_code of c_expr structure. > (c_parser_postfix_expression): Adjust to the new function. > > (note top marker and without the "Fixes...") > and there'll be an automatic entry in bugzilla, like for example > PR50609. You still have to close the bugzilla entry manually > (not all noteworthy fixes are reason to close a bug report). > > Mostly as a note for the future but you could fix that now, when > adding the missing empty line after the date+email line. ;) > > brgds, H-P > Thanks for the advice, I didn't know about the automatic mail. Fixed with commit 179589. As for closing bugzilla entry, I would prefer to make sure that the patch really fixes the problem. It would be very helpful if you could confirm this. Thanks, Artem.
Merge to gccgo branch
I merged mainlined revision 179565 to the gccgo branch. I included a patch disabling -fshrink-wrap if -fstack-prologue is enabled. Ian