Re: PR c++/56782 - Regression with empty pack expansions
Jason Merrill writes:

> I thought I approved this on IRC; please apply it to trunk and 4.8.

Applied to both branches, thanks.

-- Dodji
Re: [PATCH] Don't invalidate string length cache when not needed
On Wed, May 15, 2013 at 7:20 PM, Jakub Jelinek wrote:
> On Wed, May 15, 2013 at 06:59:09PM +0200, Marek Polacek wrote:
>> This is a strlen opt patch that better optimizes attached testcase;
>> there's just no need to call strlen again, as we're not changing
>> the length of the string.  Unfortunately this still handles only
>> p[0], not for instance p[1], p[2], ... so we likely don't want to put
>> this in now.  But I'm posting it at least for archive reasons anyway.
>> It works by not invalidating the length cache if we're not storing
>> \0 into a string and the length of a string is > 0.
>>
>> Bootstrapped/regtested on x86_64-linux.
>>
>> 2013-05-15  Marek Polacek
>>
>>         * tree-ssa-strlen.c (handle_char_store): Don't invalidate
>>         cached length when doing non-zero store.
>>
>>         * gcc.dg/strlenopt-25.c: New test.
>>
>> --- gcc/tree-ssa-strlen.c.mp    2013-05-15 14:11:20.079707492 +0200
>> +++ gcc/tree-ssa-strlen.c       2013-05-15 17:21:23.772094679 +0200
>> @@ -1717,6 +1717,11 @@ handle_char_store (gimple_stmt_iterator
>>               si->endptr = ssaname;
>>               si->dont_invalidate = true;
>>             }
>> +         else if (si != NULL && si->length != NULL_TREE
>> +                  && TREE_CODE (si->length) == INTEGER_CST
>> +                  && integer_nonzerop (gimple_assign_rhs1 (stmt))
>> +                  && tree_int_cst_sgn (si->length) > 0)
>> +           si->dont_invalidate = true;
>
> Well, if si->length is known constant != 0 (note, it is enough to
> test that it is non-zero and probably the code above this has
> tested that already?) and we are storing non-zero, then that
> should mean we are overwriting a non-zero value with another (or same)
> non-zero value.  Such operation shouldn't change the length of any strings
> at all, not just the ones related to the current strinfo.
> That is either just setting all current strinfos to dont_invalidate, or
> perhaps faster doing gsi_next (gsi); return false; (that means the caller
> won't call maybe_invalidate - of course in that case you must not set
> any dont_invalidate on any strinfo on that stmt) and the caller of
> strlen_optimize_stmt will not do gsi_next (&gsi) - that is why you need
> to do it instead.

Can we properly distinguish the case of

  char *s = "Hello\0World!";
  s[5] = ' ';

(minor the imperfections in that example)?  Thus, overwriting the
terminating 0?

Richard.

> Jakub
Re: cfgexpand.c patch for [was new port: msp430-elf]
On Wed, May 15, 2013 at 7:24 PM, Mike Stump wrote: > On May 15, 2013, at 1:27 AM, Richard Biener > wrote: >> My question is, if you end up with >> >> (truncate:PSI (reg:SI 27)) >> >> and then constant propagate 0x7fff to reg:SI 27, what does simplify-rtx.c >> do here? Truncate to _what_ precision exactly? > > In my world, I change PSI to be P28SI, and then answer is that there are 28 > bits. All ports that create partial int modes full well know the exact > precision, and that can be added. There is a ton of support already in the > compiler for this, and the last little bit in the mode def language and to > strap it in is light weight and obvious. Indeed. What's the blocker to convert the existing 5 cases of PARTIAL_INT_MODE use to specify a precision? Richard. >> Recent introduction of PTImode to rs6000 makes me think that the >> PARTIAL_INT_MODE()s are a hack to simply get another name for >> TImode (in this case). >> >> Thus to the middle-end it shouldn't be >> >> (truncate:PSI (reg:SI 27)) >> >> but >> >> (set (reg:PSI 28 (reg:SI 27))) >> >> or maybe >> >> (subreg:PSI (reg:SI 27)) > > I see all forms as valid, but not the same. > > (set (reg:P28SI (reg:SI 29)) >(truncate:P28SI (reg:SI 27))) > > is natural and reasonable, which combines two of the forms above. Using > truncate is fine. > >> config/avr/avr-modes.def:FRACTIONAL_INT_MODE (PSI, 24, 3); > > I never got any joy from FRACTIONAL_INT_MODE. > >> So ... time to remove PARTIAL_INT_MODE ()s? Btw, I wonder why > > No.
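To make the precision question in this exchange concrete, here is a minimal sketch of what truncating a constant to a mode with a declared precision could mean. It is not GCC code: the helper name `truncate_to_precision` and the use of a plain `uint32_t` are assumptions made for illustration. The 28-bit case mirrors the P28SI example in the message; any other width is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a GCC function): keep only the low PRECISION
   bits of VALUE, the way a truncate to a partial integer mode with a
   declared precision would.  PRECISION must be in the range 1..32.  */
static uint32_t
truncate_to_precision (uint32_t value, unsigned precision)
{
  assert (precision >= 1 && precision <= 32);
  if (precision == 32)
    return value;               /* avoid shifting by the full type width */
  return value & ((UINT32_C (1) << precision) - 1);
}
```

With a known precision the answer to "truncate to what?" is mechanical: `truncate_to_precision (0x7fff, 28)` leaves the constant unchanged, while a value with bits above bit 27 loses them. Without a declared precision there is nothing for a simplifier to mask against, which is the gap being discussed.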
Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
On Thu, May 16, 2013 at 5:38 AM, Jeff Law wrote: > On 05/15/2013 12:28 PM, Steve Ellcey wrote: >> >> Here is a patch that adds a flag to gimple_duplicate_sese_region to tell >> it whether or not to update the dominator information. I had to add the >> same flag to copy_bbs to make it all work. How does this look? I >> tested it with a bootstrap and test on x86 (with my optimization >> enabled) and got no regressions. >> >> 2013-05-15 Steve Ellcey >> >> * cfghooks.c (copy_bbs): Add update_dominance argument. >> * cfghooks.h (copy_bbs): Update prototype. >> * tree-cfg.c (gimple_duplicate_sese_region): >> Add update_dominance argument. >> * tree-flow.h (gimple_duplicate_sese_region): Update prototype. >> * tree-ssa-loop-ch.c (copy_loop_headers): Update >> gimple_duplicate_sese_region call. >> * tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg): >> Update copy_bbs call. >> * cfgloopmanip.c (duplicate_loop_to_header_edge): Ditto. >> * trans-mem.c (ipa_uninstrument_transaction): Ditto. > > So I'd change gimple_duplicate_sese_region to gimple_duplicate_seme region > per comments from others. > > Where you document UPDATE_DOMINANCE, I'd add something like: When > UPDATE_DOMINANCE is true, it is assumed that duplicating the region (or > copying the blocks for copy_bbs) may change the dominator tree in ways that > are not suitable for an incremental update and the caller is responsible for > destroying and recomputing the dominator tree. > > Hmm, not terribly happy with that wording, but that gives you an idea of > what I'm after. When would someone set UPDATE_DOMINANCE to true and what > are their responsibilities when they do so. > > Approved with the name change and a better comment for UPDATE_DOMINANCE. Btw, the function does _not_ handle arbitrary SEME regions - it only handles a single exit correctly and assumes no (SSA) data flows across the others. So I'd rather not rename it. Richard. > Jeff >
Re: Break infinite folding loop
On Thu, May 16, 2013 at 8:42 AM, Marc Glisse wrote: > Hello, > > we can get into a cycle where: > (x<0)|1 becomes (x<0)?-1:1 > and > (y?-1:1) becomes y|1 > > Contrary to what I posted in the PR, I am disabling the second > transformation here. It can be done later (the x86 target partially does it > in the vcond expansion), and the VEC_COND_EXPR allows us to perform further > operations: > (((x<0)|1)*5-1)/2 becomes (untested) (x<0)?-3:2 > > Also, this is a partial revert of the last patch, which sounds safer. I am > leaving the vector tests in the disabled transformations in case we decide > to re-enable them later. > > This isn't the end of the story, even for fold-const.c. I kept the > transformation a?-1:0 -> a, but it does not work because: > > /* If we try to convert OP0 to our type, the > call to fold will try to move the conversion inside > a COND, which will recurse. In that case, the COND_EXPR > is probably the best choice, so leave it alone. */ > && type == TREE_TYPE (arg0)) > > and when a is a comparison, its type is a different type (even looking > through main variant and canonical), only useless_type_conversion_p notices > they are so similar, and I would still need a conversion, otherwise the > front-end complains when I try to assign the result that it has a different > type than the variable I want to assign it to (I expected the result of the > comparison to be opaque, and thus no complaining, but apparently not). Which is why most of these non-trivial transforms should happen on GIMPLE via what I and Andrew proposed some year(s) ago. But well ... > Also, we may want to make fold_binary_op_with_conditional_arg more strict on > how much folding is necessary to consider the transformation worth it. For > VEC_COND_EXPR where both branches are evaluated anyway, at least if we > started from a comparison and not already a VEC_COND_EXPR, we could require > that both branches fold. > > But it seems better to fix the ICE quickly and do the rest later. 
> > Passes bootstrap+testsuite on x86_64-linux-gnu. Ok. Thanks, Richard. > 2013-05-16 Marc Glisse > > PR middle-end/57286 > gcc/ > * fold-const.c (fold_ternary_loc) : Disable some > transformations to avoid an infinite loop. > > gcc/testsuite/ > * gcc.dg/pr57286.c: New testcase. > * gcc.dg/vector-shift-2.c: Don't assume int has size 4. > * g++.dg/ext/vector22.C: Comment out transformations not > performed anymore. > > -- > Marc Glisse > Index: testsuite/gcc.dg/pr57286.c > === > --- testsuite/gcc.dg/pr57286.c (revision 0) > +++ testsuite/gcc.dg/pr57286.c (revision 0) > @@ -0,0 +1,7 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O" } */ > + > +typedef int vec __attribute__ ((vector_size (4*sizeof(int; > +void f (vec *x){ > +*x = (*x < 0) | 1; > +} > > Property changes on: testsuite/gcc.dg/pr57286.c > ___ > Added: svn:keywords >+ Author Date Id Revision URL > Added: svn:eol-style >+ native > > Index: testsuite/gcc.dg/vector-shift-2.c > === > --- testsuite/gcc.dg/vector-shift-2.c (revision 198950) > +++ testsuite/gcc.dg/vector-shift-2.c (working copy) > @@ -1,13 +1,13 @@ > /* { dg-do compile } */ > /* { dg-options "-O -fdump-tree-ccp1" } */ > > -typedef unsigned vec __attribute__ ((vector_size (16))); > +typedef unsigned vec __attribute__ ((vector_size (4*sizeof(int; > void > f (vec *a) > { >vec s = { 5, 5, 5, 5 }; >*a = *a << s; > } > > /* { dg-final { scan-tree-dump "<< 5" "ccp1" } } */ > /* { dg-final { cleanup-tree-dump "ccp1" } } */ > Index: testsuite/g++.dg/ext/vector22.C > === > --- testsuite/g++.dg/ext/vector22.C (revision 198950) > +++ testsuite/g++.dg/ext/vector22.C (working copy) > @@ -1,20 +1,22 @@ > /* { dg-do compile } */ > /* { dg-options "-O -fdump-tree-gimple" } */ > > typedef unsigned vec __attribute__((vector_size(4*sizeof(int; > > +/* Disabled after PR57286 > void f(vec*a,vec*b){ >*a=(*a)?-1:(*b<10); >*b=(*b)?(*a<10):0; > } > +*/ > void g(vec*a,vec*b){ >*a=(*a)?(*a<*a):-1; > - *b=(*b)?-1:(*b<*b); > +// *b=(*b)?-1:(*b<*b); > } > void h(vec*a){ 
>*a=(~*a==5); > } > > /* { dg-final { scan-tree-dump-not "~" "gimple" } } */ > /* { dg-final { scan-tree-dump-not "VEC_COND_EXPR" "gimple" } } */ > /* { dg-final { cleanup-tree-dump "gimple" } } */ > Index: fold-const.c > === > --- fold-const.c(revision 198950) > +++ fold-const.c(working copy) > @@ -14204,20 +14204,26 @@ fold_ternary_loc (location_t loc, enum t > && TREE_CODE (arg0) == NE_EXPR > && integer_zerop (TREE_OPERAND (arg0, 1))
Re: [PATCH] Don't invalidate string length cache when not needed
On Thu, May 16, 2013 at 10:54:53AM +0200, Richard Biener wrote:
> Can we properly distinguish the case of
>
>   char *s = "Hello\0World!";
>   s[5] = ' ';
>
> (minor the imperfections in that example)?  Thus, overwriting the
> terminating 0?

I think so.  Because then for &s[5], either si should be NULL, or
si->length should be NULL (either case suggests we either never knew
or don't know any longer the string length at that address), or
should be non-constant, or should be zero.  Because if it is constant
non-zero at that point, it would mean that strlen (&s[5]) at that point
would return non-zero constant, but should have returned 0.

        Jakub
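The argument above can be sketched with a toy length cache. This is only an illustration under stated assumptions: `struct cached_len`, `store_first_char` and `cached_strlen` are invented names, not GCC's strinfo machinery, and like the posted patch the sketch only handles a store to the first character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a cached string length; not GCC's strinfo.  */
struct cached_len
{
  size_t length;   /* last known strlen of the buffer */
  int valid;       /* nonzero while LENGTH can be trusted */
};

/* Store CH at BUF[0] and decide whether the cached length survives.
   The cache is kept only when the known length is > 0 (so BUF[0] is
   not the terminator) and CH is non-zero (so no new terminator is
   created).  In every other case the cache is dropped.  */
static void
store_first_char (char *buf, struct cached_len *cache, char ch)
{
  int keep;

  assert (buf != NULL && cache != NULL);
  keep = cache->valid && cache->length > 0 && ch != '\0';
  buf[0] = ch;
  cache->valid = keep;
}

/* Return the length of BUF, recomputing it only when the cache is
   not valid.  */
static size_t
cached_strlen (const char *buf, struct cached_len *cache)
{
  assert (buf != NULL && cache != NULL);
  if (!cache->valid)
    {
      cache->length = strlen (buf);
      cache->valid = 1;
    }
  return cache->length;
}
```

In the `"Hello\0World!"` case the cache for `&s[5]` holds length 0, so the store of `' '` fails the `length > 0` test and the length is recomputed. A store of a non-zero character over a string of known non-zero length keeps the cache.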
Re: [PATCH] Add SCEV cprop dump
On Wed, May 15, 2013 at 4:35 PM, Marek Polacek wrote: > On Wed, May 15, 2013 at 02:58:22PM +0200, Richard Biener wrote: >> On Wed, May 15, 2013 at 2:07 PM, Marek Polacek wrote: >> > /* Replace the uses of the name. */ >> > if (name != ev) >> > - replace_uses_by (name, ev); >> > + { >> > + replace_uses_by (name, ev); >> > + if (dump_file && (dump_flags & TDF_SCEV)) >> >> should be without dump_flags checking > > Ok. > >> > + { >> > + fprintf (dump_file, "(replace_stmt \n ("); >> > + print_generic_expr (dump_file, name, 0); >> > + fprintf (dump_file, " with "); >> > + print_generic_expr (dump_file, ev, 0); >> > + fprintf (dump_file, ")\n) \n"); >> >> and no need to do it the LISP-y way ;) > > Good, I didn't like it much anyway. > >> I would have liked to see failed attempts as well, then with TDF_DETAILS. >> Failed attempts for the "real" final value replacement stuff (I'm not sure >> the constant propagation part is worth keeping ... how often does it >> trigger?) > > Not much often: I've measured it and it happens only in ~150 testcases > from the whole c/c++/fortran testsuites. So, like this? Thanks, > > It looks like: > not replacing: > n_4 = PHI > > and > > final value replacement: > n_4 = PHI > with > n_4 = _1 + _12; Ok. Thanks, Richard. > 2013-05-15 Marek Polacek > > * tree-scalar-evolution.c (scev_const_prop): Add more dumps. > > --- gcc/tree-scalar-evolution.c.mp 2013-05-15 15:09:06.579122696 +0200 > +++ gcc/tree-scalar-evolution.c 2013-05-15 16:32:11.569217537 +0200 > @@ -3385,12 +3385,24 @@ scev_const_prop (void) > to be turned into n %= 45. */ > || expression_expensive_p (def)) > { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + { > + fprintf (dump_file, "not replacing:\n "); > + print_gimple_stmt (dump_file, phi, 0, 0); > + fprintf (dump_file, "\n"); > + } > gsi_next (&psi); > continue; > } > > /* Eliminate the PHI node and replace it by a computation outside > the loop. 
*/ > + if (dump_file) > + { > + fprintf (dump_file, "\nfinal value replacement:\n "); > + print_gimple_stmt (dump_file, phi, 0, 0); > + fprintf (dump_file, " with\n "); > + } > def = unshare_expr (def); > remove_phi_node (&psi, false); > > @@ -3398,6 +3410,11 @@ scev_const_prop (void) > true, GSI_SAME_STMT); > ass = gimple_build_assign (rslt, def); > gsi_insert_before (&bsi, ass, GSI_SAME_STMT); > + if (dump_file) > + { > + print_gimple_stmt (dump_file, ass, 0, 0); > + fprintf (dump_file, "\n"); > + } > } > } >return 0; > > Marek
Re: Break infinite folding loop
On Thu, 16 May 2013, Richard Biener wrote: On Thu, May 16, 2013 at 8:42 AM, Marc Glisse wrote: Hello, we can get into a cycle where: (x<0)|1 becomes (x<0)?-1:1 and (y?-1:1) becomes y|1 Contrary to what I posted in the PR, I am disabling the second transformation here. It can be done later (the x86 target partially does it in the vcond expansion), and the VEC_COND_EXPR allows us to perform further operations: (((x<0)|1)*5-1)/2 becomes (untested) (x<0)?-3:2 Also, this is a partial revert of the last patch, which sounds safer. I am leaving the vector tests in the disabled transformations in case we decide to re-enable them later. This isn't the end of the story, even for fold-const.c. I kept the transformation a?-1:0 -> a, but it does not work because: /* If we try to convert OP0 to our type, the call to fold will try to move the conversion inside a COND, which will recurse. In that case, the COND_EXPR is probably the best choice, so leave it alone. */ && type == TREE_TYPE (arg0)) and when a is a comparison, its type is a different type (even looking through main variant and canonical), only useless_type_conversion_p notices they are so similar, and I would still need a conversion, otherwise the front-end complains when I try to assign the result that it has a different type than the variable I want to assign it to (I expected the result of the comparison to be opaque, and thus no complaining, but apparently not). Which is why most of these non-trivial transforms should happen on GIMPLE Indeed, I guess I'll try that instead of risking more infinite loops, thanks. Although when a transformation exists in fold-const.c, it is always tempting to adapt it instead of rewriting it elsewhere in the compiler (where it ends up 3 times as long and complicated)... via what I and Andrew proposed some year(s) ago. But well ... You mean the existing tree-ssa-forwprop.c, or something different? 
(I remember the valueize idea) Also, we may want to make fold_binary_op_with_conditional_arg more strict on how much folding is necessary to consider the transformation worth it. For VEC_COND_EXPR where both branches are evaluated anyway, at least if we started from a comparison and not already a VEC_COND_EXPR, we could require that both branches fold. But it seems better to fix the ICE quickly and do the rest later. Passes bootstrap+testsuite on x86_64-linux-gnu. Ok. Thanks, Richard. 2013-05-16 Marc Glisse PR middle-end/57286 gcc/ * fold-const.c (fold_ternary_loc) : Disable some transformations to avoid an infinite loop. gcc/testsuite/ * gcc.dg/pr57286.c: New testcase. * gcc.dg/vector-shift-2.c: Don't assume int has size 4. * g++.dg/ext/vector22.C: Comment out transformations not performed anymore. -- Marc Glisse
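The two forms that fold keeps converting into each other can be checked per lane with scalar code. The sketch below is an emulation, not GCC vector code: it assumes two's complement `int` and models one lane of a vector comparison, which in GCC's vector extension yields -1 for true and 0 for false. All function names are made up for the example.

```c
#include <assert.h>

/* Emulate one lane of a vector comparison: true is -1 (all bits set),
   false is 0.  */
static int
lane_lt_zero (int x)
{
  return x < 0 ? -1 : 0;
}

/* (x<0)|1, the form that one transformation turns into a conditional.  */
static int
or_form (int x)
{
  return lane_lt_zero (x) | 1;
}

/* (x<0)?-1:1, the form that the other transformation turns back
   into an OR.  */
static int
cond_form (int x)
{
  return x < 0 ? -1 : 1;
}

/* (((x<0)|1)*5-1)/2, evaluated as written.  */
static int
arith_as_written (int x)
{
  return (or_form (x) * 5 - 1) / 2;
}

/* (x<0)?-3:2, the folded result quoted in the message.  */
static int
arith_folded (int x)
{
  return x < 0 ? -3 : 2;
}

/* Check both identities for one input; aborts on a mismatch.  */
static void
check_lane (int x)
{
  assert (or_form (x) == cond_form (x));
  assert (arith_as_written (x) == arith_folded (x));
}
```

Because both forms compute the same value, a folder that rewrites each one into the other never reaches a fixed point, which is why one direction has to be disabled. Calling `check_lane` on a negative, a zero and a positive input also confirms, for this scalar emulation, the result the message marks as untested.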
[committed] Fix expansion of some #pragma omp for loops
Hi! As the testcase show, if in a schedule(static) (the default schedule) or schedule(static,N) non-collapsed loop with unsigned integral iterator the loop condition is false upon entering the loop, but (n2 + step-1 - n1) / step is not 0, we could run the loop body, possibly many times, rather than never. Similarly for collapsed loops, if any of the collapsed loops had the condition false initially, but we computed non-zero number of iterations. For non-collapsed loops with signed iterators or pointer iterators this isn't a problem, because (n2 + step-1 - n1) / step is then evaluated in a signed integer type and thus for the condition initially false that count is negative and we never loop. Similarly, for the cases where we call the runtime, the runtime already checks for this case properly. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk and 4.8 branch. 2013-05-16 Jakub Jelinek * omp-low.c (extract_omp_for_data): For collapsed loops, if at least one of the loops is known at compile time to iterate zero times, set count to 0. (expand_omp_regimplify_p): New function. (expand_omp_for_generic): For collapsed loops, if at least one of the loops isn't known to iterate at least once, add runtime check with setting count to 0. (expand_omp_for_static_nochunk, expand_omp_for_static_chunk): For unsigned types if it isn't known at compile time that the loop will iterate at least once, add runtime check to bypass the whole loop if initial condition isn't true. * testsuite/libgomp.c/loop-13.c: New test. * testsuite/libgomp.c/loop-14.c: New test. * testsuite/libgomp.c/loop-15.c: New test. * testsuite/libgomp.c++/loop-13.C: New test. * testsuite/libgomp.c++/loop-14.C: New test. * testsuite/libgomp.c++/loop-15.C: New test. 
--- gcc/omp-low.c.jj2013-04-30 10:45:10.0 +0200 +++ gcc/omp-low.c 2013-05-16 08:43:49.535590890 +0200 @@ -398,11 +398,16 @@ extract_omp_for_data (gimple for_stmt, s if (collapse_count && *collapse_count == NULL) { - if ((i == 0 || count != NULL_TREE) - && TREE_CODE (TREE_TYPE (loop->v)) == INTEGER_TYPE - && TREE_CONSTANT (loop->n1) - && TREE_CONSTANT (loop->n2) - && TREE_CODE (loop->step) == INTEGER_CST) + t = fold_binary (loop->cond_code, boolean_type_node, + fold_convert (TREE_TYPE (loop->v), loop->n1), + fold_convert (TREE_TYPE (loop->v), loop->n2)); + if (t && integer_zerop (t)) + count = build_zero_cst (long_long_unsigned_type_node); + else if ((i == 0 || count != NULL_TREE) + && TREE_CODE (TREE_TYPE (loop->v)) == INTEGER_TYPE + && TREE_CONSTANT (loop->n1) + && TREE_CONSTANT (loop->n2) + && TREE_CODE (loop->step) == INTEGER_CST) { tree itype = TREE_TYPE (loop->v); @@ -435,7 +440,7 @@ extract_omp_for_data (gimple for_stmt, s if (TREE_CODE (count) != INTEGER_CST) count = NULL_TREE; } - else + else if (count && !integer_zerop (count)) count = NULL_TREE; } } @@ -3387,6 +3392,25 @@ optimize_omp_library_calls (gimple entry } } +/* Callback for expand_omp_build_assign. Return non-NULL if *tp needs to be + regimplified. */ + +static tree +expand_omp_regimplify_p (tree *tp, int *walk_subtrees, void *) +{ + tree t = *tp; + + /* Any variable with DECL_VALUE_EXPR needs to be regimplified. */ + if (TREE_CODE (t) == VAR_DECL && DECL_HAS_VALUE_EXPR_P (t)) +return t; + + if (TREE_CODE (t) == ADDR_EXPR) +recompute_tree_invariant_for_addr_expr (t); + + *walk_subtrees = !TYPE_P (t) && !DECL_P (t); + return NULL_TREE; +} + /* Expand the OpenMP parallel or task directive starting at REGION. 
*/ static void @@ -3662,22 +3686,29 @@ expand_omp_taskreg (struct omp_region *r we generate pseudocode + if (__builtin_expect (N32 cond3 N31, 0)) goto Z0; if (cond3 is <) adj = STEP3 - 1; else adj = STEP3 + 1; count3 = (adj + N32 - N31) / STEP3; + if (__builtin_expect (N22 cond2 N21, 0)) goto Z0; if (cond2 is <) adj = STEP2 - 1; else adj = STEP2 + 1; count2 = (adj + N22 - N21) / STEP2; + if (__builtin_expect (N12 cond1 N11, 0)) goto Z0; if (cond1 is <) adj = STEP1 - 1; else adj = STEP1 + 1; count1 = (adj + N12 - N11) / STEP1; count = count1 * count2 * count3; + goto Z1; +Z0: + count = 0; +Z1: more = GOMP_loop_foo_start (0, count, 1, CHUNK, &istart0, &iend0); if (more) goto L0; else goto L3; L0: @@ -3785,6 +3816,9 @@ expand_omp_for_generic (struct omp_regio gcc_assert (gimple_
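The failure mode fixed by this patch can be reproduced outside OpenMP. The following sketch is plain C, not the code omp-low.c generates; it assumes a loop of the form `for (i = n1; i < n2; i += step)` with a positive step, and the function names are invented for the example.

```c
#include <assert.h>

/* Iteration count for "for (i = n1; i < n2; i += step)" computed in an
   unsigned type with no check of the initial condition.  */
static unsigned
count_unsigned_naive (unsigned n1, unsigned n2, unsigned step)
{
  assert (step > 0);
  return (n2 + step - 1 - n1) / step;
}

/* The same formula in a signed type: when n1 > n2 the result is zero
   or negative, so a "while (count > 0)" style loop never runs.  */
static int
count_signed (int n1, int n2, int step)
{
  assert (step > 0);
  return (n2 + step - 1 - n1) / step;
}

/* Unsigned count with the guard the patch adds: test the loop
   condition first and report zero iterations when it is false.  */
static unsigned
count_unsigned_guarded (unsigned n1, unsigned n2, unsigned step)
{
  assert (step > 0);
  if (!(n1 < n2))
    return 0;
  return (n2 + step - 1 - n1) / step;
}
```

For `n1 = 10`, `n2 = 5`, `step = 1` the naive unsigned count wraps around to a huge value, the signed count is -5, and the guarded count is 0. The guard corresponds to the `goto Z0` branches added to the pseudocode in the diff.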
[PATCH] Move autopar pass
I noticed that when you enable vectorization for the autopar testcases most of them fail. This is because vectorization introduces new loops and tests that make the CFG complicated enough for autopar to give up. But then autopar is a high-level transform and should be done earlier anyway. Which is what the following does - move it right after GRAPHITE (IIRC autopar may re-use GRAPHITE dependence analysis, so it has to come after it) and IV canonicalization (not sure why we don't do that earlier). Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2013-05-16 Richard Biener * passes.c (init_optimization_passes): Move pass_parallelize_loops earlier, after GRAPHITE transforms and IV canonicalization. Index: gcc/passes.c === *** gcc/passes.c(revision 198962) --- gcc/passes.c(working copy) *** init_optimization_passes (void) *** 1475,1480 --- 1475,1481 NEXT_PASS (pass_dce_loop); } NEXT_PASS (pass_iv_canon); + NEXT_PASS (pass_parallelize_loops); NEXT_PASS (pass_if_conversion); NEXT_PASS (pass_vectorize); { *** init_optimization_passes (void) *** 1484,1490 NEXT_PASS (pass_predcom); NEXT_PASS (pass_complete_unroll); NEXT_PASS (pass_slp_vectorize); - NEXT_PASS (pass_parallelize_loops); NEXT_PASS (pass_loop_prefetch); NEXT_PASS (pass_iv_optimize); NEXT_PASS (pass_lim); --- 1485,1490
Re: RFC: PATCH to avoid linking multiple front ends at once with parallel make
On May 15, 2013, Jason Merrill wrote:

> +	elif [ $count == 300 ]; then

s/==/=/

Ok with this change.  Thanks,

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
Re: [build, doc] Obsolete Solaris 9 support
Rainer Orth writes: > I think the time has come to obsolete Solaris 9 support: > > * According to > > http://www.oracle.com/us/support/library/lsp-coverage-sun-software-309122.pdf, > p.17, Premier Support has already ended in October 2011 and even > Extended Support will end in October 2014, which means it's impossible > to get patches without a special contract. By the time GCC 4.10 is > expected to be released (Spring 2015), this period is well past. This > timescale is in line with what happened for Solaris 7 (obsoleted in > GCC 4.5) and Solaris 8 (obsoleted in GCC 4.7). > > * Solaris 9 seems to be far less popular than Solaris 8 was: last time I > checked there was only a single Solaris 9 testresults posting apart > from my own. > > * By the time Solaris 12 appears, I'll need to reduce the testing matrix > to keep the amount of work manageable. > > Therefore the following patch does just that. Tested by > configuring/building without and with --enable-obsolete and checking > gccinstall.info on i386-pc-solaris2.9. The config-list.mk part is > untested, but should be straightforward. > > Unless there are strong objections, I plan to install this patch in a > day or two. Given that there were no comments, I installed the patch. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[wwwdocs] Announce Solaris 9 obsoletion
Corresponding to the Solaris 9 obsoletion patch, I've now installed the following wwwdocs patch to document it. Rainer * htdocs/gcc-4.9/changes.html: Document Solaris 9 obsoletion. Index: htdocs/gcc-4.9/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v retrieving revision 1.11 retrieving revision 1.13 diff -u -p -r1.11 -r1.13 --- htdocs/gcc-4.9/changes.html 9 May 2013 17:08:19 - 1.11 +++ htdocs/gcc-4.9/changes.html 16 May 2013 11:48:37 - 1.13 @@ -16,6 +16,23 @@ Caveats --> + +Support for a number of older systems and recently +unmaintained or untested target ports of GCC has been declared +obsolete in GCC 4.9. Unless there is activity to revive them, the +next release of GCC will have their sources permanently +removed. + +The following ports for individual systems on +particular architectures have been obsoleted: + + + Solaris 9 (*-*-solaris2.9). Details can be found in the + http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00728.html";> + announcement. + + +
Re: [patch] Small emit-rtl.c / reorg.c cleanup
Steven Bosscher writes: > This just removes one unused function, and moves two functions from > emit-rtl.c to reorg.c which is the only place where they're used. > > Will commit in a few days, barring objections. > > Ciao! > Steven > > > * rtl.h (next_label, skip_consecutive_labels, link_cc0_insns): > Remove prototypes. > * emit-rtl.c (next_label): Remove unused function. > (skip_consecutive_labels, link_cc0_insns): Move to ... > * reorg.c (skip_consecutive_labels, link_cc0_insns): ... here, the > only place where these functions are used. Unfortunately, this patch broke SPARC bootstrap since it lost the HAVE_cc0 guard around link_cc0_insns: /vol/gcc/src/hg/trunk/local/gcc/reorg.c:164:1: error: 'void link_cc0_insns(rtx)' defined but not used [-Werror=unused-function] link_cc0_insns (rtx insn) ^ cc1plus: all warnings being treated as errors make[3]: *** [reorg.o] Error 1 I'll install the obvious patch once testing on sparc-sun-solaris2.11 has gotten into stage 3. Rainer 2013-05-16 Rainer Orth * reorg.c (link_cc0_insns): Wrap in #ifdef HAVE_cc0. # HG changeset patch # Parent 4901ecbded49adb7097c93614fa708cb6cd53695 Restore bootstrap on non-cc0 targets diff --git a/gcc/reorg.c b/gcc/reorg.c --- a/gcc/reorg.c +++ b/gcc/reorg.c @@ -157,6 +157,7 @@ skip_consecutive_labels (rtx label) return label; } +#ifdef HAVE_cc0 /* INSN uses CC0 and is being moved into a delay slot. Set up REG_CC_SETTER and REG_CC_USER notes so we can find it. */ @@ -171,6 +172,7 @@ link_cc0_insns (rtx insn) add_reg_note (user, REG_CC_SETTER, insn); add_reg_note (insn, REG_CC_USER, user); } +#endif /* Insns which have delay slots that have not yet been filled. */ -- - Rainer Orth, Center for Biotechnology, Bielefeld University
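The breakage described here follows a general pattern: a static function whose only callers sit behind a feature macro must itself sit behind that macro, or builds with -Werror fail on targets where the macro is undefined. A minimal sketch, using the hypothetical macro `HAVE_FEATURE` and invented function names rather than anything from reorg.c:

```c
#include <assert.h>

/* Hypothetical feature macro standing in for HAVE_cc0.  It is left
   undefined here to mimic a target without the feature.  */
/* #define HAVE_FEATURE 1 */

#ifdef HAVE_FEATURE
/* Only called from guarded code below, so the definition needs the
   same guard; otherwise it is "defined but not used".  */
static int
feature_helper (int x)
{
  return x + 1;
}
#endif

/* Caller that uses the helper only when the feature is available.  */
static int
process (int x)
{
  assert (x >= 0);
#ifdef HAVE_FEATURE
  x = feature_helper (x);
#endif
  return x * 2;
}
```

Removing the `#ifdef` around `feature_helper` while keeping the guarded call reproduces the class of error quoted above when compiling with `-Wunused-function -Werror`.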
[PATCH][2/n] 2nd try: Re-organize -fvect-cost-model, enable basic vectorization at -O2
The following is a revision of the original patch in that it also
constrains versioning for aliasing with the cheap cost model
(formerly -ftree-vect-loop-version only constrained versioning for
alignment).  It also changes vectorizer related gates to rely on
global_options_set instead of playing magic games with flag values
(that won't extend to vectorization being enabled by default at -O2
with a different default cost model).  It also disables if-conversion
at -O2 as the current way that works makes code changes to
non-vectorized code as well.

I've done some measurements with that (also with the previous patch,
not quoted here), plus [3/n] that just changes -ftree-vectorize to be
enabled at -O2.  Thus, SPEC 2k6 CPU -O2 -fno-tree-vectorize vs. -O2
on x86_64 SandyBridge (but without any extra -m flag).  Basically
checking what differences we can expect in distribution builds.

> du -k */exe/*
1212    400.perlbench/exe/perlbench_base.amd64-m64-gcc42-nn
1216    400.perlbench/exe/perlbench_peak.amd64-m64-gcc42-nn
76      401.bzip2/exe/bzip2_base.amd64-m64-gcc42-nn
80      401.bzip2/exe/bzip2_peak.amd64-m64-gcc42-nn
3632    403.gcc/exe/gcc_base.amd64-m64-gcc42-nn
3640    403.gcc/exe/gcc_peak.amd64-m64-gcc42-nn
48      410.bwaves/exe/bwaves_base.amd64-m64-gcc42-nn
60      410.bwaves/exe/bwaves_peak.amd64-m64-gcc42-nn
8952    416.gamess/exe/gamess_base.amd64-m64-gcc42-nn
9556    416.gamess/exe/gamess_peak.amd64-m64-gcc42-nn
28      429.mcf/exe/mcf_base.amd64-m64-gcc42-nn
28      429.mcf/exe/mcf_peak.amd64-m64-gcc42-nn
148     433.milc/exe/milc_base.amd64-m64-gcc42-nn
148     433.milc/exe/milc_peak.amd64-m64-gcc42-nn
280     434.zeusmp/exe/zeusmp_base.amd64-m64-gcc42-nn
348     434.zeusmp/exe/zeusmp_peak.amd64-m64-gcc42-nn
1096    435.gromacs/exe/gromacs_base.amd64-m64-gcc42-nn
1104    435.gromacs/exe/gromacs_peak.amd64-m64-gcc42-nn
804     436.cactusADM/exe/cactusADM_base.amd64-m64-gcc42-nn
932     436.cactusADM/exe/cactusADM_peak.amd64-m64-gcc42-nn
132     437.leslie3d/exe/leslie3d_base.amd64-m64-gcc42-nn
204     437.leslie3d/exe/leslie3d_peak.amd64-m64-gcc42-nn
340     444.namd/exe/namd_base.amd64-m64-gcc42-nn
348     444.namd/exe/namd_peak.amd64-m64-gcc42-nn
3956    445.gobmk/exe/gobmk_base.amd64-m64-gcc42-nn
3960    445.gobmk/exe/gobmk_peak.amd64-m64-gcc42-nn
4012    447.dealII/exe/dealII_base.amd64-m64-gcc42-nn
4136    447.dealII/exe/dealII_peak.amd64-m64-gcc42-nn
468     450.soplex/exe/soplex_base.amd64-m64-gcc42-nn
476     450.soplex/exe/soplex_peak.amd64-m64-gcc42-nn
1152    453.povray/exe/povray_base.amd64-m64-gcc42-nn
1156    453.povray/exe/povray_peak.amd64-m64-gcc42-nn
1796    454.calculix/exe/calculix_base.amd64-m64-gcc42-nn
1824    454.calculix/exe/calculix_peak.amd64-m64-gcc42-nn
324     456.hmmer/exe/hmmer_base.amd64-m64-gcc42-nn
328     456.hmmer/exe/hmmer_peak.amd64-m64-gcc42-nn
160     458.sjeng/exe/sjeng_base.amd64-m64-gcc42-nn
164     458.sjeng/exe/sjeng_peak.amd64-m64-gcc42-nn
432     459.GemsFDTD/exe/GemsFDTD_base.amd64-m64-gcc42-nn
576     459.GemsFDTD/exe/GemsFDTD_peak.amd64-m64-gcc42-nn
68      462.libquantum/exe/libquantum_base.amd64-m64-gcc42-nn
68      462.libquantum/exe/libquantum_peak.amd64-m64-gcc42-nn
572     464.h264ref/exe/h264ref_base.amd64-m64-gcc42-nn
576     464.h264ref/exe/h264ref_peak.amd64-m64-gcc42-nn
4488    465.tonto/exe/tonto_base.amd64-m64-gcc42-nn
4580    465.tonto/exe/tonto_peak.amd64-m64-gcc42-nn
28      470.lbm/exe/lbm_base.amd64-m64-gcc42-nn
28      470.lbm/exe/lbm_peak.amd64-m64-gcc42-nn
784     471.omnetpp/exe/omnetpp_base.amd64-m64-gcc42-nn
784     471.omnetpp/exe/omnetpp_peak.amd64-m64-gcc42-nn
60      473.astar/exe/astar_base.amd64-m64-gcc42-nn
64      473.astar/exe/astar_peak.amd64-m64-gcc42-nn
4460    481.wrf/exe/wrf_base.amd64-m64-gcc42-nn
5332    481.wrf/exe/wrf_peak.amd64-m64-gcc42-nn
208     482.sphinx3/exe/sphinx_livepretend_base.amd64-m64-gcc42-nn
212     482.sphinx3/exe/sphinx_livepretend_peak.amd64-m64-gcc42-nn
5660    483.xalancbmk/exe/Xalan_base.amd64-m64-gcc42-nn
5668    483.xalancbmk/exe/Xalan_peak.amd64-m64-gcc42-nn
12      998.specrand/exe/specrand_base.amd64-m64-gcc42-nn
12      998.specrand/exe/specrand_peak.amd64-m64-gcc42-nn
12      999.specrand/exe/specrand_base.amd64-m64-gcc42-nn
12      999.specrand/exe/specrand_peak.amd64-m64-gcc42-nn

(serial make)

> grep 'Elapsed compile ' /abuild/rguenther/spec2k6/result/CPU2006.497.log
  Elapsed compile for '400.perlbench': 00:00:28 (28)
  Elapsed compile for '401.bzip2': 00:00:05 (5)
  Elapsed compile for '403.gcc': 00:01:07 (67)
  Elapsed compile for '429.mcf': 00:00:03 (3)
  Elapsed compile for '445.gobmk': 00:00:21 (21)
  Elapsed compile for '456.hmmer': 00:00:11 (11)
  Elapsed compile for '458.sjeng': 00:00:06 (6)
  Elapsed compile for '462.libquantum': 00:00:05 (5)
  Elapsed compile for '464.h264ref': 00:00:15 (15)
  Elapsed compile for '471.omnetpp': 00:00:29 (29)
  Elapsed compile for '473.astar': 00:00:04 (4)
  Elapsed compile for '483.xalancbmk': 00:03:02 (182)
  Elapsed compile for
Re: Break infinite folding loop
On Thu, May 16, 2013 at 12:05 PM, Marc Glisse wrote: > On Thu, 16 May 2013, Richard Biener wrote: > >> On Thu, May 16, 2013 at 8:42 AM, Marc Glisse wrote: >>> >>> Hello, >>> >>> we can get into a cycle where: >>> (x<0)|1 becomes (x<0)?-1:1 >>> and >>> (y?-1:1) becomes y|1 >>> >>> Contrary to what I posted in the PR, I am disabling the second >>> transformation here. It can be done later (the x86 target partially does >>> it >>> in the vcond expansion), and the VEC_COND_EXPR allows us to perform >>> further >>> operations: >>> (((x<0)|1)*5-1)/2 becomes (untested) (x<0)?-3:2 >>> >>> Also, this is a partial revert of the last patch, which sounds safer. I >>> am >>> leaving the vector tests in the disabled transformations in case we >>> decide >>> to re-enable them later. >>> >>> This isn't the end of the story, even for fold-const.c. I kept the >>> transformation a?-1:0 -> a, but it does not work because: >>> >>> /* If we try to convert OP0 to our type, the >>> call to fold will try to move the conversion inside >>> a COND, which will recurse. In that case, the COND_EXPR >>> is probably the best choice, so leave it alone. */ >>> && type == TREE_TYPE (arg0)) >>> >>> and when a is a comparison, its type is a different type (even looking >>> through main variant and canonical), only useless_type_conversion_p >>> notices >>> they are so similar, and I would still need a conversion, otherwise the >>> front-end complains when I try to assign the result that it has a >>> different >>> type than the variable I want to assign it to (I expected the result of >>> the >>> comparison to be opaque, and thus no complaining, but apparently not). >> >> >> Which is why most of these non-trivial transforms should happen on GIMPLE > > > Indeed, I guess I'll try that instead of risking more infinite loops, > thanks. 
Although when a transformation exists in fold-const.c, it is always > tempting to adapt it instead of rewriting it elsewhere in the compiler > (where it ends up 3 times as long and complicated)... > > >> via what I and Andrew proposed some year(s) ago. But well ... > > > You mean the existing tree-ssa-forwprop.c, or something different? > (I remember the valueize idea) Basically what tree-ssa-forwprop.c does but wrapped in a fold-const like interface. See http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01099.html, Andrew was having a branch somewhen. Richard. > >>> Also, we may want to make fold_binary_op_with_conditional_arg more strict >>> on >>> how much folding is necessary to consider the transformation worth it. >>> For >>> VEC_COND_EXPR where both branches are evaluated anyway, at least if we >>> started from a comparison and not already a VEC_COND_EXPR, we could >>> require >>> that both branches fold. >>> >>> But it seems better to fix the ICE quickly and do the rest later. >>> >>> Passes bootstrap+testsuite on x86_64-linux-gnu. >> >> >> Ok. >> >> Thanks, >> Richard. >> >>> 2013-05-16 Marc Glisse >>> >>> PR middle-end/57286 >>> gcc/ >>> * fold-const.c (fold_ternary_loc) : Disable some >>> transformations to avoid an infinite loop. >>> >>> gcc/testsuite/ >>> * gcc.dg/pr57286.c: New testcase. >>> * gcc.dg/vector-shift-2.c: Don't assume int has size 4. >>> * g++.dg/ext/vector22.C: Comment out transformations not >>> performed anymore. > > > -- > Marc Glisse
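To see why the two rewrites above can chase each other, here is a minimal C sketch that models a single vector lane as an int. It assumes the GNU vector-extension convention that a true comparison yields -1 (all bits set) and a false one yields 0. All function names are made up for illustration; none of them exist in fold-const.c.

```c
#include <assert.h>

/* One lane of a vector comparison: -1 (all bits set) if true, 0 if false.
   This is a scalar model of the vector semantics, not GCC code.  */
int lane_lt_zero (int x)
{
  return x < 0 ? -1 : 0;
}

/* Form A from the thread: (x<0)|1.  */
int form_or (int x)
{
  return lane_lt_zero (x) | 1;
}

/* Form B from the thread: (x<0)?-1:1.  */
int form_cond (int x)
{
  return lane_lt_zero (x) ? -1 : 1;
}

/* The longer chain from the thread: (((x<0)|1)*5-1)/2.  */
int chain_arith (int x)
{
  return (form_or (x) * 5 - 1) / 2;
}

/* Its proposed folded form: (x<0)?-3:2.  */
int chain_cond (int x)
{
  return lane_lt_zero (x) ? -3 : 2;
}

/* Hypothetical self-check: both pairs agree for every x in [lo, hi].  */
void check_forms_agree (int lo, int hi)
{
  for (int x = lo; x <= hi; x++)
    {
      assert (form_or (x) == form_cond (x));
      assert (chain_arith (x) == chain_cond (x));
    }
}
```

Since forms A and B compute the same value for every input, a folder that rewrites A into B alongside one that rewrites B into A never reaches a fixed point unless one direction is switched off, which is the approach the patch takes.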
Re: [PATCH, PR preprocessor/42014] Added LAST_SOURCE_COLUMN in while loop
On May 15, 2013, at 10:41 PM, Shakthi Kannan wrote:
> | I like using ~/contrib/compare_tests gcc-before.sum gcc-after.sum to
> | determine if there are regressions. You can also use that script to
> | check for regressions between two build trees as well.
> \--
>
> I ran the script between the two build trees. Here is the output:
>
> $ ./without/gcc/contrib/compare_tests ./without/build ./with/build
> # Comparing directories
> ## Dir1=./without/build: 12 sum files
> ## Dir2=./with/build: 12 sum files
>
> # Comparing 12 common sum files
> ## /bin/sh ./without/gcc/contrib/compare_tests /tmp/gxx-sum1.2065
> /tmp/gxx-sum2.2065
> # No differences found in 12 common sum files

:-) That's two thumbs up.
Re: [wwwdocs] Announce Solaris 9 obsoletion
Am 16.05.2013 13:49, schrieb Rainer Orth: Corresponding to the Solaris 9 obsoletion patch, I've now installed the following wwwdocs patch to document it. --- htdocs/gcc-4.9/changes.html 9 May 2013 17:08:19 - 1.11 +++ htdocs/gcc-4.9/changes.html 16 May 2013 11:48:37 - 1.13 @@ -16,6 +16,23 @@ Caveats --> + +Support for a number of older systems and recently Shouldn't you also uncomment the "Caveats"? Currently, it is . Tobias
Re: [wwwdocs] Announce Solaris 9 obsoletion
Tobias Burnus writes: > Am 16.05.2013 13:49, schrieb Rainer Orth: >> Corresponding to the Solaris 9 obsoletion patch, I've now installed the >> following wwwdocs patch to document it. >> --- htdocs/gcc-4.9/changes.html 9 May 2013 17:08:19 - 1.11 >> +++ htdocs/gcc-4.9/changes.html 16 May 2013 11:48:37 - 1.13 >> @@ -16,6 +16,23 @@ >> Caveats >> --> >> + >> +Support for a number of older systems and recently > > Shouldn't you also uncomment the "Caveats"? Currently, it is . You're right, of course. Fixed. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] Don't invalidate string length cache when not needed
On Wed, May 15, 2013 at 07:20:03PM +0200, Jakub Jelinek wrote: > Well, if si->length is known constant != 0 (note, it is enough to > test that it is non-zero and probably the code above this has > tested that already?) and we are storing non-zero, then that > should mean we are overwriting a non-zero value with another (or same) > non-zero value. Such operation shouldn't change the length of any strings > at all, not just the ones related to the current strinfo. Right. Adjusted. > That is either just setting all current strinfos to dont_invalidate, or > perhaps faster doing gsi_next (gsi); return false; (that means the caller > won't call maybe_invalidate - of course in that case you must not set > any dont_invalidate on any strinfo on that stmt) and the caller of > strlen_optimize_stmt will not do gsi_next (&gsi) - that is why you need > to do it instead. Cool, I took the gsi_next approach; seems to work nicely. So, updated version (it still doesn't handle p[1], p[2], etc.). Regtested/bootstrapped on x86_64-linux. 2013-05-16 Marek Polacek * tree-ssa-strlen.c (handle_char_store): Don't invalidate cached length when doing non-zero store. * gcc.dg/strlenopt-25.c: New test. 
--- gcc/tree-ssa-strlen.c.mp2013-05-15 14:11:20.079707492 +0200 +++ gcc/tree-ssa-strlen.c 2013-05-16 14:44:06.496545662 +0200 @@ -1717,6 +1717,13 @@ handle_char_store (gimple_stmt_iterator si->endptr = ssaname; si->dont_invalidate = true; } + else if (si != NULL && si->length != NULL_TREE + && TREE_CODE (si->length) == INTEGER_CST + && integer_nonzerop (gimple_assign_rhs1 (stmt))) + { + gsi_next (gsi); + return false; + } } else if (idx == 0 && initializer_zerop (gimple_assign_rhs1 (stmt))) { --- gcc/testsuite/gcc.dg/strlenopt-25.c.mp 2013-05-15 17:15:18.702118637 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-25.c 2013-05-15 18:26:27.881030317 +0200 @@ -0,0 +1,18 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fdump-tree-strlen" } */ + +#include "strlenopt.h" + +int +main () +{ + char p[] = "foobar"; + int len, len2; + len = strlen (p); + p[0] = 'O'; + len2 = strlen (p); + return len - len2; +} + +/* { dg-final { scan-tree-dump-times "strlen \\(" 0 "strlen" } } */ +/* { dg-final { cleanup-tree-dump "strlen" } } */ Marek
Re: [PATCH] Don't invalidate string length cache when not needed
On Thu, May 16, 2013 at 03:31:40PM +0200, Marek Polacek wrote: > Cool, I took the gsi_next approach; seems to work nicely. So, updated > version (it still doesn't handle p[1], p[2], etc.). > > Regtested/bootstrapped on x86_64-linux. > > 2013-05-16 Marek Polacek > > * tree-ssa-strlen.c (handle_char_store): Don't invalidate > cached length when doing non-zero store. > > * gcc.dg/strlenopt-25.c: New test. > > --- gcc/tree-ssa-strlen.c.mp 2013-05-15 14:11:20.079707492 +0200 > +++ gcc/tree-ssa-strlen.c 2013-05-16 14:44:06.496545662 +0200 > @@ -1717,6 +1717,13 @@ handle_char_store (gimple_stmt_iterator > si->endptr = ssaname; > si->dont_invalidate = true; > } Please add here a comment what it does and why, that if si->length is non-zero constant, we know that the character at that spot is not '\0' and when storing non-'\0' to that location, we can't affect size of any strings at all. Therefore we do the gsi_next + return false to signal caller that it shouldn't invalidate anything. Ok with that change, thanks. Jakub
[fixincludes] solaris_pow_int_overload should use __cplusplus
Work is going on to incorporate all applicable fixincludes fixes into the Solaris headers proper. One fix is currently problematic since it uses a G++-internal macro (__GXX_EXPERIMENTAL_CXX0X__) where libstdc++ already switched to testing __cplusplus. The following patch updates the fix to match . Tested by mainline bootstraps on i386-pc-solaris2.11, sparc-sun-solaris2.11 and 4.8 bootstrap on i386-pc-solaris2.10. Ok for mainline and 4.8 branch if they pass? Thanks. Rainer 2013-05-15 Rainer Orth * inclhack.def (solaris_pow_int_overload): Update comment. Change guard to match . * fixincl.x: Regenerate. * tests/base/iso/math_iso.h [SOLARIS_POW_INT_OVERLOAD_CHECK]: Matching change. # HG changeset patch # Parent c4272fed2b181caf1d8a82b6c0c727d0371c4f18 solaris_pow_int_overload should use __cplusplus diff --git a/fixincludes/inclhack.def b/fixincludes/inclhack.def --- a/fixincludes/inclhack.def +++ b/fixincludes/inclhack.def @@ -3474,7 +3474,7 @@ fix = { /* - * The pow overloads with int were removed in C++ 2011. + * The pow overloads with int were removed in C++ 2011 DR 550. */ fix = { hackname = solaris_pow_int_overload; @@ -3483,7 +3483,7 @@ fix = { select= "^[ \t]*inline [a-z ]* pow\\([^()]*, int [^()]*\\)" " *\\{[^{}]*\n[^{}]*\\}"; c_fix = format; -c_fix_arg = "#ifndef __GXX_EXPERIMENTAL_CXX0X__\n%0\n#endif"; +c_fix_arg = "#if __cplusplus < 201103L\n%0\n#endif"; test_text = " inline long double pow(long double __X, int __Y) { return\n" diff --git a/fixincludes/tests/base/iso/math_iso.h b/fixincludes/tests/base/iso/math_iso.h --- a/fixincludes/tests/base/iso/math_iso.h +++ b/fixincludes/tests/base/iso/math_iso.h @@ -10,7 +10,7 @@ #if defined( SOLARIS_POW_INT_OVERLOAD_CHECK ) -#ifndef __GXX_EXPERIMENTAL_CXX0X__ +#if __cplusplus < 201103L inline long double pow(long double __X, int __Y) { return __powl(__X, (long double) (__Y)); } #endif -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] Don't invalidate string length cache when not needed
On Thu, May 16, 2013 at 03:38:47PM +0200, Jakub Jelinek wrote: > Please add here a comment what it does and why, that if si->length > is non-zero constant, we know that the character at that spot is > not '\0' and when storing non-'\0' to that location, we can't affect > size of any strings at all. Therefore we do the gsi_next + return false > to signal caller that it shouldn't invalidate anything. > > Ok with that change, thanks. Thanks, this what I'll commit later today if no objections. 2013-05-16 Marek Polacek * tree-ssa-strlen.c (handle_char_store): Don't invalidate cached length when doing non-zero store. * gcc.dg/strlenopt-25.c: New test. --- gcc/tree-ssa-strlen.c.mp2013-05-15 14:11:20.079707492 +0200 +++ gcc/tree-ssa-strlen.c 2013-05-16 16:03:50.373504796 +0200 @@ -1717,6 +1717,27 @@ handle_char_store (gimple_stmt_iterator si->endptr = ssaname; si->dont_invalidate = true; } + /* If si->length is non-zero constant, we aren't overwriting '\0', + and if we aren't storing '\0', we know that the length of the +string remains the same. In that case we move to the next +gimple statement and return to signal the caller that it shouldn't +invalidate anything. + +This is benefical for cases like: + +char p[] = "foobar"; +size_t len = strlen (p); +p[0] = 'X' +size_t len2 = strlen (p); + +where we should be able to optimize away the second strlen call. 
*/ + else if (si != NULL && si->length != NULL_TREE + && TREE_CODE (si->length) == INTEGER_CST + && integer_nonzerop (gimple_assign_rhs1 (stmt))) + { + gsi_next (gsi); + return false; + } } else if (idx == 0 && initializer_zerop (gimple_assign_rhs1 (stmt))) { --- gcc/testsuite/gcc.dg/strlenopt-25.c.mp 2013-05-15 17:15:18.702118637 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-25.c 2013-05-15 18:26:27.881030317 +0200 @@ -0,0 +1,18 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fdump-tree-strlen" } */ + +#include "strlenopt.h" + +int +main () +{ + char p[] = "foobar"; + int len, len2; + len = strlen (p); + p[0] = 'O'; + len2 = strlen (p); + return len - len2; +} + +/* { dg-final { scan-tree-dump-times "strlen \\(" 0 "strlen" } } */ +/* { dg-final { cleanup-tree-dump "strlen" } } */ Marek
Re: [PATCH] Don't invalidate string length cache when not needed
On Thu, May 16, 2013 at 04:07:44PM +0200, Marek Polacek wrote: > --- gcc/tree-ssa-strlen.c.mp 2013-05-15 14:11:20.079707492 +0200 > +++ gcc/tree-ssa-strlen.c 2013-05-16 16:03:50.373504796 +0200 > @@ -1717,6 +1717,27 @@ handle_char_store (gimple_stmt_iterator > si->endptr = ssaname; > si->dont_invalidate = true; > } > + /* If si->length is non-zero constant, we aren't overwriting '\0', > + and if we aren't storing '\0', we know that the length of the The above line has 8 spaces instead of tab. Also, please write "that the length of the string and any other zero terminated string in memory remains the same." > + string remains the same. In that case we move to the next > + gimple statement and return to signal the caller that it shouldn't > + invalidate anything. > + > + This is benefical for cases like: > + > + char p[] = "foobar"; > + size_t len = strlen (p); > + p[0] = 'X' Missing ; at the end of the above line. > + size_t len2 = strlen (p); Also, you could make it clear that it affects any other strings. Perhaps char p[20]; void foo (char *q) { strcpy (p, "foobar"); size_t len = strlen (p); // This can be optimized into 6 size_t len2 = strlen (q); // This has to be computed p[0] = 'X'; size_t len3 = strlen (p); // This can be optimized into 6 size_t len4 = strlen (q); // This can be optimized into len2 bar (len, len2, len3, len4); } As q could point to p, if we didn't do what your patch does on the p[0] = 'X'; store, then we'd need to invalidate the recorded length of the q string. Similarly if there is p[0] = '\0' or p[0] = var. Jakub
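The aliasing argument in the review above can be checked at run time with a small C sketch. It follows Jakub's example but drops the undefined bar () call; the function names are illustrative only and do not appear in tree-ssa-strlen.c. It shows the property the pass relies on, not how the pass tracks it.

```c
#include <assert.h>
#include <string.h>

char p[20];

/* Overwrite a non-zero character with another non-zero character and
   report whether strlen (p) and strlen (q) both stayed the same.
   q may point anywhere, including into p.  */
int nonzero_store_keeps_lengths (char *q)
{
  strcpy (p, "foobar");
  size_t len = strlen (p);
  size_t len2 = strlen (q);
  p[0] = 'X';                /* non-zero stored over non-zero */
  size_t len3 = strlen (p);
  size_t len4 = strlen (q);
  return len == 6 && len3 == 6 && len2 == len4;
}

/* Contrast: storing '\0' does change the length seen through an alias,
   so cached lengths would have to be invalidated in that case.  */
int zero_store_changes_alias (void)
{
  strcpy (p, "foobar");
  char *q = p;
  size_t before = strlen (q);
  p[0] = '\0';
  size_t after = strlen (q);
  return before == 6 && after == 0;
}

/* Hypothetical self-check covering aliased and unrelated q.  */
void check_store_cases (void)
{
  char other[] = "abc";
  assert (nonzero_store_keeps_lengths (p));      /* q aliases p */
  assert (nonzero_store_keeps_lengths (p + 3));  /* q points into p */
  assert (nonzero_store_keeps_lengths (other));  /* q is unrelated */
  assert (zero_store_changes_alias ());
}
```

No string's terminator moves when a non-zero byte replaces a non-zero byte, so every recorded length stays valid whether or not the pointers alias. A '\0' store or a store of an unknown value gives no such guarantee.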
Re: [PATCH] Don't invalidate string length cache when not needed
On Thu, May 16, 2013 at 04:18:27PM +0200, Jakub Jelinek wrote: > As q could point to p, if we didn't do what your patch does on the p[0] = 'X'; > store, then we'd need to invalidate the recorded length of the q string. > Similarly if there is p[0] = '\0' or p[0] = var. Ah, another thing while we are at it. For p[0] = '\0'; case when p[0] is known to be '\0' already, we remove it if: /* When storing '\0', the store can be removed if we know it has been stored in the current function. */ if (!stmt_could_throw_p (stmt) && si->writable) (and in that case don't invalidate anything either). But the above condition is false, we set si->writable (correct) and si->dont_invalidate, while we could do instead of that the same what you do for non-zero store to non-zero location, i.e. gsi_next (gsi); return false;. Perhaps a testcase for that is: size_t bar (char *p, char *r) { size_t len1 = strlen (r); char *q = strchr (p, '\0'); *q = '\0'; return len1 - strlen (r); // This strlen should be optimized into len1. } strlen (q) should be known to be zero at that point, but si->writable should be false, we don't know if p doesn't point say into .rodata, and stmt_could_throw_p probably should return true too. Jakub
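The zero-over-zero case can be sketched the same way. The bar function below is Jakub's testcase made self-contained; check_bar is a hypothetical driver. The sketch passes only writable arrays, because writing through a pointer into read-only data would be undefined, which is the same concern behind the si->writable check.

```c
#include <assert.h>
#include <string.h>

/* Store '\0' at the position where p's terminator already is, and
   return the change in strlen (r) across that store.  */
size_t bar (char *p, char *r)
{
  size_t len1 = strlen (r);
  char *q = strchr (p, '\0');   /* q points at p's terminating '\0' */
  *q = '\0';                    /* zero over zero: no string changes */
  return len1 - strlen (r);
}

/* Hypothetical driver: r unrelated to p, equal to p, and inside p.  */
void check_bar (void)
{
  char a[] = "hello";
  char b[] = "world!";
  assert (bar (a, b) == 0);
  assert (bar (a, a) == 0);
  assert (bar (a, a + 2) == 0);
  assert (strcmp (a, "hello") == 0);   /* contents are untouched */
}
```

The result is always 0, so the second strlen (r) could reuse len1 even when the store itself cannot be removed.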
Re: section anchors and weak hidden symbols
Nathan Sidwell writes: > This patch fixes a problem with section anchors. Found on powerpc, but > also appears on MIPS and ARM targets. > > Section anchors can only be used for definitions known to bind in the > current object file. The default predicate uses the binds_local_p hook to > determine this. Unfortunately that hook determines whether the decl's > binding is determined at static link time (i.e. within the dynamic object > this object is linked). That's very nearly the same, except for symbols > that have a weak hidden definition in this object file. For such symbols, > binds_local_p returns true, because the binding must be within the dynamic > object. But we shouldn't use a section anchor as a definition in a > different object file could win at static link time. (I'm not 100% sure > there aren't other cases where module-binding and object-binding differ for > a definition.) > > It surprised me that binds_local_p has the semantics it does -- perhaps its > meaning has changed, or it is simply poorly named. I would have thought > binds_module_p would be a better name. > > Anyway, rather than go on a renaming exercise, I chose to adjust > default_use_anchors_for_symbol_p to reject any weak symbol. > > tested on powerpc-linux-gnu, ok? The new gcc.dg/visibility-21.c testcase fails on i386-pc-solaris2.11 and x86_64-unknown-linux-gnu: FAIL: gcc.dg/visibility-21.c (test for excess errors) Excess errors: /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/visibility-21.c:1:0: warning: this target does not support '-fsection-anchors' [-fsection-anchors] Fixed as follows, tested with the appropriate runtest invocation on both targets where the test becomes UNSUPPORTED, installed on mainline. Rainer 2013-05-16 Rainer Orth * gcc.dg/visibility-21.c: Require section_anchors.
# HG changeset patch # Parent e3635f5f20529d75a74064c8282ce002932dde78 Require section_anchors in gcc.dg/visibility-21.c diff --git a/gcc/testsuite/gcc.dg/visibility-21.c b/gcc/testsuite/gcc.dg/visibility-21.c --- a/gcc/testsuite/gcc.dg/visibility-21.c +++ b/gcc/testsuite/gcc.dg/visibility-21.c @@ -3,6 +3,7 @@ /* { dg-options "-O2 -fsection-anchors" } */ /* { dg-require-visibility "" } */ /* { dg-require-weak "" } */ +/* { dg-require-effective-target section_anchors } */ /* { dg-final { scan-assembler-not "ANCHOR" } } */ int __attribute__((weak, visibility("hidden"))) weak_hidden[3]; -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [fixincludes] solaris_pow_int_overload should use __cplusplus
On Thu, May 16, 2013 at 8:41 AM, Rainer Orth wrote: > Work is going on to incorporate all applicable fixincludes fixes into > the Solaris headers proper. One fix is currently problematic since it > uses a G++-internal macro (__GXX_EXPERIMENTAL_CXX0X__) where libstdc++ > already switched to testing __cplusplus. The following patch > updates the fix to match . > > Tested by mainline bootstraps on i386-pc-solaris2.11, > sparc-sun-solaris2.11 and 4.8 bootstrap on i386-pc-solaris2.10. > > Ok for mainline and 4.8 branch if they pass? > > Thanks. > Rainer Ok with me if it is OK with Bruce. > > > 2013-05-15 Rainer Orth > > * inclhack.def (solaris_pow_int_overload): Update comment. > Change guard to match . > * fixincl.x: Regenerate. > * tests/base/iso/math_iso.h [SOLARIS_POW_INT_OVERLOAD_CHECK]: > Matching change. > > > > -- > - > Rainer Orth, Center for Biotechnology, Bielefeld University >
[Patch ARM] Fix arm-none-eabi builds.
My patch for PR19599 yesterday had a problem with arm-none-eabi builds. I had missed a null pointer check of decl in an AAPCS only configuration. Now applied as obvious.

regards
Ramana

2013-05-16 Ramana Radhakrishnan

PR target/19599
* config/arm/arm.c (arm_function_ok_for_sibcall): Add check for NULL decl.

Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c (revision 198971)
+++ gcc/config/arm/arm.c (working copy)
@@ -5426,6 +5426,7 @@ arm_function_ok_for_sibcall (tree decl,
 sibling calls. */
 if (TARGET_AAPCS_BASED
 && arm_abi == ARM_ABI_AAPCS
+ && decl
 && DECL_WEAK (decl))
 return false;
C++ PATCH for c++/57279 (function qualifiers in alias-declaration)
In C++11 we also need to allow function qualifiers in alias-declarations, since they are equivalent to typedefs. There doesn't seem to be any good reason for giving an error in grokdeclarator at all, since we need to complain about uses of qualified function types via typedef or template parameter anyway. But I'll make the small change first so it's suitable for 4.8. Tested x86_64-pc-linux-gnu, applying to trunk, will apply to 4.8 after 4.8.1. commit c0931003b979bd016f8e2b507f9843c3e01e3fd1 Author: Jason Merrill Date: Tue May 14 17:07:11 2013 -0400 PR c++/57279 * decl.c (grokdeclarator): Allow member function qualifiers in TYPENAME context. diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index b16472f..a4f686a 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -10295,8 +10295,10 @@ grokdeclarator (const cp_declarator *declarator, if (ctype) type = build_memfn_type (type, ctype, memfn_quals, rqual); - /* Core issue #547: need to allow this in template type args. */ - else if (template_type_arg && TREE_CODE (type) == FUNCTION_TYPE) + /* Core issue #547: need to allow this in template type args. + Allow it in general in C++11 for alias-declarations. */ + else if ((template_type_arg || cxx_dialect >= cxx11) + && TREE_CODE (type) == FUNCTION_TYPE) type = apply_memfn_quals (type, memfn_quals, rqual); else error ("invalid qualifiers on non-member function type"); diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-35.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-35.C new file mode 100644 index 000..f412b30 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-35.C @@ -0,0 +1,9 @@ +// PR c++/57279 +// { dg-require-effective-target c++11 } + +typedef void fc1() const; // OK +typedef void frr1() &&; // OK +typedef void fcr1() const &; +using fc2 = void() const; // #4 +using frr2 = void() &&; // OK +using fcr2 = void() const &; // #6
[C++] Don't build NE_EXPR manually
Hello, this is a minor patch: instead of manually building a NE_EXPR, it seems better to call the front-end function whose job it is. Incidentally, it means one expression is folded a little bit more, so I am re-enabling a test that I commented out this morning. Bootstrap+testsuite on x86_64-linux-gnu. 2013-05-16 Marc Glisse gcc/cp/ * call.c (build_conditional_expr_1): Use cp_build_binary_op instead of directly calling fold_build2. gcc/testsuite/ * g++.dg/ext/vector22.C: Uncomment working test. -- Marc Glisse
Index: cp/call.c === --- cp/call.c (revision 198970) +++ cp/call.c (working copy) @@ -4448,22 +4448,22 @@ build_conditional_expr_1 (tree arg1, tre || TYPE_SIZE (arg1_type) != TYPE_SIZE (arg2_type)) { if (complain & tf_error) error ("incompatible vector types in conditional expression: " "%qT, %qT and %qT", TREE_TYPE (arg1), TREE_TYPE (orig_arg2), TREE_TYPE (orig_arg3)); return error_mark_node; } if (!COMPARISON_CLASS_P (arg1)) - arg1 = fold_build2 (NE_EXPR, signed_type_for (arg1_type), arg1, - build_zero_cst (arg1_type)); + arg1 = cp_build_binary_op (input_location, NE_EXPR, arg1, + build_zero_cst (arg1_type), complain); return fold_build3 (VEC_COND_EXPR, arg2_type, arg1, arg2, arg3); } /* [expr.cond] The first expression is implicitly converted to bool (clause _conv_). 
*/ arg1 = perform_implicit_conversion_flags (boolean_type_node, arg1, complain, LOOKUP_NORMAL); if (error_operand_p (arg1)) Index: testsuite/g++.dg/ext/vector22.C === --- testsuite/g++.dg/ext/vector22.C (revision 198970) +++ testsuite/g++.dg/ext/vector22.C (working copy) @@ -4,19 +4,19 @@ typedef unsigned vec __attribute__((vector_size(4*sizeof(int)))); /* Disabled after PR57286 void f(vec*a,vec*b){ *a=(*a)?-1:(*b<10); *b=(*b)?(*a<10):0; } */ void g(vec*a,vec*b){ *a=(*a)?(*a<*a):-1; -// *b=(*b)?-1:(*b<*b); + *b=(*b)?-1:(*b<*b); } void h(vec*a){ *a=(~*a==5); } /* { dg-final { scan-tree-dump-not "~" "gimple" } } */ /* { dg-final { scan-tree-dump-not "VEC_COND_EXPR" "gimple" } } */ /* { dg-final { cleanup-tree-dump "gimple" } } */
Re: [PATCH] Refactor rtl_verify_flow_info and rtl_verify_flow_info_1
On Thu, May 16, 2013 at 6:55 AM, Teresa Johnson wrote: > > * cfgrtl.c (verify_hot_cold_block_grouping): Return err. > (rtl_verify_edges): New function. > (rtl_verify_bb_insns): Ditto. > (rtl_verify_bb_pointers): Ditto. > (rtl_verify_bb_insn_chain): Ditto. > (rtl_verify_fallthru): Ditto. > (rtl_verify_bb_layout): Ditto. > (rtl_verify_flow_info_1): Outline checks into new functions. > (rtl_verify_flow_info): Ditto. Looks good to me. Ciao! Steven
Re: [fixincludes] solaris_pow_int_overload should use __cplusplus
On 05/16/13 06:41, Rainer Orth wrote: Work is going on to incorporate all applicable fixincludes fixes into the Solaris headers proper. One fix is currently problematic since it uses a G++-internal macro (__GXX_EXPERIMENTAL_CXX0X__) where libstdc++ already switched to testing __cplusplus. The following patch updates the fix to match . Tested by mainline bootstraps on i386-pc-solaris2.11, sparc-sun-solaris2.11 and 4.8 bootstrap on i386-pc-solaris2.10. Ok for mainline and 4.8 branch if they pass? Looks good to me. Thanks.
RFA: RL78: Add support for naked function attribute.
Hi DJ, OK, I am abandoning my RL78 interrupt prologue patch for now. There are just too many complications to make it work. Instead here is a much simpler patch to add support for a naked function attribute. OK to apply ? Cheers Nick gcc/ChangeLog 2013-05-16 Nick Clifton * config/rl78/rl78.c (rl78_attribute_table): Add naked. (rl78_is_naked_func): New function. (rl78_expand_prologue): Skip prologue generation for naked functions. (rl78_expand_epilogue): Skip epilogue generation for naked functions. * doc/extend.texi (naked): Add RL78 to the list of processors that supports this attribute. Index: gcc/config/rl78/rl78.c === --- gcc/config/rl78/rl78.c (revision 198971) +++ gcc/config/rl78/rl78.c (working copy) @@ -499,6 +499,8 @@ false }, { "brk_interrupt", 0, 0, true, false, false, rl78_handle_func_attribute, false }, + { "naked", 0, 0, true, false, false, rl78_handle_func_attribute, +false }, { NULL, 0, 0, false, false, false, NULL, false } }; @@ -825,6 +827,12 @@ return rv; } +static int +rl78_is_naked_func (void) +{ + return (lookup_attribute ("naked", DECL_ATTRIBUTES (current_function_decl)) != NULL_TREE); +} + /* Expand the function prologue (from the prologue pattern). 
*/ void rl78_expand_prologue (void) @@ -833,6 +841,9 @@ rtx sp = gen_rtx_REG (HImode, STACK_POINTER_REGNUM); int rb = 0; + if (rl78_is_naked_func ()) +return; + if (!cfun->machine->computed) rl78_compute_frame_info (); @@ -877,6 +888,9 @@ rtx sp = gen_rtx_REG (HImode, STACK_POINTER_REGNUM); int rb = 0; + if (rl78_is_naked_func ()) +return; + if (frame_pointer_needed) { emit_move_insn (gen_rtx_REG (HImode, STACK_POINTER_REGNUM), Index: gcc/doc/extend.texi === --- gcc/doc/extend.texi (revision 198959) +++ gcc/doc/extend.texi (working copy) @@ -3142,7 +3142,7 @@ @item naked @cindex function without a prologue/epilogue code -Use this attribute on the ARM, AVR, MCORE, RX and SPU ports to indicate that +Use this attribute on the ARM, AVR, MCORE, RL78, RX and SPU ports to indicate that the specified function does not need prologue/epilogue sequences generated by the compiler. It is up to the programmer to provide these sequences. The only statements that can be safely included in naked functions are
[C++ Patch] PR 17314
Hi, as agreed, I regtested on x86_64-linux the below diagnostic tweak. Thanks, Paolo. // /cp 2013-05-16 Paolo Carlini PR c++/17314 * call.c (enforce_access): Improve protected access error message. /testsuite 2013-05-16 Paolo Carlini PR c++/17314 * g++.dg/inherit/access9.C: New. Index: cp/call.c === --- cp/call.c (revision 198962) +++ cp/call.c (working copy) @@ -5707,7 +5707,7 @@ enforce_access (tree basetype_path, tree decl, tre if (TREE_PRIVATE (decl)) error ("%q+#D is private", diag_decl); else if (TREE_PROTECTED (decl)) - error ("%q+#D is protected", diag_decl); + error ("%q+#D, declared protected, is inaccessible", diag_decl); else error ("%q+#D is inaccessible", diag_decl); error ("within this context"); Index: testsuite/g++.dg/inherit/access9.C === --- testsuite/g++.dg/inherit/access9.C (revision 0) +++ testsuite/g++.dg/inherit/access9.C (working copy) @@ -0,0 +1,16 @@ +// PR c++/17314 + +class A +{ +protected: + A(){} // { dg-error "declared protected" } +}; + +class B : virtual A {}; + +class C : public B {}; // { dg-error "within this context" } +// { dg-message "deleted" "" { target c++11 } 11 } +int main () +{ + C c; // { dg-message "required here" "" { target c++98 } } +} // { dg-error "deleted" "" { target c++11 } 15 }
[PATCH] Fix extendsidi2_1 splitting (PR rtl-optimization/57281, PR rtl-optimization/57300 wrong-code)
Hi! As discussed in the PR, there seem to be only 3 define_split patterns that use dead_or_set_p, one in i386.md and two in s390.md, but unfortunately insn splitting is done in many passes (combine, split{1,2,3,4,5}, dbr, pro_and_epilogue, final, sometimes mach) and only in combine the note problem is computed. Computing the note problem in split{1,2,3,4,5} just because of the single pattern on i?86 -m32 and one on s390x -m64 might be too expensive, and while neither of these targets does dbr scheduling, e.g. during final without cfg one can't df_analyze. So, the following patch fixes it by doing the transformation instead in the peephole2 pass, which computes the notes problem and has REG_DEAD notes up to date (and peep2_reg_dead_p is used there heavily and works). The splitters in i386.md for extendsidi2_1 were guarded by reload_completed, and I think the usual case is that the pattern emerges already from register allocation; the ordering of the relevant passes for i386 then is: split2, ..., pro_and_epilogue, ..., peephole2, ..., split{4,3}, ... split5 not run, ... final When the first splitter (for the case where the register is dead) is turned into peephole2, and the latter remains define_split (so that it is split even if matched later on), we need to prevent the second splitter from splitting during split2, because then peephole2 would likely never match. That is done through the epilogue_completed check; while it isn't exactly peephole2_completed, it is close enough (there are no other splitting passes in between pro_and_epilogue and peephole2). For -O0, which doesn't run any of the split{3,4,5} passes, we need to use reload_completed, so that it is split at split2; peephole2 won't be run anyway, and with -fno-peephole2, there is no point not splitting immediately during split2 either. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8? If applied, I think s390x should do something similar. 
2013-05-16 Jakub Jelinek PR rtl-optimization/57281 PR rtl-optimization/57300 * config/i386/i386.md (extendsidi2_1 splits): Turn the first one into define_peephole2 from define_split, use peep2_reg_dead_p instead of dead_or_set_p. Guard the second splitter with epilogue_completed if possible. * gcc.dg/pr57300.c: New test. * gcc.c-torture/execute/pr57281.c: New test. --- gcc/config/i386/i386.md.jj 2013-05-16 12:36:29.669418198 +0200 +++ gcc/config/i386/i386.md 2013-05-16 16:03:08.833424642 +0200 @@ -,13 +,13 @@ (define_insn "extendsidi2_1" "#") ;; Extend to memory case when source register does die. -(define_split - [(set (match_operand:DI 0 "memory_operand") - (sign_extend:DI (match_operand:SI 1 "register_operand"))) - (clobber (reg:CC FLAGS_REG)) - (clobber (match_operand:SI 2 "register_operand"))] - "(reload_completed -&& dead_or_set_p (insn, operands[1]) +;; This is a peephole2, so that it can use peep2_reg_dead_p. +(define_peephole2 + [(parallel [(set (match_operand:DI 0 "memory_operand") + (sign_extend:DI (match_operand:SI 1 "register_operand"))) + (clobber (reg:CC FLAGS_REG)) + (clobber (match_operand:SI 2 "register_operand"))])] + "(peep2_reg_dead_p (1, operands[1]) && !reg_mentioned_p (operands[1], operands[0]))" [(set (match_dup 3) (match_dup 1)) (parallel [(set (match_dup 1) (ashiftrt:SI (match_dup 1) (const_int 31))) @@ -3348,12 +3348,15 @@ (define_split "split_double_mode (DImode, &operands[0], 1, &operands[3], &operands[4]);") ;; Extend to memory case when source register does not die. +;; Guarded by epilogue_completed, instead of reload_completed, if possible, +;; so that it isn't split before peephole2 has been run. Don't do that +;; if none of the split[345] passes will be run or peephole2 will not happen. 
(define_split [(set (match_operand:DI 0 "memory_operand") (sign_extend:DI (match_operand:SI 1 "register_operand"))) (clobber (reg:CC FLAGS_REG)) (clobber (match_operand:SI 2 "register_operand"))] - "reload_completed" + "(optimize > 0 && flag_peephole2) ? epilogue_completed : reload_completed" [(const_int 0)] { split_double_mode (DImode, &operands[0], 1, &operands[3], &operands[4]); --- gcc/testsuite/gcc.dg/pr57300.c.jj 2013-05-16 15:51:25.084707211 +0200 +++ gcc/testsuite/gcc.dg/pr57300.c 2013-05-16 15:51:25.084707211 +0200 @@ -0,0 +1,21 @@ +/* PR rtl-optimization/57300 */ +/* { dg-do run } */ +/* { dg-options "-O3" } */ +/* { dg-additional-options "-msse2" { target sse2_runtime } } */ + +extern void abort (void); +int a, b, d[10]; +long long c; + +int +main () +{ + int e; + for (e = 0; e < 10; e++) +d[e] = 1; + if (d[0]) +c = a = (b == 0 || 1 % b); + if (a != 1) +abort (); + return 0; +} --- gcc/testsuite/gcc.c-torture/execute/pr57281.c.jj2013-05-16 15:51:25.085707131 +0200 +++ gcc/testsuite/gcc.c-torture/execute/pr57281.c 2013-05-16 15:51:25.085707131 +0200
Re: [PATCH] Pattern recognizer for rotates
On Wed, May 15, 2013 at 03:24:37PM +0200, Richard Biener wrote: > Ok with ... > > + /* Pattern detected. */ > > + if (dump_enabled_p ()) > > +dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, > > +"vect_recog_rotate_pattern: detected: "); > > Please use MSG_NOTE here. Ok, here is what I've committed (the above change plus added 6 i386.exp testcases). 2013-05-16 Jakub Jelinek * tree-vectorizer.h (NUM_PATTERNS): Increment. * tree-vect-patterns.c (vect_vect_recog_func_ptrs): Add vect_recog_rotate_pattern. (vect_recog_rotate_pattern): New function. * gcc.target/i386/rotate-3.c: New test. * gcc.target/i386/rotate-3a.c: New test. * gcc.target/i386/rotate-4.c: New test. * gcc.target/i386/rotate-4a.c: New test. * gcc.target/i386/rotate-5.c: New test. * gcc.target/i386/rotate-5a.c: New test. --- gcc/tree-vectorizer.h.jj2013-05-13 09:44:45.822529016 +0200 +++ gcc/tree-vectorizer.h 2013-05-16 13:56:08.093576302 +0200 @@ -1005,7 +1005,7 @@ extern void vect_slp_transform_bb (basic Additional pattern recognition functions can (and will) be added in the future. */ typedef gimple (* vect_recog_func_ptr) (vec *, tree *, tree *); -#define NUM_PATTERNS 10 +#define NUM_PATTERNS 11 void vect_pattern_recog (loop_vec_info, bb_vec_info); /* In tree-vectorizer.c. 
*/ --- gcc/tree-vect-patterns.c.jj 2013-05-16 12:36:29.496419141 +0200 +++ gcc/tree-vect-patterns.c2013-05-16 13:56:08.095576326 +0200 @@ -50,6 +50,7 @@ static gimple vect_recog_over_widening_p tree *); static gimple vect_recog_widen_shift_pattern (vec *, tree *, tree *); +static gimple vect_recog_rotate_pattern (vec *, tree *, tree *); static gimple vect_recog_vector_vector_shift_pattern (vec *, tree *, tree *); static gimple vect_recog_divmod_pattern (vec *, @@ -64,6 +65,7 @@ static vect_recog_func_ptr vect_vect_rec vect_recog_pow_pattern, vect_recog_widen_shift_pattern, vect_recog_over_widening_pattern, + vect_recog_rotate_pattern, vect_recog_vector_vector_shift_pattern, vect_recog_divmod_pattern, vect_recog_mixed_size_cond_pattern, @@ -1446,6 +1448,218 @@ vect_recog_widen_shift_pattern (vecsafe_push (last_stmt); + return pattern_stmt; +} + +/* Detect a rotate pattern wouldn't be otherwise vectorized: + + type a_t, b_t, c_t; + + S0 a_t = b_t r<< c_t; + + Input/Output: + + * STMTS: Contains a stmt from which the pattern search begins, +i.e. the shift/rotate stmt. The original stmt (S0) is replaced +with a sequence: + + S1 d_t = -c_t; + S2 e_t = d_t & (B - 1); + S3 f_t = b_t << c_t; + S4 g_t = b_t >> e_t; + S0 a_t = f_t | g_t; + +where B is element bitsize of type. + + Output: + + * TYPE_IN: The type of the input arguments to the pattern. + + * TYPE_OUT: The type of the output of this pattern. + + * Return value: A new stmt that will be used to replace the rotate +S0 stmt. 
*/ + +static gimple +vect_recog_rotate_pattern (vec *stmts, tree *type_in, tree *type_out) +{ + gimple last_stmt = stmts->pop (); + tree oprnd0, oprnd1, lhs, var, var1, var2, vectype, type, stype, def, def2; + gimple pattern_stmt, def_stmt; + enum tree_code rhs_code; + stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt); + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo); + bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo); + enum vect_def_type dt; + optab optab1, optab2; + + if (!is_gimple_assign (last_stmt)) +return NULL; + + rhs_code = gimple_assign_rhs_code (last_stmt); + switch (rhs_code) +{ +case LROTATE_EXPR: +case RROTATE_EXPR: + break; +default: + return NULL; +} + + if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo)) +return NULL; + + lhs = gimple_assign_lhs (last_stmt); + oprnd0 = gimple_assign_rhs1 (last_stmt); + type = TREE_TYPE (oprnd0); + oprnd1 = gimple_assign_rhs2 (last_stmt); + if (TREE_CODE (oprnd0) != SSA_NAME + || TYPE_PRECISION (TREE_TYPE (lhs)) != TYPE_PRECISION (type) + || !INTEGRAL_TYPE_P (type) + || !TYPE_UNSIGNED (type)) +return NULL; + + if (!vect_is_simple_use (oprnd1, last_stmt, loop_vinfo, bb_vinfo, &def_stmt, + &def, &dt)) +return NULL; + + if (dt != vect_internal_def + && dt != vect_constant_def + && dt != vect_external_def) +return NULL; + + vectype = get_vectype_for_scalar_type (type); + if (vectype == NULL_TREE) +return NULL; + + /* If vector/vector or vector/scalar rotate is supported by the target, + don't do anything here. */ + optab1 = optab_for_tree_code (rhs_code, vectype, optab_vector); + if (optab1 + && optab_handler (optab1, TYPE_MODE (vectype)) != CODE_FOR_nothing) +return NULL; + + if (bb_vinfo != NULL || dt != vect_intern
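For readers of the archive, the S1..S4 sequence from the comment quoted above can be written out in plain C. This is only a minimal sketch with B fixed at 32; the function name rotl32_lowered is made up for illustration and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left, spelled the way the quoted comment lowers it
   (S1..S4 plus the final OR), with element bitsize B = 32.  */
static uint32_t
rotl32_lowered (uint32_t b, uint32_t c)
{
  assert (c < 32);              /* this sketch assumes an in-range count */
  uint32_t d = -c;              /* S1: d_t = -c_t (unsigned wrap-around) */
  uint32_t e = d & (32 - 1);    /* S2: e_t = d_t & (B - 1) */
  uint32_t f = b << c;          /* S3: f_t = b_t << c_t */
  uint32_t g = b >> e;          /* S4: g_t = b_t >> e_t */
  return f | g;                 /* S0: a_t = f_t | g_t */
}
```

The mask in S2 is what keeps a count of 0 well defined: -0 & 31 is 0, so the right shift never uses a count equal to the type width.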
Re: RFA: RL78: Add support for naked function attribute.
This one is OK. > 2013-05-16 Nick Clifton > > * config/rl78/rl78.c (rl78_attribute_table): Add naked. > (rl78_is_naked_func): New function. > (rl78_expand_prologue): Skip prologue generation for naked > functions. > (rl78_expand_epilogue): Skip epilogue generation for naked > functions. > * doc/extend.texi (naked): Add RL78 to the list of processors > that supports this attribute.
Re: cfgexpand.c patch for [was new port: msp430-elf]
> Interestingly we have exactly that for AVR: > > config/avr/avr-modes.def:FRACTIONAL_INT_MODE (PSI, 24, 3); I know. I tried copying them, it didn't work for me.
Re: cfgexpand.c patch for [was new port: msp430-elf]
> What's the blocker to convert the existing 5 cases of > PARTIAL_INT_MODE use to specify a precision? In general? For me, PARTIAL_INT_MODE() works, FRACTIONAL_INT_MODE() didn't.
Re: [PATCH] Don't invalidate string length cache when not needed
On Thu, May 16, 2013 at 04:28:19PM +0200, Jakub Jelinek wrote: > On Thu, May 16, 2013 at 04:18:27PM +0200, Jakub Jelinek wrote: > > As q could point to p, if we didn't do what your patch does on the p[0] = > > 'X'; > > store, then we'd need to invalidate the recorded length of the q string. > > Similarly if there is p[0] = '\0' or p[0] = var. > > Ah, another thing while we are at it. > For p[0] = '\0'; case when p[0] is known to be '\0' already, we > remove it if: > /* When storing '\0', the store can be removed > if we know it has been stored in the current function. */ > if (!stmt_could_throw_p (stmt) && si->writable) > (and in that case don't invalidate anything either). But the above > condition is false, we set si->writable (correct) and si->dont_invalidate, > while we could do instead of that the same what you do for non-zero store to > non-zero location, i.e. gsi_next (gsi); return false;. > Perhaps a testcase for that is: > size_t > bar (char *p, char *r) > { > size_t len1 = strlen (r); > char *q = strchr (p, '\0'); > *q = '\0'; > return len1 - strlen (r); // This strlen should be optimized into len1. > } > > strlen (q) should be known to be zero at that point, but si->writable should > be false, we don't know if p doesn't point say into .rodata, and > stmt_could_throw_p probably should return true too. Nice, thanks! In the patch below I hopefully addressed all the issues; I also set si->writable to false, but we can't return true in the stmt_could_throw_p case. So, how does it look now? Regtested/bootstrapped on x86_64-linux. 2013-05-16 Marek Polacek * tree-ssa-strlen.c (handle_char_store): Don't invalidate cached length when doing non-zero store or storing '\0' to '\0', don't mark string as writable. * gcc.dg/strlenopt-25.c: New test. * gcc.dg/strlenopt-26.c: Likewise. 
--- gcc/tree-ssa-strlen.c.mp2013-05-15 14:11:20.079707492 +0200 +++ gcc/tree-ssa-strlen.c 2013-05-16 17:57:33.963150006 +0200 @@ -1693,8 +1693,10 @@ handle_char_store (gimple_stmt_iterator } else { - si->writable = true; - si->dont_invalidate = true; + /* The string might be e.g. in the .rodata section. */ + si->writable = false; + gsi_next (gsi); + return false; } } else @@ -1717,6 +1719,33 @@ handle_char_store (gimple_stmt_iterator si->endptr = ssaname; si->dont_invalidate = true; } + /* If si->length is non-zero constant, we aren't overwriting '\0', +and if we aren't storing '\0', we know that the length of the +string and any other zero terminated string in memory remains +the same. In that case we move to the next gimple statement and +return to signal the caller that it shouldn't invalidate anything. + +This is benefical for cases like: + +char p[20]; +void foo (char *q) +{ + strcpy (p, "foobar"); + size_t len = strlen (p);// This can be optimized into 6 + size_t len2 = strlen (q);// This has to be computed + p[0] = 'X'; + size_t len3 = strlen (p);// This can be optimized into 6 + size_t len4 = strlen (q);// This can be optimized into len2 + bar (len, len2, len3, len4); +} + */ + else if (si != NULL && si->length != NULL_TREE + && TREE_CODE (si->length) == INTEGER_CST + && integer_nonzerop (gimple_assign_rhs1 (stmt))) + { + gsi_next (gsi); + return false; + } } else if (idx == 0 && initializer_zerop (gimple_assign_rhs1 (stmt))) { --- gcc/testsuite/gcc.dg/strlenopt-25.c.mp 2013-05-15 17:15:18.702118637 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-25.c 2013-05-15 18:26:27.881030317 +0200 @@ -0,0 +1,18 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fdump-tree-strlen" } */ + +#include "strlenopt.h" + +int +main () +{ + char p[] = "foobar"; + int len, len2; + len = strlen (p); + p[0] = 'O'; + len2 = strlen (p); + return len - len2; +} + +/* { dg-final { scan-tree-dump-times "strlen \\(" 0 "strlen" } } */ +/* { dg-final { cleanup-tree-dump "strlen" } } */ --- 
gcc/testsuite/gcc.dg/strlenopt-26.c.mp 2013-05-16 17:33:00.302060413 +0200 +++ gcc/testsuite/gcc.dg/strlenopt-26.c 2013-05-16 18:30:51.906342948 +0200 @@ -0,0 +1,25 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fdump-tree-strlen" } */ + +#include "strlenopt.h" + +__attribute__((noinline, noclone)) size_t +fn1 (char *p, const char *r) +{ + size_t len1 = __builtin_strlen (r); + char *q = __builtin_strchr (p, '\0'); + *q = '\0'; + return len1 - __builtin_strlen (r); // This strlen should be optimized into len1. +} + +int +main (void) +{ + char p[] = "foobar"; + const char *volatile q =
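The rule this patch relies on can be illustrated outside the compiler: a character store changes a string's length only if exactly one of the old and new characters is '\0'. The following is a rough sketch, not code from the pass; store_may_change_length is a hypothetical helper invented for this note.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store CH at S[IDX] and report whether strlen (S)
   may have changed.  Non-zero over non-zero, or '\0' over '\0', leaves
   every string length intact, which is the case the patch exploits.  */
static int
store_may_change_length (char *s, size_t idx, char ch)
{
  size_t len = strlen (s);
  assert (idx <= len);                 /* stay inside the string */
  int was_nul = (s[idx] == '\0');
  int is_nul = (ch == '\0');
  s[idx] = ch;
  /* The length can change only when a terminator is created or destroyed.  */
  return was_nul != is_nul;
}
```

Overwriting the terminator with a non-zero character, as in the "Hello\0World!" question earlier in the thread, is the was_nul != is_nul case, so the sketch reports a possible change there.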
Re: cfgexpand.c patch for [was new port: msp430-elf]
On Thu, 16 May 2013, DJ Delorie wrote: > > What's the blocker to convert the existing 5 cases of > > PARTIAL_INT_MODE use to specify a precision? > > In general? For me, PARTIAL_INT_MODE() works, FRACTIONAL_INT_MODE() > didn't. I thought it was the other way round - that after Bernd's fixes (July 2011) towards support for 40-bit integers, FRACTIONAL_INT_MODE worked better than PARTIAL_INT_MODE. -- Joseph S. Myers jos...@codesourcery.com
[C++ testcase, committed] PR 17410
Hi, testcase committed to mainline, I'm closing the PR as fixed. Thanks, Paolo. 2013-05-16 Paolo Carlini PR c++/17410 * g++.dg/template/pr17410.C: New. Index: g++.dg/template/pr17410.C === --- g++.dg/template/pr17410.C (revision 0) +++ g++.dg/template/pr17410.C (working copy) @@ -0,0 +1,17 @@ +// PR c++/17410 + +template +struct Outer { + template struct Inner {}; +}; + +template +struct A; + +template class Q, class P> +struct A > {}; + +template struct UNRELATED; +template struct UNRELATED::Inner >; + +template struct A::Inner >;
Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
On Thu, 2013-05-16 at 10:58 +0200, Richard Biener wrote: > > > > Hmm, not terribly happy with that wording, but that gives you an idea of > > what I'm after. When would someone set UPDATE_DOMINANCE to true and what > > are their responsibilities when they do so. > > > > Approved with the name change and a better comment for UPDATE_DOMINANCE. > > Btw, the function does _not_ handle arbitrary SEME regions - it only handles > a single exit correctly and assumes no (SSA) data flows across the others. > So I'd rather not rename it. > > Richard. > > > Jeff I went ahead and checked in the change with the comment updates that Jeff wanted but left the name of the function as is. Steve Ellcey sell...@imgtec.com
Re: cfgexpand.c patch for [was new port: msp430-elf]
> I thought it was the other way round - that after Bernd's fixes (July > 2011) towards support for 40-bit integers, FRACTIONAL_INT_MODE worked > better than PARTIAL_INT_MODE. Probably most accurate to say that both ways are not well supported. In general, I always have a hard time with anything that isn't a power-of-two size in gcc, because most maintainers have no reason to even consider such cases. There are even some fundamental problems I've run into, for example... SIZE_TYPE is compared - with individual strcmp's! - against a fixed set of C types, instead of using some sort of table lookup to support non-power-of-two types. P*type modes are not stored in the modes table with *type modes, but the functions to get the "next bigger mode" blindly assume that mode[N+1] is the next mode, which ignores partial int modes completely. Operations with 3-byte PSImode end up using BLKmode sometimes.
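To make the "table lookup" remark concrete, here is one possible shape for it. This is purely a sketch: the struct, the table contents and lookup_size_type are all invented for illustration, the precisions are not taken from any real target, and nothing here reflects how GCC actually handles SIZE_TYPE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: one row per type name a target might give for
   SIZE_TYPE.  Names and precisions are illustrative only.  */
struct size_type_entry
{
  const char *name;
  unsigned int precision;       /* in bits */
};

static const struct size_type_entry size_type_table[] = {
  { "unsigned int", 32 },
  { "long unsigned int", 64 },
  { "uint20_example", 20 },     /* made-up non-power-of-two entry */
};

/* Return the matching row, or NULL if NAME is not in the table.  */
static const struct size_type_entry *
lookup_size_type (const char *name)
{
  assert (name != NULL);
  size_t n = sizeof size_type_table / sizeof size_type_table[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (size_type_table[i].name, name) == 0)
      return &size_type_table[i];
  return NULL;
}
```

With a table, supporting an odd-sized type means adding a row rather than another strcmp branch.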
Re: [patch] Small emit-rtl.c / reorg.c cleanup
On Thu, May 16, 2013 at 1:55 PM, Rainer Orth wrote: > Steven Bosscher writes: > > Unfortunately, this patch broke SPARC bootstrap since it lost the > HAVE_cc0 guard around link_cc0_insns: Oops... I followed Bernhard's suggestion and made those two moved functions static in reorg.c just before committing. Without "static" the compiler bootstrapped but now it's an unused function (as it was before but g++ doesn't complain about that). Sorry for the breakage. Apparently I should test such seemingly innocent changes anyway. Ciao! Steven
Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
On 05/16/2013 11:10 AM, Steve Ellcey wrote: On Thu, 2013-05-16 at 10:58 +0200, Richard Biener wrote: Hmm, not terribly happy with that wording, but that gives you an idea of what I'm after. When would someone set UPDATE_DOMINANCE to true and what are their responsibilities when they do so. Approved with the name change and a better comment for UPDATE_DOMINANCE. Btw, the function does _not_ handle arbitrary SEME regions - it only handles a single exit correctly and assumes no (SSA) data flows across the others. So I'd rather not rename it. Richard. Jeff I went ahead and checked in the change with the comment updates that Jeff wanted but left the name of the function as is. Ok. Thanks. jeff
Re: [PATCH] Refactor rtl_verify_flow_info and rtl_verify_flow_info_1
On 05/15/2013 10:55 PM, Teresa Johnson wrote: This patch refactors rtl_verify_flow_info_1 and rtl_verify_flow_info by outlining the verification code into several different routines. rtl_verify_flow_info_1 in particular was getting very large. For the most part the functionality is exactly the same, although I did eliminate one redundant check on the BLOCK_FOR_INSN pointer for instructions inside blocks (both were in rtl_verify_flow_info_1, now in rtl_verify_bb_pointers). Bootstrapped and tested on x86_64-unknown-linux-gnu, and built cpu2006int with profile feedback. Ok for trunk? Thanks, Teresa 2013-05-15 Teresa Johnson * cfgrtl.c (verify_hot_cold_block_grouping): Return err. (rtl_verify_edges): New function. (rtl_verify_bb_insns): Ditto. (rtl_verify_bb_pointers): Ditto. (rtl_verify_bb_insn_chain): Ditto. (rtl_verify_fallthru): Ditto. (rtl_verify_bb_layout): Ditto. (rtl_verify_flow_info_1): Outline checks into new functions. (rtl_verify_flow_info): Ditto. OK. Please install. Thanks, Jeff
Re: [PATCH] Fold VEC_[LR]SHIFT_EXPR (PR tree-optimization/57051)
Jakub Jelinek writes: > On Thu, Apr 25, 2013 at 11:47:02PM +0200, Jakub Jelinek wrote: > > This patch adds folding of constant arguments v>> and v<<, which helps to > > optimize the testcase from the PR back into constant store after vectorized > > loop is unrolled. > > As this fixes a regression on the 4.8 branch, I've backported it (and > minimal prerequisite for that) to 4.8 branch too. Unfortunately this patch makes gcc.dg/vect/no-scevccp-outer-{7,13}.c fail on powerpc64-linux: +FAIL: gcc.dg/vect/no-scevccp-outer-13.c execution test +FAIL: gcc.dg/vect/no-scevccp-outer-7.c execution test which is a regression from 4.8-20130502. Reverting r198580 fixes it. The same FAILs also occur on trunk. /Mikael
[c++-concepts] merge from trunk
Trunk was merged into c++-concepts branch at revision 198984. -- Gaby
Re: cfgexpand.c patch for [was new port: msp430-elf]
On Thu, 16 May 2013, DJ Delorie wrote: > SIZE_TYPE is compared - with individual strcmp's! - against a fixed > set of C types, instead of using some sort of table lookup to support > non-power-of-two types. Joern had a patch (as of December 2010, possibly updated since then) to convert at least some such macros (stdint.h ones, anyway) to use enum values rather than strings as a cleaner API. Though that wouldn't be sufficient for what you want. The front-end type pieces of Bernd's changes didn't go in, since they embedded things specific to 40-bit types all over the place rather than providing a more general infrastructure for types of architecture-specific precision. Logically I'd like a way for an architecture to define a set of precisions for which there are __intN keywords and associated __intN, unsigned __intN types (plus the _Complex versions of those). Then maybe the values given for types such as SIZE_TYPE would be either an enum value for a standard type, or e.g. ITK_INTN (40) or ITK_UNSIGNED_INTN (40), where ITK_INTN expands to some expression giving a value of the enum type not corresponding to any of the standard named types. Likewise for floating-point types. DTS 18661-3 (the third part of the draft ISO C bindings to IEEE 754-2008) defines types _FloatN, where N is 16, 32, 64, or >= 128 and a multiple of 32; _DecimalN, where N >= 32 and a multiple of 32; and _Complex variants of the _FloatN types, where the particular set supported depends on the implementation (and so for GCC would depend on the architecture). Again, these should be keywords. (I don't think any of the above should be particularly hard to implement.) -- Joseph S. Myers jos...@codesourcery.com
Re: [C++] Don't build NE_EXPR manually
OK. Jason
[PATCH, i386]: Handle VIA/Centaur processors with -march=native
Hello! Attached patch introduces handling of VIA/Centaur processors with -march=native compile directive. Compared to the original patch in PR45359, the attached patch handles detection of "unknown" processors via the default detection procedure. I also chose to use generic tuning for all detected processors. 2013-05-16 Uros Bizjak Dzianis Kahanovich PR target/45359 PR target/46396 * config/i386/driver-i386.c (host_detect_local_cpu): Detect VIA/Centaur processors and determine their cache parameters using detect_caches_amd. The patch was (compile) tested on x86_64-pc-linux-gnu and committed to mainline SVN. According to PR45359, some other fix regressed the detection of VIA/Centaur processors in 4.5, so I plan to backport the patch to release branches. Uros. Index: driver-i386.c === --- driver-i386.c (revision 198977) +++ driver-i386.c (working copy) @@ -517,7 +517,8 @@ const char *host_detect_local_cpu (int argc, const if (!arch) { - if (vendor == signature_AMD_ebx) + if (vendor == signature_AMD_ebx + || vendor == signature_CENTAUR_ebx) cache = detect_caches_amd (ext_level); else if (vendor == signature_INTEL_ebx) { @@ -560,6 +561,32 @@ const char *host_detect_local_cpu (int argc, const else processor = PROCESSOR_PENTIUM; } + else if (vendor == signature_CENTAUR_ebx) +{ + if (arch) + { + if (family == 6) + { + if (model > 9) + /* Use the default detection procedure. */ + processor = PROCESSOR_GENERIC32; + else if (model == 9) + cpu = "c3-2"; + else if (model >= 6) + cpu = "c3"; + else + /* We have no idea. */ + processor = PROCESSOR_GENERIC32; + } + else if (has_3dnow) + cpu = "winchip2"; + else if (has_mmx) + cpu = "winchip2-c6"; + else + /* We have no idea. */ + processor = PROCESSOR_GENERIC32; + } +} else { switch (family)
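Since the flattened diff is hard to read, the family/model decision it adds can be restated as a small standalone function. This is only a sketch: centaur_cpu_name and check_centaur_mapping are hypothetical names, the strings are copied as they appear in the diff above, and NULL stands in for the PROCESSOR_GENERIC32 fallback. In the patch itself the whole block is additionally guarded by "if (arch)".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical restatement of the Centaur branch from the diff above.
   Returns the cpu name chosen there, or NULL where the patch falls
   back to PROCESSOR_GENERIC32.  */
static const char *
centaur_cpu_name (unsigned int family, unsigned int model,
                  int has_3dnow, int has_mmx)
{
  if (family == 6)
    {
      if (model > 9)
        return NULL;            /* use the default detection procedure */
      else if (model == 9)
        return "c3-2";
      else if (model >= 6)
        return "c3";
      else
        return NULL;            /* "We have no idea." */
    }
  else if (has_3dnow)
    return "winchip2";
  else if (has_mmx)
    return "winchip2-c6";
  return NULL;                  /* "We have no idea." */
}

/* A few spot checks of the mapping above.  */
static void
check_centaur_mapping (void)
{
  assert (strcmp (centaur_cpu_name (6, 9, 0, 1), "c3-2") == 0);
  assert (centaur_cpu_name (6, 10, 0, 1) == NULL);
}
```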
Re: [PATCH] Don't invalidate string length cache when not needed
On Thu, May 16, 2013 at 06:44:03PM +0200, Marek Polacek wrote: > --- gcc/tree-ssa-strlen.c.mp 2013-05-15 14:11:20.079707492 +0200 > +++ gcc/tree-ssa-strlen.c 2013-05-16 17:57:33.963150006 +0200 > @@ -1693,8 +1693,10 @@ handle_char_store (gimple_stmt_iterator > } > else > { > - si->writable = true; > - si->dont_invalidate = true; > + /* The string might be e.g. in the .rodata section. */ > + si->writable = false; No, this really should be si->writable = true; (and comment not needed). Ok with that change. The thing is, while the string might not be known to be writable before, i.e. we can't optimize away this store, because supposedly it should trap, if we notice e.g. another write to the same location (writing zero there again), we can optimize that other write already, because we know that this store stored there something. > +#include "strlenopt.h" > + > +__attribute__((noinline, noclone)) size_t > +fn1 (char *p, const char *r) > +{ > + size_t len1 = __builtin_strlen (r); > + char *q = __builtin_strchr (p, '\0'); > + *q = '\0'; > + return len1 - __builtin_strlen (r); // This strlen should be optimized > into len1. With strlenopt.h include you can avoid using __builtin_ prefixes, all the builtins needed are prototyped in that header. Jakub
Re: debuggability of recog_data
Steven Bosscher writes: > On Wed, May 15, 2013 at 12:14 AM, Mike Stump wrote: >> I don't what to bike shed. So, I'm happy if the next poor soul that >> touches it just does so. If people like recog_data_info, I'd be happy >> to change it to that. Let's give then peanut gallery a day to vote on >> it. :-) > > Usually we append "_d" or "_def" to structure definitions, so recog_data_def? Gah, I wrote the patch from memory and forgot about the bit after the comma. I'm not trying to be contrary really. :-) Bootstrapped & regression-tested on x86_64-linux-gnu. OK to install? Thanks, Richard gcc/ * recog.h (Recog_data): Rename to... (recog_data_t): ...this. (recog_data): Update accordingly. * recog.c (recog_data): Likewise. * reload.c (save_recog_data): Likewise. * config/picochip/picochip.c (picochip_saved_recog_data): Likewise. (picochip_save_recog_data, picochip_restore_recog_data): Likewise. Index: gcc/config/picochip/picochip.c === --- gcc/config/picochip/picochip.c 2013-05-15 20:11:29.433232045 +0100 +++ gcc/config/picochip/picochip.c 2013-05-16 19:46:08.317740846 +0100 @@ -187,7 +187,7 @@ struct vliw_state picochip_current_vliw_ /* Save/restore recog_data. */ static int picochip_saved_which_alternative; -static struct Recog_data picochip_saved_recog_data; +static struct recog_data_d picochip_saved_recog_data; /* Determine which ALU to use for the instruction in picochip_current_prescan_insn. */ @@ -3150,7 +3150,7 @@ picochip_save_recog_data (void) { picochip_saved_which_alternative = which_alternative; memcpy (&picochip_saved_recog_data, &recog_data, - sizeof (struct Recog_data)); + sizeof (struct recog_data_d)); } /* Restore some of the contents of global variable recog_data. 
*/ @@ -3159,7 +3159,7 @@ picochip_restore_recog_data (void) { which_alternative = picochip_saved_which_alternative; memcpy (&recog_data, &picochip_saved_recog_data, - sizeof (struct Recog_data)); + sizeof (struct recog_data_d)); } /* Ensure that no var tracking notes are emitted in the middle of a Index: gcc/recog.c === --- gcc/recog.c 2013-05-15 20:11:26.453211775 +0100 +++ gcc/recog.c 2013-05-16 19:45:18.317837923 +0100 @@ -70,7 +70,7 @@ static rtx split_insn (rtx); int volatile_ok; -struct Recog_data recog_data; +struct recog_data_d recog_data; /* Contains a vector of operand_alternative structures for every operand. Set up by preprocess_constraints. */ Index: gcc/recog.h === --- gcc/recog.h 2013-05-15 20:11:27.507218945 +0100 +++ gcc/recog.h 2013-05-16 19:43:40.498441810 +0100 @@ -179,7 +179,7 @@ skip_alternative (const char *p) /* The following vectors hold the results from insn_extract. */ -struct Recog_data +struct recog_data_d { /* It is very tempting to make the 5 operand related arrays into a structure and index on that. However, to be source compatible @@ -245,7 +245,7 @@ struct Recog_data rtx insn; }; -extern struct Recog_data recog_data; +extern struct recog_data_d recog_data; /* Contains a vector of operand_alternative structures for every operand. Set up by preprocess_constraints. */ Index: gcc/reload.c === --- gcc/reload.c2013-05-15 20:11:21.368177166 +0100 +++ gcc/reload.c2013-05-16 19:45:48.018116702 +0100 @@ -895,7 +895,7 @@ can_reload_into (rtx in, int regno, enum { rtx dst, test_insn; int r = 0; - struct Recog_data save_recog_data; + struct recog_data_d save_recog_data; /* For matching constraints, we often get notional input reloads where we want to use the original register as the reload register. I.e.
Re: debuggability of recog_data
On Thu, May 16, 2013 at 2:02 PM, Richard Sandiford wrote: > Steven Bosscher writes: >> On Wed, May 15, 2013 at 12:14 AM, Mike Stump wrote: >>> I don't what to bike shed. So, I'm happy if the next poor soul that >>> touches it just does so. If people like recog_data_info, I'd be happy >>> to change it to that. Let's give then peanut gallery a day to vote on >>> it. :-) >> >> Usually we append "_d" or "_def" to structure definitions, so recog_data_def? > > Gah, I wrote the patch from memory and forgot about the bit after the comma. > I'm not trying to be contrary really. :-) > > Bootstrapped & regression-tested on x86_64-linux-gnu. OK to install? > > Thanks, > Richard > > > gcc/ > * recog.h (Recog_data): Rename to... > (recog_data_t): ...this. ^^^ It should be recog_data_d. -- H.J.
[PATCH, i386]: Use detect_caches_amd for other vendors
Hello! According to cpuid dumps at [1], we can use detect_caches_amd for several other vendors. 2013-05-16 Uros Bizjak * config/i386/driver-i386.c (host_detect_local_cpu): Determine cache parameters using detect_caches_amd also for CYRIX, NSC and TM2 signatures. Tested on x86_64-pc-linux-gnu and committed to mainline SVN. [1] http://instlatx64.atw.hu/ Uros. Index: config/i386/driver-i386.c === --- config/i386/driver-i386.c (revision 198987) +++ config/i386/driver-i386.c (working copy) @@ -518,7 +518,10 @@ if (!arch) { if (vendor == signature_AMD_ebx - || vendor == signature_CENTAUR_ebx) + || vendor == signature_CENTAUR_ebx + || vendor == signature_CYRIX_ebx + || vendor == signature_NSC_ebx + || vendor == signature_TM2_ebx) cache = detect_caches_amd (ext_level); else if (vendor == signature_INTEL_ebx) {
Re: GCC does not support *mmintrin.h with function specific opts
Hi Jakub, I have taken your proposed changes and made patch for this. Please let me know what you think. I have changed only the headers mmintrin.h and x86intrin.h as that includes all the other headers. The builtins get enabled automatically when the pragma target is specified so there is no need to do anything to def_builtin. I have included 4 test cases, where intrinsics_4.c uses your example with _mm256_and_ps. I had to fix a bug with lzcnt builtins in i386-common.c as that was not handled there. Thanks Sri On Wed, May 15, 2013 at 7:25 PM, Sriraman Tallam wrote: > On Tue, May 14, 2013 at 3:04 AM, Jakub Jelinek wrote: >> On Tue, May 14, 2013 at 10:39:13AM +0200, Jakub Jelinek wrote: >>> When trying with -O2 -mno-avx: >>> #ifndef __AVX__ >>> #pragma GCC push_options >>> #pragma GCC target("avx") >>> #define __DISABLE_AVX__ >>> #endif >>> typedef float __v8sf __attribute__ ((__vector_size__ (32))); >>> typedef float __m256 __attribute__ ((__vector_size__ (32), __may_alias__)); >>> extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, >>> __artificial__)) >>> _mm256_and_ps (__m256 __A, __m256 __B) { return (__m256) >>> __builtin_ia32_andps256 ((__v8sf)__A, (__v8sf)__B); } >>> #ifdef __DISABLE_AVX__ >>> #pragma GCC pop_options >>> #undef __DISABLE_AVX__ >>> #endif >>> __m256 a, b, c; >>> void __attribute__((target ("avx"))) >>> foo (void) >>> { >>> a = _mm256_and_ps (b, c); >>> } >>> we get bogus errors and ICE: >>> tty2.c: In function '_mm256_and_ps': >>> tty2.c:9:1: note: The ABI for passing parameters with 32-byte alignment has >>> changed in GCC 4.6 >>> tty2.c: In function 'foo': >>> tty2.c:9:82: error: '__builtin_ia32_andps256' needs isa option -m32 >>> tty2.c:9:82: internal compiler error: in emit_move_insn, at expr.c:3486 >>> 0x77a3d2 emit_move_insn(rtx_def*, rtx_def*) >>> ../../gcc/expr.c:3485 >>> (I have added "1 ||" instead of your generate_builtins into i386.c >>> (def_builtin)), that just shows that target attribute/pragma support still >>> has 
very severe issues that need to be fixed, instead of papered around. >>> >>> Note, we ICE on: >>> #pragma GCC target ("mavx") >>> That should be fixed too. >> >> Ok, I had a brief look at the above two issues. >> >> The first testcase has the problem that the ix86_previous_fndecl cache >> gets out of date. When set_cfun is called on _mm256_and_ps (with the >> implicit avx attribute), then ix86_previous_fndecl is set to _mm256_and_ps, >> TARGET_AVX is set to true, target reinited. Then set_cfun is called >> with NULL, we don't do anything. Later on #pragma GCC pop_options appears, >> sets !TARGET_AVX (as that is the new target_option_current_node). >> Next foo is being parsed, avx attribute is noticed, the same target node >> is used for it, but when set_cfun is called for foo, ix86_previous_fndecl's >> target node is the same as foo's and so we don't do cl_target_restore_option >> at all, so !TARGET_AVX remains, while it should be set. That is the reason >> for the bogus inform etc. Fixed by resetting the ix86_previous_fndecl cache >> on any #pragma GCC target below. The #pragma GCC target ("mavx") is also >> fixed below. The patch also includes the "1 ||" to enable building all >> builtins. We still ICE with: >> #0 fancy_abort (file=0x11d8fad "../../gcc/expr.c", line=316, >> function=0x11dada3 "convert_move") at ../../gcc/diagnostic.c:1180 >> #1 0x00771c39 in convert_move (to=0x71b2df00, >> from=0x71b314e0, unsignedp=0) at ../../gcc/expr.c:316 >> #2 0x0078009f in store_expr (exp=0x719ab390, >> target=0x71b2df00, call_param_p=0, nontemporal=false) at >> ../../gcc/expr.c:5300 >> #3 0x0077eba1 in expand_assignment (to=0x71b35090, >> from=0x719ab390, nontemporal=false) at ../../gcc/expr.c:5025 >> on the first testcase. 
We don't ICE say on: >> #ifndef __AVX__ >> #pragma GCC push_options >> #pragma GCC target("avx") >> #define __DISABLE_AVX__ >> #endif >> typedef float __v8sf __attribute__ ((__vector_size__ (32))); >> typedef float __m256 __attribute__ ((__vector_size__ (32), __may_alias__)); >> extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, >> __artificial__)) >> _mm256_and_ps (__m256 __A, __m256 __B) { return (__m256) >> __builtin_ia32_andps256 ((__v8sf)__A, (__v8sf)__B); } >> #ifdef __DISABLE_AVX__ >> #pragma GCC pop_options >> #undef __DISABLE_AVX__ >> #endif >> __m256 a[10], b[10], c[10]; >> void __attribute__((target ("avx"))) >> foo (void) >> { >> a[0] = _mm256_and_ps (b[0], c[0]); >> } >> The problem is that in the first testcase, the VAR_DECL c (guess also b and >> a) have TYPE_MODE (TREE_TYPE (c)) == V8SFmode (this is dynamic, for vector >> types TYPE_MODE is a function call), but DECL_MODE (c) is BLKmode >> (it has been laid out while -mno-avx has been the current) and also >> DECL_RTL which is a mem:BLK. Guess expr.c would need to special case >> TREE_STATIC or DECL_EXTERNAL VAR_DECLs with vector type, if they have >> DECL_
Re: GCC does not support *mmintrin.h with function specific opts
On Thu, 16 May 2013, Sriraman Tallam wrote: Hi Jakub, I have taken your proposed changes and made patch for this. Please let me know what you think. I have changed only the headers mmintrin.h and x86intrin.h as that includes all the other headers. I don't really understand why you made the change to x86intrin.h instead of making it inside each *mmintrin.h header. The code would be the same size, it would let us include smmintrin.h directly if we wanted to, and x86intrin.h would also automatically work. -- Marc Glisse
Re: GCC does not support *mmintrin.h with function specific opts
On Thu, May 16, 2013 at 3:55 PM, Marc Glisse wrote: > On Thu, 16 May 2013, Sriraman Tallam wrote: > >> Hi Jakub, >> >> I have taken your proposed changes and made patch for this. Please >> let me know what you think. I have changed only the headers mmintrin.h >> and x86intrin.h as that includes all the other headers. > > > I don't really understand why you made the change to x86intrin.h instead of > making it inside each *mmintrin.h header. The code would be the same size, > it would let us include smmintrin.h directly if we wanted to, and > x86intrin.h would also automatically work. Right, I should have done that instead! Sri > > -- > Marc Glisse
Re: [PATCH v2] Test case for PR55033
On Fri, May 10, 2013 at 11:43 PM, Chung-Ju Wu wrote: > 2013/5/10 Sebastian Huber : > v2: Format changes > > gcc/testsuite/ChangeLog > 2013-05-10 Sebastian Huber > > PR target/55033 > * gcc.target/powerpc/pr55033.c: New. The testcase is okay. Do you have CVS write access or do you need someone to commit the patch for you? Thanks, David
Re: [libitm,PATCH] Fix bootstrap due to __always_inline in libitm
Ping. This will fix bootstrap on FreeBSD (and it seems NetBSD). (Paolo provided some comments, though this looks like the simplest patch to fix the issue.) Gerald On Mon, 1 Apr 2013, Gerald Pfeifer wrote: > Andi's patch broke bootstrap on all FreeBSD platforms, which took me > a bit to realize since he did not update the ChangeLog: > >2013-03-23 Andi Kleen > > * local_atomic (__always_inline): Add. > (__calculate_memory_order, atomic_thread_fence, > atomic_signal_fence, test_and_set, clear, store, load, > exchange, compare_exchange_weak, compare_exchange_strong, > fetch_add, fetch_sub, fetch_and, fetch_or, fetch_xor): > Add __always_inline to force inlining. > > The problem is that he added the following to local_atomic > > #ifndef __always_inline > #define __always_inline inline __attribute__((always_inline)) > #endif > > whereas /usr/include/sys/cdefs.h on FreeBSD has the following > > #define __always_inline __attribute__((__always_inline__)) > > and hence misses the inline (plus libitm/common.h already has > ALWAYS_INLINE for that purpose). > > I am fixing this by adding an explicit inline to those cases where > necessary. I did not add it to struct members, which are considered > inline by default (and believe Andi's patch may have been a bit > over-eager from that perspective). > > Bootstrapped and regression tested on i386-unknown-freebsd10.0. > > Okay? > > Gerald > > 2013-03-31 Gerald Pfeifer > > PR bootstrap/56714 > * local_atomic (__calculate_memory_order): Mark inline. > (atomic_thread_fence): Ditto. > (atomic_signal_fence): Ditto. > (atomic_bool::atomic_flag_test_and_set_explicit): Ditto. > (atomic_bool::atomic_flag_clear_explicit): Ditto. > (atomic_bool::atomic_flag_test_and_set): Ditto. > (atomic_bool::atomic_flag_clear): Ditto. 
> > Index: local_atomic > === > --- local_atomic (revision 197262) > +++ local_atomic (working copy) > @@ -75,7 +75,7 @@ >memory_order_seq_cst > } memory_order; > > - __always_inline memory_order > + inline __always_inline memory_order >__calculate_memory_order(memory_order __m) noexcept >{ > const bool __cond1 = __m == memory_order_release; > @@ -85,13 +85,13 @@ > return __mo2; >} > > - __always_inline void > + inline __always_inline void >atomic_thread_fence(memory_order __m) noexcept >{ > __atomic_thread_fence (__m); >} > > - __always_inline void > + inline __always_inline void >atomic_signal_fence(memory_order __m) noexcept >{ > __atomic_thread_fence (__m); > @@ -1545,38 +1545,38 @@ > > >// Function definitions, atomic_flag operations. > - __always_inline bool > + inline __always_inline bool >atomic_flag_test_and_set_explicit(atomic_flag* __a, > memory_order __m) noexcept >{ return __a->test_and_set(__m); } > > - __always_inline bool > + inline __always_inline bool >atomic_flag_test_and_set_explicit(volatile atomic_flag* __a, > memory_order __m) noexcept >{ return __a->test_and_set(__m); } > > - __always_inline void > + inline __always_inline void >atomic_flag_clear_explicit(atomic_flag* __a, memory_order __m) noexcept >{ __a->clear(__m); } > > - __always_inline void > + inline __always_inline void >atomic_flag_clear_explicit(volatile atomic_flag* __a, >memory_order __m) noexcept >{ __a->clear(__m); } > > - __always_inline bool > + inline __always_inline bool >atomic_flag_test_and_set(atomic_flag* __a) noexcept >{ return atomic_flag_test_and_set_explicit(__a, memory_order_seq_cst); } > > - __always_inline bool > + inline __always_inline bool >atomic_flag_test_and_set(volatile atomic_flag* __a) noexcept >{ return atomic_flag_test_and_set_explicit(__a, memory_order_seq_cst); } > > - __always_inline void > + inline __always_inline void >atomic_flag_clear(atomic_flag* __a) noexcept >{ atomic_flag_clear_explicit(__a, memory_order_seq_cst); } > > - __always_inline 
void > + inline __always_inline void >atomic_flag_clear(volatile atomic_flag* __a) noexcept >{ atomic_flag_clear_explicit(__a, memory_order_seq_cst); } > >
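A minimal sketch of why the missing `inline` matters, assuming a GCC-compatible compiler. The macro names CDEFS_ALWAYS_INLINE and LOCAL_ALWAYS_INLINE and both functions are hypothetical; only the two macro expansions are taken from the message above.

```c
/* Two stand-ins for the competing definitions of __always_inline.
   Both macro names are hypothetical; only the expansions come from
   the thread.  */
#define CDEFS_ALWAYS_INLINE __attribute__((__always_inline__))
#define LOCAL_ALWAYS_INLINE inline __attribute__((always_inline))

/* With the cdefs.h-style macro the attribute is present but the
   `inline' keyword is not, so it has to be spelled out explicitly,
   which is what the patch does.  */
static inline CDEFS_ALWAYS_INLINE int
add_one (int x)
{
  return x + 1;
}

/* With the local_atomic-style macro `inline' is already part of the
   expansion.  */
static LOCAL_ALWAYS_INLINE int
add_two (int x)
{
  return x + 2;
}
```

Writing `inline` next to the macro, as in add_one, keeps the declaration correct whichever of the two definitions happens to be in effect.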
[C++ Patch] PR 18126
Hi, in this old issue Joseph pointed out that the C++ parser doesn't accept the GNU Extension 'sizeof compound-literal' and provided indications about the way to approach it. Indeed, I found it very easy to use in cp_parser_sizeof_operand the same approach adopted in cp_parser_cast_expression (modulo the comments, only a few lines of code). Tested x86_64-linux. Thanks, Paolo. / /cp 2013-05-17 Paolo Carlini PR c++/18126 * parser.c (cp_parser_sizeof_operand): As a GNU Extension, parse correctly sizeof compound-literal; update comments. /testsuite 2013-05-17 Paolo Carlini PR c++/18126 * g++.dg/ext/sizeof-complit.C: New. Index: cp/parser.c === --- cp/parser.c (revision 198994) +++ cp/parser.c (working copy) @@ -6591,6 +6591,9 @@ cp_parser_pseudo_destructor_name (cp_parser* parse __real__ cast-expression __imag__ cast-expression && identifier + sizeof ( type-id ) { initializer-list , [opt] } + alignof ( type-id ) { initializer-list , [opt] } [C++0x] + __alignof__ ( type-id ) { initializer-list , [opt] } ADDRESS_P is true iff the unary-expression is appearing as the operand of the `&' operator. CAST_P is true if this expression is @@ -13968,6 +13971,7 @@ cp_parser_type_specifier (cp_parser* parser, __int128 __typeof__ unary-expression __typeof__ ( type-id ) + __typeof__ ( type-id ) { initializer-list , [opt] } Returns the indicated TYPE_DECL. If DECL_SPECS is not NULL, it is appropriately updated. */ @@ -22988,21 +22992,44 @@ cp_parser_sizeof_operand (cp_parser* parser, enum construction. */ if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN)) { - tree type; - bool saved_in_type_id_in_expr_p; + tree type = NULL_TREE; + bool compound_literal_p; /* We can't be sure yet whether we're looking at a type-id or an expression. */ cp_parser_parse_tentatively (parser); /* Consume the `('. */ cp_lexer_consume_token (parser->lexer); - /* Parse the type-id. 
*/ - saved_in_type_id_in_expr_p = parser->in_type_id_in_expr_p; - parser->in_type_id_in_expr_p = true; - type = cp_parser_type_id (parser); - parser->in_type_id_in_expr_p = saved_in_type_id_in_expr_p; - /* Now, look for the trailing `)'. */ - cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN); + /* Note: as a GNU Extension, compound literals are considered +postfix-expressions as they are in C99, so they are valid +arguments to sizeof. See comment in cp_parser_cast_expression +for details. */ + cp_lexer_save_tokens (parser->lexer); + /* Skip tokens until the next token is a closing parenthesis. +If we find the closing `)', and the next token is a `{', then +we are looking at a compound-literal. */ + compound_literal_p + = (cp_parser_skip_to_closing_parenthesis (parser, false, false, + /*consume_paren=*/true) + && cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE)); + /* Roll back the tokens we skipped. */ + cp_lexer_rollback_tokens (parser->lexer); + /* If we were looking at a compound-literal, simulate an error +so that the call to cp_parser_parse_definitely below will +fail. */ + if (compound_literal_p) + cp_parser_simulate_error (parser); + else + { + bool saved_in_type_id_in_expr_p = parser->in_type_id_in_expr_p; + parser->in_type_id_in_expr_p = true; + /* Look for the type-id. */ + type = cp_parser_type_id (parser); + /* Look for the closing `)'. */ + cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN); + parser->in_type_id_in_expr_p = saved_in_type_id_in_expr_p; + } + /* If all went well, then we're done. */ if (cp_parser_parse_definitely (parser)) { Index: testsuite/g++.dg/ext/sizeof-complit.C === --- testsuite/g++.dg/ext/sizeof-complit.C (revision 0) +++ testsuite/g++.dg/ext/sizeof-complit.C (working copy) @@ -0,0 +1,5 @@ +// PR c++/18126 +// { dg-options "" } + +struct s { int a; int b; }; +char x[((sizeof (struct s){ 1, 2 }) == sizeof (struct s)) ? 1 : -1];
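For reference, the construct the patch teaches the C++ parser to accept is already plain C99. A small sketch in C, mirroring the new testcase; the helper size_of_literal is a hypothetical name added only for illustration.

```c
#include <stddef.h>

struct s { int a; int b; };

/* sizeof applied directly to a compound literal: the operand is the
   postfix-expression `(struct s){ 1, 2 }', not a parenthesized type-id
   followed by a stray brace.  */
static size_t
size_of_literal (void)
{
  return sizeof (struct s){ 1, 2 };
}

/* Same check as the new testcase: the array size is -1, and hence a
   compile-time error, unless both sizes agree.  */
char x[((sizeof (struct s){ 1, 2 }) == sizeof (struct s)) ? 1 : -1];
```

The operand of sizeof is not evaluated here, so the literal only contributes its type.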
Re: web ICEs on subreg
Mike, This patch is creating new segfaults for 32 bit POWER AIX. Was this patch tested with PowerPC? Program received signal SIGSEGV, Segmentation fault. 0x10a1db88 in _ZL8web_mainv () at /nasfarm/dje/src/src/gcc/web.c:138 138 if (DF_REF_REAL_LOC (*ref) == recog_data.operand_loc[op]) This fails for libgomp vla1.f90, vla2.f90, vla4.f90, vla5.f90, vla6.f90, vla8.f90 -O3 -funroll-all-loops -fopenmp and gcc.dg/torture/stackalign/setjmp-1.c and gcc.c-torture/execute/built-in-setjmp.c -O3 -funroll-all-loops Thanks, David
Re: [libitm,PATCH] Fix bootstrap due to __always_inline in libitm
On Fri, May 17, 2013 at 02:08:27AM +0200, Gerald Pfeifer wrote: > Ping. This will fix bootstrap on FreeBSD (and it seems NetBSD). > > (Paolo provided some comments, though this looks like the simplest > patch to fix the issue.) It's ok for me, but I cannot approve it. -Andi
Re: web ICEs on subreg
On May 16, 2013, at 5:26 PM, David Edelsohn wrote: > This patch is creating new segfaults for 32 bit POWER AIX. Was this > patch tested with PowerPC? No, x86_64. I've added a patch to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57304 If you could let us know if that fixes the problem you've seen, that would be a great help, thanks.
[PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC
This removes two degradations in CPU2006 for 32-bit PowerPC due to lost vectorization opportunities. Previously, GCC treated malloc'd arrays as only guaranteeing 4-byte alignment, even though the glibc implementation guarantees 8-byte alignment. This raises the guarantee to 8 bytes, which is sufficient to permit the missed vectorization opportunities. The guarantee for 64-bit PowerPC should be raised to 16-byte alignment, but doing so currently exposes a latent bug that degrades a 64-bit benchmark. I have therefore not included that change at this time, but added a FIXME recording the information. Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions. Verified that SPEC CPU2006 degradations are fixed with no new degradations. Ok for trunk? Also, do you want any backports? Thanks, Bill 2013-05-16 Bill Schmidt * config/rs6000/rs6000.h (MALLOC_ABI_ALIGNMENT): New #define. Index: gcc/config/rs6000/rs6000.h === --- gcc/config/rs6000/rs6000.h (revision 198998) +++ gcc/config/rs6000/rs6000.h (working copy) @@ -2297,6 +2297,13 @@ extern char rs6000_reg_names[][8]; /* register nam /* How to align the given loop. */ #define LOOP_ALIGN(LABEL) rs6000_loop_align(LABEL) +/* Alignment guaranteed by __builtin_malloc. */ +/* FIXME: 128-bit alignment is guaranteed by glibc for TARGET_64BIT. + However, specifying the stronger guarantee currently leads to + a regression in SPEC CPU2006 437.leslie3d. The stronger + guarantee should be implemented here once that's fixed. */ +#define MALLOC_ABI_ALIGNMENT (64) + /* Pick up the return address upon entry to a procedure. Used for dwarf2 unwind information. This also enables the table driven mechanism. */
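A rough sketch of what the 8-byte guarantee means from the user's side, assuming a C library whose malloc returns blocks aligned to at least 8 bytes (as the message states for glibc). The macro and function names are hypothetical. Note that MALLOC_ABI_ALIGNMENT is expressed in bits, so the value 64 in the patch corresponds to 8 bytes.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names; the value 64 (bits) is the one from the patch.  */
#define ASSUMED_MALLOC_ALIGN_BITS 64
#define ASSUMED_MALLOC_ALIGN_BYTES (ASSUMED_MALLOC_ALIGN_BITS / 8)

/* Return 1 if P is aligned to ALIGN bytes (ALIGN must be nonzero).  */
static int
is_aligned (const void *p, size_t align)
{
  return ((uintptr_t) p % align) == 0;
}

/* Allocate N doubles and report whether the block meets the assumed
   8-byte guarantee.  Returns -1 if the allocation itself fails.  */
static int
malloc_meets_guarantee (size_t n)
{
  double *a = malloc (n * sizeof *a);
  int ok;

  if (a == NULL)
    return -1;
  ok = is_aligned (a, ASSUMED_MALLOC_ALIGN_BYTES);
  free (a);
  return ok;
}
```

A run-time check like this only observes one allocator; the macro is a promise the compiler relies on for every malloc'd pointer, which is why the stronger 64-bit value has to wait until it is known to be safe.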
[patch,fortran] PR50405 - Statement function with itself as argument SEGV's
Not too much to add beyond the title and the patch. The test file fails before (eventually, when you run out of stack) and passes after the patch is applied. No new testsuite failures. --bud Index: gcc/gcc/fortran/resolve.c === --- gcc/gcc/fortran/resolve.c (revision 198955) +++ gcc/gcc/fortran/resolve.c (working copy) @@ -306,6 +306,14 @@ && !resolve_procedure_interface (sym)) return; + if (strcmp (proc->name,sym->name) == 0) + { + gfc_error ("Self referential argument " + "'%s' at %L is not allowed", sym->name, + &proc->declared_at); + return; + } + if (sym->attr.if_source != IFSRC_UNKNOWN) resolve_formal_arglist (sym); !{ dg-do compile } ! submitted by zec...@gmail.com !{ dg-prune-output "Obsolescent feature: Statement function at" } f(f) = 0 ! { dg-error "Self referential argument" } end 2013-05-17 Bud Davis PR fortran/50405 resolve.c (resolve_formal_arglist): Detect error when an argument has the same name as the function.
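The check itself is just a name comparison. A stripped-down sketch in C, where struct toy_symbol and is_self_referential are hypothetical stand-ins and not part of the gfortran front end:

```c
#include <string.h>

/* Hypothetical, stripped-down stand-in for a front-end symbol; the
   real gfc_symbol carries far more state.  */
struct toy_symbol
{
  const char *name;
};

/* Return 1 if dummy argument ARG has the same name as procedure PROC,
   mirroring the strcmp test the patch adds.  */
static int
is_self_referential (const struct toy_symbol *proc,
                     const struct toy_symbol *arg)
{
  return strcmp (proc->name, arg->name) == 0;
}
```

For the testcase `f(f) = 0`, both names are "f", so the comparison succeeds and the error is reported instead of recursing.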
Fix missing dependency in Makefile.in
I've stumbled across this a few times. tree-switch-conversion.c includes optabs.h and thus should depend on $(OPTABS_H). Installed as obvious. commit 7028127e5cdc8c8662a0850dcd3b08df10d229b3 Author: Jeff Law Date: Thu May 16 21:31:09 2013 -0600 * Makefile.in (tree-switch-conversion.o): Depend on $(OPTABS_H). diff --git a/gcc/ChangeLog b/gcc/ChangeLog index d2371ba..38e8f18 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,7 @@ +2013-05-16 Jeff Law + + * Makefile.in (tree-switch-conversion.o): Depend on $(OPTABS_H). + 2013-05-16 Uros Bizjak * config/i386/driver-i386.c (host_detect_local_cpu): Determine diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 23e2926..63d114b 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3068,7 +3068,7 @@ tree-switch-conversion.o : tree-switch-conversion.c $(CONFIG_H) $(SYSTEM_H) \ $(TM_H) coretypes.h $(GIMPLE_H) $(CFGLOOP_H) \ $(TREE_PASS_H) $(FLAGS_H) $(EXPR_H) $(BASIC_BLOCK_H) \ $(GGC_H) $(OBSTACK_H) $(PARAMS_H) $(CPPLIB_H) $(PARAMS_H) \ -$(GIMPLE_PRETTY_PRINT_H) langhooks.h +$(GIMPLE_PRETTY_PRINT_H) langhooks.h $(OPTABS_H) tree-complex.o : tree-complex.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TREE_H) \ $(TM_H) $(FLAGS_H) $(TREE_FLOW_H) $(TREE_HASHER_H) $(GIMPLE_H) \ $(CFGLOOP_H) tree-iterator.h $(TREE_PASS_H) tree-ssa-propagate.h
Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC
On Thu, May 16, 2013 at 10:40 PM, Bill Schmidt wrote: > This removes two degradations in CPU2006 for 32-bit PowerPC due to lost > vectorization opportunities. Previously, GCC treated malloc'd arrays as > only guaranteeing 4-byte alignment, even though the glibc implementation > guarantees 8-byte alignment. This raises the guarantee to 8 bytes, > which is sufficient to permit the missed vectorization opportunities. > > The guarantee for 64-bit PowerPC should be raised to 16-byte alignment, > but doing so currently exposes a latent bug that degrades a 64-bit > benchmark. I have therefore not included that change at this time, but > added a FIXME recording the information. > > Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new > regressions. Verified that SPEC CPU2006 degradations are fixed with no > new degradations. Ok for trunk? Also, do you want any backports? Okay. If you think that this is appropriate for 4.8 branch, that is okay as well. Is there a FSF GCC Bugzilla open for the 437.leslie3d problem? The FIXME doesn't need to contain the complete explanation, but it would be nice to reference a longer explanation and not pretend to be Fermat's theorem that is too large to write in the margin. Thanks, David
RE: [PATCH, i386]: Update processor_alias_table for missing PTA_PRFCHW and PTA_FXSR flags
Thank you Uros for the patch. Could you backport this to the 4.8.0? -Original Message- From: Uros Bizjak [mailto:ubiz...@gmail.com] Sent: Wednesday, May 15, 2013 11:16 PM To: gcc-patches@gcc.gnu.org Cc: Gopalasubramanian, Ganesh Subject: [PATCH, i386]: Update processor_alias_table for missing PTA_PRFCHW and PTA_FXSR flags Hello! Attached patch adds missing PTA_PRFCHW and PTA_FXSR flags to x86 processor alias table. PRFCHW CPUID flag is shared with 3dnow prefetch flag, so some additional logic is needed to avoid generating SSE prefetches for non-SSE 3dNow! targets, while still generating full set of 3dnow prefetches on 3dNow! targets. 2013-05-15 Uros Bizjak * config/i386/i386.c (ix86_option_override_internal): Update processor_alias_table for missing PTA_PRFCHW and PTA_FXSR flags. Add PTA_POPCNT to corei7 entry and remove PTA_SSE from athlon-4 entry. Do not enable SSE prefetch on non-SSE 3dNow! targets. Enable TARGET_PRFCHW for TARGET_3DNOW targets. * config/i386/i386.md (prefetch): Enable for TARGET_PRFCHW instead of TARGET_3DNOW. (*prefetch_3dnow): Enable for TARGET_PRFCHW only. Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} and was committed to mainline SVN. The patch will be backported to 4.8 branch in a couple of days. Uros.
Re: GCC does not support *mmintrin.h with function specific opts
On Thu, May 16, 2013 at 04:00:53PM -0700, Sriraman Tallam wrote: > On Thu, May 16, 2013 at 3:55 PM, Marc Glisse wrote: > > I don't really understand why you made the change to x86intrin.h instead of > > making it inside each *mmintrin.h header. The code would be the same size, > > it would let us include smmintrin.h directly if we wanted to, and > > x86intrin.h would also automatically work. > > Right, I should have done that instead! Yeah, definitely. For the standalone headers, which have currently guards inside of it, please replace it by the larger snippets involving #pragma, and in the x86intrin.h/immintrin.h headers include those unconditionally, instead of just if is defined. For the non-standalone headers (newer ones like avxintrin.h), replace the #ifdef in immintrin.h/x86intrin.h with larger snippets. Jakub
[PATCH] Pattern recognizer rotate improvement
On Wed, May 15, 2013 at 03:24:37PM +0200, Richard Biener wrote: > We have the same issue in some other places where we insert invariant > code into the loop body - one reason there is another LIM pass > after vectorization. Well, in this case it causes the shift amount to be loaded into a vector instead of scalar, therefore even when LIM moves it before the loop, it will only work with vector/vector shifts and be more expensive that way (need to broadcast the value in a vector). The following patch improves it slightly at least for loops, by just emitting the shift amount stmts to loop preheader, rotate-4.c used to be only vectorizable with -mavx2 (which has vector/vector shifts), now also -mavx (which doesn't) vectorizes it. Unfortunately this trick doesn't work for SLP vectorization, emitting the stmts at the start of the current bb doesn't help, because every stmt emits its own and thus it is vectorized with vector/vector shifts only anyway. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-05-17 Jakub Jelinek * tree-vect-patterns.c (vect_recog_rotate_pattern): For vect_external_def oprnd1 with loop_vinfo, try to emit optional cast, negation and and stmts on the loop preheader edge instead of into the pattern def seq. * gcc.target/i386/rotate-4.c: Compile only with -mavx instead of -mavx2, require only avx instead of avx2. * gcc.target/i386/rotate-4a.c: Include avx-check.h instead of avx2-check.h and turn into an avx runtime test instead of avx2 runtime test. 
--- gcc/tree-vect-patterns.c.jj 2013-05-16 13:56:08.0 +0200 +++ gcc/tree-vect-patterns.c2013-05-16 15:27:00.565143478 +0200 @@ -1494,6 +1494,7 @@ vect_recog_rotate_pattern (vec * bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo); enum vect_def_type dt; optab optab1, optab2; + edge ext_def = NULL; if (!is_gimple_assign (last_stmt)) return NULL; @@ -1574,6 +1575,21 @@ vect_recog_rotate_pattern (vec * if (*type_in == NULL_TREE) return NULL; + if (dt == vect_external_def + && TREE_CODE (oprnd1) == SSA_NAME + && loop_vinfo) +{ + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); + ext_def = loop_preheader_edge (loop); + if (!SSA_NAME_IS_DEFAULT_DEF (oprnd1)) + { + basic_block bb = gimple_bb (SSA_NAME_DEF_STMT (oprnd1)); + if (bb == NULL + || !dominated_by_p (CDI_DOMINATORS, ext_def->dest, bb)) + ext_def = NULL; + } +} + def = NULL_TREE; if (TREE_CODE (oprnd1) == INTEGER_CST || TYPE_MODE (TREE_TYPE (oprnd1)) == TYPE_MODE (type)) @@ -1593,7 +1609,14 @@ vect_recog_rotate_pattern (vec * def = vect_recog_temp_ssa_var (type, NULL); def_stmt = gimple_build_assign_with_ops (NOP_EXPR, def, oprnd1, NULL_TREE); - append_pattern_def_seq (stmt_vinfo, def_stmt); + if (ext_def) + { + basic_block new_bb + = gsi_insert_on_edge_immediate (ext_def, def_stmt); + gcc_assert (!new_bb); + } + else + append_pattern_def_seq (stmt_vinfo, def_stmt); } stype = TREE_TYPE (def); @@ -1618,11 +1641,19 @@ vect_recog_rotate_pattern (vec * def2 = vect_recog_temp_ssa_var (stype, NULL); def_stmt = gimple_build_assign_with_ops (NEGATE_EXPR, def2, def, NULL_TREE); - def_stmt_vinfo - = new_stmt_vec_info (def_stmt, loop_vinfo, bb_vinfo); - set_vinfo_for_stmt (def_stmt, def_stmt_vinfo); - STMT_VINFO_VECTYPE (def_stmt_vinfo) = vecstype; - append_pattern_def_seq (stmt_vinfo, def_stmt); + if (ext_def) + { + basic_block new_bb + = gsi_insert_on_edge_immediate (ext_def, def_stmt); + gcc_assert (!new_bb); + } + else + { + def_stmt_vinfo = new_stmt_vec_info (def_stmt, loop_vinfo, bb_vinfo); + set_vinfo_for_stmt 
(def_stmt, def_stmt_vinfo); + STMT_VINFO_VECTYPE (def_stmt_vinfo) = vecstype; + append_pattern_def_seq (stmt_vinfo, def_stmt); + } def2 = vect_recog_temp_ssa_var (stype, NULL); tree mask @@ -1630,11 +1661,19 @@ vect_recog_rotate_pattern (vec * def_stmt = gimple_build_assign_with_ops (BIT_AND_EXPR, def2, gimple_assign_lhs (def_stmt), mask); - def_stmt_vinfo - = new_stmt_vec_info (def_stmt, loop_vinfo, bb_vinfo); - set_vinfo_for_stmt (def_stmt, def_stmt_vinfo); - STMT_VINFO_VECTYPE (def_stmt_vinfo) = vecstype; - append_pattern_def_seq (stmt_vinfo, def_stmt); + if (ext_def) + { + basic_block new_bb + = gsi_insert_on_edge_immediate (ext_def, def_stmt); + gcc_assert (!new_bb); + } + else + { + def_stmt_vinfo = new_stmt_vec_info (def_stmt, loop_vinfo
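To illustrate the kind of loop this is about, here is a sketch in C of a rotate by a loop-invariant amount, written by hand the way the description above suggests the patch arranges it: the negation and mask of the shift amount are computed once before the loop and stay scalar. The function names are hypothetical, and this is not the contents of rotate-4.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate X left by N bits, 0 <= N <= 31, written so that no shift
   count ever reaches 32.  The negation and the mask depend only on N.  */
static uint32_t
rotl32 (uint32_t x, unsigned int n)
{
  return (x << n) | (x >> ((-n) & 31));
}

/* N is invariant in this loop, so (-N) & 31 can be computed once
   before the loop and kept as a scalar shift amount.  */
static void
rotate_all (uint32_t *a, size_t len, unsigned int n)
{
  unsigned int back = (-n) & 31;   /* hoisted by hand */
  size_t i;

  for (i = 0; i < len; i++)
    a[i] = (a[i] << n) | (a[i] >> back);
}
```

With the amounts kept scalar, a target that only has vector-by-scalar shifts can still vectorize the loop body, which is the -mavx case mentioned above.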
[Patch, Fortran] PR48858/55465 - permit multiple bind(C) declarations (but not definitions) for the same proc
Followup (and depending on) to the C binding patches for * COMMON: http://gcc.gnu.org/ml/fortran/2013-05/msg00048.html * Procedures: http://gcc.gnu.org/ml/fortran/2013-05/msg00051.html which honour Fortran 2008, where the Fortran name is no longer a global identifier if a binding name has been specified. The main reason for this patch is a build failure of Open MPI (requires !gcc$ attributes no_arg_check, i.e. it only affects GCC 4.9). Open MPI uses something like: interface subroutine pmpi_something() bind(C,name="MPI_something") ... and in a different module: interface subroutine mpi_something() bind(C,name="MPI_something") ... Currently, gfortran rejects it because it only permits one definition/declaration per translation unit. However, there is no reason why multiple INTERFACE blocks shouldn't be permitted. Remarks: a) Better argument checks if definition and declaration are in the same file. (see INTENT patch in a test case) b) Currently, no check is done regarding the characteristic of procedure declarations. Of course, the declaration has to be compatible with the C procedure. However, there seems to be the wish* to permit compatible input - even if the Fortran characteristic is different. For instance "int *" takes both a scalar integer ("int i; f(&i)") and arrays ("int i[5]; f(i)"). Or also popular according to the PRs: Taking a C_LOC or an integer(c_intptr_t). (* Seemingly, also J3 and/or WG5 discussed this (plenum? subgroups?) and they had the permit it. However, finding some official document is difficult.) I was wondering for a while what should be permitted and what shouldn't, but I have now decided to put that completely into the hands of the user. Built and regtested on x86-64-gnu-linux. OK for the trunk? Tobias 2013-05-17 Tobias Burnus PR fortran/48858 PR fortran/55465 * decl.c (add_global_entry): Add sym_name. * parse.c (add_global_procedure): Ditto. * resolve.c (resolve_bind_c_derived_types): Handle multiple decl for a procedure. 
(resolve_global_procedure): Handle gsym->ns pointing to a module. * trans-decl.c (gfc_get_extern_function_decl): Ditto. 2013-05-17 Tobias Burnus PR fortran/48858 PR fortran/55465 * gfortran.dg/binding_label_tests_10_main.f03: Update dg-error. * gfortran.dg/binding_label_tests_11_main.f03: Ditto. * gfortran.dg/binding_label_tests_13_main.f03: Ditto. * gfortran.dg/binding_label_tests_3.f03: Ditto. * gfortran.dg/binding_label_tests_4.f03: Ditto. * gfortran.dg/binding_label_tests_5.f03: Ditto. * gfortran.dg/binding_label_tests_6.f03: Ditto. * gfortran.dg/binding_label_tests_7.f03: Ditto. * gfortran.dg/binding_label_tests_8.f03: Ditto. * gfortran.dg/c_loc_tests_12.f03: Fix test case. * gfortran.dg/binding_label_tests_24.f90: New. * gfortran.dg/binding_label_tests_25.f90: New. diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index cb449a2..6ab9cc7 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -5375,6 +5375,7 @@ add_global_entry (const char *name, const char *binding_label, bool sub) else { s->type = type; + s->sym_name = name; s->where = gfc_current_locus; s->defined = 1; s->ns = gfc_current_ns; @@ -5396,6 +5397,7 @@ add_global_entry (const char *name, const char *binding_label, bool sub) else { s->type = type; + s->sym_name = name; s->binding_label = binding_label; s->where = gfc_current_locus; s->defined = 1; diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c index ba1730a..a223a2c 100644 --- a/gcc/fortran/parse.c +++ b/gcc/fortran/parse.c @@ -4359,10 +4359,15 @@ add_global_procedure (bool sub) if (s->defined || (s->type != GSYM_UNKNOWN && s->type != (sub ? GSYM_SUBROUTINE : GSYM_FUNCTION))) - gfc_global_used(s, NULL); + { + gfc_global_used (s, NULL); + /* Silence follow-up errors. */ + gfc_new_block->binding_label = NULL; + } else { s->type = sub ? 
GSYM_SUBROUTINE : GSYM_FUNCTION; + s->sym_name = gfc_new_block->name; s->where = gfc_current_locus; s->defined = 1; s->ns = gfc_current_ns; @@ -4379,10 +4384,15 @@ add_global_procedure (bool sub) if (s->defined || (s->type != GSYM_UNKNOWN && s->type != (sub ? GSYM_SUBROUTINE : GSYM_FUNCTION))) - gfc_global_used(s, NULL); + { + gfc_global_used (s, NULL); + /* Silence follow-up errors. */ + gfc_new_block->binding_label = NULL; + } else { s->type = sub ? GSYM_SUBROUTINE : GSYM_FUNCTION; + s->sym_name = gfc_new_block->name; s->binding_label = gfc_new_block->binding_label; s->where = gfc_current_locus; s->defined = 1; diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c index f3607b4..74e0aa4 100644 --- a/gcc/fortran/resolve.c +++ b/gcc/fortran/resolve.c @@ -2389,6 +2389,11 @@ resolve_global_procedure (gfc_symbol *sym, locus *where, } def_sym = gsym->ns->proc_name; + + /* This can happen if a binding name has been specified. */ + if (gsym
[PATCH] Fix VEC_[LR]SHIFT_EXPR folding for big-endian (PR tree-optimization/57051)
On Thu, May 16, 2013 at 07:59:00PM +0200, Mikael Pettersson wrote: > Jakub Jelinek writes: > > On Thu, Apr 25, 2013 at 11:47:02PM +0200, Jakub Jelinek wrote: > > > This patch adds folding of constant arguments v>> and v<<, which helps to > > > optimize the testcase from the PR back into constant store after > vectorized > > > loop is unrolled. > > > > As this fixes a regression on the 4.8 branch, I've backported it (and > > minimal prerequisite for that) to 4.8 branch too. > > Unfortunately this patch makes gcc.dg/vect/no-scevccp-outer-{7,13}.c fail > on powerpc64-linux: > > +FAIL: gcc.dg/vect/no-scevccp-outer-13.c execution test > +FAIL: gcc.dg/vect/no-scevccp-outer-7.c execution test > > which is a regression from 4.8-20130502. Reverting r198580 fixes it. > > The same FAILs also occur on trunk. Ah right, I was confused by the fact that VEC_RSHIFT_EXPR is used not just on little endian targets, but on big endian as well (VEC_LSHIFT_EXPR is never emitted), but the important spot is when extracting the scalar result from the vector: if (BYTES_BIG_ENDIAN) bitpos = size_binop (MULT_EXPR, bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1), TYPE_SIZE (scalar_type)); else bitpos = bitsize_zero_node; Fixed thusly, ok for trunk/4.8? 2013-05-17 Jakub Jelinek PR tree-optimization/57051 * fold-const.c (const_binop) : Fix BYTES_BIG_ENDIAN handling. --- gcc/fold-const.c.jj 2013-05-16 12:36:28.0 +0200 +++ gcc/fold-const.c2013-05-17 08:38:12.575117676 +0200 @@ -1393,7 +1393,7 @@ const_binop (enum tree_code code, tree a if (shiftc >= outerc || (shiftc % innerc) != 0) return NULL_TREE; int offset = shiftc / innerc; - if (code == VEC_LSHIFT_EXPR) + if ((code == VEC_RSHIFT_EXPR) ^ (!BYTES_BIG_ENDIAN)) offset = -offset; tree zero = build_zero_cst (TREE_TYPE (type)); for (i = 0; i < count; i++) Jakub
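A toy model of the corrected direction test, in C. The condition is the one from the patched line; the element-copy loop, the fixed element count and all names are assumptions made for illustration, not a copy of const_binop.

```c
#include <stdbool.h>

#define COUNT 4   /* elements per toy vector; an assumption */

/* Toy model of folding a whole-vector shift of a constant vector by
   OFFSET elements.  IS_RSHIFT and BIG_ENDIAN_P stand in for
   code == VEC_RSHIFT_EXPR and BYTES_BIG_ENDIAN.  The element-copy
   loop is an assumption about how the folded result is built.  */
static void
fold_vec_shift (const int *src, int *dst, int offset,
                bool is_rshift, bool big_endian_p)
{
  int i;

  /* Same condition as the patched line in const_binop.  */
  if (is_rshift ^ !big_endian_p)
    offset = -offset;

  for (i = 0; i < COUNT; i++)
    {
      if (i + offset < 0 || i + offset >= COUNT)
        dst[i] = 0;               /* shifted-in zero */
      else
        dst[i] = src[i + offset];
    }
}
```

Under this model a right shift moves data towards element 0 on little-endian and towards the last element on big-endian, matching where the scalar result is extracted in the bitpos computation quoted above.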
Patch ping
Hi! http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00282.html Reject -fsanitize=address -fsanitize=thread linking that won't ever work at runtime. Jakub
Re: section anchors and weak hidden symbols
On 05/16/13 15:32, Rainer Orth wrote: The new gcc.dg/visibility-21.c testcase fails on i386-pc-solaris2.11 and x86_64-unknown-linux-gnu: FAIL: gcc.dg/visibility-21.c (test for excess errors) Excess errors: /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/visibility-21.c:1:0: warning: this target does not support '-fsection-anchors' [-fsection-anchors] Fixed as follows, tested with the appropriate runtest invocation on both targets where the test becomes UNSUPPORTED, installed on mainline. thanks!