[PATCH][PR tree-optimization/65917] Record both equivalences from if (x == y) style conditionals
This turns out to be far easier than expected. Given a conditional like x == y, we already record the canonicalized x = y equivalence. If we just record y = x then this "just works". The only tricky thing is the following of the SSA_NAME_VALUE chain for the source argument -- we need to avoid that when recording the y = x variant (since following x's value chain would lead back to y). That's easily resolved with an _raw variant which doesn't follow the source value chain. While working through the code, I saw the old comment WRT loop depth and PR 61757 in record_equality. With the code to follow backedges in the CFG gone from the old threader, that code is no longer needed. So I removed it, restoring Richi's cleanup work from early 2015. Bootstrapped & regression tested on x86-linux. Installed on the trunk. Jeff commit 122fa7e277f132dcfef5d66378f7ef3411ae8b84 Author: law Date: Mon Feb 8 08:17:32 2016 + PR tree-optimization/65917 * tree-ssa-dom.c (record_temporary_equivalences): Record both equivalences from if (x == y) style conditionals. (loop_depth_of_name): Remove. (record_equality): Remove loop depth check. * tree-ssa-scopedtables.h (const_and_copies): Refine comments. (const_and_copies::record_const_or_copy_raw): New member function. * tree-ssa-scopedtables.c (const_and_copies::record_const_or_copy_raw): New, factored out of (const_and_copies::record_const_or_copy): Call new member function. PR tree-optimization/65917 * gcc.dg/tree-ssa/20030922-2.c: No longer xfailed. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233207 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index a465156..2c4a8b6 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,17 @@ +2016-02-08 Jeff Law + + PR tree-optimization/65917 + * tree-ssa-dom.c (record_temporary_equivalences): Record both + equivalences from if (x == y) style conditionals. + (loop_depth_of_name): Remove. + (record_equality): Remove loop depth check. + * tree-ssa-scopedtables.h (const_and_copies): Refine comments. + (const_and_copies::record_const_or_copy_raw): New member function. + * tree-ssa-scopedtables.c + (const_and_copies::record_const_or_copy_raw): New, factored out of + (const_and_copies::record_const_or_copy): Call new member function. + + 2016-02-05 Jeff Law PR tree-optimization/68541 diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 8d3438e..6940136 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2016-02-08 Jeff Law + +PR tree-optimization/65917 + * gcc.dg/tree-ssa/20030922-2.c: No longer xfailed. + 2016-02-07 Jerry DeLisle PR fortran/50555 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c b/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c index 6c133bd..16c79da 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c @@ -20,8 +20,4 @@ rgn_rank (rtx insn1, rtx insn2) } /* There should be two IF conditionals. */ -/* This now fails as it requires a very specific decision of DOM which - SSA name to record as a copy of the other when DOM derives copies - from temporary equivalences. The heuristics there no longer do - the correct thing. VRP still optimizes this testcase. */ -/* { dg-final { scan-tree-dump-times "if " 2 "dom2" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "if " 2 "dom2" } } */ diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c index b690d92..f44ac13 100644 --- a/gcc/tree-ssa-dom.c +++ b/gcc/tree-ssa-dom.c @@ -917,6 +917,15 @@ record_temporary_equivalences (edge e, tree rhs = edge_info->rhs; record_equality (lhs, rhs, const_and_copies); + /* We already recorded that LHS = RHS, with canonicalization, +value chain following, etc. + +We also want to return RHS = LHS, but without any canonicalization +or value chain following. */ + if (TREE_CODE (rhs) == SSA_NAME) + const_and_copies->record_const_or_copy_raw (rhs, lhs, + SSA_NAME_VALUE (rhs)); + /* If LHS is an SSA_NAME and RHS is a constant integer and LHS was set via a widening type conversion, then we may be able to record additional equivalences. */ @@ -1161,33 +1170,6 @@ record_cond (cond_equivalence *p, delete element; } -/* Return the loop depth of the basic block of the defining statement of X. - This number should not be treated as absolutely correct because the loop - information may not be completely up-to-date when dom runs. However, it - will be relatively correct, and as more passes are taught to keep loop info - up to date, the result will become more and more accurate. */ - -static int -loop_depth_of_n
Re: [PR69634] fix debug_insn-inconsistent REG_N_CALLS_CROSSED
On 02/06/2016 03:06 AM, Alexandre Oliva wrote: The testcase has a debug insn referencing a pseudo right before an insn that modifies the pseudo. Without debug insns, REG_N_CALLS_CROSSED was zero for that pseudo, so sched_analyze_reg added a dep between the pseudo setter and an earlier (lib)call. With debug insns, we miscomputed REG_N_CALLS_CROSSED as nonzero because of the debug insn, and then no dep was added between the two insns. This was enough to change sched1's decisions about where to place the pseudo setter. REG_N_CALLS_CROSSED is computed by both regstat_bb_compute_ri and regstat_bb_compute_calls_crossed, but although the former skipped debug insns, the latter didn't. Fixing this inconsistency was enough to fix the -fcompare-debug error. Regstrapped on x86_64-linux-gnu and i686-linux-gnu. Ok to install? for gcc/ChangeLog PR target/69634 * regstat.c (regstat_bb_compute_calls_crossed): Disregard debug insns. for gcc/testsuite/ChangeLog PR target/69634 * gcc.dg/pr69634.c: New. Code is fine. Testcase needs tweaking per Uros's comments in the BZ: Please don't use explicit -m32. If the test is valid only for 32bit target, you should use "target ia32", otherwise just leave it out. 32bit multilib tests will add -m32 automatically. Explicit -m32 will probably fail with x32, which is x86_64 target, too. With the test fixed this is fine. jeff
Re: [PATCH][P1 tree-optimization/68541] Add heuristics to path splitting
On 02/06/2016 03:19 AM, Andreas Schwab wrote: Jeff Law writes: diff --git a/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c b/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c new file mode 100644 index 000..aeb926e --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details " } */ + +int +foo(char *p, int n) +{ + int s = 0; + int i; + + for (i = 0; i < n; i++) { +if (p[i] >= 0) + s++; +else + s--; + } + + return s; +} + +/* { dg-final { scan-tree-dump "appears to be a join point for if-convertable diamond" "split-paths" } } */ That fails on aarch64. AH. Depends chars being signed by default. I'll fix it momentarily. Thanks, jeff
[PATCH] Fix split-path-2.c on targets with unsigned characters by default
Andreas pointed out the test was failing on aarch64. It turns out it's got to be failing on all targets where characters are unsigned by default as the test assumes characters are signed by default. Opps. The fix is pretty obvious. Just make the array of chars be signed chars. Installed after verifying x86-linux still passes and hand verification of aarch64 as well. Sorry for the breakage. jeff commit b32ded8cd0158d88155116bde4685a36c4802878 Author: law Date: Mon Feb 8 08:40:00 2016 + PR tree-optimization/68541 * gcc.dg/tree-ssa/split-path-2.c: Make char array explicitly signed. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233208 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 6940136..6aa927d 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,5 +1,8 @@ 2016-02-08 Jeff Law + PR tree-optimization/68541 + * gcc.dg/tree-ssa/split-path-2.c: Make char array explicitly signed. + PR tree-optimization/65917 * gcc.dg/tree-ssa/20030922-2.c: No longer xfailed. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c b/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c index aeb926e..8f503f2 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c @@ -2,7 +2,7 @@ /* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details " } */ int -foo(char *p, int n) +foo(signed char *p, int n) { int s = 0; int i;
PATCH] Fix PR 31531: A microoptimization of isnegative of signed integer
Hi, Please find attached the patch that performs optimization on unsigned values. Original fold-const part implemented in match.pd. Please review the patch and let us know if it's OK? Regression Tested on X86_64 with no regressions. Thanks, Naveen ChangeLog: * match.pd (cmp (convert (bit_not @0)) INTEGER_CST@1): New Simplifier. Testsuite/ChangeLog: * gcc.dg/pr31531.c : New testcase.diff --git a/gcc/match.pd b/gcc/match.pd index 6c8ebd5..42f772b 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1452,6 +1452,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && (TYPE_SIZE (TREE_TYPE (@0)) == TYPE_SIZE (TREE_TYPE (@1 (view_convert @1))) +/* Fold ((CAST)~X) op C as ((CAST)X) op' ~C, where op' is the + swapped comparison. */ +(for cmp (tcc_comparison) + scmp (swapped_tcc_comparison) + (simplify + (cmp (convert (bit_not @0)) INTEGER_CST@1) + (if (TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@0)) + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))) + (with { tree cst = fold_convert (TREE_TYPE (@0), @1); } +(scmp @0 (bit_not { cst; })) + /* Re-association barriers around constants and other re-association barriers can be removed. */ (simplify diff --git a/gcc/testsuite/gcc.dg/pr31531.c b/gcc/testsuite/gcc.dg/pr31531.c new file mode 100644 index 000..d687c91 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr31531.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-fdump-tree-gimple" } */ +int isnegative_optimized_4 (unsigned int X) +{ + int result; + if ((~X) >> 31) +result = 0; + else +result = 1; + return result; +} +/* { dg-final { scan-tree-dump-times "0 != 0" 1 "gimple" } } */
Re: [PATCH, stage1] Better error recovery for merge-conflict markers
David, On Thu, Apr 9, 2015 at 10:29 AM, Bert Wesarg wrote: > Hi David, > >> Various tools that operate on source code files will inject markers >> into them when an unfixable conflict occurs in a merger. >> >> There appears to be no blessed standard for these conflict markers, >> but an ad-hoc convention is for 7 '<' , '=', or '>' characters at >> the start of a line, followed optionally by a space and optional >> text >> >> e.g.: >> <<< HEAD >> extern int some_var; >> === >> extern short some_var; >> >>> Some other branch >> >> This convention is followed by GNU patch: >> http://git.savannah.gnu.org/cgit/patch.git/tree/src/merge.c >> by git: >> >> http://git.kernel.org/cgit/git/git.git/tree/Documentation/merge-config.txt >> and by various other tools. > > > if you read both of these tools carefully, you will notice an alternative > conflict style (named 'diff3' in both of them), that includes a third > section, the common pre-image. Here is an example: > > <<< HEAD > extern int some_var; > ||| merge base > extern int var; > === > extern short var; Some other branch > > > Additionally, git supports a custom conflict-marker-size to change the > default of 7 on a per file name (the conflict-marker-size attribute). So it > may be worthwhile to support other sizes than 7 in this patch too. you never commentewd on my mail, but I saw this now in trunk. I can only repeat myself here. Thanks. Bert > > Bert >
Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA
On Fri, 5 Feb 2016, Vladimir Makarov wrote: > On 02/05/2016 04:25 AM, Richard Biener wrote: > > The following patch fixes the performance regression for 435.gromacs > > on x86_64 with AVX2 (Haswell or bdver2) caused by > > > > 2015-12-18 Andreas Krebbel > > > > * ira.c (ira_setup_alts): Move the scan for commutative modifier > > to the first loop to make it work even with disabled alternatives. > > > > which in itself is a desirable change giving the RA more freedom. > > > > It turns out the fix makes an existing issue more severe in detecting > > more swappable alternatives and thus exiting ira_setup_alts with > > operands swapped in recog_data. This seems to give a slight preference > > to choose alternatives with the operands swapped (I didn't try to > > investigate how IRA handles the "merged" alternative mask and > > operand swapping in its further processing). > Alternative mask excludes alternatives which will be definitely rejected in > LRA. This approach is to speed up LRA (a lot was done to speed up RA but > still it consumes a big chunk of compiler time which is unusual for all > compilers). > > LRA and reload prefer insns without commutative operands swap when all other > costs are the same. Ok, so leaving operands swapped in ira_setup_alts will prefer the swapped variants in case other costs are the same. > > Of course previous RTL optimizers and canonicalization rules as well > > as backend patterns are tuned towards the not swapped variant and thus > > it happens doing more swaps ends up in slower code (I didn't closely > > investigate). > > > > So I tested the following patch which simply makes sure that > > ira_setup_alts does not alter recog_data. > > > > On a Intel Haswell machine I get (base is with the patch, peak is with > > the above change reverted): > > > >Estimated > > Estimated > > Base Base BasePeak Peak Peak > > Benchmarks Ref. Run Time Ratio Ref. Run Time Ratio > > -- -- - --- - > > - > > 435.gromacs 7140264 27.1 S7140270 > > 26.5 S > > 435.gromacs 7140264 27.1 *7140269 > > 26.6 S > > 435.gromacs 7140263 27.1 S7140269 > > 26.5 * > > == > > 435.gromacs 7140264 27.1 *7140269 > > 26.5 * > > > > which means the patched result is even better than before Andreas > > change. Current trunk homes in at a Run Time of 321s (which is > > the regression to fix). > Thanks for working on this, Richard. It is not easy to find reasons for > worse code on modern processors after such small change. As RA is based on > heuristics it hard to predict the change for a specific benchmark. I remember > I checked Andreas patch on SPEC2000 in a hope that it also improves x86-64 > code but I did not see a difference. > > It is even hard to say sometimes how a specific (non-heuristic) optimization > will affect a specific benchmark performance when a lot of unknown (from > alignments to CPU internals are involved). An year ago I tried to use ML to > choose best options. I used a set of about 100 C benchmarks (and even more > functions). For practically every benchmark, I had an option modification to > -Ofast resulting in faster code but ML prediction did not work at all. > > Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok for trunk? > > > OK. Thanks again. Thanks. Over the weekend I did a full 3-run SPEC 2k6 with the following result. Base is r231814 while peak is r231814 patched, flags are -Ofast -march=native (on a Haswell machine). Estimated Estimated Base Base BasePeak Peak Peak Benchmarks Ref. Run Time Ratio Ref. Run Time Ratio -- -- - --- - - 400.perlbench9770255 38.4 *9770251 39.0 S 400.perlbench9770258 37.8 S9770250 39.0 * 400.perlbench9770253 38.6 S9770250 39.0 S 401.bzip29650407 23.7 *9650400 24.1 * 401.bzip29650412 23.4 S9650417 23.1 S 401.bzip29650396 24.4 S9650398 24.3 S 403.gcc 8050252 31.9 S8050245 32.9 S 403.gcc 8050240 33.6 S8050244 32.9 * 403.gcc 8050242 33.2 *8050241 33.4 S 429.mcf 9120243 37.6 S9120245 37.3 S 429.mcf 9120224 40.7 S9120241 37.8 * 429.mcf
Re: [PATCH PR68021]Don't add biv candidate if it's not incremented by a single stmt
On Fri, Feb 5, 2016 at 8:20 PM, Bin Cheng wrote: > Hi, > As reported by PR68021, there is an ivopt bug all the time. As designed, > ivopt tries to identify and reuse induction variables in the original input. > Apparently we don't need to compute such original biv because the computation > is already there. Every time an iv use is rewritten, ivopt checks if the > candidate is an original biv using below code: > > /* An important special case -- if we are asked to express value of > the original iv by itself, just exit; there is no need to > introduce a new computation (that might also need casting the > variable to unsigned and back). */ > if (cand->pos == IP_ORIGINAL > && cand->incremented_at == use->stmt) > { > enum tree_code stmt_code; > > gcc_assert (is_gimple_assign (use->stmt)); > gcc_assert (gimple_assign_lhs (use->stmt) == cand->var_after); > > /* Check whether we may leave the computation unchanged. > This is the case only if it does not rely on other > computations in the loop -- otherwise, the computation > we rely upon may be removed in remove_unused_ivs, > thus leading to ICE. */ > stmt_code = gimple_assign_rhs_code (use->stmt); > if (stmt_code == PLUS_EXPR > || stmt_code == MINUS_EXPR > || stmt_code == POINTER_PLUS_EXPR) > { > if (gimple_assign_rhs1 (use->stmt) == cand->var_before) > op = gimple_assign_rhs2 (use->stmt); > else if (gimple_assign_rhs2 (use->stmt) == cand->var_before) > op = gimple_assign_rhs1 (use->stmt); > else > op = NULL_TREE; > } > else > op = NULL_TREE; > > if (op && expr_invariant_in_loop_p (data->current_loop, op)) > return; > } > > Note this code can only handle specific form biv, in which there is a single > explicit increment stmt in the form of "biv_after = biv_before + step". > > Unfortunately, in rare case like this, the biv is increased in two stmts, > like: > biv_x = biv_before + step_part_1; > biv_after = biv_x + step_part_2; > > That's why control flow goes to ICE point. We should not fix code at the ICE > point because: > 1) We shouldn't rewrite biv candidate. Even there is no correctness issue, > it will introduce redundant code by rewriting it. > 2) For non biv candidate, all the computation at ICE point has already been > done before at iv cost computation part. In other words, if control flow > goes here, gcc_assert (comp != NULL" will be true. > > Back to this issue, there are two possible fixes. First one is to specially > rewrite mentioned increment stmts into: > biv_after = biv_before + step > This fix needs more change because we are already after candidate creation > point and need to do computation on ourself. > > Another fix is just don't add biv. In this way, we check stricter condition > when adding biv candidate, guarantee control flow doesn't go to ICE point. > It won't cause worse code (Well, maybe a type conversion from unsigned to > signed) since we add exact the same candidate anyway (but not as a biv > candidate). As a matter of fact, we create/use another candidate which has > the same {base, step} as the biv. The computation of biv now becomes dead > code and will be removed by following passes. > > This patch fixes the issue by the 2nd method. Bootstrap and test on x86_64 > and AArch64 (test ongoing). Is it OK if no failures? Ok. Thanks, Richard. > Thanks, > bin > > 2016-02-04 Bin Cheng > > PR tree-optimization/68021 > * tree-ssa-loop-ivopts.c (increment_stmt_for_biv_p): New function. > (add_iv_candidate_for_biv, rewrite_use_nonlinear_expr): Call above > function checking if stmt is increment stmt for biv. > > gcc/testsuite/ChangeLog > 2016-02-04 Bin Cheng > > PR tree-optimization/68021 > * gcc.dg/tree-ssa/pr68021.c: New test. >
Re: RFA (convert): PATCH for c++/69631 (wrong overflow error with -fwrapv)
On Fri, Feb 5, 2016 at 9:34 PM, Jason Merrill wrote: > The problem here was that the call to convert_to_integer_nofold was still > pushing the conversion down into the multiply expression, and when we do the > multiplication in unsigned short it overflows. This patch fixes > convert_to_integer_1 to not do any truncation distribution when !dofold. > But that broke several testcases, because fold() doesn't ever try to do that > distribution, and so was missing some optimizations revealed by the > distribution. So this patch changes cp_fold to redo the conversion with > dofold enabled. I also change the C++ convert() entry point to do folding > conversion again, since a few places in convert_to_integer_1, and probably > elsewhere in the back end, expect that. > > For this email I didn't re-indent the truncation distribution code in > convert_to_integer_1 to make it easier to read; for what I check in should I > leave it like this, re-indent it, or do something else like a goto to avoid > the need for re-indentation? re-indent is fine but you could also simply tail-duplicate (optimized) /* When parsing long initializers, we might end up with a lot of casts. Shortcut this. */ if (TREE_CODE (expr) == INTEGER_CST) return fold_convert (type, expr); return build1 (CONVERT_EXPR, type, expr); and thus do if (!dofold) return ...; > Tested x86_64-pc-linux-gnu, OK for trunk? Ok. Note that I think we should move all the truncation distribution code elsewhere (match.pd). But that's probably for GCC 7. Richard.
Re: PATCH] Fix PR 31531: A microoptimization of isnegative of signed integer
Hurugalawadi, Naveen writes: > Hi, > > Please find attached the patch that performs optimization on unsigned values. > > Original fold-const part implemented in match.pd. > > Please review the patch and let us know if it's OK? > > Regression Tested on X86_64 with no regressions. > > Thanks, > Naveen > > ChangeLog: > * match.pd (cmp (convert (bit_not @0)) INTEGER_CST@1): New Simplifier. > > Testsuite/ChangeLog: > * gcc.dg/pr31531.c : New testcase. Could you please add dg-require-effective-target int32 to the test to avoid breaking targets with smaller ints (right shift by 31 bits)? Regards Senthil
Re: [PATCH] Fix PR 69282: ICE on 464.h264ref at -O3
On Mon, Feb 8, 2016 at 7:22 AM, Andrew Pinski wrote: > Hi, > The problem is that even though expand knows how to expand > VEC_COND when there is no vcond_mask pattern, > expand_vec_cond_expr_p returns false. So this patch allows > expand_vec_cond_expr_p to try the next part of the function if there > is no vcond_mask pattern. > > OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions. Please combine the two ifs. if (VECTOR_BOOLEAN_TYPE_P (cmp_op_type) && get_vcond_mask_icode (TYPE_MODE (value_type), + TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing) return true; Ok with that change. Thanks, Richard. > Thanks, > Andrew Pinski > > ChangeLog: > * optabs-tree.c (expand_vec_cond_expr_p): Don't early return if > get_vcond_mask_icode returns false. > > Testsuite/ChangeLog: > * gcc.c-torture/compile/20160205-1.c: New testcase.
Re: PATCH] Fix PR 31531: A microoptimization of isnegative of signed integer
On Mon, Feb 8, 2016 at 10:16 AM, Senthil Kumar Selvaraj wrote: > > Hurugalawadi, Naveen writes: > >> Hi, >> >> Please find attached the patch that performs optimization on unsigned values. >> >> Original fold-const part implemented in match.pd. >> >> Please review the patch and let us know if it's OK? >> >> Regression Tested on X86_64 with no regressions. >> >> Thanks, >> Naveen >> >> ChangeLog: >> * match.pd (cmp (convert (bit_not @0)) INTEGER_CST@1): New Simplifier. >> >> Testsuite/ChangeLog: >> * gcc.dg/pr31531.c : New testcase. > > Could you please add dg-require-effective-target int32 to the test to > avoid breaking targets with smaller ints (right shift by 31 bits)? +/* Fold ((CAST)~X) op C as ((CAST)X) op' ~C, where op' is the + swapped comparison. */ +(for cmp (tcc_comparison) + scmp (swapped_tcc_comparison) + (simplify + (cmp (convert (bit_not @0)) INTEGER_CST@1) + (if (TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@0)) + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))) + (with { tree cst = fold_convert (TREE_TYPE (@0), @1); } +(scmp @0 (bit_not { cst; })) I fail to see how this pattern is a good fit for the testcase, can you try explaining? I'm wondering that you require 'type's precision (a bool!) to be equal to that of @0. I'm also failing to see why you can't enhance the existing /* Fold ~X op C as X op' ~C, where op' is the swapped comparison. */ (for cmp (simple_comparison) scmp (swapped_simple_comparison) (simplify (cmp (bit_not@2 @0) CONSTANT_CLASS_P@1) (if (single_use (@2) && (TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST)) (scmp @0 (bit_not @1) Thanks, Richard. > Regards > Senthil
[Ada] Fix oversight in latest create_var_decl change
It causes the global uninitialized variables to be put in DATA instead of (the equivalent of) BSS on some platforms like AIX. Fixed thusly, tested on x86_64-suse-linux, applied on the mainline. 2016-02-08 Eric Botcazou * gcc-interface/utils.c (create_var_decl): Set again DECL_COMMON and DECL_IGNORED_P last. -- Eric BotcazouIndex: gcc-interface/utils.c === --- gcc-interface/utils.c (revision 233194) +++ gcc-interface/utils.c (working copy) @@ -2484,6 +2484,24 @@ create_var_decl (tree name, tree asm_nam DECL_ARTIFICIAL (var_decl) = artificial_p; DECL_EXTERNAL (var_decl) = extern_flag; + TREE_CONSTANT (var_decl) = constant_p; + TREE_READONLY (var_decl) = const_flag; + + /* The object is public if it is external or if it is declared public + and has static storage duration. */ + TREE_PUBLIC (var_decl) = extern_flag || (public_flag && static_storage); + + /* We need to allocate static storage for an object with static storage + duration if it isn't external. */ + TREE_STATIC (var_decl) = !extern_flag && static_storage; + + TREE_SIDE_EFFECTS (var_decl) += TREE_THIS_VOLATILE (var_decl) += TYPE_VOLATILE (type) | volatile_flag; + + if (TREE_SIDE_EFFECTS (var_decl)) +TREE_ADDRESSABLE (var_decl) = 1; + /* Ada doesn't feature Fortran-like COMMON variables so we shouldn't try to fiddle with DECL_COMMON. However, on platforms that don't support global BSS sections, uninitialized global variables would @@ -2508,24 +2526,6 @@ create_var_decl (tree name, tree asm_nam != null_pointer_node)) DECL_IGNORED_P (var_decl) = 1; - TREE_CONSTANT (var_decl) = constant_p; - TREE_READONLY (var_decl) = const_flag; - - /* The object is public if it is external or if it is declared public - and has static storage duration. */ - TREE_PUBLIC (var_decl) = extern_flag || (public_flag && static_storage); - - /* We need to allocate static storage for an object with static storage - duration if it isn't external. */ - TREE_STATIC (var_decl) = !extern_flag && static_storage; - - TREE_SIDE_EFFECTS (var_decl) -= TREE_THIS_VOLATILE (var_decl) -= TYPE_VOLATILE (type) | volatile_flag; - - if (TREE_SIDE_EFFECTS (var_decl)) -TREE_ADDRESSABLE (var_decl) = 1; - /* ??? Some attributes cannot be applied to CONST_DECLs. */ if (TREE_CODE (var_decl) == VAR_DECL) process_attributes (&var_decl, &attr_list, true, gnat_node);
[PATCH, PR69599] Fix GOMP/GOACC_parallel optimization in ipa-pta
Hi, when compiling the fipa-pta tests in the libgomp testsuite (omp-nested-2.c, pr46032.c) with -flto -flto-partition=max, the tests fail in execution (PR69599). The problem is related to the GOMP/GOACC_parallel optimization we do in fipa-pta, where we interpret a call GOMP_parallel (&foo._0, data) as a call foo._0 (data). The problem is that this optimization is only legal in lto if both: - foo containing the call GOMP_parallel (&foo._0, data) and - foo._0 are contained in the same partition. In the case of -flto-partition=max, foo is contained in it's own partition, and foo._0 is contained in another partition. This means the data argument to the GOMP_parallel call appears unused, and the setting of the argument is optimized away, which causes the execution failure. This patch fixes that by testing if foo and foo._0 are part of the same partition. [ Note that the node_address_taken change in the patch has no effect, since nonlocal_p already tests for used_from_other_partition. But I thought it was clearer to state the conditions under which we are allowed to ignore node->address_taken explicitly. ] Bootstrapped and reg-tested on x86_64. Build for nvidia accelerator and reg-tested libgomp with various lto settings. OK for trunk, stage4? Thanks, - Tom Fix GOMP/GOACC_parallel optimization in ipa-pta 2016-02-08 Tom de Vries PR tree-optimization/69599 * tree-ssa-structalias.c (fndecl_maybe_in_other_partition): New function. (find_func_aliases_for_builtin_call, find_func_clobbers) (ipa_pta_execute): Handle case that foo and foo._0 are not in same lto partition. * testsuite/libgomp.c/omp-nested-3.c: New test. * testsuite/libgomp.c/pr46032-2.c: New test. * testsuite/libgomp.oacc-c-c++-common/kernels-2.c: New test. * testsuite/libgomp.oacc-c-c++-common/parallel-2.c: New test. --- gcc/tree-ssa-structalias.c | 53 ++ libgomp/testsuite/libgomp.c/omp-nested-3.c | 4 ++ libgomp/testsuite/libgomp.c/pr46032-2.c| 4 ++ .../libgomp.oacc-c-c++-common/kernels-2.c | 4 ++ .../libgomp.oacc-c-c++-common/parallel-2.c | 4 ++ 5 files changed, 60 insertions(+), 9 deletions(-) diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c index e7d0797..feebd8d 100644 --- a/gcc/tree-ssa-structalias.c +++ b/gcc/tree-ssa-structalias.c @@ -4162,6 +4162,18 @@ find_func_aliases_for_call_arg (varinfo_t fi, unsigned index, tree arg) process_constraint (new_constraint (lhs, *rhsp)); } +/* Return true if FNDECL may be part of another lto partition. */ + +static bool +fndecl_maybe_in_other_partition (tree fndecl) +{ + cgraph_node *fn_node = cgraph_node::get (fndecl); + if (fn_node == NULL) +return true; + + return fn_node->in_other_partition; +} + /* Create constraints for the builtin call T. Return true if the call was handled, otherwise false. */ @@ -4537,6 +4549,11 @@ find_func_aliases_for_builtin_call (struct function *fn, gcall *t) tree fnarg = gimple_call_arg (t, fnpos); gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR); tree fndecl = TREE_OPERAND (fnarg, 0); + if (in_lto_p + && fndecl_maybe_in_other_partition (fndecl)) + /* Fallthru to general call handling. */ + break; + tree arg = gimple_call_arg (t, argpos); varinfo_t fi = get_vi_for_tree (fndecl); @@ -5113,6 +5130,11 @@ find_func_clobbers (struct function *fn, gimple *origt) tree fnarg = gimple_call_arg (t, fnpos); gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR); tree fndecl = TREE_OPERAND (fnarg, 0); + if (in_lto_p + && fndecl_maybe_in_other_partition (fndecl)) + /* Fallthru to general call handling. */ + break; + varinfo_t cfi = get_vi_for_tree (fndecl); tree arg = gimple_call_arg (t, argpos); @@ -7505,9 +7527,14 @@ ipa_pta_execute (void) address_taken bit for function foo._0, which would make it non-local. But for the purpose of ipa-pta, we can regard the run_on_threads call as a local call foo._0 (data), so we ignore address_taken on nodes - with parallelized_function set. */ - bool node_address_taken = (node->address_taken - && !node->parallelized_function); + with parallelized_function set. + Note: this is only safe, if foo and foo._0 are in the same lto + partition. */ + bool node_address_taken = ((node->parallelized_function + && !(in_lto_p + && node->used_from_other_partition)) + ? false + : node->address_taken); /* For externally visible or attribute used annotated functions use local constraints for their arguments. @@ -7676,12 +7703,20 @@ ipa_pta_execute (void) continue; /* Handle direct calls to functions with body. */ - if (gimple_call_builtin_p (stmt, BUILT_IN_GOMP_PARALLEL)) - decl = TREE_OPERAND (gimple_call_arg (stmt, 0), 0); - else if (gimple_call_builtin_p (stmt, BUILT_IN_GOACC_PARALLEL)) - decl = TREE_OPERAND
[PATCH] Avoid bugs like PR68273 to trigger
This makes it less likely (for example through the PRE path) to trigger target bugs like PR68273 where targets use type alignment of call arguments to decide on the ABI. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. IMHO targets should still be fixed. Richard. 2016-02-08 Richard Biener Jeff Law PR target/68273 * tree-ssanames.c (make_ssa_name_fn): Always use unqualified types for anonymous SSA names. * gcc.target/mips/pr68273.c: New testcase. Index: gcc/tree-ssanames.c === --- gcc/tree-ssanames.c (revision 233208) +++ gcc/tree-ssanames.c (working copy) @@ -286,7 +286,7 @@ make_ssa_name_fn (struct function *fn, t if (TYPE_P (var)) { - TREE_TYPE (t) = var; + TREE_TYPE (t) = TYPE_MAIN_VARIANT (var); SET_SSA_NAME_VAR_OR_IDENTIFIER (t, NULL_TREE); } else Index: gcc/testsuite/gcc.target/mips/pr68273.c === --- gcc/testsuite/gcc.target/mips/pr68273.c (revision 0) +++ gcc/testsuite/gcc.target/mips/pr68273.c (revision 0) @@ -0,0 +1,79 @@ +/* { dg-do compile } */ +/* { dg-options "-w -fdump-rtl-expand" } */ +/* { dg-skip-if "" { *-*-* } { } { "-O2" } } */ + +extern char errbuf[]; +typedef struct Symbol +{ + char *name; +} +Symbol; +typedef struct Tnode +{ +} +Tnode; +typedef union Value +{ + long long i; +} +Value; +typedef struct Entry +{ +} +Entry; +typedef struct Table +{ +} +Table; +typedef struct Node +{ + Tnode *typ; + int sto; + Value val; +} +Node; +struct Scope +{ + Table *table; +} *sp; +static Node op (Node); +Entry *p, *q; +union YYSTYPE +{ + Symbol *sym; + Node rec; +}; +typedef union YYSTYPE YYSTYPE; +typedef short int yytype_int16; + +int +yyparse (void) +{ + YYSTYPE yyval; + int yyn; + YYSTYPE *yyvsp; + while (1) +{ + if (yyn == 34) + { + if ((yyvsp[-1].rec).sto & 0x10) + sprintf (errbuf, "invalid typedef qualifier for '%s'", +(yyvsp[0].sym)->name); + p = enter (sp->table, (yyvsp[0].sym)); + } + else + op ((yyvsp[0].rec)); + *++yyvsp = yyval; +} +} + +static Node +op (Node q) +{ + integer (q.typ); + mgtype (q.typ); +} + + +/* { dg-final { scan-rtl-dump-times "\\\(set \\\(reg:SI 5 \\\$5\\\)" 2 "expand" } } */ +/* { dg-final { scan-rtl-dump-times "\\\(set \\\(reg:SI 6 \\\$6\\\)" 1 "expand" } } */
Re: [PATCH] Avoid bugs like PR68273 to trigger
> This makes it less likely (for example through the PRE path) to trigger > target bugs like PR68273 where targets use type alignment of call > arguments to decide on the ABI. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Thanks. I think that we can also avoid this issue by teaching SRA not to over-align scalar types, I don't see any point in doing that. It's done in: misalign = (misalign + offset) & (align - 1); if (misalign != 0) align = (misalign & -misalign); if (align != TYPE_ALIGN (exp_type)) exp_type = build_aligned_type (exp_type, align); mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off); -- Eric Botcazou
[PATCH, PR59627, c++] Handle DECL_OMP_DECLARE_REDUCTION in discriminator_for_local_entity
Hi, When running libgomp testsuite with -flto, we run into an ICE in the udr-*.C tests. A minimal testcase is in listed in PR59627: ... struct A { int i; }; void foo() { A a; #pragma omp declare reduction (+: A: omp_out.i += omp_in.i) #pragma omp parallel reduction (+: a) ; } ... The ICE with backtrace looks as follows: ... src/libgomp/testsuite/libgomp.c++/udr-1.C:56:13: internal compiler error: in discriminator_for_local_entity, at cp/mangle.c:1762 0x9ad52a discriminator_for_local_entity src/gcc/cp/mangle.c:1762 0x9ad8a5 write_local_name src/gcc/cp/mangle.c:1850 0x9a7975 write_name src/gcc/cp/mangle.c:882 0x9a71c0 write_encoding src/gcc/cp/mangle.c:744 0x9a6c61 write_mangled_name src/gcc/cp/mangle.c:709 0x9b6d09 mangle_decl_string src/gcc/cp/mangle.c:3509 0x9b6d4f get_mangled_id src/gcc/cp/mangle.c:3531 0x9b71d7 mangle_decl(tree_node*) src/gcc/cp/mangle.c:3598 0x14da287 decl_assembler_name(tree_node*) src/gcc/tree.c:670 0x14edf74 assign_assembler_name_if_neeeded(tree_node*) src/gcc/tree.c:5917 0x14ee0cc free_lang_data_in_cgraph src/gcc/tree.c:5972 0x14ee280 free_lang_data src/gcc/tree.c:6014 0x14ee320 execute src/gcc/tree.c:6063 Please submit a full bug report ... The problem seems to be that we're trying to get a discriminator (the lexical ordinal of a var among entities with the same name in the same function) for a DECL_OMP_DECLARE_REDUCTION_P function, which is not handled in discriminator_for_local_entity. For the test-case above the declared reduction is shown as: ... void omp declare reduction operator+~1A (struct A &); ... AFAIU, those DECL_OMP_DECLARE_REDUCTION_P decls are unique in a function, so I'd say we can simply return '0'. Bootstrapped and reg-tested on x86_64. OK for trunk, stage1? Thanks, - Tom Handle DECL_OMP_DECLARE_REDUCTION in discriminator_for_local_entity 2016-02-08 Tom de Vries PR c++/59627 * mangle.c (discriminator_for_local_entity): Handle DECL_OMP_DECLARE_REDUCTION_P function. * testsuite/libgomp.c++/udr-20.C: New test. --- gcc/cp/mangle.c| 3 +++ libgomp/testsuite/libgomp.c++/udr-20.C | 4 2 files changed, 7 insertions(+) diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c index 410c7f4..e83fc0c 100644 --- a/gcc/cp/mangle.c +++ b/gcc/cp/mangle.c @@ -1758,6 +1758,9 @@ discriminator_for_local_entity (tree entity) return local_class_index (entity); } + else if (TREE_CODE (entity) == FUNCTION_DECL + && DECL_OMP_DECLARE_REDUCTION_P (entity)) +return 0; else gcc_unreachable (); } diff --git a/libgomp/testsuite/libgomp.c++/udr-20.C b/libgomp/testsuite/libgomp.c++/udr-20.C new file mode 100644 index 000..2aaf618 --- /dev/null +++ b/libgomp/testsuite/libgomp.c++/udr-20.C @@ -0,0 +1,4 @@ +// { dg-do run { target lto } } +// { dg-additional-options "-flto" } + +#include "udr-1.C"
Re: [PATCH] Avoid bugs like PR68273 to trigger
On Mon, 8 Feb 2016, Eric Botcazou wrote: > > This makes it less likely (for example through the PRE path) to trigger > > target bugs like PR68273 where targets use type alignment of call > > arguments to decide on the ABI. > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. > > Thanks. I think that we can also avoid this issue by teaching SRA not to > over-align scalar types, I don't see any point in doing that. It's done in: > > misalign = (misalign + offset) & (align - 1); > if (misalign != 0) > align = (misalign & -misalign); > if (align != TYPE_ALIGN (exp_type)) > exp_type = build_aligned_type (exp_type, align); > > mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off); No, that's not over-aliging a salar type, that's preserving alignment information on the memory reference. What would need to be fixed here is to see where that 'mem_ref' is assigned to a SSA name and avoid making that SSA name have an over-aligned type. Richard.
Re: [PATCH] Avoid bugs like PR68273 to trigger
> No, that's not over-aliging a salar type, that's preserving alignment > information on the memory reference. What would we lose exactly by lowering the alignment to that of the type? What's the point in knowing that a 32-bit integer is 64-bit aligned at the GIMPLE level? -- Eric Botcazou
[PATCH, PR69707] Handle -fdiagnostics-color in lto
Hi, when running libgomp.oacc-c-c++-common/parallel-dims.c with -flto -fno-use-linker-plugin, we run into a failing 'test for excess errors'. The problem is that while -fdiagnostics-color=never is passed to gcc, it's not propagated to lto1, and the error message is annotated with color information, which confuses the test for excess errors. This patch fixes the problem by making sure that -fdiagnostics-color is propagated to lto1, in the same way that -fdiagnostics-show-caret is propagated to lto1. Bootstrapped and reg-tested on x86_64. OK for trunk, stage1? Thanks, - Tom Handle -fdiagnostics-color in lto 2016-02-08 Tom de Vries PR lto/69707 * common.opt (fdiagnostics-color=): Remove Driver flag. * lto-wrapper.c (merge_and_complain, append_compiler_options): Handle OPT_fdiagnostics_color_. * testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c: New test. --- gcc/common.opt| 2 +- gcc/lto-wrapper.c | 2 ++ .../libgomp.oacc-c-c++-common/parallel-dims-2.c | 19 +++ 3 files changed, 22 insertions(+), 1 deletion(-) diff --git a/gcc/common.opt b/gcc/common.opt index 520fa9c..a740dcb 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1153,7 +1153,7 @@ Common Alias(fdiagnostics-color=,always,never) ; fdiagnostics-color= -Driver Common Joined RejectNegative Var(flag_diagnostics_show_color) Enum(diagnostic_color_rule) Init(DIAGNOSTICS_COLOR_NO) +Common Joined RejectNegative Var(flag_diagnostics_show_color) Enum(diagnostic_color_rule) Init(DIAGNOSTICS_COLOR_NO) -fdiagnostics-color=[never|always|auto] Colorize diagnostics. ; Required for these enum values. diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c index ced6f2f..484dbc1 100644 --- a/gcc/lto-wrapper.c +++ b/gcc/lto-wrapper.c @@ -232,6 +232,7 @@ merge_and_complain (struct cl_decoded_option **decoded_options, break; /* Fallthru. */ + case OPT_fdiagnostics_color_: case OPT_fdiagnostics_show_caret: case OPT_fdiagnostics_show_option: case OPT_fdiagnostics_show_location_: @@ -497,6 +498,7 @@ append_compiler_options (obstack *argv_obstack, struct cl_decoded_option *opts, on any CL_TARGET flag and a few selected others. */ switch (option->opt_index) { + case OPT_fdiagnostics_color_: case OPT_fdiagnostics_show_caret: case OPT_fdiagnostics_show_option: case OPT_fdiagnostics_show_location_: diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c new file mode 100644 index 000..eea8c7e --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c @@ -0,0 +1,19 @@ +/* { dg-do run { target { openacc_nvidia_accel_selected && lto } } } */ +/* { dg-additional-options "-flto -fno-use-linker-plugin" } */ + +/* Worker and vector size checks. Picked an outrageously large + value. */ + +int main () +{ +#pragma acc parallel num_workers (2<<20) /* { dg-error "using num_workers" } */ + { + } + +#pragma acc parallel vector_length (2<<20) /* { dg-error "using vector_length" } */ + { + } + + return 0; +} +
[patch] Fix SSA_NAME_OCCURS_IN_ABNORMAL_PHI issue in backprop
Hi, the Ada runtime fails to build for platforms still using the old SJLJ scheme because of the new GIMPLE backprop pass (hence it's a regression): eric@polaris:~/build/gcc/native> gcc/gnat1 -quiet a-ncelfu.ads -gnatpg -O -I ~/svn/gcc/gcc/ada/ a-ngcefu.adb: In function 'Ada.Numerics.Complex_Elementary_Functions.Sqrt': a-ngcefu.adb:16:4: error: SSA_NAME_OCCURS_IN_ABNORMAL_PHI should be set for SSA_NAME: _28 in statement: xr_3(ab) = PHI PHI argument _28 for PHI node xr_3(ab) = PHI +===GNAT BUG DETECTED==+ | 6.0.0 20160206 (experimental) [trunk revision 233194] (x86_64-suse-linux) GCC error:| | verify_ssa failed| | Error detected around a-ngcefu.adb:16:4 The problem is that the pass propagates the source RHS (_28) of: xr_29(ab) = ABS_EXPR <_28>; into a PHI note using xr_29 along an AB edge so the above check triggers. I think replacing such an AB SSA_NAME is problematic in any case: if the RHS is not AB, then the check triggers; if the RHS is also AB, then we may create overlapping live ranges and the compilation will abort later. We can probably create a new SSA_NAME but it's not really the spirit of the backprop pass and I'm not sure it's worth the hassle, hence the minimal attached fixlet. Tested on x86_64-suse-linux, OK for the mainline? 2016-02-08 Eric Botcazou * gimple-ssa-backprop.c (optimize_phi): Do not replace an argument with SSA_NAME_OCCURS_IN_ABNORMAL_PHI set. -- Eric BotcazouIndex: gimple-ssa-backprop.c === --- gimple-ssa-backprop.c (revision 233194) +++ gimple-ssa-backprop.c (working copy) @@ -840,13 +840,14 @@ backprop::optimize_phi (gphi *phi, tree bool replaced = false; FOR_EACH_PHI_ARG (use, phi, oi, SSA_OP_USE) { - tree new_arg = strip_sign_op (USE_FROM_PTR (use)); - if (new_arg) + tree arg = USE_FROM_PTR (use); + tree new_arg = strip_sign_op (arg); + if (new_arg && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (arg)) { if (!replaced) prepare_change (var); if (dump_file && (dump_flags & TDF_DETAILS)) - note_replacement (phi, USE_FROM_PTR (use), new_arg); + note_replacement (phi, arg, new_arg); replace_exp (use, new_arg); replaced = true; }
Re: [PATCH, PR69707] Handle -fdiagnostics-color in lto
On Mon, Feb 08, 2016 at 11:34:39AM +0100, Tom de Vries wrote: > Hi, > > when running libgomp.oacc-c-c++-common/parallel-dims.c with -flto > -fno-use-linker-plugin, we run into a failing 'test for excess errors'. > > The problem is that while -fdiagnostics-color=never is passed to gcc, it's > not propagated to lto1, and the error message is annotated with color > information, which confuses the test for excess errors. > > This patch fixes the problem by making sure that -fdiagnostics-color is > propagated to lto1, in the same way that -fdiagnostics-show-caret is > propagated to lto1. > > Bootstrapped and reg-tested on x86_64. > > OK for trunk, stage1? Doesn't that mean diagnostics from the driver itself will no longer honor that option when deciding if the diagnostics should be colorized or not? Jakub
Re: [PATCH] Avoid bugs like PR68273 to trigger
On Mon, 8 Feb 2016, Eric Botcazou wrote: > > No, that's not over-aliging a salar type, that's preserving alignment > > information on the memory reference. > > What would we lose exactly by lowering the alignment to that of the type? > What's the point in knowing that a 32-bit integer is 64-bit aligned at the > GIMPLE level? It helps vectorization. This was specifically introduced to fix a regression on powerpc. See PR65310. Richard.
Re: [PATCH, PR69599] Fix GOMP/GOACC_parallel optimization in ipa-pta
On Mon, 8 Feb 2016, Tom de Vries wrote: > Hi, > > when compiling the fipa-pta tests in the libgomp testsuite (omp-nested-2.c, > pr46032.c) with -flto -flto-partition=max, the tests fail in execution > (PR69599). > > The problem is related to the GOMP/GOACC_parallel optimization we do in > fipa-pta, where we interpret a call GOMP_parallel (&foo._0, data) as a call > foo._0 (data). > > The problem is that this optimization is only legal in lto if both: > - foo containing the call GOMP_parallel (&foo._0, data) and > - foo._0 > are contained in the same partition. > > In the case of -flto-partition=max, foo is contained in it's own partition, > and foo._0 is contained in another partition. This means the data argument to > the GOMP_parallel call appears unused, and the setting of the argument is > optimized away, which causes the execution failure. > > This patch fixes that by testing if foo and foo._0 are part of the same > partition. > > [ Note that the node_address_taken change in the patch has no effect, since > nonlocal_p already tests for used_from_other_partition. But I thought it was > clearer to state the conditions under which we are allowed to ignore > node->address_taken explicitly. ] > > Bootstrapped and reg-tested on x86_64. > > Build for nvidia accelerator and reg-tested libgomp with various lto settings. > > OK for trunk, stage4? I don't like the in_lto_p checks, why's the check not working for non-LTO? Thanks, Richard. > > Thanks, > - Tom > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [Patch AArch64] GCC 6 regression in vector performance. - Fix vector initialization to happen with lane load instructions.
On Tue, Feb 02, 2016 at 10:29:29AM +, James Greenhalgh wrote: > On Wed, Jan 20, 2016 at 03:22:11PM +, James Greenhalgh wrote: > > > > Hi, > > > > In a number of cases where we try to create vectors we end up spilling to > > the > > stack and then filling. This is one example distilled from a couple of > > micro-benchmrks where the issue shows up. The reason for the extra cost > > in this case is the unnecessary use of the stack. The patch attempts to > > finesse this by using lane loads or vector inserts to produce the right > > results. > > > > This patch is mostly Ramana's work, I've just cleaned it up a little. > > > > This has been in a number of our trees lately, and we haven't seen any > > regressions. I've also bootstrapped and tested it, and run a set of > > benchmarks to show no regressions on Cortex-A57 or Cortex-A53. > > > > The patch fixes some regressions caused by the more agressive vectorization > > in GCC6, so I'd like to propose it to go in even though we are in Stage 4. > > > > OK? > > *Ping* *ping^2* Cheers, James > > 2016-01-20 James Greenhalgh > > Ramana Radhakrishnan > > > > * config/aarch64/aarch64.c (aarch64_expand_vector_init): Refactor, > > always use lane loads to construct non-constant vectors. > > > > gcc/testsuite/ > > > > 2016-01-20 James Greenhalgh > > Ramana Radhakrishnan > > > > * gcc.target/aarch64/vector_initialization_nostack.c: New. > > > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > > index 03bc1b9..3787b38 100644 > > --- a/gcc/config/aarch64/aarch64.c > > +++ b/gcc/config/aarch64/aarch64.c > > @@ -10985,28 +10985,37 @@ aarch64_simd_make_constant (rtx vals) > > return NULL_RTX; > > } > > > > +/* Expand a vector initialisation sequence, such that TARGET is > > + initialised to contain VALS. */ > > + > > void > > aarch64_expand_vector_init (rtx target, rtx vals) > > { > >machine_mode mode = GET_MODE (target); > >machine_mode inner_mode = GET_MODE_INNER (mode); > > + /* The number of vector elements. */ > >int n_elts = GET_MODE_NUNITS (mode); > > + /* The number of vector elements which are not constant. */ > >int n_var = 0; > >rtx any_const = NULL_RTX; > > + /* The first element of vals. */ > > + rtx v0 = XVECEXP (vals, 0, 0); > >bool all_same = true; > > > > + /* Count the number of variable elements to initialise. */ > >for (int i = 0; i < n_elts; ++i) > > { > >rtx x = XVECEXP (vals, 0, i); > > - if (!CONST_INT_P (x) && !CONST_DOUBLE_P (x)) > > + if (!(CONST_INT_P (x) || CONST_DOUBLE_P (x))) > > ++n_var; > >else > > any_const = x; > > > > - if (i > 0 && !rtx_equal_p (x, XVECEXP (vals, 0, 0))) > > - all_same = false; > > + all_same &= rtx_equal_p (x, v0); > > } > > > > + /* No variable elements, hand off to aarch64_simd_make_constant which > > knows > > + how best to handle this. */ > >if (n_var == 0) > > { > >rtx constant = aarch64_simd_make_constant (vals); > > @@ -11020,14 +11029,15 @@ aarch64_expand_vector_init (rtx target, rtx vals) > >/* Splat a single non-constant element if we can. */ > >if (all_same) > > { > > - rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, 0)); > > + rtx x = copy_to_mode_reg (inner_mode, v0); > >aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x)); > >return; > > } > > > > - /* Half the fields (or less) are non-constant. Load constant then > > overwrite > > - varying fields. Hope that this is more efficient than using the > > stack. */ > > - if (n_var <= n_elts/2) > > + /* Initialise a vector which is part-variable. We want to first try > > + to build those lanes which are constant in the most efficient way we > > + can. */ > > + if (n_var != n_elts) > > { > >rtx copy = copy_rtx (vals); > > > > @@ -11054,31 +11064,21 @@ aarch64_expand_vector_init (rtx target, rtx vals) > > XVECEXP (copy, 0, i) = subst; > > } > >aarch64_expand_vector_init (target, copy); > > +} > > > > - /* Insert variables. */ > > - enum insn_code icode = optab_handler (vec_set_optab, mode); > > - gcc_assert (icode != CODE_FOR_nothing); > > + /* Insert the variable lanes directly. */ > > > > - for (int i = 0; i < n_elts; i++) > > - { > > - rtx x = XVECEXP (vals, 0, i); > > - if (CONST_INT_P (x) || CONST_DOUBLE_P (x)) > > - continue; > > - x = copy_to_mode_reg (inner_mode, x); > > - emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i))); > > - } > > - return; > > -} > > + enum insn_code icode = optab_handler (vec_set_optab, mode); > > + gcc_assert (icode != CODE_FOR_nothing); > > > > - /* Construct the vector in memory one field at a time > > - and load the whole vector. */ > > - rtx mem = assign_stack_temp (mode, GET_MODE_SIZE (mode)); > >for (int i =
Re: [AArch64] Remove AARCH64_EXTRA_TUNE_RECIP_SQRT from Cortex-A57 tuning
On Mon, Feb 01, 2016 at 02:00:01PM +, James Greenhalgh wrote: > On Mon, Jan 25, 2016 at 11:20:46AM +, James Greenhalgh wrote: > > On Mon, Jan 11, 2016 at 12:04:43PM +, James Greenhalgh wrote: > > > > > > Hi, > > > > > > I've seen a couple of large performance issues caused by expanding > > > the high-precision reciprocal square root for Cortex-A57, so I'd like > > > to turn it off by default. > > > > > > This is good for art (~2%) from Spec2000, bad (~3.5%) for fma3d from > > > Spec2000, good (~5.5%) for gromcas from Spec2006, and very good (>10%) for > > > some private microbenchmark kernels which stress the divide/sqrt/multiply > > > units. It therefore seems to me to be the correct choice to make across > > > a number of workloads. > > > > > > Bootstrapped and tested on aarch64-none-linux-gnu with no issues. > > > > > > OK? > > > > *Ping* > > *pingx2* *ping^3* Thanks, James > > > --- > > > 2015-12-11 James Greenhalgh > > > > > > * config/aarch64/aarch64.c (cortexa57_tunings): Remove > > > AARCH64_EXTRA_TUNE_RECIP_SQRT. > > > > > > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > > > index 1d5d898..999c9fc 100644 > > > --- a/gcc/config/aarch64/aarch64.c > > > +++ b/gcc/config/aarch64/aarch64.c > > > @@ -484,8 +484,7 @@ static const struct tune_params cortexa57_tunings = > > >0, /* max_case_values. */ > > >0, /* cache_line_size. */ > > >tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > > > - (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS > > > - | AARCH64_EXTRA_TUNE_RECIP_SQRT) /* tune_flags. */ > > > + (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS) /* tune_flags. */ > > > }; > > > > > > static const struct tune_params cortexa72_tunings = > > >
Re: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt
On Mon, Feb 01, 2016 at 01:59:34PM +, James Greenhalgh wrote: > On Mon, Jan 25, 2016 at 11:21:25AM +, James Greenhalgh wrote: > > On Mon, Jan 11, 2016 at 11:53:39AM +, James Greenhalgh wrote: > > > > > > Hi, > > > > > > I'd like to switch the logic around in aarch64.c such that > > > -mlow-precision-recip-sqrt causes us to always emit the low-precision > > > software expansion for reciprocal square root. I have two reasons to do > > > this; first is consistency across -mcpu targets, second is enabling more > > > -mcpu targets to use the flag for peak tuning. > > > > > > I don't much like that the precision we use for -mlow-precision-recip-sqrt > > > differs between cores (and possibly compiler revisions). Yes, we're > > > under -ffast-math but I take this flag to mean the user explicitly wants > > > the > > > low-precision expansion, and we should not diverge from that based on an > > > internal decision as to what is optimal for performance in the > > > high-precision case. I'd prefer to keep things as predictable as possible, > > > and here that means always emitting the low-precision expansion when > > > asked. > > > > > > Judging by the comments in the thread proposing the reciprocal square > > > root optimisation, this will benefit all cores currently supported by GCC. > > > To be clear, we would still not expand in the high-precision case for any > > > cores which do not explicitly ask for it. Currently that is Cortex-A57 > > > and xgene, though I will be proposing a patch to remove Cortex-A57 from > > > that list shortly. > > > > > > Which gives my second motivation for this patch. > > > -mlow-precision-recip-sqrt > > > is intended as a tuning flag for situations where performance is more > > > important than precision, but the current logic requires setting an > > > internal flag which also changes the performance characteristics where > > > high-precision is needed. This conflates two decisions the target might > > > want to make, and reduces the applicability of an option targets might > > > want to enable for performance. In particular, I'd still like to see > > > -mlow-precision-recip-sqrt continue to emit the cheaper, low-precision > > > sequence for floats under Cortex-A57. > > > > > > Based on that reasoning, this patch makes the appropriate change to the > > > logic. I've checked with the current -mcpu values to ensure that behaviour > > > without -mlow-precision-recip-sqrt does not change, and that behaviour > > > with -mlow-precision-recip-sqrt is to emit the low precision sequences. > > > > > > I've also put this through bootstrap and test on aarch64-none-linux-gnu > > > with no issues. > > > > > > OK? > > > > *Ping* > > *Pingx2* *Ping^3* Thanks, James > > > 2015-12-10 James Greenhalgh > > > > > > * config/aarch64/aarch64.c (use_rsqrt_p): Always use software > > > reciprocal sqrt for -mlow-precision-recip-sqrt. > > > > > > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > > > index 9142ac0..1d5d898 100644 > > > --- a/gcc/config/aarch64/aarch64.c > > > +++ b/gcc/config/aarch64/aarch64.c > > > @@ -7485,8 +7485,9 @@ use_rsqrt_p (void) > > > { > > >return (!flag_trapping_math > > > && flag_unsafe_math_optimizations > > > - && (aarch64_tune_params.extra_tuning_flags > > > - & AARCH64_EXTRA_TUNE_RECIP_SQRT)); > > > + && ((aarch64_tune_params.extra_tuning_flags > > > +& AARCH64_EXTRA_TUNE_RECIP_SQRT) > > > + || flag_mrecip_low_precision_sqrt)); > > > } > > > > > > /* Function to decide when to use > > >
Re: [PATCH] Avoid bugs like PR68273 to trigger
> It helps vectorization. This was specifically introduced to fix a > regression on powerpc. > > See PR65310. Indeed, the SRA code was much better before that one liner. ;-) -- Eric Botcazou
Re: [patch] Fix SSA_NAME_OCCURS_IN_ABNORMAL_PHI issue in backprop
On Mon, Feb 8, 2016 at 11:37 AM, Eric Botcazou wrote: > Hi, > > the Ada runtime fails to build for platforms still using the old SJLJ scheme > because of the new GIMPLE backprop pass (hence it's a regression): > > eric@polaris:~/build/gcc/native> gcc/gnat1 -quiet a-ncelfu.ads -gnatpg -O -I > ~/svn/gcc/gcc/ada/ > a-ngcefu.adb: In function 'Ada.Numerics.Complex_Elementary_Functions.Sqrt': > a-ngcefu.adb:16:4: error: SSA_NAME_OCCURS_IN_ABNORMAL_PHI should be set > for SSA_NAME: _28 in statement: > xr_3(ab) = PHI _28(6), _28(7), xr_4(ab)(11), xr_4(ab)(12), xr_4(ab)(14), xr_4(ab)(15), > xr_4(ab)(16), xr_4(ab)(17), xr_4(ab)(18), xr_4(ab)(19), xr_4(ab)(20), xr_4(ab) > (21), xr_4(ab)(22), xr_4(ab)(23), xr_4(ab)(24), xr_4(ab)(25), xr_4(ab)(26), > xr_4(ab)(27), xr_4(ab)(28), xr_4(ab)(29), xr_4(ab)(30), xr_4(ab)(32), xr_4(ab) > (34), xr_4(ab)(36), xr_4(ab)(37)> > PHI argument > _28 > for PHI node > xr_3(ab) = PHI _28(6), _28(7), xr_4(ab)(11), xr_4(ab)(12), xr_4(ab)(14), xr_4(ab)(15), > xr_4(ab)(16), xr_4(ab)(17), xr_4(ab)(18), xr_4(ab)(19), xr_4(ab)(20), xr_4(ab) > (21), xr_4(ab)(22), xr_4(ab)(23), xr_4(ab)(24), xr_4(ab)(25), xr_4(ab)(26), > xr_4(ab)(27), xr_4(ab)(28), xr_4(ab)(29), xr_4(ab)(30), xr_4(ab)(32), xr_4(ab) > (34), xr_4(ab)(36), xr_4(ab)(37)> > +===GNAT BUG DETECTED==+ > | 6.0.0 20160206 (experimental) [trunk revision 233194] (x86_64-suse-linux) > GCC error:| > | verify_ssa failed| > | Error detected around a-ngcefu.adb:16:4 > > The problem is that the pass propagates the source RHS (_28) of: > > xr_29(ab) = ABS_EXPR <_28>; > > into a PHI note using xr_29 along an AB edge so the above check triggers. > > I think replacing such an AB SSA_NAME is problematic in any case: if the RHS > is not AB, then the check triggers; if the RHS is also AB, then we may create > overlapping live ranges and the compilation will abort later. We can probably > create a new SSA_NAME but it's not really the spirit of the backprop pass and > I'm not sure it's worth the hassle, hence the minimal attached fixlet. I think the problematic transform is when the replacement happens on a PHI arg with its edge being abnormal only. Thus simply do if (EDGE_PRED (gimple_bb (phi), PHI_ARG_INDEX_FROM_USE (use))->flags & EDGE_ABNORMAL) continue; at the start of the FOR_EACH_PHI_ARG body. Thanks, Richard. > Tested on x86_64-suse-linux, OK for the mainline? > > > 2016-02-08 Eric Botcazou > > * gimple-ssa-backprop.c (optimize_phi): Do not replace an argument > with SSA_NAME_OCCURS_IN_ABNORMAL_PHI set. > > -- > Eric Botcazou
Re: [PATCH, PR59627, c++] Handle DECL_OMP_DECLARE_REDUCTION in discriminator_for_local_entity
On Mon, Feb 08, 2016 at 11:20:57AM +0100, Tom de Vries wrote: > When running libgomp testsuite with -flto, we run into an ICE in the udr-*.C > tests. > > A minimal testcase is in listed in PR59627: > ... > struct A { int i; }; > > void foo() > { > A a; > #pragma omp declare reduction (+: A: omp_out.i += omp_in.i) > #pragma omp parallel reduction (+: a) > ; > } > ... > The problem seems to be that we're trying to get a discriminator (the > lexical ordinal of a var among entities with the same name in the same > function) for a DECL_OMP_DECLARE_REDUCTION_P function, which is not handled > in discriminator_for_local_entity. > > For the test-case above the declared reduction is shown as: > ... > void omp declare reduction operator+~1A (struct A &); > ... > > AFAIU, those DECL_OMP_DECLARE_REDUCTION_P decls are unique in a function, so > I'd say we can simply return '0'. They certainly aren't unique to a function, you can have hundreds of them: void foo() { A a; { #pragma omp declare reduction (+: A: omp_out.i += omp_in.i) #pragma omp parallel reduction (+: a) ; } { #pragma omp declare reduction (+: A: omp_out.i += omp_in.i) #pragma omp parallel reduction (+: a) ; } { #pragma omp declare reduction (+: A: omp_out.i += omp_in.i) #pragma omp parallel reduction (+: a) ; } //... } etc. But they really should not be mangled, so IMHO it is better to ensure they are not mangled, nothing really needs their mangled name for anything. They don't show up in the debug info, don't have a real body, nothing calls them... Thus I think I'd prefer something like following patch (so far untested, will test later today), or returning early from mangle_decl for DECL_OMP_DECLARE_REDUCTION_P: 2016-02-08 Jakub Jelinek PR c++/59627 * parser.c (cp_parser_omp_declare_reduction): Set assembler name of the DECL_OMP_DECLARE_REDUCTION_P decls. * g++.dg/gomp/pr59627.C: New test. --- gcc/cp/parser.c.jj 2016-01-29 21:32:54.0 +0100 +++ gcc/cp/parser.c 2016-02-08 11:45:34.229860265 +0100 @@ -36080,6 +36080,7 @@ cp_parser_omp_declare_reduction (cp_pars DECL_DECLARED_INLINE_P (fndecl) = 1; DECL_IGNORED_P (fndecl) = 1; DECL_OMP_DECLARE_REDUCTION_P (fndecl) = 1; + SET_DECL_ASSEMBLER_NAME (fndecl, get_identifier ("")); DECL_ATTRIBUTES (fndecl) = tree_cons (get_identifier ("gnu_inline"), NULL_TREE, DECL_ATTRIBUTES (fndecl)); --- gcc/testsuite/g++.dg/gomp/pr59627.C.jj 2016-02-08 11:59:26.650053692 +0100 +++ gcc/testsuite/g++.dg/gomp/pr59627.C 2016-02-08 11:55:52.0 +0100 @@ -0,0 +1,14 @@ +// PR c++/59627 +// { dg-do compile { target lto } } +// { dg-options "-fopenmp -flto" } + +struct A { A () : i (0) {} int i; }; + +void +foo () +{ + A a; + #pragma omp declare reduction (+: A: omp_out.i += omp_in.i) + #pragma omp parallel reduction (+: a) + ; +} Jakub
Re: [PATCH] Avoid bugs like PR68273 to trigger
On Mon, 8 Feb 2016, Eric Botcazou wrote: > > It helps vectorization. This was specifically introduced to fix a > > regression on powerpc. > > > > See PR65310. > > Indeed, the SRA code was much better before that one liner. ;-) Not sure what to make of this comment ;) Did you see where we create the SSA name with the overaligned type? Richard.
Re: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian
Hi Charles, On 03/02/16 18:59, charles.bay...@linaro.org wrote: From: Charles Baylis gcc/ChangeLog: 2016-02-03 Charles Baylis PR target/68532 * config/arm/arm.c (neon_endian_lane_map): New function. (neon_vector_pair_endian_lane_map): New function. (arm_evpc_neon_vuzp): Allow for big endian lane order. * config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big endian. (vuzpq_s16): Likewise. (vuzpq_s32): Likewise. (vuzpq_f32): Likewise. (vuzpq_u8): Likewise. (vuzpq_u16): Likewise. (vuzpq_u32): Likewise. (vuzpq_p8): Likewise. (vuzpq_p16): Likewise. gcc/testsuite/ChangeLog: 2015-12-15 Charles Baylis PR target/68532 * gcc.c-torture/execute/pr68532.c: New test. Change-Id: Ifd35d79bd42825f05403a1b96d8f34ef0f21dac3 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index d8a2745..e9aa982 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -28208,6 +28208,35 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel) arm_expand_vec_perm_1 (target, op0, op1, sel); } +/* map lane ordering between architectural lane order, and GCC lane order, + taking into account ABI. See comment above output_move_neon for details. */ +static int +neon_endian_lane_map (machine_mode mode, int lane) s/map/Map/ New line between comment and function signature. +{ + if (BYTES_BIG_ENDIAN) + { +int nelems = GET_MODE_NUNITS (mode); +/* Reverse lane order. */ +lane = (nelems - 1 - lane); +/* Reverse D register order, to match ABI. */ +if (GET_MODE_SIZE (mode) == 16) + lane = lane ^ (nelems / 2); + } + return lane; +} + +/* some permutations index into pairs of vectors, this is a helper function + to map indexes into those pairs of vectors. */ +static int +neon_pair_endian_lane_map (machine_mode mode, int lane) Similarly, s/some/Some/ and new line after comment. +{ + int nelem = GET_MODE_NUNITS (mode); + if (BYTES_BIG_ENDIAN) +lane = + neon_endian_lane_map (mode, lane & (nelem - 1)) + (lane & nelem); + return lane; +} + /* Generate or test for an insn that supports a constant permutation. */ /* Recognize patterns for the VUZP insns. */ @@ -28218,14 +28247,22 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) unsigned int i, odd, mask, nelt = d->nelt; rtx out0, out1, in0, in1; rtx (*gen)(rtx, rtx, rtx, rtx); + int first_elem; + int swap; Just make this a bool. if (GET_MODE_UNIT_SIZE (d->vmode) >= 8) return false; - /* Note that these are little-endian tests. Adjust for big-endian later. */ - if (d->perm[0] == 0) + /* arm_expand_vec_perm_const_1 () helpfully swaps the operands for the + big endian pattern on 64 bit vectors, so we correct for that. */ + swap = BYTES_BIG_ENDIAN && !d->one_vector_p +&& GET_MODE_SIZE (d->vmode) == 8 ? d->nelt : 0; + + first_elem = d->perm[neon_endian_lane_map (d->vmode, 0)] ^ swap; + + if (first_elem == neon_endian_lane_map (d->vmode, 0)) odd = 0; - else if (d->perm[0] == 1) + else if (first_elem == neon_endian_lane_map (d->vmode, 1)) odd = 1; else return false; @@ -28233,8 +28270,9 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) for (i = 0; i < nelt; i++) { - unsigned elt = (i * 2 + odd) & mask; - if (d->perm[i] != elt) + unsigned elt = + (neon_pair_endian_lane_map (d->vmode, i) * 2 + odd) & mask; + if ((d->perm[i] ^ swap) != neon_pair_endian_lane_map (d->vmode, elt)) return false; } @@ -28258,10 +28296,9 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) in0 = d->op0; in1 = d->op1; - if (BYTES_BIG_ENDIAN) + if (swap) { std::swap (in0, in1); - odd = !odd; } remove the braces around the std::swap out0 = d->target; diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 47816d5..2e014b6 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -8741,9 +8741,9 @@ vuzpq_s8 (int8x16_t __a, int8x16_t __b) int8x16x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15 }); + { 9, 11, 13, 15, 1, 3, 5, 7, 25, 27, 29, 31, 17, 19, 21, 23 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14 }); + { 8, 10, 12, 14, 0, 2, 4, 6, 24, 26, 28, 30, 16, 18, 20, 22 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 }); @@ -8759,9 +8759,9 @@ vuzpq_s16 (int16x8_t __a, int16x8_t __b) int16x8x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 9, 11, 13, 15, 1, 3, 5, 7 }); + { 5, 7, 1, 3, 13, 15, 9, 11 }); __rv.val[1] = __b
Re: [PATCH 2/2] [ARM] PR68532 Fix up vzip recognition for big endian
Hi Charles, On 03/02/16 18:59, charles.bay...@linaro.org wrote: From: Charles Baylis gcc/ChangeLog: 2016-02-03 Charles Baylis PR target/68532 * config/arm/arm.c (arm_evpc_neon_vzip): Allow for big endian lane order. * config/arm/arm_neon.h (vzipq_s8): Adjust shuffle patterns for big endian. (vzipq_s16): Likewise. (vzipq_s32): Likewise. (vzipq_f32): Likewise. (vzipq_u8): Likewise. (vzipq_u16): Likewise. (vzipq_u32): Likewise. (vzipq_p8): Likewise. (vzipq_p16): Likewise. Change-Id: I327678f5e73c1de2f413c1d22769ab42ce1d6c16 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index e9aa982..24239db 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -28318,15 +28318,21 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d) unsigned int i, high, mask, nelt = d->nelt; rtx out0, out1, in0, in1; rtx (*gen)(rtx, rtx, rtx, rtx); + int first_elem; + bool is_swapped; if (GET_MODE_UNIT_SIZE (d->vmode) >= 8) return false; + is_swapped = BYTES_BIG_ENDIAN ? true : false; This is just "is_swapped = BYTES_BIG_ENDIAN;" + /* Note that these are little-endian tests. Adjust for big-endian later. */ I think you can remove this comment now, like in patch 1/2 + first_elem = d->perm[neon_endian_lane_map (d->vmode, 0) ^ is_swapped]; + high = nelt / 2; - if (d->perm[0] == high) + if (first_elem == neon_endian_lane_map (d->vmode, high)) ; - else if (d->perm[0] == 0) + else if (first_elem == neon_endian_lane_map (d->vmode, 0)) high = 0; else return false; @@ -28334,11 +28340,16 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d) for (i = 0; i < nelt / 2; i++) { - unsigned elt = (i + high) & mask; - if (d->perm[i * 2] != elt) + unsigned elt = + neon_pair_endian_lane_map (d->vmode, i + high) & mask; + if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i + is_swapped)] + != elt) return false; - elt = (elt + nelt) & mask; - if (d->perm[i * 2 + 1] != elt) + elt = + neon_pair_endian_lane_map (d->vmode, i + nelt + high) + & mask; The "& mask" can go on the previous line. + if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i + !is_swapped)] + != elt) return false; } @@ -28362,10 +28373,9 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d) in0 = d->op0; in1 = d->op1; - if (BYTES_BIG_ENDIAN) + if (is_swapped) { std::swap (in0, in1); - high = !high; } remove the braces around the std::swap. Ok with these changes. I've tried out both patch and they do fix execution failures on big-endian and don't break any NEON intrinsics tests that I threw at them. out0 = d->target; diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 2e014b6..aa17f49 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -8453,9 +8453,9 @@ vzipq_s8 (int8x16_t __a, int8x16_t __b) int8x16x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 24, 8, 25, 9, 26, 10, 27, 11, 28, 12, 29, 13, 30, 14, 31, 15 }); + { 20, 4, 21, 5, 22, 6, 23, 7, 16, 0, 17, 1, 18, 2, 19, 3 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 16, 0, 17, 1, 18, 2, 19, 3, 20, 4, 21, 5, 22, 6, 23, 7 }); + { 28, 12, 29, 13, 30, 14, 31, 15, 24, 8, 25, 9, 26, 10, 27, 11 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 }); @@ -8471,9 +8471,9 @@ vzipq_s16 (int16x8_t __a, int16x8_t __b) int16x8x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 12, 4, 13, 5, 14, 6, 15, 7 }); + { 10, 2, 11, 3, 8, 0, 9, 1 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 8, 0, 9, 1, 10, 2, 11, 3 }); + { 14, 6, 15, 7, 12, 4, 13, 5 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) { 0, 8, 1, 9, 2, 10, 3, 11 }); @@ -8488,8 +8488,8 @@ vzipq_s32 (int32x4_t __a, int32x4_t __b) { int32x4x2_t __rv; #ifdef __ARM_BIG_ENDIAN - __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 6, 2, 7, 3 }); - __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4, 0, 5, 1 }); + __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 5, 1, 4, 0 }); + __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 7, 3, 6, 2 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 0, 4, 1, 5 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 2, 6, 3, 7 }); @@ -8502,8 +8502,8 @@ vzipq_f32 (float32x4_t __a, float32x4_t __b) { float32x4x2_t __rv; #ifdef __ARM_BIG_ENDIAN - __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 6, 2, 7, 3 }); - __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4,
Re: [PATCH] Avoid bugs like PR68273 to trigger
> Not sure what to make of this comment ;) I guess it was a variant of the usual can-of-worms-opening thing. :-) > Did you see where we create the SSA name with the overaligned type? In tree-ssa-pre.c:insert_into_preds_of_block: (gdb) p debug_pre_expr(expr) {mem_ref<0B>,yyvsp_1}@.MEM_2 (gdb) p debug_tree(type) unit size align 8 symtab 0 alias set -1 canonical type 0x76cea3f0 context pointer_to_this chain > sizes-gimplified unsigned SI size constant 32> unit size constant 4> align 64 symtab 0 alias set -1 canonical type 0x76cead20> /* Now build a phi for the new variable. */ temp = make_temp_ssa_name (type, NULL, "prephitmp"); -- Eric Botcazou
Re: [PATCH v3] PR48344: Fix unrecognizable insn error with -fstack-limit-register=r2
On 01/27/2016 07:12 PM, Kelvin Nilsen wrote: +/* During execution of handle_common_deferred_options (), the Pmode + variable cannot be used because it has not yet been initialized. + For this reason, handling of the OPT_fstack_limit_register_ and + OPT_fstack_limit_symbol_ options is deferred until execution + of finish_deferred_option_handling (), which is invoked following + target-specific initialization. + + The variable opt_fstack_limit_symbol_arg represents the name + of the register specified in an OPT_fstack_limit_symbol_ command + line option, or NULL to represent that no OPT_fstack_limit_symbol_ + option is active. + + The variable opt_fstack_limit_register_no represents the number of + the register specified in an OPT_fstack_limit_register_ + command-line option, or -1 to indicate that no + OPT_fstack_limit_register_ option is active. (Legal register + numbers are all >= 0.) + + Note that these two command-line options are mutually exclusive. + If both are specified, subsequent option processing overwrites + earlier option processing. */ + +static const char *opt_fstack_limit_symbol_arg = NULL; +static int opt_fstack_limit_register_no = -1; + My suggestion would be: make these global, and move the code to initialize the rtx to init_emit_once. Then use a much shorter comment here ("Store values of -fstack-limit arguments for use by init_emit_once."). Also, note that there should be two spaces at the end of a sentence, even at the end of each comment. Bernd
Re: [PATCH] Avoid bugs like PR68273 to trigger
On Mon, 8 Feb 2016, Eric Botcazou wrote: > > Not sure what to make of this comment ;) > > I guess it was a variant of the usual can-of-worms-opening thing. :-) > > > Did you see where we create the SSA name with the overaligned type? > > In tree-ssa-pre.c:insert_into_preds_of_block: > > (gdb) p debug_pre_expr(expr) > {mem_ref<0B>,yyvsp_1}@.MEM_2 > > (gdb) p debug_tree(type) > type size > unit size > align 8 symtab 0 alias set -1 canonical type 0x76cea3f0 context > > pointer_to_this chain 0x76c54a18 D.1410>> > sizes-gimplified unsigned SI > size bitsizetype> constant 32> > unit size sizetype> constant 4> > align 64 symtab 0 alias set -1 canonical type 0x76cead20> > > /* Now build a phi for the new variable. */ > temp = make_temp_ssa_name (type, NULL, "prephitmp"); Yes, that place I just fixed. I mean for the SRA case. Richard.
Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA
On 02/08/2016 10:09 AM, Richard Biener wrote: The gcc.target/i386/addr-sel-1.c (for PR28940) seems to just started working at some point past in time and thus it was added and the bug closed. You could say RA does a better job after the patch as it uses 1 less register but that restricts the followup postreload combine attempts. Though I wonder about what's "better" RA here - isn't the best allocation one that avoids spills but uses as many registers as possible (at least when targeting a CPU that cannot to register renaming)? regrename doesn't help this testcase either (it runs too late and does a renaming that doesn't help). I don't think regrename's place in the pass pipeline is set in stone. If we enable it for gcc-7. we could also experiment with putting it just before postreload, or schedule another reload-combine afterwards. Bernd
libgcc: On AIX, increase chances to find landing pads for exceptions
Hi David, still experiencing exception-not-caught problems with gcc-4.2.4 on AIX leads me to some patch proposed in http://gcc.gnu.org/PR13878 back in 2004 already, ought to be fixed by some different commit since 3.4.0. As long as build systems (even libtool right now) on AIX do export these _GLOBAL__* symbols from shared libraries, overlapping frame-base address ranges may become registered, even if newer gcc (seen with 4.8) does name the FDE symbols more complex to reduce these chances. But still, just think of linking some static library into multiple shared libraries and/or the main executable. Or sometimes there is just need for some hackery to override a shared object's implementation detail and rely on runtime linking to do the override at runtime. Agreed both is "wrong" to some degree, but the larger an application is, the higher is the chance for this to happen. Thoughts? Thanks! /haubi/ 2016-02-08 Michael Haubenwallner On AIX, increase chances to find landing pads for exceptions. * unwind-dw2-fde.c (_Unwind_Find_FDE): Stop assuming registered object's address ranges to not overlap. --- libgcc/unwind-dw2-fde.c +++ libgcc/unwind-dw2-fde.c @@ -1013,7 +1013,25 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases) f = search_object (ob, pc); if (f) goto fini; + /* break; + In an ideal world, even on AIX, we could break here because + objects would not overlap. But the larger an application is, + the more likely an "overlap" may happen (on AIX) because of: + - Shared libraries do export the FDE symbols ("_GLOBAL__F*"), + which is a bug in their build system, but out of gcc's control. + - Other shared libraries, or the main executable, do contain + identical or similar object files - which is suboptimal, + but may be intentional. However, exporting their FDE symbols, + which may have identical names as seen in the former shared + libraries, again is a bug in their build system, but still + out of gcc's control. + - Finally, run time linking is enabled, redirecting adresses of + identically named exported symbols from their original shared + library's address range into another shared library's or the + main executable's address range. + This results in address ranges being registered by different + objects to potentially overlap. */ } /* Classify and search the objects we've not yet processed. */
Re: [PATCH][AArch64] PR target/69161: Don't use special predicate for CCmode comparisons in expressions that require matching modes
Hi James, On 04/02/16 13:49, James Greenhalgh wrote: On Fri, Jan 29, 2016 at 02:27:34PM +, Kyrill Tkachov wrote: Hi all, In this PR we ICE during combine when trying to propagate a comparison into a vec_duplicate, that is we end up creating the rtx: (vec_duplicate:V4SI (eq:CC_NZ (reg:CC_NZ 66 cc) (const_int 0 [0]))) The documentation for vec_duplicate says: "The output vector mode must have the same submodes as the input vector mode or the scalar modes" So this is invalid RTL, which triggers an assert in simplify-rtx to that effect. It has been suggested on the PR that this is because we use a special_predicate for aarch64_comparison_operator which means that it ignores the mode when matching. This is fine when used in RTXes that don't need it, like if_then_else expressions but can cause trouble when used in places where the modes do matter, like in SET operations. In this particular ICE the cause was the conditional store patterns that could end up matching an intermediate rtx during combine of (set (reg:SI) (eq:CC_NZ x y)). The suggested solution is to define a separate predicate with the same conditions as aarch64_comparison_operator but make it not special, so it gets automatic mode checks to prevent such a situation. This patch does that. Bootstrapped and tested on aarch64-linux-gnu. SPEC2006 codegen did not change with this patch, so there shouldn't be any code quality regressions. Ok for trunk? It would be good to leave a more detailed comment on "aarch64_comparison_operator_mode" as to why we need it. Otherwise, this is OK. Ok, here's the patch with a comment added. I'll commit it when the arm version is approved as well. Thanks, Kyrill Thanks, James Thanks, Kyrill 2016-01-29 Kyrylo Tkachov PR target/69161 * config/aarch64/predicates.md (aarch64_comparison_operator_mode): New predicate. (aarch64_comparison_operator): Break overly long line into two. (aarch64_comparison_operation): Likewise. * config/aarch64/aarch64.md (cstorecc4): Use aarch64_comparison_operator_mode instead of aarch64_comparison_operator. (cstore4): Likewise. (aarch64_cstore): Likewise. (*cstoresi_insn_uxtw): Likewise. (cstore_neg): Likewise. (*cstoresi_neg_uxtw): Likewise. 2016-01-29 Kyrylo Tkachov PR target/69161 * gcc.c-torture/compile/pr69161.c: New test. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 221f106430ea2c4a44352d937e660d0c1bbe10da..bf2140c5fd9458476d42e49100347dd05e1b21df 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3039,7 +3039,7 @@ (define_expand "cstore4" (define_expand "cstorecc4" [(set (match_operand:SI 0 "register_operand") - (match_operator 1 "aarch64_comparison_operator" + (match_operator 1 "aarch64_comparison_operator_mode" [(match_operand 2 "cc_register") (match_operand 3 "const0_operand")]))] "" @@ -3051,7 +3051,7 @@ (define_expand "cstorecc4" (define_expand "cstore4" [(set (match_operand:SI 0 "register_operand" "") - (match_operator:SI 1 "aarch64_comparison_operator" + (match_operator:SI 1 "aarch64_comparison_operator_mode" [(match_operand:GPF 2 "register_operand" "") (match_operand:GPF 3 "aarch64_fp_compare_operand" "")]))] "" @@ -3064,7 +3064,7 @@ (define_expand "cstore4" (define_insn "aarch64_cstore" [(set (match_operand:ALLI 0 "register_operand" "=r") - (match_operator:ALLI 1 "aarch64_comparison_operator" + (match_operator:ALLI 1 "aarch64_comparison_operator_mode" [(match_operand 2 "cc_register" "") (const_int 0)]))] "" "cset\\t%0, %m1" @@ -3109,7 +3109,7 @@ (define_insn_and_split "*compare_cstore_insn" (define_insn "*cstoresi_insn_uxtw" [(set (match_operand:DI 0 "register_operand" "=r") (zero_extend:DI - (match_operator:SI 1 "aarch64_comparison_operator" + (match_operator:SI 1 "aarch64_comparison_operator_mode" [(match_operand 2 "cc_register" "") (const_int 0)])))] "" "cset\\t%w0, %m1" @@ -3118,7 +3118,7 @@ (define_insn "*cstoresi_insn_uxtw" (define_insn "cstore_neg" [(set (match_operand:ALLI 0 "register_operand" "=r") - (neg:ALLI (match_operator:ALLI 1 "aarch64_comparison_operator" + (neg:ALLI (match_operator:ALLI 1 "aarch64_comparison_operator_mode" [(match_operand 2 "cc_register" "") (const_int 0)])))] "" "csetm\\t%0, %m1" @@ -3129,7 +3129,7 @@ (define_insn "cstore_neg" (define_insn "*cstoresi_neg_uxtw" [(set (match_operand:DI 0 "register_operand" "=r") (zero_extend:DI - (neg:SI (match_operator:SI 1 "aarch64_comparison_operator" + (neg:SI (match_operator:SI 1 "aarch64_comparison_operator_mode" [(match_operand 2 "cc_register" "") (const_int 0)]] "" "csetm\\t%w0, %m1" diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index e80e40683cada374303ea987017f95654531a032..11868278c3d0a1887c2065568f890c3eb8ff7f0b 100644 --- a/gcc/config/aarch64/predicates.md +++ b/g
Re: [PATCH] Avoid bugs like PR68273 to trigger
> Yes, that place I just fixed. I mean for the SRA case. Are you sure that there is one? expr = build_ref_for_model (loc, agg, access->offset - top_offset, access, gsi, insert_after); if (write) { if (access->grp_partial_lhs) expr = force_gimple_operand_gsi (gsi, expr, true, NULL_TREE, !insert_after, insert_after ? GSI_NEW_STMT : GSI_SAME_STMT); stmt = gimple_build_assign (repl, expr); (gdb) p debug_tree(repl) unit size align 8 symtab 0 alias set -1 canonical type 0x76cea3f0 context pointer_to_this chain > sizes-gimplified unsigned SI size unit size align 32 symtab 0 alias set -1 canonical type 0x76cead20> used unsigned SI file pr68273.c line 67 col 10 size unit size align 32 context > (gdb) p debug_tree(expr) unit size align 8 symtab 0 alias set -1 canonical type 0x76cea3f0 context pointer_to_this chain > sizes-gimplified unsigned SI size unit size align 64 symtab 0 alias set -1 canonical type 0x76cead20> arg 0 unsigned SI size unit size align 32 symtab 0 alias set -1 canonical type 0x76d333f0> arg 0 used BLK file pr68273.c line 67 col 10 size unit size align 64 context arg-type >> arg 1 constant 0>> -- Eric Botcazou
[PATCH, PR67709 ] Don't call call_cgraph_insertion_hooks in simd_clone_create
Hi, Consider libgomp.fortran/declare-simd-3.f90: ... subroutine bar use declare_simd_2_mod real :: b(128) integer :: i !$omp simd do i = 1, 128 b(i) = i * 2.0 end do !$omp simd do i = 1, 128 b(i) = foo (7.0_8, 5 * i, b(i)) end do do i = 1, 128 if (b(i).ne.(7.0 + 10.0 * i * i)) call abort end do end subroutine bar ... when compiling declare-simd-3.f90 with '-O0 -fopenmp -flto -fno-use-linker-plugin', we run into an ICE with backtrace: ... ICE backtrace: ... src/libgomp/testsuite/libgomp.fortran/declare-simd-3.f90: At top level: src/libgomp/testsuite/libgomp.fortran/declare-simd-3.f90:7:0: internal compiler error: in estimate_function_body_sizes, at ipa-inline-analysis.c:2486 use declare_simd_2_mod 0xc9319d estimate_function_body_sizes src/gcc/ipa-inline-analysis.c:2486 0xc950dd compute_inline_parameters(cgraph_node*, bool) src/gcc/ipa-inline-analysis.c:2953 0xc9813b inline_analyze_function(cgraph_node*) src/gcc/ipa-inline-analysis.c:4078 0xc98205 inline_summary_t::insert(cgraph_node*, inline_summary*) src/gcc/ipa-inline-analysis.c:4105 0x9a6213 symbol_table::call_cgraph_insertion_hooks(cgraph_node*) src/gcc/cgraph.c:371 0xdefa0e simd_clone_create src/gcc/omp-low.c:18738 0xdf5012 expand_simd_clones src/gcc/omp-low.c:19799 0xdf519b ipa_omp_simd_clone src/gcc/omp-low.c:19839 0xdf520a execute src/gcc/omp-low.c:19867 Please submit a full bug report, ... During pass_omp_simd_clone, we call simd_clone_create for foo, and execute the !old_node->definition part: ... tree old_decl = old_node->decl; tree new_decl = copy_node (old_node->decl); DECL_NAME (new_decl) = clone_function_name (old_decl, "simdclone"); SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl)); SET_DECL_RTL (new_decl, NULL); DECL_STATIC_CONSTRUCTOR (new_decl) = 0; DECL_STATIC_DESTRUCTOR (new_decl) = 0; new_node = old_node->create_version_clone (new_decl, vNULL, NULL); if (old_node->in_other_partition) new_node->in_other_partition = 1; symtab->call_cgraph_insertion_hooks (new_node); ... The 'symtab->call_cgraph_insertion_hooks (new_node)' calls 'inline_summary_t::insert', a hook inserted during pass_ipa_inline. During execution of the hook we stumble over the fact that the new node has no 'struct function' in estimate_function_body_sizes: ... struct function *my_function = DECL_STRUCT_FUNCTION (node->decl); ... gcc_assert (my_function && my_function->cfg); ... The patch fixes the ICE by removing the call to 'symtab->call_cgraph_insertion_hooks'. [ The pass before pass_omp_simd_clone is pass_dispatcher_calls. It has a function create_target_clone, similar to simd_clone_create, with a node.defition and !node.defition part. The !node.defition part does not call 'symtab->call_cgraph_insertion_hooks (new_node)'. ] Bootstrapped and reg-tested on x86_64. OK for stage1 trunk? Thanks, - Tom Don't call call_cgraph_insertion_hooks in simd_clone_create 2016-02-08 Tom de Vries PR lto/67709 * omp-low.c (simd_clone_create): Remove call to symtab->call_cgraph_insertion_hooks. * testsuite/libgomp.fortran/declare-simd-4.f90: New test. --- gcc/omp-low.c| 1 - libgomp/testsuite/libgomp.fortran/declare-simd-4.f90 | 7 +++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index d41688b..fcbb3e0 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -18735,7 +18735,6 @@ simd_clone_create (struct cgraph_node *old_node) new_node = old_node->create_version_clone (new_decl, vNULL, NULL); if (old_node->in_other_partition) new_node->in_other_partition = 1; - symtab->call_cgraph_insertion_hooks (new_node); } if (new_node == NULL) return new_node; diff --git a/libgomp/testsuite/libgomp.fortran/declare-simd-4.f90 b/libgomp/testsuite/libgomp.fortran/declare-simd-4.f90 new file mode 100644 index 000..bfdf9cf --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/declare-simd-4.f90 @@ -0,0 +1,7 @@ +! { dg-do run { target { vect_simd_clones && lto } } } +! { dg-options "-fno-inline -flto -fno-use-linker-plugin" } +! { dg-additional-sources declare-simd-3.f90 } +! { dg-additional-options "-msse2" { target sse2_runtime } } +! { dg-additional-options "-mavx" { target avx_runtime } } + +include 'declare-simd-2.f90'
Re: [Patch AArch64] Restrict 16-bit sqrdml{sa}h instructions to FP_LO_REGS
On Tue, Jan 26, 2016 at 04:04:47PM +, James Greenhalgh wrote: > > Hi, > > In their forms using 16-bit lanes, the sqrdmlah and sqrdmlsh instruction > available when compiling with -march=armv8.1-a are only usable with > a register number in the range 0 to 15 for operand 3, as gas will point > out: > > Error: register number out of range 0 to 15 at > operand 3 -- `sqrdmlsh v2.4h,v4.4h,v23.h[5]' > > This patch teaches GCC to avoid registers outside of this range when > appropriate, in the same fashion as we do for other instructions with > this limitation. > > Tested on an internal testsuite targeting Neon intrinsics. > > OK? *ping* Thanks, James > --- > 2016-01-25 James Greenhalgh > > * config/aarch64/aarch64.md > (arch64_sqrdmlh_lane): Fix register > constraints for operand 3. > (aarch64_sqrdmlh_laneq): Likewise. > > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index e1f5682..0b46e78 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -3240,7 +3240,7 @@ > [(match_operand:VDQHS 1 "register_operand" "0") > (match_operand:VDQHS 2 "register_operand" "w") > (vec_select: > - (match_operand: 3 "register_operand" "w") > + (match_operand: 3 "register_operand" "") >(parallel [(match_operand:SI 4 "immediate_operand" "i")]))] > SQRDMLH_AS))] > "TARGET_SIMD_RDMA" > @@ -3258,7 +3258,7 @@ > [(match_operand:SD_HSI 1 "register_operand" "0") > (match_operand:SD_HSI 2 "register_operand" "w") > (vec_select: > - (match_operand: 3 "register_operand" "w") > + (match_operand: 3 "register_operand" "") >(parallel [(match_operand:SI 4 "immediate_operand" "i")]))] > SQRDMLH_AS))] > "TARGET_SIMD_RDMA" > @@ -3278,7 +3278,7 @@ > [(match_operand:VDQHS 1 "register_operand" "0") > (match_operand:VDQHS 2 "register_operand" "w") > (vec_select: > - (match_operand: 3 "register_operand" "w") > + (match_operand: 3 "register_operand" "") >(parallel [(match_operand:SI 4 "immediate_operand" "i")]))] > SQRDMLH_AS))] > "TARGET_SIMD_RDMA" > @@ -3296,7 +3296,7 @@ > [(match_operand:SD_HSI 1 "register_operand" "0") > (match_operand:SD_HSI 2 "register_operand" "w") > (vec_select: > - (match_operand: 3 "register_operand" "w") > + (match_operand: 3 "register_operand" "") >(parallel [(match_operand:SI 4 "immediate_operand" "i")]))] > SQRDMLH_AS))] > "TARGET_SIMD_RDMA"
Re: [PATCH] Avoid bugs like PR68273 to trigger
On Mon, 8 Feb 2016, Eric Botcazou wrote: > > Yes, that place I just fixed. I mean for the SRA case. > > Are you sure that there is one? No, but if there is none left why would you want to "fix" SRA? Richard. > expr = build_ref_for_model (loc, agg, access->offset - top_offset, > access, gsi, insert_after); > > if (write) > { > if (access->grp_partial_lhs) > expr = force_gimple_operand_gsi (gsi, expr, true, NULL_TREE, >!insert_after, >insert_after ? GSI_NEW_STMT >: GSI_SAME_STMT); > stmt = gimple_build_assign (repl, expr); > > > (gdb) p debug_tree(repl) > type type size > unit size > align 8 symtab 0 alias set -1 canonical type 0x76cea3f0 > context > pointer_to_this chain 0x76c54a18 D.1410>> > sizes-gimplified unsigned SI > size > unit size > align 32 symtab 0 alias set -1 canonical type 0x76cead20> > used unsigned SI file pr68273.c line 67 col 10 size 0x76c435e8 32> unit size > align 32 context > > > > (gdb) p debug_tree(expr) > type type size > unit size > align 8 symtab 0 alias set -1 canonical type 0x76cea3f0 > context > pointer_to_this chain 0x76c54a18 D.1410>> > sizes-gimplified unsigned SI > size > unit size > align 64 symtab 0 alias set -1 canonical type 0x76cead20> > > arg 0 type Node> > unsigned SI size unit size > > align 32 symtab 0 alias set -1 canonical type 0x76d333f0> > > arg 0 Node> > used BLK file pr68273.c line 67 col 10 > size > unit size > align 64 context arg-type > >> > arg 1 > constant 0>> > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH, PR67709 ] Don't call call_cgraph_insertion_hooks in simd_clone_create
On Mon, Feb 08, 2016 at 01:46:44PM +0100, Tom de Vries wrote: > [ The pass before pass_omp_simd_clone is pass_dispatcher_calls. It has a > function create_target_clone, similar to simd_clone_create, with a > node.defition and !node.defition part. The !node.defition part does not call > 'symtab->call_cgraph_insertion_hooks (new_node)'. ] I'll defer to Honza or Richi if it is ok not to call cgraph insertion hooks at this point (and since when they can be avoided), or what else should be done. The patch could be ok even for 6.0, not just stage1, if they are ok with it (or propose some other change). > Don't call call_cgraph_insertion_hooks in simd_clone_create > > 2016-02-08 Tom de Vries > > PR lto/67709 > * omp-low.c (simd_clone_create): Remove call to > symtab->call_cgraph_insertion_hooks. > > * testsuite/libgomp.fortran/declare-simd-4.f90: New test. Jakub
[PATCH, PR69607] Mark offload symbols as global in lto
Hi, when running libgomp.c testsuite with "-flto -flto-partition=1to1 -fno-toplevel-reorder" we run into many compilation failures like this: ... /tmp/.ltrans0.ltrans.o:(.gnu.offload_funcs+0x1a0): undefined reference to `MAIN__._omp_fn.0'^M ... The problem is that the offload table is in one lto partition, and the function listed in the offload table is in another, without the function having been promoted to be visible in the other partition. The patch fixes this by promoting the symbols in the offload table such that they're visible in all partitions. Bootstrapped and reg-tested on x86_64. Build for nvidia accelerator and reg-tested libgomp with various lto settings. OK for trunk, stage1? Thanks, - Tom Mark offload symbols as global in lto 2016-02-08 Tom de Vries PR lto/69607 * lto-partition.c (promote_offload_tables): New function. * lto-partition.h (promote_offload_tables): Declare. * lto.c (do_whole_program_analysis): call promote_offload_tables. * testsuite/libgomp.c/target-36.c: New test. --- gcc/lto/lto-partition.c | 28 gcc/lto/lto-partition.h | 1 + gcc/lto/lto.c | 2 ++ libgomp/testsuite/libgomp.c/target-36.c | 4 4 files changed, 35 insertions(+) diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c index 9eb63c2..56598d4 100644 --- a/gcc/lto/lto-partition.c +++ b/gcc/lto/lto-partition.c @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see #include "ipa-prop.h" #include "ipa-inline.h" #include "lto-partition.h" +#include "omp-low.h" vec ltrans_partitions; @@ -1003,6 +1004,33 @@ promote_symbol (symtab_node *node) "Promoting as hidden: %s\n", node->name ()); } +/* Promote the symbols in the offload tables. */ + +void +promote_offload_tables (void) +{ + if (vec_safe_is_empty (offload_funcs) && vec_safe_is_empty (offload_vars)) +return; + + for (unsigned i = 0; i < vec_safe_length (offload_funcs); i++) +{ + tree fn_decl = (*offload_funcs)[i]; + cgraph_node *node = cgraph_node::get (fn_decl); + if (node->externally_visible) + continue; + promote_symbol (node); +} + + for (unsigned i = 0; i < vec_safe_length (offload_vars); i++) +{ + tree var_decl = (*offload_vars)[i]; + varpool_node *node = varpool_node::get (var_decl); + if (node->externally_visible) + continue; + promote_symbol (node); +} +} + /* Return true if NODE needs named section even if it won't land in the partition symbol table. FIXME: we should really not use named sections for inline clones and master diff --git a/gcc/lto/lto-partition.h b/gcc/lto/lto-partition.h index 31e3764..1a38126 100644 --- a/gcc/lto/lto-partition.h +++ b/gcc/lto/lto-partition.h @@ -36,6 +36,7 @@ extern vec ltrans_partitions; void lto_1_to_1_map (void); void lto_max_map (void); void lto_balanced_map (int); +extern void promote_offload_tables (void); void lto_promote_cross_file_statics (void); void free_ltrans_partitions (void); void lto_promote_statics_nonwpa (void); diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c index 6718fbbe..ecec1bc 100644 --- a/gcc/lto/lto.c +++ b/gcc/lto/lto.c @@ -3138,6 +3138,8 @@ do_whole_program_analysis (void) to globals with hidden visibility because they are accessed from multiple partitions. */ lto_promote_cross_file_statics (); + /* Promote all the offload symbols. */ + promote_offload_tables (); timevar_pop (TV_WHOPR_PARTITIONING); timevar_stop (TV_PHASE_OPT_GEN); diff --git a/libgomp/testsuite/libgomp.c/target-36.c b/libgomp/testsuite/libgomp.c/target-36.c new file mode 100644 index 000..bafb718 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/target-36.c @@ -0,0 +1,4 @@ +/* { dg-do run { target lto } } */ +/* { dg-additional-options "-flto -flto-partition=1to1 -fno-toplevel-reorder" } */ + +#include "target-1.c"
[Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump
Hi, As far as I can tell, this testcase will only vectorize for x86_64/i?86 targets, so it should be gated to only check for vectorization on those. Additionally, this test wants to scan the vectorizer dumps, so we ought to add -fdump-tree-vect-all to the options. Checked on aarch64 (cross/native) and x86 with no issues. OK? Thanks, James --- 2016-02-08 James Greenhalgh * gcc.dg/vect/vect-mask-store-move-1.c: Add dump option, and gate check on x86_64/i?86. diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c index e575f6d..3ef613d 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c +++ b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3" } */ +/* { dg-options "-O3 -fdump-tree-vect-all" } */ /* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } */ #define N 256 @@ -16,4 +16,4 @@ void foo (int n) } } -/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" } } */ +/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" { target { i?86-*-* x86_64-*-* } } } } */
Re: [PING][PATCH] Mark symbols in offload tables with force_output in read_offload_tables
On 26/01/16 14:01, Ilya Verbin wrote: On Tue, Jan 26, 2016 at 13:21:57 +0100, Tom de Vries wrote: On 25/01/16 14:27, Ilya Verbin wrote: On Tue, Jan 05, 2016 at 15:56:15 +0100, Tom de Vries wrote: diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c index 62e5454..cdaee41 100644 --- a/gcc/lto-cgraph.c +++ b/gcc/lto-cgraph.c @@ -1911,6 +1911,11 @@ input_offload_tables (void) tree fn_decl = lto_file_decl_data_get_fn_decl (file_data, decl_index); vec_safe_push (offload_funcs, fn_decl); + + /* Prevent IPA from removing fn_decl as unreachable, since there +may be no refs from the parent function to child_fn in offload +LTO mode. */ + cgraph_node::get (fn_decl)->mark_force_output (); } else if (tag == LTO_symtab_variable) { @@ -1918,6 +1923,10 @@ input_offload_tables (void) tree var_decl = lto_file_decl_data_get_var_decl (file_data, decl_index); vec_safe_push (offload_vars, var_decl); + + /* Prevent IPA from removing var_decl as unused, since there +may be no refs to var_decl in offload LTO mode. */ + varpool_node::get (var_decl)->force_output = 1; } This doesn't work when there is more than one LTO partition, because only first partition contains full offload table to maintain correct order, but cgraph and varpool nodes aren't necessarily created for the first partition. To reproduce: $ make check-target-libgomp RUNTESTFLAGS="c.exp=for-* --target_board=unix/-flto" FAIL: libgomp.c/for-3.c (internal compiler error) FAIL: libgomp.c/for-5.c (internal compiler error) FAIL: libgomp.c/for-6.c (internal compiler error) $ make check-target-libgomp RUNTESTFLAGS="c++.exp=for-* --target_board=unix/-flto" FAIL: libgomp.c++/for-11.C (internal compiler error) FAIL: libgomp.c++/for-13.C (internal compiler error) FAIL: libgomp.c++/for-14.C (internal compiler error) This works for me. OK for trunk? Thanks, - Tom Check that cgraph/varpool_node exists before use in input_offload_tables 2016-01-26 Tom de Vries * lto-cgraph.c (input_offload_tables): Check that cgraph/varpool_node exists before use. In this case they will be not marked as force_output in other partitions (except the first one). AFAIU, that's not the case. If we're splitting up lto compilation over partitions, it means we're first calling lto1 in WPA mode. We'll read in all offload tables, and mark all symbols with force_output, and when writing out the partitions, we'll write the offload symbols out with force_output set. This updated patch only does the force_output marking for offload symbols in WPA or LTO. It's not necessary in LTRANS mode. Bootstrapped and reg-tested on x86_64. Build for nvidia accelerator and reg-tested libgomp with various lto settings. OK for trunk, stage4? Thanks, - Tom Don't mark offload symbols with force_output in ltrans 2016-02-08 Tom de Vries PR lto/69655 * lto-cgraph.c (input_offload_tables): Add and handle bool parameter do_force_output. * lto-streamer.h (input_offload_tables): Add and handle bool parameter. * lto.c (read_cgraph_and_symbols): Call input_offload_tables with argument. --- gcc/lto-cgraph.c | 8 +--- gcc/lto-streamer.h | 2 +- gcc/lto/lto.c | 2 +- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c index 0634779..95c446d 100644 --- a/gcc/lto-cgraph.c +++ b/gcc/lto-cgraph.c @@ -1885,7 +1885,7 @@ input_symtab (void) target code, and store them into OFFLOAD_FUNCS and OFFLOAD_VARS. */ void -input_offload_tables (void) +input_offload_tables (bool do_force_output) { struct lto_file_decl_data **file_data_vec = lto_get_file_decl_data (); struct lto_file_decl_data *file_data; @@ -1915,7 +1915,8 @@ input_offload_tables (void) /* Prevent IPA from removing fn_decl as unreachable, since there may be no refs from the parent function to child_fn in offload LTO mode. */ - cgraph_node::get (fn_decl)->mark_force_output (); + if (do_force_output) + cgraph_node::get (fn_decl)->mark_force_output (); } else if (tag == LTO_symtab_variable) { @@ -1926,7 +1927,8 @@ input_offload_tables (void) /* Prevent IPA from removing var_decl as unused, since there may be no refs to var_decl in offload LTO mode. */ - varpool_node::get (var_decl)->force_output = 1; + if (do_force_output) + varpool_node::get (var_decl)->force_output = 1; } else fatal_error (input_location, diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h index 0cb200e..f391161 100644 --- a/gcc/lto-streamer.h +++ b/gcc/lto-streamer.h @@ -915,7 +915,7 @@ bool lto_symtab_encoder_encode_initializer_p (lto_symtab_encoder_t, void output_symtab (void); void input_symtab (void); void output_offload_tables (void); -void input_offload_tables (v
Re: [Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump
Hi James, Thanks for reporting this issue. I prepared slightly different patch since we don't need to add tree-vect dump option - it is on by default for all tests in /vect directory. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-mask-store-move-1.c: Gate dump with x86 target. 2016-02-08 16:07 GMT+03:00 James Greenhalgh : > > Hi, > > As far as I can tell, this testcase will only vectorize for x86_64/i?86 > targets, so it should be gated to only check for vectorization on those. > > Additionally, this test wants to scan the vectorizer dumps, so we ought > to add -fdump-tree-vect-all to the options. > > Checked on aarch64 (cross/native) and x86 with no issues. > > OK? > > Thanks, > James > > --- > 2016-02-08 James Greenhalgh > > * gcc.dg/vect/vect-mask-store-move-1.c: Add dump option, and gate > check on x86_64/i?86. > patch-test Description: Binary data
Re: [PATCH, PR69707] Handle -fdiagnostics-color in lto
On 08/02/16 11:42, Jakub Jelinek wrote: On Mon, Feb 08, 2016 at 11:34:39AM +0100, Tom de Vries wrote: Hi, when running libgomp.oacc-c-c++-common/parallel-dims.c with -flto -fno-use-linker-plugin, we run into a failing 'test for excess errors'. The problem is that while -fdiagnostics-color=never is passed to gcc, it's not propagated to lto1, and the error message is annotated with color information, which confuses the test for excess errors. This patch fixes the problem by making sure that -fdiagnostics-color is propagated to lto1, in the same way that -fdiagnostics-show-caret is propagated to lto1. Bootstrapped and reg-tested on x86_64. OK for trunk, stage1? Doesn't that mean diagnostics from the driver itself will no longer honor that option when deciding if the diagnostics should be colorized or not? Hi, hmm, indeed removing the 'Driver' flag from the fdiagnostics-color= entry in common.opt breaks the functioning of fdiagnostics-color= in the gcc driver. This patch leaves the 'Driver' flag alone, and instead explicitly allows fdiagnostics-color= in lto_write_options. Is this approach ok? Thanks, - Tom Handle -fdiagnostics-color in lto 2016-02-08 Tom de Vries PR lto/69707 * lto-opts.c (lto_write_options): Allow fdiagnostics-color. * lto-wrapper.c (merge_and_complain, append_compiler_options): Handle OPT_fdiagnostics_color_. * testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c: New test. --- gcc/lto-opts.c | 21 ++--- gcc/lto-wrapper.c | 2 ++ .../libgomp.oacc-c-c++-common/parallel-dims-2.c | 19 +++ 3 files changed, 39 insertions(+), 3 deletions(-) diff --git a/gcc/lto-opts.c b/gcc/lto-opts.c index 5f4b427..b15b9e8 100644 --- a/gcc/lto-opts.c +++ b/gcc/lto-opts.c @@ -200,9 +200,24 @@ lto_write_options (void) which includes things like -o and -v or -fhelp for example. We do not need those. The only exception is -foffload option, if we write it in offload_lto section. Also drop all diagnostic options. */ - if ((cl_options[option->opt_index].flags & (CL_DRIVER|CL_WARNING)) - && (!lto_stream_offload_p || option->opt_index != OPT_foffload_)) - continue; + if (cl_options[option->opt_index].flags & (CL_DRIVER|CL_WARNING)) + { + bool keep = false; + switch (option->opt_index) + { + case OPT_foffload_: + keep = lto_stream_offload_p; + break; + case OPT_fdiagnostics_color_: + keep = true; + break; + default: + break; + } + + if (!keep) + continue; + } for (j = 0; j < option->canonical_option_num_elements; ++j) append_to_collect_gcc_options (&temporary_obstack, &first_p, diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c index ced6f2f..484dbc1 100644 --- a/gcc/lto-wrapper.c +++ b/gcc/lto-wrapper.c @@ -232,6 +232,7 @@ merge_and_complain (struct cl_decoded_option **decoded_options, break; /* Fallthru. */ + case OPT_fdiagnostics_color_: case OPT_fdiagnostics_show_caret: case OPT_fdiagnostics_show_option: case OPT_fdiagnostics_show_location_: @@ -497,6 +498,7 @@ append_compiler_options (obstack *argv_obstack, struct cl_decoded_option *opts, on any CL_TARGET flag and a few selected others. */ switch (option->opt_index) { + case OPT_fdiagnostics_color_: case OPT_fdiagnostics_show_caret: case OPT_fdiagnostics_show_option: case OPT_fdiagnostics_show_location_: diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c new file mode 100644 index 000..eea8c7e --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c @@ -0,0 +1,19 @@ +/* { dg-do run { target { openacc_nvidia_accel_selected && lto } } } */ +/* { dg-additional-options "-flto -fno-use-linker-plugin" } */ + +/* Worker and vector size checks. Picked an outrageously large + value. */ + +int main () +{ +#pragma acc parallel num_workers (2<<20) /* { dg-error "using num_workers" } */ + { + } + +#pragma acc parallel vector_length (2<<20) /* { dg-error "using vector_length" } */ + { + } + + return 0; +} +
Re: [Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump
On Mon, Feb 08, 2016 at 04:29:31PM +0300, Yuri Rumyantsev wrote: > Hi James, > > Thanks for reporting this issue. > I prepared slightly different patch since we don't need to add > tree-vect dump option - it is on by default for all tests in /vect > directory. Hm, I added that line as my test runs were showing: UNRESOLVED: gcc.dg/vect/vect-mask-store-move-1.c: dump file does not exist I would guess the explicit /* { dg-options "-O3" } */ is clobbering the vect.exp setup of flags? This also affects the x86-64 results H.J. Lu is sending out: https://gcc.gnu.org/ml/gcc-testresults/2016-02/msg00824.html Thanks, James > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-mask-store-move-1.c: Gate dump with x86 target. > > 2016-02-08 16:07 GMT+03:00 James Greenhalgh : > > > > Hi, > > > > As far as I can tell, this testcase will only vectorize for x86_64/i?86 > > targets, so it should be gated to only check for vectorization on those. > > > > Additionally, this test wants to scan the vectorizer dumps, so we ought > > to add -fdump-tree-vect-all to the options. > > > > Checked on aarch64 (cross/native) and x86 with no issues. > > > > OK? > > > > Thanks, > > James > > > > --- > > 2016-02-08 James Greenhalgh > > > > * gcc.dg/vect/vect-mask-store-move-1.c: Add dump option, and gate > > check on x86_64/i?86. > >
Re: [PATCH, PR69707] Handle -fdiagnostics-color in lto
On Mon, Feb 08, 2016 at 02:38:17PM +0100, Tom de Vries wrote: > hmm, indeed removing the 'Driver' flag from the fdiagnostics-color= entry in > common.opt breaks the functioning of fdiagnostics-color= in the gcc driver. > > This patch leaves the 'Driver' flag alone, and instead explicitly allows > fdiagnostics-color= in lto_write_options. > > Is this approach ok? Doesn't that mean storing -fdiagnostics-color= into the LTO option section and then using whatever was recorded from the first TU that recorded anything? That doesn't look right for such diagnostics options either. I mean, if I $ gcc -fdiagnostics-color=always -c -flto a.c on another terminal $ gcc -fdiagnostics-color=never -c -flto b.c on yet another terminal $ gcc -flto -o a a.o b.o then I'd expect the default setting for -fdiagnostics-color= for all diagnostics while linking/LTO optimizing it, and for say $ gcc -fdiagnostics-color=always -flto -o a a.o b.o to have it always colorized, similarly for never. IMHO we should just honor what has been specified on the linker command line if anything. So, the question is, is -fdiagnostics-color=never passed in the testsuite just to the compilation and not to linking (that would be IMHO a testsuite bug), or is it passed on the linking command line, but not passed through to lto1? > Handle -fdiagnostics-color in lto > > 2016-02-08 Tom de Vries > > PR lto/69707 > * lto-opts.c (lto_write_options): Allow fdiagnostics-color. > * lto-wrapper.c (merge_and_complain, append_compiler_options): Handle > OPT_fdiagnostics_color_. > > * testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c: New test. Jakub
Re: libgcc: On AIX, increase chances to find landing pads for exceptions
Runtime linking is disabled by default on AIX, and I disabled it for libstdc++. There are two remaining issues: 1) FDEs with overlapping ranges causing problems with exceptions. I'm not sure of the best way to work around this. Your patch is one possible solution. 2) AIX linker garbage collection conflicting with scanning for symbols. collect2 scanning needs to better emulate SVR4 linker semantics for object files and archives. Thanks, David On Mon, Feb 8, 2016 at 7:14 AM, Michael Haubenwallner wrote: > Hi David, > > still experiencing exception-not-caught problems with gcc-4.2.4 on AIX > leads me to some patch proposed in http://gcc.gnu.org/PR13878 back in > 2004 already, ought to be fixed by some different commit since 3.4.0. > > As long as build systems (even libtool right now) on AIX do export these > _GLOBAL__* symbols from shared libraries, overlapping frame-base address > ranges may become registered, even if newer gcc (seen with 4.8) does name > the FDE symbols more complex to reduce these chances. > > But still, just think of linking some static library into multiple shared > libraries and/or the main executable. Or sometimes there is just need for > some hackery to override a shared object's implementation detail and rely > on runtime linking to do the override at runtime. > > Agreed both is "wrong" to some degree, but the larger an application is, > the higher is the chance for this to happen. > > Thoughts? > > Thanks! > /haubi/
Re: [PATCH, PR69607] Mark offload symbols as global in lto
On Mon, Feb 08, 2016 at 14:00:00 +0100, Tom de Vries wrote: > when running libgomp.c testsuite with "-flto -flto-partition=1to1 > -fno-toplevel-reorder" we run into many compilation failures like this: > ... > /tmp/.ltrans0.ltrans.o:(.gnu.offload_funcs+0x1a0): undefined > reference to `MAIN__._omp_fn.0'^M > ... > > The problem is that the offload table is in one lto partition, and the > function listed in the offload table is in another, without the function > having been promoted to be visible in the other partition. > > The patch fixes this by promoting the symbols in the offload table such that > they're visible in all partitions. > > Bootstrapped and reg-tested on x86_64. > > Build for nvidia accelerator and reg-tested libgomp with various lto > settings. Works fine with intelmic offloading. -- Ilya
Re: [PATCH][P1 tree-optimization/68541] Add heuristics to path splitting
Hi Jeff, On 05/02/16 23:49, Jeff Law wrote: This patch addresses multiple issues with how we determine when to split paths. The primary motivation is addressing 68541 (P1). As I've gotten testcodes from Ajit, I've been able to look closely at the path splitting opportunities and consistently what I've seen is that contrary to the original belief, CSE/DCE opportunities are rarely exposed by path splitting. It seems the benefit is more due to removing the unconditional jump inherent in a CFG diamond -- even more so on the microblaze where these jumps have delay slots. While there are cases where splitting allows for CSE/DCE, they're the exception rather than the rule in the codes I've looked at. With that in mind, we want to encourage path splitting when the amount of code we're duplicating is small. We also want to balance that with not path splitting when the original code is something that may be if-convertable. The vast majority of if-convertable codes are cases where the two predecessors of the join block are single statement blocks with their results feeding the same PHI in the join block. We now reject that case for path splitting so as not to spoil if-conversion. The only wrinkle with that heuristic is that some codes (MIN, MAX, ABS, COND, calls, etc) are not good candidates for further if conversion. (ie, we have a MIN_EXPR in both predecessors that feed the PHI in the join). So we still allow those cases for splitting. This can be easily refined as we find more cases or as the if-converter is extended. So with that in place as the first filter, we just need to put a limiter on the number of statements we allow to be copied. I punted a bit and re-used the PARAM for jump threading. They've got enough in common that I felt it was reasonable. If I were to create a new PARAM, I'd probably start with a smaller default value after doing further instrumentation. While I was working through this I noticed we were path splitting in some cases where we shouldn't. Particularly if we had a CFG like: a / \ b c / \ / e d / \ / \ / \ latch exit Duplicating d for the edges b->d and c->d isn't as helpful because the duplicate can't be merged into b. We now detect this kind of case and reject it for path splitting. The new tests are a combination of reductions of codes from Ajit and my own poking around. Looking further out, I really wonder if the low level aspects of path splitting like we're trying to capture here really belong in the cross-jumping phase of the RTL optimizers. The higher level aspects (when we are able to expose CSE/DCE opportunities) should be driven by actually looking at the propagation/simplifications enabled by a potential split path. But those are gcc-7 things. Bootstrapped and regression tested on x86 linux. Installed on the trunk. I'll obviously keep an eye out for how the tests are handled on other platforms -- I don't think the tests are real sensitive to branch costs and such, but I've been surprised before. Onward to the jump threading code explosion wrap-up... jeff After this patch I also see: FAIL: gcc.dg/tree-ssa/split-path-1.c scan-tree-dump split-paths "Duplicating join block" on arm, but not on aarch64. My arm-none-eabi cross compiler is configured with: --with-float=hard --with-cpu=cortex-a9 --with-fpu=neon --with-mode=thumb Kyrill
Re: [Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump
Sorry for troubles. One line must be excluded from test: -/* { dg-options "-O3" } */ Here is updated patch. Best regards. Yuri. 2016-02-08 16:40 GMT+03:00 James Greenhalgh : > On Mon, Feb 08, 2016 at 04:29:31PM +0300, Yuri Rumyantsev wrote: >> Hi James, >> >> Thanks for reporting this issue. >> I prepared slightly different patch since we don't need to add >> tree-vect dump option - it is on by default for all tests in /vect >> directory. > > Hm, I added that line as my test runs were showing: > > UNRESOLVED: gcc.dg/vect/vect-mask-store-move-1.c: dump file does not exist > > I would guess the explicit > > /* { dg-options "-O3" } */ > > is clobbering the vect.exp setup of flags? > > This also affects the x86-64 results H.J. Lu is sending out: > > https://gcc.gnu.org/ml/gcc-testresults/2016-02/msg00824.html > > Thanks, > James > >> >> gcc/testsuite/ChangeLog: >> >> * gcc.dg/vect/vect-mask-store-move-1.c: Gate dump with x86 target. >> >> 2016-02-08 16:07 GMT+03:00 James Greenhalgh : >> > >> > Hi, >> > >> > As far as I can tell, this testcase will only vectorize for x86_64/i?86 >> > targets, so it should be gated to only check for vectorization on those. >> > >> > Additionally, this test wants to scan the vectorizer dumps, so we ought >> > to add -fdump-tree-vect-all to the options. >> > >> > Checked on aarch64 (cross/native) and x86 with no issues. >> > >> > OK? >> > >> > Thanks, >> > James >> > >> > --- >> > 2016-02-08 James Greenhalgh >> > >> > * gcc.dg/vect/vect-mask-store-move-1.c: Add dump option, and gate >> > check on x86_64/i?86. >> > > > patch-test Description: Binary data
Re: [Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump
On Mon, Feb 8, 2016 at 2:40 PM, James Greenhalgh wrote: > On Mon, Feb 08, 2016 at 04:29:31PM +0300, Yuri Rumyantsev wrote: >> Hi James, >> >> Thanks for reporting this issue. >> I prepared slightly different patch since we don't need to add >> tree-vect dump option - it is on by default for all tests in /vect >> directory. > > Hm, I added that line as my test runs were showing: > > UNRESOLVED: gcc.dg/vect/vect-mask-store-move-1.c: dump file does not exist > > I would guess the explicit > > /* { dg-options "-O3" } */ > > is clobbering the vect.exp setup of flags? Yes. Use { dg-additional-options "-O3" } instead. > This also affects the x86-64 results H.J. Lu is sending out: > > https://gcc.gnu.org/ml/gcc-testresults/2016-02/msg00824.html > > Thanks, > James > >> >> gcc/testsuite/ChangeLog: >> >> * gcc.dg/vect/vect-mask-store-move-1.c: Gate dump with x86 target. >> >> 2016-02-08 16:07 GMT+03:00 James Greenhalgh : >> > >> > Hi, >> > >> > As far as I can tell, this testcase will only vectorize for x86_64/i?86 >> > targets, so it should be gated to only check for vectorization on those. >> > >> > Additionally, this test wants to scan the vectorizer dumps, so we ought >> > to add -fdump-tree-vect-all to the options. >> > >> > Checked on aarch64 (cross/native) and x86 with no issues. >> > >> > OK? >> > >> > Thanks, >> > James >> > >> > --- >> > 2016-02-08 James Greenhalgh >> > >> > * gcc.dg/vect/vect-mask-store-move-1.c: Add dump option, and gate >> > check on x86_64/i?86. >> > > >
Re: Fix PR67639
On 12/21/2015 08:39 PM, Jeff Law wrote: On 12/18/2015 11:38 AM, Bernd Schmidt wrote: In an earlier fix, the following change was made in varasm.c for invalid register variables: --- trunk/gcc/varasm.c2014/08/26 14:59:59214525 +++ trunk/gcc/varasm.c2014/08/26 17:06:31214526 @@ -1371,6 +1371,11 @@ make_decl_rtl (tree decl) /* As a register variable, it has no section. */ return; } + /* Avoid internal errors from invalid register + specifications. */ + SET_DECL_ASSEMBLER_NAME (decl, NULL_TREE); + DECL_HARD_REGISTER (decl) = 0; + return; } As seen in PR67639, this makes the IL inconsistent and triggers another internal error where we expect to see an SSA_NAME instead of a VAR_DECL. The following patch extends the above slightly, by also setting DECL_EXTERNAL to pretend that the erroneous variable is actually a global. Bootstrapped and tested on x86_64-linux, ok? OK. Turns out 65702 is a dup and this should go into gcc-5 as well. Ok to backport? Bernd
[PATCH] Fix PR69719
The following fixes a latent bug in vect_prune_runtime_alias_test_list, not considering negative offset1 - offset2. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk sofar. Richard. 2016-02-08 Richard Biener PR tree-optimization/69719 * tree-vect-data-refs.c (vect_prune_runtime_alias_test_list): Properly use absolute of the difference of the two offsets to compare or adjust the segment length. * gcc.dg/torture/pr69719.c: New testcase. Index: gcc/tree-vect-data-refs.c === *** gcc/tree-vect-data-refs.c (revision 233211) --- gcc/tree-vect-data-refs.c (working copy) *** vect_prune_runtime_alias_test_list (loop *** 2905,2912 || !tree_fits_shwi_p (dr_a2->offset)) continue; ! HOST_WIDE_INT diff = (tree_to_shwi (dr_a2->offset) ! - tree_to_shwi (dr_a1->offset)); /* Now we check if the following condition is satisfied: --- 2905,2913 || !tree_fits_shwi_p (dr_a2->offset)) continue; ! unsigned HOST_WIDE_INT diff ! = absu_hwi (tree_to_shwi (dr_a2->offset) ! - tree_to_shwi (dr_a1->offset)); /* Now we check if the following condition is satisfied: *** vect_prune_runtime_alias_test_list (loop *** 2925,2937 */ ! HOST_WIDE_INT min_seg_len_b = (tree_fits_shwi_p (dr_b1->seg_len) ! ? tree_to_shwi (dr_b1->seg_len) ! : vect_factor); if (diff <= min_seg_len_b ! || (tree_fits_shwi_p (dr_a1->seg_len) ! && diff - tree_to_shwi (dr_a1->seg_len) < min_seg_len_b)) { if (dump_enabled_p ()) { --- 2926,2939 */ ! unsigned HOST_WIDE_INT min_seg_len_b ! = (tree_fits_uhwi_p (dr_b1->seg_len) ! ? tree_to_uhwi (dr_b1->seg_len) ! : vect_factor); if (diff <= min_seg_len_b ! || (tree_fits_uhwi_p (dr_a1->seg_len) ! && diff - tree_to_uhwi (dr_a1->seg_len) < min_seg_len_b)) { if (dump_enabled_p ()) { Index: gcc/testsuite/gcc.dg/torture/pr69719.c === *** gcc/testsuite/gcc.dg/torture/pr69719.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr69719.c (working copy) *** *** 0 --- 1,24 + /* { dg-do run } */ + + int b, c = 1, e, f; + int a[6][5] = { {0, 0, 0, 0, 0}, {0, 0, 0, 0, 0}, {0, 1, 0, 0, 0} }; + + void __attribute__((noinline)) + fn1 () + { + int d; + for (b = 0; b < 5; b++) + for (d = 4; d; d--) + a[c + 1][b] = a[d + 1][d]; + } + + int + main () + { + fn1 (); + + if (a[2][1] != 0) + __builtin_abort (); + + return 0; + }
[PATCH] Re: PR c++/69139
The following fixes up the handling of trailing returns with cv/ref specifiers mentioned by TC in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69139#c3. I've added handling of exception and transaction specs too. --- gcc/cp/parser.c | 12 ++-- gcc/testsuite/g++.dg/cpp0x/trailing12.C | 10 ++ 2 files changed, 20 insertions(+), 2 deletions(-) diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index c1a9674..f51fac4 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -16055,8 +16055,16 @@ cp_parser_simple_type_specifier (cp_parser* parser, /*recovering*/false, /*or_comma*/false, /*consume_paren*/true))) - have_trailing_return_fn_decl - = cp_lexer_next_token_is (parser->lexer, CPP_DEREF); + { + /* Consume any cv-qualifier-seq, ref-qualifier, +tx-qualifier and/or exception specifier. */ + cp_parser_cv_qualifier_seq_opt (parser); + cp_parser_ref_qualifier_opt (parser); + cp_parser_tx_qualifier_opt (parser); + cp_parser_exception_specification_opt (parser); + have_trailing_return_fn_decl + = cp_lexer_next_token_is (parser->lexer, CPP_DEREF); + } } cp_parser_abort_tentative_parse (parser); } diff --git a/gcc/testsuite/g++.dg/cpp0x/trailing12.C b/gcc/testsuite/g++.dg/cpp0x/trailing12.C index f3e02a8..707b753 100644 --- a/gcc/testsuite/g++.dg/cpp0x/trailing12.C +++ b/gcc/testsuite/g++.dg/cpp0x/trailing12.C @@ -4,3 +4,13 @@ auto get(int) -> int { return {}; } template int f(auto (*)(int) -> R) { return {}; } int i = f(get); + +struct X { + auto get(int) const & throw() -> int { return {}; } + auto get(int) && -> long { return {}; } +}; + +template auto f(auto (X::*)(int) const & -> R) -> R {} + +using I = decltype(f(&X::get)); +using I = int; -- 2.7.0
Re: [PING][PATCH] Mark symbols in offload tables with force_output in read_offload_tables
On Mon, Feb 08, 2016 at 14:20:11 +0100, Tom de Vries wrote: > On 26/01/16 14:01, Ilya Verbin wrote: > >On Tue, Jan 26, 2016 at 13:21:57 +0100, Tom de Vries wrote: > >>On 25/01/16 14:27, Ilya Verbin wrote: > >>>On Tue, Jan 05, 2016 at 15:56:15 +0100, Tom de Vries wrote: > >diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c > >index 62e5454..cdaee41 100644 > >--- a/gcc/lto-cgraph.c > >+++ b/gcc/lto-cgraph.c > >@@ -1911,6 +1911,11 @@ input_offload_tables (void) > > tree fn_decl > > = lto_file_decl_data_get_fn_decl (file_data, > > decl_index); > > vec_safe_push (offload_funcs, fn_decl); > >+ > >+ /* Prevent IPA from removing fn_decl as unreachable, > >since there > >+ may be no refs from the parent function to child_fn in > >offload > >+ LTO mode. */ > >+ cgraph_node::get (fn_decl)->mark_force_output (); > > } > > else if (tag == LTO_symtab_variable) > > { > >@@ -1918,6 +1923,10 @@ input_offload_tables (void) > > tree var_decl > > = lto_file_decl_data_get_var_decl (file_data, > > decl_index); > > vec_safe_push (offload_vars, var_decl); > >+ > >+ /* Prevent IPA from removing var_decl as unused, since > >there > >+ may be no refs to var_decl in offload LTO mode. */ > >+ varpool_node::get (var_decl)->force_output = 1; > > } > >>> > >>>This doesn't work when there is more than one LTO partition, because only > >>>first > >>>partition contains full offload table to maintain correct order, but > >>>cgraph and > >>>varpool nodes aren't necessarily created for the first partition. To > >>>reproduce: > >>> > >>>$ make check-target-libgomp RUNTESTFLAGS="c.exp=for-* > >>>--target_board=unix/-flto" > >>>FAIL: libgomp.c/for-3.c (internal compiler error) > >>>FAIL: libgomp.c/for-5.c (internal compiler error) > >>>FAIL: libgomp.c/for-6.c (internal compiler error) > >>>$ make check-target-libgomp RUNTESTFLAGS="c++.exp=for-* > >>>--target_board=unix/-flto" > >>>FAIL: libgomp.c++/for-11.C (internal compiler error) > >>>FAIL: libgomp.c++/for-13.C (internal compiler error) > >>>FAIL: libgomp.c++/for-14.C (internal compiler error) > >> > >>This works for me. > >> > >>OK for trunk? > >> > >>Thanks, > >>- Tom > >> > > > >>Check that cgraph/varpool_node exists before use in input_offload_tables > >> > >>2016-01-26 Tom de Vries > >> > >>* lto-cgraph.c (input_offload_tables): Check that cgraph/varpool_node > >>exists before use. > > > >In this case they will be not marked as force_output in other partitions > >(except > >the first one). > > AFAIU, that's not the case. > > If we're splitting up lto compilation over partitions, it means we're first > calling lto1 in WPA mode. We'll read in all offload tables, and mark all > symbols with force_output, and when writing out the partitions, we'll write > the offload symbols out with force_output set. > > This updated patch only does the force_output marking for offload symbols in > WPA or LTO. It's not necessary in LTRANS mode. You're right, works for me. -- Ilya
Re: C++ PATCH for c++/69688 (bogus error with -Wsign-compare)
On Fri, Feb 05, 2016 at 05:36:21PM -0500, Jason Merrill wrote: > On 02/05/2016 05:32 PM, Marek Polacek wrote: > > if (TREE_CODE (type) == ERROR_MARK) > > return NULL_TREE; > > > >+ /* Here, DECL may change value; purge caches. */ > >+ clear_fold_cache (); > >+ clear_cv_cache (); > >+ > > if (MAYBE_CLASS_TYPE_P (type)) > > This should happen after computing the value to be stored, not before. Also, > could you combine those two functions into one? There's no reason for > callers such as this to need to call two different functions. Okay. So like this? Bootstrapped/regtested on x86_64-linux, ok for trunk? 2016-02-08 Marek Polacek PR c++/69688 * constexpr.c (clear_cv_and_fold_caches): Renamed from clear_cv_cache. Call clear_fold_cache. * cp-tree.h: Adjust declaration. * decl.c (finish_enum_value_list): Call clear_cv_and_fold_caches rather than clear_cv_cache and clear_fold_cache. * typeck2.c (store_init_value): Call clear_cv_and_fold_caches. * g++.dg/init/const12.C: New test. diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c index 05f6843..85fc64e 100644 --- gcc/cp/constexpr.c +++ gcc/cp/constexpr.c @@ -4098,12 +4098,13 @@ maybe_constant_value (tree t, tree decl) return ret; } -/* Dispose of the whole CV_CACHE. */ +/* Dispose of the whole CV_CACHE and FOLD_CACHE. */ void -clear_cv_cache (void) +clear_cv_and_fold_caches (void) { gt_cleare_cache (cv_cache); + clear_fold_cache (); } /* Like maybe_constant_value but first fully instantiate the argument. diff --git gcc/cp/cp-tree.h gcc/cp/cp-tree.h index 0aeee57..a67e9b6 100644 --- gcc/cp/cp-tree.h +++ gcc/cp/cp-tree.h @@ -6920,7 +6920,7 @@ extern bool var_in_constexpr_fn (tree); extern void explain_invalid_constexpr_fn(tree); extern vec cx_error_context (void); extern tree fold_sizeof_expr (tree); -extern void clear_cv_cache (void); +extern void clear_cv_and_fold_caches (void); /* In c-family/cilk.c */ extern bool cilk_valid_spawn(tree); diff --git gcc/cp/decl.c gcc/cp/decl.c index 2c337bc..11f7ce6 100644 --- gcc/cp/decl.c +++ gcc/cp/decl.c @@ -13414,8 +13414,7 @@ finish_enum_value_list (tree enumtype) /* Each enumerator now has the type of its enumeration. Clear the cache so that this change in types doesn't confuse us later on. */ - clear_cv_cache (); - clear_fold_cache (); + clear_cv_and_fold_caches (); } /* Finishes the enum type. This is called only the first time an diff --git gcc/cp/typeck2.c gcc/cp/typeck2.c index 419faa2..2a76c96 100644 --- gcc/cp/typeck2.c +++ gcc/cp/typeck2.c @@ -837,6 +837,9 @@ store_init_value (tree decl, tree init, vec** cleanups, int flags) /* Handle aggregate NSDMI in non-constant initializers, too. */ value = replace_placeholders (value, decl); + /* DECL may change value; purge caches. */ + clear_cv_and_fold_caches (); + /* If the initializer is not a constant, fill in DECL_INITIAL with the bits that are constant, and then return an expression that will perform the dynamic initialization. */ diff --git gcc/testsuite/g++.dg/init/const12.C gcc/testsuite/g++.dg/init/const12.C index e69de29..2f6f9b2 100644 --- gcc/testsuite/g++.dg/init/const12.C +++ gcc/testsuite/g++.dg/init/const12.C @@ -0,0 +1,20 @@ +// PR c++/69688 +// { dg-do compile } +// { dg-options "-Wsign-compare" } + +struct S +{ + static const int s; + static const char c[]; + static wchar_t w[]; + + S () +{ + for (int i = 0; i < s; i++) + w[i] = 0; +} +}; + +const char S::c[] = "x"; +const int S::s = sizeof (S::c) - 1; +wchar_t S::w[S::s]; Marek
Re: C++ PATCH for c++/69688 (bogus error with -Wsign-compare)
OK, thanks. Jason
Re: [PATCH] PR c++/69139
On 02/06/2016 05:25 PM, Adam Butcher wrote: + if (cp_lexer_next_token_is (parser->lexer, CPP_DEREF)) + have_trailing_return_fn_decl = true; + else if ((cp_lexer_consume_token (parser->lexer)->type + == CPP_OPEN_PAREN) + && (cp_parser_skip_to_closing_parenthesis + (parser, + /*recovering*/false, + /*or_comma*/false, + /*consume_paren*/true))) + have_trailing_return_fn_decl + = cp_lexer_next_token_is (parser->lexer, CPP_DEREF); Let's reorganize this so you only check CPP_DEREF in one place; this is more important now that you've added more processing to only one of them in your followup patch. Jason
Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA
On 02/08/2016 04:09 AM, Richard Biener wrote: With all of the above I'm not sure what to do for GCC 6 (even though you just approved the patch). Going with the patch alternative (just revert swapping parts of the commutative operands) looks like completely bogus though it works for fixing the regression as well. Yes, it is bogus. But I don't see other easy way to close P1 PR except making it P2. I still have plans to rewrite ira-costs.c (may be for GCC7). The algorithm is actually adaptation of old regclass.c one. There are few complaints about wrong costs calculations because of this (the most recent complaint was Peter Bergner's one about a power8 test). I don't like the algorithm, it ignores the fact that insn operands should be from the same alternative during pseudo (allocno) cost calculations. So this change probably will not survive after rewriting ira-costs.c.
Re: [PATCH] Fix libstdc++-v3/include/math.h:66:1 2: error: 'constexpr bool std::isnan(double)' conflicts with a previous declaration
On 06/02/16 15:31 +, Jonathan Wakely wrote: On 6 February 2016 at 12:51, Gerald Pfeifer wrote: On Fri, 5 Feb 2016, Jonathan Wakely wrote: Can anyone else test this on an older FreeBSD or other target that isn't gnu/aix/hpux? Thank you, Jonathan! I did not have such an older environment available, but now could install the infrastructure and get all prerequisites in place for FreeBSD 9.3/AMD64 testing. That allowed to both reproduce the original failure and confirm that it does not occur any more with your proposed patch. OK, thanks. I'll commit it on Monday (with a slight tweak). This enables the isinf/isnan checks everywhere and also adds #define _GLIBCXX_INCLUDE_NEXT_C_HEADERS to the configure test, to ensure that it finds the libc math.h and not an already installed libstdc++ wrapper (which would confuse the test). I also accidentally committed a change to add 'constexpr' to the configure test, which I reverted. Tested powerp64-linux, committed to trunk. commit b0a7399294012b65df3d155da0182e017d6e4214 Author: redi Date: Mon Feb 8 15:22:32 2016 + Enable isinf/isnan checks for all targets PR libstdc++/48891 * acinclude.m4 (GLIBCXX_CHECK_MATH11_PROTO): Enable isinf and isnan checks for all targets except *-*-solaris2.* and ensure we find the libc math.h header not our own. * configure: Regenerate. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233214 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4 index 057b58e..e667ccc 100644 --- a/libstdc++-v3/acinclude.m4 +++ b/libstdc++-v3/acinclude.m4 @@ -2215,7 +2215,7 @@ AC_DEFUN([GLIBCXX_CHECK_MATH11_PROTO], [ fi AC_MSG_RESULT([$glibcxx_cv_math11_overload]) ;; -*-*-*gnu* | *-*-aix* | *-*-hpux*) +*) # If defines the obsolete isinf(double) and isnan(double) # functions (instead of or as well as the C99 generic macros) then we # can't define std::isinf(double) and std::isnan(double) in @@ -2223,12 +2223,13 @@ AC_DEFUN([GLIBCXX_CHECK_MATH11_PROTO], [ AC_MSG_CHECKING([for obsolete isinf function in ]) AC_CACHE_VAL(glibcxx_cv_obsolete_isinf, [ AC_COMPILE_IFELSE([AC_LANG_SOURCE( -[#include +[#define _GLIBCXX_INCLUDE_NEXT_C_HEADERS + #include #undef isinf namespace std { using ::isinf; - bool isinf(float); - bool isinf(long double); + constexpr bool isinf(float); + constexpr bool isinf(long double); } using std::isinf; bool b = isinf(0.0); commit 6c1b4080c3cd651e3559bbbf155d1f09236c68ee Author: redi Date: Mon Feb 8 15:37:59 2016 + Remove accidentally added 'constexpr' in previous commit * acinclude.m4 (GLIBCXX_CHECK_MATH11_PROTO): Remove accidentally added 'constexpr' in previous commit. * configure: Regenerate. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233219 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4 index e667ccc..95df24a 100644 --- a/libstdc++-v3/acinclude.m4 +++ b/libstdc++-v3/acinclude.m4 @@ -2228,8 +2228,8 @@ AC_DEFUN([GLIBCXX_CHECK_MATH11_PROTO], [ #undef isinf namespace std { using ::isinf; - constexpr bool isinf(float); - constexpr bool isinf(long double); + bool isinf(float); + bool isinf(long double); } using std::isinf; bool b = isinf(0.0);
Re: Fix PR67639
On 02/08/2016 07:26 AM, Bernd Schmidt wrote: On 12/21/2015 08:39 PM, Jeff Law wrote: On 12/18/2015 11:38 AM, Bernd Schmidt wrote: In an earlier fix, the following change was made in varasm.c for invalid register variables: --- trunk/gcc/varasm.c2014/08/26 14:59:59214525 +++ trunk/gcc/varasm.c2014/08/26 17:06:31214526 @@ -1371,6 +1371,11 @@ make_decl_rtl (tree decl) /* As a register variable, it has no section. */ return; } + /* Avoid internal errors from invalid register + specifications. */ + SET_DECL_ASSEMBLER_NAME (decl, NULL_TREE); + DECL_HARD_REGISTER (decl) = 0; + return; } As seen in PR67639, this makes the IL inconsistent and triggers another internal error where we expect to see an SSA_NAME instead of a VAR_DECL. The following patch extends the above slightly, by also setting DECL_EXTERNAL to pretend that the erroneous variable is actually a global. Bootstrapped and tested on x86_64-linux, ok? OK. Turns out 65702 is a dup and this should go into gcc-5 as well. Ok to backport? Yes. Thanks, jeff
Re: [Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump
On 02/08/2016 06:07 AM, James Greenhalgh wrote: Hi, As far as I can tell, this testcase will only vectorize for x86_64/i?86 targets, so it should be gated to only check for vectorization on those. Additionally, this test wants to scan the vectorizer dumps, so we ought to add -fdump-tree-vect-all to the options. Checked on aarch64 (cross/native) and x86 with no issues. OK? Thanks, James --- 2016-02-08 James Greenhalgh * gcc.dg/vect/vect-mask-store-move-1.c: Add dump option, and gate check on x86_64/i?86. OK. Thanks, Jeff
Re: [PATCH] PR preprocessor/69664: fix rich_location::override_column
On 02/04/2016 03:51 PM, David Malcolm wrote: In gcc 5 and earlier, struct diagnostic_info had a field: unsigned int override_column; which could be set by the macro: diagnostic_override_column This was only used by the frontends' callbacks for handling errors from libcpp: c_cpp_error for the c-family, and cb_cpp_error for Fortran: if (column_override) diagnostic_override_column (&diagnostic, column_override); based on a column_override value passed by libcpp to the callback. I removed the field when introducing rich_location (in r229884), instead adding a rich_location::override_column method, called from libcpp immediately before calling the frontend's error-handling callback (the callback takes a rich_location *). I got the implementation of rich_location::override_column wrong (sorry), and PR preprocessor/69664 is a symptom of this. Specifically, I was only overriding the column within m_expanded_location, which affects some parts of diagnostic-show-locus.c, but I was not overriding the column within the various expanded_location instances within the rich_location's m_ranges[0]. Hence the wrong column information is printed for diagnostics emitted by libcpp where the column is overridden. This happens for any of the "_with_line" diagnostics calls in libcpp that are passed a non-zero column override. I believe these are all confined to libcpp/lex.c. The attached patch fixes this, by ensuring that all relevant columns are updated by rich_location::override_column, and also adding the missing conditional to only call it for non-zero column_override values (this time in libcpp before calling the error-handling callback, rather than from within the error-handling callbacks). It also adds expected column numbers to some pre-existing tests, giving us test coverage for this. Successfully bootstrapped®rtested on x86_64-pc-linux-gnu; OK for trunk? gcc/testsuite/ChangeLog: PR preprocessor/69664 * gcc.dg/cpp/trad/comment-2.c: Add expected column number. * gcc.dg/cpp/warn-comments.c: Likewise. libcpp/ChangeLog: PR preprocessor/69664 * errors.c (cpp_diagnostic_with_line): Only call rich_location::override_column if the column is non-zero. * line-map.c (rich_location::override_column): Update columns within m_ranges[0]. Add assertions to verify that doing so is sane. OK Thanks, Jeff
Re: [RFC] Variants of __typeof
On 02/03/2016 11:01 PM, Richard Henderson wrote: While attempting to write some code that uses the new x86 named address space support in gcc 6, I found that __typeof is very unhelpful. In particular, given int __seg_fs *ptr; __typeof(*ptr) obj; OBJ will not be type "int", but "int __seg_fs". Which means that you can't use it to create temporaries within statement expressions. ISTM __typeof is doing the obvious thing here -- what's wrong is the __seg_fs is inherently "associated" with the pointer, right? In the process of writing this, I found a hack in __typeof added just to support _Atomic. Which suggests that one of these variants would be more generally helpful than the hack. Yea, seems that way. I add __typeof_noas and __typeof_noqual. The first strips only the address space, leaving 'const' and 'volatile' (and, I suppose 'restrict'). The second strips all qualifiers, essentially yielding the TYPE_MAIN_VARIANT. Something like this seems reasonable. jeff
Re: Combine simplify_set WORD_REGISTER_OPERATIONS
On 01/31/2016 03:16 PM, Alan Modra wrote: The comment says this test is supposed to prevent "a narrower operation than requested", but it actually only allows a larger subreg, not one the same size. Fix that. Bootstrapped and regression tested powerpc64-linux. OK for stage1? Note that this bug was found when investigating why gcc-6 does not suffer from pr69548, ie. this bug was masking a powerpc backend bug. * combine.c (simplify_set): Correct WORD_REGISTER_OPERATIONS test. Is there a strong need to apply this to gcc6? Can we construct a testcase where this makes a difference in the code we generate? My inclination would be to approve for gcc-7 as-is, but I'm more hesitant for gcc-6. jeff
Re: Fix c/69522, memory management issue in c-parser
On 01/29/2016 04:40 AM, Bernd Schmidt wrote: Let's say we have struct a { int x[1]; int y[1]; } x = { 0, { 0 } }; ^ When we reach the marked brace, we call into push_init_level, where we notice that we have implicit initializers (for x[]) lying around that we should deal with now that we've seen another open brace. The problem is that we've created a new obstack for the initializer of y, and this is where we also put data for the inits of x, freeing it when we see the close brace for the initialization of y. In the actual testcase, which is a little more complex to actually demonstrate the issue, we end up allocating two init elts at the same address (because of premature freeing) and place them in the same tree, which ends up containing a cycle because of this. Then we hang. Fixed by this patch, which splits off a new function finish_implicit_inits from push_init_level and ensures it's called with the outer obstack instead of the new one in the problematic case. Bootstrapped and tested on x86_64-linux, ok? Bernd braced-init.diff c/ PR c/69522 * c-parser.c (c_parser_braced_init): New arg outer_obstack. All callers changed. If nested_p is true, use it to call finish_implicit_inits. * c-tree.h (finish_implicit_inits): Declare. * c-typeck.c (finish_implicit_inits): New function. Move code from ... (push_init_level): ... here. (set_designator, process_init_element): Call finish_implicit_inits. testsuite/ PR c/69522 gcc.dg/pr69522.c: New test. OK. Thanks for tracking this down. Thanks, Jeff
Re: [PATCH] Avoid bugs like PR68273 to trigger
> No, but if there is none left why would you want to "fix" SRA? Because I'm afraid this over-aligned type might leak into other places so we would probably be better off not creating it in the first place, all the more so that it is probably useless in most cases. For PR tree-opt/65310, why couldn't the vectorizer use the alignment of the pointed-to type of the addend of the MEM_REF, like for the alias set: # .MEM_8 = VDEF <.MEM_1(D)> # lhs access alignment 128+0 MEM[(struct LorentzVector *)res_7(D)] = SR.22_44; # .MEM_53 = VDEF <.MEM_8> # lhs access alignment 32+0 MEM[(struct LorentzVector *)res_7(D) + 4B] = SR.23_43; # .MEM_54 = VDEF <.MEM_53> # lhs access alignment 64+0 MEM[(struct LorentzVector *)res_7(D) + 8B] = SR.24_42; # .MEM_55 = VDEF <.MEM_54> # lhs access alignment 32+0 MEM[(struct LorentzVector *)res_7(D) + 12B] = SR.25_41; -- Eric Botcazou
Re: [Patch] Update GCC Internals: remove section Preserving virtual SSA form
On 01/29/2016 04:24 AM, Nicklas Bo Jensen wrote: Hi, The section "12.3.2 Preserving the virtual SSA form" in GCC Internals is outdated. The two functions it documents push_stmt_changes and pop_stmt_changes have been removed. The functionality have been replaced with update_stmt. update_stmt is documented elsewhere in internals. I therefore propose to remove section 12.3.2. Furthermore, the function mark_stmt_modified have been replaced by gimple_set_modified. Please install this patch. Best, Nicklas 2016-01-29 Nicklas Bo Jensen * doc/tree-ssa.texi (Preserving the virtual SSA form): Remove outdated section Thanks. Installed. jeff
Re: [PATCH] PR c++/69139
On Sat, Feb 6, 2016 at 5:25 PM, Adam Butcher wrote: > PR c++/69139 > * cp/parser.c (cp_parser_simple_type_specifier): Don't mistake 'auto' > in trailing return function pointer types as an implicit template > parameter. > > PR c++/69139 > * g++.dg/cpp0x/trailing12.C: New test. > --- > gcc/cp/parser.c | 22 ++ > gcc/testsuite/g++.dg/cpp0x/trailing12.C | 6 ++ > 2 files changed, 24 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/cpp0x/trailing12.C > > diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c > index d03b0c9..c1a9674 100644 > --- a/gcc/cp/parser.c > +++ b/gcc/cp/parser.c > @@ -16029,8 +16029,11 @@ cp_parser_simple_type_specifier (cp_parser* parser, >maybe_warn_cpp0x (CPP0X_AUTO); >if (parser->auto_is_implicit_function_template_parm_p) > { > - /* The 'auto' might be the placeholder return type for a function > decl > -with trailing return type. */ > + /* The 'auto' might be the placeholder return type for a function > type > +with trailing return type. Look for a '->' after parameter list. > +Handle pointer-to-function, function reference and > +pointer-to-member-function by tentatively consuming two pairs of > +parens before testing for '->'. */ > bool have_trailing_return_fn_decl = false; > if (cp_lexer_peek_nth_token (parser->lexer, 2)->type > == CPP_OPEN_PAREN) > @@ -16042,8 +16045,19 @@ cp_parser_simple_type_specifier (cp_parser* parser, > /*recovering*/false, > /*or_comma*/false, > > /*consume_paren*/true)) > - have_trailing_return_fn_decl > - = cp_lexer_next_token_is (parser->lexer, CPP_DEREF); > + { > + if (cp_lexer_next_token_is (parser->lexer, CPP_DEREF)) > + have_trailing_return_fn_decl = true; > + else if ((cp_lexer_consume_token (parser->lexer)->type > + == CPP_OPEN_PAREN) > + && (cp_parser_skip_to_closing_parenthesis > + (parser, > + /*recovering*/false, > + /*or_comma*/false, > + /*consume_paren*/true))) > + have_trailing_return_fn_decl > + = cp_lexer_next_token_is (parser->lexer, CPP_DEREF); > + } > cp_parser_abort_tentative_parse (parser); > } > > diff --git a/gcc/testsuite/g++.dg/cpp0x/trailing12.C > b/gcc/testsuite/g++.dg/cpp0x/trailing12.C > new file mode 100644 > index 000..f3e02a8 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/cpp0x/trailing12.C > @@ -0,0 +1,6 @@ > +// PR c++/69139 > +// { dg-do compile { target c++11 } } > + > +auto get(int) -> int { return {}; } > +template int f(auto (*)(int) -> R) { return {}; } > +int i = f(get); > -- > 2.7.0 > Does the trailing-return get noticed if there's an __attribute__ specifier in between it and the auto? For example, void foo (auto __attribute__ ((unused)) f (int) -> int) { } BTW, last month I posted a patch for this PR that handles all kinds of specifiers as well __attribute__ specifiers. Patch is at: https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02004.html -- it makes the parser arbitrarily look ahead (while skipping over pairs of parens) until it finds a DEREF, a COMMA, a CLOSE_PAREN or an EQ. If it first finds a DEREF then have_trailing_return_fn_decl is set. Dunno if it's better to have this kind of "dumb" lookahead, or to be more explicit about one expects to consume like your followup patch does.
Re: [PATCH] Avoid bugs like PR68273 to trigger
On February 8, 2016 5:31:16 PM GMT+01:00, Eric Botcazou wrote: >> No, but if there is none left why would you want to "fix" SRA? > >Because I'm afraid this over-aligned type might leak into other places >so we >would probably be better off not creating it in the first place, all >the more >so that it is probably useless in most cases. > >For PR tree-opt/65310, why couldn't the vectorizer use the alignment of >the >pointed-to type of the addend of the MEM_REF, like for the alias set: Because that's not valid. Richard. > # .MEM_8 = VDEF <.MEM_1(D)> > # lhs access alignment 128+0 > MEM[(struct LorentzVector *)res_7(D)] = SR.22_44; > # .MEM_53 = VDEF <.MEM_8> > # lhs access alignment 32+0 > MEM[(struct LorentzVector *)res_7(D) + 4B] = SR.23_43; > # .MEM_54 = VDEF <.MEM_53> > # lhs access alignment 64+0 > MEM[(struct LorentzVector *)res_7(D) + 8B] = SR.24_42; > # .MEM_55 = VDEF <.MEM_54> > # lhs access alignment 32+0 > MEM[(struct LorentzVector *)res_7(D) + 12B] = SR.25_41;
Re: [PR66726] Fixe regression caused by Factor conversion out of COND_EXPR
On 01/18/2016 08:52 PM, Kugan wrote: Hi, This is an updated version of https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02196.html. Patch to fix PR66726 missed optimization, factor conversion out of COND_EXPR caused a regression for targets with branch cost greater than i.e., testcase gcc.dg/pr46309.c failed for these targets. I posted a patch for this which had some issues. Please find an updated version of this patch that now passes regression. This patch makes optimize_range_tests understand the factored out COND_EXPR. i.e., Updated the final_range_test_p to look for the new pattern. Changed the maybe_optimize_range_tests (which does the inter basic block range test optimization) accordingly. Bootstrapped and regression tested on x86_64-none-linux-gnu with no new regressions. And also regression tested on arm-none-linux-gnu and aarch64-none-linux-gnu with no new regressions. Is this Ok for trunk? Thanks, Kugan gcc/ChangeLog: 2016-01-19 Kugan Vivekanandarajah PR middle-end/66726 * tree-ssa-reassoc.c (optimize_range_tests): Handle tcc_compare stmt whose result is used in PHI. (maybe_optimize_range_tests): Likewise. (final_range_test_p): Lokweise. s/Lokeweise/Likewise/ in the ChangeLog entry. Otherwise this looks OK for the trunk. It really hasn't changed much since the version from July. And while the PR is not marked as such, this is a code quality regression fix for targets with a BRANCH_COST > 1. Thanks for your patience and not letting this get lost. Jeff
Re: [PATCH][P1 tree-optimization/68541] Add heuristics to path splitting
On 02/08/2016 07:18 AM, Kyrill Tkachov wrote: Hi Jeff, On 05/02/16 23:49, Jeff Law wrote: This patch addresses multiple issues with how we determine when to split paths. The primary motivation is addressing 68541 (P1). As I've gotten testcodes from Ajit, I've been able to look closely at the path splitting opportunities and consistently what I've seen is that contrary to the original belief, CSE/DCE opportunities are rarely exposed by path splitting. It seems the benefit is more due to removing the unconditional jump inherent in a CFG diamond -- even more so on the microblaze where these jumps have delay slots. While there are cases where splitting allows for CSE/DCE, they're the exception rather than the rule in the codes I've looked at. With that in mind, we want to encourage path splitting when the amount of code we're duplicating is small. We also want to balance that with not path splitting when the original code is something that may be if-convertable. The vast majority of if-convertable codes are cases where the two predecessors of the join block are single statement blocks with their results feeding the same PHI in the join block. We now reject that case for path splitting so as not to spoil if-conversion. The only wrinkle with that heuristic is that some codes (MIN, MAX, ABS, COND, calls, etc) are not good candidates for further if conversion. (ie, we have a MIN_EXPR in both predecessors that feed the PHI in the join). So we still allow those cases for splitting. This can be easily refined as we find more cases or as the if-converter is extended. So with that in place as the first filter, we just need to put a limiter on the number of statements we allow to be copied. I punted a bit and re-used the PARAM for jump threading. They've got enough in common that I felt it was reasonable. If I were to create a new PARAM, I'd probably start with a smaller default value after doing further instrumentation. While I was working through this I noticed we were path splitting in some cases where we shouldn't. Particularly if we had a CFG like: a / \ b c / \ / e d / \ / \ / \ latch exit Duplicating d for the edges b->d and c->d isn't as helpful because the duplicate can't be merged into b. We now detect this kind of case and reject it for path splitting. The new tests are a combination of reductions of codes from Ajit and my own poking around. Looking further out, I really wonder if the low level aspects of path splitting like we're trying to capture here really belong in the cross-jumping phase of the RTL optimizers. The higher level aspects (when we are able to expose CSE/DCE opportunities) should be driven by actually looking at the propagation/simplifications enabled by a potential split path. But those are gcc-7 things. Bootstrapped and regression tested on x86 linux. Installed on the trunk. I'll obviously keep an eye out for how the tests are handled on other platforms -- I don't think the tests are real sensitive to branch costs and such, but I've been surprised before. Onward to the jump threading code explosion wrap-up... jeff After this patch I also see: FAIL: gcc.dg/tree-ssa/split-path-1.c scan-tree-dump split-paths "Duplicating join block" on arm, but not on aarch64. My arm-none-eabi cross compiler is configured with: --with-float=hard --with-cpu=cortex-a9 --with-fpu=neon --with-mode=thumb Thanks. That configuration just has more statements in the join block and bumps up against the limit. I'll probably just manually increase the limit for the test. jeff
[committed] jit: fix build after r233218 (build_common_tree_nodes, -fshort-double)
Looks like r233218's change to build_common_tree_nodes missed the jit subdirectory: ../../src/gcc/jit/dummy-frontend.c: In function ‘bool jit_langhook_init()’: ../../src/gcc/jit/dummy-frontend.c:108:40: error: too many arguments to function ‘void build_common_tree_nodes(bool)’ build_common_tree_nodes (false, false); ^ In file included from ../../src/gcc/jit/jit-common.h:27:0, from ../../src/gcc/jit/jit-recording.h:24, from ../../src/gcc/jit/jit-playback.h:28, from ../../src/gcc/jit/dummy-frontend.c:23: ../../src/gcc/tree.h:4766:13: note: declared here extern void build_common_tree_nodes (bool); ^~~ Fixed in trunk in r233222 in the obvious way, as attached. DaveIndex: gcc/jit/ChangeLog === --- gcc/jit/ChangeLog (revision 233221) +++ gcc/jit/ChangeLog (revision 233222) @@ -1,3 +1,9 @@ +2016-02-08 David Malcolm + + * dummy-frontend.c (jit_langhook_init): Remove + second argument to build_common_tree_nodes to + track r233218. + 2016-01-23 Iain Buclaw * jit-playback.c: Include pthread.h. Index: gcc/jit/dummy-frontend.c === --- gcc/jit/dummy-frontend.c (revision 233221) +++ gcc/jit/dummy-frontend.c (revision 233222) @@ -105,7 +105,7 @@ registered_root_tab = true; } - build_common_tree_nodes (false, false); + build_common_tree_nodes (false); /* I don't know why this has to be done explicitly. */ void_list_node = build_tree_list (NULL_TREE, void_type_node);
Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA
Hi, On Mon, 8 Feb 2016, Richard Biener wrote: > 429.mcf 9120243 37.6 S9120245 37.3 S > 429.mcf 9120224 40.7 S9120241 37.8 * > 429.mcf 9120225 40.5 *9120229 39.9 S > 471.omnetpp 6250326 19.1 S6250328 19.1 S > 471.omnetpp 6250305 20.5 *6250324 19.3 * > 471.omnetpp 6250296 21.1 S6250305 20.5 S > 435.gromacs 7140316 22.6 S7140264 27.1 S > 435.gromacs 7140314 22.7 S7140263 27.2 S > 435.gromacs 7140316 22.6 *7140263 27.1 * > > So overall the patch is a loss for SPEC CPU 2006 INT due to the 429.mcf > and 471.omnetpp regressions and a win on SPEC FP. (I didn't test SPEC > INT previously, only FP) That's no regression if you look at the detailed results (like above). mcf and omnetpp were noisy on your machine, if you took the minimum of each run only the large progression of gromacs would remain. > The gcc.target/i386/addr-sel-1.c (for PR28940) seems to just started > working at some point past in time and thus it was added and the > bug closed. You could say RA does a better job after the patch > as it uses 1 less register but that restricts the followup > postreload combine attempts. Though I wonder about what's "better" > RA here - isn't the best allocation one that avoids spills but > uses as many registers as possible (at least when targeting a CPU > that cannot to register renaming)? Yeah, I think the testcase was added for the wrong reasons. It does exhibit a desirable outcome for sure, but the output became such by random changes, and hence also can go back by other changes. > So I have applied the patch now, I think the patch makes perfect sense. ira_setup_alts should have no observable behaviour from the outside, except the returned value of merged acceptable alternatives. Certainly it has no business to fiddle with recog_data. It only does the swapping to merge alternatives, and accidentaly left them swapped; the merging could have been implemented without swapping recog_data.operands, and then the whole issue wouldn't have occurred (and addr-sel-1.c wouldn't have been added because it still would be "broken"). Ciao, Michael.
[committed] Add testcase for already fixed PR ipa/69239
Hi! Honza has fixed this PR in r232356, I've added the testcase as obvious, so that we can close the PR. 2016-02-08 Jakub Jelinek PR ipa/69239 * g++.dg/ipa/pr69239.C: New test. --- gcc/testsuite/g++.dg/ipa/pr69239.C.jj 2016-02-08 18:14:08.807338362 +0100 +++ gcc/testsuite/g++.dg/ipa/pr69239.C 2016-02-08 18:15:30.748211371 +0100 @@ -0,0 +1,71 @@ +// PR ipa/69239 +// { dg-do run } +// { dg-options "-O2 --param=early-inlining-insns=196" } +// { dg-additional-options "-fPIC" { target fpic } } + +struct D +{ + float f; + D () {} + virtual float bar (float z); +}; + +struct A +{ + A (); + virtual int foo (int i); +}; + +struct B : public D, public A +{ + virtual int foo (int i); +}; + +float +D::bar (float) +{ + return f / 2; +} + +int +A::foo (int i) +{ + return i + 1; +} + +int +B::foo (int i) +{ + return i + 2; +} + +int __attribute__ ((noinline,noclone)) +baz () +{ + return 1; +} + +static int __attribute__ ((noinline)) +fn (A *obj, int i) +{ + return obj->foo (i); +} + +inline __attribute__ ((always_inline)) +A::A () +{ + if (fn (this, baz ()) != 2) +__builtin_abort (); +} + +static void +bah () +{ + B b; +} + +int +main () +{ + bah (); +} Jakub
Re: [RFC] Variants of __typeof
mailer decided to send html I guess... for some reason.. all of a sudden... stupid computer. anyway... the original bounced., so here it is again... On 02/08/2016 12:05 PM, Andrew MacLeod wrote: On 02/08/2016 11:21 AM, Jeff Law wrote: On 02/03/2016 11:01 PM, Richard Henderson wrote: In the process of writing this, I found a hack in __typeof added just to support _Atomic. Which suggests that one of these variants would be more generally helpful than the hack. Yea, seems that way. fyi, it originated here: https://gcc.gnu.org/ml/gcc-patches/2013-08/msg00420.html snippet near the bottom: From: "Joseph S. Myers" > On Fri, 2 Aug 2013, Andrew MacLeod wrote: > Ive included an additional 2 line patch which should change the meaning of > __typeof__ (again untested, the joys of imminently leaving for 2 weeks :-). > Im not sure the normal practical uses of > __typeof__ have much meaning for an atomic type, it seems far more useful to > have __typeof__ for an atomic qualified type to return the non-atomic variant. What typeof should do in general for qualified types is unclear (especially in the case of rvalues, where the movement in ISO C seems to be to say that rvalues can't have qualified types at all) - returning the non-atomic type seems reasonable to me. Andrew
Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA
On 02/08/2016 12:38 PM, Michael Matz wrote: I think the patch makes perfect sense. ira_setup_alts should have no observable behaviour from the outside, except the returned value of merged acceptable alternatives. Certainly it has no business to fiddle with recog_data. It only does the swapping to merge alternatives, and accidentaly left them swapped; the merging could have been implemented without swapping recog_data.operands, and then the whole issue wouldn't have occurred (and addr-sel-1.c wouldn't have been added because it still would be "broken"). Sorry, I was confused by Richard's message thinking that his patch actually exchanges the operands. I think we have some expression shaping optimizations and exchanging operands probably rejects the optimization effect. With this point of view ira-costs.c should not exchange operands. So the patch is not bogus as Richard wrote but perfectly legitimate.
Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA
On February 8, 2016 7:07:46 PM GMT+01:00, Vladimir Makarov wrote: >On 02/08/2016 12:38 PM, Michael Matz wrote: >> >> I think the patch makes perfect sense. ira_setup_alts should have no >> observable behaviour from the outside, except the returned value of >merged >> acceptable alternatives. Certainly it has no business to fiddle with >> recog_data. It only does the swapping to merge alternatives, and >> accidentaly left them swapped; the merging could have been >implemented >> without swapping recog_data.operands, and then the whole issue >wouldn't >> have occurred (and addr-sel-1.c wouldn't have been added because it >still >> would be "broken"). >> >Sorry, I was confused by Richard's message thinking that his patch >actually exchanges the operands. I think we have some expression >shaping optimizations and exchanging operands probably rejects the >optimization effect. With this point of view ira-costs.c should not >exchange operands. So the patch is not bogus as Richard wrote but >perfectly legitimate. Yes, I meant the posted alternative patch that just unswapped for the cases that Andreas patch changed whether we consider swapping or not. The applied patch indeed makes perfect sense. Richard.
[PATCH] Fix fnsplit ICE (PR tree-optimization/69209)
Hi! This fixes an ICE, where the split part doesn't return a value of a is_gimple_reg_type retval, only compares its address (therefore it is addressable), and the main part only uses the var in return_bb. In that case, retval is not gimple val, but needs to be returned as gimple val. So we need to load it into a SSA_NAME. I've tried to construct testcases for other cases where we might need something similar, but have not succeeded, so maybe it is the only spot that needs such handling. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2016-02-08 Jakub Jelinek PR tree-optimization/69209 * ipa-split.c (split_function): If split part is not returning retval, retval has gimple type but is not gimple value, force it into a SSA_NAME first. * gcc.c-torture/compile/pr69209.c: New test. --- gcc/ipa-split.c.jj 2016-02-03 23:36:29.0 +0100 +++ gcc/ipa-split.c 2016-02-08 15:56:41.701994621 +0100 @@ -1628,8 +1628,22 @@ split_function (basic_block return_bb, s gimple_call_set_lhs (call, build_simple_mem_ref (retval)); else gimple_call_set_lhs (call, retval); + gsi_insert_after (&gsi, call, GSI_NEW_STMT); + } + else + { + gsi_insert_after (&gsi, call, GSI_NEW_STMT); + if (retval + && is_gimple_reg_type (TREE_TYPE (retval)) + && !is_gimple_val (retval)) + { + gassign *g + = gimple_build_assign (make_ssa_name (TREE_TYPE (retval)), + retval); + retval = gimple_assign_lhs (g); + gsi_insert_after (&gsi, g, GSI_NEW_STMT); + } } - gsi_insert_after (&gsi, call, GSI_NEW_STMT); /* Build bndret call to obtain returned bounds. */ if (retbnd) chkp_insert_retbnd_call (retbnd, retval, &gsi); --- gcc/testsuite/gcc.c-torture/compile/pr69209.c.jj2016-02-08 16:13:32.527138280 +0100 +++ gcc/testsuite/gcc.c-torture/compile/pr69209.c 2016-02-08 16:13:19.0 +0100 @@ -0,0 +1,28 @@ +/* PR tree-optimization/69209 */ + +int a, c, *d, e; + +void foo (void) __attribute__ ((__noreturn__)); + +int +bar (void) +{ + int f; + if (a) +{ + if (e) + foo (); + foo (); +} + if (d != &f) +foo (); + if (!c) +foo (); + return f; +} + +void +baz () +{ + bar (); +} Jakub
Re: [PATCH] PR c++/69139
On 02/08/2016 11:43 AM, Patrick Palka wrote: BTW, last month I posted a patch for this PR that handles all kinds of specifiers as well __attribute__ specifiers. Patch is at: https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02004.html -- it makes the parser arbitrarily look ahead (while skipping over pairs of parens) until it finds a DEREF, a COMMA, a CLOSE_PAREN or an EQ. If it first finds a DEREF then have_trailing_return_fn_decl is set. Dunno if it's better to have this kind of "dumb" lookahead, or to be more explicit about one expects to consume like your followup patch does. Hmm, I think I prefer your approach, but please add the testcase with stuff between ) and ->. Jason
Re: [PATCH] Fix PR c++/69283 (auto deduction fails when ADL is required)
OK. Jason
Re: [PATCH] PR c++/69139
On Mon, 8 Feb 2016, Jason Merrill wrote: On 02/08/2016 11:43 AM, Patrick Palka wrote: BTW, last month I posted a patch for this PR that handles all kinds of specifiers as well __attribute__ specifiers. Patch is at: https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02004.html -- it makes the parser arbitrarily look ahead (while skipping over pairs of parens) until it finds a DEREF, a COMMA, a CLOSE_PAREN or an EQ. If it first finds a DEREF then have_trailing_return_fn_decl is set. Dunno if it's better to have this kind of "dumb" lookahead, or to be more explicit about one expects to consume like your followup patch does. Hmm, I think I prefer your approach, but please add the testcase with stuff between ) and ->. Done. Does this look OK? -- >8 -- gcc/cp/ChangeLog: PR c++/69139 * parser.c (cp_parser_simple_type_specifier): Make the check for disambiguating between an 'auto' placeholder and an implicit template parameter more robust. gcc/testsuite/ChangeLog: PR c++/69139 * g++.dg/cpp0x/trailing12.C: New test. * g++.dg/cpp0x/trailing13.C: New test. --- gcc/cp/parser.c | 33 +++-- gcc/testsuite/g++.dg/cpp0x/trailing12.C | 22 ++ gcc/testsuite/g++.dg/cpp0x/trailing13.C | 12 3 files changed, 57 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/trailing12.C create mode 100644 gcc/testsuite/g++.dg/cpp0x/trailing13.C diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index d03b0c9..56c834f 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -16032,20 +16032,33 @@ cp_parser_simple_type_specifier (cp_parser* parser, /* The 'auto' might be the placeholder return type for a function decl with trailing return type. */ bool have_trailing_return_fn_decl = false; - if (cp_lexer_peek_nth_token (parser->lexer, 2)->type - == CPP_OPEN_PAREN) + + cp_parser_parse_tentatively (parser); + cp_lexer_consume_token (parser->lexer); + while (cp_lexer_next_token_is_not (parser->lexer, CPP_EQ) +&& cp_lexer_next_token_is_not (parser->lexer, CPP_COMMA) +&& cp_lexer_next_token_is_not (parser->lexer, CPP_CLOSE_PAREN) +&& cp_lexer_next_token_is_not (parser->lexer, CPP_EOF)) { - cp_parser_parse_tentatively (parser); - cp_lexer_consume_token (parser->lexer); - cp_lexer_consume_token (parser->lexer); - if (cp_parser_skip_to_closing_parenthesis (parser, + if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN)) + { + cp_lexer_consume_token (parser->lexer); + cp_parser_skip_to_closing_parenthesis (parser, /*recovering*/false, /*or_comma*/false, -/*consume_paren*/true)) - have_trailing_return_fn_decl - = cp_lexer_next_token_is (parser->lexer, CPP_DEREF); - cp_parser_abort_tentative_parse (parser); +/*consume_paren*/true); + continue; + } + + if (cp_lexer_next_token_is (parser->lexer, CPP_DEREF)) + { + have_trailing_return_fn_decl = true; + break; + } + + cp_lexer_consume_token (parser->lexer); } + cp_parser_abort_tentative_parse (parser); if (have_trailing_return_fn_decl) { diff --git a/gcc/testsuite/g++.dg/cpp0x/trailing12.C b/gcc/testsuite/g++.dg/cpp0x/trailing12.C new file mode 100644 index 000..a27d988 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/trailing12.C @@ -0,0 +1,22 @@ +// PR c++/69139 +// { dg-do compile { target c++11 } } + +auto get(int) -> int { return {}; } +template int f(auto (*)(int) -> R) { return {}; } +int i = f(get); + +int foo1 (auto (int) -> char); + +int foo2 (auto f(int) -> char); + +int foo2 (auto (f)(int) -> char); + +int foo3 (auto (*f)(int) -> char); + +int foo4 (auto (*const **&f)(int) -> char); + +int foo5 (auto (*const **&f)(int, int *) -> char); + +int foo6 (auto (int) const -> char); // { dg-error "const" } + +void foo7 (auto __attribute__ ((unused)) f (int) -> int) { } diff --git a/gcc/testsuite/g++.dg/cpp0x/trailing13.C b/gcc/testsuite/g++.dg/cpp0x/trailing13.C new file mode 100644 index 000..2681bcd --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/trailing13.C @@ -0,0 +1,12 @@ +// PR c++/69139 +// { dg-do compile { target c++11 } } + +struct X { + auto get(int) const & noexcept -> int { return {}; } + auto get(int) && throw () -> long { return {}; } +}; + +template auto f(auto (X::*)(int) const & -> R) -> R {} + +using I = decltype(f(&X::get)); +using I = i
[patch, fortran] PR56007 Remarkably bad error message with DO array=1,2
Hi, the simple patch below rejects arrays as do loop index variable before another (confusing) error message is emitted. Two new testcases derived from the PR, plus adaption of one testcase that relies on the old error message. Whoever wants to take it... Harald 2016-02-08 Harald Anlauf PR fortran/56007 * match.c (gfc_match_iterator): Add diagnostic for array variable as do loop index. Index: gcc/fortran/match.c === --- gcc/fortran/match.c (revision 233203) +++ gcc/fortran/match.c (working copy) @@ -877,6 +877,12 @@ if (m != MATCH_YES) return MATCH_NO; + if (var->symtree->n.sym->attr.dimension) +{ + gfc_error ("Loop variable at %C cannot be an array"); + goto cleanup; +} + /* F2008, C617 & C565. */ if (var->symtree->n.sym->attr.codimension) { 2016-02-08 Harald Anlauf PR fortran/56007 * gfortran.dg/coarray_8.f90: Adjust error message. * gfortran.dg/pr56007.f90: New test. * gfortran.dg/pr56007.f: New test. Index: gcc/testsuite/gfortran.dg/coarray_8.f90 === --- gcc/testsuite/gfortran.dg/coarray_8.f90 (revision 233203) +++ gcc/testsuite/gfortran.dg/coarray_8.f90 (working copy) @@ -146,7 +146,7 @@ subroutine tfgh() integer :: i(2) DATA i/(i, i=1,2)/ ! { dg-error "Expected PARAMETER symbol" } - do i = 1, 5 ! { dg-error "cannot be a sub-component" } + do i = 1, 5 ! { dg-error "cannot be an array" } end do ! { dg-error "Expecting END SUBROUTINE" } end subroutine tfgh Index: gcc/testsuite/gfortran.dg/pr56007.f90 === --- gcc/testsuite/gfortran.dg/pr56007.f90 (revision 0) +++ gcc/testsuite/gfortran.dg/pr56007.f90 (revision 0) @@ -0,0 +1,11 @@ +! { dg-do compile } +! PR fortran/56007 +! Based on testcase by Tobias Schlüter + + integer iw1(90), doiw1(90) + do iw1=1,2 ! { dg-error "cannot be an array" } + end do ! { dg-error "Expecting END PROGRAM statement" } + do iw1(1)=1! { dg-error "Unclassifiable statement" } + do iw1=1 ! { dg-error "cannot be an array" } + end do ! { dg-error "Expecting END PROGRAM statement" } +END program Index: gcc/testsuite/gfortran.dg/pr56007.f === --- gcc/testsuite/gfortran.dg/pr56007.f (revision 0) +++ gcc/testsuite/gfortran.dg/pr56007.f (revision 0) @@ -0,0 +1,10 @@ +! { dg-do compile } +! PR fortran/56007 +! Based on testcase by Tobias Schlüter + + integer iw1(90), doiw1(90) + do iw1(1)=1 + do iw1=1 + do iw1=1,2! { dg-error "cannot be an array" } + end do! { dg-error "Expecting END PROGRAM statement" } + END
Re: [PATCH] Fix fnsplit ICE (PR tree-optimization/69209)
On February 8, 2016 7:27:49 PM GMT+01:00, Jakub Jelinek wrote: >Hi! > >This fixes an ICE, where the split part doesn't return a value >of a is_gimple_reg_type retval, only compares its address (therefore it >is >addressable), and the main part only uses the var in return_bb. >In that case, retval is not gimple val, but needs to be returned >as gimple val. So we need to load it into a SSA_NAME. >I've tried to construct testcases for other cases where we might need >something similar, but have not succeeded, so maybe it is the only spot >that needs such handling. > >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Thanks, Richard. >2016-02-08 Jakub Jelinek > > PR tree-optimization/69209 > * ipa-split.c (split_function): If split part is not > returning retval, retval has gimple type but is not > gimple value, force it into a SSA_NAME first. > > * gcc.c-torture/compile/pr69209.c: New test. > >--- gcc/ipa-split.c.jj 2016-02-03 23:36:29.0 +0100 >+++ gcc/ipa-split.c2016-02-08 15:56:41.701994621 +0100 >@@ -1628,8 +1628,22 @@ split_function (basic_block return_bb, s > gimple_call_set_lhs (call, build_simple_mem_ref (retval)); > else > gimple_call_set_lhs (call, retval); >+gsi_insert_after (&gsi, call, GSI_NEW_STMT); >+ } >+else >+ { >+gsi_insert_after (&gsi, call, GSI_NEW_STMT); >+if (retval >+&& is_gimple_reg_type (TREE_TYPE (retval)) >+&& !is_gimple_val (retval)) >+ { >+gassign *g >+ = gimple_build_assign (make_ssa_name (TREE_TYPE (retval)), >+ retval); >+retval = gimple_assign_lhs (g); >+gsi_insert_after (&gsi, g, GSI_NEW_STMT); >+ } > } >- gsi_insert_after (&gsi, call, GSI_NEW_STMT); > /* Build bndret call to obtain returned bounds. */ > if (retbnd) > chkp_insert_retbnd_call (retbnd, retval, &gsi); >--- gcc/testsuite/gcc.c-torture/compile/pr69209.c.jj 2016-02-08 >16:13:32.527138280 +0100 >+++ gcc/testsuite/gcc.c-torture/compile/pr69209.c 2016-02-08 >16:13:19.0 +0100 >@@ -0,0 +1,28 @@ >+/* PR tree-optimization/69209 */ >+ >+int a, c, *d, e; >+ >+void foo (void) __attribute__ ((__noreturn__)); >+ >+int >+bar (void) >+{ >+ int f; >+ if (a) >+{ >+ if (e) >+ foo (); >+ foo (); >+} >+ if (d != &f) >+foo (); >+ if (!c) >+foo (); >+ return f; >+} >+ >+void >+baz () >+{ >+ bar (); >+} > > Jakub
Re: [PATCH][P1 tree-optimization/68541] Add heuristics to path splitting
On 02/08/2016 10:02 AM, Jeff Law wrote: After this patch I also see: FAIL: gcc.dg/tree-ssa/split-path-1.c scan-tree-dump split-paths "Duplicating join block" on arm, but not on aarch64. My arm-none-eabi cross compiler is configured with: --with-float=hard --with-cpu=cortex-a9 --with-fpu=neon --with-mode=thumb Thanks. That configuration just has more statements in the join block and bumps up against the limit. I'll probably just manually increase the limit for the test. So just for the record, for the configuration above, we generate more statements in the join block than for the x86_64 and aarch64 targets. This appears to be due to differences in addressing modes and IVopts selections. Prior to my change for 68541, there was no limit on the number of statements we were allowed to copy, so it "just worked", even with the additional statements. After the change to 68541 we hit the newly added upper limit for some configurations. For the purposes of the test I have bumped the upper limit on how many statements it will copy from the join block and verified by hand the path is split as expected for the configuration noted above and verified that the x86-64 testresults are unaffected. Thanks, Jeff commit 63b766bdef44518d33a21569a089fca2356d6ffe Author: Jeff Law Date: Mon Feb 8 12:52:21 2016 -0700 PR tree-optimization/68541 * gcc.dg/tree-ssa/split-path-1.c: Increase limit for number of statements allowed in join block for path splitting. diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 64f512d..9f629d6 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,9 @@ +2016-02-08 Jeff Law + + PR tree-optimization/68541 + * gcc.dg/tree-ssa/split-path-1.c: Increase limit for number + of statements allowed in join block for path splitting. + 2016-02-08 Jakub Jelinek PR c++/59627 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/split-path-1.c b/gcc/testsuite/gcc.dg/tree-ssa/split-path-1.c index b24f6a9..8b23ef4 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/split-path-1.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/split-path-1.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details " } */ +/* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details --param max-jump-thread-duplication-stmts=20" } */ #include #include
Re: [PATCH 1/4] Make SRA scalarize constant-pool loads
On 01/27/2016 05:14 PM, David Edelsohn wrote: On Wed, Jan 27, 2016 at 6:36 PM, Jeff Law wrote: On 01/27/2016 12:39 PM, David Edelsohn wrote: The new sra-17.c and sra-18.c tests fail on AIX because the regex is too restrictive -- AIX labels don't have exactly the same format. On AIX, the labels in the dumps look like "LC..0" instead of ".LC0". This patch adds "*" and ".*" so that the "." prepended to LC is optional and to allow characters between the "LC" and the "0". I needed extra escapes for the sra-17.c line that matches multiple times - for no apparent reason. The joys of expect/tcl. I just keep escaping until the regex that I developed outside the suite works. I have been trying to get away from using .* though. The longest match nature sometimes gives surprising results. In theory .*? ought to work better, but I haven't tried using it much. Anyway, the change looks fine to me. Segher pointed out to me that my revised regex was matching multiple lines, so it was not triggering multiple times without the restriction on the pattern. A revised, tighter patch uses "?" Yup, that's precisely why I've moved away from .* :-) jeff
Re: AW: Wonly-top-basic-asm
On 2/7/2016 10:45 PM, Bernd Edlinger wrote: On 8. 2. 2016 04:45, David Wohlferd wrote: I replied with a patch that includes most of the changes you asked for (see inline below). Were you waiting on me for something more? ChangeLog entries are still missing. I'll add them back in the next post. David, there is a tool that you can use to check the patch for some style-nits before submission. I was not aware of this tool. I'll fix these before the next post. At one point you proposed renaming the option -Wbasic-asm. This seemed a little terse (and possibly misleading) so I counter-proposed -Wbasic-asm-in-function (a little verbose, but clearer). I have no strong preferences here, and you haven't said one way or the other. Are we just going to stick with -Wonly-top-basic-asm? Hopefully one more try and this will be done. Thanks, dw
Re: [PATCH] PR c++/69139
On 2016-02-08 19:14, Patrick Palka wrote: On Mon, 8 Feb 2016, Jason Merrill wrote: On 02/08/2016 11:43 AM, Patrick Palka wrote: BTW, last month I posted a patch for this PR that handles all kinds of specifiers as well __attribute__ specifiers. Patch is at: https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02004.html -- it makes the parser arbitrarily look ahead (while skipping over pairs of parens) until it finds a DEREF, a COMMA, a CLOSE_PAREN or an EQ. If it first finds a DEREF then have_trailing_return_fn_decl is set. Dunno if it's better to have this kind of "dumb" lookahead, or to be more explicit about one expects to consume like your followup patch does. Hmm, I think I prefer your approach [snip] Me too. I was worried that the cases handled in the explicit solution might get longer, more complex and repeat large amounts of other parser code. Providing this approach gives us no false positives I would say it's superior.
Re: [PATCH, PR69707] Handle -fdiagnostics-color in lto
On 08/02/16 14:54, Jakub Jelinek wrote: On Mon, Feb 08, 2016 at 02:38:17PM +0100, Tom de Vries wrote: hmm, indeed removing the 'Driver' flag from the fdiagnostics-color= entry in common.opt breaks the functioning of fdiagnostics-color= in the gcc driver. This patch leaves the 'Driver' flag alone, and instead explicitly allows fdiagnostics-color= in lto_write_options. Is this approach ok? Doesn't that mean storing -fdiagnostics-color= into the LTO option section and then using whatever was recorded from the first TU that recorded anything? Yes. That doesn't look right for such diagnostics options either. I mean, if I $ gcc -fdiagnostics-color=always -c -flto a.c on another terminal $ gcc -fdiagnostics-color=never -c -flto b.c on yet another terminal $ gcc -flto -o a a.o b.o then I'd expect the default setting for -fdiagnostics-color= for all diagnostics while linking/LTO optimizing it, and for say $ gcc -fdiagnostics-color=always -flto -o a a.o b.o to have it always colorized, similarly for never. IMHO we should just honor what has been specified on the linker command line if anything. Agreed. So, the question is, is -fdiagnostics-color=never passed in the testsuite just to the compilation and not to linking (that would be IMHO a testsuite bug), In libgomp.exp we have: ... # Disable color diagnostics lappend ALWAYS_CFLAGS "additional_flags=-fdiagnostics-color=never" ... The test setup looks ok to me. or is it passed on the linking command line, but not passed through to lto1? Yes. I've now realized this is specific to mkoffload. If we're doing lto, in lto-wrapper.c:run_gcc, we have: ... append_compiler_options (&argv_obstack, fdecoded_options, fdecoded_options_count); append_linker_options (&argv_obstack, decoded_options, decoded_options_count); ... And the last line will propagate the fdiagnostics-color setting. But for calling mkoffload, we just have this: ... /* Append options from offload_lto sections. */ append_compiler_options (&argv_obstack, compiler_opts, compiler_opt_count); ... Followed by this, which just filters the -foffload options: ... /* Append options specified by -foffload last. In case of conflicting options we expect offload compiler to choose the latest. */ append_offload_options (&argv_obstack, target, compiler_opts, compiler_opt_count); append_offload_options (&argv_obstack, target, linker_opts, linker_opt_count); ... Attached patch adds the diagnostics flags to mkoffload. Bootstrapped and reg-tested on x86_64. OK for trunk, stage1? Thanks, - Tom Handle -fdiagnostics-color in lto 2016-02-08 Tom de Vries PR lto/69707 * lto-wrapper.c (append_diag_options): New function. (compile_offload_image): Call append_diag_options. * testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c: New test. --- gcc/lto-wrapper.c | 31 ++ .../libgomp.oacc-c-c++-common/parallel-dims-2.c| 19 + 2 files changed, 50 insertions(+) diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c index ced6f2f..8cda1fa 100644 --- a/gcc/lto-wrapper.c +++ b/gcc/lto-wrapper.c @@ -542,6 +542,36 @@ append_compiler_options (obstack *argv_obstack, struct cl_decoded_option *opts, } } +/* Append diag options in OPTS with length COUNT to ARGV_OBSTACK. */ + +static void +append_diag_options (obstack *argv_obstack, struct cl_decoded_option *opts, + unsigned int count) +{ + /* Append compiler driver arguments as far as they were merged. */ + for (unsigned int j = 1; j < count; ++j) +{ + struct cl_decoded_option *option = &opts[j]; + + switch (option->opt_index) + { + case OPT_fdiagnostics_color_: + case OPT_fdiagnostics_show_caret: + case OPT_fdiagnostics_show_option: + case OPT_fdiagnostics_show_location_: + case OPT_fshow_column: + break; + default: + continue; + } + + /* Pass the option on. */ + for (unsigned int i = 0; i < option->canonical_option_num_elements; ++i) + obstack_ptr_grow (argv_obstack, option->canonical_option[i]); +} +} + + /* Append linker options OPTS to ARGV_OBSTACK. */ static void @@ -724,6 +754,7 @@ compile_offload_image (const char *target, const char *compiler_path, /* Append options from offload_lto sections. */ append_compiler_options (&argv_obstack, compiler_opts, compiler_opt_count); + append_diag_options (&argv_obstack, linker_opts, linker_opt_count); /* Append options specified by -foffload last. In case of conflicting options we expect offload compiler to choose the latest. */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c new file mode 100644 index 000..eea8c7e --- /dev/null +++ b/libgomp/testsuite/libgo
Re: [PATCH, gcc7, aarch64] Add arithmetic overflow patterns
On 02/07/2016 10:58 AM, Michael Collison wrote: Richard, One other question on the patch. I note that when you expand the addv and uaddv patterns you emit rtl using gen_add3_compareV and en_add3_compareC respectively. These patterns use sign_extend and zero_extend respectively. Why do you not do the same thing for the subv and usubv patterns? The subv patterns expand into calls to gen_sub3_compare1 which does not emit sign or zero extends. Why the difference? The canonical "compare" is a subtraction. Thus if we want the flag bits off of a subtraction, we only need issue a compare. We only have to do something special if the flag bits aren't the same as a subtract. E.g. for addition, where the carry and overflow flags are (logically) reversed from that of subtraction. r~
Re: [PATCH, PR69599] Fix GOMP/GOACC_parallel optimization in ipa-pta
On 08/02/16 11:54, Richard Biener wrote: On Mon, 8 Feb 2016, Tom de Vries wrote: Hi, when compiling the fipa-pta tests in the libgomp testsuite (omp-nested-2.c, pr46032.c) with -flto -flto-partition=max, the tests fail in execution (PR69599). The problem is related to the GOMP/GOACC_parallel optimization we do in fipa-pta, where we interpret a call GOMP_parallel (&foo._0, data) as a call foo._0 (data). The problem is that this optimization is only legal in lto if both: - foo containing the call GOMP_parallel (&foo._0, data) and - foo._0 are contained in the same partition. In the case of -flto-partition=max, foo is contained in it's own partition, and foo._0 is contained in another partition. This means the data argument to the GOMP_parallel call appears unused, and the setting of the argument is optimized away, which causes the execution failure. This patch fixes that by testing if foo and foo._0 are part of the same partition. [ Note that the node_address_taken change in the patch has no effect, since nonlocal_p already tests for used_from_other_partition. But I thought it was clearer to state the conditions under which we are allowed to ignore node->address_taken explicitly. ] Bootstrapped and reg-tested on x86_64. Build for nvidia accelerator and reg-tested libgomp with various lto settings. OK for trunk, stage4? I don't like the in_lto_p checks, why's the check not working for non-LTO? I was not sure if the partition flags were valid outside lto. Updated patch removes the in_lto_p checks. Bootstrapped on x86_64. Build and reg-tested libgomp testsuite. OK? Thanks, - Tom Thanks, - Tom Fix GOMP/GOACC_parallel optimization in ipa-pta 2016-02-08 Tom de Vries PR tree-optimization/69599 * tree-ssa-structalias.c (fndecl_maybe_in_other_partition): New function. (find_func_aliases_for_builtin_call, find_func_clobbers) (ipa_pta_execute): Handle case that foo and foo._0 are not in same lto partition. * testsuite/libgomp.c/omp-nested-3.c: New test. * testsuite/libgomp.c/pr46032-2.c: New test. * testsuite/libgomp.oacc-c-c++-common/kernels-2.c: New test. * testsuite/libgomp.oacc-c-c++-common/parallel-2.c: New test. --- gcc/tree-ssa-structalias.c | 49 ++ libgomp/testsuite/libgomp.c/omp-nested-3.c | 4 ++ libgomp/testsuite/libgomp.c/pr46032-2.c| 4 ++ .../libgomp.oacc-c-c++-common/kernels-2.c | 4 ++ .../libgomp.oacc-c-c++-common/parallel-2.c | 4 ++ 5 files changed, 56 insertions(+), 9 deletions(-) diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c index e7d0797..d7a7dc5 100644 --- a/gcc/tree-ssa-structalias.c +++ b/gcc/tree-ssa-structalias.c @@ -4162,6 +4162,18 @@ find_func_aliases_for_call_arg (varinfo_t fi, unsigned index, tree arg) process_constraint (new_constraint (lhs, *rhsp)); } +/* Return true if FNDECL may be part of another lto partition. */ + +static bool +fndecl_maybe_in_other_partition (tree fndecl) +{ + cgraph_node *fn_node = cgraph_node::get (fndecl); + if (fn_node == NULL) +return true; + + return fn_node->in_other_partition; +} + /* Create constraints for the builtin call T. Return true if the call was handled, otherwise false. */ @@ -4537,6 +4549,10 @@ find_func_aliases_for_builtin_call (struct function *fn, gcall *t) tree fnarg = gimple_call_arg (t, fnpos); gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR); tree fndecl = TREE_OPERAND (fnarg, 0); + if (fndecl_maybe_in_other_partition (fndecl)) + /* Fallthru to general call handling. */ + break; + tree arg = gimple_call_arg (t, argpos); varinfo_t fi = get_vi_for_tree (fndecl); @@ -5113,6 +5129,10 @@ find_func_clobbers (struct function *fn, gimple *origt) tree fnarg = gimple_call_arg (t, fnpos); gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR); tree fndecl = TREE_OPERAND (fnarg, 0); + if (fndecl_maybe_in_other_partition (fndecl)) + /* Fallthru to general call handling. */ + break; + varinfo_t cfi = get_vi_for_tree (fndecl); tree arg = gimple_call_arg (t, argpos); @@ -7505,9 +7525,13 @@ ipa_pta_execute (void) address_taken bit for function foo._0, which would make it non-local. But for the purpose of ipa-pta, we can regard the run_on_threads call as a local call foo._0 (data), so we ignore address_taken on nodes - with parallelized_function set. */ - bool node_address_taken = (node->address_taken - && !node->parallelized_function); + with parallelized_function set. + Note: this is only safe, if foo and foo._0 are in the same lto + partition. */ + bool node_address_taken = ((node->parallelized_function + && !node->used_from_other_partition) + ? false + : node->address_taken); /* For externally visible or attribute used annotated functions use local constraints for their arguments. @@ -7676,12 +7700,19 @@ ipa_pta_execute (void) continu
Re: [PATCH] Make C++ use the enum mode attribute
On Feb 6, 2016, at 2:37 PM, Trevor Saunders wrote: > it is allowed if you do something like > > enum X : int; > > but it seems really pointless to support setting the size of something > when the language gives you a standard way to do that. Yes, it is, as long as you never use header files, and multiple languages. :-) Unfortunately, those are vended features of C and C++. Once you have a C project that you want to be able to be used from C++, it then is handy to be able to do this.
Re: [PATCH, PR69707] Handle -fdiagnostics-color in lto
On Mon, Feb 08, 2016 at 10:04:48PM +0100, Tom de Vries wrote: > Attached patch adds the diagnostics flags to mkoffload. > > Bootstrapped and reg-tested on x86_64. > > OK for trunk, stage1? Ok for stage4. > Handle -fdiagnostics-color in lto > > 2016-02-08 Tom de Vries > > PR lto/69707 > * lto-wrapper.c (append_diag_options): New function. > (compile_offload_image): Call append_diag_options. > > * testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c: New test. Jakub
C++ PATCH for c++/69657 (abs not inlined)
The issue in this bug was that due to changes in the libstdc++ headers, the built-in abs declaration was getting hidden by a using-declaration, so that then when the built-in got an explicit declaration, the original declaration wasn't there anymore and so the new declaration didn't get marked as built-in. Fixed by overloading an anticipated built-in rather than clobbering it when we see a using-declaration of the same name. Tested x86_64-pc-linux-gnu, applying to trunk. commit 8f47c80051a73a75fb44900d9edb1444142c3d5f Author: Jason Merrill Date: Mon Feb 8 16:39:42 2016 -0500 PR c++/69657 * name-lookup.c (do_nonmember_using_decl): Leave anticipated built-ins alone. diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c index 92d99aa..227d6f2 100644 --- a/gcc/cp/name-lookup.c +++ b/gcc/cp/name-lookup.c @@ -2583,15 +2583,6 @@ do_nonmember_using_decl (tree scope, tree name, tree oldval, tree oldtype, decls.value = NULL_TREE; } - /* It is impossible to overload a built-in function; any explicit - declaration eliminates the built-in declaration. So, if OLDVAL - is a built-in, then we can just pretend it isn't there. */ - if (oldval - && TREE_CODE (oldval) == FUNCTION_DECL - && DECL_ANTICIPATED (oldval) - && !DECL_HIDDEN_FRIEND_P (oldval)) -oldval = NULL_TREE; - if (decls.value) { /* Check for using functions. */ @@ -2610,6 +2601,10 @@ do_nonmember_using_decl (tree scope, tree name, tree oldval, tree oldtype, { tree new_fn = OVL_CURRENT (tmp); + /* Don't import functions that haven't been declared. */ + if (DECL_ANTICIPATED (new_fn)) + continue; + /* [namespace.udecl] If a function declaration in namespace scope or block @@ -2627,13 +2622,13 @@ do_nonmember_using_decl (tree scope, tree name, tree oldval, tree oldtype, continue; /* this is a using decl */ else if (compparms_for_decl_and_using_decl (new_fn, old_fn)) { - gcc_assert (!DECL_ANTICIPATED (old_fn) - || DECL_HIDDEN_FRIEND_P (old_fn)); - /* There was already a non-using declaration in this scope with the same parameter types. If both are the same extern "C" functions, that's ok. */ - if (decls_match (new_fn, old_fn)) + if (DECL_ANTICIPATED (old_fn) + && !DECL_HIDDEN_FRIEND_P (old_fn)) + /* Ignore anticipated built-ins. */; + else if (decls_match (new_fn, old_fn)) break; else { @@ -2669,6 +2664,14 @@ do_nonmember_using_decl (tree scope, tree name, tree oldval, tree oldtype, } else { + /* If we're declaring a non-function and OLDVAL is an anticipated + built-in, just pretend it isn't there. */ + if (oldval + && TREE_CODE (oldval) == FUNCTION_DECL + && DECL_ANTICIPATED (oldval) + && !DECL_HIDDEN_FRIEND_P (oldval)) + oldval = NULL_TREE; + *newval = decls.value; if (oldval && !decls_match (*newval, oldval)) error ("%qD is already declared in this scope", name); diff --git a/gcc/testsuite/g++.dg/lookup/builtin6.C b/gcc/testsuite/g++.dg/lookup/builtin6.C new file mode 100644 index 000..456ade7 --- /dev/null +++ b/gcc/testsuite/g++.dg/lookup/builtin6.C @@ -0,0 +1,23 @@ +// PR c++/69657 +// { dg-options -fdump-tree-gimple } +// { dg-final { scan-tree-dump "ABS_EXPR" "gimple" } } + +namespace foo +{ + inline double + abs(double x) + { return __builtin_fabs(x); } +} +using foo::abs; + +extern "C" int abs(int); + +namespace bar { + using ::abs; +} + +int +wrapper (int x) +{ + return bar::abs (x) + bar::abs(x); +}
Re: [PATCH] PR c++/69139
OK. Jason