[PATCH][PR tree-optimization/65917] Record both equivalences from if (x == y) style conditionals

2016-02-08 Thread Jeff Law


This turns out to be far easier than expected.  Given a conditional like 
x == y, we already record the canonicalized x = y equivalence.  If we 
just record y = x then this "just works".


The only tricky thing is the following of the SSA_NAME_VALUE chain for 
the source argument -- we need to avoid that when recording the y = x 
variant (since following x's value chain would lead back to y).  That's 
easily resolved with an _raw variant which doesn't follow the source 
value chain.


While working through the code, I saw the old comment WRT loop depth and 
PR 61757 in record_equality.  With the code to follow backedges in the 
CFG gone from the old threader, that code is no longer needed.  So I 
removed it, restoring Richi's cleanup work from early 2015.


Bootstrapped & regression tested on x86-linux.  Installed on the trunk.

Jeff
commit 122fa7e277f132dcfef5d66378f7ef3411ae8b84
Author: law 
Date:   Mon Feb 8 08:17:32 2016 +

PR tree-optimization/65917
* tree-ssa-dom.c (record_temporary_equivalences): Record both
equivalences from if (x == y) style conditionals.
(loop_depth_of_name): Remove.
(record_equality): Remove loop depth check.
* tree-ssa-scopedtables.h (const_and_copies): Refine comments.
(const_and_copies::record_const_or_copy_raw): New member function.
* tree-ssa-scopedtables.c
(const_and_copies::record_const_or_copy_raw): New, factored out of
(const_and_copies::record_const_or_copy): Call new member function.

PR tree-optimization/65917
* gcc.dg/tree-ssa/20030922-2.c: No longer xfailed.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233207 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a465156..2c4a8b6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,17 @@
+2016-02-08  Jeff Law  
+
+   PR tree-optimization/65917
+   * tree-ssa-dom.c (record_temporary_equivalences): Record both
+   equivalences from if (x == y) style conditionals.
+   (loop_depth_of_name): Remove.
+   (record_equality): Remove loop depth check.
+   * tree-ssa-scopedtables.h (const_and_copies): Refine comments.
+   (const_and_copies::record_const_or_copy_raw): New member function.
+   * tree-ssa-scopedtables.c
+   (const_and_copies::record_const_or_copy_raw): New, factored out of
+   (const_and_copies::record_const_or_copy): Call new member function.
+   
+
 2016-02-05  Jeff Law  
 
PR tree-optimization/68541
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 8d3438e..6940136 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2016-02-08  Jeff Law  
+
+PR tree-optimization/65917
+   * gcc.dg/tree-ssa/20030922-2.c: No longer xfailed.
+
 2016-02-07  Jerry DeLisle  
 
PR fortran/50555
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c
index 6c133bd..16c79da 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c
@@ -20,8 +20,4 @@ rgn_rank (rtx insn1, rtx insn2)
 }
 
 /* There should be two IF conditionals.  */
-/* This now fails as it requires a very specific decision of DOM which
-   SSA name to record as a copy of the other when DOM derives copies
-   from temporary equivalences.  The heuristics there no longer do
-   the correct thing.  VRP still optimizes this testcase.  */
-/* { dg-final { scan-tree-dump-times "if " 2 "dom2" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "if " 2 "dom2" } } */
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index b690d92..f44ac13 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -917,6 +917,15 @@ record_temporary_equivalences (edge e,
   tree rhs = edge_info->rhs;
   record_equality (lhs, rhs, const_and_copies);
 
+  /* We already recorded that LHS = RHS, with canonicalization,
+value chain following, etc.
+
+We also want to return RHS = LHS, but without any canonicalization
+or value chain following.  */
+  if (TREE_CODE (rhs) == SSA_NAME)
+   const_and_copies->record_const_or_copy_raw (rhs, lhs,
+   SSA_NAME_VALUE (rhs));
+
   /* If LHS is an SSA_NAME and RHS is a constant integer and LHS was
 set via a widening type conversion, then we may be able to record
 additional equivalences.  */
@@ -1161,33 +1170,6 @@ record_cond (cond_equivalence *p,
 delete element;
 }
 
-/* Return the loop depth of the basic block of the defining statement of X.
-   This number should not be treated as absolutely correct because the loop
-   information may not be completely up-to-date when dom runs.  However, it
-   will be relatively correct, and as more passes are taught to keep loop info
-   up to date, the result will become more and more accurate.  */
-
-static int
-loop_depth_of_n

Re: [PR69634] fix debug_insn-inconsistent REG_N_CALLS_CROSSED

2016-02-08 Thread Jeff Law

On 02/06/2016 03:06 AM, Alexandre Oliva wrote:

The testcase has a debug insn referencing a pseudo right before an
insn that modifies the pseudo.

Without debug insns, REG_N_CALLS_CROSSED was zero for that pseudo, so
sched_analyze_reg added a dep between the pseudo setter and an earlier
(lib)call.

With debug insns, we miscomputed REG_N_CALLS_CROSSED as nonzero
because of the debug insn, and then no dep was added between the two
insns.  This was enough to change sched1's decisions about where to
place the pseudo setter.

REG_N_CALLS_CROSSED is computed by both regstat_bb_compute_ri and
regstat_bb_compute_calls_crossed, but although the former skipped
debug insns, the latter didn't.

Fixing this inconsistency was enough to fix the -fcompare-debug error.

Regstrapped on x86_64-linux-gnu and i686-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR target/69634
* regstat.c (regstat_bb_compute_calls_crossed): Disregard
debug insns.

for  gcc/testsuite/ChangeLog

PR target/69634
* gcc.dg/pr69634.c: New.

Code is fine.  Testcase needs tweaking per Uros's comments in the BZ:

  Please don't use explicit -m32. If the test is valid only for 32bit
  target, you should use "target ia32", otherwise just leave it out.
  32bit multilib tests will add -m32 automatically.

  Explicit -m32 will probably fail with x32, which is x86_64 target,
  too.

With the test fixed this is fine.

jeff


Re: [PATCH][P1 tree-optimization/68541] Add heuristics to path splitting

2016-02-08 Thread Jeff Law

On 02/06/2016 03:19 AM, Andreas Schwab wrote:

Jeff Law  writes:


diff --git a/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c
new file mode 100644
index 000..aeb926e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details " } */
+
+int
+foo(char *p, int n)
+{
+  int s = 0;
+  int i;
+
+  for (i = 0; i < n; i++) {
+if (p[i] >= 0)
+  s++;
+else
+  s--;
+  }
+
+  return s;
+}
+
+/* { dg-final { scan-tree-dump "appears to be a join point for if-convertable diamond" 
"split-paths" } } */


That fails on aarch64.

AH.  Depends chars being signed by default.  I'll fix it momentarily.

Thanks,
jeff



[PATCH] Fix split-path-2.c on targets with unsigned characters by default

2016-02-08 Thread Jeff Law


Andreas pointed out the test was failing on aarch64.  It turns out it's 
got to be failing on all targets where characters are unsigned by 
default as the test assumes characters are signed by default.  Opps.


The fix is pretty obvious.  Just make the array of chars be signed chars.

Installed after verifying x86-linux still passes and hand verification 
of aarch64 as well.


Sorry for the breakage.

jeff
commit b32ded8cd0158d88155116bde4685a36c4802878
Author: law 
Date:   Mon Feb 8 08:40:00 2016 +

PR tree-optimization/68541
* gcc.dg/tree-ssa/split-path-2.c: Make char array explicitly signed.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233208 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 6940136..6aa927d 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,8 @@
 2016-02-08  Jeff Law  
 
+   PR tree-optimization/68541
+   * gcc.dg/tree-ssa/split-path-2.c: Make char array explicitly signed.
+
 PR tree-optimization/65917
* gcc.dg/tree-ssa/20030922-2.c: No longer xfailed.
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c
index aeb926e..8f503f2 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c
@@ -2,7 +2,7 @@
 /* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details " } */
 
 int
-foo(char *p, int n)
+foo(signed char *p, int n)
 {
   int s = 0;
   int i;


PATCH] Fix PR 31531: A microoptimization of isnegative of signed integer

2016-02-08 Thread Hurugalawadi, Naveen
Hi,

Please find attached the patch that performs optimization on unsigned values.

Original fold-const part implemented in match.pd.

Please review the patch and let us know if it's OK?

Regression Tested on X86_64 with no regressions.

Thanks,
Naveen

ChangeLog:
* match.pd (cmp (convert (bit_not @0)) INTEGER_CST@1): New Simplifier.

Testsuite/ChangeLog:
* gcc.dg/pr31531.c : New testcase.diff --git a/gcc/match.pd b/gcc/match.pd
index 6c8ebd5..42f772b 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1452,6 +1452,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& (TYPE_SIZE (TREE_TYPE (@0)) == TYPE_SIZE (TREE_TYPE (@1
(view_convert @1)))
 
+/* Fold ((CAST)~X) op C as ((CAST)X) op' ~C, where op' is the
+   swapped comparison.  */
+(for cmp (tcc_comparison)
+ scmp (swapped_tcc_comparison)
+ (simplify
+  (cmp (convert (bit_not @0)) INTEGER_CST@1)
+  (if (TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@0))
+   && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1)))
+   (with { tree cst = fold_convert (TREE_TYPE (@0), @1); }
+(scmp @0 (bit_not { cst; }))
+
 /* Re-association barriers around constants and other re-association
barriers can be removed.  */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/pr31531.c b/gcc/testsuite/gcc.dg/pr31531.c
new file mode 100644
index 000..d687c91
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr31531.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-gimple" } */
+int isnegative_optimized_4 (unsigned int X)
+{
+  int result;
+  if ((~X) >> 31)
+result = 0;
+  else
+result = 1;
+  return result;
+}
+/* { dg-final { scan-tree-dump-times "0 != 0" 1 "gimple" } } */


Re: [PATCH, stage1] Better error recovery for merge-conflict markers

2016-02-08 Thread Bert Wesarg
David,

On Thu, Apr 9, 2015 at 10:29 AM, Bert Wesarg  wrote:
> Hi David,
>
>> Various tools that operate on source code files will inject markers
>> into them when an unfixable conflict occurs in a merger.
>>
>> There appears to be no blessed standard for these conflict markers,
>> but an ad-hoc convention is for 7 '<' , '=', or '>' characters at
>> the start of a line, followed optionally by a space and optional
>> text
>>
>> e.g.:
>> <<< HEAD
>> extern int some_var;
>> ===
>> extern short some_var;
>> >>> Some other branch
>>
>> This convention is followed by GNU patch:
>>   http://git.savannah.gnu.org/cgit/patch.git/tree/src/merge.c
>> by git:
>>
>> http://git.kernel.org/cgit/git/git.git/tree/Documentation/merge-config.txt
>> and by various other tools.
>
>
> if you read both of these tools carefully, you will notice an alternative
> conflict style (named 'diff3' in both of them), that includes a third
> section, the common pre-image. Here is an example:
>
> <<< HEAD
> extern int some_var;
> ||| merge base
> extern int var;
> ===
> extern short var;

 Some other branch
>
>
> Additionally, git supports a custom conflict-marker-size to change the
> default of 7 on a per file name (the conflict-marker-size attribute). So it
> may be worthwhile to support other sizes than 7 in this patch too.

you never commentewd on my mail, but I saw this now in trunk. I can
only repeat myself here.

Thanks.

Bert

>
> Bert
>


Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA

2016-02-08 Thread Richard Biener
On Fri, 5 Feb 2016, Vladimir Makarov wrote:

> On 02/05/2016 04:25 AM, Richard Biener wrote:
> > The following patch fixes the performance regression for 435.gromacs
> > on x86_64 with AVX2 (Haswell or bdver2) caused by
> > 
> > 2015-12-18  Andreas Krebbel  
> > 
> > * ira.c (ira_setup_alts): Move the scan for commutative modifier
> > to the first loop to make it work even with disabled alternatives.
> > 
> > which in itself is a desirable change giving the RA more freedom.
> > 
> > It turns out the fix makes an existing issue more severe in detecting
> > more swappable alternatives and thus exiting ira_setup_alts with
> > operands swapped in recog_data.  This seems to give a slight preference
> > to choose alternatives with the operands swapped (I didn't try to
> > investigate how IRA handles the "merged" alternative mask and
> > operand swapping in its further processing).
> Alternative mask excludes alternatives which will be definitely rejected in
> LRA.  This approach is to speed up LRA (a lot was done to speed up RA but
> still it consumes a big chunk of compiler time which is unusual for all
> compilers).
> 
> LRA and reload prefer insns without commutative operands swap when all other
> costs are the same.

Ok, so leaving operands swapped in ira_setup_alts will prefer the
swapped variants in case other costs are the same.

> > Of course previous RTL optimizers and canonicalization rules as well
> > as backend patterns are tuned towards the not swapped variant and thus
> > it happens doing more swaps ends up in slower code (I didn't closely
> > investigate).
> > 
> > So I tested the following patch which simply makes sure that
> > ira_setup_alts does not alter recog_data.
> > 
> > On a Intel Haswell machine I get (base is with the patch, peak is with
> > the above change reverted):
> > 
> >Estimated
> > Estimated
> >  Base Base   BasePeak Peak   Peak
> > Benchmarks  Ref.   Run Time Ratio   Ref.   Run Time Ratio
> > -- --  -  ---  -
> > -
> > 435.gromacs  7140264   27.1 S7140270
> > 26.5 S
> > 435.gromacs  7140264   27.1 *7140269
> > 26.6 S
> > 435.gromacs  7140263   27.1 S7140269
> > 26.5 *
> > ==
> > 435.gromacs  7140264   27.1 *7140269
> > 26.5 *
> > 
> > which means the patched result is even better than before Andreas
> > change.  Current trunk homes in at a Run Time of 321s (which is
> > the regression to fix).
>   Thanks for working on this, Richard.  It is not easy to find reasons for
> worse code on modern processors after such small change.  As RA is based on
> heuristics it hard to predict the change for a specific benchmark.  I remember
> I checked  Andreas patch on SPEC2000 in a hope that it also improves x86-64
> code but I did not see a difference.
> 
> It is even hard to say sometimes how a specific (non-heuristic) optimization
> will affect a specific benchmark performance when a lot of unknown (from
> alignments to CPU internals are involved).  An year ago I tried to use ML to
> choose best options.  I used a set of about 100 C benchmarks (and even more
> functions).  For practically every benchmark, I had an option modification to
> -Ofast resulting in faster code but ML prediction did not work at all.
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok for trunk?
> > 
> OK.  Thanks again.

Thanks.  Over the weekend I did a full 3-run SPEC 2k6 with the following
result.  Base is r231814 while peak is r231814 patched, flags are
-Ofast -march=native (on a Haswell machine).

  Estimated   
Estimated
Base Base   BasePeak Peak   Peak
Benchmarks  Ref.   Run Time Ratio   Ref.   Run Time Ratio
-- --  -  ---  -  
-
400.perlbench9770255   38.4 *9770251   
39.0 S
400.perlbench9770258   37.8 S9770250   
39.0 *
400.perlbench9770253   38.6 S9770250   
39.0 S
401.bzip29650407   23.7 *9650400   
24.1 *
401.bzip29650412   23.4 S9650417   
23.1 S
401.bzip29650396   24.4 S9650398   
24.3 S
403.gcc  8050252   31.9 S8050245   
32.9 S
403.gcc  8050240   33.6 S8050244   
32.9 *
403.gcc  8050242   33.2 *8050241   
33.4 S
429.mcf  9120243   37.6 S9120245   
37.3 S
429.mcf  9120224   40.7 S9120241   
37.8 *
429.mcf

Re: [PATCH PR68021]Don't add biv candidate if it's not incremented by a single stmt

2016-02-08 Thread Richard Biener
On Fri, Feb 5, 2016 at 8:20 PM, Bin Cheng  wrote:
> Hi,
> As reported by PR68021, there is an ivopt bug all the time.  As designed, 
> ivopt tries to identify and reuse induction variables in the original input.  
> Apparently we don't need to compute such original biv because the computation 
> is already there.  Every time an iv use is rewritten, ivopt checks if the 
> candidate is an original biv using below code:
>
>   /* An important special case -- if we are asked to express value of
>  the original iv by itself, just exit; there is no need to
>  introduce a new computation (that might also need casting the
>  variable to unsigned and back).  */
>   if (cand->pos == IP_ORIGINAL
>   && cand->incremented_at == use->stmt)
> {
>   enum tree_code stmt_code;
>
>   gcc_assert (is_gimple_assign (use->stmt));
>   gcc_assert (gimple_assign_lhs (use->stmt) == cand->var_after);
>
>   /* Check whether we may leave the computation unchanged.
>  This is the case only if it does not rely on other
>  computations in the loop -- otherwise, the computation
>  we rely upon may be removed in remove_unused_ivs,
>  thus leading to ICE.  */
>   stmt_code = gimple_assign_rhs_code (use->stmt);
>   if (stmt_code == PLUS_EXPR
>   || stmt_code == MINUS_EXPR
>   || stmt_code == POINTER_PLUS_EXPR)
> {
>   if (gimple_assign_rhs1 (use->stmt) == cand->var_before)
> op = gimple_assign_rhs2 (use->stmt);
>   else if (gimple_assign_rhs2 (use->stmt) == cand->var_before)
> op = gimple_assign_rhs1 (use->stmt);
>   else
> op = NULL_TREE;
> }
>   else
> op = NULL_TREE;
>
>   if (op && expr_invariant_in_loop_p (data->current_loop, op))
> return;
> }
>
> Note this code can only handle specific form biv, in which there is a single 
> explicit increment stmt in the form of "biv_after = biv_before + step".
>
> Unfortunately, in rare case like this, the biv is increased in two stmts, 
> like:
>   biv_x = biv_before + step_part_1;
>   biv_after = biv_x + step_part_2;
>
> That's why control flow goes to ICE point.  We should not fix code at the ICE 
> point because:
> 1) We shouldn't rewrite biv candidate.  Even there is no correctness issue, 
> it will introduce redundant code by rewriting it.
> 2) For non biv candidate, all the computation at ICE point has already been 
> done before at iv cost computation part.  In other words, if control flow 
> goes here, gcc_assert (comp != NULL" will be true.
>
> Back to this issue, there are two possible fixes.  First one is to specially 
> rewrite mentioned increment stmts into:
>   biv_after = biv_before + step
> This fix needs more change because we are already after candidate creation 
> point and need to do computation on ourself.
>
> Another fix is just don't add biv.  In this way, we check stricter condition 
> when adding biv candidate, guarantee control flow doesn't go to ICE point.  
> It won't cause worse code (Well, maybe a type conversion from unsigned to 
> signed) since we add exact the same candidate anyway (but not as a biv 
> candidate).  As a matter of fact, we create/use another candidate which has 
> the same {base, step} as the biv.  The computation of biv now becomes dead 
> code and will be removed by following passes.
>
> This patch fixes the issue by the 2nd method.  Bootstrap and test on x86_64 
> and AArch64 (test ongoing).  Is it OK if no failures?

Ok.

Thanks,
Richard.

> Thanks,
> bin
>
> 2016-02-04  Bin Cheng  
>
> PR tree-optimization/68021
> * tree-ssa-loop-ivopts.c (increment_stmt_for_biv_p): New function.
> (add_iv_candidate_for_biv, rewrite_use_nonlinear_expr): Call above
> function checking if stmt is increment stmt for biv.
>
> gcc/testsuite/ChangeLog
> 2016-02-04  Bin Cheng  
>
> PR tree-optimization/68021
> * gcc.dg/tree-ssa/pr68021.c: New test.
>


Re: RFA (convert): PATCH for c++/69631 (wrong overflow error with -fwrapv)

2016-02-08 Thread Richard Biener
On Fri, Feb 5, 2016 at 9:34 PM, Jason Merrill  wrote:
> The problem here was that the call to convert_to_integer_nofold was still
> pushing the conversion down into the multiply expression, and when we do the
> multiplication in unsigned short it overflows.  This patch fixes
> convert_to_integer_1 to not do any truncation distribution when !dofold.
> But that broke several testcases, because fold() doesn't ever try to do that
> distribution, and so was missing some optimizations revealed by the
> distribution.  So this patch changes cp_fold to redo the conversion with
> dofold enabled.  I also change the C++ convert() entry point to do folding
> conversion again, since a few places in convert_to_integer_1, and probably
> elsewhere in the back end, expect that.
>
> For this email I didn't re-indent the truncation distribution code in
> convert_to_integer_1 to make it easier to read; for what I check in should I
> leave it like this, re-indent it, or do something else like a goto to avoid
> the need for re-indentation?

re-indent is fine but you could also simply tail-duplicate (optimized)

  /* When parsing long initializers, we might end up with a lot of casts.
 Shortcut this.  */
  if (TREE_CODE (expr) == INTEGER_CST)
return fold_convert (type, expr);
  return build1 (CONVERT_EXPR, type, expr);

and thus do

   if (!dofold)
 return ...;

> Tested x86_64-pc-linux-gnu, OK for trunk?

Ok.

Note that I think we should move all the truncation distribution code elsewhere
(match.pd).  But that's probably for GCC 7.

Richard.


Re: PATCH] Fix PR 31531: A microoptimization of isnegative of signed integer

2016-02-08 Thread Senthil Kumar Selvaraj

Hurugalawadi, Naveen writes:

> Hi,
>
> Please find attached the patch that performs optimization on unsigned values.
>
> Original fold-const part implemented in match.pd.
>
> Please review the patch and let us know if it's OK?
>
> Regression Tested on X86_64 with no regressions.
>
> Thanks,
> Naveen
>
> ChangeLog:
> * match.pd (cmp (convert (bit_not @0)) INTEGER_CST@1): New Simplifier.
>
> Testsuite/ChangeLog:
> * gcc.dg/pr31531.c : New testcase.

Could you please add dg-require-effective-target int32 to the test to
avoid breaking targets with smaller ints (right shift by 31 bits)?

Regards
Senthil


Re: [PATCH] Fix PR 69282: ICE on 464.h264ref at -O3

2016-02-08 Thread Richard Biener
On Mon, Feb 8, 2016 at 7:22 AM, Andrew Pinski  wrote:
> Hi,
>   The problem is that even though expand knows how to expand
> VEC_COND when there is no vcond_mask pattern,
> expand_vec_cond_expr_p returns false.  So this patch allows
> expand_vec_cond_expr_p to try the next part of the function if there
> is no vcond_mask pattern.
>
> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

Please combine the two ifs.

   if (VECTOR_BOOLEAN_TYPE_P (cmp_op_type)
  && get_vcond_mask_icode (TYPE_MODE (value_type),
+   TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing)
return true;

Ok with that change.

Thanks,
Richard.

> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * optabs-tree.c (expand_vec_cond_expr_p): Don't early return if
> get_vcond_mask_icode returns false.
>
> Testsuite/ChangeLog:
> * gcc.c-torture/compile/20160205-1.c: New testcase.


Re: PATCH] Fix PR 31531: A microoptimization of isnegative of signed integer

2016-02-08 Thread Richard Biener
On Mon, Feb 8, 2016 at 10:16 AM, Senthil Kumar Selvaraj
 wrote:
>
> Hurugalawadi, Naveen writes:
>
>> Hi,
>>
>> Please find attached the patch that performs optimization on unsigned values.
>>
>> Original fold-const part implemented in match.pd.
>>
>> Please review the patch and let us know if it's OK?
>>
>> Regression Tested on X86_64 with no regressions.
>>
>> Thanks,
>> Naveen
>>
>> ChangeLog:
>> * match.pd (cmp (convert (bit_not @0)) INTEGER_CST@1): New Simplifier.
>>
>> Testsuite/ChangeLog:
>> * gcc.dg/pr31531.c : New testcase.
>
> Could you please add dg-require-effective-target int32 to the test to
> avoid breaking targets with smaller ints (right shift by 31 bits)?

+/* Fold ((CAST)~X) op C as ((CAST)X) op' ~C, where op' is the
+   swapped comparison.  */
+(for cmp (tcc_comparison)
+ scmp (swapped_tcc_comparison)
+ (simplify
+  (cmp (convert (bit_not @0)) INTEGER_CST@1)
+  (if (TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@0))
+   && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1)))
+   (with { tree cst = fold_convert (TREE_TYPE (@0), @1); }
+(scmp @0 (bit_not { cst; }))

I fail to see how this pattern is a good fit for the testcase, can you
try explaining?
I'm wondering that you require 'type's precision (a bool!) to be equal
to that of @0.
I'm also failing to see why you can't enhance the existing

/* Fold ~X op C as X op' ~C, where op' is the swapped comparison.  */
(for cmp (simple_comparison)
 scmp (swapped_simple_comparison)
 (simplify
  (cmp (bit_not@2 @0) CONSTANT_CLASS_P@1)
  (if (single_use (@2)
   && (TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST))
   (scmp @0 (bit_not @1)

Thanks,
Richard.

> Regards
> Senthil


[Ada] Fix oversight in latest create_var_decl change

2016-02-08 Thread Eric Botcazou
It causes the global uninitialized variables to be put in DATA instead of (the 
equivalent of) BSS on some platforms like AIX.

Fixed thusly, tested on x86_64-suse-linux, applied on the mainline.


2016-02-08  Eric Botcazou  

* gcc-interface/utils.c (create_var_decl): Set again DECL_COMMON and
DECL_IGNORED_P last.

-- 
Eric BotcazouIndex: gcc-interface/utils.c
===
--- gcc-interface/utils.c	(revision 233194)
+++ gcc-interface/utils.c	(working copy)
@@ -2484,6 +2484,24 @@ create_var_decl (tree name, tree asm_nam
   DECL_ARTIFICIAL (var_decl) = artificial_p;
   DECL_EXTERNAL (var_decl) = extern_flag;
 
+  TREE_CONSTANT (var_decl) = constant_p;
+  TREE_READONLY (var_decl) = const_flag;
+
+  /* The object is public if it is external or if it is declared public
+ and has static storage duration.  */
+  TREE_PUBLIC (var_decl) = extern_flag || (public_flag && static_storage);
+
+  /* We need to allocate static storage for an object with static storage
+ duration if it isn't external.  */
+  TREE_STATIC (var_decl) = !extern_flag && static_storage;
+
+  TREE_SIDE_EFFECTS (var_decl)
+= TREE_THIS_VOLATILE (var_decl)
+= TYPE_VOLATILE (type) | volatile_flag;
+
+  if (TREE_SIDE_EFFECTS (var_decl))
+TREE_ADDRESSABLE (var_decl) = 1;
+
   /* Ada doesn't feature Fortran-like COMMON variables so we shouldn't
  try to fiddle with DECL_COMMON.  However, on platforms that don't
  support global BSS sections, uninitialized global variables would
@@ -2508,24 +2526,6 @@ create_var_decl (tree name, tree asm_nam
 	 != null_pointer_node))
 DECL_IGNORED_P (var_decl) = 1;
 
-  TREE_CONSTANT (var_decl) = constant_p;
-  TREE_READONLY (var_decl) = const_flag;
-
-  /* The object is public if it is external or if it is declared public
- and has static storage duration.  */
-  TREE_PUBLIC (var_decl) = extern_flag || (public_flag && static_storage);
-
-  /* We need to allocate static storage for an object with static storage
- duration if it isn't external.  */
-  TREE_STATIC (var_decl) = !extern_flag && static_storage;
-
-  TREE_SIDE_EFFECTS (var_decl)
-= TREE_THIS_VOLATILE (var_decl)
-= TYPE_VOLATILE (type) | volatile_flag;
-
-  if (TREE_SIDE_EFFECTS (var_decl))
-TREE_ADDRESSABLE (var_decl) = 1;
-
   /* ??? Some attributes cannot be applied to CONST_DECLs.  */
   if (TREE_CODE (var_decl) == VAR_DECL)
 process_attributes (&var_decl, &attr_list, true, gnat_node);


[PATCH, PR69599] Fix GOMP/GOACC_parallel optimization in ipa-pta

2016-02-08 Thread Tom de Vries

Hi,

when compiling the fipa-pta tests in the libgomp testsuite 
(omp-nested-2.c, pr46032.c) with -flto -flto-partition=max, the tests 
fail in execution (PR69599).


The problem is related to the GOMP/GOACC_parallel optimization we do in 
fipa-pta, where we interpret a call GOMP_parallel (&foo._0, data) as a 
call foo._0 (data).


The problem is that this optimization is only legal in lto if both:
- foo containing the call GOMP_parallel (&foo._0, data) and
- foo._0
are contained in the same partition.

In the case of -flto-partition=max, foo is contained in it's own 
partition, and foo._0 is contained in another partition.  This means the 
data argument to the GOMP_parallel call appears unused, and the setting 
of the argument is optimized away, which causes the execution failure.


This patch fixes that by testing if foo and foo._0 are part of the same 
partition.


[ Note that the node_address_taken change in the patch has no effect, 
since nonlocal_p already tests for used_from_other_partition. But I 
thought it was clearer to state the conditions under which we are 
allowed to ignore node->address_taken explicitly. ]


Bootstrapped and reg-tested on x86_64.

Build for nvidia accelerator and reg-tested libgomp with various lto 
settings.


OK for trunk, stage4?

Thanks,
- Tom
Fix GOMP/GOACC_parallel optimization in ipa-pta

2016-02-08  Tom de Vries  

	PR tree-optimization/69599
	* tree-ssa-structalias.c (fndecl_maybe_in_other_partition): New
	function.
	(find_func_aliases_for_builtin_call, find_func_clobbers)
	(ipa_pta_execute):  Handle case that foo and foo._0 are not in same lto
	partition.

	* testsuite/libgomp.c/omp-nested-3.c: New test.
	* testsuite/libgomp.c/pr46032-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/kernels-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/parallel-2.c: New test.

---
 gcc/tree-ssa-structalias.c | 53 ++
 libgomp/testsuite/libgomp.c/omp-nested-3.c |  4 ++
 libgomp/testsuite/libgomp.c/pr46032-2.c|  4 ++
 .../libgomp.oacc-c-c++-common/kernels-2.c  |  4 ++
 .../libgomp.oacc-c-c++-common/parallel-2.c |  4 ++
 5 files changed, 60 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index e7d0797..feebd8d 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4162,6 +4162,18 @@ find_func_aliases_for_call_arg (varinfo_t fi, unsigned index, tree arg)
 process_constraint (new_constraint (lhs, *rhsp));
 }
 
+/* Return true if FNDECL may be part of another lto partition.  */
+
+static bool
+fndecl_maybe_in_other_partition (tree fndecl)
+{
+  cgraph_node *fn_node = cgraph_node::get (fndecl);
+  if (fn_node == NULL)
+return true;
+
+  return fn_node->in_other_partition;
+}
+
 /* Create constraints for the builtin call T.  Return true if the call
was handled, otherwise false.  */
 
@@ -4537,6 +4549,11 @@ find_func_aliases_for_builtin_call (struct function *fn, gcall *t)
 	  tree fnarg = gimple_call_arg (t, fnpos);
 	  gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR);
 	  tree fndecl = TREE_OPERAND (fnarg, 0);
+	  if (in_lto_p
+		  && fndecl_maybe_in_other_partition (fndecl))
+		/* Fallthru to general call handling.  */
+		break;
+
 	  tree arg = gimple_call_arg (t, argpos);
 
 	  varinfo_t fi = get_vi_for_tree (fndecl);
@@ -5113,6 +5130,11 @@ find_func_clobbers (struct function *fn, gimple *origt)
 	  tree fnarg = gimple_call_arg (t, fnpos);
 	  gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR);
 	  tree fndecl = TREE_OPERAND (fnarg, 0);
+	  if (in_lto_p
+		  && fndecl_maybe_in_other_partition (fndecl))
+		/* Fallthru to general call handling.  */
+		break;
+
 	  varinfo_t cfi = get_vi_for_tree (fndecl);
 
 	  tree arg = gimple_call_arg (t, argpos);
@@ -7505,9 +7527,14 @@ ipa_pta_execute (void)
 	 address_taken bit for function foo._0, which would make it non-local.
 	 But for the purpose of ipa-pta, we can regard the run_on_threads call
 	 as a local call foo._0 (data),  so we ignore address_taken on nodes
-	 with parallelized_function set.  */
-  bool node_address_taken = (node->address_taken
- && !node->parallelized_function);
+	 with parallelized_function set.
+	 Note: this is only safe, if foo and foo._0 are in the same lto
+	 partition.  */
+  bool node_address_taken = ((node->parallelized_function
+  && !(in_lto_p
+   && node->used_from_other_partition))
+ ? false
+ : node->address_taken);
 
   /* For externally visible or attribute used annotated functions use
 	 local constraints for their arguments.
@@ -7676,12 +7703,20 @@ ipa_pta_execute (void)
 		continue;
 
 	  /* Handle direct calls to functions with body.  */
-	  if (gimple_call_builtin_p (stmt, BUILT_IN_GOMP_PARALLEL))
-		decl = TREE_OPERAND (gimple_call_arg (stmt, 0), 0);
-	  else if (gimple_call_builtin_p (stmt, BUILT_IN_GOACC_PARALLEL))
-		decl = TREE_OPERAND

[PATCH] Avoid bugs like PR68273 to trigger

2016-02-08 Thread Richard Biener

This makes it less likely (for example through the PRE path) to trigger
target bugs like PR68273 where targets use type alignment of call
arguments to decide on the ABI.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

IMHO targets should still be fixed.

Richard.

2016-02-08  Richard Biener  
Jeff Law  

PR target/68273
* tree-ssanames.c (make_ssa_name_fn): Always use unqualified
types for anonymous SSA names.

* gcc.target/mips/pr68273.c: New testcase.

Index: gcc/tree-ssanames.c
===
--- gcc/tree-ssanames.c (revision 233208)
+++ gcc/tree-ssanames.c (working copy)
@@ -286,7 +286,7 @@ make_ssa_name_fn (struct function *fn, t
 
   if (TYPE_P (var))
 {
-  TREE_TYPE (t) = var;
+  TREE_TYPE (t) = TYPE_MAIN_VARIANT (var);
   SET_SSA_NAME_VAR_OR_IDENTIFIER (t, NULL_TREE);
 }
   else
Index: gcc/testsuite/gcc.target/mips/pr68273.c
===
--- gcc/testsuite/gcc.target/mips/pr68273.c (revision 0)
+++ gcc/testsuite/gcc.target/mips/pr68273.c (revision 0)
@@ -0,0 +1,79 @@
+/* { dg-do compile } */
+/* { dg-options "-w -fdump-rtl-expand" } */
+/* { dg-skip-if "" { *-*-* }  { } { "-O2" } } */
+
+extern char errbuf[];
+typedef struct Symbol
+{
+  char *name;
+}
+Symbol;
+typedef struct Tnode
+{
+}
+Tnode;
+typedef union Value
+{
+  long long i;
+}
+Value;
+typedef struct Entry
+{
+}
+Entry;
+typedef struct Table
+{
+}
+Table;
+typedef struct Node
+{
+  Tnode *typ;
+  int sto;
+  Value val;
+}
+Node;
+struct Scope
+{
+  Table *table;
+} *sp;
+static Node op (Node);
+Entry *p, *q;
+union YYSTYPE
+{
+  Symbol *sym;
+  Node rec;
+};
+typedef union YYSTYPE YYSTYPE;
+typedef short int yytype_int16;
+
+int
+yyparse (void)
+{
+  YYSTYPE yyval;
+  int yyn;
+  YYSTYPE *yyvsp;
+  while (1)
+{
+  if (yyn == 34)
+   {
+ if ((yyvsp[-1].rec).sto & 0x10)
+   sprintf (errbuf, "invalid typedef qualifier for '%s'",
+(yyvsp[0].sym)->name);
+ p = enter (sp->table, (yyvsp[0].sym));
+   }
+  else
+   op ((yyvsp[0].rec));
+  *++yyvsp = yyval;
+}
+}
+
+static Node
+op (Node q)
+{
+  integer (q.typ);
+  mgtype (q.typ);
+}
+
+
+/* { dg-final { scan-rtl-dump-times "\\\(set \\\(reg:SI 5 \\\$5\\\)" 2 
"expand" } }  */
+/* { dg-final { scan-rtl-dump-times "\\\(set \\\(reg:SI 6 \\\$6\\\)" 1 
"expand" } }  */


Re: [PATCH] Avoid bugs like PR68273 to trigger

2016-02-08 Thread Eric Botcazou
> This makes it less likely (for example through the PRE path) to trigger
> target bugs like PR68273 where targets use type alignment of call
> arguments to decide on the ABI.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Thanks.  I think that we can also avoid this issue by teaching SRA not to 
over-align scalar types, I don't see any point in doing that.  It's done in:

  misalign = (misalign + offset) & (align - 1);
  if (misalign != 0)
align = (misalign & -misalign);
  if (align != TYPE_ALIGN (exp_type))
exp_type = build_aligned_type (exp_type, align);

  mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off);

-- 
Eric Botcazou


[PATCH, PR59627, c++] Handle DECL_OMP_DECLARE_REDUCTION in discriminator_for_local_entity

2016-02-08 Thread Tom de Vries

Hi,

When running libgomp testsuite with -flto, we run into an ICE in the 
udr-*.C tests.


A minimal testcase is in listed in PR59627:
...
struct A { int i; };

void foo()
{
  A a;
  #pragma omp declare reduction (+: A: omp_out.i += omp_in.i)
  #pragma omp parallel reduction (+: a)
  ;
}
...

The ICE with backtrace looks as follows:
...
src/libgomp/testsuite/libgomp.c++/udr-1.C:56:13: internal compiler 
error: in discriminator_for_local_entity, at cp/mangle.c:1762

0x9ad52a discriminator_for_local_entity
src/gcc/cp/mangle.c:1762
0x9ad8a5 write_local_name
src/gcc/cp/mangle.c:1850
0x9a7975 write_name
src/gcc/cp/mangle.c:882
0x9a71c0 write_encoding
src/gcc/cp/mangle.c:744
0x9a6c61 write_mangled_name
src/gcc/cp/mangle.c:709
0x9b6d09 mangle_decl_string
src/gcc/cp/mangle.c:3509
0x9b6d4f get_mangled_id
src/gcc/cp/mangle.c:3531
0x9b71d7 mangle_decl(tree_node*)
src/gcc/cp/mangle.c:3598
0x14da287 decl_assembler_name(tree_node*)
src/gcc/tree.c:670
0x14edf74 assign_assembler_name_if_neeeded(tree_node*)
src/gcc/tree.c:5917
0x14ee0cc free_lang_data_in_cgraph
src/gcc/tree.c:5972
0x14ee280 free_lang_data
src/gcc/tree.c:6014
0x14ee320 execute
src/gcc/tree.c:6063
Please submit a full bug report
...

The problem seems to be that we're trying to get a discriminator (the 
lexical ordinal of a var among entities with the same name in the same 
function) for a DECL_OMP_DECLARE_REDUCTION_P function, which is not 
handled in discriminator_for_local_entity.


For the test-case above the declared reduction is shown as:
...
  void omp declare reduction operator+~1A (struct A &);
...

AFAIU, those DECL_OMP_DECLARE_REDUCTION_P decls are unique in a 
function, so I'd say we can simply return '0'.


Bootstrapped and reg-tested on x86_64.

OK for trunk, stage1?

Thanks,
- Tom
Handle DECL_OMP_DECLARE_REDUCTION in discriminator_for_local_entity

2016-02-08  Tom de Vries  

	PR c++/59627
	* mangle.c (discriminator_for_local_entity): Handle
	DECL_OMP_DECLARE_REDUCTION_P function.

	* testsuite/libgomp.c++/udr-20.C: New test.

---
 gcc/cp/mangle.c| 3 +++
 libgomp/testsuite/libgomp.c++/udr-20.C | 4 
 2 files changed, 7 insertions(+)

diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 410c7f4..e83fc0c 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -1758,6 +1758,9 @@ discriminator_for_local_entity (tree entity)
 
   return local_class_index (entity);
 }
+  else if (TREE_CODE (entity) == FUNCTION_DECL
+	   && DECL_OMP_DECLARE_REDUCTION_P (entity))
+return 0;
   else
 gcc_unreachable ();
 }
diff --git a/libgomp/testsuite/libgomp.c++/udr-20.C b/libgomp/testsuite/libgomp.c++/udr-20.C
new file mode 100644
index 000..2aaf618
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/udr-20.C
@@ -0,0 +1,4 @@
+// { dg-do run { target lto } }
+// { dg-additional-options "-flto" }
+
+#include "udr-1.C"


Re: [PATCH] Avoid bugs like PR68273 to trigger

2016-02-08 Thread Richard Biener
On Mon, 8 Feb 2016, Eric Botcazou wrote:

> > This makes it less likely (for example through the PRE path) to trigger
> > target bugs like PR68273 where targets use type alignment of call
> > arguments to decide on the ABI.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> 
> Thanks.  I think that we can also avoid this issue by teaching SRA not to 
> over-align scalar types, I don't see any point in doing that.  It's done in:
> 
>   misalign = (misalign + offset) & (align - 1);
>   if (misalign != 0)
> align = (misalign & -misalign);
>   if (align != TYPE_ALIGN (exp_type))
> exp_type = build_aligned_type (exp_type, align);
> 
>   mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off);

No, that's not over-aliging a salar type, that's preserving alignment
information on the memory reference.  What would need to be fixed
here is to see where that 'mem_ref' is assigned to a SSA name and
avoid making that SSA name have an over-aligned type.

Richard.


Re: [PATCH] Avoid bugs like PR68273 to trigger

2016-02-08 Thread Eric Botcazou
> No, that's not over-aliging a salar type, that's preserving alignment
> information on the memory reference.

What would we lose exactly by lowering the alignment to that of the type?
What's the point in knowing that a 32-bit integer is 64-bit aligned at the 
GIMPLE level?

-- 
Eric Botcazou


[PATCH, PR69707] Handle -fdiagnostics-color in lto

2016-02-08 Thread Tom de Vries

Hi,

when running libgomp.oacc-c-c++-common/parallel-dims.c with -flto 
-fno-use-linker-plugin, we run into a failing 'test for excess errors'.


The problem is that while -fdiagnostics-color=never is passed to gcc, 
it's not propagated to lto1, and the error message is annotated with 
color information, which confuses the test for excess errors.


This patch fixes the problem by making sure that -fdiagnostics-color is 
propagated to lto1, in the same way that -fdiagnostics-show-caret is 
propagated to lto1.


Bootstrapped and reg-tested on x86_64.

OK for trunk, stage1?

Thanks,
- Tom
Handle -fdiagnostics-color in lto

2016-02-08  Tom de Vries  

	PR lto/69707
	* common.opt (fdiagnostics-color=): Remove Driver flag.
	* lto-wrapper.c (merge_and_complain, append_compiler_options): Handle
	OPT_fdiagnostics_color_.

	* testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c: New test.

---
 gcc/common.opt|  2 +-
 gcc/lto-wrapper.c |  2 ++
 .../libgomp.oacc-c-c++-common/parallel-dims-2.c   | 19 +++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 520fa9c..a740dcb 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1153,7 +1153,7 @@ Common Alias(fdiagnostics-color=,always,never)
 ;
 
 fdiagnostics-color=
-Driver Common Joined RejectNegative Var(flag_diagnostics_show_color) Enum(diagnostic_color_rule) Init(DIAGNOSTICS_COLOR_NO)
+Common Joined RejectNegative Var(flag_diagnostics_show_color) Enum(diagnostic_color_rule) Init(DIAGNOSTICS_COLOR_NO)
 -fdiagnostics-color=[never|always|auto]	Colorize diagnostics.
 
 ; Required for these enum values.
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index ced6f2f..484dbc1 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -232,6 +232,7 @@ merge_and_complain (struct cl_decoded_option **decoded_options,
 	break;
 
 	  /* Fallthru.  */
+	case OPT_fdiagnostics_color_:
 	case OPT_fdiagnostics_show_caret:
 	case OPT_fdiagnostics_show_option:
 	case OPT_fdiagnostics_show_location_:
@@ -497,6 +498,7 @@ append_compiler_options (obstack *argv_obstack, struct cl_decoded_option *opts,
 	 on any CL_TARGET flag and a few selected others.  */
   switch (option->opt_index)
 	{
+	case OPT_fdiagnostics_color_:
 	case OPT_fdiagnostics_show_caret:
 	case OPT_fdiagnostics_show_option:
 	case OPT_fdiagnostics_show_location_:
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c
new file mode 100644
index 000..eea8c7e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c
@@ -0,0 +1,19 @@
+/* { dg-do run { target { openacc_nvidia_accel_selected && lto } } } */
+/* { dg-additional-options "-flto -fno-use-linker-plugin" } */
+
+/* Worker and vector size checks.  Picked an outrageously large
+   value.  */
+
+int main ()
+{
+#pragma acc parallel num_workers (2<<20) /* { dg-error "using num_workers" } */
+  {
+  }
+
+#pragma acc parallel vector_length (2<<20) /* { dg-error "using vector_length" } */
+  {
+  }
+
+  return 0;
+}
+


[patch] Fix SSA_NAME_OCCURS_IN_ABNORMAL_PHI issue in backprop

2016-02-08 Thread Eric Botcazou
Hi,

the Ada runtime fails to build for platforms still using the old SJLJ scheme 
because of the new GIMPLE backprop pass (hence it's a regression):

eric@polaris:~/build/gcc/native> gcc/gnat1 -quiet a-ncelfu.ads -gnatpg -O -I 
~/svn/gcc/gcc/ada/
a-ngcefu.adb: In function 'Ada.Numerics.Complex_Elementary_Functions.Sqrt':
a-ngcefu.adb:16:4: error: SSA_NAME_OCCURS_IN_ABNORMAL_PHI should be set
for SSA_NAME: _28 in statement:
xr_3(ab) = PHI 
PHI argument
_28
for PHI node
xr_3(ab) = PHI 
+===GNAT BUG DETECTED==+
| 6.0.0 20160206 (experimental) [trunk revision 233194] (x86_64-suse-linux) 
GCC error:|
| verify_ssa failed|
| Error detected around a-ngcefu.adb:16:4  

The problem is that the pass propagates the source RHS (_28) of:

  xr_29(ab) = ABS_EXPR <_28>;

into a PHI note using xr_29 along an AB edge so the above check triggers.

I think replacing such an AB SSA_NAME is problematic in any case: if the RHS 
is not AB, then the check triggers; if the RHS is also AB, then we may create 
overlapping live ranges and the compilation will abort later.  We can probably 
create a new SSA_NAME but it's not really the spirit of the backprop pass and 
I'm not sure it's worth the hassle, hence the minimal attached fixlet.

Tested on x86_64-suse-linux, OK for the mainline?


2016-02-08  Eric Botcazou  

* gimple-ssa-backprop.c (optimize_phi): Do not replace an argument
with SSA_NAME_OCCURS_IN_ABNORMAL_PHI set.

-- 
Eric BotcazouIndex: gimple-ssa-backprop.c
===
--- gimple-ssa-backprop.c	(revision 233194)
+++ gimple-ssa-backprop.c	(working copy)
@@ -840,13 +840,14 @@ backprop::optimize_phi (gphi *phi, tree
   bool replaced = false;
   FOR_EACH_PHI_ARG (use, phi, oi, SSA_OP_USE)
 	{
-	  tree new_arg = strip_sign_op (USE_FROM_PTR (use));
-	  if (new_arg)
+	  tree arg = USE_FROM_PTR (use);
+	  tree new_arg = strip_sign_op (arg);
+	  if (new_arg && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (arg))
 	{
 	  if (!replaced)
 		prepare_change (var);
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-		note_replacement (phi, USE_FROM_PTR (use), new_arg);
+		note_replacement (phi, arg, new_arg);
 	  replace_exp (use, new_arg);
 	  replaced = true;
 	}


Re: [PATCH, PR69707] Handle -fdiagnostics-color in lto

2016-02-08 Thread Jakub Jelinek
On Mon, Feb 08, 2016 at 11:34:39AM +0100, Tom de Vries wrote:
> Hi,
> 
> when running libgomp.oacc-c-c++-common/parallel-dims.c with -flto
> -fno-use-linker-plugin, we run into a failing 'test for excess errors'.
> 
> The problem is that while -fdiagnostics-color=never is passed to gcc, it's
> not propagated to lto1, and the error message is annotated with color
> information, which confuses the test for excess errors.
> 
> This patch fixes the problem by making sure that -fdiagnostics-color is
> propagated to lto1, in the same way that -fdiagnostics-show-caret is
> propagated to lto1.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for trunk, stage1?

Doesn't that mean diagnostics from the driver itself will no longer honor
that option when deciding if the diagnostics should be colorized or not?

Jakub


Re: [PATCH] Avoid bugs like PR68273 to trigger

2016-02-08 Thread Richard Biener
On Mon, 8 Feb 2016, Eric Botcazou wrote:

> > No, that's not over-aliging a salar type, that's preserving alignment
> > information on the memory reference.
> 
> What would we lose exactly by lowering the alignment to that of the type?
> What's the point in knowing that a 32-bit integer is 64-bit aligned at the 
> GIMPLE level?

It helps vectorization.  This was specifically introduced to fix a
regression on powerpc.

See PR65310.

Richard.


Re: [PATCH, PR69599] Fix GOMP/GOACC_parallel optimization in ipa-pta

2016-02-08 Thread Richard Biener
On Mon, 8 Feb 2016, Tom de Vries wrote:

> Hi,
> 
> when compiling the fipa-pta tests in the libgomp testsuite (omp-nested-2.c,
> pr46032.c) with -flto -flto-partition=max, the tests fail in execution
> (PR69599).
> 
> The problem is related to the GOMP/GOACC_parallel optimization we do in
> fipa-pta, where we interpret a call GOMP_parallel (&foo._0, data) as a call
> foo._0 (data).
> 
> The problem is that this optimization is only legal in lto if both:
> - foo containing the call GOMP_parallel (&foo._0, data) and
> - foo._0
> are contained in the same partition.
> 
> In the case of -flto-partition=max, foo is contained in it's own partition,
> and foo._0 is contained in another partition.  This means the data argument to
> the GOMP_parallel call appears unused, and the setting of the argument is
> optimized away, which causes the execution failure.
> 
> This patch fixes that by testing if foo and foo._0 are part of the same
> partition.
> 
> [ Note that the node_address_taken change in the patch has no effect, since
> nonlocal_p already tests for used_from_other_partition. But I thought it was
> clearer to state the conditions under which we are allowed to ignore
> node->address_taken explicitly. ]
> 
> Bootstrapped and reg-tested on x86_64.
> 
> Build for nvidia accelerator and reg-tested libgomp with various lto settings.
> 
> OK for trunk, stage4?

I don't like the in_lto_p checks, why's the check not working
for non-LTO?

Thanks,
Richard.

> 
> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [Patch AArch64] GCC 6 regression in vector performance. - Fix vector initialization to happen with lane load instructions.

2016-02-08 Thread James Greenhalgh
On Tue, Feb 02, 2016 at 10:29:29AM +, James Greenhalgh wrote:
> On Wed, Jan 20, 2016 at 03:22:11PM +, James Greenhalgh wrote:
> > 
> > Hi,
> > 
> > In a number of cases where we try to create vectors we end up spilling to 
> > the
> > stack and then filling. This is one example distilled from a couple of
> > micro-benchmrks where the issue shows up. The reason for the extra cost
> > in this case is the unnecessary use of the stack. The patch attempts to
> > finesse this by using lane loads or vector inserts to produce the right
> > results.
> > 
> > This patch is mostly Ramana's work, I've just cleaned it up a little.
> > 
> > This has been in a number of our trees lately, and we haven't seen any
> > regressions. I've also bootstrapped and tested it, and run a set of
> > benchmarks to show no regressions on Cortex-A57 or Cortex-A53.
> > 
> > The patch fixes some regressions caused by the more agressive vectorization
> > in GCC6, so I'd like to propose it to go in even though we are in Stage 4.
> > 
> > OK?
> 
> *Ping*

*ping^2*

Cheers,
James

> > 2016-01-20  James Greenhalgh  
> > Ramana Radhakrishnan  
> > 
> > * config/aarch64/aarch64.c (aarch64_expand_vector_init): Refactor,
> > always use lane loads to construct non-constant vectors.
> > 
> > gcc/testsuite/
> > 
> > 2016-01-20  James Greenhalgh  
> > Ramana Radhakrishnan  
> > 
> > * gcc.target/aarch64/vector_initialization_nostack.c: New.
> > 
> 
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index 03bc1b9..3787b38 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -10985,28 +10985,37 @@ aarch64_simd_make_constant (rtx vals)
> >  return NULL_RTX;
> >  }
> >  
> > +/* Expand a vector initialisation sequence, such that TARGET is
> > +   initialised to contain VALS.  */
> > +
> >  void
> >  aarch64_expand_vector_init (rtx target, rtx vals)
> >  {
> >machine_mode mode = GET_MODE (target);
> >machine_mode inner_mode = GET_MODE_INNER (mode);
> > +  /* The number of vector elements.  */
> >int n_elts = GET_MODE_NUNITS (mode);
> > +  /* The number of vector elements which are not constant.  */
> >int n_var = 0;
> >rtx any_const = NULL_RTX;
> > +  /* The first element of vals.  */
> > +  rtx v0 = XVECEXP (vals, 0, 0);
> >bool all_same = true;
> >  
> > +  /* Count the number of variable elements to initialise.  */
> >for (int i = 0; i < n_elts; ++i)
> >  {
> >rtx x = XVECEXP (vals, 0, i);
> > -  if (!CONST_INT_P (x) && !CONST_DOUBLE_P (x))
> > +  if (!(CONST_INT_P (x) || CONST_DOUBLE_P (x)))
> > ++n_var;
> >else
> > any_const = x;
> >  
> > -  if (i > 0 && !rtx_equal_p (x, XVECEXP (vals, 0, 0)))
> > -   all_same = false;
> > +  all_same &= rtx_equal_p (x, v0);
> >  }
> >  
> > +  /* No variable elements, hand off to aarch64_simd_make_constant which 
> > knows
> > + how best to handle this.  */
> >if (n_var == 0)
> >  {
> >rtx constant = aarch64_simd_make_constant (vals);
> > @@ -11020,14 +11029,15 @@ aarch64_expand_vector_init (rtx target, rtx vals)
> >/* Splat a single non-constant element if we can.  */
> >if (all_same)
> >  {
> > -  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, 0));
> > +  rtx x = copy_to_mode_reg (inner_mode, v0);
> >aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x));
> >return;
> >  }
> >  
> > -  /* Half the fields (or less) are non-constant.  Load constant then 
> > overwrite
> > - varying fields.  Hope that this is more efficient than using the 
> > stack.  */
> > -  if (n_var <= n_elts/2)
> > +  /* Initialise a vector which is part-variable.  We want to first try
> > + to build those lanes which are constant in the most efficient way we
> > + can.  */
> > +  if (n_var != n_elts)
> >  {
> >rtx copy = copy_rtx (vals);
> >  
> > @@ -11054,31 +11064,21 @@ aarch64_expand_vector_init (rtx target, rtx vals)
> >   XVECEXP (copy, 0, i) = subst;
> > }
> >aarch64_expand_vector_init (target, copy);
> > +}
> >  
> > -  /* Insert variables.  */
> > -  enum insn_code icode = optab_handler (vec_set_optab, mode);
> > -  gcc_assert (icode != CODE_FOR_nothing);
> > +  /* Insert the variable lanes directly.  */
> >  
> > -  for (int i = 0; i < n_elts; i++)
> > -   {
> > - rtx x = XVECEXP (vals, 0, i);
> > - if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
> > -   continue;
> > - x = copy_to_mode_reg (inner_mode, x);
> > - emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
> > -   }
> > -  return;
> > -}
> > +  enum insn_code icode = optab_handler (vec_set_optab, mode);
> > +  gcc_assert (icode != CODE_FOR_nothing);
> >  
> > -  /* Construct the vector in memory one field at a time
> > - and load the whole vector.  */
> > -  rtx mem = assign_stack_temp (mode, GET_MODE_SIZE (mode));
> >for (int i = 

Re: [AArch64] Remove AARCH64_EXTRA_TUNE_RECIP_SQRT from Cortex-A57 tuning

2016-02-08 Thread James Greenhalgh
On Mon, Feb 01, 2016 at 02:00:01PM +, James Greenhalgh wrote:
> On Mon, Jan 25, 2016 at 11:20:46AM +, James Greenhalgh wrote:
> > On Mon, Jan 11, 2016 at 12:04:43PM +, James Greenhalgh wrote:
> > > 
> > > Hi,
> > > 
> > > I've seen a couple of large performance issues caused by expanding
> > > the high-precision reciprocal square root for Cortex-A57, so I'd like
> > > to turn it off by default.
> > > 
> > > This is good for art (~2%) from Spec2000, bad (~3.5%) for fma3d from
> > > Spec2000, good (~5.5%) for gromcas from Spec2006, and very good (>10%) for
> > > some private microbenchmark kernels which stress the divide/sqrt/multiply
> > > units. It therefore seems to me to be the correct choice to make across
> > > a number of workloads.
> > > 
> > > Bootstrapped and tested on aarch64-none-linux-gnu with no issues.
> > > 
> > > OK?
> > 
> > *Ping*
> 
> *pingx2*

*ping^3*

Thanks,
James

> > > ---
> > > 2015-12-11  James Greenhalgh  
> > > 
> > >   * config/aarch64/aarch64.c (cortexa57_tunings): Remove
> > >   AARCH64_EXTRA_TUNE_RECIP_SQRT.
> > > 
> > 
> > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > > index 1d5d898..999c9fc 100644
> > > --- a/gcc/config/aarch64/aarch64.c
> > > +++ b/gcc/config/aarch64/aarch64.c
> > > @@ -484,8 +484,7 @@ static const struct tune_params cortexa57_tunings =
> > >0, /* max_case_values.  */
> > >0, /* cache_line_size.  */
> > >tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
> > > -  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS
> > > -   | AARCH64_EXTRA_TUNE_RECIP_SQRT)  /* tune_flags.  */
> > > +  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS)   /* tune_flags.  */
> > >  };
> > >  
> > >  static const struct tune_params cortexa72_tunings =
> > 
> 


Re: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt

2016-02-08 Thread James Greenhalgh
On Mon, Feb 01, 2016 at 01:59:34PM +, James Greenhalgh wrote:
> On Mon, Jan 25, 2016 at 11:21:25AM +, James Greenhalgh wrote:
> > On Mon, Jan 11, 2016 at 11:53:39AM +, James Greenhalgh wrote:
> > > 
> > > Hi,
> > > 
> > > I'd like to switch the logic around in aarch64.c such that
> > > -mlow-precision-recip-sqrt causes us to always emit the low-precision
> > > software expansion for reciprocal square root. I have two reasons to do
> > > this; first is consistency across -mcpu targets, second is enabling more
> > > -mcpu targets to use the flag for peak tuning.
> > > 
> > > I don't much like that the precision we use for -mlow-precision-recip-sqrt
> > > differs between cores (and possibly compiler revisions). Yes, we're
> > > under -ffast-math but I take this flag to mean the user explicitly wants 
> > > the
> > > low-precision expansion, and we should not diverge from that based on an
> > > internal decision as to what is optimal for performance in the
> > > high-precision case. I'd prefer to keep things as predictable as possible,
> > > and here that means always emitting the low-precision expansion when 
> > > asked.
> > > 
> > > Judging by the comments in the thread proposing the reciprocal square
> > > root optimisation, this will benefit all cores currently supported by GCC.
> > > To be clear, we would still not expand in the high-precision case for any
> > > cores which do not explicitly ask for it. Currently that is Cortex-A57
> > > and xgene, though I will be proposing a patch to remove Cortex-A57 from
> > > that list shortly.
> > > 
> > > Which gives my second motivation for this patch. 
> > > -mlow-precision-recip-sqrt
> > > is intended as a tuning flag for situations where performance is more
> > > important than precision, but the current logic requires setting an
> > > internal flag which also changes the performance characteristics where
> > > high-precision is needed. This conflates two decisions the target might
> > > want to make, and reduces the applicability of an option targets might
> > > want to enable for performance. In particular, I'd still like to see
> > > -mlow-precision-recip-sqrt continue to emit the cheaper, low-precision
> > > sequence for floats under Cortex-A57.
> > > 
> > > Based on that reasoning, this patch makes the appropriate change to the
> > > logic. I've checked with the current -mcpu values to ensure that behaviour
> > > without -mlow-precision-recip-sqrt does not change, and that behaviour
> > > with -mlow-precision-recip-sqrt is to emit the low precision sequences.
> > > 
> > > I've also put this through bootstrap and test on aarch64-none-linux-gnu
> > > with no issues.
> > > 
> > > OK?
> > 
> > *Ping*
> 
> *Pingx2*

*Ping^3*

Thanks,
James

> > > 2015-12-10  James Greenhalgh  
> > > 
> > >   * config/aarch64/aarch64.c (use_rsqrt_p): Always use software
> > >   reciprocal sqrt for -mlow-precision-recip-sqrt.
> > > 
> > 
> > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > > index 9142ac0..1d5d898 100644
> > > --- a/gcc/config/aarch64/aarch64.c
> > > +++ b/gcc/config/aarch64/aarch64.c
> > > @@ -7485,8 +7485,9 @@ use_rsqrt_p (void)
> > >  {
> > >return (!flag_trapping_math
> > > && flag_unsafe_math_optimizations
> > > -   && (aarch64_tune_params.extra_tuning_flags
> > > -   & AARCH64_EXTRA_TUNE_RECIP_SQRT));
> > > +   && ((aarch64_tune_params.extra_tuning_flags
> > > +& AARCH64_EXTRA_TUNE_RECIP_SQRT)
> > > +   || flag_mrecip_low_precision_sqrt));
> > >  }
> > >  
> > >  /* Function to decide when to use
> > 
> 


Re: [PATCH] Avoid bugs like PR68273 to trigger

2016-02-08 Thread Eric Botcazou
> It helps vectorization.  This was specifically introduced to fix a
> regression on powerpc.
> 
> See PR65310.

Indeed, the SRA code was much better before that one liner. ;-)

-- 
Eric Botcazou


Re: [patch] Fix SSA_NAME_OCCURS_IN_ABNORMAL_PHI issue in backprop

2016-02-08 Thread Richard Biener
On Mon, Feb 8, 2016 at 11:37 AM, Eric Botcazou  wrote:
> Hi,
>
> the Ada runtime fails to build for platforms still using the old SJLJ scheme
> because of the new GIMPLE backprop pass (hence it's a regression):
>
> eric@polaris:~/build/gcc/native> gcc/gnat1 -quiet a-ncelfu.ads -gnatpg -O -I
> ~/svn/gcc/gcc/ada/
> a-ngcefu.adb: In function 'Ada.Numerics.Complex_Elementary_Functions.Sqrt':
> a-ngcefu.adb:16:4: error: SSA_NAME_OCCURS_IN_ABNORMAL_PHI should be set
> for SSA_NAME: _28 in statement:
> xr_3(ab) = PHI  _28(6), _28(7), xr_4(ab)(11), xr_4(ab)(12), xr_4(ab)(14), xr_4(ab)(15),
> xr_4(ab)(16), xr_4(ab)(17), xr_4(ab)(18), xr_4(ab)(19), xr_4(ab)(20), xr_4(ab)
> (21), xr_4(ab)(22), xr_4(ab)(23), xr_4(ab)(24), xr_4(ab)(25), xr_4(ab)(26),
> xr_4(ab)(27), xr_4(ab)(28), xr_4(ab)(29), xr_4(ab)(30), xr_4(ab)(32), xr_4(ab)
> (34), xr_4(ab)(36), xr_4(ab)(37)>
> PHI argument
> _28
> for PHI node
> xr_3(ab) = PHI  _28(6), _28(7), xr_4(ab)(11), xr_4(ab)(12), xr_4(ab)(14), xr_4(ab)(15),
> xr_4(ab)(16), xr_4(ab)(17), xr_4(ab)(18), xr_4(ab)(19), xr_4(ab)(20), xr_4(ab)
> (21), xr_4(ab)(22), xr_4(ab)(23), xr_4(ab)(24), xr_4(ab)(25), xr_4(ab)(26),
> xr_4(ab)(27), xr_4(ab)(28), xr_4(ab)(29), xr_4(ab)(30), xr_4(ab)(32), xr_4(ab)
> (34), xr_4(ab)(36), xr_4(ab)(37)>
> +===GNAT BUG DETECTED==+
> | 6.0.0 20160206 (experimental) [trunk revision 233194] (x86_64-suse-linux)
> GCC error:|
> | verify_ssa failed|
> | Error detected around a-ngcefu.adb:16:4
>
> The problem is that the pass propagates the source RHS (_28) of:
>
>   xr_29(ab) = ABS_EXPR <_28>;
>
> into a PHI note using xr_29 along an AB edge so the above check triggers.
>
> I think replacing such an AB SSA_NAME is problematic in any case: if the RHS
> is not AB, then the check triggers; if the RHS is also AB, then we may create
> overlapping live ranges and the compilation will abort later.  We can probably
> create a new SSA_NAME but it's not really the spirit of the backprop pass and
> I'm not sure it's worth the hassle, hence the minimal attached fixlet.

I think the problematic transform is when the replacement happens on a PHI arg
with its edge being abnormal only.  Thus simply do

  if (EDGE_PRED (gimple_bb (phi), PHI_ARG_INDEX_FROM_USE (use))->flags
& EDGE_ABNORMAL)
   continue;

at the start of the FOR_EACH_PHI_ARG body.

Thanks,
Richard.

> Tested on x86_64-suse-linux, OK for the mainline?
>
>
> 2016-02-08  Eric Botcazou  
>
> * gimple-ssa-backprop.c (optimize_phi): Do not replace an argument
> with SSA_NAME_OCCURS_IN_ABNORMAL_PHI set.
>
> --
> Eric Botcazou


Re: [PATCH, PR59627, c++] Handle DECL_OMP_DECLARE_REDUCTION in discriminator_for_local_entity

2016-02-08 Thread Jakub Jelinek
On Mon, Feb 08, 2016 at 11:20:57AM +0100, Tom de Vries wrote:
> When running libgomp testsuite with -flto, we run into an ICE in the udr-*.C
> tests.
> 
> A minimal testcase is in listed in PR59627:
> ...
> struct A { int i; };
> 
> void foo()
> {
>   A a;
>   #pragma omp declare reduction (+: A: omp_out.i += omp_in.i)
>   #pragma omp parallel reduction (+: a)
>   ;
> }
> ...
> The problem seems to be that we're trying to get a discriminator (the
> lexical ordinal of a var among entities with the same name in the same
> function) for a DECL_OMP_DECLARE_REDUCTION_P function, which is not handled
> in discriminator_for_local_entity.
> 
> For the test-case above the declared reduction is shown as:
> ...
>   void omp declare reduction operator+~1A (struct A &);
> ...
> 
> AFAIU, those DECL_OMP_DECLARE_REDUCTION_P decls are unique in a function, so
> I'd say we can simply return '0'.

They certainly aren't unique to a function, you can have hundreds of them:
void foo()
{  
  A a;
  {
#pragma omp declare reduction (+: A: omp_out.i += omp_in.i)
#pragma omp parallel reduction (+: a)
;
  }
  {
#pragma omp declare reduction (+: A: omp_out.i += omp_in.i)
#pragma omp parallel reduction (+: a)
;
  }
  {
#pragma omp declare reduction (+: A: omp_out.i += omp_in.i)
#pragma omp parallel reduction (+: a)
;
  }
//...
}
etc.  But they really should not be mangled, so IMHO it is better
to ensure they are not mangled, nothing really needs their mangled name for
anything.  They don't show up in the debug info, don't have a real body,
nothing calls them...

Thus I think I'd prefer something like following patch
(so far untested, will test later today), or returning early from
mangle_decl for DECL_OMP_DECLARE_REDUCTION_P:

2016-02-08  Jakub Jelinek  

PR c++/59627
* parser.c (cp_parser_omp_declare_reduction): Set assembler name
of the DECL_OMP_DECLARE_REDUCTION_P decls.

* g++.dg/gomp/pr59627.C: New test.

--- gcc/cp/parser.c.jj  2016-01-29 21:32:54.0 +0100
+++ gcc/cp/parser.c 2016-02-08 11:45:34.229860265 +0100
@@ -36080,6 +36080,7 @@ cp_parser_omp_declare_reduction (cp_pars
   DECL_DECLARED_INLINE_P (fndecl) = 1;
   DECL_IGNORED_P (fndecl) = 1;
   DECL_OMP_DECLARE_REDUCTION_P (fndecl) = 1;
+  SET_DECL_ASSEMBLER_NAME (fndecl, get_identifier (""));
   DECL_ATTRIBUTES (fndecl)
= tree_cons (get_identifier ("gnu_inline"), NULL_TREE,
 DECL_ATTRIBUTES (fndecl));
--- gcc/testsuite/g++.dg/gomp/pr59627.C.jj  2016-02-08 11:59:26.650053692 
+0100
+++ gcc/testsuite/g++.dg/gomp/pr59627.C 2016-02-08 11:55:52.0 +0100
@@ -0,0 +1,14 @@
+// PR c++/59627
+// { dg-do compile { target lto } }
+// { dg-options "-fopenmp -flto" }
+
+struct A { A () : i (0) {} int i; };
+
+void
+foo ()
+{
+  A a;
+  #pragma omp declare reduction (+: A: omp_out.i += omp_in.i)
+  #pragma omp parallel reduction (+: a)
+  ;
+}

Jakub


Re: [PATCH] Avoid bugs like PR68273 to trigger

2016-02-08 Thread Richard Biener
On Mon, 8 Feb 2016, Eric Botcazou wrote:

> > It helps vectorization.  This was specifically introduced to fix a
> > regression on powerpc.
> > 
> > See PR65310.
> 
> Indeed, the SRA code was much better before that one liner. ;-)

Not sure what to make of this comment ;)  Did you see where we
create the SSA name with the overaligned type?

Richard.


Re: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian

2016-02-08 Thread Kyrill Tkachov

Hi Charles,

On 03/02/16 18:59, charles.bay...@linaro.org wrote:

From: Charles Baylis 

gcc/ChangeLog:

2016-02-03  Charles Baylis  

PR target/68532
* config/arm/arm.c (neon_endian_lane_map): New function.
(neon_vector_pair_endian_lane_map): New function.
(arm_evpc_neon_vuzp): Allow for big endian lane order.
* config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big
endian.
(vuzpq_s16): Likewise.
(vuzpq_s32): Likewise.
(vuzpq_f32): Likewise.
(vuzpq_u8): Likewise.
(vuzpq_u16): Likewise.
(vuzpq_u32): Likewise.
(vuzpq_p8): Likewise.
(vuzpq_p16): Likewise.

gcc/testsuite/ChangeLog:

2015-12-15  Charles Baylis  

PR target/68532
* gcc.c-torture/execute/pr68532.c: New test.

Change-Id: Ifd35d79bd42825f05403a1b96d8f34ef0f21dac3

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d8a2745..e9aa982 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28208,6 +28208,35 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx 
sel)
arm_expand_vec_perm_1 (target, op0, op1, sel);
  }
  
+/* map lane ordering between architectural lane order, and GCC lane order,

+   taking into account ABI.  See comment above output_move_neon for details.  
*/
+static int
+neon_endian_lane_map (machine_mode mode, int lane)


s/map/Map/
New line between comment and function signature.


+{
+  if (BYTES_BIG_ENDIAN)
+  {
+int nelems = GET_MODE_NUNITS (mode);
+/* Reverse lane order.  */
+lane = (nelems - 1 - lane);
+/* Reverse D register order, to match ABI.  */
+if (GET_MODE_SIZE (mode) == 16)
+  lane = lane ^ (nelems / 2);
+  }
+  return lane;
+}
+
+/* some permutations index into pairs of vectors, this is a helper function
+   to map indexes into those pairs of vectors.  */
+static int
+neon_pair_endian_lane_map (machine_mode mode, int lane)


Similarly, s/some/Some/ and new line after comment.


+{
+  int nelem = GET_MODE_NUNITS (mode);
+  if (BYTES_BIG_ENDIAN)
+lane =
+  neon_endian_lane_map (mode, lane & (nelem - 1)) + (lane & nelem);
+  return lane;
+}
+
  /* Generate or test for an insn that supports a constant permutation.  */
  
  /* Recognize patterns for the VUZP insns.  */

@@ -28218,14 +28247,22 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d)
unsigned int i, odd, mask, nelt = d->nelt;
rtx out0, out1, in0, in1;
rtx (*gen)(rtx, rtx, rtx, rtx);
+  int first_elem;
+  int swap;
  


Just make this a bool.


if (GET_MODE_UNIT_SIZE (d->vmode) >= 8)
  return false;
  
-  /* Note that these are little-endian tests.  Adjust for big-endian later.  */

-  if (d->perm[0] == 0)
+  /* arm_expand_vec_perm_const_1 () helpfully swaps the operands for the
+ big endian pattern on 64 bit vectors, so we correct for that.  */
+  swap = BYTES_BIG_ENDIAN && !d->one_vector_p
+&& GET_MODE_SIZE (d->vmode) == 8 ? d->nelt : 0;
+
+  first_elem = d->perm[neon_endian_lane_map (d->vmode, 0)] ^ swap;
+
+  if (first_elem == neon_endian_lane_map (d->vmode, 0))
  odd = 0;
-  else if (d->perm[0] == 1)
+  else if (first_elem == neon_endian_lane_map (d->vmode, 1))
  odd = 1;
else
  return false;
@@ -28233,8 +28270,9 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d)
  
for (i = 0; i < nelt; i++)

  {
-  unsigned elt = (i * 2 + odd) & mask;
-  if (d->perm[i] != elt)
+  unsigned elt =
+   (neon_pair_endian_lane_map (d->vmode, i) * 2 + odd) & mask;
+  if ((d->perm[i] ^ swap) != neon_pair_endian_lane_map (d->vmode, elt))
return false;
  }
  
@@ -28258,10 +28296,9 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d)
  
in0 = d->op0;

in1 = d->op1;
-  if (BYTES_BIG_ENDIAN)
+  if (swap)
  {
std::swap (in0, in1);
-  odd = !odd;
  }


remove the braces around the std::swap

  
out0 = d->target;

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 47816d5..2e014b6 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -8741,9 +8741,9 @@ vuzpq_s8 (int8x16_t __a, int8x16_t __b)
int8x16x2_t __rv;
  #ifdef __ARM_BIG_ENDIAN
__rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t)
-  { 17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15 });
+  { 9, 11, 13, 15, 1, 3, 5, 7, 25, 27, 29, 31, 17, 19, 21, 23 });
__rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t)
-  { 16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14 });
+  { 8, 10, 12, 14, 0, 2, 4, 6, 24, 26, 28, 30, 16, 18, 20, 22 });
  #else
__rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t)
{ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 });
@@ -8759,9 +8759,9 @@ vuzpq_s16 (int16x8_t __a, int16x8_t __b)
int16x8x2_t __rv;
  #ifdef __ARM_BIG_ENDIAN
__rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t)
-  { 9, 11, 13, 15, 1, 3, 5, 7 });
+  { 5, 7, 1, 3, 13, 15, 9, 11 });
__rv.val[1] = __b

Re: [PATCH 2/2] [ARM] PR68532 Fix up vzip recognition for big endian

2016-02-08 Thread Kyrill Tkachov

Hi Charles,

On 03/02/16 18:59, charles.bay...@linaro.org wrote:

From: Charles Baylis 

gcc/ChangeLog:

2016-02-03  Charles Baylis  

PR target/68532
* config/arm/arm.c (arm_evpc_neon_vzip): Allow for big endian lane
order.
* config/arm/arm_neon.h (vzipq_s8): Adjust shuffle patterns for big
endian.
(vzipq_s16): Likewise.
(vzipq_s32): Likewise.
(vzipq_f32): Likewise.
(vzipq_u8): Likewise.
(vzipq_u16): Likewise.
(vzipq_u32): Likewise.
(vzipq_p8): Likewise.
(vzipq_p16): Likewise.

Change-Id: I327678f5e73c1de2f413c1d22769ab42ce1d6c16

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e9aa982..24239db 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28318,15 +28318,21 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
unsigned int i, high, mask, nelt = d->nelt;
rtx out0, out1, in0, in1;
rtx (*gen)(rtx, rtx, rtx, rtx);
+  int first_elem;
+  bool is_swapped;
  
if (GET_MODE_UNIT_SIZE (d->vmode) >= 8)

  return false;
  
+  is_swapped = BYTES_BIG_ENDIAN ? true : false;


This is just "is_swapped = BYTES_BIG_ENDIAN;"


+
/* Note that these are little-endian tests.  Adjust for big-endian later.  
*/


I think you can remove this comment now, like in patch 1/2


+  first_elem = d->perm[neon_endian_lane_map (d->vmode, 0) ^ is_swapped];
+
high = nelt / 2;
-  if (d->perm[0] == high)
+  if (first_elem == neon_endian_lane_map (d->vmode, high))
  ;
-  else if (d->perm[0] == 0)
+  else if (first_elem == neon_endian_lane_map (d->vmode, 0))
  high = 0;
else
  return false;
@@ -28334,11 +28340,16 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
  
for (i = 0; i < nelt / 2; i++)

  {
-  unsigned elt = (i + high) & mask;
-  if (d->perm[i * 2] != elt)
+  unsigned elt =
+   neon_pair_endian_lane_map (d->vmode, i + high) & mask;
+  if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i + is_swapped)]
+ != elt)
return false;
-  elt = (elt + nelt) & mask;
-  if (d->perm[i * 2 + 1] != elt)
+  elt =
+   neon_pair_endian_lane_map (d->vmode, i + nelt + high)
+   & mask;


The "& mask" can go on the previous line.


+  if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i + !is_swapped)]
+ != elt)
return false;
  }
  
@@ -28362,10 +28373,9 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
  
in0 = d->op0;

in1 = d->op1;
-  if (BYTES_BIG_ENDIAN)
+  if (is_swapped)
  {
std::swap (in0, in1);
-  high = !high;
  }


remove the braces around the std::swap.

Ok with these changes.
I've tried out both patch and they do fix execution failures on big-endian
and don't break any NEON intrinsics tests that I threw at them.

  
out0 = d->target;

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 2e014b6..aa17f49 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -8453,9 +8453,9 @@ vzipq_s8 (int8x16_t __a, int8x16_t __b)
int8x16x2_t __rv;
  #ifdef __ARM_BIG_ENDIAN
__rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t)
-  { 24, 8, 25, 9, 26, 10, 27, 11, 28, 12, 29, 13, 30, 14, 31, 15 });
+  { 20, 4, 21, 5, 22, 6, 23, 7, 16, 0, 17, 1, 18, 2, 19, 3 });
__rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t)
-  { 16, 0, 17, 1, 18, 2, 19, 3, 20, 4, 21, 5, 22, 6, 23, 7 });
+  { 28, 12, 29, 13, 30, 14, 31, 15, 24, 8, 25, 9, 26, 10, 27, 11 });
  #else
__rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t)
{ 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 });
@@ -8471,9 +8471,9 @@ vzipq_s16 (int16x8_t __a, int16x8_t __b)
int16x8x2_t __rv;
  #ifdef __ARM_BIG_ENDIAN
__rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t)
-  { 12, 4, 13, 5, 14, 6, 15, 7 });
+  { 10, 2, 11, 3, 8, 0, 9, 1 });
__rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t)
-  { 8, 0, 9, 1, 10, 2, 11, 3 });
+  { 14, 6, 15, 7, 12, 4, 13, 5 });
  #else
__rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t)
{ 0, 8, 1, 9, 2, 10, 3, 11 });
@@ -8488,8 +8488,8 @@ vzipq_s32 (int32x4_t __a, int32x4_t __b)
  {
int32x4x2_t __rv;
  #ifdef __ARM_BIG_ENDIAN
-  __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 6, 2, 7, 3 });
-  __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4, 0, 5, 1 });
+  __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 5, 1, 4, 0 });
+  __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 7, 3, 6, 2 });
  #else
__rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 0, 4, 1, 5 });
__rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 2, 6, 3, 7 });
@@ -8502,8 +8502,8 @@ vzipq_f32 (float32x4_t __a, float32x4_t __b)
  {
float32x4x2_t __rv;
  #ifdef __ARM_BIG_ENDIAN
-  __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 6, 2, 7, 3 });
-  __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4,

Re: [PATCH] Avoid bugs like PR68273 to trigger

2016-02-08 Thread Eric Botcazou
> Not sure what to make of this comment ;)

I guess it was a variant of the usual can-of-worms-opening thing. :-)

> Did you see where we create the SSA name with the overaligned type?

In tree-ssa-pre.c:insert_into_preds_of_block:

(gdb) p debug_pre_expr(expr)
{mem_ref<0B>,yyvsp_1}@.MEM_2

(gdb) p debug_tree(type)
 
unit size 
align 8 symtab 0 alias set -1 canonical type 0x76cea3f0 context 

pointer_to_this  chain >
sizes-gimplified unsigned SI
size  constant 32>
unit size  constant 4>
align 64 symtab 0 alias set -1 canonical type 0x76cead20>

  /* Now build a phi for the new variable.  */
  temp = make_temp_ssa_name (type, NULL, "prephitmp");

-- 
Eric Botcazou


Re: [PATCH v3] PR48344: Fix unrecognizable insn error with -fstack-limit-register=r2

2016-02-08 Thread Bernd Schmidt

On 01/27/2016 07:12 PM, Kelvin Nilsen wrote:

+/* During execution of handle_common_deferred_options (), the Pmode
+   variable cannot be used because it has not yet been initialized.
+   For this reason, handling of the OPT_fstack_limit_register_ and
+   OPT_fstack_limit_symbol_ options is deferred until execution
+   of finish_deferred_option_handling (), which is invoked following
+   target-specific initialization.
+
+   The variable opt_fstack_limit_symbol_arg represents the name
+   of the register specified in an OPT_fstack_limit_symbol_ command
+   line option, or NULL to represent that no OPT_fstack_limit_symbol_
+   option is active.
+
+   The variable opt_fstack_limit_register_no represents the number of
+   the register specified in an OPT_fstack_limit_register_
+   command-line option, or -1 to indicate that no
+   OPT_fstack_limit_register_ option is active.  (Legal register
+   numbers are all >= 0.)
+
+   Note that these two command-line options are mutually exclusive.
+   If both are specified, subsequent option processing overwrites
+   earlier option processing. */
+
+static const char *opt_fstack_limit_symbol_arg = NULL;
+static int opt_fstack_limit_register_no = -1;
+


My suggestion would be: make these global, and move the code to 
initialize the rtx to init_emit_once. Then use a much shorter comment 
here ("Store values of -fstack-limit arguments for use by 
init_emit_once."). Also, note that there should be two spaces at the end 
of a sentence, even at the end of each comment.



Bernd


Re: [PATCH] Avoid bugs like PR68273 to trigger

2016-02-08 Thread Richard Biener
On Mon, 8 Feb 2016, Eric Botcazou wrote:

> > Not sure what to make of this comment ;)
> 
> I guess it was a variant of the usual can-of-worms-opening thing. :-)
> 
> > Did you see where we create the SSA name with the overaligned type?
> 
> In tree-ssa-pre.c:insert_into_preds_of_block:
> 
> (gdb) p debug_pre_expr(expr)
> {mem_ref<0B>,yyvsp_1}@.MEM_2
> 
> (gdb) p debug_tree(type)
>   type  size 
> unit size 
> align 8 symtab 0 alias set -1 canonical type 0x76cea3f0 context 
> 
> pointer_to_this  chain  0x76c54a18 D.1410>>
> sizes-gimplified unsigned SI
> size  bitsizetype> constant 32>
> unit size  sizetype> constant 4>
> align 64 symtab 0 alias set -1 canonical type 0x76cead20>
> 
>   /* Now build a phi for the new variable.  */
>   temp = make_temp_ssa_name (type, NULL, "prephitmp");

Yes, that place I just fixed.  I mean for the SRA case.

Richard.


Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA

2016-02-08 Thread Bernd Schmidt

On 02/08/2016 10:09 AM, Richard Biener wrote:

The gcc.target/i386/addr-sel-1.c (for PR28940) seems to just started
working at some point past in time and thus it was added and the
bug closed.  You could say RA does a better job after the patch
as it uses 1 less register but that restricts the followup
postreload combine attempts.  Though I wonder about what's "better"
RA here - isn't the best allocation one that avoids spills but
uses as many registers as possible (at least when targeting a CPU
that cannot to register renaming)?  regrename doesn't help this
testcase either (it runs too late and does a renaming that doesn't help).


I don't think regrename's place in the pass pipeline is set in stone. If 
we enable it for gcc-7. we could also experiment with putting it just 
before postreload, or schedule another reload-combine afterwards.



Bernd



libgcc: On AIX, increase chances to find landing pads for exceptions

2016-02-08 Thread Michael Haubenwallner
Hi David,

still experiencing exception-not-caught problems with gcc-4.2.4 on AIX
leads me to some patch proposed in http://gcc.gnu.org/PR13878 back in
2004 already, ought to be fixed by some different commit since 3.4.0.

As long as build systems (even libtool right now) on AIX do export these
_GLOBAL__* symbols from shared libraries, overlapping frame-base address
ranges may become registered, even if newer gcc (seen with 4.8) does name
the FDE symbols more complex to reduce these chances.

But still, just think of linking some static library into multiple shared
libraries and/or the main executable. Or sometimes there is just need for
some hackery to override a shared object's implementation detail and rely
on runtime linking to do the override at runtime.

Agreed both is "wrong" to some degree, but the larger an application is,
the higher is the chance for this to happen.

Thoughts?

Thanks!
/haubi/
2016-02-08  Michael Haubenwallner  

	On AIX, increase chances to find landing pads for exceptions.
	* unwind-dw2-fde.c (_Unwind_Find_FDE): Stop assuming registered
	object's address ranges to not overlap.

--- libgcc/unwind-dw2-fde.c
+++ libgcc/unwind-dw2-fde.c
@@ -1013,7 +1013,25 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
 	f = search_object (ob, pc);
 	if (f)
 	  goto fini;
+	/*
 	break;
+	   In an ideal world, even on AIX, we could break here because
+	   objects would not overlap. But the larger an application is,
+	   the more likely an "overlap" may happen (on AIX) because of:
+	   - Shared libraries do export the FDE symbols ("_GLOBAL__F*"),
+	 which is a bug in their build system, but out of gcc's control.
+	   - Other shared libraries, or the main executable, do contain
+	 identical or similar object files - which is suboptimal,
+	 but may be intentional. However, exporting their FDE symbols,
+	 which may have identical names as seen in the former shared
+	 libraries, again is a bug in their build system, but still
+	 out of gcc's control.
+	   - Finally, run time linking is enabled, redirecting adresses of
+	 identically named exported symbols from their original shared
+	 library's address range into another shared library's or the
+	 main executable's address range.
+	   This results in address ranges being registered by different
+	   objects to potentially overlap.  */
   }
 
   /* Classify and search the objects we've not yet processed.  */


Re: [PATCH][AArch64] PR target/69161: Don't use special predicate for CCmode comparisons in expressions that require matching modes

2016-02-08 Thread Kyrill Tkachov

Hi James,

On 04/02/16 13:49, James Greenhalgh wrote:

On Fri, Jan 29, 2016 at 02:27:34PM +, Kyrill Tkachov wrote:

Hi all,

In this PR we ICE during combine when trying to propagate a comparison into a 
vec_duplicate,
that is we end up creating the rtx:
(vec_duplicate:V4SI (eq:CC_NZ (reg:CC_NZ 66 cc)
 (const_int 0 [0])))

The documentation for vec_duplicate says:
"The output vector mode must have the same submodes as the input vector mode or the 
scalar modes"
So this is invalid RTL, which triggers an assert in simplify-rtx to that effect.

It has been suggested on the PR that this is because we use a special_predicate 
for
aarch64_comparison_operator which means that it ignores the mode when matching.
This is fine when used in RTXes that don't need it, like if_then_else 
expressions
but can cause trouble when used in places where the modes do matter, like in
SET operations. In this particular ICE the cause was the conditional store
patterns that could end up matching an intermediate rtx during combine of
(set (reg:SI) (eq:CC_NZ x y)).

The suggested solution is to define a separate predicate with the same
conditions as aarch64_comparison_operator but make it not special, so it gets
automatic mode checks to prevent such a situation.

This patch does that.
Bootstrapped and tested on aarch64-linux-gnu.
SPEC2006 codegen did not change with this patch, so there shouldn't be
any code quality regressions.

Ok for trunk?

It would be good to leave a more detailed comment on
"aarch64_comparison_operator_mode" as to why we need it.

Otherwise, this is OK.


Ok, here's the patch with a comment added.
I'll commit it when the arm version is approved as well.

Thanks,
Kyrill


Thanks,
James


Thanks,
Kyrill

2016-01-29  Kyrylo Tkachov  

 PR target/69161
 * config/aarch64/predicates.md (aarch64_comparison_operator_mode):
 New predicate.
 (aarch64_comparison_operator): Break overly long line into two.
 (aarch64_comparison_operation): Likewise.
 * config/aarch64/aarch64.md (cstorecc4): Use
 aarch64_comparison_operator_mode instead of
 aarch64_comparison_operator.
 (cstore4): Likewise.
 (aarch64_cstore): Likewise.
 (*cstoresi_insn_uxtw): Likewise.
 (cstore_neg): Likewise.
 (*cstoresi_neg_uxtw): Likewise.

2016-01-29  Kyrylo Tkachov  

 PR target/69161
 * gcc.c-torture/compile/pr69161.c: New test.


diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 221f106430ea2c4a44352d937e660d0c1bbe10da..bf2140c5fd9458476d42e49100347dd05e1b21df 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3039,7 +3039,7 @@ (define_expand "cstore4"
 
 (define_expand "cstorecc4"
   [(set (match_operand:SI 0 "register_operand")
-   (match_operator 1 "aarch64_comparison_operator"
+   (match_operator 1 "aarch64_comparison_operator_mode"
 	[(match_operand 2 "cc_register")
  (match_operand 3 "const0_operand")]))]
   ""
@@ -3051,7 +3051,7 @@ (define_expand "cstorecc4"
 
 (define_expand "cstore4"
   [(set (match_operand:SI 0 "register_operand" "")
-	(match_operator:SI 1 "aarch64_comparison_operator"
+	(match_operator:SI 1 "aarch64_comparison_operator_mode"
 	 [(match_operand:GPF 2 "register_operand" "")
 	  (match_operand:GPF 3 "aarch64_fp_compare_operand" "")]))]
   ""
@@ -3064,7 +3064,7 @@ (define_expand "cstore4"
 
 (define_insn "aarch64_cstore"
   [(set (match_operand:ALLI 0 "register_operand" "=r")
-	(match_operator:ALLI 1 "aarch64_comparison_operator"
+	(match_operator:ALLI 1 "aarch64_comparison_operator_mode"
 	 [(match_operand 2 "cc_register" "") (const_int 0)]))]
   ""
   "cset\\t%0, %m1"
@@ -3109,7 +3109,7 @@ (define_insn_and_split "*compare_cstore_insn"
 (define_insn "*cstoresi_insn_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(zero_extend:DI
-	 (match_operator:SI 1 "aarch64_comparison_operator"
+	 (match_operator:SI 1 "aarch64_comparison_operator_mode"
 	  [(match_operand 2 "cc_register" "") (const_int 0)])))]
   ""
   "cset\\t%w0, %m1"
@@ -3118,7 +3118,7 @@ (define_insn "*cstoresi_insn_uxtw"
 
 (define_insn "cstore_neg"
   [(set (match_operand:ALLI 0 "register_operand" "=r")
-	(neg:ALLI (match_operator:ALLI 1 "aarch64_comparison_operator"
+	(neg:ALLI (match_operator:ALLI 1 "aarch64_comparison_operator_mode"
 		  [(match_operand 2 "cc_register" "") (const_int 0)])))]
   ""
   "csetm\\t%0, %m1"
@@ -3129,7 +3129,7 @@ (define_insn "cstore_neg"
 (define_insn "*cstoresi_neg_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(zero_extend:DI
-	 (neg:SI (match_operator:SI 1 "aarch64_comparison_operator"
+	 (neg:SI (match_operator:SI 1 "aarch64_comparison_operator_mode"
 		  [(match_operand 2 "cc_register" "") (const_int 0)]]
   ""
   "csetm\\t%w0, %m1"
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index e80e40683cada374303ea987017f95654531a032..11868278c3d0a1887c2065568f890c3eb8ff7f0b 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/g

Re: [PATCH] Avoid bugs like PR68273 to trigger

2016-02-08 Thread Eric Botcazou
> Yes, that place I just fixed.  I mean for the SRA case.

Are you sure that there is one?

  expr = build_ref_for_model (loc, agg, access->offset - top_offset,
  access, gsi, insert_after);

  if (write)
{
  if (access->grp_partial_lhs)
expr = force_gimple_operand_gsi (gsi, expr, true, NULL_TREE,
 !insert_after,
 insert_after ? GSI_NEW_STMT
 : GSI_SAME_STMT);
  stmt = gimple_build_assign (repl, expr);


(gdb) p debug_tree(repl)
 
unit size 
align 8 symtab 0 alias set -1 canonical type 0x76cea3f0 
context 
pointer_to_this  chain >
sizes-gimplified unsigned SI
size 
unit size 
align 32 symtab 0 alias set -1 canonical type 0x76cead20>
used unsigned SI file pr68273.c line 67 col 10 size  unit size 
align 32 context >


(gdb) p debug_tree(expr)
 
unit size 
align 8 symtab 0 alias set -1 canonical type 0x76cea3f0 
context 
pointer_to_this  chain >
sizes-gimplified unsigned SI
size 
unit size 
align 64 symtab 0 alias set -1 canonical type 0x76cead20>
   
arg 0 
unsigned SI size  unit size 

align 32 symtab 0 alias set -1 canonical type 0x76d333f0>
   
arg 0 
used BLK file pr68273.c line 67 col 10
size 
unit size 
align 64 context  arg-type 
>>
arg 1  
constant 0>>

-- 
Eric Botcazou


[PATCH, PR67709 ] Don't call call_cgraph_insertion_hooks in simd_clone_create

2016-02-08 Thread Tom de Vries

Hi,

Consider libgomp.fortran/declare-simd-3.f90:
...
subroutine bar
  use declare_simd_2_mod
  real :: b(128)
  integer :: i

  !$omp simd
  do i = 1, 128
b(i) = i * 2.0
  end do
  !$omp simd
  do i = 1, 128
b(i) = foo (7.0_8, 5 * i, b(i))
  end do
  do i = 1, 128
if (b(i).ne.(7.0 + 10.0 * i * i)) call abort
  end do
end subroutine bar
...

when compiling declare-simd-3.f90 with '-O0 -fopenmp -flto 
-fno-use-linker-plugin', we run into an ICE with backtrace:

...
ICE backtrace:
...
src/libgomp/testsuite/libgomp.fortran/declare-simd-3.f90: At top level:
src/libgomp/testsuite/libgomp.fortran/declare-simd-3.f90:7:0: internal 
compiler error: in estimate_function_body_sizes, at 
ipa-inline-analysis.c:2486

   use declare_simd_2_mod

0xc9319d estimate_function_body_sizes
src/gcc/ipa-inline-analysis.c:2486
0xc950dd compute_inline_parameters(cgraph_node*, bool)
src/gcc/ipa-inline-analysis.c:2953
0xc9813b inline_analyze_function(cgraph_node*)
src/gcc/ipa-inline-analysis.c:4078
0xc98205 inline_summary_t::insert(cgraph_node*, inline_summary*)
src/gcc/ipa-inline-analysis.c:4105
0x9a6213 symbol_table::call_cgraph_insertion_hooks(cgraph_node*)
src/gcc/cgraph.c:371
0xdefa0e simd_clone_create
src/gcc/omp-low.c:18738
0xdf5012 expand_simd_clones
src/gcc/omp-low.c:19799
0xdf519b ipa_omp_simd_clone
src/gcc/omp-low.c:19839
0xdf520a execute
src/gcc/omp-low.c:19867
Please submit a full bug report,
...

During pass_omp_simd_clone, we call simd_clone_create for foo, and 
execute the  !old_node->definition part:

...
  tree old_decl = old_node->decl;
  tree new_decl = copy_node (old_node->decl);
  DECL_NAME (new_decl)
= clone_function_name (old_decl, "simdclone");
  SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
  SET_DECL_RTL (new_decl, NULL);
  DECL_STATIC_CONSTRUCTOR (new_decl) = 0;
  DECL_STATIC_DESTRUCTOR (new_decl) = 0;
  new_node = old_node->create_version_clone (new_decl, vNULL, NULL);
  if (old_node->in_other_partition)
new_node->in_other_partition = 1;
  symtab->call_cgraph_insertion_hooks (new_node);
...

The 'symtab->call_cgraph_insertion_hooks (new_node)' calls 
'inline_summary_t::insert', a hook inserted during pass_ipa_inline.


During execution of the hook we stumble over the fact that the new node 
has no 'struct function' in estimate_function_body_sizes:

...
  struct function *my_function = DECL_STRUCT_FUNCTION (node->decl);
  ...
  gcc_assert (my_function && my_function->cfg);
...

The patch fixes the ICE by removing the call to 
'symtab->call_cgraph_insertion_hooks'.


[ The pass before pass_omp_simd_clone is pass_dispatcher_calls. It has a 
function create_target_clone, similar to simd_clone_create, with a 
node.defition and !node.defition part. The !node.defition part does not 
call 'symtab->call_cgraph_insertion_hooks (new_node)'. ]


Bootstrapped and reg-tested on x86_64.

OK for stage1 trunk?

Thanks,
- Tom
Don't call call_cgraph_insertion_hooks in simd_clone_create

2016-02-08  Tom de Vries  

	PR lto/67709
	* omp-low.c (simd_clone_create): Remove call to
	symtab->call_cgraph_insertion_hooks.

	* testsuite/libgomp.fortran/declare-simd-4.f90: New test.

---
 gcc/omp-low.c| 1 -
 libgomp/testsuite/libgomp.fortran/declare-simd-4.f90 | 7 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d41688b..fcbb3e0 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -18735,7 +18735,6 @@ simd_clone_create (struct cgraph_node *old_node)
   new_node = old_node->create_version_clone (new_decl, vNULL, NULL);
   if (old_node->in_other_partition)
 	new_node->in_other_partition = 1;
-  symtab->call_cgraph_insertion_hooks (new_node);
 }
   if (new_node == NULL)
 return new_node;
diff --git a/libgomp/testsuite/libgomp.fortran/declare-simd-4.f90 b/libgomp/testsuite/libgomp.fortran/declare-simd-4.f90
new file mode 100644
index 000..bfdf9cf
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/declare-simd-4.f90
@@ -0,0 +1,7 @@
+! { dg-do run { target { vect_simd_clones && lto } } }
+! { dg-options "-fno-inline -flto -fno-use-linker-plugin" }
+! { dg-additional-sources declare-simd-3.f90 }
+! { dg-additional-options "-msse2" { target sse2_runtime } }
+! { dg-additional-options "-mavx" { target avx_runtime } }
+
+include 'declare-simd-2.f90'


Re: [Patch AArch64] Restrict 16-bit sqrdml{sa}h instructions to FP_LO_REGS

2016-02-08 Thread James Greenhalgh
On Tue, Jan 26, 2016 at 04:04:47PM +, James Greenhalgh wrote:
> 
> Hi,
> 
> In their forms using 16-bit lanes, the sqrdmlah and sqrdmlsh instruction
> available when compiling with -march=armv8.1-a are only usable with
> a register number in the range 0 to 15 for operand 3, as gas will point
> out:
> 
>   Error: register number out of range 0 to 15 at
> operand 3 -- `sqrdmlsh v2.4h,v4.4h,v23.h[5]'
> 
> This patch teaches GCC to avoid registers outside of this range when
> appropriate, in the same fashion as we do for other instructions with
> this limitation.
> 
> Tested on an internal testsuite targeting Neon intrinsics.
> 
> OK?

*ping*

Thanks,
James

> ---
> 2016-01-25  James Greenhalgh  
> 
>   * config/aarch64/aarch64.md
>   (arch64_sqrdmlh_lane): Fix register
>   constraints for operand 3.
>   (aarch64_sqrdmlh_laneq): Likewise.
> 

> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index e1f5682..0b46e78 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3240,7 +3240,7 @@
> [(match_operand:VDQHS 1 "register_operand" "0")
>  (match_operand:VDQHS 2 "register_operand" "w")
>  (vec_select:
> -  (match_operand: 3 "register_operand" "w")
> +  (match_operand: 3 "register_operand" "")
>(parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
> SQRDMLH_AS))]
> "TARGET_SIMD_RDMA"
> @@ -3258,7 +3258,7 @@
> [(match_operand:SD_HSI 1 "register_operand" "0")
>  (match_operand:SD_HSI 2 "register_operand" "w")
>  (vec_select:
> -  (match_operand: 3 "register_operand" "w")
> +  (match_operand: 3 "register_operand" "")
>(parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
> SQRDMLH_AS))]
> "TARGET_SIMD_RDMA"
> @@ -3278,7 +3278,7 @@
> [(match_operand:VDQHS 1 "register_operand" "0")
>  (match_operand:VDQHS 2 "register_operand" "w")
>  (vec_select:
> -  (match_operand: 3 "register_operand" "w")
> +  (match_operand: 3 "register_operand" "")
>(parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
> SQRDMLH_AS))]
> "TARGET_SIMD_RDMA"
> @@ -3296,7 +3296,7 @@
> [(match_operand:SD_HSI 1 "register_operand" "0")
>  (match_operand:SD_HSI 2 "register_operand" "w")
>  (vec_select:
> -  (match_operand: 3 "register_operand" "w")
> +  (match_operand: 3 "register_operand" "")
>(parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
> SQRDMLH_AS))]
> "TARGET_SIMD_RDMA"



Re: [PATCH] Avoid bugs like PR68273 to trigger

2016-02-08 Thread Richard Biener
On Mon, 8 Feb 2016, Eric Botcazou wrote:

> > Yes, that place I just fixed.  I mean for the SRA case.
> 
> Are you sure that there is one?

No, but if there is none left why would you want to "fix" SRA?

Richard.

> expr = build_ref_for_model (loc, agg, access->offset - top_offset,
> access, gsi, insert_after);
> 
> if (write)
>   {
> if (access->grp_partial_lhs)
>   expr = force_gimple_operand_gsi (gsi, expr, true, NULL_TREE,
>!insert_after,
>insert_after ? GSI_NEW_STMT
>: GSI_SAME_STMT);
> stmt = gimple_build_assign (repl, expr);
> 
> 
> (gdb) p debug_tree(repl)
>   type  type  size 
> unit size 
> align 8 symtab 0 alias set -1 canonical type 0x76cea3f0 
> context 
> pointer_to_this  chain  0x76c54a18 D.1410>>
> sizes-gimplified unsigned SI
> size 
> unit size 
> align 32 symtab 0 alias set -1 canonical type 0x76cead20>
> used unsigned SI file pr68273.c line 67 col 10 size  0x76c435e8 32> unit size 
> align 32 context >
> 
> 
> (gdb) p debug_tree(expr)
>   type  type  size 
> unit size 
> align 8 symtab 0 alias set -1 canonical type 0x76cea3f0 
> context 
> pointer_to_this  chain  0x76c54a18 D.1410>>
> sizes-gimplified unsigned SI
> size 
> unit size 
> align 64 symtab 0 alias set -1 canonical type 0x76cead20>
>
> arg 0  type  Node>
> unsigned SI size  unit size 
> 
> align 32 symtab 0 alias set -1 canonical type 0x76d333f0>
>
> arg 0  Node>
> used BLK file pr68273.c line 67 col 10
> size 
> unit size 
> align 64 context  arg-type 
> >>
> arg 1  
> constant 0>>
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH, PR67709 ] Don't call call_cgraph_insertion_hooks in simd_clone_create

2016-02-08 Thread Jakub Jelinek
On Mon, Feb 08, 2016 at 01:46:44PM +0100, Tom de Vries wrote:
> [ The pass before pass_omp_simd_clone is pass_dispatcher_calls. It has a
> function create_target_clone, similar to simd_clone_create, with a
> node.defition and !node.defition part. The !node.defition part does not call
> 'symtab->call_cgraph_insertion_hooks (new_node)'. ]

I'll defer to Honza or Richi if it is ok not to call cgraph insertion hooks
at this point (and since when they can be avoided), or what else should be
done.

The patch could be ok even for 6.0, not just stage1, if they are ok with it
(or propose some other change).

> Don't call call_cgraph_insertion_hooks in simd_clone_create
> 
> 2016-02-08  Tom de Vries  
> 
>   PR lto/67709
>   * omp-low.c (simd_clone_create): Remove call to
>   symtab->call_cgraph_insertion_hooks.
> 
>   * testsuite/libgomp.fortran/declare-simd-4.f90: New test.

Jakub


[PATCH, PR69607] Mark offload symbols as global in lto

2016-02-08 Thread Tom de Vries

Hi,

when running libgomp.c testsuite with "-flto -flto-partition=1to1
-fno-toplevel-reorder" we run into many compilation failures like this:
...
/tmp/.ltrans0.ltrans.o:(.gnu.offload_funcs+0x1a0): undefined 
reference to `MAIN__._omp_fn.0'^M

...

The problem is that the offload table is in one lto partition, and the 
function listed in the offload table is in another, without the function 
having been promoted to be visible in the other partition.


The patch fixes this by promoting the symbols in the offload table such 
that they're visible in all partitions.


Bootstrapped and reg-tested on x86_64.

Build for nvidia accelerator and reg-tested libgomp with various lto 
settings.


OK for trunk, stage1?

Thanks,
- Tom
Mark offload symbols as global in lto

2016-02-08  Tom de Vries  

	PR lto/69607
	* lto-partition.c (promote_offload_tables): New function.
	* lto-partition.h (promote_offload_tables):  Declare.
	* lto.c (do_whole_program_analysis): call promote_offload_tables.

	* testsuite/libgomp.c/target-36.c: New test.

---
 gcc/lto/lto-partition.c | 28 
 gcc/lto/lto-partition.h |  1 +
 gcc/lto/lto.c   |  2 ++
 libgomp/testsuite/libgomp.c/target-36.c |  4 
 4 files changed, 35 insertions(+)

diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 9eb63c2..56598d4 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-prop.h"
 #include "ipa-inline.h"
 #include "lto-partition.h"
+#include "omp-low.h"
 
 vec ltrans_partitions;
 
@@ -1003,6 +1004,33 @@ promote_symbol (symtab_node *node)
 	"Promoting as hidden: %s\n", node->name ());
 }
 
+/* Promote the symbols in the offload tables.  */
+
+void
+promote_offload_tables (void)
+{
+  if (vec_safe_is_empty (offload_funcs) && vec_safe_is_empty (offload_vars))
+return;
+
+  for (unsigned i = 0; i < vec_safe_length (offload_funcs); i++)
+{
+  tree fn_decl = (*offload_funcs)[i];
+  cgraph_node *node = cgraph_node::get (fn_decl);
+  if (node->externally_visible)
+	continue;
+  promote_symbol (node);
+}
+
+  for (unsigned i = 0; i < vec_safe_length (offload_vars); i++)
+{
+  tree var_decl = (*offload_vars)[i];
+  varpool_node *node = varpool_node::get (var_decl);
+  if (node->externally_visible)
+	continue;
+  promote_symbol (node);
+}
+}
+
 /* Return true if NODE needs named section even if it won't land in the partition
symbol table.
FIXME: we should really not use named sections for inline clones and master
diff --git a/gcc/lto/lto-partition.h b/gcc/lto/lto-partition.h
index 31e3764..1a38126 100644
--- a/gcc/lto/lto-partition.h
+++ b/gcc/lto/lto-partition.h
@@ -36,6 +36,7 @@ extern vec ltrans_partitions;
 void lto_1_to_1_map (void);
 void lto_max_map (void);
 void lto_balanced_map (int);
+extern void promote_offload_tables (void);
 void lto_promote_cross_file_statics (void);
 void free_ltrans_partitions (void);
 void lto_promote_statics_nonwpa (void);
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 6718fbbe..ecec1bc 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -3138,6 +3138,8 @@ do_whole_program_analysis (void)
  to globals with hidden visibility because they are accessed from multiple
  partitions.  */
   lto_promote_cross_file_statics ();
+  /* Promote all the offload symbols.  */
+  promote_offload_tables ();
   timevar_pop (TV_WHOPR_PARTITIONING);
 
   timevar_stop (TV_PHASE_OPT_GEN);
diff --git a/libgomp/testsuite/libgomp.c/target-36.c b/libgomp/testsuite/libgomp.c/target-36.c
new file mode 100644
index 000..bafb718
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-36.c
@@ -0,0 +1,4 @@
+/* { dg-do run { target lto } } */
+/* { dg-additional-options "-flto -flto-partition=1to1 -fno-toplevel-reorder" } */
+
+#include "target-1.c"


[Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump

2016-02-08 Thread James Greenhalgh

Hi,

As far as I can tell, this testcase will only vectorize for x86_64/i?86
targets, so it should be gated to only check for vectorization on those.

Additionally, this test wants to scan the vectorizer dumps, so we ought
to add -fdump-tree-vect-all to the options.

Checked on aarch64 (cross/native) and x86 with no issues.

OK?

Thanks,
James

---
2016-02-08  James Greenhalgh  

* gcc.dg/vect/vect-mask-store-move-1.c: Add dump option, and gate
check on x86_64/i?86.

diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
index e575f6d..3ef613d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -fdump-tree-vect-all" } */
 /* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } */
 
 #define N 256
@@ -16,4 +16,4 @@ void foo (int n)
   }
 }
 
-/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" { target { i?86-*-* x86_64-*-* } } } } */


Re: [PING][PATCH] Mark symbols in offload tables with force_output in read_offload_tables

2016-02-08 Thread Tom de Vries

On 26/01/16 14:01, Ilya Verbin wrote:

On Tue, Jan 26, 2016 at 13:21:57 +0100, Tom de Vries wrote:

On 25/01/16 14:27, Ilya Verbin wrote:

On Tue, Jan 05, 2016 at 15:56:15 +0100, Tom de Vries wrote:

diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 62e5454..cdaee41 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1911,6 +1911,11 @@ input_offload_tables (void)
  tree fn_decl
= lto_file_decl_data_get_fn_decl (file_data, decl_index);
  vec_safe_push (offload_funcs, fn_decl);
+
+ /* Prevent IPA from removing fn_decl as unreachable, since there
+may be no refs from the parent function to child_fn in offload
+LTO mode.  */
+ cgraph_node::get (fn_decl)->mark_force_output ();
}
  else if (tag == LTO_symtab_variable)
{
@@ -1918,6 +1923,10 @@ input_offload_tables (void)
  tree var_decl
= lto_file_decl_data_get_var_decl (file_data, decl_index);
  vec_safe_push (offload_vars, var_decl);
+
+ /* Prevent IPA from removing var_decl as unused, since there
+may be no refs to var_decl in offload LTO mode.  */
+ varpool_node::get (var_decl)->force_output = 1;
}


This doesn't work when there is more than one LTO partition, because only first
partition contains full offload table to maintain correct order, but cgraph and
varpool nodes aren't necessarily created for the first partition.  To reproduce:

$ make check-target-libgomp RUNTESTFLAGS="c.exp=for-* --target_board=unix/-flto"
FAIL: libgomp.c/for-3.c (internal compiler error)
FAIL: libgomp.c/for-5.c (internal compiler error)
FAIL: libgomp.c/for-6.c (internal compiler error)
$ make check-target-libgomp RUNTESTFLAGS="c++.exp=for-* 
--target_board=unix/-flto"
FAIL: libgomp.c++/for-11.C (internal compiler error)
FAIL: libgomp.c++/for-13.C (internal compiler error)
FAIL: libgomp.c++/for-14.C (internal compiler error)


This works for me.

OK for trunk?

Thanks,
- Tom




Check that cgraph/varpool_node exists before use in input_offload_tables

2016-01-26  Tom de Vries  

* lto-cgraph.c (input_offload_tables): Check that cgraph/varpool_node
exists before use.


In this case they will be not marked as force_output in other partitions (except
the first one).


AFAIU, that's not the case.

If we're splitting up lto compilation over partitions, it means we're 
first calling lto1 in WPA mode. We'll read in all offload tables, and 
mark all symbols with force_output, and when writing out the partitions, 
we'll write the offload symbols out with force_output set.


This updated patch only does the force_output marking for offload 
symbols in WPA or LTO. It's not necessary in LTRANS mode.


Bootstrapped and reg-tested on x86_64.

Build for nvidia accelerator and reg-tested libgomp with various lto 
settings.


OK for trunk, stage4?

Thanks,
- Tom

Don't mark offload symbols with force_output in ltrans

2016-02-08  Tom de Vries  

	PR lto/69655
	* lto-cgraph.c (input_offload_tables): Add and handle bool parameter
	do_force_output.
	* lto-streamer.h (input_offload_tables): Add and handle bool parameter.

	* lto.c (read_cgraph_and_symbols): Call input_offload_tables with
	argument.

---
 gcc/lto-cgraph.c   | 8 +---
 gcc/lto-streamer.h | 2 +-
 gcc/lto/lto.c  | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 0634779..95c446d 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1885,7 +1885,7 @@ input_symtab (void)
target code, and store them into OFFLOAD_FUNCS and OFFLOAD_VARS.  */
 
 void
-input_offload_tables (void)
+input_offload_tables (bool do_force_output)
 {
   struct lto_file_decl_data **file_data_vec = lto_get_file_decl_data ();
   struct lto_file_decl_data *file_data;
@@ -1915,7 +1915,8 @@ input_offload_tables (void)
 	  /* Prevent IPA from removing fn_decl as unreachable, since there
 		 may be no refs from the parent function to child_fn in offload
 		 LTO mode.  */
-	  cgraph_node::get (fn_decl)->mark_force_output ();
+	  if (do_force_output)
+		cgraph_node::get (fn_decl)->mark_force_output ();
 	}
 	  else if (tag == LTO_symtab_variable)
 	{
@@ -1926,7 +1927,8 @@ input_offload_tables (void)
 
 	  /* Prevent IPA from removing var_decl as unused, since there
 		 may be no refs to var_decl in offload LTO mode.  */
-	  varpool_node::get (var_decl)->force_output = 1;
+	  if (do_force_output)
+		varpool_node::get (var_decl)->force_output = 1;
 	}
 	  else
 	fatal_error (input_location,
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 0cb200e..f391161 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -915,7 +915,7 @@ bool lto_symtab_encoder_encode_initializer_p (lto_symtab_encoder_t,
 void output_symtab (void);
 void input_symtab (void);
 void output_offload_tables (void);
-void input_offload_tables (v

Re: [Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump

2016-02-08 Thread Yuri Rumyantsev
Hi James,

Thanks for reporting this issue.
I prepared slightly different patch since we don't need to add
tree-vect dump option - it is on by default for all tests in /vect
directory.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-mask-store-move-1.c: Gate dump with x86 target.

2016-02-08 16:07 GMT+03:00 James Greenhalgh :
>
> Hi,
>
> As far as I can tell, this testcase will only vectorize for x86_64/i?86
> targets, so it should be gated to only check for vectorization on those.
>
> Additionally, this test wants to scan the vectorizer dumps, so we ought
> to add -fdump-tree-vect-all to the options.
>
> Checked on aarch64 (cross/native) and x86 with no issues.
>
> OK?
>
> Thanks,
> James
>
> ---
> 2016-02-08  James Greenhalgh  
>
> * gcc.dg/vect/vect-mask-store-move-1.c: Add dump option, and gate
> check on x86_64/i?86.
>


patch-test
Description: Binary data


Re: [PATCH, PR69707] Handle -fdiagnostics-color in lto

2016-02-08 Thread Tom de Vries

On 08/02/16 11:42, Jakub Jelinek wrote:

On Mon, Feb 08, 2016 at 11:34:39AM +0100, Tom de Vries wrote:

Hi,

when running libgomp.oacc-c-c++-common/parallel-dims.c with -flto
-fno-use-linker-plugin, we run into a failing 'test for excess errors'.

The problem is that while -fdiagnostics-color=never is passed to gcc, it's
not propagated to lto1, and the error message is annotated with color
information, which confuses the test for excess errors.

This patch fixes the problem by making sure that -fdiagnostics-color is
propagated to lto1, in the same way that -fdiagnostics-show-caret is
propagated to lto1.

Bootstrapped and reg-tested on x86_64.

OK for trunk, stage1?


Doesn't that mean diagnostics from the driver itself will no longer honor
that option when deciding if the diagnostics should be colorized or not?



Hi,

hmm, indeed removing the 'Driver' flag from the fdiagnostics-color= 
entry in common.opt breaks the functioning of fdiagnostics-color= in the 
gcc driver.


This patch leaves the 'Driver' flag alone, and instead explicitly allows 
fdiagnostics-color= in lto_write_options.


Is this approach ok?

Thanks,
- Tom
Handle -fdiagnostics-color in lto

2016-02-08  Tom de Vries  

	PR lto/69707
	* lto-opts.c (lto_write_options): Allow fdiagnostics-color.
	* lto-wrapper.c (merge_and_complain, append_compiler_options): Handle
	OPT_fdiagnostics_color_.

	* testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c: New test.

---
 gcc/lto-opts.c  | 21 ++---
 gcc/lto-wrapper.c   |  2 ++
 .../libgomp.oacc-c-c++-common/parallel-dims-2.c | 19 +++
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/gcc/lto-opts.c b/gcc/lto-opts.c
index 5f4b427..b15b9e8 100644
--- a/gcc/lto-opts.c
+++ b/gcc/lto-opts.c
@@ -200,9 +200,24 @@ lto_write_options (void)
 	 which includes things like -o and -v or -fhelp for example.
 	 We do not need those.  The only exception is -foffload option, if we
 	 write it in offload_lto section.  Also drop all diagnostic options.  */
-  if ((cl_options[option->opt_index].flags & (CL_DRIVER|CL_WARNING))
-	  && (!lto_stream_offload_p || option->opt_index != OPT_foffload_))
-	continue;
+  if (cl_options[option->opt_index].flags & (CL_DRIVER|CL_WARNING))
+	{
+	  bool keep = false;
+	  switch (option->opt_index)
+	{
+	case OPT_foffload_:
+	  keep = lto_stream_offload_p;
+	  break;
+	case OPT_fdiagnostics_color_:
+	  keep = true;
+	  break;
+	default:
+	  break;
+	}
+
+	  if (!keep)
+	continue;
+	}
 
   for (j = 0; j < option->canonical_option_num_elements; ++j)
 	append_to_collect_gcc_options (&temporary_obstack, &first_p,
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index ced6f2f..484dbc1 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -232,6 +232,7 @@ merge_and_complain (struct cl_decoded_option **decoded_options,
 	break;
 
 	  /* Fallthru.  */
+	case OPT_fdiagnostics_color_:
 	case OPT_fdiagnostics_show_caret:
 	case OPT_fdiagnostics_show_option:
 	case OPT_fdiagnostics_show_location_:
@@ -497,6 +498,7 @@ append_compiler_options (obstack *argv_obstack, struct cl_decoded_option *opts,
 	 on any CL_TARGET flag and a few selected others.  */
   switch (option->opt_index)
 	{
+	case OPT_fdiagnostics_color_:
 	case OPT_fdiagnostics_show_caret:
 	case OPT_fdiagnostics_show_option:
 	case OPT_fdiagnostics_show_location_:
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c
new file mode 100644
index 000..eea8c7e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c
@@ -0,0 +1,19 @@
+/* { dg-do run { target { openacc_nvidia_accel_selected && lto } } } */
+/* { dg-additional-options "-flto -fno-use-linker-plugin" } */
+
+/* Worker and vector size checks.  Picked an outrageously large
+   value.  */
+
+int main ()
+{
+#pragma acc parallel num_workers (2<<20) /* { dg-error "using num_workers" } */
+  {
+  }
+
+#pragma acc parallel vector_length (2<<20) /* { dg-error "using vector_length" } */
+  {
+  }
+
+  return 0;
+}
+


Re: [Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump

2016-02-08 Thread James Greenhalgh
On Mon, Feb 08, 2016 at 04:29:31PM +0300, Yuri Rumyantsev wrote:
> Hi James,
> 
> Thanks for reporting this issue.
> I prepared slightly different patch since we don't need to add
> tree-vect dump option - it is on by default for all tests in /vect
> directory.

Hm, I added that line as my test runs were showing:

  UNRESOLVED: gcc.dg/vect/vect-mask-store-move-1.c: dump file does not exist

I would guess the explicit 

  /* { dg-options "-O3" } */

is clobbering the vect.exp setup of flags?

This also affects the x86-64 results H.J. Lu is sending out:

  https://gcc.gnu.org/ml/gcc-testresults/2016-02/msg00824.html

Thanks,
James

> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/vect-mask-store-move-1.c: Gate dump with x86 target.
> 
> 2016-02-08 16:07 GMT+03:00 James Greenhalgh :
> >
> > Hi,
> >
> > As far as I can tell, this testcase will only vectorize for x86_64/i?86
> > targets, so it should be gated to only check for vectorization on those.
> >
> > Additionally, this test wants to scan the vectorizer dumps, so we ought
> > to add -fdump-tree-vect-all to the options.
> >
> > Checked on aarch64 (cross/native) and x86 with no issues.
> >
> > OK?
> >
> > Thanks,
> > James
> >
> > ---
> > 2016-02-08  James Greenhalgh  
> >
> > * gcc.dg/vect/vect-mask-store-move-1.c: Add dump option, and gate
> > check on x86_64/i?86.
> >




Re: [PATCH, PR69707] Handle -fdiagnostics-color in lto

2016-02-08 Thread Jakub Jelinek
On Mon, Feb 08, 2016 at 02:38:17PM +0100, Tom de Vries wrote:
> hmm, indeed removing the 'Driver' flag from the fdiagnostics-color= entry in
> common.opt breaks the functioning of fdiagnostics-color= in the gcc driver.
> 
> This patch leaves the 'Driver' flag alone, and instead explicitly allows
> fdiagnostics-color= in lto_write_options.
> 
> Is this approach ok?

Doesn't that mean storing -fdiagnostics-color= into the LTO option section and
then using whatever was recorded from the first TU that recorded anything?
That doesn't look right for such diagnostics options either.
I mean, if I
$ gcc -fdiagnostics-color=always -c -flto a.c
on another terminal
$ gcc -fdiagnostics-color=never -c -flto b.c
on yet another terminal
$ gcc -flto -o a a.o b.o
then I'd expect the default setting for -fdiagnostics-color= for all
diagnostics while linking/LTO optimizing it, and for say
$ gcc -fdiagnostics-color=always -flto -o a a.o b.o
to have it always colorized, similarly for never.
IMHO we should just honor what has been specified on the linker command
line if anything.  So, the question is, is -fdiagnostics-color=never passed
in the testsuite just to the compilation and not to linking (that would be
IMHO a testsuite bug), or is it passed on the linking command line, but not
passed through to lto1?

> Handle -fdiagnostics-color in lto
> 
> 2016-02-08  Tom de Vries  
> 
>   PR lto/69707
>   * lto-opts.c (lto_write_options): Allow fdiagnostics-color.
>   * lto-wrapper.c (merge_and_complain, append_compiler_options): Handle
>   OPT_fdiagnostics_color_.
> 
>   * testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c: New test.

Jakub


Re: libgcc: On AIX, increase chances to find landing pads for exceptions

2016-02-08 Thread David Edelsohn
Runtime linking is disabled by default on AIX, and I disabled it for libstdc++.

There are two remaining issues:

1) FDEs with overlapping ranges causing problems with exceptions.  I'm
not sure of the best way to work around this.  Your patch is one
possible solution.

2) AIX linker garbage collection conflicting with scanning for
symbols.  collect2 scanning needs to better emulate SVR4 linker
semantics for object files and archives.

Thanks, David


On Mon, Feb 8, 2016 at 7:14 AM, Michael Haubenwallner
 wrote:
> Hi David,
>
> still experiencing exception-not-caught problems with gcc-4.2.4 on AIX
> leads me to some patch proposed in http://gcc.gnu.org/PR13878 back in
> 2004 already, ought to be fixed by some different commit since 3.4.0.
>
> As long as build systems (even libtool right now) on AIX do export these
> _GLOBAL__* symbols from shared libraries, overlapping frame-base address
> ranges may become registered, even if newer gcc (seen with 4.8) does name
> the FDE symbols more complex to reduce these chances.
>
> But still, just think of linking some static library into multiple shared
> libraries and/or the main executable. Or sometimes there is just need for
> some hackery to override a shared object's implementation detail and rely
> on runtime linking to do the override at runtime.
>
> Agreed both is "wrong" to some degree, but the larger an application is,
> the higher is the chance for this to happen.
>
> Thoughts?
>
> Thanks!
> /haubi/


Re: [PATCH, PR69607] Mark offload symbols as global in lto

2016-02-08 Thread Ilya Verbin
On Mon, Feb 08, 2016 at 14:00:00 +0100, Tom de Vries wrote:
> when running libgomp.c testsuite with "-flto -flto-partition=1to1
> -fno-toplevel-reorder" we run into many compilation failures like this:
> ...
> /tmp/.ltrans0.ltrans.o:(.gnu.offload_funcs+0x1a0): undefined
> reference to `MAIN__._omp_fn.0'^M
> ...
> 
> The problem is that the offload table is in one lto partition, and the
> function listed in the offload table is in another, without the function
> having been promoted to be visible in the other partition.
> 
> The patch fixes this by promoting the symbols in the offload table such that
> they're visible in all partitions.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> Build for nvidia accelerator and reg-tested libgomp with various lto
> settings.

Works fine with intelmic offloading.

  -- Ilya


Re: [PATCH][P1 tree-optimization/68541] Add heuristics to path splitting

2016-02-08 Thread Kyrill Tkachov

Hi Jeff,

On 05/02/16 23:49, Jeff Law wrote:

This patch addresses multiple issues with how we determine when to split paths. 
 The primary motivation is addressing 68541 (P1).


As I've gotten testcodes from Ajit, I've been able to look closely at the path splitting opportunities and consistently what I've seen is that contrary to the original belief, CSE/DCE opportunities are rarely exposed by path splitting.  
It seems the benefit is more due to removing the unconditional jump inherent in a CFG diamond -- even more so on the microblaze where these jumps have delay slots.


While there are cases where splitting allows for CSE/DCE, they're the exception 
rather than the rule in the codes I've looked at.

With that in mind, we want to encourage path splitting when the amount of code 
we're duplicating is small.   We also want to balance that with not path 
splitting when the original code is something that may be if-convertable.

The vast majority of if-convertable codes are cases where the two predecessors of the join block are single statement blocks with their results feeding the same PHI in the join block.  We now reject that case for path splitting so as not 
to spoil if-conversion.


The only wrinkle with that heuristic is that some codes (MIN, MAX, ABS, COND, calls, etc) are not good candidates for further if conversion. (ie, we have a MIN_EXPR in both predecessors that feed the PHI in the join).   So we still allow 
those cases for splitting.  This can be easily refined as we find more cases or as the if-converter is extended.


So with that in place as the first filter, we just need to put a limiter on the number of statements we allow to be copied.  I punted a bit and re-used the PARAM for jump threading.  They've got enough in common that I felt it was 
reasonable.  If I were to create a new PARAM, I'd probably start with a smaller default value after doing further instrumentation.


While I was working through this I noticed we were path splitting in some cases 
where we shouldn't.  Particularly if we had a CFG like:

  a
 / \
b   c
   / \ /
  e   d
 / \
/   \
   / \
 latch   exit


Duplicating d for the edges b->d and c->d isn't as helpful because the 
duplicate can't be merged into b.  We now detect this kind of case and reject it for 
path splitting.

The new tests are a combination of reductions of codes from Ajit and my own 
poking around.

Looking further out, I really wonder if the low level aspects of path splitting like we're trying to capture here really belong in the cross-jumping phase of the RTL optimizers.  The higher level aspects (when we are able to expose 
CSE/DCE opportunities) should be driven by actually looking at the propagation/simplifications enabled by a potential split path.  But those are gcc-7 things.


Bootstrapped and regression tested on x86 linux.  Installed on the trunk.  I'll obviously keep an eye out for how the tests are handled on other platforms -- I don't think the tests are real sensitive to branch costs and such, but I've 
been surprised before.


Onward to the jump threading code explosion wrap-up...

jeff



After this patch I also see:
FAIL: gcc.dg/tree-ssa/split-path-1.c scan-tree-dump split-paths "Duplicating join 
block"

on arm, but not on aarch64. My arm-none-eabi cross compiler is configured with:
--with-float=hard --with-cpu=cortex-a9 --with-fpu=neon --with-mode=thumb

Kyrill



Re: [Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump

2016-02-08 Thread Yuri Rumyantsev
Sorry for troubles.
One line must be excluded from test:
-/* { dg-options "-O3" } */

Here is updated patch.

Best regards.
Yuri.

2016-02-08 16:40 GMT+03:00 James Greenhalgh :
> On Mon, Feb 08, 2016 at 04:29:31PM +0300, Yuri Rumyantsev wrote:
>> Hi James,
>>
>> Thanks for reporting this issue.
>> I prepared slightly different patch since we don't need to add
>> tree-vect dump option - it is on by default for all tests in /vect
>> directory.
>
> Hm, I added that line as my test runs were showing:
>
>   UNRESOLVED: gcc.dg/vect/vect-mask-store-move-1.c: dump file does not exist
>
> I would guess the explicit
>
>   /* { dg-options "-O3" } */
>
> is clobbering the vect.exp setup of flags?
>
> This also affects the x86-64 results H.J. Lu is sending out:
>
>   https://gcc.gnu.org/ml/gcc-testresults/2016-02/msg00824.html
>
> Thanks,
> James
>
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.dg/vect/vect-mask-store-move-1.c: Gate dump with x86 target.
>>
>> 2016-02-08 16:07 GMT+03:00 James Greenhalgh :
>> >
>> > Hi,
>> >
>> > As far as I can tell, this testcase will only vectorize for x86_64/i?86
>> > targets, so it should be gated to only check for vectorization on those.
>> >
>> > Additionally, this test wants to scan the vectorizer dumps, so we ought
>> > to add -fdump-tree-vect-all to the options.
>> >
>> > Checked on aarch64 (cross/native) and x86 with no issues.
>> >
>> > OK?
>> >
>> > Thanks,
>> > James
>> >
>> > ---
>> > 2016-02-08  James Greenhalgh  
>> >
>> > * gcc.dg/vect/vect-mask-store-move-1.c: Add dump option, and gate
>> > check on x86_64/i?86.
>> >
>
>


patch-test
Description: Binary data


Re: [Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump

2016-02-08 Thread Richard Biener
On Mon, Feb 8, 2016 at 2:40 PM, James Greenhalgh
 wrote:
> On Mon, Feb 08, 2016 at 04:29:31PM +0300, Yuri Rumyantsev wrote:
>> Hi James,
>>
>> Thanks for reporting this issue.
>> I prepared slightly different patch since we don't need to add
>> tree-vect dump option - it is on by default for all tests in /vect
>> directory.
>
> Hm, I added that line as my test runs were showing:
>
>   UNRESOLVED: gcc.dg/vect/vect-mask-store-move-1.c: dump file does not exist
>
> I would guess the explicit
>
>   /* { dg-options "-O3" } */
>
> is clobbering the vect.exp setup of flags?

Yes.  Use { dg-additional-options "-O3" } instead.

> This also affects the x86-64 results H.J. Lu is sending out:
>
>   https://gcc.gnu.org/ml/gcc-testresults/2016-02/msg00824.html
>
> Thanks,
> James
>
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.dg/vect/vect-mask-store-move-1.c: Gate dump with x86 target.
>>
>> 2016-02-08 16:07 GMT+03:00 James Greenhalgh :
>> >
>> > Hi,
>> >
>> > As far as I can tell, this testcase will only vectorize for x86_64/i?86
>> > targets, so it should be gated to only check for vectorization on those.
>> >
>> > Additionally, this test wants to scan the vectorizer dumps, so we ought
>> > to add -fdump-tree-vect-all to the options.
>> >
>> > Checked on aarch64 (cross/native) and x86 with no issues.
>> >
>> > OK?
>> >
>> > Thanks,
>> > James
>> >
>> > ---
>> > 2016-02-08  James Greenhalgh  
>> >
>> > * gcc.dg/vect/vect-mask-store-move-1.c: Add dump option, and gate
>> > check on x86_64/i?86.
>> >
>
>


Re: Fix PR67639

2016-02-08 Thread Bernd Schmidt

On 12/21/2015 08:39 PM, Jeff Law wrote:

On 12/18/2015 11:38 AM, Bernd Schmidt wrote:

In an earlier fix, the following change was made in varasm.c for invalid
register variables:

--- trunk/gcc/varasm.c2014/08/26 14:59:59214525
+++ trunk/gcc/varasm.c2014/08/26 17:06:31214526
@@ -1371,6 +1371,11 @@ make_decl_rtl (tree decl)
/* As a register variable, it has no section.  */
return;
  }
+  /* Avoid internal errors from invalid register
+ specifications.  */
+  SET_DECL_ASSEMBLER_NAME (decl, NULL_TREE);
+  DECL_HARD_REGISTER (decl) = 0;
+  return;
  }

As seen in PR67639, this makes the IL inconsistent and triggers another
internal error where we expect to see an SSA_NAME instead of a VAR_DECL.

The following patch extends the above slightly, by also setting
DECL_EXTERNAL to pretend that the erroneous variable is actually a
global.

Bootstrapped and tested on x86_64-linux, ok?

OK.


Turns out 65702 is a dup and this should go into gcc-5 as well. Ok to 
backport?



Bernd


[PATCH] Fix PR69719

2016-02-08 Thread Richard Biener

The following fixes a latent bug in vect_prune_runtime_alias_test_list,
not considering negative offset1 - offset2.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk
sofar.

Richard.

2016-02-08  Richard Biener  

PR tree-optimization/69719
* tree-vect-data-refs.c (vect_prune_runtime_alias_test_list):
Properly use absolute of the difference of the two offsets to
compare or adjust the segment length.

* gcc.dg/torture/pr69719.c: New testcase.

Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c   (revision 233211)
--- gcc/tree-vect-data-refs.c   (working copy)
*** vect_prune_runtime_alias_test_list (loop
*** 2905,2912 
  || !tree_fits_shwi_p (dr_a2->offset))
continue;
  
! HOST_WIDE_INT diff = (tree_to_shwi (dr_a2->offset)
!   - tree_to_shwi (dr_a1->offset));
  
  
  /* Now we check if the following condition is satisfied:
--- 2905,2913 
  || !tree_fits_shwi_p (dr_a2->offset))
continue;
  
! unsigned HOST_WIDE_INT diff
!   = absu_hwi (tree_to_shwi (dr_a2->offset)
!   - tree_to_shwi (dr_a1->offset));
  
  
  /* Now we check if the following condition is satisfied:
*** vect_prune_runtime_alias_test_list (loop
*** 2925,2937 
  
 */
  
! HOST_WIDE_INT  min_seg_len_b = (tree_fits_shwi_p (dr_b1->seg_len)
! ? tree_to_shwi (dr_b1->seg_len)
! : vect_factor);
  
  if (diff <= min_seg_len_b
! || (tree_fits_shwi_p (dr_a1->seg_len)
! && diff - tree_to_shwi (dr_a1->seg_len) < min_seg_len_b))
{
  if (dump_enabled_p ())
{
--- 2926,2939 
  
 */
  
! unsigned HOST_WIDE_INT min_seg_len_b
!   = (tree_fits_uhwi_p (dr_b1->seg_len)
!  ? tree_to_uhwi (dr_b1->seg_len)
!  : vect_factor);
  
  if (diff <= min_seg_len_b
! || (tree_fits_uhwi_p (dr_a1->seg_len)
! && diff - tree_to_uhwi (dr_a1->seg_len) < min_seg_len_b))
{
  if (dump_enabled_p ())
{
Index: gcc/testsuite/gcc.dg/torture/pr69719.c
===
*** gcc/testsuite/gcc.dg/torture/pr69719.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr69719.c  (working copy)
***
*** 0 
--- 1,24 
+ /* { dg-do run } */
+ 
+ int b, c = 1, e, f; 
+ int a[6][5] = { {0, 0, 0, 0, 0}, {0, 0, 0, 0, 0}, {0, 1, 0, 0, 0} };
+ 
+ void  __attribute__((noinline))
+ fn1 ()
+ {
+   int d;
+   for (b = 0; b < 5; b++)
+ for (d = 4; d; d--)
+   a[c + 1][b] = a[d + 1][d];
+ }
+ 
+ int
+ main ()
+ {
+   fn1 ();
+ 
+   if (a[2][1] != 0) 
+ __builtin_abort (); 
+ 
+   return 0; 
+ }


[PATCH] Re: PR c++/69139

2016-02-08 Thread Adam Butcher
The following fixes up the handling of trailing returns with
cv/ref specifiers mentioned by TC in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69139#c3.  I've added
handling of exception and transaction specs too.

---
 gcc/cp/parser.c | 12 ++--
 gcc/testsuite/g++.dg/cpp0x/trailing12.C | 10 ++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index c1a9674..f51fac4 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -16055,8 +16055,16 @@ cp_parser_simple_type_specifier (cp_parser* parser,
/*recovering*/false,
/*or_comma*/false,
/*consume_paren*/true)))
-   have_trailing_return_fn_decl
- = cp_lexer_next_token_is (parser->lexer, CPP_DEREF);
+   {
+ /* Consume any cv-qualifier-seq, ref-qualifier,
+tx-qualifier and/or exception specifier.  */
+ cp_parser_cv_qualifier_seq_opt (parser);
+ cp_parser_ref_qualifier_opt (parser);
+ cp_parser_tx_qualifier_opt (parser);
+ cp_parser_exception_specification_opt (parser);
+ have_trailing_return_fn_decl
+   = cp_lexer_next_token_is (parser->lexer, CPP_DEREF);
+   }
}
  cp_parser_abort_tentative_parse (parser);
}
diff --git a/gcc/testsuite/g++.dg/cpp0x/trailing12.C 
b/gcc/testsuite/g++.dg/cpp0x/trailing12.C
index f3e02a8..707b753 100644
--- a/gcc/testsuite/g++.dg/cpp0x/trailing12.C
+++ b/gcc/testsuite/g++.dg/cpp0x/trailing12.C
@@ -4,3 +4,13 @@
 auto get(int) -> int { return {}; }
 template  int f(auto (*)(int) -> R) { return {}; }
 int i = f(get);
+
+struct X {
+  auto get(int) const & throw() -> int { return {}; }
+  auto get(int) && -> long { return {}; }
+};
+
+template  auto f(auto (X::*)(int) const & -> R) -> R {}
+
+using I = decltype(f(&X::get));
+using I = int;
-- 
2.7.0



Re: [PING][PATCH] Mark symbols in offload tables with force_output in read_offload_tables

2016-02-08 Thread Ilya Verbin
On Mon, Feb 08, 2016 at 14:20:11 +0100, Tom de Vries wrote:
> On 26/01/16 14:01, Ilya Verbin wrote:
> >On Tue, Jan 26, 2016 at 13:21:57 +0100, Tom de Vries wrote:
> >>On 25/01/16 14:27, Ilya Verbin wrote:
> >>>On Tue, Jan 05, 2016 at 15:56:15 +0100, Tom de Vries wrote:
> >diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> >index 62e5454..cdaee41 100644
> >--- a/gcc/lto-cgraph.c
> >+++ b/gcc/lto-cgraph.c
> >@@ -1911,6 +1911,11 @@ input_offload_tables (void)
> >   tree fn_decl
> > = lto_file_decl_data_get_fn_decl (file_data, 
> > decl_index);
> >   vec_safe_push (offload_funcs, fn_decl);
> >+
> >+  /* Prevent IPA from removing fn_decl as unreachable, 
> >since there
> >+ may be no refs from the parent function to child_fn in 
> >offload
> >+ LTO mode.  */
> >+  cgraph_node::get (fn_decl)->mark_force_output ();
> > }
> >   else if (tag == LTO_symtab_variable)
> > {
> >@@ -1918,6 +1923,10 @@ input_offload_tables (void)
> >   tree var_decl
> > = lto_file_decl_data_get_var_decl (file_data, 
> > decl_index);
> >   vec_safe_push (offload_vars, var_decl);
> >+
> >+  /* Prevent IPA from removing var_decl as unused, since 
> >there
> >+ may be no refs to var_decl in offload LTO mode.  */
> >+  varpool_node::get (var_decl)->force_output = 1;
> > }
> >>>
> >>>This doesn't work when there is more than one LTO partition, because only 
> >>>first
> >>>partition contains full offload table to maintain correct order, but 
> >>>cgraph and
> >>>varpool nodes aren't necessarily created for the first partition.  To 
> >>>reproduce:
> >>>
> >>>$ make check-target-libgomp RUNTESTFLAGS="c.exp=for-* 
> >>>--target_board=unix/-flto"
> >>>FAIL: libgomp.c/for-3.c (internal compiler error)
> >>>FAIL: libgomp.c/for-5.c (internal compiler error)
> >>>FAIL: libgomp.c/for-6.c (internal compiler error)
> >>>$ make check-target-libgomp RUNTESTFLAGS="c++.exp=for-* 
> >>>--target_board=unix/-flto"
> >>>FAIL: libgomp.c++/for-11.C (internal compiler error)
> >>>FAIL: libgomp.c++/for-13.C (internal compiler error)
> >>>FAIL: libgomp.c++/for-14.C (internal compiler error)
> >>
> >>This works for me.
> >>
> >>OK for trunk?
> >>
> >>Thanks,
> >>- Tom
> >>
> >
> >>Check that cgraph/varpool_node exists before use in input_offload_tables
> >>
> >>2016-01-26  Tom de Vries  
> >>
> >>* lto-cgraph.c (input_offload_tables): Check that cgraph/varpool_node
> >>exists before use.
> >
> >In this case they will be not marked as force_output in other partitions 
> >(except
> >the first one).
> 
> AFAIU, that's not the case.
> 
> If we're splitting up lto compilation over partitions, it means we're first
> calling lto1 in WPA mode. We'll read in all offload tables, and mark all
> symbols with force_output, and when writing out the partitions, we'll write
> the offload symbols out with force_output set.
> 
> This updated patch only does the force_output marking for offload symbols in
> WPA or LTO. It's not necessary in LTRANS mode.

You're right, works for me.

  -- Ilya


Re: C++ PATCH for c++/69688 (bogus error with -Wsign-compare)

2016-02-08 Thread Marek Polacek
On Fri, Feb 05, 2016 at 05:36:21PM -0500, Jason Merrill wrote:
> On 02/05/2016 05:32 PM, Marek Polacek wrote:
> >   if (TREE_CODE (type) == ERROR_MARK)
> > return NULL_TREE;
> >
> >+  /* Here, DECL may change value; purge caches.  */
> >+  clear_fold_cache ();
> >+  clear_cv_cache ();
> >+
> >   if (MAYBE_CLASS_TYPE_P (type))
> 
> This should happen after computing the value to be stored, not before. Also,
> could you combine those two functions into one?  There's no reason for
> callers such as this to need to call two different functions.

Okay.  So like this?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-02-08  Marek Polacek  

PR c++/69688
* constexpr.c (clear_cv_and_fold_caches): Renamed from clear_cv_cache.
Call clear_fold_cache.
* cp-tree.h: Adjust declaration.
* decl.c (finish_enum_value_list): Call clear_cv_and_fold_caches
rather than clear_cv_cache and clear_fold_cache.
* typeck2.c (store_init_value): Call clear_cv_and_fold_caches.

* g++.dg/init/const12.C: New test.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index 05f6843..85fc64e 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -4098,12 +4098,13 @@ maybe_constant_value (tree t, tree decl)
   return ret;
 }
 
-/* Dispose of the whole CV_CACHE.  */
+/* Dispose of the whole CV_CACHE and FOLD_CACHE.  */
 
 void
-clear_cv_cache (void)
+clear_cv_and_fold_caches (void)
 {
   gt_cleare_cache (cv_cache);
+  clear_fold_cache ();
 }
 
 /* Like maybe_constant_value but first fully instantiate the argument.
diff --git gcc/cp/cp-tree.h gcc/cp/cp-tree.h
index 0aeee57..a67e9b6 100644
--- gcc/cp/cp-tree.h
+++ gcc/cp/cp-tree.h
@@ -6920,7 +6920,7 @@ extern bool var_in_constexpr_fn (tree);
 extern void explain_invalid_constexpr_fn(tree);
 extern vec cx_error_context   (void);
 extern tree fold_sizeof_expr   (tree);
-extern void clear_cv_cache (void);
+extern void clear_cv_and_fold_caches   (void);
 
 /* In c-family/cilk.c */
 extern bool cilk_valid_spawn(tree);
diff --git gcc/cp/decl.c gcc/cp/decl.c
index 2c337bc..11f7ce6 100644
--- gcc/cp/decl.c
+++ gcc/cp/decl.c
@@ -13414,8 +13414,7 @@ finish_enum_value_list (tree enumtype)
 
   /* Each enumerator now has the type of its enumeration.  Clear the cache
  so that this change in types doesn't confuse us later on.  */
-  clear_cv_cache ();
-  clear_fold_cache ();
+  clear_cv_and_fold_caches ();
 }
 
 /* Finishes the enum type. This is called only the first time an
diff --git gcc/cp/typeck2.c gcc/cp/typeck2.c
index 419faa2..2a76c96 100644
--- gcc/cp/typeck2.c
+++ gcc/cp/typeck2.c
@@ -837,6 +837,9 @@ store_init_value (tree decl, tree init, vec** 
cleanups, int flags)
 /* Handle aggregate NSDMI in non-constant initializers, too.  */
 value = replace_placeholders (value, decl);
 
+  /* DECL may change value; purge caches.  */
+  clear_cv_and_fold_caches ();
+
   /* If the initializer is not a constant, fill in DECL_INITIAL with
  the bits that are constant, and then return an expression that
  will perform the dynamic initialization.  */
diff --git gcc/testsuite/g++.dg/init/const12.C 
gcc/testsuite/g++.dg/init/const12.C
index e69de29..2f6f9b2 100644
--- gcc/testsuite/g++.dg/init/const12.C
+++ gcc/testsuite/g++.dg/init/const12.C
@@ -0,0 +1,20 @@
+// PR c++/69688
+// { dg-do compile }
+// { dg-options "-Wsign-compare" }
+
+struct S
+{
+  static const int s;
+  static const char c[];
+  static wchar_t w[];
+
+  S ()
+{
+  for (int i = 0; i < s; i++)
+   w[i] = 0;
+}
+};
+
+const char S::c[] = "x";
+const int S::s = sizeof (S::c) - 1;
+wchar_t S::w[S::s];

Marek


Re: C++ PATCH for c++/69688 (bogus error with -Wsign-compare)

2016-02-08 Thread Jason Merrill

OK, thanks.

Jason


Re: [PATCH] PR c++/69139

2016-02-08 Thread Jason Merrill

On 02/06/2016 05:25 PM, Adam Butcher wrote:

+ if (cp_lexer_next_token_is (parser->lexer, CPP_DEREF))
+   have_trailing_return_fn_decl = true;
+ else if ((cp_lexer_consume_token (parser->lexer)->type
+   == CPP_OPEN_PAREN)
+  && (cp_parser_skip_to_closing_parenthesis
+  (parser,
+   /*recovering*/false,
+   /*or_comma*/false,
+   /*consume_paren*/true)))
+   have_trailing_return_fn_decl
+ = cp_lexer_next_token_is (parser->lexer, CPP_DEREF);


Let's reorganize this so you only check CPP_DEREF in one place; this is 
more important now that you've added more processing to only one of them 
in your followup patch.


Jason



Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA

2016-02-08 Thread Vladimir Makarov

On 02/08/2016 04:09 AM, Richard Biener wrote:
With all of the above I'm not sure what to do for GCC 6 (even though 
you just approved the patch). Going with the patch alternative (just 
revert swapping parts of the commutative operands) looks like 
completely bogus though it works for fixing the regression as well.
Yes, it is bogus.  But I don't see other easy way to close P1 PR except 
making it P2.


I still have plans to rewrite ira-costs.c (may be for GCC7).  The 
algorithm is actually adaptation of old regclass.c one.  There are few 
complaints about wrong costs calculations because of this (the most 
recent complaint was Peter Bergner's one about a power8 test). I don't 
like the algorithm, it ignores the fact that insn operands should be 
from the same alternative during pseudo (allocno) cost calculations.


So this change probably will not survive after rewriting ira-costs.c.




Re: [PATCH] Fix libstdc++-v3/include/math.h:66:1 2: error: 'constexpr bool std::isnan(double)' conflicts with a previous declaration

2016-02-08 Thread Jonathan Wakely

On 06/02/16 15:31 +, Jonathan Wakely wrote:

On 6 February 2016 at 12:51, Gerald Pfeifer  wrote:

On Fri, 5 Feb 2016, Jonathan Wakely wrote:


Can anyone else test this on an older FreeBSD or other target that
isn't gnu/aix/hpux?



Thank you, Jonathan!

I did not have such an older environment available, but now
could install the infrastructure and get all prerequisites in place for
FreeBSD 9.3/AMD64 testing.

That allowed to both reproduce the original failure and confirm
that it does not occur any more with your proposed patch.


OK, thanks. I'll commit it on Monday (with a slight tweak).


This enables the isinf/isnan checks everywhere and also adds #define
_GLIBCXX_INCLUDE_NEXT_C_HEADERS to the configure test, to ensure that
it finds the libc math.h and not an already installed libstdc++
wrapper (which would confuse the test). I also accidentally committed
a change to add 'constexpr' to the configure test, which I reverted.

Tested powerp64-linux, committed to trunk.

commit b0a7399294012b65df3d155da0182e017d6e4214
Author: redi 
Date:   Mon Feb 8 15:22:32 2016 +

Enable isinf/isnan checks for all targets

	PR libstdc++/48891
	* acinclude.m4 (GLIBCXX_CHECK_MATH11_PROTO): Enable isinf and isnan
	checks for all targets except *-*-solaris2.* and ensure we find the
	libc math.h header not our own.
	* configure: Regenerate.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233214 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 057b58e..e667ccc 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -2215,7 +2215,7 @@ AC_DEFUN([GLIBCXX_CHECK_MATH11_PROTO], [
   fi
   AC_MSG_RESULT([$glibcxx_cv_math11_overload])
   ;;
-*-*-*gnu* | *-*-aix* | *-*-hpux*)
+*)
   # If  defines the obsolete isinf(double) and isnan(double)
   # functions (instead of or as well as the C99 generic macros) then we
   # can't define std::isinf(double) and std::isnan(double) in 
@@ -2223,12 +2223,13 @@ AC_DEFUN([GLIBCXX_CHECK_MATH11_PROTO], [
   AC_MSG_CHECKING([for obsolete isinf function in ])
 AC_CACHE_VAL(glibcxx_cv_obsolete_isinf, [
   AC_COMPILE_IFELSE([AC_LANG_SOURCE(
-[#include 
+[#define _GLIBCXX_INCLUDE_NEXT_C_HEADERS
+ #include 
  #undef isinf
  namespace std {
using ::isinf;
-   bool isinf(float);
-   bool isinf(long double);
+   constexpr bool isinf(float);
+   constexpr bool isinf(long double);
  }
  using std::isinf;
  bool b = isinf(0.0);

commit 6c1b4080c3cd651e3559bbbf155d1f09236c68ee
Author: redi 
Date:   Mon Feb 8 15:37:59 2016 +

Remove accidentally added 'constexpr' in previous commit

	* acinclude.m4 (GLIBCXX_CHECK_MATH11_PROTO): Remove accidentally
	added 'constexpr' in previous commit.
	* configure: Regenerate.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233219 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index e667ccc..95df24a 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -2228,8 +2228,8 @@ AC_DEFUN([GLIBCXX_CHECK_MATH11_PROTO], [
  #undef isinf
  namespace std {
using ::isinf;
-   constexpr bool isinf(float);
-   constexpr bool isinf(long double);
+   bool isinf(float);
+   bool isinf(long double);
  }
  using std::isinf;
  bool b = isinf(0.0);


Re: Fix PR67639

2016-02-08 Thread Jeff Law

On 02/08/2016 07:26 AM, Bernd Schmidt wrote:

On 12/21/2015 08:39 PM, Jeff Law wrote:

On 12/18/2015 11:38 AM, Bernd Schmidt wrote:

In an earlier fix, the following change was made in varasm.c for invalid
register variables:

--- trunk/gcc/varasm.c2014/08/26 14:59:59214525
+++ trunk/gcc/varasm.c2014/08/26 17:06:31214526
@@ -1371,6 +1371,11 @@ make_decl_rtl (tree decl)
/* As a register variable, it has no section.  */
return;
  }
+  /* Avoid internal errors from invalid register
+ specifications.  */
+  SET_DECL_ASSEMBLER_NAME (decl, NULL_TREE);
+  DECL_HARD_REGISTER (decl) = 0;
+  return;
  }

As seen in PR67639, this makes the IL inconsistent and triggers another
internal error where we expect to see an SSA_NAME instead of a VAR_DECL.

The following patch extends the above slightly, by also setting
DECL_EXTERNAL to pretend that the erroneous variable is actually a
global.

Bootstrapped and tested on x86_64-linux, ok?

OK.


Turns out 65702 is a dup and this should go into gcc-5 as well. Ok to
backport?

Yes.

Thanks,
jeff


Re: [Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump

2016-02-08 Thread Jeff Law

On 02/08/2016 06:07 AM, James Greenhalgh wrote:


Hi,

As far as I can tell, this testcase will only vectorize for x86_64/i?86
targets, so it should be gated to only check for vectorization on those.

Additionally, this test wants to scan the vectorizer dumps, so we ought
to add -fdump-tree-vect-all to the options.

Checked on aarch64 (cross/native) and x86 with no issues.

OK?

Thanks,
James

---
2016-02-08  James Greenhalgh  

* gcc.dg/vect/vect-mask-store-move-1.c: Add dump option, and gate
check on x86_64/i?86.

OK.

Thanks,
Jeff



Re: [PATCH] PR preprocessor/69664: fix rich_location::override_column

2016-02-08 Thread Jeff Law

On 02/04/2016 03:51 PM, David Malcolm wrote:

In gcc 5 and earlier, struct diagnostic_info had a field:
   unsigned int override_column;
which could be set by the macro:
   diagnostic_override_column

This was only used by the frontends' callbacks for handling errors
from libcpp: c_cpp_error for the c-family, and cb_cpp_error for Fortran:
   if (column_override)
 diagnostic_override_column (&diagnostic, column_override);
based on a column_override value passed by libcpp to the callback.

I removed the field when introducing rich_location (in r229884),
instead adding a rich_location::override_column method, called from
libcpp immediately before calling the frontend's error-handling
callback (the callback takes a rich_location *).

I got the implementation of rich_location::override_column wrong
(sorry), and PR preprocessor/69664 is a symptom of this.

Specifically, I was only overriding the column within
m_expanded_location, which affects some parts of
diagnostic-show-locus.c, but I was not overriding the column within
the various expanded_location instances within the rich_location's
m_ranges[0].

Hence the wrong column information is printed for diagnostics
emitted by libcpp where the column is overridden.
This happens for any of the "_with_line" diagnostics calls in
libcpp that are passed a non-zero column override.
I believe these are all confined to libcpp/lex.c.

The attached patch fixes this, by ensuring that all relevant
columns are updated by rich_location::override_column, and
also adding the missing conditional to only call it for non-zero
column_override values (this time in libcpp before calling the
error-handling callback, rather than from within the error-handling
callbacks).

It also adds expected column numbers to some pre-existing tests,
giving us test coverage for this.

Successfully bootstrapped®rtested on x86_64-pc-linux-gnu;
OK for trunk?

gcc/testsuite/ChangeLog:
PR preprocessor/69664
* gcc.dg/cpp/trad/comment-2.c: Add expected column number.
* gcc.dg/cpp/warn-comments.c: Likewise.

libcpp/ChangeLog:
PR preprocessor/69664
* errors.c (cpp_diagnostic_with_line): Only call
rich_location::override_column if the column is non-zero.
* line-map.c (rich_location::override_column): Update columns
within m_ranges[0].  Add assertions to verify that doing so is
sane.

OK

Thanks,
Jeff



Re: [RFC] Variants of __typeof

2016-02-08 Thread Jeff Law

On 02/03/2016 11:01 PM, Richard Henderson wrote:

While attempting to write some code that uses the new x86 named address
space support in gcc 6, I found that __typeof is very unhelpful.  In
particular, given

 int __seg_fs *ptr;
 __typeof(*ptr) obj;

OBJ will not be type "int", but "int __seg_fs".  Which means that you
can't use it to create temporaries within statement expressions.
ISTM __typeof is doing the obvious thing here -- what's wrong is the 
__seg_fs is inherently "associated" with the pointer, right?







In the process of writing this, I found a hack in __typeof added just to
support _Atomic.  Which suggests that one of these variants would be
more generally helpful than the hack.

Yea, seems that way.




I add __typeof_noas and __typeof_noqual.  The first strips only the
address space, leaving 'const' and 'volatile' (and, I suppose
'restrict').  The second strips all qualifiers, essentially yielding the
TYPE_MAIN_VARIANT.

Something like this seems reasonable.

jeff


Re: Combine simplify_set WORD_REGISTER_OPERATIONS

2016-02-08 Thread Jeff Law

On 01/31/2016 03:16 PM, Alan Modra wrote:

The comment says this test is supposed to prevent "a narrower
operation than requested", but it actually only allows a larger
subreg, not one the same size.  Fix that.

Bootstrapped and regression tested powerpc64-linux.  OK for stage1?

Note that this bug was found when investigating why gcc-6 does not
suffer from pr69548, ie. this bug was masking a powerpc backend bug.

* combine.c (simplify_set): Correct WORD_REGISTER_OPERATIONS test.
Is there a strong need to apply this to gcc6?  Can we construct a 
testcase where this makes a difference in the code we generate?


My inclination would be to approve for gcc-7 as-is, but I'm more 
hesitant for gcc-6.


jeff



Re: Fix c/69522, memory management issue in c-parser

2016-02-08 Thread Jeff Law

On 01/29/2016 04:40 AM, Bernd Schmidt wrote:

Let's say we have

struct a {
  int x[1];
  int y[1];
} x = { 0, { 0 } };
^

When we reach the marked brace, we call into push_init_level, where we
notice that we have implicit initializers (for x[]) lying around that we
should deal with now that we've seen another open brace. The problem is
that we've created a new obstack for the initializer of y, and this is
where we also put data for the inits of x, freeing it when we see the
close brace for the initialization of y.

In the actual testcase, which is a little more complex to actually
demonstrate the issue, we end up allocating two init elts at the same
address (because of premature freeing) and place them in the same tree,
which ends up containing a cycle because of this. Then we hang.

Fixed by this patch, which splits off a new function
finish_implicit_inits from push_init_level and ensures it's called with
the outer obstack instead of the new one in the problematic case.

Bootstrapped and tested on x86_64-linux, ok?


Bernd

braced-init.diff


c/
PR c/69522
* c-parser.c (c_parser_braced_init): New arg outer_obstack.  All
callers changed.  If nested_p is true, use it to call
finish_implicit_inits.
* c-tree.h (finish_implicit_inits): Declare.
* c-typeck.c (finish_implicit_inits): New function.  Move code
from ...
(push_init_level): ... here.
(set_designator, process_init_element): Call finish_implicit_inits.

testsuite/
PR c/69522
gcc.dg/pr69522.c: New test.

OK.  Thanks for tracking this down.

Thanks,
Jeff



Re: [PATCH] Avoid bugs like PR68273 to trigger

2016-02-08 Thread Eric Botcazou
> No, but if there is none left why would you want to "fix" SRA?

Because I'm afraid this over-aligned type might leak into other places so we 
would probably be better off not creating it in the first place, all the more 
so that it is probably useless in most cases.

For PR tree-opt/65310, why couldn't the vectorizer use the alignment of the 
pointed-to type of the addend of the MEM_REF, like for the alias set:

  # .MEM_8 = VDEF <.MEM_1(D)>
  # lhs access alignment 128+0
  MEM[(struct LorentzVector *)res_7(D)] = SR.22_44;
  # .MEM_53 = VDEF <.MEM_8>
  # lhs access alignment 32+0
  MEM[(struct LorentzVector *)res_7(D) + 4B] = SR.23_43;
  # .MEM_54 = VDEF <.MEM_53>
  # lhs access alignment 64+0
  MEM[(struct LorentzVector *)res_7(D) + 8B] = SR.24_42;
  # .MEM_55 = VDEF <.MEM_54>
  # lhs access alignment 32+0
  MEM[(struct LorentzVector *)res_7(D) + 12B] = SR.25_41;

-- 
Eric Botcazou


Re: [Patch] Update GCC Internals: remove section Preserving virtual SSA form

2016-02-08 Thread Jeff Law

On 01/29/2016 04:24 AM, Nicklas Bo Jensen wrote:

Hi,

The section "12.3.2 Preserving the virtual SSA form" in GCC Internals
is outdated. The two functions it documents push_stmt_changes and
pop_stmt_changes have been removed. The functionality have been
replaced with update_stmt. update_stmt is documented elsewhere in
internals. I therefore propose to remove section 12.3.2.

Furthermore, the function mark_stmt_modified have been replaced by
gimple_set_modified.

Please install this patch.

Best,
Nicklas

2016-01-29  Nicklas Bo Jensen  

 * doc/tree-ssa.texi (Preserving the virtual SSA form): Remove
outdated section

Thanks.  Installed.
jeff



Re: [PATCH] PR c++/69139

2016-02-08 Thread Patrick Palka
On Sat, Feb 6, 2016 at 5:25 PM, Adam Butcher  wrote:
> PR c++/69139
> * cp/parser.c (cp_parser_simple_type_specifier): Don't mistake 'auto'
> in trailing return function pointer types as an implicit template
> parameter.
>
> PR c++/69139
> * g++.dg/cpp0x/trailing12.C: New test.
> ---
>  gcc/cp/parser.c | 22 ++
>  gcc/testsuite/g++.dg/cpp0x/trailing12.C |  6 ++
>  2 files changed, 24 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/trailing12.C
>
> diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> index d03b0c9..c1a9674 100644
> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> @@ -16029,8 +16029,11 @@ cp_parser_simple_type_specifier (cp_parser* parser,
>maybe_warn_cpp0x (CPP0X_AUTO);
>if (parser->auto_is_implicit_function_template_parm_p)
> {
> - /* The 'auto' might be the placeholder return type for a function 
> decl
> -with trailing return type.  */
> + /* The 'auto' might be the placeholder return type for a function 
> type
> +with trailing return type.  Look for a '->' after parameter list.
> +Handle pointer-to-function, function reference and
> +pointer-to-member-function by tentatively consuming two pairs of
> +parens before testing for '->'.  */
>   bool have_trailing_return_fn_decl = false;
>   if (cp_lexer_peek_nth_token (parser->lexer, 2)->type
>   == CPP_OPEN_PAREN)
> @@ -16042,8 +16045,19 @@ cp_parser_simple_type_specifier (cp_parser* parser,
>  /*recovering*/false,
>  /*or_comma*/false,
>  
> /*consume_paren*/true))
> -   have_trailing_return_fn_decl
> - = cp_lexer_next_token_is (parser->lexer, CPP_DEREF);
> +   {
> + if (cp_lexer_next_token_is (parser->lexer, CPP_DEREF))
> +   have_trailing_return_fn_decl = true;
> + else if ((cp_lexer_consume_token (parser->lexer)->type
> +   == CPP_OPEN_PAREN)
> +  && (cp_parser_skip_to_closing_parenthesis
> +  (parser,
> +   /*recovering*/false,
> +   /*or_comma*/false,
> +   /*consume_paren*/true)))
> +   have_trailing_return_fn_decl
> + = cp_lexer_next_token_is (parser->lexer, CPP_DEREF);
> +   }
>   cp_parser_abort_tentative_parse (parser);
> }
>
> diff --git a/gcc/testsuite/g++.dg/cpp0x/trailing12.C 
> b/gcc/testsuite/g++.dg/cpp0x/trailing12.C
> new file mode 100644
> index 000..f3e02a8
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/trailing12.C
> @@ -0,0 +1,6 @@
> +// PR c++/69139
> +// { dg-do compile { target c++11 } }
> +
> +auto get(int) -> int { return {}; }
> +template  int f(auto (*)(int) -> R) { return {}; }
> +int i = f(get);
> --
> 2.7.0
>

Does the trailing-return get noticed if there's an __attribute__
specifier in between it and the auto? For example,

   void foo (auto __attribute__ ((unused)) f (int) -> int) { }


BTW, last month I posted a patch for this PR that handles all kinds of
specifiers as well __attribute__ specifiers.

Patch is at: https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02004.html
-- it makes the parser arbitrarily look ahead (while skipping over
pairs of parens) until it finds a DEREF, a COMMA, a CLOSE_PAREN or an
EQ.  If it first finds a DEREF then have_trailing_return_fn_decl is
set.  Dunno if it's better to have this kind of "dumb" lookahead, or
to be more explicit about one expects to consume like your followup
patch does.


Re: [PATCH] Avoid bugs like PR68273 to trigger

2016-02-08 Thread Richard Biener
On February 8, 2016 5:31:16 PM GMT+01:00, Eric Botcazou  
wrote:
>> No, but if there is none left why would you want to "fix" SRA?
>
>Because I'm afraid this over-aligned type might leak into other places
>so we 
>would probably be better off not creating it in the first place, all
>the more 
>so that it is probably useless in most cases.
>
>For PR tree-opt/65310, why couldn't the vectorizer use the alignment of
>the 
>pointed-to type of the addend of the MEM_REF, like for the alias set:

Because that's not valid.

Richard.

>  # .MEM_8 = VDEF <.MEM_1(D)>
>  # lhs access alignment 128+0
>  MEM[(struct LorentzVector *)res_7(D)] = SR.22_44;
>  # .MEM_53 = VDEF <.MEM_8>
>  # lhs access alignment 32+0
>  MEM[(struct LorentzVector *)res_7(D) + 4B] = SR.23_43;
>  # .MEM_54 = VDEF <.MEM_53>
>  # lhs access alignment 64+0
>  MEM[(struct LorentzVector *)res_7(D) + 8B] = SR.24_42;
>  # .MEM_55 = VDEF <.MEM_54>
>  # lhs access alignment 32+0
>  MEM[(struct LorentzVector *)res_7(D) + 12B] = SR.25_41;




Re: [PR66726] Fixe regression caused by Factor conversion out of COND_EXPR

2016-02-08 Thread Jeff Law

On 01/18/2016 08:52 PM, Kugan wrote:

Hi,

This is an updated version of
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02196.html.

Patch to fix PR66726 missed optimization, factor conversion out of
COND_EXPR caused a regression for targets with branch cost greater than
i.e., testcase gcc.dg/pr46309.c failed for these targets. I posted a
patch for this which had some issues. Please find an updated version of
this patch that now passes regression.

This patch makes optimize_range_tests understand the factored out
COND_EXPR. i.e., Updated the final_range_test_p to look for the new
pattern. Changed the maybe_optimize_range_tests (which does the inter
basic block range test optimization) accordingly.

Bootstrapped and regression tested on x86_64-none-linux-gnu with no new
regressions. And also regression tested on arm-none-linux-gnu and
aarch64-none-linux-gnu with no new regressions.
Is this Ok for trunk?

Thanks,
Kugan


gcc/ChangeLog:

2016-01-19  Kugan Vivekanandarajah  

PR middle-end/66726
* tree-ssa-reassoc.c (optimize_range_tests): Handle tcc_compare stmt
whose result is used in PHI.
(maybe_optimize_range_tests): Likewise.
(final_range_test_p): Lokweise.


s/Lokeweise/Likewise/ in the ChangeLog entry.

Otherwise this looks OK for the trunk.  It really hasn't changed much 
since the version from July.  And while the PR is not marked as such, 
this is a code quality regression fix for targets with a BRANCH_COST > 1.


Thanks for your patience and not letting this get lost.

Jeff


Re: [PATCH][P1 tree-optimization/68541] Add heuristics to path splitting

2016-02-08 Thread Jeff Law

On 02/08/2016 07:18 AM, Kyrill Tkachov wrote:

Hi Jeff,

On 05/02/16 23:49, Jeff Law wrote:

This patch addresses multiple issues with how we determine when to
split paths.  The primary motivation is addressing 68541 (P1).


As I've gotten testcodes from Ajit, I've been able to look closely at
the path splitting opportunities and consistently what I've seen is
that contrary to the original belief, CSE/DCE opportunities are rarely
exposed by path splitting. It seems the benefit is more due to
removing the unconditional jump inherent in a CFG diamond -- even more
so on the microblaze where these jumps have delay slots.

While there are cases where splitting allows for CSE/DCE, they're the
exception rather than the rule in the codes I've looked at.

With that in mind, we want to encourage path splitting when the amount
of code we're duplicating is small.   We also want to balance that
with not path splitting when the original code is something that may
be if-convertable.

The vast majority of if-convertable codes are cases where the two
predecessors of the join block are single statement blocks with their
results feeding the same PHI in the join block.  We now reject that
case for path splitting so as not to spoil if-conversion.

The only wrinkle with that heuristic is that some codes (MIN, MAX,
ABS, COND, calls, etc) are not good candidates for further if
conversion. (ie, we have a MIN_EXPR in both predecessors that feed the
PHI in the join).   So we still allow those cases for splitting.  This
can be easily refined as we find more cases or as the if-converter is
extended.

So with that in place as the first filter, we just need to put a
limiter on the number of statements we allow to be copied.  I punted a
bit and re-used the PARAM for jump threading.  They've got enough in
common that I felt it was reasonable.  If I were to create a new
PARAM, I'd probably start with a smaller default value after doing
further instrumentation.

While I was working through this I noticed we were path splitting in
some cases where we shouldn't.  Particularly if we had a CFG like:

  a
 / \
b   c
   / \ /
  e   d
 / \
/   \
   / \
 latch   exit


Duplicating d for the edges b->d and c->d isn't as helpful because the
duplicate can't be merged into b.  We now detect this kind of case and
reject it for path splitting.

The new tests are a combination of reductions of codes from Ajit and
my own poking around.

Looking further out, I really wonder if the low level aspects of path
splitting like we're trying to capture here really belong in the
cross-jumping phase of the RTL optimizers.  The higher level aspects
(when we are able to expose CSE/DCE opportunities) should be driven by
actually looking at the propagation/simplifications enabled by a
potential split path.  But those are gcc-7 things.

Bootstrapped and regression tested on x86 linux.  Installed on the
trunk.  I'll obviously keep an eye out for how the tests are handled
on other platforms -- I don't think the tests are real sensitive to
branch costs and such, but I've been surprised before.

Onward to the jump threading code explosion wrap-up...

jeff



After this patch I also see:
FAIL: gcc.dg/tree-ssa/split-path-1.c scan-tree-dump split-paths
"Duplicating join block"

on arm, but not on aarch64. My arm-none-eabi cross compiler is
configured with:
--with-float=hard --with-cpu=cortex-a9 --with-fpu=neon --with-mode=thumb
Thanks.  That configuration just has more statements in the join block 
and bumps up against the limit.


I'll probably just manually increase the limit for the test.

jeff


[committed] jit: fix build after r233218 (build_common_tree_nodes, -fshort-double)

2016-02-08 Thread David Malcolm
Looks like r233218's change to build_common_tree_nodes missed the jit
subdirectory:

../../src/gcc/jit/dummy-frontend.c: In function ‘bool
jit_langhook_init()’:
../../src/gcc/jit/dummy-frontend.c:108:40: error: too many arguments to
function ‘void build_common_tree_nodes(bool)’
   build_common_tree_nodes (false, false);
^

In file included from ../../src/gcc/jit/jit-common.h:27:0,
 from ../../src/gcc/jit/jit-recording.h:24,
 from ../../src/gcc/jit/jit-playback.h:28,
 from ../../src/gcc/jit/dummy-frontend.c:23:
../../src/gcc/tree.h:4766:13: note: declared here
 extern void build_common_tree_nodes (bool);
 ^~~

Fixed in trunk in r233222 in the obvious way, as attached.

DaveIndex: gcc/jit/ChangeLog
===
--- gcc/jit/ChangeLog	(revision 233221)
+++ gcc/jit/ChangeLog	(revision 233222)
@@ -1,3 +1,9 @@
+2016-02-08  David Malcolm  
+
+	* dummy-frontend.c (jit_langhook_init): Remove
+	second argument to build_common_tree_nodes to
+	track r233218.
+
 2016-01-23  Iain Buclaw  
 
 	* jit-playback.c: Include pthread.h.
Index: gcc/jit/dummy-frontend.c
===
--- gcc/jit/dummy-frontend.c	(revision 233221)
+++ gcc/jit/dummy-frontend.c	(revision 233222)
@@ -105,7 +105,7 @@
   registered_root_tab = true;
 }
 
-  build_common_tree_nodes (false, false);
+  build_common_tree_nodes (false);
 
   /* I don't know why this has to be done explicitly.  */
   void_list_node = build_tree_list (NULL_TREE, void_type_node);


Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA

2016-02-08 Thread Michael Matz
Hi,

On Mon, 8 Feb 2016, Richard Biener wrote:

> 429.mcf  9120243   37.6 S9120245   37.3 S
> 429.mcf  9120224   40.7 S9120241   37.8 *
> 429.mcf  9120225   40.5 *9120229   39.9 S

> 471.omnetpp  6250326   19.1 S6250328   19.1 S
> 471.omnetpp  6250305   20.5 *6250324   19.3 *
> 471.omnetpp  6250296   21.1 S6250305   20.5 S

> 435.gromacs  7140316   22.6 S7140264   27.1 S
> 435.gromacs  7140314   22.7 S7140263   27.2 S
> 435.gromacs  7140316   22.6 *7140263   27.1 *
> 
> So overall the patch is a loss for SPEC CPU 2006 INT due to the 429.mcf 
> and 471.omnetpp regressions and a win on SPEC FP. (I didn't test SPEC 
> INT previously, only FP)

That's no regression if you look at the detailed results (like above).  
mcf and omnetpp were noisy on your machine, if you took the minimum of 
each run only the large progression of gromacs would remain.

> The gcc.target/i386/addr-sel-1.c (for PR28940) seems to just started
> working at some point past in time and thus it was added and the
> bug closed.  You could say RA does a better job after the patch
> as it uses 1 less register but that restricts the followup
> postreload combine attempts.  Though I wonder about what's "better"
> RA here - isn't the best allocation one that avoids spills but
> uses as many registers as possible (at least when targeting a CPU
> that cannot to register renaming)?

Yeah, I think the testcase was added for the wrong reasons.  It does 
exhibit a desirable outcome for sure, but the output became such by random 
changes, and hence also can go back by other changes.

> So I have applied the patch now,

I think the patch makes perfect sense.  ira_setup_alts should have no 
observable behaviour from the outside, except the returned value of merged 
acceptable alternatives.  Certainly it has no business to fiddle with 
recog_data.  It only does the swapping to merge alternatives, and 
accidentaly left them swapped; the merging could have been implemented 
without swapping recog_data.operands, and then the whole issue wouldn't 
have occurred (and addr-sel-1.c wouldn't have been added because it still 
would be "broken").


Ciao,
Michael.


[committed] Add testcase for already fixed PR ipa/69239

2016-02-08 Thread Jakub Jelinek
Hi!

Honza has fixed this PR in r232356, I've added the testcase as
obvious, so that we can close the PR.

2016-02-08  Jakub Jelinek  

PR ipa/69239
* g++.dg/ipa/pr69239.C: New test.

--- gcc/testsuite/g++.dg/ipa/pr69239.C.jj   2016-02-08 18:14:08.807338362 
+0100
+++ gcc/testsuite/g++.dg/ipa/pr69239.C  2016-02-08 18:15:30.748211371 +0100
@@ -0,0 +1,71 @@
+// PR ipa/69239
+// { dg-do run }
+// { dg-options "-O2 --param=early-inlining-insns=196" }
+// { dg-additional-options "-fPIC" { target fpic } }
+
+struct D
+{
+  float f;
+  D () {}
+  virtual float bar (float z);
+};
+
+struct A
+{
+  A ();
+  virtual int foo (int i);
+};
+
+struct B : public D, public A
+{
+  virtual int foo (int i);
+};
+
+float
+D::bar (float)
+{
+  return f / 2;
+}
+
+int
+A::foo (int i)
+{
+  return i + 1;
+}
+
+int
+B::foo (int i)
+{
+  return i + 2;
+}
+
+int __attribute__ ((noinline,noclone))
+baz ()
+{
+  return 1;
+}
+
+static int __attribute__ ((noinline))
+fn (A *obj, int i)
+{
+  return obj->foo (i);
+}
+
+inline __attribute__ ((always_inline))
+A::A ()
+{
+  if (fn (this, baz ()) != 2)
+__builtin_abort ();
+}
+
+static void
+bah ()
+{
+  B b;
+}
+
+int
+main ()
+{
+  bah ();
+}

Jakub


Re: [RFC] Variants of __typeof

2016-02-08 Thread Andrew MacLeod
mailer decided to send html I guess... for some reason.. all of a 
sudden...  stupid computer. anyway...  the original bounced., so here it 
is again...



On 02/08/2016 12:05 PM, Andrew MacLeod wrote:

On 02/08/2016 11:21 AM, Jeff Law wrote:

On 02/03/2016 11:01 PM, Richard Henderson wrote:





In the process of writing this, I found a hack in __typeof added 
just to

support _Atomic.  Which suggests that one of these variants would be
more generally helpful than the hack.

Yea, seems that way.




fyi, it originated here:

https://gcc.gnu.org/ml/gcc-patches/2013-08/msg00420.html

snippet near the bottom:

From: "Joseph S. Myers" 

> On Fri, 2 Aug 2013, Andrew MacLeod wrote:
> Ive included an additional 2 line patch which should change the
meaning of
> __typeof__  (again untested, the joys of imminently leaving for
2 weeks  :-).
> Im not sure the normal practical uses of
> __typeof__ have much meaning for an atomic type, it seems far
more useful to
> have __typeof__ for an atomic qualified type to return the
non-atomic variant.

What typeof should do in general for qualified types is unclear
(especially in the case of rvalues, where the movement in ISO C
seems to
be to say that rvalues can't have qualified types at all) -
returning the
non-atomic type seems reasonable to me.


Andrew




Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA

2016-02-08 Thread Vladimir Makarov

On 02/08/2016 12:38 PM, Michael Matz wrote:


I think the patch makes perfect sense.  ira_setup_alts should have no
observable behaviour from the outside, except the returned value of merged
acceptable alternatives.  Certainly it has no business to fiddle with
recog_data.  It only does the swapping to merge alternatives, and
accidentaly left them swapped; the merging could have been implemented
without swapping recog_data.operands, and then the whole issue wouldn't
have occurred (and addr-sel-1.c wouldn't have been added because it still
would be "broken").

Sorry, I was confused by Richard's message thinking that his patch 
actually exchanges the operands.  I think we have some expression 
shaping optimizations and exchanging operands probably rejects the 
optimization effect.  With this point of view ira-costs.c should not 
exchange operands.   So the patch is not bogus as Richard wrote but 
perfectly legitimate.




Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA

2016-02-08 Thread Richard Biener
On February 8, 2016 7:07:46 PM GMT+01:00, Vladimir Makarov 
 wrote:
>On 02/08/2016 12:38 PM, Michael Matz wrote:
>>
>> I think the patch makes perfect sense.  ira_setup_alts should have no
>> observable behaviour from the outside, except the returned value of
>merged
>> acceptable alternatives.  Certainly it has no business to fiddle with
>> recog_data.  It only does the swapping to merge alternatives, and
>> accidentaly left them swapped; the merging could have been
>implemented
>> without swapping recog_data.operands, and then the whole issue
>wouldn't
>> have occurred (and addr-sel-1.c wouldn't have been added because it
>still
>> would be "broken").
>>
>Sorry, I was confused by Richard's message thinking that his patch 
>actually exchanges the operands.  I think we have some expression 
>shaping optimizations and exchanging operands probably rejects the 
>optimization effect.  With this point of view ira-costs.c should not 
>exchange operands.   So the patch is not bogus as Richard wrote but 
>perfectly legitimate.

Yes, I meant the posted alternative patch that just unswapped for the cases 
that Andreas patch changed whether we consider swapping or not.

The applied patch indeed makes perfect sense.

Richard.



[PATCH] Fix fnsplit ICE (PR tree-optimization/69209)

2016-02-08 Thread Jakub Jelinek
Hi!

This fixes an ICE, where the split part doesn't return a value
of a is_gimple_reg_type retval, only compares its address (therefore it is
addressable), and the main part only uses the var in return_bb.
In that case, retval is not gimple val, but needs to be returned
as gimple val.  So we need to load it into a SSA_NAME.
I've tried to construct testcases for other cases where we might need
something similar, but have not succeeded, so maybe it is the only spot
that needs such handling.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-02-08  Jakub Jelinek  

PR tree-optimization/69209
* ipa-split.c (split_function): If split part is not
returning retval, retval has gimple type but is not
gimple value, force it into a SSA_NAME first.

* gcc.c-torture/compile/pr69209.c: New test.

--- gcc/ipa-split.c.jj  2016-02-03 23:36:29.0 +0100
+++ gcc/ipa-split.c 2016-02-08 15:56:41.701994621 +0100
@@ -1628,8 +1628,22 @@ split_function (basic_block return_bb, s
gimple_call_set_lhs (call, build_simple_mem_ref (retval));
  else
gimple_call_set_lhs (call, retval);
+ gsi_insert_after (&gsi, call, GSI_NEW_STMT);
+   }
+ else
+   {
+ gsi_insert_after (&gsi, call, GSI_NEW_STMT);
+ if (retval
+ && is_gimple_reg_type (TREE_TYPE (retval))
+ && !is_gimple_val (retval))
+   {
+ gassign *g
+   = gimple_build_assign (make_ssa_name (TREE_TYPE (retval)),
+  retval);
+ retval = gimple_assign_lhs (g);
+ gsi_insert_after (&gsi, g, GSI_NEW_STMT);
+   }
}
-  gsi_insert_after (&gsi, call, GSI_NEW_STMT);
  /* Build bndret call to obtain returned bounds.  */
  if (retbnd)
chkp_insert_retbnd_call (retbnd, retval, &gsi);
--- gcc/testsuite/gcc.c-torture/compile/pr69209.c.jj2016-02-08 
16:13:32.527138280 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr69209.c   2016-02-08 
16:13:19.0 +0100
@@ -0,0 +1,28 @@
+/* PR tree-optimization/69209 */
+
+int a, c, *d, e;
+
+void foo (void) __attribute__ ((__noreturn__));
+
+int
+bar (void)
+{
+  int f;
+  if (a)
+{
+  if (e)
+   foo ();
+  foo ();
+}
+  if (d != &f)
+foo ();
+  if (!c)
+foo ();
+  return f;
+}
+
+void
+baz ()
+{
+  bar ();
+}

Jakub


Re: [PATCH] PR c++/69139

2016-02-08 Thread Jason Merrill

On 02/08/2016 11:43 AM, Patrick Palka wrote:

BTW, last month I posted a patch for this PR that handles all kinds of
specifiers as well __attribute__ specifiers.

Patch is at: https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02004.html
-- it makes the parser arbitrarily look ahead (while skipping over
pairs of parens) until it finds a DEREF, a COMMA, a CLOSE_PAREN or an
EQ.  If it first finds a DEREF then have_trailing_return_fn_decl is
set.  Dunno if it's better to have this kind of "dumb" lookahead, or
to be more explicit about one expects to consume like your followup
patch does.


Hmm, I think I prefer your approach, but please add the testcase with 
stuff between ) and ->.


Jason



Re: [PATCH] Fix PR c++/69283 (auto deduction fails when ADL is required)

2016-02-08 Thread Jason Merrill

OK.

Jason


Re: [PATCH] PR c++/69139

2016-02-08 Thread Patrick Palka

On Mon, 8 Feb 2016, Jason Merrill wrote:


On 02/08/2016 11:43 AM, Patrick Palka wrote:

BTW, last month I posted a patch for this PR that handles all kinds of
specifiers as well __attribute__ specifiers.

Patch is at: https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02004.html
-- it makes the parser arbitrarily look ahead (while skipping over
pairs of parens) until it finds a DEREF, a COMMA, a CLOSE_PAREN or an
EQ.  If it first finds a DEREF then have_trailing_return_fn_decl is
set.  Dunno if it's better to have this kind of "dumb" lookahead, or
to be more explicit about one expects to consume like your followup
patch does.


Hmm, I think I prefer your approach, but please add the testcase with stuff 
between ) and ->.


Done.  Does this look OK?

-- >8 --

gcc/cp/ChangeLog:

PR c++/69139
* parser.c (cp_parser_simple_type_specifier): Make the check
for disambiguating between an 'auto' placeholder and an implicit
template parameter more robust.

gcc/testsuite/ChangeLog:

PR c++/69139
* g++.dg/cpp0x/trailing12.C: New test.
* g++.dg/cpp0x/trailing13.C: New test.
---
 gcc/cp/parser.c | 33 +++--
 gcc/testsuite/g++.dg/cpp0x/trailing12.C | 22 ++
 gcc/testsuite/g++.dg/cpp0x/trailing13.C | 12 
 3 files changed, 57 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/trailing12.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/trailing13.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d03b0c9..56c834f 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -16032,20 +16032,33 @@ cp_parser_simple_type_specifier (cp_parser* parser,
  /* The 'auto' might be the placeholder return type for a function decl
 with trailing return type.  */
  bool have_trailing_return_fn_decl = false;
- if (cp_lexer_peek_nth_token (parser->lexer, 2)->type
- == CPP_OPEN_PAREN)
+
+ cp_parser_parse_tentatively (parser);
+ cp_lexer_consume_token (parser->lexer);
+ while (cp_lexer_next_token_is_not (parser->lexer, CPP_EQ)
+&& cp_lexer_next_token_is_not (parser->lexer, CPP_COMMA)
+&& cp_lexer_next_token_is_not (parser->lexer, CPP_CLOSE_PAREN)
+&& cp_lexer_next_token_is_not (parser->lexer, CPP_EOF))
{
- cp_parser_parse_tentatively (parser);
- cp_lexer_consume_token (parser->lexer);
- cp_lexer_consume_token (parser->lexer);
- if (cp_parser_skip_to_closing_parenthesis (parser,
+ if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN))
+   {
+ cp_lexer_consume_token (parser->lexer);
+ cp_parser_skip_to_closing_parenthesis (parser,
 /*recovering*/false,
 /*or_comma*/false,
-/*consume_paren*/true))
-   have_trailing_return_fn_decl
- = cp_lexer_next_token_is (parser->lexer, CPP_DEREF);
- cp_parser_abort_tentative_parse (parser);
+/*consume_paren*/true);
+ continue;
+   }
+
+ if (cp_lexer_next_token_is (parser->lexer, CPP_DEREF))
+   {
+ have_trailing_return_fn_decl = true;
+ break;
+   }
+
+ cp_lexer_consume_token (parser->lexer);
}
+ cp_parser_abort_tentative_parse (parser);

  if (have_trailing_return_fn_decl)
{
diff --git a/gcc/testsuite/g++.dg/cpp0x/trailing12.C 
b/gcc/testsuite/g++.dg/cpp0x/trailing12.C
new file mode 100644
index 000..a27d988
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/trailing12.C
@@ -0,0 +1,22 @@
+// PR c++/69139
+// { dg-do compile { target c++11 } }
+
+auto get(int) -> int { return {}; }
+template  int f(auto (*)(int) -> R) { return {}; }
+int i = f(get);
+
+int foo1 (auto (int) -> char);
+
+int foo2 (auto f(int) -> char);
+
+int foo2 (auto (f)(int) -> char);
+
+int foo3 (auto (*f)(int) -> char);
+
+int foo4 (auto (*const **&f)(int) -> char);
+
+int foo5 (auto (*const **&f)(int, int *) -> char);
+
+int foo6 (auto (int) const -> char); // { dg-error "const" }
+
+void foo7 (auto __attribute__ ((unused)) f (int) -> int) { }
diff --git a/gcc/testsuite/g++.dg/cpp0x/trailing13.C 
b/gcc/testsuite/g++.dg/cpp0x/trailing13.C
new file mode 100644
index 000..2681bcd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/trailing13.C
@@ -0,0 +1,12 @@
+// PR c++/69139
+// { dg-do compile { target c++11 } }
+
+struct X {
+  auto get(int) const & noexcept -> int { return {}; }
+  auto get(int) && throw () -> long { return {}; }
+};
+
+template  auto f(auto (X::*)(int) const & -> R) -> R {}
+
+using I = decltype(f(&X::get));
+using I = i

[patch, fortran] PR56007 Remarkably bad error message with DO array=1,2

2016-02-08 Thread Harald Anlauf
Hi,

the simple patch below rejects arrays as do loop index
variable before another (confusing) error message is emitted.
Two new testcases derived from the PR, plus adaption of one
testcase that relies on the old error message.

Whoever wants to take it...

Harald


2016-02-08  Harald Anlauf 

PR fortran/56007
* match.c (gfc_match_iterator): Add diagnostic for array variable
as do loop index.

Index: gcc/fortran/match.c
===
--- gcc/fortran/match.c (revision 233203)
+++ gcc/fortran/match.c (working copy)
@@ -877,6 +877,12 @@
   if (m != MATCH_YES)
 return MATCH_NO;

+  if (var->symtree->n.sym->attr.dimension)
+{
+  gfc_error ("Loop variable at %C cannot be an array");
+  goto cleanup;
+}
+
   /* F2008, C617 & C565.  */
   if (var->symtree->n.sym->attr.codimension)
 {


2016-02-08  Harald Anlauf 

PR fortran/56007
* gfortran.dg/coarray_8.f90: Adjust error message.
* gfortran.dg/pr56007.f90: New test.
* gfortran.dg/pr56007.f: New test.

Index: gcc/testsuite/gfortran.dg/coarray_8.f90
===
--- gcc/testsuite/gfortran.dg/coarray_8.f90 (revision 233203)
+++ gcc/testsuite/gfortran.dg/coarray_8.f90 (working copy)
@@ -146,7 +146,7 @@
 subroutine tfgh()
   integer :: i(2)
   DATA i/(i, i=1,2)/ ! { dg-error "Expected PARAMETER symbol" }
-  do i = 1, 5 ! { dg-error "cannot be a sub-component" }
+  do i = 1, 5 ! { dg-error "cannot be an array" }
   end do ! { dg-error "Expecting END SUBROUTINE" }
 end subroutine tfgh

Index: gcc/testsuite/gfortran.dg/pr56007.f90
===
--- gcc/testsuite/gfortran.dg/pr56007.f90   (revision 0)
+++ gcc/testsuite/gfortran.dg/pr56007.f90   (revision 0)
@@ -0,0 +1,11 @@
+! { dg-do compile }
+! PR fortran/56007
+! Based on testcase by Tobias Schlüter
+
+  integer iw1(90), doiw1(90)
+  do iw1=1,2 ! { dg-error "cannot be an array" }
+  end do ! { dg-error "Expecting END PROGRAM statement" }
+  do iw1(1)=1! { dg-error "Unclassifiable statement" }
+  do iw1=1   ! { dg-error "cannot be an array" }
+  end do ! { dg-error "Expecting END PROGRAM statement" }
+END program
Index: gcc/testsuite/gfortran.dg/pr56007.f
===
--- gcc/testsuite/gfortran.dg/pr56007.f (revision 0)
+++ gcc/testsuite/gfortran.dg/pr56007.f (revision 0)
@@ -0,0 +1,10 @@
+! { dg-do compile }
+! PR fortran/56007
+! Based on testcase by Tobias Schlüter
+
+  integer iw1(90), doiw1(90)
+  do iw1(1)=1
+  do iw1=1
+  do iw1=1,2! { dg-error "cannot be an array" }
+  end do! { dg-error "Expecting END PROGRAM statement" }
+  END


Re: [PATCH] Fix fnsplit ICE (PR tree-optimization/69209)

2016-02-08 Thread Richard Biener
On February 8, 2016 7:27:49 PM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>This fixes an ICE, where the split part doesn't return a value
>of a is_gimple_reg_type retval, only compares its address (therefore it
>is
>addressable), and the main part only uses the var in return_bb.
>In that case, retval is not gimple val, but needs to be returned
>as gimple val.  So we need to load it into a SSA_NAME.
>I've tried to construct testcases for other cases where we might need
>something similar, but have not succeeded, so maybe it is the only spot
>that needs such handling.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

>2016-02-08  Jakub Jelinek  
>
>   PR tree-optimization/69209
>   * ipa-split.c (split_function): If split part is not
>   returning retval, retval has gimple type but is not
>   gimple value, force it into a SSA_NAME first.
>
>   * gcc.c-torture/compile/pr69209.c: New test.
>
>--- gcc/ipa-split.c.jj 2016-02-03 23:36:29.0 +0100
>+++ gcc/ipa-split.c2016-02-08 15:56:41.701994621 +0100
>@@ -1628,8 +1628,22 @@ split_function (basic_block return_bb, s
>   gimple_call_set_lhs (call, build_simple_mem_ref (retval));
> else
>   gimple_call_set_lhs (call, retval);
>+gsi_insert_after (&gsi, call, GSI_NEW_STMT);
>+  }
>+else
>+  {
>+gsi_insert_after (&gsi, call, GSI_NEW_STMT);
>+if (retval
>+&& is_gimple_reg_type (TREE_TYPE (retval))
>+&& !is_gimple_val (retval))
>+  {
>+gassign *g
>+  = gimple_build_assign (make_ssa_name (TREE_TYPE (retval)),
>+ retval);
>+retval = gimple_assign_lhs (g);
>+gsi_insert_after (&gsi, g, GSI_NEW_STMT);
>+  }
>   }
>-  gsi_insert_after (&gsi, call, GSI_NEW_STMT);
> /* Build bndret call to obtain returned bounds.  */
> if (retbnd)
>   chkp_insert_retbnd_call (retbnd, retval, &gsi);
>--- gcc/testsuite/gcc.c-torture/compile/pr69209.c.jj   2016-02-08
>16:13:32.527138280 +0100
>+++ gcc/testsuite/gcc.c-torture/compile/pr69209.c  2016-02-08
>16:13:19.0 +0100
>@@ -0,0 +1,28 @@
>+/* PR tree-optimization/69209 */
>+
>+int a, c, *d, e;
>+
>+void foo (void) __attribute__ ((__noreturn__));
>+
>+int
>+bar (void)
>+{
>+  int f;
>+  if (a)
>+{
>+  if (e)
>+  foo ();
>+  foo ();
>+}
>+  if (d != &f)
>+foo ();
>+  if (!c)
>+foo ();
>+  return f;
>+}
>+
>+void
>+baz ()
>+{
>+  bar ();
>+}
>
>   Jakub




Re: [PATCH][P1 tree-optimization/68541] Add heuristics to path splitting

2016-02-08 Thread Jeff Law

On 02/08/2016 10:02 AM, Jeff Law wrote:




After this patch I also see:
FAIL: gcc.dg/tree-ssa/split-path-1.c scan-tree-dump split-paths
"Duplicating join block"

on arm, but not on aarch64. My arm-none-eabi cross compiler is
configured with:
--with-float=hard --with-cpu=cortex-a9 --with-fpu=neon --with-mode=thumb

Thanks.  That configuration just has more statements in the join block
and bumps up against the limit.

I'll probably just manually increase the limit for the test.
So just for the record, for the configuration above, we generate more 
statements in the join block than for the x86_64 and aarch64 targets. 
This appears to be due to differences in addressing modes and IVopts 
selections.


Prior to my change for 68541, there was no limit on the number of 
statements we were allowed to copy, so it "just worked", even with the 
additional statements.  After the change to 68541 we hit the newly added 
upper limit for some configurations.


For the purposes of the test I have bumped the upper limit on how many 
statements it will copy from the join block and verified by hand the 
path is split as expected for the configuration noted above and verified 
that the x86-64 testresults are unaffected.


Thanks,
Jeff

commit 63b766bdef44518d33a21569a089fca2356d6ffe
Author: Jeff Law 
Date:   Mon Feb 8 12:52:21 2016 -0700

PR tree-optimization/68541
* gcc.dg/tree-ssa/split-path-1.c: Increase limit for number
of statements allowed in join block for path splitting.

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 64f512d..9f629d6 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2016-02-08  Jeff Law  
+
+   PR tree-optimization/68541
+   * gcc.dg/tree-ssa/split-path-1.c: Increase limit for number
+   of statements allowed in join block for path splitting.
+
 2016-02-08  Jakub Jelinek  
 
PR c++/59627
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/split-path-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/split-path-1.c
index b24f6a9..8b23ef4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/split-path-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/split-path-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details " } */
+/* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details --param 
max-jump-thread-duplication-stmts=20" } */
 
 #include 
 #include 


Re: [PATCH 1/4] Make SRA scalarize constant-pool loads

2016-02-08 Thread Jeff Law

On 01/27/2016 05:14 PM, David Edelsohn wrote:

On Wed, Jan 27, 2016 at 6:36 PM, Jeff Law  wrote:

On 01/27/2016 12:39 PM, David Edelsohn wrote:


The new sra-17.c and sra-18.c tests fail on AIX because the regex is
too restrictive -- AIX labels don't have exactly the same format.  On
AIX, the labels in the dumps look like "LC..0" instead of ".LC0".

This patch adds "*" and ".*" so that the "." prepended to LC is
optional and to allow characters between the "LC" and the "0".

I needed extra escapes for the sra-17.c line that matches multiple
times - for no apparent reason.


The joys of expect/tcl.  I just keep escaping until the regex that I
developed outside the suite works.  I have been trying to get away from
using .* though.  The longest match nature sometimes gives surprising
results.  In theory .*? ought to work better, but I haven't tried using it
much.

Anyway, the change looks fine to me.


Segher pointed out to me that my revised regex was matching multiple
lines, so it was not triggering multiple times without the restriction
on the pattern.

A revised, tighter patch uses "?"

Yup, that's precisely why I've moved away from .* :-)

jeff



Re: AW: Wonly-top-basic-asm

2016-02-08 Thread David Wohlferd

On 2/7/2016 10:45 PM, Bernd Edlinger wrote:

On 8. 2. 2016 04:45, David Wohlferd wrote:

I replied with a patch that includes most of the changes you asked for
(see inline below).  Were you waiting on me for something more?

ChangeLog entries are still missing.


I'll add them back in the next post.


David, there is a tool that you can use to check the patch
for some style-nits before submission.


I was not aware of this tool.  I'll fix these before the next post.

At one point you proposed renaming the option -Wbasic-asm.  This seemed 
a little terse (and possibly misleading) so I counter-proposed 
-Wbasic-asm-in-function (a little verbose, but clearer).  I have no 
strong preferences here, and you haven't said one way or the other.  Are 
we just going to stick with -Wonly-top-basic-asm?


Hopefully one more try and this will be done.

Thanks,
dw


Re: [PATCH] PR c++/69139

2016-02-08 Thread Adam Butcher

On 2016-02-08 19:14, Patrick Palka wrote:

On Mon, 8 Feb 2016, Jason Merrill wrote:

On 02/08/2016 11:43 AM, Patrick Palka wrote:
BTW, last month I posted a patch for this PR that handles all kinds 
of

specifiers as well __attribute__ specifiers.
Patch is at: 
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02004.html

-- it makes the parser arbitrarily look ahead (while skipping over
pairs of parens) until it finds a DEREF, a COMMA, a CLOSE_PAREN or 
an

EQ.  If it first finds a DEREF then have_trailing_return_fn_decl is
set.  Dunno if it's better to have this kind of "dumb" lookahead, 
or

to be more explicit about one expects to consume like your followup
patch does.


Hmm, I think I prefer your approach [snip]


Me too.  I was worried that the cases handled in the explicit solution 
might get longer, more complex and repeat large amounts of other parser 
code.  Providing this approach gives us no false positives I would say 
it's superior.


Re: [PATCH, PR69707] Handle -fdiagnostics-color in lto

2016-02-08 Thread Tom de Vries

On 08/02/16 14:54, Jakub Jelinek wrote:

On Mon, Feb 08, 2016 at 02:38:17PM +0100, Tom de Vries wrote:

hmm, indeed removing the 'Driver' flag from the fdiagnostics-color= entry in
common.opt breaks the functioning of fdiagnostics-color= in the gcc driver.

This patch leaves the 'Driver' flag alone, and instead explicitly allows
fdiagnostics-color= in lto_write_options.

Is this approach ok?


Doesn't that mean storing -fdiagnostics-color= into the LTO option section and
then using whatever was recorded from the first TU that recorded anything?


Yes.


That doesn't look right for such diagnostics options either.
I mean, if I
$ gcc -fdiagnostics-color=always -c -flto a.c
on another terminal
$ gcc -fdiagnostics-color=never -c -flto b.c
on yet another terminal
$ gcc -flto -o a a.o b.o
then I'd expect the default setting for -fdiagnostics-color= for all
diagnostics while linking/LTO optimizing it, and for say
$ gcc -fdiagnostics-color=always -flto -o a a.o b.o
to have it always colorized, similarly for never.
IMHO we should just honor what has been specified on the linker command
line if anything.


Agreed.


So, the question is, is -fdiagnostics-color=never passed
in the testsuite just to the compilation and not to linking (that would be
IMHO a testsuite bug),


In libgomp.exp we have:
...
# Disable color diagnostics
lappend ALWAYS_CFLAGS "additional_flags=-fdiagnostics-color=never"
...

The test setup looks ok to me.


or is it passed on the linking command line, but not
passed through to lto1?


Yes.

I've now realized this is specific to mkoffload.

If we're doing lto, in lto-wrapper.c:run_gcc, we have:
...
  append_compiler_options (&argv_obstack, fdecoded_options,
   fdecoded_options_count);
  append_linker_options (&argv_obstack, decoded_options,
 decoded_options_count);
...
And the last line will propagate the fdiagnostics-color setting.

But for calling mkoffload, we just have this:
...
  /* Append options from offload_lto sections.  */
  append_compiler_options (&argv_obstack, compiler_opts,
   compiler_opt_count);
...

Followed by this, which just filters the -foffload options:
...
  /* Append options specified by -foffload last.  In case of
 conflicting options we expect offload compiler to choose
 the latest.  */
  append_offload_options (&argv_obstack, target, compiler_opts,
  compiler_opt_count);
  append_offload_options (&argv_obstack, target, linker_opts,
  linker_opt_count);
...

Attached patch adds the diagnostics flags to mkoffload.

Bootstrapped and reg-tested on x86_64.

OK for trunk, stage1?

Thanks,
- Tom
Handle -fdiagnostics-color in lto

2016-02-08  Tom de Vries  

	PR lto/69707
	* lto-wrapper.c (append_diag_options): New function.
	(compile_offload_image): Call append_diag_options.

	* testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c: New test.

---
 gcc/lto-wrapper.c  | 31 ++
 .../libgomp.oacc-c-c++-common/parallel-dims-2.c| 19 +
 2 files changed, 50 insertions(+)

diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index ced6f2f..8cda1fa 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -542,6 +542,36 @@ append_compiler_options (obstack *argv_obstack, struct cl_decoded_option *opts,
 }
 }
 
+/* Append diag options in OPTS with length COUNT to ARGV_OBSTACK.  */
+
+static void
+append_diag_options (obstack *argv_obstack, struct cl_decoded_option *opts,
+		 unsigned int count)
+{
+  /* Append compiler driver arguments as far as they were merged.  */
+  for (unsigned int j = 1; j < count; ++j)
+{
+  struct cl_decoded_option *option = &opts[j];
+
+  switch (option->opt_index)
+	{
+	case OPT_fdiagnostics_color_:
+	case OPT_fdiagnostics_show_caret:
+	case OPT_fdiagnostics_show_option:
+	case OPT_fdiagnostics_show_location_:
+	case OPT_fshow_column:
+	  break;
+	default:
+	  continue;
+	}
+
+  /* Pass the option on.  */
+  for (unsigned int i = 0; i < option->canonical_option_num_elements; ++i)
+	obstack_ptr_grow (argv_obstack, option->canonical_option[i]);
+}
+}
+
+
 /* Append linker options OPTS to ARGV_OBSTACK.  */
 
 static void
@@ -724,6 +754,7 @@ compile_offload_image (const char *target, const char *compiler_path,
   /* Append options from offload_lto sections.  */
   append_compiler_options (&argv_obstack, compiler_opts,
 			   compiler_opt_count);
+  append_diag_options (&argv_obstack, linker_opts, linker_opt_count);
 
   /* Append options specified by -foffload last.  In case of conflicting
 	 options we expect offload compiler to choose the latest.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c
new file mode 100644
index 000..eea8c7e
--- /dev/null
+++ b/libgomp/testsuite/libgo

Re: [PATCH, gcc7, aarch64] Add arithmetic overflow patterns

2016-02-08 Thread Richard Henderson

On 02/07/2016 10:58 AM, Michael Collison wrote:

Richard,

One other question on the patch. I note that when you expand the addv and uaddv
patterns you emit rtl using gen_add3_compareV and en_add3_compareC
respectively. These patterns use sign_extend and zero_extend respectively. Why
do you not do the same thing for the subv and usubv patterns? The subv patterns
expand into calls to gen_sub3_compare1 which does not emit sign or zero
extends. Why the difference?


The canonical "compare" is a subtraction.  Thus if we want the flag bits off of 
a subtraction, we only need issue a compare.


We only have to do something special if the flag bits aren't the same as a 
subtract.  E.g. for addition, where the carry and overflow flags are 
(logically) reversed from that of subtraction.



r~


Re: [PATCH, PR69599] Fix GOMP/GOACC_parallel optimization in ipa-pta

2016-02-08 Thread Tom de Vries

On 08/02/16 11:54, Richard Biener wrote:

On Mon, 8 Feb 2016, Tom de Vries wrote:


Hi,

when compiling the fipa-pta tests in the libgomp testsuite (omp-nested-2.c,
pr46032.c) with -flto -flto-partition=max, the tests fail in execution
(PR69599).

The problem is related to the GOMP/GOACC_parallel optimization we do in
fipa-pta, where we interpret a call GOMP_parallel (&foo._0, data) as a call
foo._0 (data).

The problem is that this optimization is only legal in lto if both:
- foo containing the call GOMP_parallel (&foo._0, data) and
- foo._0
are contained in the same partition.

In the case of -flto-partition=max, foo is contained in it's own partition,
and foo._0 is contained in another partition.  This means the data argument to
the GOMP_parallel call appears unused, and the setting of the argument is
optimized away, which causes the execution failure.

This patch fixes that by testing if foo and foo._0 are part of the same
partition.

[ Note that the node_address_taken change in the patch has no effect, since
nonlocal_p already tests for used_from_other_partition. But I thought it was
clearer to state the conditions under which we are allowed to ignore
node->address_taken explicitly. ]

Bootstrapped and reg-tested on x86_64.

Build for nvidia accelerator and reg-tested libgomp with various lto settings.

OK for trunk, stage4?


I don't like the in_lto_p checks, why's the check not working
for non-LTO?



I was not sure if the partition flags were valid outside lto.

Updated patch removes the in_lto_p checks.

Bootstrapped on x86_64.

Build and reg-tested libgomp testsuite.

OK?

Thanks,
- Tom

Thanks,
- Tom

Fix GOMP/GOACC_parallel optimization in ipa-pta

2016-02-08  Tom de Vries  

	PR tree-optimization/69599
	* tree-ssa-structalias.c (fndecl_maybe_in_other_partition): New
	function.
	(find_func_aliases_for_builtin_call, find_func_clobbers)
	(ipa_pta_execute):  Handle case that foo and foo._0 are not in same lto
	partition.

	* testsuite/libgomp.c/omp-nested-3.c: New test.
	* testsuite/libgomp.c/pr46032-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/kernels-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/parallel-2.c: New test.

---
 gcc/tree-ssa-structalias.c | 49 ++
 libgomp/testsuite/libgomp.c/omp-nested-3.c |  4 ++
 libgomp/testsuite/libgomp.c/pr46032-2.c|  4 ++
 .../libgomp.oacc-c-c++-common/kernels-2.c  |  4 ++
 .../libgomp.oacc-c-c++-common/parallel-2.c |  4 ++
 5 files changed, 56 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index e7d0797..d7a7dc5 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4162,6 +4162,18 @@ find_func_aliases_for_call_arg (varinfo_t fi, unsigned index, tree arg)
 process_constraint (new_constraint (lhs, *rhsp));
 }
 
+/* Return true if FNDECL may be part of another lto partition.  */
+
+static bool
+fndecl_maybe_in_other_partition (tree fndecl)
+{
+  cgraph_node *fn_node = cgraph_node::get (fndecl);
+  if (fn_node == NULL)
+return true;
+
+  return fn_node->in_other_partition;
+}
+
 /* Create constraints for the builtin call T.  Return true if the call
was handled, otherwise false.  */
 
@@ -4537,6 +4549,10 @@ find_func_aliases_for_builtin_call (struct function *fn, gcall *t)
 	  tree fnarg = gimple_call_arg (t, fnpos);
 	  gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR);
 	  tree fndecl = TREE_OPERAND (fnarg, 0);
+	  if (fndecl_maybe_in_other_partition (fndecl))
+		/* Fallthru to general call handling.  */
+		break;
+
 	  tree arg = gimple_call_arg (t, argpos);
 
 	  varinfo_t fi = get_vi_for_tree (fndecl);
@@ -5113,6 +5129,10 @@ find_func_clobbers (struct function *fn, gimple *origt)
 	  tree fnarg = gimple_call_arg (t, fnpos);
 	  gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR);
 	  tree fndecl = TREE_OPERAND (fnarg, 0);
+	  if (fndecl_maybe_in_other_partition (fndecl))
+		/* Fallthru to general call handling.  */
+		break;
+
 	  varinfo_t cfi = get_vi_for_tree (fndecl);
 
 	  tree arg = gimple_call_arg (t, argpos);
@@ -7505,9 +7525,13 @@ ipa_pta_execute (void)
 	 address_taken bit for function foo._0, which would make it non-local.
 	 But for the purpose of ipa-pta, we can regard the run_on_threads call
 	 as a local call foo._0 (data),  so we ignore address_taken on nodes
-	 with parallelized_function set.  */
-  bool node_address_taken = (node->address_taken
- && !node->parallelized_function);
+	 with parallelized_function set.
+	 Note: this is only safe, if foo and foo._0 are in the same lto
+	 partition.  */
+  bool node_address_taken = ((node->parallelized_function
+  && !node->used_from_other_partition)
+ ? false
+ : node->address_taken);
 
   /* For externally visible or attribute used annotated functions use
 	 local constraints for their arguments.
@@ -7676,12 +7700,19 @@ ipa_pta_execute (void)
 		continu

Re: [PATCH] Make C++ use the enum mode attribute

2016-02-08 Thread Mike Stump
On Feb 6, 2016, at 2:37 PM, Trevor Saunders  wrote:
> it is allowed if you do something like
> 
> enum X : int;
> 
> but it seems really pointless to support setting the size of something
> when the language gives you a standard way to do that.

Yes, it is, as long as you never use header files, and multiple languages.  :-) 
 Unfortunately, those are vended features of C and C++.  Once you have a C 
project that you want to be able to be used from C++, it then is handy to be 
able to do this.



Re: [PATCH, PR69707] Handle -fdiagnostics-color in lto

2016-02-08 Thread Jakub Jelinek
On Mon, Feb 08, 2016 at 10:04:48PM +0100, Tom de Vries wrote:
> Attached patch adds the diagnostics flags to mkoffload.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for trunk, stage1?

Ok for stage4.

> Handle -fdiagnostics-color in lto
> 
> 2016-02-08  Tom de Vries  
> 
>   PR lto/69707
>   * lto-wrapper.c (append_diag_options): New function.
>   (compile_offload_image): Call append_diag_options.
> 
>   * testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c: New test.

Jakub


C++ PATCH for c++/69657 (abs not inlined)

2016-02-08 Thread Jason Merrill
The issue in this bug was that due to changes in the libstdc++ headers, 
the built-in abs declaration was getting hidden by a using-declaration, 
so that then when the built-in got an explicit declaration, the original 
declaration wasn't there anymore and so the new declaration didn't get 
marked as built-in.


Fixed by overloading an anticipated built-in rather than clobbering it 
when we see a using-declaration of the same name.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 8f47c80051a73a75fb44900d9edb1444142c3d5f
Author: Jason Merrill 
Date:   Mon Feb 8 16:39:42 2016 -0500

	PR c++/69657
	* name-lookup.c (do_nonmember_using_decl): Leave anticipated
	built-ins alone.

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 92d99aa..227d6f2 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -2583,15 +2583,6 @@ do_nonmember_using_decl (tree scope, tree name, tree oldval, tree oldtype,
   decls.value = NULL_TREE;
 }
 
-  /* It is impossible to overload a built-in function; any explicit
- declaration eliminates the built-in declaration.  So, if OLDVAL
- is a built-in, then we can just pretend it isn't there.  */
-  if (oldval
-  && TREE_CODE (oldval) == FUNCTION_DECL
-  && DECL_ANTICIPATED (oldval)
-  && !DECL_HIDDEN_FRIEND_P (oldval))
-oldval = NULL_TREE;
-
   if (decls.value)
 {
   /* Check for using functions.  */
@@ -2610,6 +2601,10 @@ do_nonmember_using_decl (tree scope, tree name, tree oldval, tree oldtype,
 	{
 	  tree new_fn = OVL_CURRENT (tmp);
 
+	  /* Don't import functions that haven't been declared.  */
+	  if (DECL_ANTICIPATED (new_fn))
+		continue;
+
 	  /* [namespace.udecl]
 
 		 If a function declaration in namespace scope or block
@@ -2627,13 +2622,13 @@ do_nonmember_using_decl (tree scope, tree name, tree oldval, tree oldtype,
 		continue; /* this is a using decl */
 		  else if (compparms_for_decl_and_using_decl (new_fn, old_fn))
 		{
-		  gcc_assert (!DECL_ANTICIPATED (old_fn)
-  || DECL_HIDDEN_FRIEND_P (old_fn));
-
 		  /* There was already a non-using declaration in
 			 this scope with the same parameter types. If both
 			 are the same extern "C" functions, that's ok.  */
-		  if (decls_match (new_fn, old_fn))
+		  if (DECL_ANTICIPATED (old_fn)
+			  && !DECL_HIDDEN_FRIEND_P (old_fn))
+			/* Ignore anticipated built-ins.  */;
+		  else if (decls_match (new_fn, old_fn))
 			break;
 		  else
 			{
@@ -2669,6 +2664,14 @@ do_nonmember_using_decl (tree scope, tree name, tree oldval, tree oldtype,
 	}
   else
 	{
+	  /* If we're declaring a non-function and OLDVAL is an anticipated
+	 built-in, just pretend it isn't there.  */
+	  if (oldval
+	  && TREE_CODE (oldval) == FUNCTION_DECL
+	  && DECL_ANTICIPATED (oldval)
+	  && !DECL_HIDDEN_FRIEND_P (oldval))
+	oldval = NULL_TREE;
+
 	  *newval = decls.value;
 	  if (oldval && !decls_match (*newval, oldval))
 	error ("%qD is already declared in this scope", name);
diff --git a/gcc/testsuite/g++.dg/lookup/builtin6.C b/gcc/testsuite/g++.dg/lookup/builtin6.C
new file mode 100644
index 000..456ade7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lookup/builtin6.C
@@ -0,0 +1,23 @@
+// PR c++/69657
+// { dg-options -fdump-tree-gimple }
+// { dg-final { scan-tree-dump "ABS_EXPR" "gimple" } }
+
+namespace foo
+{
+  inline double
+  abs(double x)
+  { return __builtin_fabs(x); }
+}
+using foo::abs;
+
+extern "C" int abs(int);
+
+namespace bar {
+  using ::abs;
+}
+
+int
+wrapper (int x)
+{
+  return bar::abs (x) + bar::abs(x);
+}


Re: [PATCH] PR c++/69139

2016-02-08 Thread Jason Merrill

OK.

Jason


  1   2   >