Re: [1/2] OpenACC routine support

2015-12-03 Thread Thomas Schwinge
Hi Cesar!

On Wed, 2 Dec 2015 15:37:17 -0800, Cesar Philippidis wrote:
> On 12/01/2015 06:49 AM, Cesar Philippidis wrote:
> > On 12/01/2015 06:40 AM, Thomas Schwinge wrote:
> > 
> >> I noticed while working on other test cases:
> >>
> >> On Wed, 18 Nov 2015 11:02:01 -0800, Cesar Philippidis wrote:
> >>> --- a/gcc/cp/parser.c
> >>> +++ b/gcc/cp/parser.c
> >>
> >>> @@ -1318,13 +1318,21 @@ cp_finalize_omp_declare_simd (cp_parser *parser, 
> >>> tree fndecl)
> >>>  }
> >>>  }
> >>>  
> >>> -/* Diagnose if #pragma omp routine isn't followed immediately
> >>> -   by function declaration or definition.   */
> >>> +/* Diagnose if #pragma acc routine isn't followed immediately by function
> >>> +   declaration or definition.  */
> >>>  
> >>>  static inline void
> >>>  cp_ensure_no_oacc_routine (cp_parser *parser)
> >>>  {
> >>> -  cp_finalize_oacc_routine (parser, NULL_TREE, false, true);
> >>> +  if (parser->oacc_routine && !parser->oacc_routine->error_seen)
> >>> +{
> >>> +  tree clauses = parser->oacc_routine->clauses;
> >>> +  location_t loc = OMP_CLAUSE_LOCATION (TREE_PURPOSE(clauses));
> >>> +
> >>> +  error_at (loc, "%<#pragma oacc routine%> not followed by function "
> >>> + "declaration or definition");
> >>> +  parser->oacc_routine = NULL;
> >>> +}
> >>>  }

> >> Next, in the function quoted above, you use "not followed by function
> >> declaration or definition", but you use "not followed by a single
> >> function declaration or definition" in a lot of (but not all) other
> >> places -- is that intentional?
> > 
> > I probably wasn't being consistent. Which error message do you prefer?
> > I'll take a look at what the c front end does.
> > 
> >> For example: [...]

> >> (I have not verified all of the parser(s) source code.)
> > 
> > Thanks. I'll go through and update the comments and error messages.
> 
> Here's the updated patch.

ENOPATCH.

> The test cases were written in a way such that
> none of them needed to be updated with these changes.

... which potentially means they'd match for all kinds of "random"
diagnostics.  ;-)

> I'm tempted to commit this as obvious, but I want to make sure you're ok
> with these new messages.

I don't care very much, as long as it's understandable for a user.  I
just tripped over this because of mismatches between C and C++ as well as
different C++ diagnostic variants.

> The major change is to report these errors as
> "pragma acc routine not followed by a function declaration or
> definition". I think that's more descriptive than "not followed by a
> single function". That said, it looks like the c front end uses the
> latter error message.

(In the C front end, the "a" is missing: "not followed by single
function"; that should be fixed up as well.)

> Is this OK or do you prefer the "not followed by a single function" message?

"not followed by a function declaration or definition" sounds good to me.


Grüße
 Thomas




Fix buildbreaker with isl 0.14

2015-12-03 Thread Tom de Vries

[ was: Re: [PATCH] [graphite] handle missing isl_ast_expr ]

On 03/12/15 00:56, Tom de Vries wrote:

Hi,

This breaks the build for me, with isl 0.14.

...
src/gcc/graphite-isl-ast-to-gimple.c: In member function ‘tree_node*
translate_isl_ast_to_gimple::binary_op_to_tree(tree, isl_ast_expr*,
ivs_params&)’:
src/gcc/graphite-isl-ast-to-gimple.c:591:10: error: ‘isl_ast_op_zdiv_r’
was not declared in this scope
  case isl_ast_op_zdiv_r:
   ^
...

Thanks,
- Tom

On 02/12/15 23:17, Sebastian Pop wrote:

From ISL's documentation, isl_ast_op_zdiv_r is equal to zero iff the remainder
on integer division is zero.  Generate a modulo operation for that.

* graphite-isl-ast-to-gimple.c (binary_op_to_tree): Handle
isl_ast_op_zdiv_r.
 (gcc_expression_from_isl_expr_op): Same.

* gcc.dg/graphite/id-28.c: New.


this patch fixes the build breaker with isl 0.14 for me. I'm using the 
HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS macro (which is set for isl 
0.15, and not before) to guard the code handling isl_ast_op_zdiv_r 
(which I suppose is new in isl 0.15).
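
A minimal sketch of the guarding technique described here: a case label for an
enumerator that only exists in newer library versions is compiled in only when
a feature macro says it is available.  HAVE_NEW_OP, op_kind and classify_op
are hypothetical stand-ins for illustration, not isl or GCC identifiers.

```c
/* Hypothetical stand-in for a configure-time feature macro.  */
#ifndef HAVE_NEW_OP
#define HAVE_NEW_OP 0
#endif

/* Hypothetical operation kinds; OP_ZDIV_R exists only in "new" versions.  */
enum op_kind
{
  OP_PDIV_R,
#if HAVE_NEW_OP
  OP_ZDIV_R,
#endif
  OP_AND
};

/* Return 1 for remainder-style operations, 0 for everything else.  */
static int
classify_op (enum op_kind op)
{
  switch (op)
    {
#if HAVE_NEW_OP
    case OP_ZDIV_R:  /* Only referenced when the enumerator is declared.  */
#endif
    case OP_PDIV_R:
      return 1;
    default:
      return 0;
    }
}
```

With HAVE_NEW_OP left at 0 the guarded label disappears entirely, so the code
still compiles against headers that lack the newer enumerator.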


OK for trunk?

Thanks,
- Tom

Guard isl_ast_op_zdiv_r usage with HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS

2015-12-03  Tom de Vries  

	* graphite-isl-ast-to-gimple.c (binary_op_to_tree)
	(gcc_expression_from_isl_expr_op): Guard isl_ast_op_zdiv_r usage with
	HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS.

---
 gcc/graphite-isl-ast-to-gimple.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 06a2062..20eb80f 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -588,7 +588,9 @@ binary_op_to_tree (tree type, __isl_take isl_ast_expr *expr, ivs_params &ip)
 	}
   return fold_build2 (TRUNC_DIV_EXPR, type, tree_lhs_expr, tree_rhs_expr);
 
+#if HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
 case isl_ast_op_zdiv_r:
+#endif
 case isl_ast_op_pdiv_r:
   /* As ISL operates on arbitrary precision numbers, we may end up with
 	 division by 2^64 that is folded to 0.  */
@@ -759,7 +761,9 @@ gcc_expression_from_isl_expr_op (tree type, __isl_take isl_ast_expr *expr,
 case isl_ast_op_pdiv_q:
 case isl_ast_op_pdiv_r:
 case isl_ast_op_fdiv_q:
+#if HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
 case isl_ast_op_zdiv_r:
+#endif
 case isl_ast_op_and:
 case isl_ast_op_or:
 case isl_ast_op_eq:


Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT

2015-12-03 Thread Thomas Schwinge
Hi Chung-Lin!

On Mon, 23 Nov 2015 21:15:00 +0800, Chung-Lin Tang wrote:
> The OpenACC wait directive is represented as a call to the runtime
> function "GOACC_wait" instead of a tree code.  I am seeing when
> '#pragma acc wait' is used inside a template function, the CALL_EXPR
> to GOACC_wait is being silently ignored/removed during tsubst_expr().

Uh.

> I think the correct way to organize this is that the call should be inside
> an EXPR_STMT, so here's a patch to do that; basically remove the
> add_stmt() call from the shared c_finish_oacc_wait() code, and add
> add_stmt()/finish_expr_stmt() in the corresponding C/C++ parts.
> 
> Tested with no regressions on trunk, okay to commit?

> --- c-family/c-omp.c  (revision 230703)
> +++ c-family/c-omp.c  (working copy)
> @@ -63,7 +63,6 @@ c_finish_oacc_wait (location_t loc, tree parms, tr
>  }
>  
>stmt = build_call_expr_loc_vec (loc, stmt, args);
> -  add_stmt (stmt);
>  
>vec_free (args);
|  
|return stmt;
|  }

I see in gcc/c/c-omp.c that several other c_finish_omp_* functions that
build builtin calls instead of tree nodes, do similar things like
c_finish_oacc_wait; I'd like to understand why it's -- presumably -- not
a problem for these: c_finish_omp_barrier, c_finish_omp_taskwait,
c_finish_omp_taskyield, c_finish_omp_flush?  (Jakub?)

> --- c/c-parser.c  (revision 230703)
> +++ c/c-parser.c  (working copy)
> @@ -13886,6 +13886,7 @@ c_parser_oacc_wait (location_t loc, c_parser *pars
>strcpy (p_name, " wait");
>clauses = c_parser_oacc_all_clauses (parser, OACC_WAIT_CLAUSE_MASK, 
> p_name);
>stmt = c_finish_oacc_wait (loc, list, clauses);
> +  add_stmt (stmt);
>  
>return stmt;
>  }
> --- cp/parser.c   (revision 230703)
> +++ cp/parser.c   (working copy)
> @@ -34930,6 +34930,7 @@ cp_parser_oacc_wait (cp_parser *parser, cp_token *
>   "#pragma acc wait", pragma_tok);
>  
>stmt = c_finish_oacc_wait (loc, list, clauses);
> +  stmt = finish_expr_stmt (stmt);
>  
>return stmt;
>  }


Grüße
 Thomas




Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT

2015-12-03 Thread Thomas Schwinge
Hi!

On Thu, 03 Dec 2015 09:51:31 +0100, I wrote:
> On Mon, 23 Nov 2015 21:15:00 +0800, Chung-Lin Tang wrote:
> > The OpenACC wait directive is represented as a call to the runtime
> > function "GOACC_wait" instead of a tree code.  I am seeing when
> > '#pragma acc wait' is used inside a template function, the CALL_EXPR
> > to GOACC_wait is being silently ignored/removed during tsubst_expr().
> 
> Uh.
> 
> > I think the correct way to organize this is that the call should be inside
> > an EXPR_STMT, so here's a patch to do that; basically remove the
> > add_stmt() call from the shared c_finish_oacc_wait() code, and add
> > add_stmt()/finish_expr_stmt() in the corresponding C/C++ parts.
> > 
> > Tested with no regressions on trunk, okay to commit?
> 
> > --- c-family/c-omp.c(revision 230703)
> > +++ c-family/c-omp.c(working copy)
> > @@ -63,7 +63,6 @@ c_finish_oacc_wait (location_t loc, tree parms, tr
> >  }
> >  
> >stmt = build_call_expr_loc_vec (loc, stmt, args);
> > -  add_stmt (stmt);
> >  
> >vec_free (args);
> |  
> |return stmt;
> |  }
> 
> I see in gcc/c/c-omp.c that several other c_finish_omp_* functions that
> build builtin calls instead of tree nodes, do similar things like
> c_finish_oacc_wait; I'd like to understand why it's -- presumably -- not
> a problem for these: c_finish_omp_barrier, c_finish_omp_taskwait,
> c_finish_omp_taskyield, c_finish_omp_flush?  (Jakub?)

Oh wait, it looks like the C++ front end is not actually using the
functions defined in the C/C++-shared gcc/c-family/c-omp.c, but has its
own implementations in gcc/cp/semantics.c, without "c_" prefixes?  In
addition to finish_expr_stmt calls, I see it's also using
finish_call_expr instead of build_call_expr_loc/build_call_expr_loc_vec.
So I guess we'll want to model this the same way for OpenACC support
functions, and then (later) we should clean this up, to move the
C-specific code from gcc/c-family/c-omp.c into the C front end?  (Jakub?)

> > --- c/c-parser.c(revision 230703)
> > +++ c/c-parser.c(working copy)
> > @@ -13886,6 +13886,7 @@ c_parser_oacc_wait (location_t loc, c_parser *pars
> >strcpy (p_name, " wait");
> >clauses = c_parser_oacc_all_clauses (parser, OACC_WAIT_CLAUSE_MASK, 
> > p_name);
> >stmt = c_finish_oacc_wait (loc, list, clauses);
> > +  add_stmt (stmt);
> >  
> >return stmt;
> >  }
> > --- cp/parser.c (revision 230703)
> > +++ cp/parser.c (working copy)
> > @@ -34930,6 +34930,7 @@ cp_parser_oacc_wait (cp_parser *parser, cp_token *
> > "#pragma acc wait", pragma_tok);
> >  
> >stmt = c_finish_oacc_wait (loc, list, clauses);
> > +  stmt = finish_expr_stmt (stmt);
> >  
> >return stmt;
> >  }


Grüße
 Thomas





Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

2015-12-03 Thread Richard Biener
On Thu, 3 Dec 2015, Tom de Vries wrote:

> On 03/12/15 01:10, Tom de Vries wrote:
> > 
> > I've managed to reproduce it. The difference between pass and fail is
> > whether the compiler is configured with or without accelerator.
> > 
> > I'll look into it.
> 
> In the configuration with accelerator, the flag node->force_output is on for
> foo._omp.fn.
> 
> This causes nonlocal_p to be true in ipa_pta_execute, which causes the
> optimization to fail.
> 
> The flag is described as:
> ...
>   /* The symbol will be assumed to be used in an invisible way (like
>  by an toplevel asm statement).  */
>  ...
> 
> Looks like I have to ignore the force_output flag as well in ipa_pta_execute
> for this sort of node.

It rather looks like the flag shouldn't be set.  The fn after all has
its address taken!(?)

Richard.


Add an rsqrt_optab and IFN_RSQRT internal function

2015-12-03 Thread Richard Sandiford
All current uses of builtin_reciprocal convert 1.0/sqrt into rsqrt.
This patch adds an rsqrt optab and associated internal function for
that instead.  We can then pick up the vector forms of rsqrt automatically,
fixing an AArch64 regression from my internal_fn patches.

With that change, builtin_reciprocal only needs to handle target-specific
built-in functions.  I've restricted the hook to those since, if we need
a reciprocal of another standard function later, I think there should be
a strong preference for adding a new optab and internal function for it,
rather than hiding the code in a backend.

Three targets implement builtin_reciprocal: aarch64, i386 and rs6000.
i386 and rs6000 already used the obvious rsqrt2 pattern names
for the instructions, so they pick up the new code automatically.
aarch64 needs a slight rename.

mn10300 is unusual in that its native operation is rsqrt, and
sqrt is approximated as 1.0/rsqrt.  The port also uses rsqrt2
for the rsqrt pattern, so after the patch we now pick it up as a native
operation.
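
To make the two directions concrete (1.0/sqrt expressed as rsqrt, and sqrt
recovered as 1.0/rsqrt), here is a small numeric sketch.  It is not how GCC or
any of the ports mentioned implements rsqrt: rsqrt_approx and sqrt_via_rsqrt
are hypothetical names, the starting guess uses the well-known 0x5f3759df bit
trick, and the code assumes a positive, finite, normal IEEE single-precision
input.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, finite, normal x.  */
static float
rsqrt_approx (float x)
{
  uint32_t bits;
  float y;

  /* Initial guess from the bit pattern of x.  */
  memcpy (&bits, &x, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);
  memcpy (&y, &bits, sizeof y);

  /* Newton-Raphson refinement: y <- y * (1.5 - 0.5 * x * y * y).  */
  for (int i = 0; i < 3; i++)
    y = y * (1.5f - 0.5f * x * y * y);

  return y;
}

/* Recover sqrt from rsqrt, as in the "sqrt is 1.0/rsqrt" arrangement.  */
static float
sqrt_via_rsqrt (float x)
{
  return 1.0f / rsqrt_approx (x);
}
```

For x = 4.0f, rsqrt_approx returns a value close to 0.5 and sqrt_via_rsqrt a
value close to 2.0.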

Two other ports define rsqrt patterns: sh and v850.  AFAICT these
patterns aren't currently used, but I think the patch does what the
authors of the patterns would have expected.  There's obviously some
risk of fallout though.

Tested on x86_64-linux-gnu, aarch64-linux-gnu, arm-linux-gnueabihf
(as a target without the hooks) and powerpc64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* internal-fn.def (RSQRT): New function.
* optabs.def (rsqrt_optab): New optab.
* doc/tm.texi (rsqrtM2): Document
* target.def (builtin_reciprocal): Replace gcall argument with
a function decl.  Restrict hook to machine functions.
* doc/tm.texi: Regenerate.
* targhooks.h (default_builtin_reciprocal): Update prototype.
* targhooks.c (default_builtin_reciprocal): Likewise.
* tree-ssa-math-opts.c: Include internal-fn.h.
(internal_fn_reciprocal): New function.
(pass_cse_reciprocals::execute): Call it, and build a call to an
internal function on success.  Only call targetm.builtin_reciprocal
for machine functions.
* config/aarch64/aarch64-protos.h (aarch64_builtin_rsqrt): Remove
second argument.
* config/aarch64/aarch64-builtins.c (aarch64_expand_builtin_rsqrt):
Rename aarch64_rsqrt_2 to rsqrt2.
(aarch64_builtin_rsqrt): Remove md_fn argument and only handle
machine functions.
* config/aarch64/aarch64.c (use_rsqrt_p): New function.
(aarch64_builtin_reciprocal): Replace gcall argument with a
function decl.  Use use_rsqrt_p.  Remove optimize_size check.
Only handle machine functions.  Update call to aarch64_builtin_rsqrt.
(aarch64_optab_supported_p): New function.
(TARGET_OPTAB_SUPPORTED_P): Define.
* config/aarch64/aarch64-simd.md (aarch64_rsqrt_2): Rename to...
(rsqrt2): ...this.
* config/i386/i386.c (use_rsqrt_p): New function.
(ix86_builtin_reciprocal): Replace gcall argument with a
function decl.  Use use_rsqrt_p.  Remove optimize_insn_for_size_p
check.  Only handle machine functions.
(ix86_optab_supported_p): Handle rsqrt_optab.
* config/rs6000/rs6000.c (TARGET_OPTAB_SUPPORTED_P): Define.
(rs6000_builtin_reciprocal): Replace gcall argument with a
function decl.  Remove optimize_insn_for_size_p check.
Only handle machine functions.
(rs6000_optab_supported_p): New function.

Index: gcc/internal-fn.def
===
--- gcc/internal-fn.def 2015-12-03 09:16:57.0 +
+++ gcc/internal-fn.def 2015-12-03 09:17:00.811513362 +
@@ -91,6 +91,8 @@ DEF_INTERNAL_OPTAB_FN (LOAD_LANES, ECF_C
 DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
 DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
 
+DEF_INTERNAL_OPTAB_FN (RSQRT, ECF_CONST, rsqrt, unary)
+
 /* Unary math functions.  */
 DEF_INTERNAL_FLT_FN (ACOS, ECF_CONST, acos, unary)
 DEF_INTERNAL_FLT_FN (ASIN, ECF_CONST, asin, unary)
Index: gcc/optabs.def
===
--- gcc/optabs.def  2015-12-03 09:16:57.0 +
+++ gcc/optabs.def  2015-12-03 09:17:00.811513362 +
@@ -267,6 +267,7 @@ OPTAB_D (log_optab, "log$a2")
 OPTAB_D (logb_optab, "logb$a2")
 OPTAB_D (pow_optab, "pow$a3")
 OPTAB_D (remainder_optab, "remainder$a3")
+OPTAB_D (rsqrt_optab, "rsqrt$a2")
 OPTAB_D (scalb_optab, "scalb$a3")
 OPTAB_D (signbit_optab, "signbit$F$a2")
 OPTAB_D (significand_optab, "significand$a2")
Index: gcc/doc/md.texi
===
--- gcc/doc/md.texi 2015-12-03 09:16:57.0 +
+++ gcc/doc/md.texi 2015-12-03 09:17:00.811513362 +
@@ -5331,6 +5331,18 @@ corresponds to the C data type @code{dou
 built-in function uses the mode which corresponds to th

Re: [PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE

2015-12-03 Thread Janne Blomqvist
On Tue, Dec 1, 2015 at 7:51 PM, Bernhard Reutner-Fischer wrote:
> As said, we could as well use a list of candidates with NULL as record marker.
> Implementation cosmetics. Steve seems to not be thrilled by the
> overall idea in the first place, so unless there is clear support by
> somebody else i won't pursue this any further, it's not that i'm bored
> or ran out of stuff i should do.. ;)

FWIW, I think the idea of this patch is quite nice, and I'd like to
see it in the compiler.

I'm personally Ok with "C++-isms", but nowadays my contributions are
so minor that my opinion shouldn't carry that much weight on this
matter.


-- 
Janne Blomqvist


[PATCH][RTL-ifcvt] PR rtl-optimization/68624: Clean up logic that checks for clobbering conflicts across basic blocks

2015-12-03 Thread Kyrill Tkachov

Hi all,

In this fix I want to simplify the control flow of the code that chooses the
order in which to emit the then and else basic blocks (and their associated
emit_a and emit_b instructions).
Currently we check the then block and only if there is a modification there we
check the else block and make a decision there. IMO it's much simpler if we
check both blocks and write the logic that chooses the order as a simple
IF-ELSEIF-ELSE block that only emits the blocks and doesn't try to do any other
checks.  The bug in the logic that was preventing the clobber check from being
performed in this PR was in the code:
  if (emit_a || modified_in_a)
{
  modified_in_b = emit_b != NULL_RTX && modified_in_p (orig_a, emit_b);
  if (tmp_b && else_bb)
{
  FOR_BB_INSNS (else_bb, tmp_insn)

where the second if condition should have been:
  if (tmp_a && else_bb)

Just changing the tmp_b to tmp_a in that condition would have fixed the
wrong-code part of this PR, as we would have ended up rejecting if-conversion.
However, there is a valid if-conversion opportunity here: we just have to emit
emit_a followed by else_bb, which the current control flow made awkward.  That
is why I'm suggesting this small rewrite.
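
The IF-ELSEIF-ELSE decision itself can be sketched in isolation.  This is only
a model of the ordering choice made in the patch below; emit_order and
choose_emit_order are hypothetical names, and the real code emits the blocks
with noce_emit_bb rather than returning a value.

```c
#include <stdbool.h>

/* Hypothetical result type: which sequence to emit first, or give up.  */
enum emit_order
{
  EMIT_A_THEN_B,
  EMIT_B_THEN_A,
  EMIT_FAIL
};

/* modified_in_a: setting up A clobbers something B depends on.
   modified_in_b: setting up B clobbers something A depends on.  */
static enum emit_order
choose_emit_order (bool modified_in_a, bool modified_in_b)
{
  if (modified_in_a && !modified_in_b)
    return EMIT_B_THEN_A;  /* Swap: B first, so A cannot clobber its inputs.  */
  else if (!modified_in_a)
    return EMIT_A_THEN_B;  /* Default order is safe.  */
  else
    return EMIT_FAIL;      /* Each clobbers the other: punt.  */
}
```

Computing both flags up front is what lets the ordering logic shrink to these
three branches.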

Bootstrapped and tested on x86_64, aarch64, arm.

Ok for trunk?
Thanks,
Kyrill

2015-12-03  Kyrylo Tkachov  

PR rtl-optimization/68624
* ifcvt.c (noce_try_cmove_arith): Check clobbers of temp regs in both
blocks if they exist and simplify the logic choosing the order to emit
them in.

2015-12-03  Kyrylo Tkachov  

PR rtl-optimization/68624
* gcc.c-torture/execute/pr68624.c: New test.
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 86b6ef7246ceddd223e93922737496af3d93f148..ef23c4cda66e6a659eee9b30089a6cc056cea30f 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2202,10 +2202,6 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	}
 }
 
-/* If insn to set up A clobbers any registers B depends on, try to
-   swap insn that sets up A with the one that sets up B.  If even
-   that doesn't help, punt.  */
-
   modified_in_a = emit_a != NULL_RTX && modified_in_p (orig_b, emit_a);
   if (tmp_b && then_bb)
 {
@@ -2220,31 +2216,33 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	  }
 
 }
-  if (emit_a || modified_in_a)
+
+  modified_in_b = emit_b != NULL_RTX && modified_in_p (orig_a, emit_b);
+  if (tmp_a && else_bb)
 {
-  modified_in_b = emit_b != NULL_RTX && modified_in_p (orig_a, emit_b);
-  if (tmp_b && else_bb)
+  FOR_BB_INSNS (else_bb, tmp_insn)
+  /* Don't check inside insn_b.  We will have changed it to emit_b
+	 with a destination that doesn't conflict.  */
+  if (!(insn_b && tmp_insn == insn_b)
+	  && modified_in_p (orig_a, tmp_insn))
 	{
-	  FOR_BB_INSNS (else_bb, tmp_insn)
-	  /* Don't check inside insn_b.  We will have changed it to emit_b
-	 with a destination that doesn't conflict.  */
-	  if (!(insn_b && tmp_insn == insn_b)
-	  && modified_in_p (orig_a, tmp_insn))
-	{
-	  modified_in_b = true;
-	  break;
-	}
+	  modified_in_b = true;
+	  break;
 	}
-  if (modified_in_b)
-	goto end_seq_and_fail;
+}
 
+  /* If insn to set up A clobbers any registers B depends on, try to
+ swap insn that sets up A with the one that sets up B.  If even
+ that doesn't help, punt.  */
+  if (modified_in_a && !modified_in_b)
+{
   if (!noce_emit_bb (emit_b, else_bb, b_simple))
 	goto end_seq_and_fail;
 
   if (!noce_emit_bb (emit_a, then_bb, a_simple))
 	goto end_seq_and_fail;
 }
-  else
+  else if (!modified_in_a)
 {
   if (!noce_emit_bb (emit_a, then_bb, a_simple))
 	goto end_seq_and_fail;
@@ -2252,6 +2250,8 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   if (!noce_emit_bb (emit_b, else_bb, b_simple))
 	goto end_seq_and_fail;
 }
+  else
+goto end_seq_and_fail;
 
   target = noce_emit_cmove (if_info, x, code, XEXP (if_info->cond, 0),
 			XEXP (if_info->cond, 1), a, b);
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68624.c b/gcc/testsuite/gcc.c-torture/execute/pr68624.c
new file mode 100644
index ..abb716b1550038cb3d0e96e8917b7ed0ba8bfa83
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr68624.c
@@ -0,0 +1,30 @@
+int b, c, d, e = 1, f, g, h, j;
+
+static int
+fn1 ()
+{
+  int a = c;
+  if (h)
+return 9;
+  g = (c || b) % e;
+  if ((g || f) && b)
+return 9;
+  e = d;
+  for (c = 0; c > -4; c--)
+;
+  if (d)
+c--;
+  j = c;
+  return d;
+}
+
+int
+main ()
+{
+  fn1 ();
+
+  if (c != -4)
+__builtin_abort ();
+
+  return 0;
+}


Re: [RFA] [PR tree-optimization/68599] Avoid over-zealous optimization with -funsafe-loop-optimizations

2015-12-03 Thread Richard Biener
On Wed, Dec 2, 2015 at 5:27 PM, Jeff Law  wrote:
>
>
> I strongly recommend reading the analysis in pr45122 since pr68599 uses the
> same testcase and just triggers the same bug in the RTL optimizers instead
> of the tree optimizers.
>
> As noted in 45122, with -funsafe-loop-optimizations, we may exit the loop an
> iteration too early.  The loop in question is finite and the counter does
> not overflow.  Yet -funsafe-loop-optimizations munges it badly.
>
> As is noted in c#6 and patched in c#8, when there's more than one exit from
> the loop, simply discarding the assumptions for the trip count is "a bit too
> unsafe".  Richi & Zdenek agreed that disabling the optimization when the
> loop has > 1 exit was the preferred approach. Alex's patch did just that,
> but only for the tree optimizers.
>
> This patch does essentially the same thing for the RTL loop optimizer. If
> the candidate loop has > 1 exit, then we don't allow
> -funsafe-loop-optimizations to drop the assumptions/infinite notes for the
> RTL loop.
>
> This required ensuring that LOOPS_HAVE_RECORDED_EXITS when initializing the
> loop optimizer.
>
> Bootstrapped and regression tested on x86_64-linux-gnu and
> powerpc64-linux-gnu.  For the latter, pr45122.c flips to a pass.  Given this
> is covered by the pr45122 testcase, I didn't add a new one.
>
> OK for the trunk?

Ok.

Note that I believe we should dump -funsafe-loop-optimizations in favor of a
per-loop #pragma now that we can properly track such.  Globally it's known to
miscompile SPEC at least.
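
For reference, "more than one exit" in Jeff's description means a loop shaped
like the following.  This is a generic illustration, not the pr45122 testcase;
find_first is a hypothetical name.

```c
/* Return the index of the first element equal to KEY, or N if absent.  */
static int
find_first (const int *v, int n, int key)
{
  int i;

  for (i = 0; i < n; i++)  /* Exit 1: the counter reaches N.  */
    if (v[i] == key)
      break;               /* Exit 2: early break out of the loop.  */

  return i;
}
```

With two exits, the trip count implied by the counter alone is only an upper
bound on how many iterations actually run.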

Thanks,
Richard.

> Jeff
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 9a78b7a..ed677ec 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,12 @@
> +2015-12-02  Jeff Law  
> +
> +   PR tree-optimization/68599
> +   * loop-init.c (rtl_loop_init): Set LOOPS_HAVE_RECORDED_EXITS
> +   in call to loop_optimizer_init.
> +   * loop-iv.c (get_simple_loop_desc): Only allow unsafe loop
> +   optimization to drop the assumptions/infinite notations if
> +   the loop has a single exit.
> +
>  2015-12-01  Andreas Tobler  
>
> * config/rs6000/freebsd64.h (ELFv2_ABI_CHECK): Add new macro.
> diff --git a/gcc/loop-init.c b/gcc/loop-init.c
> index e32c94a..120316d 100644
> --- a/gcc/loop-init.c
> +++ b/gcc/loop-init.c
> @@ -395,7 +395,7 @@ rtl_loop_init (void)
>dump_flow_info (dump_file, dump_flags);
>  }
>
> -  loop_optimizer_init (LOOPS_NORMAL);
> +  loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS);
>return 0;
>  }
>
> diff --git a/gcc/loop-iv.c b/gcc/loop-iv.c
> index c7d5164..dfa3ca3 100644
> --- a/gcc/loop-iv.c
> +++ b/gcc/loop-iv.c
> @@ -3054,7 +3054,7 @@ get_simple_loop_desc (struct loop *loop)
> }
> }
>
> -  if (flag_unsafe_loop_optimizations)
> +  if (flag_unsafe_loop_optimizations && single_exit (loop))
> {
>   desc->assumptions = NULL_RTX;
>   desc->infinite = NULL_RTX;
>


Re: Add an rsqrt_optab and IFN_RSQRT internal function

2015-12-03 Thread Jakub Jelinek
On Thu, Dec 03, 2015 at 09:21:03AM +, Richard Sandiford wrote:
>   * internal-fn.def (RSQRT): New function.
>   * optabs.def (rsqrt_optab): New optab.
>   * doc/tm.texi (rsqrtM2): Document

Missing full stop.

Otherwise looks to me like a nice cleanup and hopefully fixes the aarch64
regression.

Jakub


Re: [PATCH] Derive interface buffers from max name length

2015-12-03 Thread Janne Blomqvist
On Tue, Dec 1, 2015 at 6:51 PM, Bernhard Reutner-Fischer wrote:
> On 1 December 2015 at 15:52, Janne Blomqvist wrote:
>> On Tue, Dec 1, 2015 at 2:54 PM, Bernhard Reutner-Fischer wrote:
>>> These three functions used a hardcoded buffer of 100 but would be better
>>> off basing it on GFC_MAX_SYMBOL_LEN, which denotes the maximum length of a
>>> name in any of our supported standards (63 as of f2003 ff.).
>>
>> Please use xasprintf() instead (and free the result, of course). One
>> of my backburner projects is to get rid of these static symbol
>> buffers, and use dynamic buffers (or the symbol table) instead. We
>> IIRC already have some ugly hacks by using hashing to get around
>> GFC_MAX_SYMBOL_LEN when handling mangled symbols. Your patch doesn't
>> make the situation worse per se, but if you're going to fix it, let's
>> do it properly.
>
> I see.
>
> /scratch/src/gcc-6.0.mine/gcc/fortran$ git grep
> "^[[:space:]]*char[[:space:]][[:space:]]*[^[;[:space:]]*\[" | wc -l
> 142
> /scratch/src/gcc-6.0.mine/gcc/fortran$ git grep "xasprintf" | wc -l
> 32

Yes, that's why it's on the TODO-list rather than on the DONE-list. :)

> What about memory fragmentation when switching to heap-based allocation?
> Or is there consensus that these are in the noise compared to other
> parts of the compiler?

Heap fragmentation is an issue, yes. I'm not sure it's that
performance-critical, but I don't think there is any consensus. I just
want to avoid ugly hacks like symbol hashing to fit within some fixed
buffer. Perhaps a good compromise would be something like std::string
with small string optimization, but as you have seen there is some
resistance to C++. But this is more relevant for mangled symbols, so
GFC_MAX_MANGLED_SYMBOL_LEN is more relevant here, and there's only a
few of them left. So, well, if you're sure that mangled symbols are
never copied into the buffers your patch modifies, please consider
your original patch Ok as well. Whichever you prefer.
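
As a rough sketch of the dynamic-buffer approach: xasprintf comes from
libiberty, so this portable stand-in uses vsnprintf and malloc instead.
format_alloc is a hypothetical helper, and unlike xasprintf it returns NULL on
allocation failure rather than aborting.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly allocated buffer sized to fit the result.
   The caller frees the returned string.  Returns NULL on failure.  */
static char *
format_alloc (const char *fmt, ...)
{
  va_list ap, ap2;
  char *buf = NULL;
  int len;

  va_start (ap, fmt);
  va_copy (ap2, ap);

  /* First pass: measure the formatted length without writing.  */
  len = vsnprintf (NULL, 0, fmt, ap);
  va_end (ap);

  if (len >= 0)
    {
      buf = malloc ((size_t) len + 1);
      if (buf != NULL)
        /* Second pass: write into the exactly sized buffer.  */
        vsnprintf (buf, (size_t) len + 1, fmt, ap2);
    }

  va_end (ap2);
  return buf;
}
```

A call such as format_alloc ("%s_%d", "sym", 42) yields a heap string sized to
the name, so no fixed limit like 100 or GFC_MAX_SYMBOL_LEN has to be chosen.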

Performance-wise I think a bigger benefit would be to use the symbol
table more and then e.g. be able to do pointer comparisons rather than
strcmp(). But that is certainly much more work.
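
The pointer-comparison idea can be sketched with a tiny interning table.  This
is an illustration under simplifying assumptions (a fixed-size table with
linear search); intern, MAX_SYMS and the table itself are hypothetical and
unrelated to gfortran's actual symbol table.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_SYMS 64  /* Hypothetical capacity for the sketch.  */

static char *sym_table[MAX_SYMS];
static int n_syms;

/* Return the canonical copy of NAME, adding it on first sight.
   Returns NULL if the table is full or allocation fails.  */
static const char *
intern (const char *name)
{
  size_t len;
  char *copy;

  for (int i = 0; i < n_syms; i++)
    if (strcmp (sym_table[i], name) == 0)
      return sym_table[i];

  if (n_syms == MAX_SYMS)
    return NULL;

  len = strlen (name) + 1;
  copy = malloc (len);
  if (copy == NULL)
    return NULL;

  memcpy (copy, name, len);
  sym_table[n_syms++] = copy;
  return copy;
}
```

Once every name has gone through intern, two names are equal exactly when
their pointers are equal, so later comparisons need no strcmp().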

> BTW:
> $ git grep APO
> io.c:  static const char *delim[] = { "APOSTROPHE", "QUOTE", "NONE", NULL };
> io.c:  static const char *delim[] = { "APOSTROPHE", "QUOTE", "NONE", NULL };

? What are you saying?



-- 
Janne Blomqvist


Re: [PATCH 2/2] [graphite] fix invalid bounds on array refs

2015-12-03 Thread Richard Biener
On Wed, Dec 2, 2015 at 10:36 PM, Sebastian Paul Pop  wrote:
> Do you recommend that we add a gcc_assert that min is always lower than max?

No, max can be one less than min if the array has size zero.
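
The zero-size case follows from the usual element-count arithmetic, sketched
here with a hypothetical helper (array_extent is not a GCC function):

```c
/* Number of elements in an array with inclusive bounds LOW..HIGH.  */
static long
array_extent (long low, long high)
{
  return high - low + 1;
}
```

For the Fortran case quoted below, with low 0 and high -1, the extent is 0, so
such bounds describe an empty array rather than an invalid one.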

> The change in Graphite code can be reverted then:
>
>>+  /* Fortran has some arrays where high bound is -1 and low is 0.  */
>>+  if (integer_onep (fold_build2 (LT_EXPR, boolean_type_node, high,
>>low)))
>>+return false;
>
>
> -Original Message-
>
> But either that is the case or the frontend has a bug and should be fixed.  
> So your patch doesn't make any sense.
>
> Richard.
>
>


Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path

2015-12-03 Thread Richard Biener
On Wed, Dec 2, 2015 at 11:56 PM, Jeff Law  wrote:
> On 12/02/2015 08:35 AM, Richard Biener wrote:
>
 be possible to make it do that much like I extended SCCVN to do this
 (when doing the DOM walk see if any incoming edge is marked executable
 and if not, mark all outgoing edges as not executable, if the block is
 executable
 at the time we process the last stmt determine if we can compute the
 edge
 that ends up always executed and mark all others as not executable)
>>>
>>>
>>> Essentially yes. I'm using the not-executable flag and bypassing things
>>> when
>>> it's discovered.
>
> I took a look at what you did with SCCVN and I like it better than what I
> was doing -- your handling is more complete.
>
> There's some code that ought to be factored out.  In particular at the start
> of before_dom_children you've got code that clears EDGE_EXECUTABLE.
> There's no reason we should duplicate in SCCVN, DOM and perhaps other DOM
> walkers in the future.  If you've got a place in mind where you think it
> ought to live (cfg-something) speak up.  Else I'll find a spot on my own.

I think domwalk.[ch] itself might be a good enough spot, a static method
inside the domwalk class

  bool before_dom_children_track_and_query_bb_executable (basic_block);

using a derived class that automagically does this would still require
to explicitly call the parent function before_dom_children, so better
make it explicit.

And document it is non-optimistic thus all edges have to be marked
EDGE_EXECUTABLE before the domwalk and before_dom_children
is supposed to mark non-executable outgoing edges.
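
A toy model of that query, assuming a drastically simplified CFG
representation: struct edge, struct block and track_bb_executable are
hypothetical and do not reflect GCC's basic_block or edge types.  Treating a
block without predecessors as executable is also an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

struct edge
{
  bool executable;  /* Plays the role of the EDGE_EXECUTABLE flag.  */
};

struct block
{
  struct edge **preds;
  size_t n_preds;
  struct edge **succs;
  size_t n_succs;
};

/* Return true if BB has at least one executable incoming edge.
   Otherwise mark all outgoing edges as not executable and return false.
   All edges are expected to start out executable (non-optimistic).  */
static bool
track_bb_executable (struct block *bb)
{
  if (bb->n_preds == 0)
    return true;  /* Assumption: the entry block is always executable.  */

  for (size_t i = 0; i < bb->n_preds; i++)
    if (bb->preds[i]->executable)
      return true;

  for (size_t i = 0; i < bb->n_succs; i++)
    bb->succs[i]->executable = false;

  return false;
}
```

Because the walk visits dominators first, clearing the outgoing edges of an
unreachable block lets the same check reject the blocks it dominates.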

>>>
>>> The most interesting side effect, and one I haven't fully analyzed yet is
>>> an
>>> unexpected jump thread -- which I've traced back to differences in what
>>> the
>>> alias oracle is able to find when we walk unaliased vuses. Which makes
>>> totally no sense that it's unable to find the unaliased vuse in the
>>> simplified CFG, but finds it when we don't remove the unexecutable edge.
>>> As
>>> I said, it makes no sense to me yet and I'm still digging.
>>
>>
>> The walking of PHI nodes is quite simplistic to avoid doing too much work
>> so
>> an extra (not executable) edge may confuse it enough.  So this might be
>> "expected".  Adding a flag on whether EDGE_EXECUTABLE is to be
>> trusted would be an option (also helping SCCVN).
>
> It was actually the opposite effect.  ie, with the simplified CFG, we missed
> the jump thread, but with the more complex CFG we found the jump thread.
> And the edge we were removing didn't (at first glance) appear to be of any
> real significance.

Oh, I see...

> I'm not seeing the oddity anymore now that I've converted DOM to mimic what
> you were doing in SCCVN.  So I'm going to assume I botched something
> somewhere.
>
> Back to analysis to see if there's other fallout.
>
> jeff
>


Re: Fix buildbreaker with isl 0.14

2015-12-03 Thread Richard Biener
On Thu, Dec 3, 2015 at 9:49 AM, Tom de Vries  wrote:
> [ was: Re: [PATCH] [graphite] handle missing isl_ast_expr ]
>
> On 03/12/15 00:56, Tom de Vries wrote:
>>
>> Hi,
>>
>> This breaks the build for me, with isl 0.14.
>>
>> ...
>> src/gcc/graphite-isl-ast-to-gimple.c: In member function ‘tree_node*
>> translate_isl_ast_to_gimple::binary_op_to_tree(tree, isl_ast_expr*,
>> ivs_params&)’:
>> src/gcc/graphite-isl-ast-to-gimple.c:591:10: error: ‘isl_ast_op_zdiv_r’
>> was not declared in this scope
>>   case isl_ast_op_zdiv_r:
>>^
>> ...
>>
>> Thanks,
>> - Tom
>>
>> On 02/12/15 23:17, Sebastian Pop wrote:
>>>
>>> From ISL's documentation, isl_ast_op_zdiv_r is equal to zero iff the
>>> remainder on integer division is zero.  Generate a modulo operation for
>>> that.
>>>
>>> * graphite-isl-ast-to-gimple.c (binary_op_to_tree): Handle
>>> isl_ast_op_zdiv_r.
>>>  (gcc_expression_from_isl_expr_op): Same.
>>>
>>> * gcc.dg/graphite/id-28.c: New.
>
>
> this patch fixes the build breaker with isl 0.14 for me. I'm using the
> HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS macro (which is set for isl
> 0.15, and not before) to guard the code handling isl_ast_op_zdiv_r (which I
> suppose is new in isl 0.15).
>
> OK for trunk?

Ok.  Ideally we'd get another configure check for this (who knows if
either of the two guarded things will vanish in future but with
different versions...)
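
For reference, the isl_ast_op_zdiv_r semantics quoted above (zero iff the
remainder of the integer division is zero) can be sketched in plain C.  This
is only an illustration: lower_zdiv_r is a hypothetical name, and the real
code builds a GIMPLE expression rather than computing a value.

```c
/* Sketch only: lower a zdiv_r-style node to a plain remainder.
   Callers are expected to test the result against zero, so the sign
   of a non-zero remainder does not matter here.  */
long
lower_zdiv_r (long a, long b)
{
  return a % b;
}
```

Since only the comparison against zero is meaningful, a modulo operation is
enough to implement it.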

Thanks,
Richard.

> Thanks,
> - Tom


Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT

2015-12-03 Thread Chung-Lin Tang
On 2015/12/3 4:59 PM, Thomas Schwinge wrote:
> Hi!
> 
> On Thu, 03 Dec 2015 09:51:31 +0100, I wrote:
>> On Mon, 23 Nov 2015 21:15:00 +0800, Chung-Lin Tang  
>> wrote:
>>> The OpenACC wait directive is represented as a call to the runtime
>>> function "GOACC_wait" instead of a tree code.  I am seeing when
>>> '#pragma acc wait' is used inside a template function, the CALL_EXPR
>>> to GOACC_wait is being silently ignored/removed during tsubst_expr().
>>
>> Uh.
>>
>>> I think the correct way to organize this is that the call should be inside
>>> an EXPR_STMT, so here's a patch to do that; basically remove the
>>> add_stmt() call from the shared c_finish_oacc_wait() code, and add
>>> add_stmt()/finish_expr_stmt() in the corresponding C/C++ parts.
>>>
>>> Tested with no regressions on trunk, okay to commit?
>>
>>> --- c-family/c-omp.c(revision 230703)
>>> +++ c-family/c-omp.c(working copy)
>>> @@ -63,7 +63,6 @@ c_finish_oacc_wait (location_t loc, tree parms, tr
>>>  }
>>>  
>>>stmt = build_call_expr_loc_vec (loc, stmt, args);
>>> -  add_stmt (stmt);
>>>  
>>>vec_free (args);
>> |  
>> |return stmt;
>> |  }
>>
>> I see in gcc/c/c-omp.c that several other c_finish_omp_* functions that
>> build builtin calls instead of tree nodes, do similar things like
>> c_finish_oacc_wait; I'd like to understand why it's -- presumably -- not
>> a problem for these: c_finish_omp_barrier, c_finish_omp_taskwait,
>> c_finish_omp_taskyield, c_finish_omp_flush?  (Jakub?)
> 
> Oh wait, it looks like the C++ front end is not actually using the
> functions defined in the C/C++-shared gcc/c-family/c-omp.c, but has its
> own implementations in gcc/cp/semantics.c, without "c_" prefixes?  In
> addition to finish_expr_stmt calls, I see it's also using
> finish_call_expr instead of build_call_expr_loc/build_call_expr_loc_vec.
> So I guess we'll want to model this the same way for OpenACC support
> functions, and then (later) we should clean this up, to move the
> C-specific code from gcc/c-family/c-omp.c into the C front end?  (Jakub?)

I see most OpenACC/OpenMP constructs are represented by special statement codes,
so they should be a different case. I so far only see the OpenACC wait directive
being represented as a CALL_EXPR (maybe there are others, haven't exhaustively 
searched).

Chung-Lin




Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT

2015-12-03 Thread Jakub Jelinek
On Thu, Dec 03, 2015 at 06:05:36PM +0800, Chung-Lin Tang wrote:
> > Oh wait, it looks like the C++ front end is not actually using the
> > functions defined in the C/C++-shared gcc/c-family/c-omp.c, but has its
> > own implementations in gcc/cp/semantics.c, without "c_" prefixes?  In
> > addition to finish_expr_stmt calls, I see it's also using
> > finish_call_expr instead of build_call_expr_loc/build_call_expr_loc_vec.
> > So I guess we'll want to model this the same way for OpenACC support
> > functions, and then (later) we should clean this up, to move the
> > C-specific code from gcc/c-family/c-omp.c into the C front end?  (Jakub?)
> 
> I see most OpenACC/OpenMP constructs are represented by special statement 
> codes,
> so they should be a different case. I so far only see the OpenACC wait 
> directive
> being represented as a CALL_EXPR (maybe there are others, haven't 
> exhaustively searched).

No, Thomas is right, just look at
finish_omp_{barrier,flush,taskwait,taskyield,cancel,cancellation_point},
all those are represented as CALL_EXPRs.

Jakub


Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-12-03 Thread Richard Earnshaw
e.  The problem for atomic load store is AArch64
>>> only supports direct register addressing mode.  After LRA reloads
>>> address expression out of memory reference, there is no combine/fwprop
>>> optimizer to merge instructions.  The problem is atomic_store's
>>> predicate doesn't match its constraint.   The predicate used for
>>> atomic_store is memory_operand, while all other atomic patterns
>>> use aarch64_sync_memory_operand.  I think this might be a typo.  With
>>> this change, expand will not generate addressing mode requiring reload
>>> anymore.  I will test another patch fixing this.
>>>
>>> Thanks,
>>> bin
>>
>> Some comments inline.
>>
>>>>
>>>> R.
>>>>
>>>> aarch64_legitimize_addr-20151128.txt
>>>>
>>>>
>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>> index 3fe2f0f..5b3e3c4 100644
>>>> --- a/gcc/config/aarch64/aarch64.c
>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  
>>>> */, machine_mode mode)
>>>>   We try to pick as large a range for the offset as possible to
>>>>   maximize the chance of a CSE.  However, for aligned addresses
>>>>   we limit the range to 4k so that structures with different sized
>>>> - elements are likely to use the same base.  */
>>>> + elements are likely to use the same base.  We need to be careful
>>>> + not split CONST for some forms address expressions, otherwise it
>>
>> not to split a CONST for some forms of address expression,
>>
>>>> + will generate sub-optimal code.  */
>>>>
>>>>if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>>>  {
>>>>HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>>HOST_WIDE_INT base_offset;
>>>>
>>>> +  if (GET_CODE (XEXP (x, 0)) == PLUS)
>>>> +{
>>>> +  rtx op0 = XEXP (XEXP (x, 0), 0);
>>>> +  rtx op1 = XEXP (XEXP (x, 0), 1);
>>>> +
>>>> +  /* For addr expression in the form like "r1 + r2 + 0x3ffc".
>>>> + Since the offset is within range supported by addressing
>>>> + mode "reg+offset", we don't split the const and legalize
>>>> + it into below insn and expr sequence:
>>>> +   r3 = r1 + r2;
>>>> +   "r3 + 0x3ffc".  */
>>
>> I think this comment would read better as
>>
>> /* Address expressions of the form Ra + Rb + CONST.
>>
>>If CONST is within the range supported by the addressing
>>mode "reg+offset", do not split CONST and use the
>>sequence
>> Rt = Ra + Rb
>> addr = Rt + CONST.  */
>>
>>>> +  if (REG_P (op0) && REG_P (op1))
>>>> +{
>>>> +  machine_mode addr_mode = GET_MODE (x);
>>>> +  rtx base = gen_reg_rtx (addr_mode);
>>>> +  rtx addr = plus_constant (addr_mode, base, offset);
>>>> +
>>>> +  if (aarch64_legitimate_address_hook_p (mode, addr, false))
>>>> +{
>>>> +  emit_insn (gen_adddi3 (base, op0, op1));
>>>> +  return addr;
>>>> +}
>>>> +}
>>>> +  /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
>>>> + Like above, we don't split the const and legalize it into
>>>> + below insn and expr sequence:
>>
>> Similarly.
>>>> +   r3 = 0x3ffc;
>>>> +   r4 = r1 + r3;
>>>> +   "r4 + r2<<2".  */
>>
>> Why don't we generate
>>
>>   r3 = r1 + r2 << 2
>>   r4 = r3 + 0x3ffc
>>
>> utilizing the shift-and-add instructions?
> 
> All other comments are addressed in the attached new patch.
> As for this question, Wilco also asked it on internal channel before.
> The main idea is to depend on GIMPLE IVO/SLSR to find CSE
> opportunities of the scaled plus sub expr.  The scaled index is most
> likely loop iv, so I would like to split const plus out of memory
> reference so that it can be identified/hoisted as loop invariant.
> This is more important when base is sfp related.
>
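
A rough sketch of the range check behind the "r1 + r2 + 0x3ffc" example,
assuming a 4-byte access and modelling only the scaled, unsigned 12-bit
immediate form of "reg+offset".  The helper name is hypothetical, and the
real aarch64_legitimate_address_hook_p accepts more forms than this.

```c
#include <stdint.h>

/* Sketch only: return nonzero if OFFSET can be encoded as an unsigned
   12-bit immediate scaled by ACCESS_SIZE (in bytes).  */
int
offset_fits_scaled_uimm12 (int64_t offset, int64_t access_size)
{
  return offset >= 0
         && offset % access_size == 0
         && offset / access_size < 4096;
}
```

Under that assumption 0x3ffc is the largest offset that fits for a 4-byte
access (4095 * 4), which would explain why the comment picks it.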

Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT

2015-12-03 Thread Chung-Lin Tang
On 2015/12/3 6:11 PM, Jakub Jelinek wrote:
> On Thu, Dec 03, 2015 at 06:05:36PM +0800, Chung-Lin Tang wrote:
>>> Oh wait, it looks like the C++ front end is not actually using the
>>> functions defined in the C/C++-shared gcc/c-family/c-omp.c, but has its
>>> own implementations in gcc/cp/semantics.c, without "c_" prefixes?  In
>>> addition to finish_expr_stmt calls, I see it's also using
>>> finish_call_expr instead of build_call_expr_loc/build_call_expr_loc_vec.
>>> So I guess we'll want to model this the same way for OpenACC support
>>> functions, and then (later) we should clean this up, to move the
>>> C-specific code from gcc/c-family/c-omp.c into the C front end?  (Jakub?)
>>
>> I see most OpenACC/OpenMP constructs are represented by special statement 
>> codes,
>> so they should be a different case. I so far only see the OpenACC wait 
>> directive
>> being represented as a CALL_EXPR (maybe there are others, haven't 
>> exhaustively searched).
> 
> No, Thomas is right, just look at
> finish_omp_{barrier,flush,taskwait,taskyield,cancel,cancellation_point},
> all those are represented as CALL_EXPRs.
> 
>   Jakub
> 

Okay, I guess my impression was only for some OpenACC constructs.

Overall, OpenACC wait seems to be one of the few cases of using c_finish_* in
cp/parser.c.  Whether other cases should move towards/away from that kind of
style is a larger question; I was only trying to fix a
libgomp.oacc-c++/template-reduction.C regression (testcase currently still in
gomp4 branch).

Chung-Lin



[PATCHES, PING] Enhance standard DWARF for Ada

2015-12-03 Thread Pierre-Marie de Rodat

On 11/26/2015 01:34 PM, Pierre-Marie de Rodat wrote:

Done! (I replaced the dwarf_proc_decl_table hash table with a
dwarf_proc_stack_usage_map hash_map) Here's an update for the only
affected patch. Regtested again on x86_64-linux.


Ping for the patches submitted in 
 and for the 
2/8 update submitted in 
.


Thank you in advance!

--
Pierre-Marie de Rodat


[Ping^2][AArch64][TLSGD][2/2] Implement TLS GD traditional for tiny code model

2015-12-03 Thread Jiong Wang

On 13/11/15 15:21, Jiong Wang wrote:


On 05/11/15 14:57, Jiong Wang wrote:

Marcus Shawcroft writes:


+#ifdef HAVE_AS_TINY_TLSGD_RELOCS
+  return SYMBOL_TINY_TLSGD;
+#else
+  return SYMBOL_SMALL_TLSGD;
+#endif

Rather than introduce blocks of conditional compilation it is better
to gate different behaviours with a test on a constant expression. In
this case add something like this:

#if defined(HAVE_AS_TINY_TLSGD_RELOCS)
#define USE_TINY_TLSGD 1
#else
#define USE_TINY_TLSGD 0
#endif

up near the definition of TARGET_HAVE_TLS then write the above
fragment without using the preprocessor:

return USE_TINY_TLSGD ? SYMBOL_TINY_TLSGD : SYMBOL_SMALL_TLSGD;
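
Put together, the suggestion might look like the following self-contained
sketch.  The enum is a stand-in for the real aarch64_symbol_type, and
classify_tls_gd is a hypothetical name used only for illustration.

```c
/* Sketch only: stand-in for the real aarch64_symbol_type enum.  */
enum symbol_type { SYMBOL_SMALL_TLSGD, SYMBOL_TINY_TLSGD };

/* Turn the configure-time macro into a constant expression once...  */
#if defined(HAVE_AS_TINY_TLSGD_RELOCS)
#define USE_TINY_TLSGD 1
#else
#define USE_TINY_TLSGD 0
#endif

/* ...so that the selection itself is ordinary code which is parsed and
   type-checked in both configurations.  Hypothetical helper.  */
enum symbol_type
classify_tls_gd (void)
{
  return USE_TINY_TLSGD ? SYMBOL_TINY_TLSGD : SYMBOL_SMALL_TLSGD;
}
```

The benefit is that neither branch can silently bit-rot: the compiler sees
both arms regardless of what the assembler supports.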


Done.


- aarch64_emit_call_insn (gen_tlsgd_small (result, imm, resolver));
+ if (type == SYMBOL_SMALL_TLSGD)
+  aarch64_emit_call_insn (gen_tlsgd_small (result, imm, resolver));
+ else
+  aarch64_emit_call_insn (gen_tlsgd_tiny (result, imm, resolver));
  insns = get_insns ();
  end_sequence ();

Add a separate case statement for SYMBOL_TINY_TLSGD rather than reusing
the case statement for SYMBOL_SMALL_TLSGD and then needing to add
another test against symbol type within the body of the case
statement.


Done.



+(define_insn "tlsgd_tiny"
+  [(set (match_operand 0 "register_operand" "")
+ (call (mem:DI (match_operand:DI 2 "" "")) (const_int 1)))
+   (unspec:DI [(match_operand:DI 1 "aarch64_valid_symref" "S")]
UNSPEC_GOTTINYTLS)
+   (clobber (reg:DI LR_REGNUM))
+  ]
+  ""
+  "adr\tx0, %A1;bl\t%2;nop";
+  [(set_attr "type" "multiple")
+   (set_attr "length" "12")])

I don't think the explicit clobber LR_REGNUM is required since your
change last September:
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02654.html


We don't need this explicit clobber of LR_REGNUM only if operand 0 happens
to be allocated to LR_REGNUM, as after my above patch LR_REGNUM is
allocatable.

However, we still need the explicit clobber here, because in all other
cases, where LR_REGNUM is not allocated, GCC's data flow analysis can't
deduce that LR_REGNUM will still be clobbered implicitly by the call
instruction.

Without this "clobber" tag, a direct impact is that df_regs_ever_live is
calculated incorrectly for x30.  Then, for the following simple testcase:

__thread int t0 = 0x10;
__thread int t1 = 0x10;

int
main (int argc, char **argv)
{
  if (t0 != t1)
return 1;
  return  0;
}


if you compile with

 "-O2 -ftls-model=global-dynamic -fpic -mtls-dialect=trad t.c 
-mcmodel=tiny -fomit-frame-pointer",

wrong code will be generated:

 main:
str x19, [sp, -16]!  <--- x30 is not saved.
adr x0, :tlsgd:t0
bl __tls_get_addr
nop

Patch updated. tls regression OK

OK for trunk?

2015-11-05  Jiong Wang  

gcc/
  * configure.ac: Add check for binutils global dynamic tiny code model
  relocation support.
  * configure: Regenerate.
  * config.in: Regenerate.
  * config/aarch64/aarch64.md (tlsgd_tiny): New define_insn.
  * config/aarch64/aarch64-protos.h (aarch64_symbol_type): New
  enumeration SYMBOL_TINY_TLSGD.
  (aarch64_symbol_context): New comment on SYMBOL_TINY_TLSGD.
  * config/aarch64/aarch64.c (aarch64_classify_tls_symbol): Support
  SYMBOL_TINY_TLSGD.
  (aarch64_print_operand): Likewise.
  (aarch64_expand_mov_immediate): Likewise.
  (aarch64_load_symref_appropriately): Likewise.

gcc/testsuite/
  * lib/target-supports.exp (check_effective_target_aarch64_tlsgdtiny):
  New effective check.
  * gcc.target/aarch64/tlsgd_small_1.c: New testcase.
  * gcc.target/aarch64/tlsgd_small_ilp32_1.c: Likewise.
  * gcc.target/aarch64/tlsgd_tiny_1.c: Likewise.
  * gcc.target/aarch64/tlsgd_tiny_ilp32_1.c: Likewise.

Ping ~


Ping^2


Re: [gomp-nvptx 4/9] nvptx backend: add -mgomp option and multilib

2015-12-03 Thread Alexander Monakov
On Wed, 2 Dec 2015, Jakub Jelinek wrote:
> Can you post sample code with assembly for -msoft-stack and -muniform-simt
> showing how are short interesting cases expanded?

Here are short examples;  please let me know if I'm misunderstanding and you
wanted something else.

First, -muniform-simt effect on this input:

int f (int *p, int v)
{
  return __atomic_exchange_n (p, v, __ATOMIC_SEQ_CST);
}

leads to this assembly (showing diff -without/+with option):

 .visible .func (.param.u32 %out_retval)f(.param.u64 %in_ar1, .param.u32 
%in_ar2)
 {
.reg.u64 %ar1;
.reg.u32 %ar2;
.reg.u32 %retval;
.reg.u64 %hr10;
.reg.u32 %r23;
.reg.u64 %r25;
.reg.u32 %r26;
+   .reg.u32 %r28;
+   .reg.pred %r29;
ld.param.u64 %ar1, [%in_ar1];
ld.param.u32 %ar2, [%in_ar2];
+   {
+   .reg.u32 %ustmp0;
+   .reg.u64 %ustmp1;
+   .reg.u64 %ustmp2;
+   mov.u32 %ustmp0, %tid.y;
+   mul.wide.u32 %ustmp1, %ustmp0, 4;
+   mov.u64 %ustmp2, __nvptx_uni;
+   add.u64 %ustmp2, %ustmp2, %ustmp1;
+   ld.shared.u32 %r28, [%ustmp2];
+   mov.u32 %ustmp0, %tid.x;
+   and.b32 %r28, %r28, %ustmp0;
+   setp.eq.u32 %r29, %r28, %ustmp0;
+   }
mov.u64 %r25, %ar1;
mov.u32 %r26, %ar2;
-   atom.exch.b32   %r23, [%r25], %r26;
+   @%r29   atom.exch.b32   %r23, [%r25], %r26;
+   shfl.idx.b32%r23, %r23, %r28, 31;
mov.u32 %retval, %r23;
st.param.u32[%out_retval], %retval;
ret;
}
+// BEGIN GLOBAL VAR DECL: __nvptx_uni
+.extern .shared .u32 __nvptx_uni[32];
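
The added prologue boils down to two small computations per lane, which can
be modelled in C as below.  This is a sketch of one reading of the assembly
above: the function names are hypothetical, uni_mask stands for the value
loaded from __nvptx_uni[%tid.y], and lane stands for %tid.x.

```c
#include <stdint.h>

/* Lane whose result is read back by the shfl.idx: the and.b32 above.  */
uint32_t
simt_source_lane (uint32_t uni_mask, uint32_t lane)
{
  return uni_mask & lane;
}

/* Predicate guarding the atomic: the setp.eq.u32 above.  */
int
simt_lane_executes (uint32_t uni_mask, uint32_t lane)
{
  return simt_source_lane (uni_mask, lane) == lane;
}
```

With a mask of 0, only lane 0 passes the predicate and every lane reads lane
0's result through the shuffle; with an all-ones mask, every lane executes
the atomic and reads back its own value.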

And, -msoft-stack for this input:

void g(void *);
void f()
{
  char a[42] __attribute__((aligned(64)));
  g(a);
}

leads to:

 .visible .func f
 {
.reg.u64 %hr10;
.reg.u64 %r22;
.reg.u64 %frame;
-   .local.align 64 .b8 %farray[48];
-   cvta.local.u64 %frame, %farray;
+   .reg.u32 %fstmp0;
+   .reg.u64 %fstmp1;
+   .reg.u64 %fstmp2;
+   mov.u32 %fstmp0, %tid.y;
+   mul.wide.u32 %fstmp1, %fstmp0, 8;
+   mov.u64 %fstmp2, __nvptx_stacks;
+   add.u64 %fstmp2, %fstmp2, %fstmp1;
+   ld.shared.u64 %fstmp1, [%fstmp2];
+   sub.u64 %frame, %fstmp1, 48;
+   and.b64 %frame, %frame, -64;
+   st.shared.u64 [%fstmp2], %frame;
mov.u64 %r22, %frame;
{
.param.u64 %out_arg0;
st.param.u64 [%out_arg0], %r22;
call g, (%out_arg0);
}
+   st.shared.u64 [%fstmp2], %fstmp1;
ret;
}
 // BEGIN GLOBAL FUNCTION DECL: g
 .extern .func g(.param.u64 %in_ar1);
+// BEGIN GLOBAL VAR DECL: __nvptx_stacks
+.extern .shared .u64 __nvptx_stacks[32];
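
The frame computation in the -msoft-stack prologue (the sub.u64/and.b64 pair
above) can be modelled in C as follows.  This is only a sketch:
soft_stack_frame is a hypothetical name, and stack_ptr stands for the value
loaded from __nvptx_stacks[%tid.y].

```c
#include <stdint.h>

/* Sketch only: reserve SIZE bytes below STACK_PTR and round the result
   down to ALIGN, which must be a power of two.  */
uint64_t
soft_stack_frame (uint64_t stack_ptr, uint64_t size, uint64_t align)
{
  return (stack_ptr - size) & -align;
}
```

For the example above, size is 48 and align is 64.  The prologue stores the
new frame address back into the per-warp slot, and the epilogue restores the
saved value before returning.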


Alexander


[PATCH] Handle OBJ_TYPE_REF in FRE

2015-12-03 Thread Richard Biener

The following patch handles CSEing OBJ_TYPE_REF which was omitted
because it is a GENERIC expression even on GIMPLE (for whatever
reason...).  Rather than changing this now the following patch
simply treats it properly as such.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Note that this does not (yet) substitute OBJ_TYPE_REFs in calls
with SSA names that have the same value - not sure if that would
be desired generally (does the devirt machinery cope with that?).

Thanks,
Richard.

2015-12-03  Richard Biener  

PR tree-optimization/64812
* tree-ssa-sccvn.c (vn_get_stmt_kind): Handle OBJ_TYPE_REF.
(vn_nary_length_from_stmt): Likewise.
(init_vn_nary_op_from_stmt): Likewise.
* gimple-match-head.c (maybe_build_generic_op): Likewise.
* gimple-pretty-print.c (dump_unary_rhs): Likewise.

* g++.dg/tree-ssa/ssa-fre-1.C: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 231221)
--- gcc/tree-ssa-sccvn.c(working copy)
*** vn_get_stmt_kind (gimple *stmt)
*** 460,465 
--- 460,467 
  ? VN_CONSTANT : VN_REFERENCE);
else if (code == CONSTRUCTOR)
  return VN_NARY;
+   else if (code == OBJ_TYPE_REF)
+ return VN_NARY;
return VN_NONE;
  }
  default:
*** vn_nary_length_from_stmt (gimple *stmt)
*** 2479,2484 
--- 2481,2487 
return 1;
  
  case BIT_FIELD_REF:
+ case OBJ_TYPE_REF:
return 3;
  
  case CONSTRUCTOR:
*** init_vn_nary_op_from_stmt (vn_nary_op_t
*** 2508,2513 
--- 2511,2517 
break;
  
  case BIT_FIELD_REF:
+ case OBJ_TYPE_REF:
vno->length = 3;
vno->op[0] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
vno->op[1] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 1);
Index: gcc/gimple-match-head.c
===
*** gcc/gimple-match-head.c (revision 231221)
--- gcc/gimple-match-head.c (working copy)
*** maybe_build_generic_op (enum tree_code c
*** 243,248 
--- 243,249 
*op0 = build1 (code, type, *op0);
break;
  case BIT_FIELD_REF:
+ case OBJ_TYPE_REF:
*op0 = build3 (code, type, *op0, op1, op2);
break;
  default:;
Index: gcc/gimple-pretty-print.c
===
*** gcc/gimple-pretty-print.c   (revision 231221)
--- gcc/gimple-pretty-print.c   (working copy)
*** dump_unary_rhs (pretty_printer *buffer,
*** 302,308 
  || TREE_CODE_CLASS (rhs_code) == tcc_reference
  || rhs_code == SSA_NAME
  || rhs_code == ADDR_EXPR
! || rhs_code == CONSTRUCTOR)
{
  dump_generic_node (buffer, rhs, spc, flags, false);
  break;
--- 302,309 
  || TREE_CODE_CLASS (rhs_code) == tcc_reference
  || rhs_code == SSA_NAME
  || rhs_code == ADDR_EXPR
! || rhs_code == CONSTRUCTOR
! || rhs_code == OBJ_TYPE_REF)
{
  dump_generic_node (buffer, rhs, spc, flags, false);
  break;
Index: gcc/testsuite/g++.dg/tree-ssa/ssa-fre-1.C
===
*** gcc/testsuite/g++.dg/tree-ssa/ssa-fre-1.C   (revision 0)
--- gcc/testsuite/g++.dg/tree-ssa/ssa-fre-1.C   (working copy)
***
*** 0 
--- 1,44 
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-fre2" } */
+ 
+ template  class A
+ {
+   T *p;
+ 
+ public:
+   A (T *p1) : p (p1) { p->acquire (); }
+ };
+ 
+ class B
+ {
+ public:
+ virtual void acquire ();
+ };
+ class D : public B
+ {
+ };
+ class F : B
+ {
+   int mrContext;
+ };
+ class WindowListenerMultiplexer : F, public D
+ {
+   void acquire () { acquire (); }
+ };
+ class C
+ {
+   void createPeer () throw ();
+   WindowListenerMultiplexer maWindowListeners;
+ };
+ class FmXGridPeer
+ {
+ public:
+ void addWindowListener (A);
+ } a;
+ void
+ C::createPeer () throw ()
+ {
+   a.addWindowListener (&maWindowListeners);
+ }
+ 
+ /* { dg-final { scan-tree-dump-times "= OBJ_TYPE_REF" 1 "fre2" } } */


[PATCH, CHKP] Fix bounds returned for structures

2015-12-03 Thread Ilya Enkovich
Hi,

Currently multiple return-struct-* tests from MPX testsuite fail.  This patch 
fixes it.  Bootstrapped and tested on x86_64-unknown-linux-gnu.  Applied to 
trunk.  I'm going to port it to GCC5 after 5.3 release.

Thanks,
Ilya
--
gcc/

2015-12-03  Ilya Enkovich  

* cfgexpand.c (expand_gimple_stmt_1): Return statement with
DECL as return value is allowed to have NULL bounds.


diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 1990e10..2c3b23d 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3534,6 +3534,12 @@ expand_gimple_stmt_1 (gimple *stmt)
  {
tree result = DECL_RESULT (current_function_decl);
 
+   /* Mark we have return statement with missing bounds.  */
+   if (!bnd
+   && chkp_function_instrumented_p (cfun->decl)
+   && !DECL_P (op0))
+ bnd = error_mark_node;
+
/* If we are not returning the current function's RESULT_DECL,
   build an assignment to it.  */
if (op0 != result)
@@ -3550,9 +3556,6 @@ expand_gimple_stmt_1 (gimple *stmt)
op0 = build2 (MODIFY_EXPR, TREE_TYPE (result),
  result, op0);
  }
-   /* Mark we have return statement with missing bounds.  */
-   if (!bnd && chkp_function_instrumented_p (cfun->decl))
- bnd = error_mark_node;
  }
 
if (!op0)


Re: [UPC 02/22] tree-related changes

2015-12-03 Thread Richard Biener
On Wed, 2 Dec 2015, Gary Funck wrote:

> On 12/01/15 12:26:32, Richard Biener wrote:
> > On Mon, 30 Nov 2015, Gary Funck wrote:
> > > -struct GTY(()) tree_type_common {
> > > +struct GTY((user)) tree_type_common {
> > >struct tree_common common;
> > >tree size;
> > >tree size_unit;
> > > @@ -1441,10 +1458,10 @@ struct GTY(()) tree_type_common {
> > >tree pointer_to;
> > >tree reference_to;
> > >union tree_type_symtab {
> > > -int GTY ((tag ("TYPE_SYMTAB_IS_ADDRESS"))) address;
> > > -const char * GTY ((tag ("TYPE_SYMTAB_IS_POINTER"))) pointer;
> > > -struct die_struct * GTY ((tag ("TYPE_SYMTAB_IS_DIE"))) die;
> > > -  } GTY ((desc ("debug_hooks->tree_type_symtab_field"))) symtab;
> > > +int address;
> > > +const char *pointer;
> > > +struct die_struct *die;
> > > +  } symtab;
> >
> > Err, you don't have debug info for this?  What is address?
> 
> Not sure what you mean.  The 'die' field is retained.
> Is there something in the semantics of "GTY(( ((tag "
> that relates to debug information?

Ah, sorry.  I misread the diff.

> > I do not like the explicit GC of tree_type_common.
> 
> I'm not a fan either.
> 
> The gist is that we needed a map from tree nodes to tree nodes
> to record the "layout qualifier" for layout qualifiers with
> a value greater than one.  But when the garbage collector ran
> over the hash table that maps integer constants to tree nodes,
> it didn't know that the constant was being referenced by the
> layout qualifier tree map.
> 
> We described the issue here:
> https://gcc.gnu.org/ml/gcc-patches/2011-10/msg00800.html
> 
> The conclusion that we reached is that when tree nodes
> were walked, we needed to check if there was a
> tree node -> integer constant mapping, the integer constant map
> (used to make tree nodes used to hold CST's unique)
> needed to be marked to keep the CST mapping from going away.
> 
> This led to the conclusion that a custom GC routine was
> needed for tree nodes.  Maybe that conclusion is wrong or
> there is a better way to do things?

It should simply work as long as the hash-map is properly marked
as GC root.  It might _not_ work (reliably) if the hash-map is
also a "cache" by itself.  But it eventually works now given some
fixes went into the area of collecting/marking caches.

> > > ===
> > > --- gcc/tree-pretty-print.c   (.../trunk) (revision 231059)
> > > +++ gcc/tree-pretty-print.c   (.../branches/gupc) (revision 
> > > 231080)
> > > @@ -1105,6 +1105,25 @@ dump_block_node (pretty_printer *pp, tre
> > >  }
> > >  
> > >  
> > > +static void
> > > +dump_upc_type_quals (pretty_printer *buffer, tree type, int quals)
> >
> > Functions need comments.
> 
> OK.  Missed that one.  Will check on others.
> 
> > > Index: gcc/tree-sra.c
> > > ===
> > > --- gcc/tree-sra.c(.../trunk) (revision 231059)
> > > +++ gcc/tree-sra.c(.../branches/gupc) (revision 231080)
> > > @@ -3882,6 +3882,7 @@ find_param_candidates (void)
> > >  
> > > if (TREE_CODE (type) == FUNCTION_TYPE
> > > || TYPE_VOLATILE (type)
> > > +   || SHARED_TYPE_P (type)
> > 
> > UPC_SHARED_TYPE_P ()
> 
> OK. As I mentioned in a previous reply, originally we prefixed
> all "UPC" specific tree node fields and functions with UPC_ or upc_,
> but as we transitioned away from UPC as a separate language
> (ala ObjC) and made compilation conditional upon -fupc, an
> observation was made off list that since the base tree nodes
> are generic that naming UPC-related fields with "UPC" prefixes
> didn't make sense, so we removed those prefixes.  There might
> be a middle ground, however, where UPC_SHARED_TYPE_P() is preferred
> to SHARED_TYPE_P() because as you/others have mentioned,
> the term "shared" gets used in a lot of contexts.

Yes, specifically for predicates/functions used in the middle-end.

> > > @@ -4381,6 +4422,7 @@ build1_stat (enum tree_code code, tree t
> > >/* Whether a dereference is readonly has nothing to do with whether
> > >its operand is readonly.  */
> > >TREE_READONLY (t) = 0;
> > > +  TREE_SHARED (t) = SHARED_TYPE_P (type);
> > 
> > This is frontend logic and should reside in FEs.
> 
> [... several other similar actions taken contingent
> upon SHARED_TYPE_P() elided ...]
> 
> OK, will take a look.
> 
> > > +  outer_is_pts_p = (POINTER_TYPE_P (outer_type)
> > > +&& SHARED_TYPE_P (TREE_TYPE (outer_type)));
> > > +  inner_is_pts_p = (POINTER_TYPE_P (inner_type)
> > > +&& SHARED_TYPE_P (TREE_TYPE (inner_type)));
> > > +
> > > +  /* Pointer-to-shared types have special
> > > + equivalence rules that must be checked.  */
> > > +  if (outer_is_pts_p && inner_is_pts_p
> > > +  && lang_hooks.types_compatible_p)
> > > +return lang_hooks.types_compatible_p (outer_type, inner_type);
> > 
> > Sorry, but 

Re: [ARM] Fix PR middle-end/65958

2015-12-03 Thread Richard Earnshaw
Sorry for the delay, very busy on other things these days...

On 16/11/15 20:00, Eric Botcazou wrote:
>> More comments inline.
>
> Revised version attached, which addresses all your comments and in particular
> removes the
>
> +#if PROBE_INTERVAL > 4096
> +#error Cannot use indexed addressing mode for stack probing
> +#endif
>
> compile-time assertion.  It generates the same code for PROBE_INTERVAL == 4096
> as before and it generates code that can be assembled for 8192.
>
> Tested on AArch64/Linux, OK for the mainline?
>

> +#define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP)
> +
> +/* We use the 12-bit shifted immediate arithmetic instructions so values
> +   must be multiple of (1 << 12), i.e. 4096.  */
> +#if (PROBE_INTERVAL % 4096) != 0

I can understand this restriction, but...

> +  /* See the same assertion on PROBE_INTERVAL above.  */
> +  gcc_assert ((first % 4096) == 0);

... why isn't this a test that FIRST is aligned to PROBE_INTERVAL?

> +  /* See if we have a constant small number of probes to generate.  If so,
> + that's the easy case.  */
> +  if (size <= PROBE_INTERVAL)
> +{
> +  const HOST_WIDE_INT base = ROUND_UP (size, 4096);
> +  emit_set_insn (reg1,

blank line between declarations and code. Also, can we come up with a
suitable define for 4096 here that expresses the context and then use
that consistently through the remainder of this function?
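
As an illustration of both points, a named constant for 4096 and the
difference between "multiple of 4096" and "aligned to PROBE_INTERVAL", here
is a small sketch.  ARITH_GRANULE is a hypothetical name, the 8192 interval
is assumed for the example, and round_up/aligned_p are stand-ins rather than
the GCC macros.

```c
#include <stdint.h>

/* Hypothetical name for the 4096 literal: the granule imposed by the
   12-bit shifted-immediate arithmetic instructions.  */
#define ARITH_GRANULE 4096

/* Assumed for illustration: an 8192-byte probe interval.  */
#define PROBE_INTERVAL (1 << 13)

#if (PROBE_INTERVAL % ARITH_GRANULE) != 0
#error PROBE_INTERVAL must be a multiple of ARITH_GRANULE
#endif

/* Round X up to a multiple of ALIGN, which must be a power of two.  */
int64_t
round_up (int64_t x, int64_t align)
{
  return (x + align - 1) & -align;
}

/* Return nonzero if X is a multiple of ALIGN.  */
int
aligned_p (int64_t x, int64_t align)
{
  return (x % align) == 0;
}
```

With an 8192-byte interval, a FIRST of 4096 satisfies the 4096 assertion but
is not aligned to PROBE_INTERVAL, which is the gap the question above points
at.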

> +(define_insn "probe_stack_range"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> + (unspec_volatile:DI [(match_operand:DI 1 "register_operand" "0")
> +  (match_operand:DI 2 "register_operand" "r")]
> +  UNSPEC_PROBE_STACK_RANGE))]

I think this should really use PTRmode, so that it's ILP32 ready (I'm
not going to ask you to make sure that works though, since I suspect
there are still other issues to resolve with ILP32 at this time).

R.




Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

2015-12-03 Thread Tom de Vries

On 03/12/15 09:59, Richard Biener wrote:

On Thu, 3 Dec 2015, Tom de Vries wrote:


On 03/12/15 01:10, Tom de Vries wrote:


I've managed to reproduce it. The difference between pass and fail is
whether the compiler is configured with or without accelerator.

I'll look into it.


In the configuration with accelerator, the flag node->force_output is on for
foo._omp.fn.

This causes nonlocal_p to be true in ipa_pta_execute, which causes the
optimization to fail.

The flag is described as:
...
   /* The symbol will be assumed to be used in an invisible way (like
  by an toplevel asm statement).  */
  ...

Looks like I have to ignore the force_output flag as well in ipa_pta_execute
for this sort of node.


It rather looks like the flag shouldn't be set.  The fn after all has
its address taken!(?)



The flag is set here in expand_omp_target:
...
      /* Prevent IPA from removing child_fn as unreachable, since there are no
         refs from the parent function to child_fn in offload LTO mode.  */
      if (ENABLE_OFFLOADING)
        cgraph_node::get (child_fn)->mark_force_output ();
...

I guess setting forced_by_abi instead would also mean child_fn is not 
removed as unreachable, while still allowing optimizations:

...
  /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
 to be exported.  Unlike FORCE_OUTPUT this flag gets cleared to
 symbols promoted to static and it does not inhibit
 optimization.  */
  unsigned forced_by_abi : 1;
...

But I suspect that other optimizations (than ipa-pta) might break things.

Essentially we have two situations:
- in the host compiler, there is no need for the forced_output flag,
  and it inhibits optimization
- in the accelerator compiler, it (or some equivalent) is needed

I wonder if setting the force_output flag only when streaming the 
bytecode for offloading would work. That way, it wouldn't be set in the 
host compiler, while being set in the accelerator compiler.


Thanks,
- Tom


Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

2015-12-03 Thread Richard Biener
On Thu, 3 Dec 2015, Tom de Vries wrote:

> On 03/12/15 09:59, Richard Biener wrote:
> > On Thu, 3 Dec 2015, Tom de Vries wrote:
> > 
> > > On 03/12/15 01:10, Tom de Vries wrote:
> > > > 
> > > > I've managed to reproduce it. The difference between pass and fail is
> > > > whether the compiler is configured with or without accelerator.
> > > > 
> > > > I'll look into it.
> > > 
> > > In the configuration with accelerator, the flag node->force_output is on
> > > for
> > > foo._omp.fn.
> > > 
> > > This causes nonlocal_p to be true in ipa_pta_execute, which causes the
> > > optimization to fail.
> > > 
> > > The flag is described as:
> > > ...
> > >/* The symbol will be assumed to be used in an invisible way (like
> > >   by an toplevel asm statement).  */
> > >   ...
> > > 
> > > Looks like I have to ignore the force_output flag as well in
> > > ipa_pta_execute
> > > for this sort of node.
> > 
> > It rather looks like the flag shouldn't be set.  The fn after all has
> > its address taken!(?)
> > 
> 
> The flag is set here in expand_omp_target:
> ...
>       /* Prevent IPA from removing child_fn as unreachable, since there are no
>          refs from the parent function to child_fn in offload LTO mode.  */
>       if (ENABLE_OFFLOADING)
>         cgraph_node::get (child_fn)->mark_force_output ();
> ...
> 

How are there no refs from the "parent"?  Are there not refs from
some kind of descriptor that maps fallback CPU and offloaded variants?

I think the above needs sorting out in some way, making the refs
explicit rather than implicit via force_output.

> I guess setting forced_by_abi instead would also mean child_fn is not removed
> as unreachable, while still allowing optimizations:
> ...
>   /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
>  to be exported.  Unlike FORCE_OUTPUT this flag gets cleared to
>  symbols promoted to static and it does not inhibit
>  optimization.  */
>   unsigned forced_by_abi : 1;
> ...
> 
> But I suspect that other optimizations (than ipa-pta) might break things.

How so?

> Essentially we have two situations:
> - in the host compiler, there is no need for the forced_output flag,
>   and it inhibits optimization
> - in the accelerator compiler, it (or some equivalent) is needed
> 
> I wonder if setting the force_output flag only when streaming the bytecode for
> offloading would work. That way, it wouldn't be set in the host compiler,
> while being set in the accelerator compiler.

Yeah, that was my original thinking btw.

Richard.


Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

2015-12-03 Thread Jakub Jelinek
On Thu, Dec 03, 2015 at 12:09:04PM +0100, Tom de Vries wrote:
> The flag is set here in expand_omp_target:
> ...
>       /* Prevent IPA from removing child_fn as unreachable, since there are no
>          refs from the parent function to child_fn in offload LTO mode.  */
>       if (ENABLE_OFFLOADING)
>         cgraph_node::get (child_fn)->mark_force_output ();
> ...
> 
> I guess setting forced_by_abi instead would also mean child_fn is not
> removed as unreachable, while still allowing optimizations:
> ...
>   /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
>  to be exported.  Unlike FORCE_OUTPUT this flag gets cleared to
>  symbols promoted to static and it does not inhibit
>  optimization.  */
>   unsigned forced_by_abi : 1;
> ...
> 
> But I suspect that other optimizations (than ipa-pta) might break things.
> 
> Essentially we have two situations:
> - in the host compiler, there is no need for the force_output flag,
>   and it inhibits optimization
> - in the accelerator compiler, it (or some equivalent) is needed
> 
> I wonder if setting the force_output flag only when streaming the bytecode
> for offloading would work. That way, it wouldn't be set in the host
> compiler, while being set in the accelerator compiler.

I believe that the host and offload func (and var) tables need to be in
sync, so there needs to be something both in the host and accel compilers
that prevents the functions and variables that have their accel or host
counterpart in the tables from being optimized away, or say replaced by
a clone with different arguments etc.

Jakub
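
To make the sync requirement concrete, here is a minimal sketch. It assumes
the host and accelerator tables are paired purely by position; FuncTable,
accel_counterpart and tables_in_sync are hypothetical names used only for
illustration, not GCC or libgomp interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a function table: entry i on the host side is
// assumed to correspond to entry i on the accelerator side.
using FuncTable = std::vector<std::string>;

// True if both tables have the same length, so every index pairs up.
bool tables_in_sync(const FuncTable &host, const FuncTable &accel)
{
  return host.size() == accel.size();
}

// Return the accelerator entry stored at the same index as HOST_NAME, or an
// empty string if HOST_NAME is not in the host table or the index is out of
// range on the accelerator side.
std::string accel_counterpart(const FuncTable &host, const FuncTable &accel,
                              const std::string &host_name)
{
  assert(!host_name.empty());
  for (std::size_t i = 0; i < host.size(); ++i)
    if (host[i] == host_name)
      return i < accel.size() ? accel[i] : std::string();
  return std::string();
}
```

Under that assumption, if one compiler drops a single entry, every later
lookup silently lands on the wrong function, which is why both sides need
something that keeps the entries alive and unchanged.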


Re: [Patch, fortran] PR68534 - No error on mismatch in number of arguments between submodule and module interface

2015-12-03 Thread Paul Richard Thomas
Dear Steve,

I'll take a look at this this afternoon. Thanks for bringing it to my attention.

Cheers

Paul

On 3 December 2015 at 07:43, Steve Kargl
 wrote:
> On Wed, Dec 02, 2015 at 10:26:30PM -0800, Steve Kargl wrote:
>> On Wed, Dec 02, 2015 at 10:02:33PM -0800, Steve Kargl wrote:
>> > Paul,
>> >
>> > I'm stumped.  Something is broken on i386-*-freebsd. :-(
>> >
>> > Running /mnt/kargl/gcc/gcc/testsuite/gfortran.dg/dg.exp ...
>> > FAIL: gfortran.dg/submodule_10.f08   -O  (internal compiler error)
>> > FAIL: gfortran.dg/submodule_10.f08   -O  (test for excess errors)
>> > FAIL: gfortran.dg/submodule_11.f08   -O0  (internal compiler error)
>> > FAIL: gfortran.dg/submodule_11.f08   -O0  (test for excess errors)
>> > FAIL: gfortran.dg/submodule_11.f08   -O1  (internal compiler error)
>> > FAIL: gfortran.dg/submodule_11.f08   -O1  (test for excess errors)
>> > FAIL: gfortran.dg/submodule_11.f08   -O2  (internal compiler error)
>> > FAIL: gfortran.dg/submodule_11.f08   -O2  (test for excess errors)
>> > FAIL: gfortran.dg/submodule_11.f08   -O3 -fomit-frame-pointer 
>> > -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal 
>> > compiler error)
>> > FAIL: gfortran.dg/submodule_11.f08   -O3 -fomit-frame-pointer 
>> > -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess 
>> > errors)
>> > FAIL: gfortran.dg/submodule_11.f08   -O3 -g  (internal compiler error)
>> > FAIL: gfortran.dg/submodule_11.f08   -O3 -g  (test for excess errors)
>> > FAIL: gfortran.dg/submodule_11.f08   -Os  (internal compiler error)
>> > FAIL: gfortran.dg/submodule_11.f08   -Os  (test for excess errors)
>>
>> Well, if I change the order of the conditionals decl.c:4831, I
>> can get rid of the above FAILs.
>>
>> Index: decl.c
>> ===
>> --- decl.c  (revision 231219)
>> +++ decl.c  (working copy)
>> @@ -4826,7 +4826,7 @@ ok:
>>
>>/* Abbreviated module procedure declaration is not meant to have any
>>  formal arguments!  */
>> -  if (!sym->abr_modproc_decl && formal && !head)
>> +  if (formal && !head && sym && !sym->abr_modproc_decl)
>> arg_count_mismatch = true;
>>
>>for (p = formal, q = head; p && q; p = p->next, q = q->next)
>>
>> --
>> steve
>>
>> > FAIL: gfortran.dg/submodule_13.f08   -O  (internal compiler error)
>> > FAIL: gfortran.dg/submodule_13.f08   -O   (test for errors, line 29)
>> > FAIL: gfortran.dg/submodule_13.f08   -O  (test for excess errors)
>
> These ICEs persist at line 4831.  In looking at the code, I'm
> now somewhat unsure what it should be doing.  In particular,
> there are 2 gfc_error_now() calls in the below:
>
>
>   for (p = formal, q = head; p && q; p = p->next, q = q->next)
> {
>   if ((p->next != NULL && q->next == NULL)
>   || (p->next == NULL && q->next != NULL))
> arg_count_mismatch = true;
>   else if ((p->sym == NULL && q->sym == NULL)
> || strcmp (p->sym->name, q->sym->name) == 0)
> continue;
>   else
> gfc_error_now ("Mismatch in MODULE PROCEDURE formal "
>"argument names (%s/%s) at %C",
>p->sym->name, q->sym->name);
> }
>
>   if (arg_count_mismatch)
>   gfc_error_now ("Mismatch in number of MODULE PROCEDURE "
>  "formal arguments at %C");
> }
>
>   return MATCH_YES;
>
> cleanup:
>   gfc_free_formal_arglist (head);
>   return m;
>
> But, we return MATCH_YES?  I would expect setting m = MATCH_ERROR
> and jumping to cleanup.  That's ugly.
>
> --
> Steve
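
A minimal sketch of the behaviour Steve describes expecting, where any
mismatch ends in an error result rather than MATCH_YES. The types and the
names sym_sketch, arglist_sketch, match_sketch and compare_formals are
hypothetical simplifications, not the gfortran data structures, and the
abbreviated module procedure case is not modelled.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified stand-ins for the front end's symbol and
// formal-argument list types.
struct sym_sketch { const char *name; };
struct arglist_sketch { sym_sketch *sym; arglist_sketch *next; };

enum match_sketch { MATCH_YES_SKETCH, MATCH_ERROR_SKETCH };

// Compare two formal argument lists.  Any mismatch in length or in argument
// names yields MATCH_ERROR_SKETCH instead of falling through to success.
match_sketch compare_formals(const arglist_sketch *formal,
                             const arglist_sketch *head)
{
  const arglist_sketch *p = formal, *q = head;
  for (; p && q; p = p->next, q = q->next)
    {
      if (p->sym == nullptr && q->sym == nullptr)
        continue;
      // Test both pointers before dereferencing either of them.
      if (p->sym == nullptr || q->sym == nullptr
          || std::strcmp(p->sym->name, q->sym->name) != 0)
        return MATCH_ERROR_SKETCH;
    }
  // One list ran out first: the argument counts differ.
  return (p || q) ? MATCH_ERROR_SKETCH : MATCH_YES_SKETCH;
}
```

The null checks come before each dereference, in the same spirit as the
reordered conditional in the decl.c hunk above.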



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx


Re: [PATCH] Fix shrink-wrap bug with anticipating into loops (PR67778, PR68634)

2015-12-03 Thread Bernd Schmidt

On 12/02/2015 07:21 PM, Segher Boessenkool wrote:

After shrink-wrapping has found the "tightest fit" for where to place
the prologue, it tries to move it earlier (so that frame saves are run
earlier) -- but without copying any more basic blocks.

Unfortunately a candidate block we select can be inside a loop, and we
will still allow it (because the loop always exits via our previously
chosen block).



So we need to detect this situation.  We can place the prologue at a
previous block PRE only if PRE dominates every block reachable from
it.  This is a bit hard / expensive to compute, so instead this patch
allows a block PRE only if PRE does not post-dominate any of its
successors (other than itself).


Are the two conditions equivalent though? I'm not fully convinced. Let's 
say the loop has multiple exits, then none of these exit blocks 
postdominate the loop entry block, right?
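
The two conditions, as literally worded above, can be compared on a toy
graph. This is only a sketch: Cfg and the helper names are hypothetical, the
dominance tests use brute-force reachability rather than GCC's dominance
info, and every block is assumed to be reachable from the entry and to reach
the exit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG: succ[b] lists the successors of block b.
using Cfg = std::vector<std::vector<int>>;

// Mark every block reachable from START without passing through BANNED
// (pass -1 to ban nothing).  START itself is marked unless it is BANNED.
static std::vector<bool> reach(const Cfg &succ, int start, int banned)
{
  std::vector<bool> seen(succ.size(), false);
  if (start == banned)
    return seen;
  std::vector<int> work{start};
  seen[start] = true;
  while (!work.empty())
    {
      int b = work.back();
      work.pop_back();
      for (int s : succ[b])
        if (s != banned && !seen[s])
          {
            seen[s] = true;
            work.push_back(s);
          }
    }
  return seen;
}

// A dominates B if B cannot be reached from ENTRY once A is removed.
static bool dominates(const Cfg &succ, int entry, int a, int b)
{
  return a == b || !reach(succ, entry, a)[b];
}

// A post-dominates B if EXIT cannot be reached from B once A is removed.
static bool post_dominates(const Cfg &succ, int exit_bb, int a, int b)
{
  return a == b || !reach(succ, b, a)[exit_bb];
}

// First wording: PRE dominates every block reachable from it.
bool dominates_all_reachable(const Cfg &succ, int entry, int pre)
{
  std::vector<bool> r = reach(succ, pre, -1);
  for (std::size_t b = 0; b < succ.size(); ++b)
    if (r[b] && !dominates(succ, entry, pre, static_cast<int>(b)))
      return false;
  return true;
}

// Second wording: PRE post-dominates none of its successors but itself.
bool post_dominates_no_successor(const Cfg &succ, int exit_bb, int pre)
{
  for (int s : succ[pre])
    if (s != pre && post_dominates(succ, exit_bb, pre, s))
      return false;
  return true;
}
```

For the graph 0->1, 1->{2,3}, 2->{1,3} with entry 0 and exit 3, block 2 sits
in a loop with two exits: the first check rejects it, while the second
accepts it. On this toy model, at least, the two wordings are not equivalent.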


I think I agree with Jakub that we don't want to do unnecessary work in 
this piece of code.



/* If we can move PRO back without having to duplicate more blocks, do so.
   We can move back to a block PRE if every path from PRE will eventually
- need a prologue, that is, PRO is a post-dominator of PRE.  */
+ need a prologue, that is, PRO is a post-dominator of PRE.  We might
+ need to duplicate PRE if there is any path from a successor of PRE back
+ to PRE, so don't allow that either (but self-loops are fine, as are any
+ other loops entirely dominated by PRE; this in general seems too
+ expensive to check for, for such an uncommon case).  */


The last comment is unclear and I don't know what it wants to tell me.


Bernd


Re: [PATCH] Fix shrink-wrap bug with anticipating into loops (PR67778, PR68634)

2015-12-03 Thread Bernd Schmidt

On 12/02/2015 07:21 PM, Segher Boessenkool wrote:

After shrink-wrapping has found the "tightest fit" for where to place
the prologue, it tries to move it earlier (so that frame saves are run
earlier) -- but without copying any more basic blocks.


Another question would be - is there really a good reason to do this at all?


Bernd


Re: Add an rsqrt_optab and IFN_RSQRT internal function

2015-12-03 Thread Richard Biener
On Thu, Dec 3, 2015 at 10:39 AM, Jakub Jelinek  wrote:
> On Thu, Dec 03, 2015 at 09:21:03AM +, Richard Sandiford wrote:
>>   * internal-fn.def (RSQRT): New function.
>>   * optabs.def (rsqrt_optab): New optab.
>>   * doc/tm.texi (rsqrtM2): Document
>
> Missing full stop.
>
> Otherwise looks to me like a nice cleanup and hopefully fixes the aarch64
> regression.

Looks good to me as well.

Richard.

> Jakub


Re: Add fuzzing coverage support

2015-12-03 Thread Bernd Schmidt

On 12/02/2015 06:38 PM, Dmitry Vyukov wrote:

One thing to consider would
be whether you really need this split between O0/optimize versions, or
whether you can find a place in the queue where to insert it
unconditionally. Have you considered this at all or did you just follow
asan/tsan?


I inserted the pass just before asan/tsan because it looks like the
right place for it. If we do it after asan, it will insert coverage
for all asan-emitted BBs which is highly undesirable. I also think it
is a good idea to run a bunch of optimizations before coverage pass to
not emit too many coverage callbacks (but I can't say that I am very
knowledgeable in this area). FWIW clang does the same: coverage passes
run just before asan/tsan.


There's one other thing I want to put out there. Is this kind of thing 
maybe what plugins were invented for? I don't really like the concept of 
plugins, but it seems to me that this sort of thing might be an 
application for them.



+public:
+  static pass_data pd ()
+  {
+static const pass_data data =



I think a static data member would be better than the unnecessary pd ()
function. This is also unlike existing practice, and I wonder how others
think about it. IMO a fairly strong case could be made that if we're using
C++, then this sort of thing ought to be part of the class definition.


I vary name of the pass depending on the O0 template argument (again
following asan):

 O0 ? "sancov_O0" : "sancov", /* name */

If we call it "sancov" always, then I can make it just a global var
(as all other passes in gcc).
Or I can make it a static variable of the template class and move
definition of the class (as you proposed).
What would you prefer?


I think I prefer the static var of the template class. I just wonder why 
we don't have the pass_data for all the existing passes as static data 
members? I'm sure there's some reason.


asan also distinguishes the name between asan/asan0. I'd either follow 
that naming convention, or remove the _O0 variant for all three of them. 
I lean towards the latter.
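
A minimal sketch of the "static variable of the template class" option,
assuming a one-field stand-in for pass_data. The names pass_data_sketch,
pass_sancov_sketch and check_names are hypothetical; only the two pass name
strings come from the patch under discussion.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical one-field stand-in for GCC's pass_data.
struct pass_data_sketch
{
  const char *name;
};

// The O0 template argument selects the pass name, as in the patch.
template <bool O0>
class pass_sancov_sketch
{
public:
  // Static data member instead of a pd () accessor function.
  static const pass_data_sketch data;
};

// One definition serves both instantiations.
template <bool O0>
const pass_data_sketch pass_sancov_sketch<O0>::data = {
  O0 ? "sancov_O0" : "sancov", /* name */
};

// Quick self-check of the two instantiations.
void check_names()
{
  assert(std::strcmp(pass_sancov_sketch<false>::data.name, "sancov") == 0);
  assert(std::strcmp(pass_sancov_sketch<true>::data.name, "sancov_O0") == 0);
}
```

If the _O0 variant were dropped, the template parameter would no longer be
needed for the name and the data could become an ordinary file-scope
variable.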



Bernd


Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-12-03 Thread Tom de Vries

On 30/11/15 14:32, Jakub Jelinek wrote:

On Mon, Nov 30, 2015 at 02:24:18PM +0100, Richard Biener wrote:

OK for stage3 trunk if bootstrap and reg-test succeeds?


-|| node->address_taken);
+|| (node->address_taken
+&& !node->parallelized_function));

please add a comment here on why this is safe.

Ok with this change.


BTW, __builtin_GOMP_task supposedly can be treated similarly
if the third argument is NULL (if 3rd arg is non-NULL, then
the caller passes a different structure from what the callee receives,
but perhaps it could be emulated as pretending that cpyfn is called first
with address of a temporary var and the data argument and then fn
is called with the address of the temporary var).


Filed as PR68673 - Handle __builtin_GOMP_task optimally in ipa-pta.

Can you provide testcases for both (3rd arg NULL/non-NULL) cases? I'm 
not fluent in openmp.


Thanks,
- Tom


Re: [PATCH, 4/16] Implement -foffload-alias

2015-12-03 Thread Tom de Vries

On 11/11/15 12:00, Jakub Jelinek wrote:

On Wed, Nov 11, 2015 at 11:51:02AM +0100, Richard Biener wrote:

The option -foffload-alias=pointer instructs the compiler to assume that
objects referenced in an offload region do not alias.

The option -foffload-alias=all instructs the compiler to make no
assumptions about aliasing in offload regions.

The default value is -foffload-alias=none.


I think a global option for this is nonsense.  Please follow what
we do for #pragma GCC ivdep for example, thus allow the alias
behavior to be specified per "region" (whatever makes sense here
in the context of offloading).
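
For reference, this is roughly what the per-loop granularity of
#pragma GCC ivdep looks like. It is only a sketch of the existing loop
pragma, with scale_add as a hypothetical example function; it does not show
any offloading syntax.

```cpp
#include <cassert>
#include <cstddef>

// Scale-and-accumulate over two arrays.  The caller promises that the loop
// has no loop-carried dependences that would prevent vectorization.
void scale_add(float *dst, const float *src, float k, std::size_t n)
{
  assert(n == 0 || (dst != nullptr && src != nullptr));
  // The promise applies to the loop that follows and to nothing else.
#pragma GCC ivdep
  for (std::size_t i = 0; i < n; ++i)
    dst[i] += k * src[i];
}
```

The annotation is attached to one loop in the source, which is the kind of
scoping being asked for here, as opposed to a command-line switch that
changes the assumption for a whole translation unit.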


Yeah, completely agreed.  I don't see why the offloaded region would be in
any way special, they are C/C++/Fortran code as any other.
What we can and should improve is teach IPA aliasing/points to analysis
about the way we lower the host vs. offloading region boundary, so that
if alias analysis on the caller of GOMP_target_ext/GOACC_parallel_keyed
determines something it can be used on the offloaded function side and vice
versa, but a switch like the above is just wrong.


Filed the GOMP_target_ext bit as PR 68675 - Handle GOMP_target_ext 
optimally in ipa-pta.


Thanks,
- Tom


Re: [ARM] Fix PR middle-end/65958

2015-12-03 Thread Eric Botcazou
> I can understand this restriction, but...
> 
> > +  /* See the same assertion on PROBE_INTERVAL above.  */
> > +  gcc_assert ((first % 4096) == 0);
> 
> ... why isn't this a test that FIRST is aligned to PROBE_INTERVAL?

Because that isn't guaranteed, FIRST is related to the size of the protection 
area while PROBE_INTERVAL is related to the page size.

> blank line between declarations and code. Also, can we come up with a
> suitable define for 4096 here that expresses the context and then use
> that consistently through the remainder of this function?

OK, let's use ARITH_BASE.

> > +(define_insn "probe_stack_range"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +   (unspec_volatile:DI [(match_operand:DI 1 "register_operand" "0")
> > +(match_operand:DI 2 "register_operand" "r")]
> > +UNSPEC_PROBE_STACK_RANGE))]
> 
> I think this should really use PTRmode, so that it's ILP32 ready (I'm
> not going to ask you to make sure that works though, since I suspect
> there are still other issues to resolve with ILP32 at this time).

Done.  Manually tested for now, I'll fully test it if approved.


PR middle-end/65958
* config/aarch64/aarch64-protos.h (aarch64_output_probe_stack_range):
Declare.
* config/aarch64/aarch64.md: Declare UNSPECV_BLOCKAGE and
UNSPEC_PROBE_STACK_RANGE.
(blockage): New instruction.
(probe_stack_range_): Likewise.
* config/aarch64/aarch64.c (aarch64_emit_probe_stack_range): New
function.
(aarch64_output_probe_stack_range): Likewise.
(aarch64_expand_prologue): Invoke aarch64_emit_probe_stack_range if
static builtin stack checking is enabled.
* config/aarch64/aarch64-linux.h (STACK_CHECK_STATIC_BUILTIN):
Define.

-- 
Eric Botcazou
Index: config/aarch64/aarch64-linux.h
===
--- config/aarch64/aarch64-linux.h	(revision 231206)
+++ config/aarch64/aarch64-linux.h	(working copy)
@@ -88,4 +88,7 @@
 #undef TARGET_BINDS_LOCAL_P
 #define TARGET_BINDS_LOCAL_P default_binds_local_p_2
 
+/* Define this to be nonzero if static stack checking is supported.  */
+#define STACK_CHECK_STATIC_BUILTIN 1
+
 #endif  /* GCC_AARCH64_LINUX_H */
Index: config/aarch64/aarch64-protos.h
===
--- config/aarch64/aarch64-protos.h	(revision 231206)
+++ config/aarch64/aarch64-protos.h	(working copy)
@@ -340,6 +340,7 @@ void aarch64_asm_output_labelref (FILE *
 void aarch64_cpu_cpp_builtins (cpp_reader *);
 void aarch64_elf_asm_named_section (const char *, unsigned, tree);
 const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *);
+const char * aarch64_output_probe_stack_range (rtx, rtx);
 void aarch64_err_no_fpadvsimd (machine_mode, const char *);
 void aarch64_expand_epilogue (bool);
 void aarch64_expand_mov_immediate (rtx, rtx);
Index: config/aarch64/aarch64.c
===
--- config/aarch64/aarch64.c	(revision 231206)
+++ config/aarch64/aarch64.c	(working copy)
@@ -62,6 +62,7 @@
 #include "sched-int.h"
 #include "cortex-a57-fma-steering.h"
 #include "target-globals.h"
+#include "common/common-target.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -2183,6 +2184,179 @@ aarch64_libgcc_cmp_return_mode (void)
   return SImode;
 }
 
+#define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP)
+
+/* We use the 12-bit shifted immediate arithmetic instructions so values
+   must be multiple of (1 << 12), i.e. 4096.  */
+#define ARITH_BASE 4096
+
+#if (PROBE_INTERVAL % ARITH_BASE) != 0
+#error Cannot use simple address calculation for stack probing
+#endif
+
+/* The pair of scratch registers used for stack probing.  */
+#define PROBE_STACK_FIRST_REG  9
+#define PROBE_STACK_SECOND_REG 10
+
+/* Emit code to probe a range of stack addresses from FIRST to FIRST+SIZE,
+   inclusive.  These are offsets from the current stack pointer.  */
+
+static void
+aarch64_emit_probe_stack_range (HOST_WIDE_INT first, HOST_WIDE_INT size)
+{
+  rtx reg1 = gen_rtx_REG (ptr_mode, PROBE_STACK_FIRST_REG);
+
+  /* See the same assertion on PROBE_INTERVAL above.  */
+  gcc_assert ((first % ARITH_BASE) == 0);
+
+  /* See if we have a constant small number of probes to generate.  If so,
+ that's the easy case.  */
+  if (size <= PROBE_INTERVAL)
+{
+  const HOST_WIDE_INT base = ROUND_UP (size, ARITH_BASE);
+
+  emit_set_insn (reg1,
+		 plus_constant (ptr_mode,
+stack_pointer_rtx, -(first + base)));
+  emit_stack_probe (plus_constant (ptr_mode, reg1, base - size));
+}
+
+  /* The run-time loop is made up of 8 insns in the generic case while the
+ compile-time loop is made up of 4+2*(n-2) insns for n # of intervals.  */
+  else if (size <= 4 * PROBE_INTERVAL)
+{
+  HOST_WIDE_INT i, rem;
+
+  emit_set_insn (reg1,
+		
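
The address arithmetic of the small case (size <= PROBE_INTERVAL) in the
patch above can be checked in isolation. This is a sketch that models only
the offsets, not the RTL: probe_offsets_sketch and small_case_offsets are
hypothetical names, and round_up stands in for the ROUND_UP macro.

```cpp
#include <cassert>
#include <cstdint>

// Same value as the ARITH_BASE define in the patch.
const std::int64_t ARITH_BASE = 4096;

// Round V up to the next multiple of ALIGN (ALIGN > 0, V >= 0).
std::int64_t round_up(std::int64_t v, std::int64_t align)
{
  return (v + align - 1) / align * align;
}

// Offsets from the stack pointer, in bytes (negative means below SP).
struct probe_offsets_sketch
{
  std::int64_t reg1;   // offset loaded into the scratch register
  std::int64_t probe;  // offset of the address actually probed
};

// Mirror the two steps of the small case: set reg1 to SP - (first + base),
// then probe at reg1 + (base - size).
probe_offsets_sketch small_case_offsets(std::int64_t first, std::int64_t size)
{
  assert(first % ARITH_BASE == 0);
  std::int64_t base = round_up(size, ARITH_BASE);
  std::int64_t reg1 = -(first + base);
  return { reg1, reg1 + (base - size) };
}
```

With first = 8192 and size = 5000, base is 8192, reg1 is SP - 16384 and the
probe lands at SP - 13192, that is SP - (first + size). The register offset
stays a multiple of 4096, and the remaining displacement base - size is
below 4096.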

Re: [PATCH] Empty redirect_edge_var_map after each pass and function

2015-12-03 Thread Alan Lawrence

On 02/12/15 14:13, Jeff Law wrote:

On 12/02/2015 01:33 AM, Richard Biener wrote:

Right.  So the question I have is how/why did DOM leave anything in the map.
And if DOM is fixed to not leave stuff lying around, can we then assert that
nothing is ever left in those maps between passes?  There's certainly no good
reason I'm aware of why DOM would leave things in this state.


It happens not only with DOM but with all passes doing edge redirection.
This is because the map is populated by GIMPLE cfg hooks just in case
it might be used.  But there is no such thing as a "start CFG manip"
and "end CFG manip" to cleanup such dead state.

Sigh.



IMHO the redirect-edge-var-map stuff is just the very most possible
unclean implementation possible. :(  (see how remove_edge "clears"
stale info from the map to avoid even more "interesting" stale
data)

Ideally we could assert the map is empty whenever we leave a pass,
but as said it triggers all over the place.  Even cfg-cleanup causes
such stale data.

I agree that the patch is only a half-way "solution", but a full
solution would require sth more explicit, like we do with
initialize_original_copy_tables/free_original_copy_tables.  Thus
require passes to explicitly request the edge data to be preserved
with an initialize_edge_var_map/free_edge_var_map call pair.

Not appropriate at this stage IMHO (well, unless it turns out to be
a very localized patch).

So maybe as a follow-up to aid folks in the future, how about a debugging
verify_whatever function that we can call manually if debugging a problem in
this space.  With a comment indicating why we can't call it unconditionally 
(yet).


jeff
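
A toy model of the explicit call pair and the debugging check discussed
above, under the assumption that edges and PHI arguments can be represented
as plain ints. The class edge_var_map_sketch and its members are
hypothetical; initialize and release merely mirror the proposed
initialize_edge_var_map/free_edge_var_map names and are not existing GCC
functions.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical toy model: edges and PHI arguments are plain ints.
class edge_var_map_sketch
{
public:
  // A pass that wants redirect data calls this first...
  void initialize()
  {
    assert(!active_ && map_.empty());
    active_ = true;
  }

  // ...and this when done, which drops everything that was recorded.
  void release()
  {
    assert(active_);
    map_.clear();
    active_ = false;
  }

  // Called on every edge redirection; records only if a pass asked for it.
  void add(int edge, int phi_arg)
  {
    if (active_)
      map_[edge].push_back(phi_arg);
  }

  // Debug helper in the spirit of the suggested verify function.
  bool empty() const { return map_.empty(); }

private:
  bool active_ = false;
  std::map<int, std::vector<int>> map_;
};
```

In this model a redirection outside an initialize/release pair leaves no
stale entries, so asserting empty() between passes would hold by
construction.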


I did a (fwiw disable bootstrap) build with the map-emptying code in passes.c 
(not functions.c), printing out passes after which the map was non-empty (before 
emptying it, to make sure passes weren't just carrying through stale data from 
earlier). My (non-exhaustive!) list of passes after which the 
edge_var_redirect_map can be non-empty stands at...


aprefetch ccp cddce ch ch_vect copyprop crited crited cselim cunroll cunrolli 
dce dom ehcleanup einline esra fab fnsplit forwprop fre graphite ifcvt 
isolate-paths ldist lim local-pure-const mergephi oaccdevlow ompexpssa optimized 
parloops pcom phicprop phiopt phiprop pre profile profile_estimate sccp sink 
slsr split-paths sra switchconv tailc tailr tracer unswitch veclower2 vect vrm 
vrp whole-program


FWIW, the route by which dom added the edge to the redirect map was:
#0  redirect_edge_var_map_add (e=e@entry=0x7fb7a5f508, result=0x7fb725a000,
def=0x7fb78eaea0, locus=2147483884) at ../../gcc/gcc/tree-ssa.c:54
#1  0x00cccf58 in ssa_redirect_edge (e=e@entry=0x7fb7a5f508,
dest=dest@entry=0x7fb79cc680) at ../../gcc/gcc/tree-ssa.c:158
#2  0x00b00738 in gimple_redirect_edge_and_branch (e=0x7fb7a5f508,
dest=0x7fb79cc680) at ../../gcc/gcc/tree-cfg.c:5662
#3  0x006ec678 in redirect_edge_and_branch (e=e@entry=0x7fb7a5f508,
dest=) at ../../gcc/gcc/cfghooks.c:356
#4  0x00cb4530 in ssa_fix_duplicate_block_edges (rd=0x1a29f10,
local_info=local_info@entry=0x7fed40)
at ../../gcc/gcc/tree-ssa-threadupdate.c:1184
#5  0x00cb5520 in ssa_fixup_template_block (slot=,
local_info=0x7fed40) at ../../gcc/gcc/tree-ssa-threadupdate.c:1369
#6  traverse_noresize (
argument=0x7fed40, this=0x1a21a00) at ../../gcc/gcc/hash-table.h:911
#7  traverse (
argument=0x7fed40, this=0x1a21a00) at ../../gcc/gcc/hash-table.h:933
#8  thread_block_1 (bb=bb@entry=0x7fb7485bc8,
noloop_only=noloop_only@entry=true, joiners=joiners@entry=true)
at ../../gcc/gcc/tree-ssa-threadupdate.c:1592
#9  0x00cb5a40 in thread_block (bb=0x7fb7485bc8,
noloop_only=noloop_only@entry=true)
at ../../gcc/gcc/tree-ssa-threadupdate.c:1629
#10 0x00cb6bf8 in thread_through_all_blocks (
may_peel_loop_headers=true) at ../../gcc/gcc/tree-ssa-threadupdate.c:2736
#11 0x00becf6c in (anonymous namespace)::pass_dominator::execute (
this=, fun=0x7fb77d1b28)
at ../../gcc/gcc/tree-ssa-dom.c:622
#12 0x009feef4 in execute_one_pass (pass=pass@entry=0x16d1a80)
at ../../gcc/gcc/passes.c:2311

The edge is then deleted much later:
#3  0x00f858e4 in free_edge (fn=, e=)
at ../../gcc/gcc/cfg.c:91
#4  remove_edge_raw (e=) at ../../gcc/gcc/cfg.c:350
#5  0x006ec814 in remove_edge (e=)
at ../../gcc/gcc/cfghooks.c:418
#6  0x006ecaec in delete_basic_block (bb=bb@entry=0x7fb74b3618)
at ../../gcc/gcc/cfghooks.c:597
#7  0x00f8d1d4 in try_optimize_cfg (mode=32)
at ../../gcc/gcc/cfgcleanup.c:2701
#8  cleanup_cfg (mode=mode@entry=32) at ../../gcc/gcc/cfgcleanup.c:3028
#9  0x0070180c in cfg_layout_initialize (flags=flags@entry=0)
at ../../gcc/gcc/cfgrtl.c:4264
#10 0x00f7cdc8 in (anonymous 
namespace)::pass_duplicate_computed_gotos::execute (this=, 
fun=0x7fb77d1b28)


Re: [PATCH] Empty redirect_edge_var_map after each pass and function

2015-12-03 Thread Richard Biener
On Thu, 3 Dec 2015, Alan Lawrence wrote:

> On 02/12/15 14:13, Jeff Law wrote:
> > On 12/02/2015 01:33 AM, Richard Biener wrote:
> > > > Right.  So the question I have is how/why did DOM leave anything in the
> > > > map.
> > > > And if DOM is fixed to not leave stuff lying around, can we then assert
> > > > that
> > > > nothing is ever left in those maps between passes?  There's certainly no
> > > > good
> > > > reason I'm aware of why DOM would leave things in this state.
> > > 
> > > It happens not only with DOM but with all passes doing edge redirection.
> > > This is because the map is populated by GIMPLE cfg hooks just in case
> > > it might be used.  But there is no such thing as a "start CFG manip"
> > > and "end CFG manip" to cleanup such dead state.
> > Sigh.
> > 
> > > 
> > > IMHO the redirect-edge-var-map stuff is just the very most possible
> > > unclean implementation possible. :(  (see how remove_edge "clears"
> > > stale info from the map to avoid even more "interesting" stale
> > > data)
> > > 
> > > Ideally we could assert the map is empty whenever we leave a pass,
> > > but as said it triggers all over the place.  Even cfg-cleanup causes
> > > such stale data.
> > > 
> > > I agree that the patch is only a half-way "solution", but a full
> > > solution would require sth more explicit, like we do with
> > > initialize_original_copy_tables/free_original_copy_tables.  Thus
> > > require passes to explicitly request the edge data to be preserved
> > > with an initialize_edge_var_map/free_edge_var_map call pair.
> > > 
> > > Not appropriate at this stage IMHO (well, unless it turns out to be
> > > a very localized patch).
> > So maybe as a follow-up to aid folks in the future, how about a debugging
> > verify_whatever function that we can call manually if debugging a problem in
> > this space.  With a comment indicating why we can't call it unconditionally
> > (yet).
> > 
> > 
> > jeff
> 
> I did a (fwiw disable bootstrap) build with the map-emptying code in passes.c
> (not functions.c), printing out passes after which the map was non-empty
> (before emptying it, to make sure passes weren't just carrying through stale
> data from earlier). My (non-exhaustive!) list of passes after which the
> edge_var_redirect_map can be non-empty stands at...
> 
> aprefetch ccp cddce ch ch_vect copyprop crited crited cselim cunroll cunrolli
> dce dom ehcleanup einline esra fab fnsplit forwprop fre graphite ifcvt
> isolate-paths ldist lim local-pure-const mergephi oaccdevlow ompexpssa
> optimized parloops pcom phicprop phiopt phiprop pre profile profile_estimate
> sccp sink slsr split-paths sra switchconv tailc tailr tracer unswitch
> veclower2 vect vrm vrp whole-program

Yeah, exactly my findings...  note that most of the above are likely
due to cfgcleanup even though it already does sth like

  e = redirect_edge_and_branch (e, dest);
  redirect_edge_var_map_clear (e);

so eventually placing a redirect_edge_var_map_empty () at the end
of the cleanup_tree_cfg function should prune down the above list
considerably (well, then assert the map is empty on entry to that
function of course)

> FWIW, the route by which dom added the edge to the redirect map was:
> #0  redirect_edge_var_map_add (e=e@entry=0x7fb7a5f508, result=0x7fb725a000,
> def=0x7fb78eaea0, locus=2147483884) at ../../gcc/gcc/tree-ssa.c:54
> #1  0x00cccf58 in ssa_redirect_edge (e=e@entry=0x7fb7a5f508,
> dest=dest@entry=0x7fb79cc680) at ../../gcc/gcc/tree-ssa.c:158
> #2  0x00b00738 in gimple_redirect_edge_and_branch (e=0x7fb7a5f508,
> dest=0x7fb79cc680) at ../../gcc/gcc/tree-cfg.c:5662
> #3  0x006ec678 in redirect_edge_and_branch (e=e@entry=0x7fb7a5f508,
> dest=) at ../../gcc/gcc/cfghooks.c:356
> #4  0x00cb4530 in ssa_fix_duplicate_block_edges (rd=0x1a29f10,
> local_info=local_info@entry=0x7fed40)
> at ../../gcc/gcc/tree-ssa-threadupdate.c:1184
> #5  0x00cb5520 in ssa_fixup_template_block (slot=,
> local_info=0x7fed40) at ../../gcc/gcc/tree-ssa-threadupdate.c:1369
> #6  traverse_noresize (
> argument=0x7fed40, this=0x1a21a00) at ../../gcc/gcc/hash-table.h:911
> #7  traverse (
> argument=0x7fed40, this=0x1a21a00) at ../../gcc/gcc/hash-table.h:933
> #8  thread_block_1 (bb=bb@entry=0x7fb7485bc8,
> noloop_only=noloop_only@entry=true, joiners=joiners@entry=true)
> at ../../gcc/gcc/tree-ssa-threadupdate.c:1592
> #9  0x00cb5a40 in thread_block (bb=0x7fb7485bc8,
> noloop_only=noloop_only@entry=true)
> at ../../gcc/gcc/tree-ssa-threadupdate.c:1629
> #10 0x00cb6bf8 in thread_through_all_blocks (
> may_peel_loop_headers=true) at ../../gcc/gcc/tree-ssa-threadupdate.c:2736
> #11 0x00becf6c in (anonymous namespace)::pass_dominator::execute (
> this=, fun=0x7fb77d1b28)
> at ../../gcc/gcc/tree-ssa-dom.c:622
> #12 0x009feef4 in execute_one_pass (pass=pass

Documentation tweaks for internal-fn-related optabs

2015-12-03 Thread Richard Sandiford
As Bernd requested, this patch adds "This pattern cannot FAIL" to the
documentation of optabs that came to be mapped to internal functions.
For consistency I did the same for optabs that were already being
used for internal functions.

Many of the optabs weren't documented in the first place, so I added
entries for the missing ones.  Also, there were some inaccuracies in
the documentation of the rounding optabs.  The bitcount optabs said
that operand 0 has mode @var{m} and that operand 1 is under target
control, whereas it should be the other way around.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* doc/md.texi (vec_load_lanes@var{m}@var{n}): Document that
the pattern cannot FAIL.
(vec_store_lanes@var{m}@var{n}): Likewise.
(maskload@var{m}@var{n}): Likewise.
(maskstore@var{m}@var{n}): Likewise.  Fix a cut-&-paste error
in the name of the pattern.
(rsqrt@var{m}2): Document that mode m must be a scalar or vector
floating-point mode and that all operands have that mode.
(fmin@var{m}3, fmax@var{m}3): Likewise.  Document that the
pattern cannot FAIL.
(sqrt@var{m}2): Document that mode m must be a scalar or vector
floating-point mode, that all operands have that mode, and that
the patterns cannot FAIL.  Remove previous documentation referring
to @code{double} and @code{float}.
(fmod@var{m}3, remainder@var{m}3, cos@var{m}2, sin@var{m}2)
(sincos@var{m}3, log@var{m}2, pow@var{m}3, atan2@var{m}3)
(copysign@var{m}3): Likewise.
(exp@var{m}2): Likewise.  Explicitly state the base.
(floor@var{m}2): As for sqrt@var{m}2, but also specify the operands.
(btrunc@var{m}2, rint@var{m}2): Likewise.
(round@var{m}2): Likewise.  Fix incorrect description of rounding
effect.
(ceil@var{m}2): As for round@var{m}2.
(nearbyint@var{m}2): As for floor@var{m}2, but also mention that
the instruction must not raise an inexact condition.
(scalb@var{m}3): Document previously-undocumented pattern.
(ldexp@var{m}3, tan@var{m}2, asin@var{m}2, acos@var{m}2)
(atan@var{m}2, expm1@var{m}2, exp10@var{m}2, exp2@var{m}2)
(log1p@var{m}2, log10@var{m}2, log2@var{m}2, logb@var{m}2)
(significand@var{m}2): Likewise.
(ffs@var{m}2): Fix the description of the modes, so that operand 1 has
mode m and operand 0 is defined more freely.  Document that @var{m}
can be a scalar or vector integer mode and that the pattern is not
allowed to FAIL.
(clz@var{m}2, ctz@var{m}2, popcount@var{m}2, parity@var{m}2): Likewise.
(clrsb@var{m}2): Likewise, except that the description of the
mode was missing in this case.

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index dcb3ee0..4848e64 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4753,6 +4753,8 @@ is true.  GCC assumes that, if a target supports this kind of
 instruction for some mode @var{n}, it also supports unaligned
 loads for vectors of mode @var{n}.
 
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{vec_store_lanes@var{m}@var{n}} instruction pattern
 @item @samp{vec_store_lanes@var{m}@var{n}}
 Equivalent to @samp{vec_load_lanes@var{m}@var{n}}, with the memory
@@ -4768,6 +4770,8 @@ for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
 
 for a memory operand 0 and register operand 1.
 
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
@@ -4822,12 +4826,16 @@ Perform a masked load of vector from memory operand 1 of mode @var{m}
 into register operand 0.  Mask is provided in register operand 2 of
 mode @var{n}.
 
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{maskstore@var{m}@var{n}} instruction pattern
-@item @samp{maskload@var{m}@var{n}}
+@item @samp{maskstore@var{m}@var{n}}
 Perform a masked store of vector from register operand 1 of mode @var{m}
 into memory operand 0.  Mask is provided in register operand 2 of
 mode @var{n}.
 
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{vec_perm@var{m}} instruction pattern
 @item @samp{vec_perm@var{m}}
 Output a (variable) vector permutation.  Operand 0 is the destination
@@ -4993,6 +5001,9 @@ IEEE-conformant minimum and maximum operations.  If one operand is a quiet
 signalling @code{NaN} (-fsignaling-nans) an invalid floating point exception is
 raised and a quiet @code{NaN} is returned.
 
+All operands have mode @var{m}, which is a scalar or vector
+floating-point mode.  These patterns are not allowed to @code{FAIL}.
+
 @cindex @code{reduc_smin_@var{m}} instruction pattern
 @cindex @code{reduc_smax_@var{m}} instruction pattern
 @item @samp{reduc_smin_@var{m}}, @samp{reduc_smax_@var{m}}
@@ -5324,16 +5335,17 @@ Store the absolute value of operand 1 into operand 0.
 
 @cind

Re: [PATCH] Empty redirect_edge_var_map after each pass and function

2015-12-03 Thread Richard Biener
On Thu, 3 Dec 2015, Richard Biener wrote:

> On Thu, 3 Dec 2015, Alan Lawrence wrote:
> 
> > On 02/12/15 14:13, Jeff Law wrote:
> > > On 12/02/2015 01:33 AM, Richard Biener wrote:
> > > > > Right.  So the question I have is how/why did DOM leave anything in 
> > > > > the
> > > > > map.
> > > > > And if DOM is fixed to not leave stuff lying around, can we then 
> > > > > assert
> > > > > that
> > > > > nothing is ever left in those maps between passes?  There's certainly 
> > > > > no
> > > > > good
> > > > > reason I'm aware of why DOM would leave things in this state.
> > > > 
> > > > It happens not only with DOM but with all passes doing edge redirection.
> > > > This is because the map is populated by GIMPLE cfg hooks just in case
> > > > it might be used.  But there is no such thing as a "start CFG manip"
> > > > and "end CFG manip" to cleanup such dead state.
> > > Sigh.
> > > 
> > > > 
> > > > > IMHO the redirect-edge-var-map stuff is just about the most
> > > > > unclean implementation possible. :(  (see how remove_edge "clears"
> > > > stale info from the map to avoid even more "interesting" stale
> > > > data)
> > > > 
> > > > Ideally we could assert the map is empty whenever we leave a pass,
> > > > but as said it triggers all over the place.  Even cfg-cleanup causes
> > > > such stale data.
> > > > 
> > > > I agree that the patch is only a half-way "solution", but a full
> > > > solution would require sth more explicit, like we do with
> > > > initialize_original_copy_tables/free_original_copy_tables.  Thus
> > > > > require passes to explicitly request the edge data to be preserved
> > > > > with an initialize_edge_var_map/free_edge_var_map call pair.
> > > > 
> > > > Not appropriate at this stage IMHO (well, unless it turns out to be
> > > > a very localized patch).
> > > So maybe as a follow-up to aid folks in the future, how about a debugging
> > > verify_whatever function that we can call manually if debugging a problem 
> > > in
> > > this space.  With a comment indicating why we can't call it 
> > > unconditionally
> > > (yet).
> > > 
> > > 
> > > jeff
> > 
> > I did a (fwiw disable bootstrap) build with the map-emptying code in 
> > passes.c
> > (not functions.c), printing out passes after which the map was non-empty
> > (before emptying it, to make sure passes weren't just carrying through stale
> > data from earlier). My (non-exhaustive!) list of passes after which the
> > edge_var_redirect_map can be non-empty stands at...
> > 
> > aprefetch ccp cddce ch ch_vect copyprop crited crited cselim cunroll 
> > cunrolli
> > dce dom ehcleanup einline esra fab fnsplit forwprop fre graphite ifcvt
> > isolate-paths ldist lim local-pure-const mergephi oaccdevlow ompexpssa
> > optimized parloops pcom phicprop phiopt phiprop pre profile profile_estimate
> > sccp sink slsr split-paths sra switchconv tailc tailr tracer unswitch
> > veclower2 vect vrm vrp whole-program
> 
> Yeah, exactly my findings...  note that most of the above are likely
> due to cfgcleanup even though it already does sth like
> 
>   e = redirect_edge_and_branch (e, dest);
>   redirect_edge_var_map_clear (e);
> 
> so eventually placing a redirect_edge_var_map_empty () at the end
> of the cleanup_tree_cfg function should prune down the above list
> considerably (well, then assert the map is empty on entry to that
> function of course)

Maybe

Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c   (revision 231221)
+++ gcc/tree-cfgcleanup.c   (working copy)
@@ -456,6 +456,7 @@ remove_forwarder_block (basic_block bb)
}
   else
s = redirect_edge_and_branch (e, dest);
+  redirect_edge_var_map_clear (s);
 
   if (s == e)
{

also helps...

Richard.
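
To make the distinction in this thread concrete, here is a minimal toy
model of a per-edge side table with a "clear one edge" operation, an
"empty everything" operation, and the emptiness check one might assert at
a pass boundary.  It is only a sketch: every name below is hypothetical,
edges are plain integers, and none of this is GCC's actual
redirect_edge_var_map implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an edge var map: a fixed-size table of
   (edge id, value) entries.  All names here are hypothetical.  */
#define TOY_MAP_CAPACITY 16

struct toy_entry
{
  int edge;
  int value;
};

struct toy_entry toy_map[TOY_MAP_CAPACITY];
size_t toy_map_len;

/* Record VALUE for EDGE, the way a CFG hook might do "just in case".
   Entries beyond the capacity are silently dropped in this toy.  */
void
toy_map_add (int edge, int value)
{
  if (toy_map_len < TOY_MAP_CAPACITY)
    {
      toy_map[toy_map_len].edge = edge;
      toy_map[toy_map_len].value = value;
      toy_map_len++;
    }
}

/* Drop every entry recorded for EDGE, keeping the others in order.  */
void
toy_map_clear (int edge)
{
  size_t kept = 0;
  for (size_t i = 0; i < toy_map_len; i++)
    if (toy_map[i].edge != edge)
      toy_map[kept++] = toy_map[i];
  toy_map_len = kept;
}

/* Drop everything, as a pass manager could do after each pass.  */
void
toy_map_empty (void)
{
  toy_map_len = 0;
}

/* Debugging aid: nonzero if no stale entries are left.  */
int
toy_map_is_empty (void)
{
  return toy_map_len == 0;
}
```

In this model, clearing only the edge that was just redirected leaves
entries for other edges behind, which is why an emptiness assertion at
pass exit can still trigger unless the whole table is emptied.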

> 
> > FWIW, the route by which dom added the edge to the redirect map was:
> > #0  redirect_edge_var_map_add (e=e@entry=0x7fb7a5f508, result=0x7fb725a000,
> > def=0x7fb78eaea0, locus=2147483884) at ../../gcc/gcc/tree-ssa.c:54
> > #1  0x00cccf58 in ssa_redirect_edge (e=e@entry=0x7fb7a5f508,
> > dest=dest@entry=0x7fb79cc680) at ../../gcc/gcc/tree-ssa.c:158
> > #2  0x00b00738 in gimple_redirect_edge_and_branch (e=0x7fb7a5f508,
> > dest=0x7fb79cc680) at ../../gcc/gcc/tree-cfg.c:5662
> > #3  0x006ec678 in redirect_edge_and_branch (e=e@entry=0x7fb7a5f508,
> > dest=) at ../../gcc/gcc/cfghooks.c:356
> > #4  0x00cb4530 in ssa_fix_duplicate_block_edges (rd=0x1a29f10,
> > local_info=local_info@entry=0x7fed40)
> > at ../../gcc/gcc/tree-ssa-threadupdate.c:1184
> > #5  0x00cb5520 in ssa_fixup_template_block (slot=,
> > local_info=0x7fed40) at ../../gcc/gcc/tree-ssa-threadupdate.c:1369
> > #6  traverse_noresize (
> > argument=0x7fed40, this=0x1a21a00) at ../../gcc/gcc/hash-table.h:911
> > #7  traverse (
> > argument=0x7fed40, this=0x1a21a00) at ../../gcc/gcc/hash-table.h:933
> > 

Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68624: Clean up logic that checks for clobbering conflicts across basic blocks

2015-12-03 Thread Bernd Schmidt

On 12/03/2015 10:33 AM, Kyrill Tkachov wrote:

 PR rtl-optimization/68624
 * ifcvt.c (noce_try_cmove_arith): Check clobbers of temp regs in both
 blocks if they exist and simplify the logic choosing the order to emit
 them in.

2015-12-03  Kyrylo Tkachov  

 PR rtl-optimization/68624
 * gcc.c-torture/execute/pr68624.c: New test.


I think this is good. OK.


Bernd


Re: Documentation tweaks for internal-fn-related optabs

2015-12-03 Thread Bernd Schmidt

On 12/03/2015 02:06 PM, Richard Sandiford wrote:

As Bernd requested, this patch adds "This pattern cannot FAIL" to the
documentation of optabs that came to be mapped to internal functions.
For consistency I did the same for optabs that were already being
used for internal functions.

Many of the optabs weren't documented in the first place, so I added
entries for the missing ones.  Also, there were some inaccuracies in
the documentation of the rounding optabs.  The bitcount optabs said
that operand 0 has mode @var{m} and that operand 1 is under target
control, whereas it should be the other way around.


That actually goes beyond what I imagined. I was looking at the top part 
of md.texi (line 87), where there is a brief discussion of what is 
allowed to FAIL and what isn't. Also, there is "@item FAIL":


  "Failure is currently supported only for binary (addition,
   multiplication, shifting, etc.) and bit-field (@code{extv},
   @code{extzv}, and @code{insv}) operations."

That's pretty outdated. I think unary operations are probably missing by 
accident, and from what my grep showed there are also conditional moves, 
atomic operations, certain vec_ patterns that can all fail. As a minimum 
this paragraph should also mention internal functions.


Thank you for this patch, it is OK, but we probably ought to tweak at 
least the @item FAIL sections as well.



Bernd


Re: [PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE

2015-12-03 Thread Mikael Morin

Le 03/12/2015 10:29, Janne Blomqvist a écrit :

On Tue, Dec 1, 2015 at 7:51 PM, Bernhard Reutner-Fischer
 wrote:

As said, we could as well use a list of candidates with NULL as record marker.
Implementation cosmetics. Steve seems to not be thrilled by the
overall idea in the first place, so unless there is clear support by
somebody else i won't pursue this any further, it's not that i'm bored
or ran out of stuff i should do.. ;)


FWIW, I think the idea of this patch is quite nice, and I'd like to
see it in the compiler.


I like this feature as well.


I'm personally Ok with "C++-isms", but nowadays my contributions are
so minor that my opinion shouldn't carry that much weight on this
matter.


Same here.
David Malcolm suggested to move the candidate selection code to the 
common middle-end infrastructure, which would move half of the so-called 
"bloat" there.  Steve, would that work for you?


It seems to me that the remaining C++-isms are rather acceptable.
I do agree that the vec implementation details seem overly complex for 
something whose job is just the memory management of a growing (or 
shrinking) vector.  However, the API is consistent and self-explanatory, 
and the usage of it that is made here (just a few "safe_push") is not 
more complex than what would be done with a C-only API.
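
For readers who have not seen the technique, the following is a small
sketch of Levenshtein-based candidate selection over a NULL-terminated
list in plain C.  It is not the code from the patch under discussion;
the function names edit_distance and best_candidate are hypothetical and
chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Levenshtein distance between A and B, computed with two rolling rows.
   Returns (size_t) -1 if memory allocation fails.  */
size_t
edit_distance (const char *a, const char *b)
{
  size_t la = strlen (a);
  size_t lb = strlen (b);
  size_t *prev = malloc ((lb + 1) * sizeof *prev);
  size_t *cur = malloc ((lb + 1) * sizeof *cur);
  if (!prev || !cur)
    {
      free (prev);
      free (cur);
      return (size_t) -1;
    }

  /* Distance from the empty prefix of A to each prefix of B.  */
  for (size_t j = 0; j <= lb; j++)
    prev[j] = j;

  for (size_t i = 1; i <= la; i++)
    {
      cur[0] = i;
      for (size_t j = 1; j <= lb; j++)
	{
	  size_t subst = prev[j - 1] + (a[i - 1] != b[j - 1]);
	  size_t del = prev[j] + 1;
	  size_t ins = cur[j - 1] + 1;
	  size_t best = subst < del ? subst : del;
	  cur[j] = best < ins ? best : ins;
	}
      /* The row just computed becomes the previous row.  */
      size_t *tmp = prev;
      prev = cur;
      cur = tmp;
    }

  size_t result = prev[lb];
  free (prev);
  free (cur);
  return result;
}

/* Return the entry of the NULL-terminated list CANDIDATES closest to
   NAME, or NULL if the list is empty.  The first of equally close
   candidates wins.  */
const char *
best_candidate (const char *name, const char *const *candidates)
{
  const char *best = NULL;
  size_t best_dist = (size_t) -1;
  for (size_t i = 0; candidates[i] != NULL; i++)
    {
      size_t d = edit_distance (name, candidates[i]);
      if (d < best_dist)
	{
	  best_dist = d;
	  best = candidates[i];
	}
    }
  return best;
}
```

A real implementation would presumably also apply a cutoff so that very
distant candidates are not suggested at all; that is left out here.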


Mikael


Re: [Patch,microblaze]: Instruction prefetch optimization for microblaze.

2015-12-03 Thread Michael Eager

On 12/01/2015 12:49 AM, Ajit Kumar Agarwal wrote:

This patch adds an instruction prefetch optimization for Microblaze.

Regression tested for the Microblaze target.

The "wic" Microblaze instruction is the instruction prefetch instruction.
The optimization generates the iprefetch instruction on the fall-through
path at call sites.  It is enabled with the Microblaze target flag
mxl-prefetch.  The flag is needed because the "wic" instruction has to be
selected in the reconfigurable design, and it is not selected by default.

ChangeLog:
2015-12-01  Ajit Agarwal  

* config/microblaze/microblaze.c
(get_branch_target): New.
(insert_wic_for_ilb_runout): New.
(insert_wic): New.
(microblaze_machine_dependent_reorg): New.
(TARGET_MACHINE_DEPENDENT_REORG): Define macro.
* config/microblaze/microblaze.md
(UNSPEC_IPREFETCH): Define.
(iprefetch): New pattern
* config/microblaze/microblaze.opt
(mxl-prefetch): New flag.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com


Thanks & Regards
Ajit



+  rtx_insn *insn, *before_4 = 0, *before_16 = 0;
+  int addr = 0, length, first_addr = -1;
+  int wic_addr0 = 128 * 4, wic_addr1 = 128 * 4;

Especially when there are initializers, I prefer to see each variable declared
on a separate line.  If the meaning of a variable is not clear (and most of
these are not), include a comment before the declaration.

+if (first_addr == -1)
+  first_addr = INSN_ADDRESSES (INSN_UID (insn));

Can be moved to initialize first_addr.

+addr = INSN_ADDRESSES (INSN_UID (insn)) - first_addr;

Is "addr" an address or an offset?  If the latter, use a more descriptive name.

+if (before_4 == 0 && addr + length >= 4 * 4)
+  before_4 = insn;
...

Please add comments to describe what you are doing here.  What are before_4
and before_16?  What are all these conditions testing?

+  loop_optimizer_finalize();

Space before parens.

--
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-03 Thread Alexander Monakov
On Wed, 2 Dec 2015, Nathan Sidwell wrote:
> On 12/02/15 12:09, Alexander Monakov wrote:
> 
> > I meant the PTX linked (post PTX-JIT link) image, so regardless of support,
> > it's not an issue.  E.g. check early in gomp_nvptx_main if .weak
> > __nvptx_has_simd != 0.  It would only break if there was dlopen on PTX.
> 
> Note I found a bug in .weak support.  See the comment in
> gcc.dg/special/weak-2.c
> 
> /* NVPTX's implementation of weak is broken when a strong symbol is in
>a later object file than the weak definition.   */

Thanks for the warning.  However, the issue seems limited to function symbols:
I've made a test for data symbols, and they appear to work fine -- which
suffices in this context.
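
As an illustration of the kind of check meant here, a weak data symbol
with a default value can be tested at run time; a strong definition in
another object file would override it at link time.  This is a sketch in
GNU C only: the symbol name toy_has_simd and the helper function are
hypothetical, not the names used in the nvptx port.

```c
#include <assert.h>

/* Weak default meaning "feature absent".  A strong definition of the
   same symbol in another object file, e.g. `int toy_has_simd = 1;`,
   would replace this one when the objects are linked together.  */
__attribute__ ((weak)) int toy_has_simd = 0;

/* Nonzero if some linked object provided a strong, nonzero value.  */
int
toy_simd_available (void)
{
  return toy_has_simd != 0;
}
```

Built on its own, with no strong definition anywhere, the helper reports
that the feature is absent.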

Alexander


Re: [1/2] OpenACC routine support

2015-12-03 Thread Cesar Philippidis
On 12/03/2015 12:36 AM, Thomas Schwinge wrote:

>> Here's the updated patch.
> 
> ENOPATCH.

Here it is.

>> The test cases were written in a way such that
>> none of them needed to be updated with these changes.
> 
> ... which potentially means they'd match for all kinds of "random"
> diagnostics.  ;-)

They were supposed to be generic enough so that they work both in c and
c++. But, yeah, that randomness is likely.

>> I'm tempted to commit this as obvious, but I want to make sure you're ok
>> with these new messages.
> 
> I don't care very much, as long as it's understandable for a user.  I
> just tripped over this because of mismatches between C and C++ as well as
> different C++ diagnostic variants.
> 
>> The major change is to report these errors as
>> "pragma acc routine not followed by a function declaration or
>> definition". I think that's more descriptive than "not followed by a
>> single function". That said, it looks like the c front end uses the
>> latter error message.
> 
> (In the C front end, the "a" is missing: "not followed by single
> function"; that should be fixed up as well.)
> 
>> Is this OK or do you prefer the "not followed by a single function" message?
> 
> "not followed by a function declaration or definition" sounds good to me.

Ok, I'll apply this patch in a couple of hours.

Cesar
2015-12-02  Cesar Philippidis  

	gcc/cp/
	* parser.c (cp_ensure_no_oacc_routine): Update error message.
	(cp_parser_oacc_routine): Likewise.
	(cp_parser_late_parsing_oacc_routine): Likewise.  Update comment
	describing this function.
	(cp_finalize_oacc_routine): Update error message.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index b4ecac7..1c14354 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -1329,7 +1329,7 @@ cp_ensure_no_oacc_routine (cp_parser *parser)
   tree clauses = parser->oacc_routine->clauses;
   location_t loc = OMP_CLAUSE_LOCATION (TREE_PURPOSE (clauses));
 
-  error_at (loc, "%<#pragma oacc routine%> not followed by function "
+  error_at (loc, "%<#pragma acc routine%> not followed by a function "
 		"declaration or definition");
   parser->oacc_routine = NULL;
 }
@@ -35857,7 +35857,7 @@ cp_parser_oacc_routine (cp_parser *parser, cp_token *pragma_tok,
 	  cp_parser_require_pragma_eol (parser, pragma_tok);
 
 	  error_at (OMP_CLAUSE_LOCATION (parser->oacc_routine->clauses),
-		"%<#pragma oacc routine%> not followed by a single "
+		"%<#pragma acc routine%> not followed by a "
 		"function declaration or definition");
 
 	  parser->oacc_routine->error_seen = true;
@@ -35962,7 +35962,7 @@ cp_parser_oacc_routine (cp_parser *parser, cp_token *pragma_tok,
 	  if (parser->oacc_routine
 	  && !parser->oacc_routine->error_seen
 	  && !parser->oacc_routine->fndecl_seen)
-	error_at (loc, "%<#pragma acc routine%> not followed by "
+	error_at (loc, "%<#pragma acc routine%> not followed by a "
 		  "function declaration or definition");
 
 	  data.tokens.release ();
@@ -35972,7 +35972,7 @@ cp_parser_oacc_routine (cp_parser *parser, cp_token *pragma_tok,
 }
 
 /* Finalize #pragma acc routine clauses after direct declarator has
-   been parsed, and put that into "oacc routine" attribute.  */
+   been parsed, and put that into "oacc function" attribute.  */
 
 static tree
 cp_parser_late_parsing_oacc_routine (cp_parser *parser, tree attrs)
@@ -35987,7 +35987,7 @@ cp_parser_late_parsing_oacc_routine (cp_parser *parser, tree attrs)
   if ((!data->error_seen && data->fndecl_seen)
   || data->tokens.length () != 1)
 {
-  error_at (loc, "%<#pragma oacc routine%> not followed by a single "
+  error_at (loc, "%<#pragma acc routine%> not followed by a "
 		"function declaration or definition");
   data->error_seen = true;
   return attrs;
@@ -36003,7 +36003,7 @@ cp_parser_late_parsing_oacc_routine (cp_parser *parser, tree attrs)
 
   cp_token *pragma_tok = cp_lexer_consume_token (parser->lexer);
   cl = cp_parser_oacc_all_clauses (parser, OACC_ROUTINE_CLAUSE_MASK,
-  "#pragma oacc routine", pragma_tok);
+  "#pragma acc routine", pragma_tok);
   cp_parser_pop_lexer (parser);
 
   tree c_head = build_omp_clause (loc, OMP_CLAUSE_SEQ);
@@ -36044,7 +36044,8 @@ cp_finalize_oacc_routine (cp_parser *parser, tree fndecl, bool is_defn)
   if (!fndecl || TREE_CODE (fndecl) != FUNCTION_DECL)
 	{
 	  error_at (loc,
-		"%<#pragma acc routine%> not followed by single function");
+		"%<#pragma acc routine%> not followed by a function "
+		"declaration or definition");
 	  parser->oacc_routine = NULL;
 	}
 	  


[PTX] reorganize data space handling

2015-12-03 Thread Nathan Sidwell
The PTX backend superficially looks like it is using the address space extension 
mechanism to handle the various PTX data areas.  However, it is not really doing 
that -- the ADDR_SPACE #define values are not registered with the address space 
handling.  The addr_space_t enumeration is used to hold values not of that type.


GCC already has a mechanism to handle symbols that need special addressing 
instructions -- SYMBOL_REF_FLAGS & the TARGET_ENCODE_SECTION_INFO hook.


This patch uses those to mark SYMBOL_REFs with the PTX section they are placed 
in and then uses those same flags when emitting the cvta insn to get the 
address, the load/store directly accessing them, and the object emission code.


We still have a single unspec 'UNSPEC_TO_GENERIC' to move a SYMBOL_REF into a 
register.  You'll probably notice this is really just a fancy mov insn.  I'm 
sure with a little tinkering with the move insn predicates and constraints, that 
unspec can go away too, but I didn't want to tackle that in this patch.
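
As a rough illustration of recording a data area in a few bits of a
flags word, here is a standalone sketch.  The shift value, the mask and
all toy_* names are assumptions made for this example; they are not the
definitions from the patch, and unlike a plain OR the setter below
deliberately clears the field before storing the new area.

```c
#include <assert.h>

/* Hypothetical data areas, mirroring the idea of an enum stored in
   symbol flags.  */
enum toy_data_area
{
  TOY_AREA_GENERIC,
  TOY_AREA_GLOBAL,
  TOY_AREA_SHARED,
  TOY_AREA_LOCAL,
  TOY_AREA_CONST,
  TOY_AREA_PARAM,
  TOY_AREA_MAX
};

/* Assumed position and width of the field inside the flags word.  */
#define TOY_MACH_DEP_SHIFT 8
#define TOY_AREA_MASK 7u

/* Return FLAGS with its area field replaced by AREA.  The old field is
   cleared first so that a second call does not OR two areas together.  */
unsigned
toy_set_data_area (unsigned flags, enum toy_data_area area)
{
  flags &= ~(TOY_AREA_MASK << TOY_MACH_DEP_SHIFT);
  return flags | ((unsigned) area << TOY_MACH_DEP_SHIFT);
}

/* Extract the area field from FLAGS.  */
enum toy_data_area
toy_get_data_area (unsigned flags)
{
  return (enum toy_data_area) ((flags >> TOY_MACH_DEP_SHIFT)
			       & TOY_AREA_MASK);
}
```

Three bits are enough here because the enumeration has fewer than eight
values; the bits below the shift stay available for other flags.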


nathan
2015-12-03  Nathan Sidwell  

	gcc/
	* config/nvptx/nvptx-protos.h (nvptx_section_from_addr_space): Delete.
	* config/nvptx/nvptx.c (enum nvptx_data_area): New.
	(SYMBOL_DATA_AREA, SET_SYMBOL_DATA_AREA): New defines.
	(nvptx_option_override): Set data areas for worker vars.
	(nvptx_addr_space_from_sym): Delete.
	(nvptx_encode_section_info): New.
	(section_for_sym, section_for_decl): New.
	(nvptx_maybe_convert_symbolic_operand): Get data area from symbol
	flags.
	(nvptx_section_from_addr_space): Delete.
	(nvptx_section_for_decl): Delete.
	(nvptx_output_aligned, nvptx_declare_object_name,
	nvptx_assemble_undefined_decl): Use section_for_decl, remove
	unnecessary checks.
	(nvptx_print_operand): Add 'D', adjust 'A'.
	(nvptx_expand_worker_addr): Adjust unspec generation.
	(TARGET_ENCODE_SECTION_INFO): Override.
	* config/nvptx/nvptx.h (ADDR_SPACE_GLOBAL, ADDR_SPACE_SHARED,
	ADDR_SPACE_CONST, ADDR_SPACE_LOCAL, ADDR_SPACE_PARAM): Delete.
	* config/nvptx/nvptx.md (UNSPEC_FROM_GLOBAL, UNSPEC_FROM_LOCAL,
	UNSPEC_FROM_PARAM, UNSPEC_FROM_SHARED, UNSPEC_FROM_CONST,
	UNSPEC_TO_GLOBAL, UNSPEC_TO_LOCAL, UNSPEC_TO_PARAM,
	UNSPEC_TO_SHARED, UNSPEC_TO_CONST): Delete.
	(UNSPEC_TO_GENERIC): New.
	(nvptx_register_or_symbolic_operand): Delete.
	(cvt_code, cvt_name, cvt_str): Delete.
	(convaddr_ [P]): Delete.
	(convaddr_ [P]): New.

	gcc/testsuite/
	* gcc.target/nvptx/decl.c: New.
	* gcc.target/nvptx/uninit-decl.c: Robustify regexps.

Index: gcc/config/nvptx/nvptx-protos.h
===
--- gcc/config/nvptx/nvptx-protos.h	(revision 231226)
+++ gcc/config/nvptx/nvptx-protos.h	(working copy)
@@ -41,7 +41,6 @@ extern const char *nvptx_ptx_type_from_m
 extern const char *nvptx_output_mov_insn (rtx, rtx);
 extern const char *nvptx_output_call_insn (rtx_insn *, rtx, rtx);
 extern const char *nvptx_output_return (void);
-extern const char *nvptx_section_from_addr_space (addr_space_t);
 extern bool nvptx_hard_regno_mode_ok (int, machine_mode);
 extern rtx nvptx_maybe_convert_symbolic_operand (rtx);
 #endif
Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 231226)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -80,6 +80,25 @@ enum nvptx_shuffle_kind
   SHUFFLE_MAX
 };
 
+/* The various PTX memory areas an object might reside in.  */
+enum nvptx_data_area
+{
+  DATA_AREA_GENERIC,
+  DATA_AREA_GLOBAL,
+  DATA_AREA_SHARED,
+  DATA_AREA_LOCAL,
+  DATA_AREA_CONST,
+  DATA_AREA_PARAM,
+  DATA_AREA_MAX
+};
+
+/*  We record the data area in the target symbol flags.  */
+#define SYMBOL_DATA_AREA(SYM) \
+  (nvptx_data_area)((SYMBOL_REF_FLAGS (SYM) >> SYMBOL_FLAG_MACH_DEP_SHIFT) \
+		& 7)
+#define SET_SYMBOL_DATA_AREA(SYM,AREA) \
+  (SYMBOL_REF_FLAGS (SYM) |= (AREA) << SYMBOL_FLAG_MACH_DEP_SHIFT)
+
 /* Record the function decls we've written, and the libfuncs and function
decls corresponding to them.  */
 static std::stringstream func_decls;
@@ -154,9 +173,11 @@ nvptx_option_override (void)
 = hash_table::create_ggc (17);
 
   worker_bcast_sym = gen_rtx_SYMBOL_REF (Pmode, worker_bcast_name);
+  SET_SYMBOL_DATA_AREA (worker_bcast_sym, DATA_AREA_SHARED);
   worker_bcast_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 
   worker_red_sym = gen_rtx_SYMBOL_REF (Pmode, worker_red_name);
+  SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
   worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 }
 
@@ -194,22 +215,49 @@ nvptx_ptx_type_from_mode (machine_mode m
 }
 }
 
-/* Determine the address space to use for SYMBOL_REF SYM.  */
+/* Encode the PTX data area that DECL (which might not actually be a
+   _DECL) should reside in.  */
 
-static addr_space_t
-nvptx_addr_space_from_sym (rtx sym)
+static void
+nvptx_encode_section_info (tree decl, rtx rtl, int first)
 {
-  tree decl = SYMBOL_REF_DECL (sym);
-  if (decl == NULL_TREE || TREE_CODE (decl) == FUNCTION_DECL)
-

Re: [PATCH,RFC] Introduce RUN_UNDER_VALGRIND in test-suite

2015-12-03 Thread Bernd Schmidt

On 11/23/2015 10:34 AM, Martin Liška wrote:

On 11/21/2015 05:26 AM, Hans-Peter Nilsson wrote:

IIRC you can replace the actual dg-runtest proc with your own
(implementing a wrapper).  Grep around, I think we do that
already.  That's certainly preferable instead of touching all
callers.


You are right, the suggested patch was over-kill, wrapper should be fine for 
that.
Currently I've been playing with a bit different approach (suggested by Markus),
where I would like to enable valgrind in gcc.c using an environmental variable.

Question is if it should replace existing ENABLE_VALGRIND_CHECKING and how to
integrate it with a valgrind suppressions file?


This patch still seems to be in the queue. I've been looking at it every 
now and then, without really forming an opinion. In any case, I think 
we'll need to postpone this to stage1 at this point.


Wouldn't it be better to fix issues first and only then enable running 
the testsuite with valgrind, rather than make a suppression file?


Your latest patch seems to add the option of running the compiler 
without ENABLE_VALGRIND_CHECKING being defined. Doesn't this run into 
problems when the support in ggc isn't compiled in?



Bernd


Re: [PATCH] S/390: Fix warning in "*movstr" pattern.

2015-12-03 Thread Dominik Vogt
Version 5 with two fixes to the test case.  :-/  (ChangeLog is the
same.)

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
From 5965f62501b271285bacb90b11ef3f748338d1cf Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Tue, 3 Nov 2015 18:03:02 +0100
Subject: [PATCH] S/390: Fix warning in "*movstr" pattern.

---
 gcc/config/s390/s390.md | 20 ---
 gcc/testsuite/gcc.target/s390/md/movstr-1.c | 30 +
 gcc/testsuite/gcc.target/s390/s390.exp  | 25 +++-
 3 files changed, 67 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/md/movstr-1.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index e5db537..7eca315 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -2910,13 +2910,27 @@
 ;
 
 (define_expand "movstr"
+  ;; The pattern is never generated.
+  [(match_operand 0 "" "")
+   (match_operand 1 "" "")
+   (match_operand 2 "" "")]
+  ""
+{
+  if (TARGET_64BIT)
+emit_insn (gen_movstrdi (operands[0], operands[1], operands[2]));
+  else
+emit_insn (gen_movstrsi (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "movstr"
   [(set (reg:SI 0) (const_int 0))
(parallel
 [(clobber (match_dup 3))
  (set (match_operand:BLK 1 "memory_operand" "")
 	  (match_operand:BLK 2 "memory_operand" ""))
- (set (match_operand 0 "register_operand" "")
-	  (unspec [(match_dup 1)
+ (set (match_operand:P 0 "register_operand" "")
+	  (unspec:P [(match_dup 1)
 		   (match_dup 2)
 		   (reg:SI 0)] UNSPEC_MVST))
  (clobber (reg:CC CC_REGNUM))])]
@@ -2937,7 +2951,7 @@
(set (mem:BLK (match_operand:P 1 "register_operand" "0"))
 	(mem:BLK (match_operand:P 3 "register_operand" "2")))
(set (match_operand:P 0 "register_operand" "=d")
-	(unspec [(mem:BLK (match_dup 1))
+	(unspec:P [(mem:BLK (match_dup 1))
 		 (mem:BLK (match_dup 3))
 		 (reg:SI 0)] UNSPEC_MVST))
(clobber (reg:CC CC_REGNUM))]
diff --git a/gcc/testsuite/gcc.target/s390/md/movstr-1.c b/gcc/testsuite/gcc.target/s390/md/movstr-1.c
new file mode 100644
index 000..6ab0050
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/movstr-1.c
@@ -0,0 +1,30 @@
+/* Machine description pattern tests.  */
+
+/* { dg-do run } */
+/* { dg-options "-dP -save-temps" } */
+
+__attribute__ ((noinline))
+void test(char *dest, const char *src)
+{
+  __builtin_stpcpy (dest, src);
+}
+
+/* { dg-final { scan-assembler-times {{[*]movstr}} 1 } } */
+
+#include <stdio.h>
+#include <string.h>
+
+#define LEN 200
+char buf[LEN];
+
+int main(void)
+{
+  memset(buf, 0, LEN);
+  test(buf, "hello world!");
+  if (strcmp(buf, "hello world!") != 0)
+{
+  fprintf(stderr, "error: test() failed\n");
+  return 1;
+}
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/s390/s390.exp b/gcc/testsuite/gcc.target/s390/s390.exp
index 0b8f80ed..0d7a7eb 100644
--- a/gcc/testsuite/gcc.target/s390/s390.exp
+++ b/gcc/testsuite/gcc.target/s390/s390.exp
@@ -61,20 +61,35 @@ if ![info exists DEFAULT_CFLAGS] then {
 # Initialize `dg'.
 dg-init
 
-set hotpatch_tests $srcdir/$subdir/hotpatch-\[0-9\]*.c
+set md_tests $srcdir/$subdir/md/*.c
 
 # Main loop.
 dg-runtest [lsort [prune [glob -nocomplain $srcdir/$subdir/*.\[cS\]] \
-			 $hotpatch_tests]] "" $DEFAULT_CFLAGS
+			 $md_tests]] "" $DEFAULT_CFLAGS
 
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*vector*/*.\[cS\]]] \
 	"" $DEFAULT_CFLAGS
 
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/md/*.\[cS\]]] \
+	"" $DEFAULT_CFLAGS
+
 # Additional hotpatch torture tests.
 torture-init
-set HOTPATCH_TEST_OPTS [list -Os -O0 -O1 -O2 -O3]
-set-torture-options $HOTPATCH_TEST_OPTS
-gcc-dg-runtest [lsort [glob -nocomplain $hotpatch_tests]] "" $DEFAULT_CFLAGS
+set-torture-options [list -Os -O0 -O1 -O2 -O3]
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/hotpatch-\[0-9\]*.c]] \
+	"" $DEFAULT_CFLAGS
+torture-finish
+
+# Additional md torture tests.
+torture-init
+set MD_TEST_OPTS [list \
+	{-Os -march=z900} {-Os -march=z13} \
+	{-O0 -march=z900} {-O0 -march=z13} \
+	{-O1 -march=z900} {-O1 -march=z13} \
+	{-O2 -march=z900} {-O2 -march=z13} \
+	{-O3 -march=z900} {-O3 -march=z13}]
+set-torture-options $MD_TEST_OPTS
+gcc-dg-runtest [lsort [glob -nocomplain $md_tests]] "" $DEFAULT_CFLAGS
 torture-finish
 
 # All done.
-- 
2.3.0



Re: [PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"

2015-12-03 Thread Christophe Lyon
On 27 November 2015 at 12:26, Ramana Radhakrishnan
 wrote:
>
>
> On 27/11/15 09:40, Renlin Li wrote:
>> Hi Ramana,
>>
>> On 16/10/15 14:54, Renlin Li wrote:
>>>
>>>
 The command line implies we remove r7 (frame pointer in Thumb2 - 
 historical accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 
 (-mpic-register) which
 leaves us with:

 * r0, r1
 * r2, r3
 * r4, r5

 as the only free registers available for DImode values for the whole 
 compilation.

 We then have r0, r1 and r2 live across the insn which means that there are 
 no free registers to handle DImode values
 under the constraints provided unless LRA / reload can spill the argument 
 registers which it doesn't seem to be able to do
 in this particular testcase. Vlad, is that correct ?
>>> According to the logic, conflict hard register are excluded from spill 
>>> candidate. That's why, in this case, r0, r1, r2 cannot be used.
>>
>>
>> In the test case, there is a code structure like this.
>>
>>
>> uint64_t callee (int a, int b, int c, int d);
>> uint64_t caller (int a, int b, int c, int d)
>> {
>>   uint64_t res;
>> /*
>> single BB contains complicated data processing which requires register pair
>> */
>>
>>   res = callee (tmp, b ,c, d);
>>   return res;
>> }
>>
>> CES pass in this case will extend the hard register live range across the 
>> whole BB until the callee. In this case, r1, r2, r3 are excluded from 
>> allocatable registers.
>>
>> There are places in CES which prevent extending the hard register's live 
>> range, for example for hard registers which fulfill 
>> small_register_classes_for_mode_p(), class_likely_spilled_p(). However, 
>> argument registers belong to neither of them.
>>
>> I tried to stop CES from extending the argument registers' live range. However, 
>> later, the scheduler jumps in and re-orders the instructions to reduce the pseudo 
>> register pressure, which in effect extends the argument register live range again.
>
> Thanks for digging further and trying to figure out what the solution was. I 
> can't think of a less risky fix than what you have proposed, thus Ok if no 
> regressions.
>
>

Hi,

I have noticed regressions after this commit to the 4.9 branch:
Passed now fails  [PASS => FAIL]:
  gcc.c-torture/compile/pr34856.c  -O3 -fomit-frame-pointer
-funroll-all-loops -finline-functions  (test for excess errors)
  gcc.c-torture/compile/pr34856.c  -O3 -fomit-frame-pointer
-funroll-loops  (test for excess errors)
Pass disappears   [PASS => ]:
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2 -flto
-fno-use-linker-plugin -flto-partition=none
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects
Fail appears  [ => FAIL]:
  gcc.c-torture/compile/pr34856.c  -O3 -fomit-frame-pointer
-funroll-all-loops -finline-functions  (internal compiler error)
  gcc.c-torture/compile/pr34856.c  -O3 -fomit-frame-pointer
-funroll-loops  (internal compiler error)
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2  (internal
compiler error)
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2 -flto
-fno-use-linker-plugin -flto-partition=none  (internal compiler error)
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error)

See the red links in
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-4_9-branch/231177/report-build-info.html

Christophe.

> regards
> Ramana
>
>
>
>
>
>>
>> Regards,
>>
>> Renlin Li
>>
>>
>>


[PATCH 04/10] Fix g++.dg/template/crash55.C

2015-12-03 Thread David Malcolm
The patch kit changes the output of this case:

  1  //PR c++/27668
  2
  3  template // { dg-error "nested-name-specifier|two or more|valid type" }
  4  struct A {};
  5
  6  template void foo(A);  // { dg-error "cast|argument" "" { target c++98_only } }

but only for c++98, from:
  g++.dg/template/crash55.C:3:19: error: expected nested-name-specifier before 'class'
  g++.dg/template/crash55.C:3:25: error: two or more data types in declaration of 'parameter'
  g++.dg/template/crash55.C:3:34: error: 'class T' is not a valid type for a template non-type parameter
  g++.dg/template/crash55.C:6:29: error: a cast to a type other than an integral or enumeration type cannot appear in a constant-expression
  g++.dg/template/crash55.C:6:29: error: template argument 2 is invalid
to:
  g++.dg/template/crash55.C:3:19: error: expected nested-name-specifier before 'class'
  g++.dg/template/crash55.C:3:25: error: two or more data types in declaration of 'parameter'
  g++.dg/template/crash55.C:3:34: error: 'class T' is not a valid type for a template non-type parameter
  g++.dg/template/crash55.C:3:32: error: a cast to a type other than an integral or enumeration type cannot appear in a constant-expression
  g++.dg/template/crash55.C:6:29: error: template argument 2 is invalid

i.e. the 4th error moves from line 6 to line 3
("a cast to a type other than an integral or enumeration type cannot appear in 
a constant-expression")

This change is reasonable, so the patch updates the dg-error
directives accordingly.

gcc/testsuite/ChangeLog:
* g++.dg/template/crash55.C: Update dg-error directives.
---
 gcc/testsuite/g++.dg/template/crash55.C | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/template/crash55.C b/gcc/testsuite/g++.dg/template/crash55.C
index 9b80fd1..b9b29f7 100644
--- a/gcc/testsuite/g++.dg/template/crash55.C
+++ b/gcc/testsuite/g++.dg/template/crash55.C
@@ -1,6 +1,7 @@
 //PR c++/27668
 
 template // { dg-error "nested-name-specifier|two or more|valid type" }
+// { dg-error "cast" "" { target c++98_only } 3 }
 struct A {};
 
-template void foo(A);// { dg-error "cast|argument" "" { target c++98_only } }
+template void foo(A);// { dg-error "template argument 2" "" { target c++98_only } }
-- 
1.8.5.3



[PATCH 10/10] Fix g++.dg/warn/Wconversion-real-integer2.C

2015-12-03 Thread David Malcolm
This testcase's output is changed by the patchkit from printing at the "=":

BEFORE:
g++.dg/warn/Wconversion-real-integer2.C: In function 'void h()':
g++.dg/warn/Wconversion-real-integer2.C:32:12: warning: conversion to 'float' 
alters 'int' constant value [-Wfloat-conversion]
 vfloat = INT_MAX; // { dg-warning "conversion to .float. alters .int. 
constant value" }
^
to showing the token of interest and its macro expansion:

AFTER:
g++.dg/warn/Wconversion-real-integer2.C: In function ‘void h()’:
g++.dg/warn/Wconversion-real-integer2.C:26:17: warning: conversion to ‘float’ 
alters ‘int’ constant value [-Wfloat-conversion]
 #define INT_MAX __INT_MAX__
 ^

g++.dg/warn/Wconversion-real-integer2.C:32:14: note: in expansion of macro 
‘INT_MAX’
 vfloat = INT_MAX; // { dg-warning "conversion to .float. alters .int. 
constant value" }
  ^~~

This is an improvement, so this patch updates the test case accordingly.

gcc/testsuite/ChangeLog:
* g++.dg/warn/Wconversion-real-integer2.C: Update location of
dg-warning; add a dg-message.
---
 gcc/testsuite/g++.dg/warn/Wconversion-real-integer2.C | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/warn/Wconversion-real-integer2.C b/gcc/testsuite/g++.dg/warn/Wconversion-real-integer2.C
index 0494588..7e39d5f 100644
--- a/gcc/testsuite/g++.dg/warn/Wconversion-real-integer2.C
+++ b/gcc/testsuite/g++.dg/warn/Wconversion-real-integer2.C
@@ -23,11 +23,11 @@
 //
 // That is more useful.
 
-#define INT_MAX __INT_MAX__ 
+#define INT_MAX __INT_MAX__ // { dg-warning "17: conversion to .float. alters .int. constant value" }
 
 float  vfloat;
 
 void h (void)
 {
-vfloat = INT_MAX; // { dg-warning "conversion to .float. alters .int. constant value" }
+vfloat = INT_MAX; // { dg-message "14: in expansion of macro .INT_MAX." }
 }
-- 
1.8.5.3



[PATCH 08/10] Fix g++.dg/ubsan/pr63956.C

2015-12-03 Thread David Malcolm
With the location patch, various errors in g++.dg/ubsan/pr63956.C
change:

 8  constexpr int
 9  fn1 (int a, int b)
10  {
11if (b != 2)
12  a <<= b;
13return a;
14  }
15
16  constexpr int i1 = fn1 (5, 3);
17  constexpr int i2 = fn1 (5, -2); // { dg-error "is negative" }

Here's the first error as printed by the status quo:
g++.dg/ubsan/pr63956.C:17:24:   in constexpr expansion of ‘fn1(5, -2)’
g++.dg/ubsan/pr63956.C:17:30: error: right operand of shift expression ‘(5 << 
-2)’ is negative
 constexpr int i2 = fn1 (5, -2); // { dg-error "is negative" }
  ^

...and with the location patch:
g++.dg/ubsan/pr63956.C:17:24:   in constexpr expansion of ‘fn1(5, -2)’
g++.dg/ubsan/pr63956.C:12:11: error: right operand of shift expression ‘(5 << 
-2)’ is negative
 a <<= b;
   ^
I believe this is an improvement: we're now identifying both relevant
places, rather than just one, and clearly highlighting the exact
subexpression of interest.

Hence this patch updates the testcase to reflect the improved
location information.

gcc/testsuite/ChangeLog:
* g++.dg/ubsan/pr63956.C: Update dg directives to reflect
improved location information.
---
 gcc/testsuite/g++.dg/ubsan/pr63956.C | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/g++.dg/ubsan/pr63956.C b/gcc/testsuite/g++.dg/ubsan/pr63956.C
index 185a719..b265631 100644
--- a/gcc/testsuite/g++.dg/ubsan/pr63956.C
+++ b/gcc/testsuite/g++.dg/ubsan/pr63956.C
@@ -10,15 +10,18 @@ fn1 (int a, int b)
 {
   if (b != 2)
 a <<= b;
+// { dg-error "5 << -2.. is negative" "" { target *-*-* } 12 }
+// { dg-error "is >= than the precision of the left operand" "" { target *-*-* } 12 }
+// { dg-error "-2 << 4.. is negative" "" { target *-*-* } 12 }
   return a;
 }
 
 constexpr int i1 = fn1 (5, 3);
-constexpr int i2 = fn1 (5, -2); // { dg-error "is negative" }
-constexpr int i3 = fn1 (5, sizeof (int) * __CHAR_BIT__); // { dg-error "is >= than the precision of the left operand" }
-constexpr int i4 = fn1 (5, 256); // { dg-error "is >= than the precision of the left operand" }
+constexpr int i2 = fn1 (5, -2); // { dg-message "in constexpr expansion" }
+constexpr int i3 = fn1 (5, sizeof (int) * __CHAR_BIT__); // { dg-message "in constexpr expansion" }
+constexpr int i4 = fn1 (5, 256); // { dg-message "in constexpr expansion" }
 constexpr int i5 = fn1 (5, 2);
-constexpr int i6 = fn1 (-2, 4); // { dg-error "is negative" }
+constexpr int i6 = fn1 (-2, 4); // { dg-message "in constexpr expansion" }
 constexpr int i7 = fn1 (0, 2);
 
 SA (i1 == 40);
@@ -30,13 +33,16 @@ fn2 (int a, int b)
 {
   if (b != 2)
 a >>= b;
+// { dg-error "4 >> -1.. is negative" "" { target *-*-* } 35 }
+// { dg-error "is >= than the precision of the left operand" "" { target *-*-* } 35 }
+
   return a;
 }
 
 constexpr int j1 = fn2 (4, 1);
-constexpr int j2 = fn2 (4, -1); // { dg-error "is negative" }
-constexpr int j3 = fn2 (10, sizeof (int) * __CHAR_BIT__); // { dg-error "is >= than the precision of the left operand" }
-constexpr int j4 = fn2 (1, 256); // { dg-error "is >= than the precision of the left operand" }
+constexpr int j2 = fn2 (4, -1); // { dg-message "in constexpr expansion" }
+constexpr int j3 = fn2 (10, sizeof (int) * __CHAR_BIT__); // { dg-message "in constexpr expansion" }
+constexpr int j4 = fn2 (1, 256); // { dg-message "in constexpr expansion" }
 constexpr int j5 = fn2 (5, 2);
 constexpr int j6 = fn2 (-2, 4);
 constexpr int j7 = fn2 (0, 4);
@@ -49,12 +55,12 @@ constexpr int
 fn3 (int a, int b)
 {
   if (b != 2)
-a = a / b;
+a = a / b; // { dg-error "..7 / 0.. is not a constant expression" }
   return a;
 }
 
 constexpr int k1 = fn3 (8, 4);
-constexpr int k2 = fn3 (7, 0); // { dg-error "is not a constant expression" }
+constexpr int k2 = fn3 (7, 0); // { dg-message "in constexpr expansion" }
 constexpr int k3 = fn3 (INT_MIN, -1); // { dg-error "overflow in constant expression" }
 
 SA (k1 == 2);
@@ -63,12 +69,12 @@ constexpr float
 fn4 (float a, float b)
 {
   if (b != 2.0)
-a = a / b;
+a = a / b; // { dg-error "is not a constant expression" }
   return a;
 }
 
 constexpr float l1 = fn4 (5.0, 3.0);
-constexpr float l2 = fn4 (7.0, 0.0); // { dg-error "is not a constant expression" }
+constexpr float l2 = fn4 (7.0, 0.0); // { dg-message "in constexpr expansion" }
 
 constexpr int
 fn5 (const int *a, int b)
-- 
1.8.5.3



Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-12-03 Thread Richard Biener
On Sat, Nov 14, 2015 at 12:35 AM, Jeff Law  wrote:
> On 11/13/2015 01:23 PM, Jeff Law wrote:
>>
>> On 11/13/2015 11:09 AM, Richard Biener wrote:
>>

>>>> BTW Do we have an API for indicating that new blocks have been added to
>>>> a loop?  If so, then we can likely drop the LOOPS_NEED_FIXUP.
>>>
>>>
>>> Please. It's called add_to_loop or so.
>>
>> Haha, the block duplication code was handling this already.  So in
>> theory I can just drop the LOOPS_NEED_FIXUP completely.  Testing now.
>>
>> jeff
>>
> Attached is the committed patch for path splitting.  As noted above, we
> didn't need the LOOPS_NEED_FIXUP in the final version, so that wart is gone
> :-)
>
> I do find myself wondering if this can/should be generalized beyond just
> paths heading to loop backedges.  However to do so I think we'd need to be
> able to undo this transformation reliably and we'd need some heuristics when
> to duplicate to expose the redundancy vs rely on PRE techniques and jump
> threading.  I vaguely remember a paper which touched on these topics, but I
> can't seem to find it.
>
> Anyway, bootstrapped and regression tested on x86_64-linux-gnu. Installed on
> the trunk.

This pass is now enabled by default with -Os but has no limits on the amount of
stmts it copies.  It also will make all loops with this shape have at least two
exits (if the resulting loop will be disambiguated the inner loop will
have two exits).
Having more than one exit will disable almost all loop optimizations after it.

The pass itself documents the transform it does but does zero to motivate it.

What's the benefit of this pass (apart from disrupting further optimizations)?

I can see a _single_ case where duplicating the latch will allow threading
one of the paths through the loop header to eliminate the original exit.  Then
disambiguation may create a nice nested loop out of this.  Of course that
is only profitable again if you know the remaining single exit of the inner
loop (exiting to the outer one) is executed infrequently (thus the inner loop
actually loops).

But no checks other than on the CFG shape exist (oh, it checks it will
at _least_ copy two stmts!).

Given the profitability constraints above (well, correct me if I am
wrong on these)
it looks like the whole transform should be done within the FSM threading
code which might be able to compute whether there will be an inner loop
with a single exit only.

I'm inclined to request the pass to be removed again or at least disabled by
default.

What closed source benchmark was this transform invented for?

Richard.

>
>
>
> commit c1891376e5dcc99ad8be2d22f9551c03f9bb2729
> Author: Jeff Law 
> Date:   Fri Nov 13 16:29:34 2015 -0700
>
> [Patch,tree-optimization]: Add new path Splitting pass on tree ssa
> representation
>
> * Makefile.in (OBJS): Add gimple-ssa-split-paths.o
> * common.opt (-fsplit-paths): New flag controlling path splitting.
> * doc/invoke.texi (fsplit-paths): Document.
> * opts.c (default_options_table): Add -fsplit-paths to -O2.
> * passes.def: Add split_paths pass.
> * timevar.def (TV_SPLIT_PATHS): New timevar.
> * tracer.c: Include "tracer.h"
> (ignore_bb_p): No longer static.
> (transform_duplicate): New function, broken out of tail_duplicate.
> (tail_duplicate): Use transform_duplicate.
> * tracer.h (ignore_bb_p): Declare
> (transform_duplicate): Likewise.
> * tree-pass.h (make_pass_split_paths): Declare.
> * gimple-ssa-split-paths.c: New file.
>
> * gcc.dg/tree-ssa/split-path-1.c: New test.
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index dde2695..a7abe37 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,21 @@
> +2015-11-13  Ajit Agarwal  
> +   Jeff Law  
> +
> +   * Makefile.in (OBJS): Add gimple-ssa-split-paths.o
> +   * common.opt (-fsplit-paths): New flag controlling path splitting.
> +   * doc/invoke.texi (fsplit-paths): Document.
> +   * opts.c (default_options_table): Add -fsplit-paths to -O2.
> +   * passes.def: Add split_paths pass.
> +   * timevar.def (TV_SPLIT_PATHS): New timevar.
> +   * tracer.c: Include "tracer.h"
> +   (ignore_bb_p): No longer static.
> +   (transform_duplicate): New function, broken out of tail_duplicate.
> +   (tail_duplicate): Use transform_duplicate.
> +   * tracer.h (ignore_bb_p): Declare
> +   (transform_duplicate): Likewise.
> +   * tree-pass.h (make_pass_split_paths): Declare.
> +   * gimple-ssa-split-paths.c: New file.
> +
>  2015-11-13  Kai Tietz  
> Marek Polacek  
> Jason Merrill  
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index d3fd5e9..5c294df 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1277,6 +1277,7 @@ OBJS = \
> gimple-pretty-print.o \
> gimple-ssa-backprop.o \
> gimple-ssa-isolate-paths.o \
> +   gimple-ssa-split-paths.o \
>

[PATCH 03/10] Fix g++.dg/gomp/loop-1.C

2015-12-03 Thread David Malcolm
The patch kit affects the locations of the errors reported by
g++.dg/gomp/loop-1.C.

I reviewed the new locations, and they seemed sane.

This patch updates the locations of omp_for_cond to use the location of
the cond if available, falling back to the existing behavior of using
input_location otherwise.  This improves the reported locations.

The patch also updates the testcase to reflect the various changes
to the locations.

For reference, here's the updated output from the testcase (with
caret-printing enabled):

g++.dg/gomp/loop-1.C: In function ‘void f1(int)’:
g++.dg/gomp/loop-1.C:21:3: error: initializer expression refers to iteration 
variable ‘i’
   for (i = i; i < 16; i++) /* { dg-error "initializer expression refers to 
iteration variable" } */
   ^~~

g++.dg/gomp/loop-1.C:24:14: error: initializer expression refers to iteration 
variable ‘i’
   for (i = 2 * (i & x); i < 16; i++) /* { dg-error "initializer expression 
refers to iteration variable" } */
~~^

g++.dg/gomp/loop-1.C:27:3: error: initializer expression refers to iteration 
variable ‘i’
   for (i = bar (i); i < 16; i++) /* { dg-error "initializer expression refers 
to iteration variable" } */
   ^~~

g++.dg/gomp/loop-1.C:30:3: error: initializer expression refers to iteration 
variable ‘i’
   for (i = baz (&i); i < 16; i++) /* { dg-error "initializer expression refers 
to iteration variable" } */
   ^~~

g++.dg/gomp/loop-1.C:33:17: error: condition expression refers to iteration 
variable ‘i’
   for (i = 5; i < 2 * i + 17; i++) /* { dg-error "condition expression refers 
to iteration variable" } */
   ~~^~~~

g++.dg/gomp/loop-1.C:36:26: error: condition expression refers to iteration 
variable ‘i’
   for (i = 5; 2 * i + 17 > i; i++) /* { dg-error "condition expression refers 
to iteration variable" } */
   ~~~^~~

g++.dg/gomp/loop-1.C:39:23: error: condition expression refers to iteration 
variable ‘i’
   for (i = 5; bar (i) > i; i++) /* { dg-error "condition expression refers to 
iteration variable" } */
   ^~~

g++.dg/gomp/loop-1.C:42:17: error: condition expression refers to iteration 
variable ‘i’
   for (i = 5; i <= baz (&i); i++) /* { dg-error "condition expression refers 
to iteration variable" } */
   ~~^~~

g++.dg/gomp/loop-1.C:45:17: error: condition expression refers to iteration 
variable ‘i’
   for (i = 5; i <= i; i++) /* { dg-error "invalid controlling 
predicate|condition expression refers to iteration variable" } */
   ~~^~~~

g++.dg/gomp/loop-1.C:48:3: error: increment expression refers to iteration 
variable ‘i’
   for (i = 5; i < 16; i += i) /* { dg-error "increment expression refers to 
iteration variable" } */
   ^~~

g++.dg/gomp/loop-1.C:51:33: error: increment expression refers to iteration 
variable ‘i’
   for (i = 5; i < 16; i = i + 2 * i) /* { dg-error "invalid increment 
expression|increment expression refers to iteration variable" } */
   ~~^~~

g++.dg/gomp/loop-1.C:54:3: error: increment expression refers to iteration 
variable ‘i’
   for (i = 5; i < 16; i = i + i) /* { dg-error "increment expression refers to 
iteration variable" } */
   ^~~

g++.dg/gomp/loop-1.C:57:35: error: increment expression refers to iteration 
variable ‘i’
   for (i = 5; i < 16; i = i + bar (i)) /* { dg-error "increment expression 
refers to iteration variable" } */
   ^~~

g++.dg/gomp/loop-1.C:60:31: error: increment expression refers to iteration 
variable ‘i’
   for (i = 5; i < 16; i = baz (&i) + i) /* { dg-error "increment expression 
refers to iteration variable" } */
   ^~~~

g++.dg/gomp/loop-1.C:63:32: error: increment expression refers to iteration 
variable ‘i’
   for (i = 5; i < 16; i += bar (i)) /* { dg-error "increment expression refers 
to iteration variable" } */
^~~

g++.dg/gomp/loop-1.C:66:32: error: increment expression refers to iteration 
variable ‘i’
   for (i = 5; i < 16; i += baz (&i)) /* { dg-error "increment expression 
refers to iteration variable" } */
^~~~

g++.dg/gomp/loop-1.C:73:3: error: initializer expression refers to iteration 
variable ‘j’
   for (i = j; i < 16; i = i + 2) /* { dg-error "initializer expression refers 
to iteration variable" } */
   ^~~

g++.dg/gomp/loop-1.C:77:3: error: initializer expression refers to iteration 
variable ‘i’
   for (i = 0; i < 16; i = i + 2) /* { dg-error "initializer expression refers 
to iteration variable" } */
   ^~~

g++.dg/gomp/loop-1.C:82:16: error: initializer expression refers to iteration 
variable ‘i’
 for (j = i + 3; j < 16; j += 2) /* { dg-error "initializer expression 
refers to iteration variable" } */
  ~~^~~

g++.dg/gomp/loop-1.C:85:3: error: initializer expression refers to iteration 
variable ‘i’
   for (i = 0; i < 16; i++) /* { dg-error "initializer expression refers to

[PATCH 02/10] Fix g++.dg/cpp0x/nsdmi-template14.C

2015-12-03 Thread David Malcolm
When building new-expressions, we use cp_lexer_previous_token
and access its location to get the final position in the source
range.

Within g++.dg/cpp0x/nsdmi-template14.C, the previous token
within a new expr can have been purged, leading to UNKNOWN_LOCATION.

  g++.dg/cpp0x/nsdmi-template14.C:11:10: error: recursive instantiation of 
non-static data member initializer for ‘B<1>::p’
  B* p = new B;

(note the lack of caret)

(gdb) p *end_tok
$54 = {type = CPP_GREATER, keyword = RID_MAX, flags = 0 '\000', pragma_kind = 
PRAGMA_NONE, implicit_extern_c = 0,
error_reported = 0, purged_p = 1, location = 0, u = {tree_check_value = 0x0, 
value = }}

This patch adds bulletproofing to detect purged tokens, and avoid using
them.

Alternatively, is it OK to access purged tokens for this kind of thing?
If so, would it make more sense to instead leave their locations untouched
when purging them?

The patch also updates the location of a dg-error directive in the
testcase to reflect improved location information.

gcc/cp/ChangeLog:
* parser.c (cp_parser_new_expression): Avoid accessing purged
tokens when getting end of location range.

gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/nsdmi-template14.C: Move dg-error directive.
---
 gcc/cp/parser.c   | 10 +++---
 gcc/testsuite/g++.dg/cpp0x/nsdmi-template14.C |  4 ++--
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d859a89..f3d406e 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -7957,9 +7957,13 @@ cp_parser_new_expression (cp_parser* parser)
  with caret == start at the start of the "new" token, and the end
  at the end of the final token we consumed.  */
   cp_token *end_tok = cp_lexer_previous_token (parser->lexer);
-  location_t end_loc = get_finish (end_tok->location);
-  location_t combined_loc = make_location (start_loc, start_loc, end_loc);
-
+  location_t combined_loc = start_loc;
+  if (!end_tok->purged_p)
+{
+  location_t end_loc = get_finish (end_tok->location);
+  gcc_assert (end_loc);
+  combined_loc = make_location (start_loc, start_loc, end_loc);
+}
   /* Create a representation of the new-expression.  */
   ret = build_new (&placement, type, nelts, &initializer, global_scope_p,
   tf_warning_or_error);
diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi-template14.C b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template14.C
index 9cb01f1..47f5b63 100644
--- a/gcc/testsuite/g++.dg/cpp0x/nsdmi-template14.C
+++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template14.C
@@ -8,10 +8,10 @@ template struct A // { dg-error "has been parsed" }
 
 template struct B
 {
-  B* p = new B;
+  B* p = new B; // { dg-error "recursive instantiation of non-static data" }
 };
 
-B<1> x; // { dg-error "recursive instantiation of non-static data" }
+B<1> x;
 
 struct C
 {
-- 
1.8.5.3



[PATCH 07/10] Fix g++.dg/template/ref3.C

2015-12-03 Thread David Malcolm
Testcase g++.dg/template/ref3.C:

 1  // PR c++/28341
 2
 3  template struct A {};
 4
 5  template struct B
 6  {
 7A<(T)0> b; // { dg-error "constant|not a valid" }
 8A a; // { dg-error "constant|not a valid" }
 9  };
10
11  B b;

The output of this test for both c++11 and c++14 is unaffected
by the patch kit:
 g++.dg/template/ref3.C: In instantiation of 'struct B':
 g++.dg/template/ref3.C:11:15:   required from here
 g++.dg/template/ref3.C:7:11: error: '0' is not a valid template argument for 
type 'const int&' because it is not an lvalue
 g++.dg/template/ref3.C:8:11: error: '0' is not a valid template argument for 
type 'const int&' because it is not an lvalue

However, the c++98 output is changed:

Status quo for c++98:
g++.dg/template/ref3.C: In instantiation of 'struct B':
g++.dg/template/ref3.C:11:15:   required from here
g++.dg/template/ref3.C:7:11: error: a cast to a type other than an integral or 
enumeration type cannot appear in a constant-expression
g++.dg/template/ref3.C:8:11: error: a cast to a type other than an integral or 
enumeration type cannot appear in a constant-expression

(the locations on lines 7 and 8 are at the closing semicolons of fields b and a)

With the patchkit for c++98:
g++.dg/template/ref3.C: In instantiation of 'struct B':
g++.dg/template/ref3.C:11:15:   required from here
g++.dg/template/ref3.C:7:5: error: a cast to a type other than an integral or 
enumeration type cannot appear in a constant-expression
g++.dg/template/ref3.C:7:5: error: a cast to a type other than an integral or 
enumeration type cannot appear in a constant-expression

So the 2nd:
  "error: a cast to a type other than an integral or enumeration type cannot 
appear in a constant-expression"
moves from line 8 to line 7 (and both errors now point earlier in the line,
at column 5, and have ranges)

What's happening is that cp_parser_enclosed_template_argument_list
builds a CAST_EXPR, the first time from cp_parser_cast_expression,
the second time from cp_parser_functional_cast; these have locations
representing the correct respective caret&ranges, i.e.:

   A<(T)0> b;
 ^~~~

and:

   A a;
 ^~~~

Eventually finish_template_type is called for each, to build a RECORD_TYPE,
and we get a cache hit the 2nd time through here in pt.c:
8281  hash = spec_hasher::hash (&elt);
8282  entry = type_specializations->find_with_hash (&elt, hash);
8283
8284  if (entry)
8285return entry->spec;

due to:
  template_args_equal (ot=, nt=) at ../../src/gcc/cp/pt.c:7778
which calls:
  cp_tree_equal (t1=, t2=) 
at ../../src/gcc/cp/tree.c:2833
and returns equality.

Hence we get a single RECORD_TYPE for the type A<(T)(0)>, and hence
when issuing the errors it uses the TREE_VEC for the first one,
using the location of the first line.
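
The effect can be reproduced in miniature. The following is a simplified
model, not GCC's code: `Location`, `TemplateArg`, `Specialization` and
`SpecializationCache` are hypothetical names, and a plain string key stands
in for the structural comparison done by cp_tree_equal.  The point it shows
is that a cache whose equality test ignores locations hands back the first
entry, so the first location is the one that gets reported.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a source location.
struct Location
{
  int line;
  int column;
};

// Hypothetical template argument: a structural key plus a location.
// Only the key takes part in the equality test.
struct TemplateArg
{
  std::string key;
  Location loc;
};

// Hypothetical cached specialization; it remembers the argument (and
// therefore the location) it was first created with.
struct Specialization
{
  TemplateArg arg;
};

class SpecializationCache
{
public:
  // Return the cached entry for a structurally equal argument, creating
  // one on a miss.  The location of ARG is ignored on a hit.
  const Specialization &lookup_or_create (const TemplateArg &arg)
  {
    std::map<std::string, Specialization>::iterator it = table_.find (arg.key);
    if (it != table_.end ())
      return it->second;
    Specialization spec = { arg };
    return table_.insert (std::make_pair (arg.key, spec)).first->second;
  }

private:
  std::map<std::string, Specialization> table_;
};
```

With two lookups that differ only in location, the second lookup returns
the entry created by the first, so any diagnostic built from the cached
argument carries the line of the first use.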

I'm not sure what the ideal fix for this is; for now I've worked
around it by updating the dg directives to reflect the new output.

gcc/testsuite/ChangeLog:
* g++.dg/template/ref3.C: Update locations of dg directives.
---
 gcc/testsuite/g++.dg/template/ref3.C | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/template/ref3.C b/gcc/testsuite/g++.dg/template/ref3.C
index 976c093..6e568c3 100644
--- a/gcc/testsuite/g++.dg/template/ref3.C
+++ b/gcc/testsuite/g++.dg/template/ref3.C
@@ -4,8 +4,10 @@ template struct A {};
 
 template struct B
 {
-  A<(T)0> b; // { dg-error "constant|not a valid" }
-  A a; // { dg-error "constant|not a valid" }
+  A<(T)0> b; // { dg-error "constant" "" { target c++98_only } }
+  // { dg-error "not a valid" "" { target c++11 } 7 }
+
+  A a; // { dg-error "not a valid" "" { target c++11 } }
 };
 
 B b;
-- 
1.8.5.3



[PATCH 09/10] Fix g++.dg/warn/pr35635.C

2015-12-03 Thread David Malcolm
This testcase was broken by the patch kit; upon investigation
the best fix is to try to use the location of the relevant
expression when warning about conversions, rather than
input_location, falling back to the latter via EXPR_LOC_OR_LOC.
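
As a rough illustration of that fallback (a simplified model, not the GCC
sources): `Location`, `kUnknownLocation`, `Expr` and `expr_loc_or` below are
hypothetical stand-ins for location_t, UNKNOWN_LOCATION, a tree node and
EXPR_LOC_OR_LOC respectively.

```cpp
#include <cassert>

// Hypothetical stand-ins; GCC's real types are richer than this.
typedef unsigned int Location;
const Location kUnknownLocation = 0u;

struct Expr
{
  Location loc;  // kUnknownLocation when the node carries no location
};

// Prefer the expression's own location; use FALLBACK only when the
// expression has none.
Location expr_loc_or (const Expr &expr, Location fallback)
{
  return expr.loc != kUnknownLocation ? expr.loc : fallback;
}
```

A caller would pass the global "current" location as the fallback, so
behavior is unchanged for expressions that have no location of their own.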

One dg-warning needed moving, since the caret is on the "?" of the
conditional here:

   uchar_x = bar != 0
 
 ? (unsigned char) 1024
 ^~
 : -1;
 

gcc/cp/ChangeLog:
* cvt.c (cp_convert_and_check): When warning about conversions,
attempt to use the location of "expr" if available, otherwise
falling back to the old behavior of using input_location.

gcc/testsuite/ChangeLog:
* g++.dg/warn/pr35635.C (func3): Update location of a
dg-warning.
---
 gcc/cp/cvt.c| 4 ++--
 gcc/testsuite/g++.dg/warn/pr35635.C | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index ebca004..f24f280 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -650,8 +650,8 @@ cp_convert_and_check (tree type, tree expr, tsubst_flags_t complain)
   folded_result = fold_simple (folded_result);
   if (!TREE_OVERFLOW_P (folded)
  && folded_result != error_mark_node)
-   warnings_for_convert_and_check (input_location, type, folded,
-   folded_result);
+   warnings_for_convert_and_check (EXPR_LOC_OR_LOC (expr, input_location),
+   type, folded, folded_result);
 }
 
   return result;
diff --git a/gcc/testsuite/g++.dg/warn/pr35635.C b/gcc/testsuite/g++.dg/warn/pr35635.C
index de68ceb..19345c5 100644
--- a/gcc/testsuite/g++.dg/warn/pr35635.C
+++ b/gcc/testsuite/g++.dg/warn/pr35635.C
@@ -62,9 +62,9 @@ void func3()
   /* At least one branch of ? does not fit in the destination, thus
  warn.  */
   uchar_x = bar != 0 ? 2.1 : 10; /* { dg-warning "conversion" } */
-  uchar_x = bar != 0  /* { dg-warning "negative integer implicitly converted to unsigned type" } */
-? (unsigned char) 1024 
-: -1; 
+  uchar_x = bar != 0
+? (unsigned char) 1024 /* { dg-warning "negative integer implicitly converted to unsigned type" } */
+: -1;
 }
 
 void func4()
-- 
1.8.5.3



[PATCH 00/10] C++ expression ranges v4

2015-12-03 Thread David Malcolm
On Wed, 2015-11-25 at 16:26 -0500, Jason Merrill wrote:
> > It's not clear to me whether I should be passing in UNKNOWN_LOCATION
> > or input_location to the various functions.
> >
> > cp_build_unary_op used input_location in various places internally,
> > so I've passed that in wherever there isn't a better value.
> 
> Rather than try to get this right now I'm inclined to save it for the 
> next stage 1 and go back to protected_set_expr_location for GCC 6.

Thanks; I've reworked the patch based on that idea.  I found whilst
bugfixing that in general it was better to use
cp_expr::set_location, which calls protected_set_expr_location,
since the former sets both the location in the tree node (if any)
*and* the shadow copy in the cp_expr (thus ensuring that compound
expressions use the correct location_t).
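
To make the "both copies" point concrete, here is a simplified sketch; it
is not the cp_expr class from the patch.  `Node`, `ExprWithLoc` and their
members are hypothetical names, and the only thing modelled is that one
setter updates the node (if any) and the shadow copy together.

```cpp
#include <cassert>

typedef unsigned int Location;

// Hypothetical stand-in for a tree node that can carry a location.
struct Node
{
  Location loc;
};

// Hypothetical wrapper: a node pointer plus a shadow copy of the
// location, so a location is available even when there is no node
// to store it in.
class ExprWithLoc
{
public:
  ExprWithLoc (Node *node, Location loc) : node_ (node), loc_ (loc) {}

  // Update both copies so they cannot drift apart.
  void set_location (Location loc)
  {
    if (node_)
      node_->loc = loc;
    loc_ = loc;
  }

  Location get_location () const { return loc_; }
  Node *get_node () const { return node_; }

private:
  Node *node_;
  Location loc_;
};
```

Setting only the node would leave the shadow copy stale, and a compound
expression built from the wrapper would then pick up the old location.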

I've also done a lot of bugfixing, and rebased
from r230562 (Nov 18th) to r231208 (Dec 2nd).

> > Bootstraps (on x86_64-pc-linux-gnu), but regresses some tests, due to
> > changes in locations at which diagnostics are emitted:
> >
> >   c-c++-common/cilk-plus/CK/cilk_for_errors.c
> >   c-c++-common/cilk-plus/PS/for1.c
> >   c-c++-common/gomp/pr59073.c
> >   g++.dg/cpp0x/nsdmi-template14.C
> >   g++.dg/gomp/for-1.C
> >   g++.dg/gomp/pr39495-2.C
> >   g++.dg/init/new38.C
> >   g++.dg/warn/Wconversion-real-integer2.C
> >   g++.dg/warn/pr35635.C
> 
> Are the changes good or bad?

Some were bad, which I've fixed in the code.  Others were
improvements, requiring tweaks/movement of dg- directives.
I've broken out any such changes I needed to make to
specific test cases as separate patches in the kit, with notes
on each, in the hope it will make review easier.  (The kit would be
applied as a single commit; I've been testing it as one).

The following 10-patch kit bootstraps&regrtests successfully on
x86_64-pc-linux-gnu.

It adds 213 new PASS results to g++.sum, and changes the location
of 154 PASS results there.

It adds 16 new PASS results to obj-c++.sum.

OK for trunk for gcc 6?


David Malcolm (10):
  C++ FE: expression ranges v4
  Fix g++.dg/cpp0x/nsdmi-template14.C
  Fix g++.dg/gomp/loop-1.C
  Fix g++.dg/template/crash55.C
  Fix location of dg-error within g++.dg/template/pr64100.C
  Fix g++.dg/template/pseudodtor3.C
  Fix g++.dg/template/ref3.C
  Fix g++.dg/ubsan/pr63956.C
  Fix g++.dg/warn/pr35635.C
  Fix g++.dg/warn/Wconversion-real-integer2.C

 gcc/convert.c  |   9 +-
 gcc/cp/cp-tree.h   |  86 ++-
 gcc/cp/cvt.c   |   4 +-
 gcc/cp/name-lookup.c   |   6 +-
 gcc/cp/name-lookup.h   |   2 +-
 gcc/cp/parser.c| 576 +++
 gcc/cp/semantics.c |  53 +-
 gcc/cp/typeck.c|  42 +-
 gcc/testsuite/g++.dg/cpp0x/nsdmi-template14.C  |   4 +-
 gcc/testsuite/g++.dg/gomp/loop-1.C |  32 +-
 .../g++.dg/plugin/diagnostic-test-expressions-1.C  | 775 +
 gcc/testsuite/g++.dg/plugin/plugin.exp |   5 +-
 gcc/testsuite/g++.dg/template/crash55.C|   3 +-
 gcc/testsuite/g++.dg/template/pr64100.C|   4 +-
 gcc/testsuite/g++.dg/template/pseudodtor3.C|   4 +-
 gcc/testsuite/g++.dg/template/ref3.C   |   6 +-
 gcc/testsuite/g++.dg/ubsan/pr63956.C   |  28 +-
 .../g++.dg/warn/Wconversion-real-integer2.C|   4 +-
 gcc/testsuite/g++.dg/warn/pr35635.C|   6 +-
 .../plugin/diagnostic-test-expressions-1.mm|  94 +++
 gcc/testsuite/obj-c++.dg/plugin/plugin.exp |  90 +++
 gcc/tree.c |  25 +-
 gcc/tree.h |  17 +-
 23 files changed, 1632 insertions(+), 243 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/plugin/diagnostic-test-expressions-1.C
 create mode 100644 gcc/testsuite/obj-c++.dg/plugin/diagnostic-test-expressions-1.mm
 create mode 100644 gcc/testsuite/obj-c++.dg/plugin/plugin.exp

-- 
1.8.5.3



[PATCH 05/10] Fix location of dg-error within g++.dg/template/pr64100.C

2015-12-03 Thread David Malcolm
Here's what it now emits (if caret-printing were enabled):

g++.dg/template/pr64100.C: In instantiation of ‘class foo’:
g++.dg/template/pr64100.C:8:16:   required from here
g++.dg/template/pr64100.C:5:41: error: invalid use of incomplete type ‘class 
foo’
 static_assert(noexcept(((foo *)1)->~foo()), ""); // { dg-error "incomplete 
type" }
~^~~

g++.dg/template/pr64100.C:3:27: note: definition of ‘class foo’ is not 
complete until the closing brace
 template struct foo // { dg-message "note" }
   ^~~

gcc/testsuite/ChangeLog:
* g++.dg/template/pr64100.C: Update location of dg-error
directive.
---
 gcc/testsuite/g++.dg/template/pr64100.C | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/template/pr64100.C b/gcc/testsuite/g++.dg/template/pr64100.C
index 493849f..051800c 100644
--- a/gcc/testsuite/g++.dg/template/pr64100.C
+++ b/gcc/testsuite/g++.dg/template/pr64100.C
@@ -1,8 +1,8 @@
 // { dg-do compile { target c++11 } }
 
 template struct foo // { dg-message "note" }
-{ // { dg-error "incomplete type" }
-static_assert(noexcept(((foo *)1)->~foo()), "");
+{
+static_assert(noexcept(((foo *)1)->~foo()), ""); // { dg-error "incomplete type" }
 }; 
 
 template class foo;
-- 
1.8.5.3



Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-12-03 Thread Richard Biener
On Thu, Dec 3, 2015 at 3:38 PM, Richard Biener
 wrote:
> On Sat, Nov 14, 2015 at 12:35 AM, Jeff Law  wrote:
>> On 11/13/2015 01:23 PM, Jeff Law wrote:
>>>
>>> On 11/13/2015 11:09 AM, Richard Biener wrote:
>>>
>>>>> BTW Do we have an API for indicating that new blocks have been added to
>>>>> a loop?  If so, then we can likely drop the LOOPS_NEED_FIXUP.
>>>>
>>>> Please. It's called add_to_loop or so.
>>>
>>> Haha, the block duplication code was handling this already.  So in
>>> theory I can just drop the LOOPS_NEED_FIXUP completely.  Testing now.
>>>
>>> jeff
>>>
>> Attached is the committed patch for path splitting.  As noted above, we
>> didn't need the LOOPS_NEED_FIXUP in the final version, so that wart is gone
>> :-)
>>
>> I do find myself wondering if this can/should be generalized beyond just
>> paths heading to loop backedges.  However to do so I think we'd need to be
>> able to undo this transformation reliably and we'd need some heuristics when
>> to duplicate to expose the redundancy vs rely on PRE techniques and jump
>> threading.  I vaguely remember a paper which touched on these topics, but I
>> can't seem to find it.
>>
>> Anyway, bootstrapped and regression tested on x86_64-linux-gnu. Installed on
>> the trunk.
>
> This pass is now enabled by default with -Os but has no limits on the amount of
> stmts it copies.  It also will make all loops with this shape have at least two
> exits (if the resulting loop will be disambiguated the inner loop will
> have two exits).
> Having more than one exit will disable almost all loop optimizations after it.
>
> The pass itself documents the transform it does but does zero to motivate it.
>
> What's the benefit of this pass (apart from disrupting further optimizations)?
>
> I can see a _single_ case where duplicating the latch will allow threading
> one of the paths through the loop header to eliminate the original exit.  Then
> disambiguation may create a nice nested loop out of this.  Of course that
> is only profitable again if you know the remaining single exit of the inner
> loop (exiting to the outer one) is executed infrequently (thus the inner loop
> actually loops).
>
> But no checks other than on the CFG shape exist (oh, it checks it will
> at _least_ copy two stmts!).
>
> Given the profitability constraints above (well, correct me if I am
> wrong on these)
> it looks like the whole transform should be done within the FSM threading
> code which might be able to compute whether there will be an inner loop
> with a single exit only.
>
> I'm inclined to request the pass to be removed again or at least disabled by
> default.
>
> What closed source benchmark was this transform invented for?

Ah, some EEMBC one.

Btw, the testcase that was added shows

  if (xc < xm)
    {
      xk = (unsigned char) (xc < xy ? xc : xy);
    }
  else
    {
      xk = (unsigned char) (xm < xy ? xm : xy);
    }

which might be better handled by phiopt transforming it into

xk = MIN (xc, MIN (xm, xy))
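
A minimal sketch of why that rewrite is safe, assuming all three inputs are
unsigned char as in the testcase.  The function names and the MIN macro
below are made up for illustration and are not taken from phiopt or from the
testcase; the check simply compares both forms over every 8-bit input.

```c
#include <assert.h>

/* Illustrative macro, not GCC's MIN_EXPR.  */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Branchy form, shaped like the if/else in the testcase.  */
unsigned char
min3_branchy (unsigned char xc, unsigned char xm, unsigned char xy)
{
  unsigned char xk;
  if (xc < xm)
    xk = (unsigned char) (xc < xy ? xc : xy);
  else
    xk = (unsigned char) (xm < xy ? xm : xy);
  return xk;
}

/* Straight-line form: xk = MIN (xc, MIN (xm, xy)).  */
unsigned char
min3_flat (unsigned char xc, unsigned char xm, unsigned char xy)
{
  return (unsigned char) MIN (xc, MIN (xm, xy));
}

/* Exhaustively check that both forms agree on all 8-bit inputs.  */
void
check_min3_forms (void)
{
  for (int c = 0; c < 256; c++)
    for (int m = 0; m < 256; m++)
      for (int y = 0; y < 256; y++)
        assert (min3_branchy ((unsigned char) c, (unsigned char) m,
                              (unsigned char) y)
                == min3_flat ((unsigned char) c, (unsigned char) m,
                              (unsigned char) y));
}
```

Calling check_min3_forms () walks all 2^24 input combinations, so it only
shows the source-level equivalence; it says nothing about whether phiopt
can actually recognize the pattern.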

phiopt1 sees (hooray to GENERIC folding)

  xc_26 = ~xr_21;
  xm_27 = ~xg_23;
  xy_28 = ~xb_25;
  if (xr_21 > xg_23)
goto ;
  else
goto ;

  :
  xk_29 = MIN_EXPR ;
  goto ;

  :
  xk_30 = MIN_EXPR ;

  :
  # xk_4 = PHI 

btw, see PR67438 for a similar testcase and the above pattern.

Richard.

> Richard.
>
>>
>>
>>
>> commit c1891376e5dcc99ad8be2d22f9551c03f9bb2729
>> Author: Jeff Law 
>> Date:   Fri Nov 13 16:29:34 2015 -0700
>>
>> [Patch,tree-optimization]: Add new path Splitting pass on tree ssa
>> representation
>>
>> * Makefile.in (OBJS): Add gimple-ssa-split-paths.o
>> * common.opt (-fsplit-paths): New flag controlling path splitting.
>> * doc/invoke.texi (fsplit-paths): Document.
>> * opts.c (default_options_table): Add -fsplit-paths to -O2.
>> * passes.def: Add split_paths pass.
>> * timevar.def (TV_SPLIT_PATHS): New timevar.
>> * tracer.c: Include "tracer.h"
>> (ignore_bb_p): No longer static.
>> (transform_duplicate): New function, broken out of tail_duplicate.
>> (tail_duplicate): Use transform_duplicate.
>> * tracer.h (ignore_bb_p): Declare
>> (transform_duplicate): Likewise.
>> * tree-pass.h (make_pass_split_paths): Declare.
>> * gimple-ssa-split-paths.c: New file.
>>
>> * gcc.dg/tree-ssa/split-path-1.c: New test.
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index dde2695..a7abe37 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,21 @@
>> +2015-11-13  Ajit Agarwal  
>> +   Jeff Law  
>> +
>> +   * Makefile.in (OBJS): Add gimple-ssa-split-paths.o
>> +   * common.opt (-fsplit-paths): New flag controlling path splitting.
>> +   * doc/invoke.texi (fsplit-paths): Document.
>> +   * opts.c (default_options_table): Add -fsplit-paths to -O2.
>> +   * passes.def: Add split_paths pass.
>> +   * timevar.def (TV_SPLIT_PATHS): 

[RTL] canonical form of AND-immediate within COMPARE?

2015-12-03 Thread Kyrill Tkachov

Hi all,

Some ISAs have instructions to perform a bitwise AND operation with an
immediate and compare the result with zero. For example, the aarch64 TST
instruction. This is represented naturally in the MD file as:

(define_insn "*and3nr_compare0"
  [(set (reg:CC_NZ CC_REGNUM)
(compare:CC_NZ
 (and:GPI (match_operand:GPI 0 "register_operand" "%r,r")
  (match_operand:GPI 1 "aarch64_logical_operand" "r,"))
 (const_int 0)))]
  ""
  "tst\\t%0, %1"
  [(set_attr "type" "logics_reg,logics_imm")]
)

However, when the immediate operand of the AND is all ones in its low bits
(15 in the example below), combine transforms that into a zero_extract.
For example, take this testcase on aarch64:
void g ();

void
f1 (int x)
{
  if (x & 15)
    g ();
}

We're trying to combine the insns:
(insn 6 3 7 2 (set (reg:SI 75)
(and:SI (reg/v:SI 74 [ x ])
(const_int 15 [0xf]))) cbz.c:7 460 {andsi3}
 (expr_list:REG_DEAD (reg/v:SI 74 [ x ])
(nil)))
(insn 7 6 8 2 (set (reg:CC 66 cc)
(compare:CC (reg:SI 75)
(const_int 0 [0]))) cbz.c:7 385 {*cmpsi}
 (expr_list:REG_DEAD (reg:SI 75)
(nil)))

followed by a conditional branch:
(jump_insn 8 7 9 2 (set (pc)
(if_then_else (eq (reg:CC 66 cc)
(const_int 0 [0]))
(label_ref:DI 14)
(pc))) cbz.c:7 7 {condjump}

combine attempts to match the pattern:
(set (reg:CC 66 cc)
(compare:CC (zero_extract:DI (reg:DI 0 x0 [ x ])
(const_int 4 [0x4])
(const_int 0 [0]))
(const_int 0 [0])))


and fails. It fails first because our pattern matches the AND-immediate form,
not the zero_extract form.
The change_zero_ext step at the end of combine can't fix the damage: earlier,
in simplify_set, SELECT_CC_MODE was called on the COMPARE and its use in the
conditional branch, and the aarch64 implementation of SELECT_CC_MODE doesn't
handle the ZERO_EXTRACT form, so it doesn't return the CC_NZ mode that the
pattern needs. The change_zero_ext code can transform the zero_extract back
into the AND-immediate, but it doesn't have the context to fix the CC mode.
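
For reference, the two forms combine is switching between compute the same
value in this testcase.  The sketch below models that in plain C, assuming
bit 0 is the least significant bit; zero_extract_model and the test_low4_*
helpers are hypothetical names for illustration and are not GCC functions.

```c
#include <assert.h>
#include <stdint.h>

/* Model of (zero_extract X LEN POS): LEN bits of X starting at bit POS,
   zero-extended.  Assumes bit 0 is the least significant bit.  */
uint64_t
zero_extract_model (uint64_t x, unsigned len, unsigned pos)
{
  assert (len >= 1 && len <= 64 && pos < 64 && pos + len <= 64);
  uint64_t mask = len == 64 ? ~UINT64_C (0) : (UINT64_C (1) << len) - 1;
  return (x >> pos) & mask;
}

/* AND-immediate form of the test in f1: (x & 15) != 0.  */
int
test_low4_and (uint64_t x)
{
  return (x & 15) != 0;
}

/* zero_extract form: 4 bits starting at bit 0, compared with zero.  */
int
test_low4_zext (uint64_t x)
{
  return zero_extract_model (x, 4, 0) != 0;
}

/* Check that both forms agree on a range of inputs.  */
void
check_low4_forms_agree (void)
{
  for (uint64_t x = 0; x < 65536; x++)
    assert (test_low4_and (x) == test_low4_zext (x));
}
```

So the mismatch described above is purely one of representation (and of the
CC mode chosen for it), not of the value being tested.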

Is there a good way to fix this? It would seem rather weird to have extra MD
patterns to match the zero_extract forms explicitly. Maybe teaching the
aarch64 implementation of SELECT_CC_MODE to handle ZERO_EXTRACTS the same as
AND-immediates?
Or is there something that can be done in combine itself?

Thanks,
Kyrill





[PATCH 06/10] Fix g++.dg/template/pseudodtor3.C

2015-12-03 Thread David Malcolm
gcc/testsuite/ChangeLog:
* g++.dg/template/pseudodtor3.C: Update column numbers in dg-error
directives.
---
 gcc/testsuite/g++.dg/template/pseudodtor3.C | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/template/pseudodtor3.C b/gcc/testsuite/g++.dg/template/pseudodtor3.C
index 202182f..8700bb9 100644
--- a/gcc/testsuite/g++.dg/template/pseudodtor3.C
+++ b/gcc/testsuite/g++.dg/template/pseudodtor3.C
@@ -11,7 +11,7 @@ struct A
 template  struct B
 {
   T &foo ();
-  B () { foo.~T (); }  // { dg-error "10:invalid use of member" }
+  B () { foo.~T (); }  // { dg-error "15:invalid use of member" }
 };
 
 B b;
@@ -19,7 +19,7 @@ B b;
 template  struct C
 {
   T t;
-  C () { t.~S (); }// { dg-error "10:is not of type" }
+  C () { t.~S (); }// { dg-error "13:is not of type" }
 };
 
 C c;
-- 
1.8.5.3



[C] Issue an error on scalar va_list with reverse storage order

2015-12-03 Thread Eric Botcazou
Hi,

further testing revealed an issue with va_arg handling and reverse scalar 
storage order on some platforms: when va_list is scalar, passing a field of a 
structure with reverse SSO as first argument to va_start/va_arg/va_end doesn't 
work because the machinery takes its address and this is not allowed for such 
a field (it's really a corner case but gcc.c-torture/execute/stdarg-2.c does 
exercise it).  Hence the attached patch which issues an error in this case.

Tested on x86_64-suse-linux, OK for the mainline?


2015-12-03  Eric Botcazou  

* c-tree.h (c_build_va_arg): Adjust prototype.
* c-parser.c (c_parser_postfix_expression): Adjust call to above.
* c-typeck.c (c_build_va_arg): Rename LOC parameter to LOC2, add LOC1
parameter, adjust throughout and issue an error if EXPR is a component
with reverse storage order.


2015-12-03  Eric Botcazou  

* gcc.dg/sso-9.c: New test.

-- 
Eric Botcazou

Index: c-parser.c
===
--- c-parser.c	(revision 231206)
+++ c-parser.c	(working copy)
@@ -7485,7 +7485,7 @@ c_parser_postfix_expression (c_parser *p
 	else
 	  {
 		tree type_expr = NULL_TREE;
-		expr.value = c_build_va_arg (loc, e1.value,
+		expr.value = c_build_va_arg (start_loc, e1.value, loc,
 	 groktypename (t1, &type_expr, NULL));
 		if (type_expr)
 		  {
Index: c-tree.h
===
--- c-tree.h	(revision 231206)
+++ c-tree.h	(working copy)
@@ -661,7 +661,7 @@ extern tree c_finish_omp_task (location_
 extern void c_finish_omp_cancel (location_t, tree);
 extern void c_finish_omp_cancellation_point (location_t, tree);
 extern tree c_finish_omp_clauses (tree, bool, bool = false);
-extern tree c_build_va_arg (location_t, tree, tree);
+extern tree c_build_va_arg (location_t, tree, location_t, tree);
 extern tree c_finish_transaction (location_t, tree, int);
 extern bool c_tree_equal (tree, tree);
 extern tree c_build_function_call_vec (location_t, vec, tree,
Index: c-typeck.c
===
--- c-typeck.c	(revision 231206)
+++ c-typeck.c	(working copy)
@@ -13426,20 +13426,28 @@ c_build_qualified_type (tree type, int t
 /* Build a VA_ARG_EXPR for the C parser.  */
 
 tree
-c_build_va_arg (location_t loc, tree expr, tree type)
+c_build_va_arg (location_t loc1, tree expr, location_t loc2, tree type)
 {
   if (error_operand_p (type))
 return error_mark_node;
+  /* VA_ARG_EXPR cannot be used for a scalar va_list with reverse storage
+ order because it takes the address of the expression.  */
+  else if (handled_component_p (expr)
+	   && reverse_storage_order_for_component_p (expr))
+{
+  error_at (loc1, "cannot use % with reverse storage order");
+  return error_mark_node;
+}
   else if (!COMPLETE_TYPE_P (type))
 {
-  error_at (loc, "second argument to % is of incomplete "
+  error_at (loc2, "second argument to % is of incomplete "
 		"type %qT", type);
   return error_mark_node;
 }
   else if (warn_cxx_compat && TREE_CODE (type) == ENUMERAL_TYPE)
-warning_at (loc, OPT_Wc___compat,
+warning_at (loc2, OPT_Wc___compat,
 		"C++ requires promoted type, not enum type, in %");
-  return build_va_arg (loc, expr, type);
+  return build_va_arg (loc2, expr, type);
 }
 
 /* Return truthvalue of whether T1 is the same tree structure as T2.
/* Test support of scalar_storage_order attribute */

/* { dg-do compile } */

#include <stdarg.h>

int x;

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
struct __attribute__((scalar_storage_order("big-endian"))) Rec
{
  va_list v;
};
#else
struct __attribute__((scalar_storage_order("little-endian"))) Rec
{
  va_list v;
};
#endif

void foo (int i, ...)
{
  struct Rec a;
  va_start (a.v, i);
  a.v = a.v, x = va_arg (a.v, int); /* { dg-error "array type|reverse storage order" } */
  va_end (a.v);
}


Re: [PATCH] Empty redirect_edge_var_map after each pass and function

2015-12-03 Thread Alan Lawrence

On 03/12/15 12:58, Richard Biener wrote:
> On Thu, 3 Dec 2015, Alan Lawrence wrote:
>
> > On 02/12/15 14:13, Jeff Law wrote:
> > > On 12/02/2015 01:33 AM, Richard Biener wrote:
> > > > > Right.  So the question I have is how/why did DOM leave anything in
> > > > > the map.  And if DOM is fixed to not leave stuff lying around, can
> > > > > we then assert that nothing is ever left in those maps between
> > > > > passes?  There's certainly no good reason I'm aware of why DOM would
> > > > > leave things in this state.
> > > >
> > > > It happens not only with DOM but with all passes doing edge redirection.
> > > > This is because the map is populated by GIMPLE cfg hooks just in case
> > > > it might be used.  But there is no such thing as a "start CFG manip"
> > > > and "end CFG manip" to cleanup such dead state.
> > >
> > > Sigh.
> > >
> > > > IMHO the redirect-edge-var-map stuff is just the very most possible
> > > > unclean implementation possible. :(  (see how remove_edge "clears"
> > > > stale info from the map to avoid even more "interesting" stale
> > > > data)
> > > >
> > > > Ideally we could assert the map is empty whenever we leave a pass,
> > > > but as said it triggers all over the place.  Even cfg-cleanup causes
> > > > such stale data.
> > > >
> > > > I agree that the patch is only a half-way "solution", but a full
> > > > solution would require sth more explicit, like we do with
> > > > initialize_original_copy_tables/free_original_copy_tables.  Thus
> > > > require passes to explicitly request the edge data to be preserved
> > > > with a initialize_edge_var_map/free_edge_var_map call pair.
> > > >
> > > > Not appropriate at this stage IMHO (well, unless it turns out to be
> > > > a very localized patch).
> > >
> > > So maybe as a follow-up to aid folks in the future, how about a debugging
> > > verify_whatever function that we can call manually if debugging a problem
> > > in this space.  With a comment indicating why we can't call it
> > > unconditionally (yet).
> > >
> > > jeff
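
The explicit call pair and the manual verifier discussed above might look
roughly like the toy model below.  Everything here is hypothetical:
initialize_edge_var_map and free_edge_var_map do not exist in GCC, the
*_model names are invented for this sketch, and edges are modelled as plain
integers rather than GCC's edge type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 64

/* Toy stand-in for the redirect edge var map.  */
struct edge_var_map_model
{
  bool active;               /* Has a pass asked for data to be kept?  */
  size_t n_entries;          /* Number of recorded redirections.  */
  int edges[MAX_ENTRIES];    /* Recorded edge ids.  */
};

static struct edge_var_map_model map_model;

/* A pass calls this before doing redirections whose data it needs.  */
void
initialize_edge_var_map_model (void)
{
  assert (!map_model.active);
  assert (map_model.n_entries == 0);
  map_model.active = true;
}

/* Called from the (modelled) cfg hook.  Records nothing unless a pass
   explicitly requested it, so no stale state can accumulate.  */
void
record_redirect_model (int edge_id)
{
  if (!map_model.active)
    return;
  assert (map_model.n_entries < MAX_ENTRIES);
  map_model.edges[map_model.n_entries++] = edge_id;
}

/* A pass calls this when it is done with the recorded data.  */
void
free_edge_var_map_model (void)
{
  assert (map_model.active);
  map_model.n_entries = 0;
  map_model.active = false;
}

/* Debugging aid: true if the map holds no data outside an
   initialize/free pair.  */
bool
verify_edge_var_map_model (void)
{
  return map_model.active || map_model.n_entries == 0;
}

/* Number of entries currently recorded.  */
size_t
edge_var_map_model_size (void)
{
  return map_model.n_entries;
}
```

With this shape, redirections outside an initialize/free pair leave the map
untouched, which is what would let a verifier be called between passes.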


> > I did a (fwiw disable bootstrap) build with the map-emptying code in passes.c
> > (not functions.c), printing out passes after which the map was non-empty
> > (before emptying it, to make sure passes weren't just carrying through stale
> > data from earlier). My (non-exhaustive!) list of passes after which the
> > edge_var_redirect_map can be non-empty stands at...
> >
> > aprefetch ccp cddce ch ch_vect copyprop crited crited cselim cunroll cunrolli
> > dce dom ehcleanup einline esra fab fnsplit forwprop fre graphite ifcvt
> > isolate-paths ldist lim local-pure-const mergephi oaccdevlow ompexpssa
> > optimized parloops pcom phicprop phiopt phiprop pre profile profile_estimate
> > sccp sink slsr split-paths sra switchconv tailc tailr tracer unswitch
> > veclower2 vect vrm vrp whole-program
>
> Yeah, exactly my findings...  note that most of the above are likely
> due to cfgcleanup even though it already does sth like
>
>    e = redirect_edge_and_branch (e, dest);
>    redirect_edge_var_map_clear (e);
>
> so eventually placing a redirect_edge_var_map_empty () at the end
> of the cleanup_tree_cfg function should prune down the above list
> considerably (well, then assert the map is empty on entry to that
> function of course)


> > FWIW, the route by which dom added the edge to the redirect map was:
> > #0  redirect_edge_var_map_add (e=e@entry=0x7fb7a5f508, result=0x7fb725a000,
> >  def=0x7fb78eaea0, locus=2147483884) at ../../gcc/gcc/tree-ssa.c:54
> > #1  0x00cccf58 in ssa_redirect_edge (e=e@entry=0x7fb7a5f508,
> >  dest=dest@entry=0x7fb79cc680) at ../../gcc/gcc/tree-ssa.c:158
> > #2  0x00b00738 in gimple_redirect_edge_and_branch (e=0x7fb7a5f508,
> >  dest=0x7fb79cc680) at ../../gcc/gcc/tree-cfg.c:5662
> > #3  0x006ec678 in redirect_edge_and_branch (e=e@entry=0x7fb7a5f508,
> >  dest=) at ../../gcc/gcc/cfghooks.c:356
> > #4  0x00cb4530 in ssa_fix_duplicate_block_edges (rd=0x1a29f10,
> >  local_info=local_info@entry=0x7fed40)
> >  at ../../gcc/gcc/tree-ssa-threadupdate.c:1184
> > #5  0x00cb5520 in ssa_fixup_template_block (slot=,
> >  local_info=0x7fed40) at ../../gcc/gcc/tree-ssa-threadupdate.c:1369
> > #6  traverse_noresize (
> >  argument=0x7fed40, this=0x1a21a00) at ../../gcc/gcc/hash-table.h:911
> > #7  traverse (
> >  argument=0x7fed40, this=0x1a21a00) at ../../gcc/gcc/hash-table.h:933
> > #8  thread_block_1 (bb=bb@entry=0x7fb7485bc8,
> >  noloop_only=noloop_only@entry=true, joiners=joiners@entry=true)
> >  at ../../gcc/gcc/tree-ssa-threadupdate.c:1592
> > #9  0x00cb5a40 in thread_block (bb=0x7fb7485bc8,
> >  noloop_only=noloop_only@entry=true)
> >  at ../../gcc/gcc/tree-ssa-threadupdate.c:1629
> > #10 0x00cb6bf8 in thread_through_all_blocks (
> >  may_peel_loop_headers=true) at ../../gcc/gcc/tree-ssa-threadupdate.c:2736
> > #11 0x00becf6c in (anonymous namespace)::pass_dominator::execute (
> >  this=, fun=0x7fb77d1b28)
> >  at ../../gcc/gcc/tree-ssa-dom.c:622
> > #12 0x009feef4 in execute_one_pass (pass=pass@entry=0x16d1a80)
> >  at ../../gcc/gcc/passes.c:2311
The edge is then deleted much later:
#3  0x00f858e4 in free_edge (fn=, e=)
 at ../../gcc/gcc/cfg.c:91
#4  remove_edge_raw (e=) at ../../gcc/gcc/cfg.c:350
#5  0x006ec814 in remove

[PATCH 01/10] C++ FE: expression ranges v4

2015-12-03 Thread David Malcolm
Changes in this version:
- removal of gcc_assert (m_loc != UNKNOWN_LOCATION) from cp_expr ctor
- uses protected_set_expr_location or cp_expr::set_location/set_range,
  rather than attempting to add location_t arguments
- adds location support and test coverage based on issues seen in
  the analogous work on the C FE (see r230497 and r230775).
  Specifically:
  - various Objective C++ constructs (creating obj-c++.dg/plugin in
order to unit-test these, analogous to changes for C FE)
  - braced initializers
  - statement expressions
  - address of label
  - transaction expressions
  - __FUNCTION__ et al
  - __builtin_va_arg and __builtin_offsetof
- handle locations of functional casts and _Cilk_spawn
- fixes locations of negative numeric literals
- various other bugfixes and additional test coverage

gcc/ChangeLog:
* convert.c (convert_to_real_1): When converting from a
REAL_TYPE, preserve the location of EXPR in the result.
* tree.c (get_pure_location): Make non-static.
(set_source_range): Return the resulting location_t.
(make_location): New function.
* tree.h (get_pure_location): New decl.
(get_finish): New inline function.
(set_source_range): Convert return type from void to location_t.
(make_location): New decl.

gcc/cp/ChangeLog:
* cp-tree.h (class cp_expr): New class.
(finish_parenthesized_expr): Convert return type and param to
cp_expr.
(perform_koenig_lookup): Convert return type and param from tree
to cp_expr.
(finish_increment_expr): Likewise.
(finish_unary_op_expr): Likewise.
(finish_id_expression): Likewise for return type.
(build_class_member_access_expr): Likewise for param.
(finish_class_member_access_expr): Likewise.
(build_x_unary_op): Likewise.
(build_c_cast): New decl.
(build_x_modify_expr): Convert return type from tree to cp_expr.
* name-lookup.c (lookup_arg_dependent_1): Likewise.
(lookup_arg_dependent): Likewise; also for local "ret".
* name-lookup.h (lookup_arg_dependent): Likewise for return type.
* parser.c (struct cp_parser_expression_stack_entry): Likewise
for field "lhs".
(cp_parser_identifier): Likewise for return type.  Use cp_expr
ctor to preserve the token's location.
(cp_parser_string_literal): Likewise, building up a meaningful
location for the case where a compound string literal is built by
concatenation.
(cp_parser_userdef_char_literal): Likewise for return type.
(cp_parser_userdef_numeric_literal): Likewise.
(cp_parser_statement_expr): Convert return type to cp_expr.
Generate a suitable location for the expr and return it via the
cp_expr ctor.
(cp_parser_fold_expression): Convert return type to cp_expr.
(cp_parser_primary_expression): Likewise, and for locals "expr",
"lam", "id_expression", "decl".
Use cp_expr ctor when parsing literals, to preserve the spelling
location of the token.  Preserve the locations of parentheses.
Preserve location when calling objc_lookup_ivar.
Preserve the location for "this" tokens.  Generate suitable
locations for "__builtin_va_arg" constructs and for
Objective C 2.0 dot-syntax.  Set the location for the result of
finish_id_expression.
(cp_parser_primary_expression): Convert return type from tree to
cp_expr.
(cp_parser_id_expression): Likewise.
(cp_parser_unqualified_id): Likewise.  Also for local "id".
(cp_parser_postfix_expression): Likewise, also for local
"postfix_expression".  Generate suitable locations for
C++-style casts, "_Cilk_spawn" constructs.  Convert local
"initializer" to cp_expr and use it to preserve the location of
compound literals.  Capture the location of the closing
parenthesis of a call site via
cp_parser_parenthesized_expression_list, and use it to build
a source range for a call.  Use cp_expr in ternary expression.
(cp_parser_postfix_dot_deref_expression): Convert param from tree to
cp_expr.  Generate and set a location.
(cp_parser_parenthesized_expression_list): Add "close_paren_loc"
out-param, and write back to it.
(cp_parser_unary_expression): Convert return type from tree to
cp_expr.  Also for locals "cast_expression" and "expression".
Generate and use suitable locations for addresses of
labels and for cast expressions.  Call cp_expr::set_location where
necessary.  Preserve the locations of negated numeric literals.
(cp_parser_new_expression): Generate meaningful locations/ranges.
(cp_parser_cast_expression): Convert return type from tree to
cp_expr; also for local "expr".  Use the paren location to generate a
meaningful range for the express

Re: [PATCH] Empty redirect_edge_var_map after each pass and function

2015-12-03 Thread Richard Biener
On Thu, 3 Dec 2015, Alan Lawrence wrote:

> On 03/12/15 12:58, Richard Biener wrote:
> > On Thu, 3 Dec 2015, Alan Lawrence wrote:
> > 
> > > On 02/12/15 14:13, Jeff Law wrote:
> > > > On 12/02/2015 01:33 AM, Richard Biener wrote:
> > > > > > Right.  So the question I have is how/why did DOM leave anything in
> > > > > > the
> > > > > > map.
> > > > > > And if DOM is fixed to not leave stuff lying around, can we then
> > > > > > assert
> > > > > > that
> > > > > > nothing is ever left in those maps between passes?  There's
> > > > > > certainly no
> > > > > > good
> > > > > > reason I'm aware of why DOM would leave things in this state.
> > > > > 
> > > > > It happens not only with DOM but with all passes doing edge
> > > > > redirection.
> > > > > This is because the map is populated by GIMPLE cfg hooks just in case
> > > > > it might be used.  But there is no such thing as a "start CFG manip"
> > > > > and "end CFG manip" to cleanup such dead state.
> > > > Sigh.
> > > > 
> > > > > 
> > > > > IMHO the redirect-edge-var-map stuff is just the very most possible
> > > > > unclean implementation possible. :(  (see how remove_edge "clears"
> > > > > stale info from the map to avoid even more "interesting" stale
> > > > > data)
> > > > > 
> > > > > Ideally we could assert the map is empty whenever we leave a pass,
> > > > > but as said it triggers all over the place.  Even cfg-cleanup causes
> > > > > such stale data.
> > > > > 
> > > > > I agree that the patch is only a half-way "solution", but a full
> > > > > solution would require sth more explicit, like we do with
> > > > > initialize_original_copy_tables/free_original_copy_tables.  Thus
> > > > > > require passes to explicitly request the edge data to be preserved
> > > > > with a initialize_edge_var_map/free_edge_var_map call pair.
> > > > > 
> > > > > Not appropriate at this stage IMHO (well, unless it turns out to be
> > > > > a very localized patch).
> > > > So maybe as a follow-up to aid folks in the future, how about a
> > > > debugging
> > > > verify_whatever function that we can call manually if debugging a
> > > > problem in
> > > > this space.  With a comment indicating why we can't call it
> > > > unconditionally
> > > > (yet).
> > > > 
> > > > 
> > > > jeff
> > > 
> > > I did a (fwiw disable bootstrap) build with the map-emptying code in
> > > passes.c
> > > (not functions.c), printing out passes after which the map was non-empty
> > > (before emptying it, to make sure passes weren't just carrying through
> > > stale
> > > data from earlier). My (non-exhaustive!) list of passes after which the
> > > edge_var_redirect_map can be non-empty stands at...
> > > 
> > > aprefetch ccp cddce ch ch_vect copyprop crited crited cselim cunroll
> > > cunrolli
> > > dce dom ehcleanup einline esra fab fnsplit forwprop fre graphite ifcvt
> > > isolate-paths ldist lim local-pure-const mergephi oaccdevlow ompexpssa
> > > optimized parloops pcom phicprop phiopt phiprop pre profile
> > > profile_estimate
> > > sccp sink slsr split-paths sra switchconv tailc tailr tracer unswitch
> > > veclower2 vect vrm vrp whole-program
> > 
> > Yeah, exactly my findings...  note that most of the above are likely
> > due to cfgcleanup even though it already does sth like
> > 
> >    e = redirect_edge_and_branch (e, dest);
> >    redirect_edge_var_map_clear (e);
> > 
> > so eventually placing a redirect_edge_var_map_empty () at the end
> > of the cleanup_tree_cfg function should prune down the above list
> > considerably (well, then assert the map is empty on entry to that
> > function of course)
> > 
> > > FWIW, the route by which dom added the edge to the redirect map was:
> > > #0  redirect_edge_var_map_add (e=e@entry=0x7fb7a5f508,
> > > result=0x7fb725a000,
> > >  def=0x7fb78eaea0, locus=2147483884) at ../../gcc/gcc/tree-ssa.c:54
> > > #1  0x00cccf58 in ssa_redirect_edge (e=e@entry=0x7fb7a5f508,
> > >  dest=dest@entry=0x7fb79cc680) at ../../gcc/gcc/tree-ssa.c:158
> > > #2  0x00b00738 in gimple_redirect_edge_and_branch (e=0x7fb7a5f508,
> > >  dest=0x7fb79cc680) at ../../gcc/gcc/tree-cfg.c:5662
> > > #3  0x006ec678 in redirect_edge_and_branch
> > > (e=e@entry=0x7fb7a5f508,
> > >  dest=) at ../../gcc/gcc/cfghooks.c:356
> > > #4  0x00cb4530 in ssa_fix_duplicate_block_edges (rd=0x1a29f10,
> > >  local_info=local_info@entry=0x7fed40)
> > >  at ../../gcc/gcc/tree-ssa-threadupdate.c:1184
> > > #5  0x00cb5520 in ssa_fixup_template_block (slot=,
> > >  local_info=0x7fed40) at
> > > ../../gcc/gcc/tree-ssa-threadupdate.c:1369
> > > #6  traverse_noresize (
> > >  argument=0x7fed40, this=0x1a21a00) at
> > > ../../gcc/gcc/hash-table.h:911
> > > #7  traverse (
> > >  argument=0x7fed40, this=0x1a21a00) at
> > > ../../gcc/gcc/hash-table.h:933
> > > #8  thread_block_1 (bb=bb@entry=0x7fb7485bc8,
> > >  noloop_only=noloop_only@entry=true, joiners=joiners@entry=true)
> > >  at 

Re: Documentation tweaks for internal-fn-related optabs

2015-12-03 Thread Richard Sandiford
Bernd Schmidt  writes:
> On 12/03/2015 02:06 PM, Richard Sandiford wrote:
>> As Bernd requested, this patch adds "This pattern cannot FAIL" to the
>> documentation of optabs that came to be mapped to internal functions.
>> For consistency I did the same for optabs that were already being
>> used for internal functions.
>>
>> Many of the optabs weren't documented in the first place, so I added
>> entries for the missing ones.  Also, there were some inaccuracies in
>> the documentation of the rounding optabs.  The bitcount optabs said
>> that operand 0 has mode @var{m} and that operand 1 is under target
>> control, whereas it should be the other way around.
>
> That actually goes beyond what I imagined. I was looking at the top part 
> of md.texi (line 87), where there is a brief discussion of what is 
> allowed to FAIL and what isn't. Also, there is "@item FAIL":
>
>"Failure is currently supported only for binary (addition,
> multiplication, shifting, etc.) and bit-field (@code{extv},
> @code{extzv}, and @code{insv}) operations."
>
> That's pretty outdated. I think unary operations are probably missing by 
> accident, and from what my grep showed there are also conditional moves, 
> atomic operations, certain vec_ patterns that can all fail. As a minimum 
> this paragraph should also mention internal functions.

I don't think that quote means that FAIL is supported for _all_ optabs
with two inputs and one output.  What "etc." includes is left vague.

A blanket statement about internal functions is likely to get out of
date, since there's no reason in principle why optabs used for future
internal functions couldn't have fallbacks.  Also, "internal-function
optabs" aren't self-describing: no-one's going to know what an internal
function optab is without looking at the source.

I'd rather keep it as the patch has it and say for each relevant optab
that the expander can't fail.

Richard



Re: [RTL] canonical form of AND-immediate within COMPARE?

2015-12-03 Thread Eric Botcazou
> Some ISAs have instructions to perform a bitwise AND operation with an
> immediate and compare the result with zero.

Many of them I'd say.

> Is there a good way to fix this? It would seem rather weird to have extra MD
> patterns to match the zero_extract forms explicitly. Maybe teaching the
> aarch64 implementation of SELECT_CC_MODE to handle ZERO_EXTRACTS the same
> as AND-immediates? Or is there something that can be done in combine
> itself?

ARM, i386, MIPS, PA, SPARC, etc have ZERO_EXTRACT-based patterns though.

-- 
Eric Botcazou


Re: [PATCH] Fix shrink-wrap bug with anticipating into loops (PR67778, PR68634)

2015-12-03 Thread Richard Sandiford
Segher Boessenkool  writes:
> On Wed, Dec 02, 2015 at 08:19:05PM +0100, Jakub Jelinek wrote:
>> On Wed, Dec 02, 2015 at 06:21:47PM +, Segher Boessenkool wrote:
>> > --- a/gcc/shrink-wrap.c
>> > +++ b/gcc/shrink-wrap.c
>> > @@ -752,7 +752,11 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with,
>> >  
>> >/* If we can move PRO back without having to duplicate more blocks, do so.
>> >   We can move back to a block PRE if every path from PRE will eventually
>> > - need a prologue, that is, PRO is a post-dominator of PRE.  */
>> > + need a prologue, that is, PRO is a post-dominator of PRE.  We might
>> > + need to duplicate PRE if there is any path from a successor of PRE back
>> > + to PRE, so don't allow that either (but self-loops are fine, as are any
>> > + other loops entirely dominated by PRE; this in general seems too
>> > + expensive to check for, for such an uncommon case).  */
>> 
>> So, what will happen if PRE self-loops?
>
> The prologue is put in a new block before the chosen one (as always).
>
>> It would be nice to have it covered by a testcase.
>
> If I knew how to prepare one, that stayed stable for more than about
> two weeks, yes :-/
>
>> > +      bool ok = true;
>> > +
>> > +      if (!can_get_prologue (pre, prologue_clobbered))
>> > +        ok = false;
>> > +
>> > +      FOR_EACH_EDGE (e, ei, pre->succs)
>> > +        if (e->dest != pre
>> > +            && dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre))
>> > +          ok = false;
>> 
>> I wonder if it wouldn't be better to:
>> 
>>      if (!can_get_prologue (pre, prologue_clobbered))
>>        ok = false;
>>      else
>>        FOR_EACH_EDGE (e, ei, pre->succs)
>>          if (e->dest != pre
>>              && dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre))
>>            {
>>              ok = false;
>>              break;
>>            }
>> 
>> so that it doesn't walk or continue walking the edges if not needed.
>
> If the compiler is any good, neither does my code, right?  :-)
>
> I think it is more important to have this code readable than a teeny
> tiny bit faster.  It is all linear (assuming dominator lookups are O(1)),
> which isn't too hard to ascertain (yeah, famous last words).

Maybe the clearest thing is to split it out into a function that returns
false as soon as it finds a reason why the transform is not OK.
The "decent compiler" ought to inline that function.
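
Something along these lines, as a self-contained toy rather than real
shrink-wrap.c code.  The cfg_model struct, its fields and the function name
are invented for the sketch; prologue_ok stands in for can_get_prologue and
the postdom table stands in for dominated_by_p (CDI_POST_DOMINATORS, ...).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKS 8

/* Toy CFG: blocks are indices, all tables are filled in by the caller.  */
struct cfg_model
{
  size_t n_succs[MAX_BLOCKS];            /* Successor count per block.  */
  int succs[MAX_BLOCKS][MAX_BLOCKS];     /* Successor block indices.  */
  bool postdom[MAX_BLOCKS][MAX_BLOCKS];  /* postdom[a][b]: A is
                                            post-dominated by B.  */
  bool prologue_ok[MAX_BLOCKS];          /* Stand-in for can_get_prologue.  */
};

/* Return true if the prologue may be moved to block PRE.  Returns false
   as soon as one reason against the transform is found.  */
bool
can_move_prologue_to_model (const struct cfg_model *cfg, int pre)
{
  assert (pre >= 0 && pre < MAX_BLOCKS);

  if (!cfg->prologue_ok[pre])
    return false;

  for (size_t i = 0; i < cfg->n_succs[pre]; i++)
    {
      int dest = cfg->succs[pre][i];
      /* Self-loops are fine; any other successor post-dominated by PRE
         means there may be a path back to PRE.  */
      if (dest != pre && cfg->postdom[dest][pre])
        return false;
    }

  return true;
}
```

The caller then reduces to a single test on the helper's result, and there
is no "ok" flag to keep track of.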

Thanks,
Richard



Re: [RFA] [PR tree-optimization/68599] Avoid over-zealous optimization with -funsafe-loop-optimizations

2015-12-03 Thread Jeff Law

On 12/03/2015 02:36 AM, Richard Biener wrote:
> On Wed, Dec 2, 2015 at 5:27 PM, Jeff Law wrote:
>>
>> I strongly recommend reading the analysis in pr45122 since pr68599 uses the
>> same testcase and just triggers the same bug in the RTL optimizers instead
>> of the tree optimizers.
>>
>> As noted in 45122, with -funsafe-loop-optimizations, we may exit the loop an
>> iteration too early.  The loop in question is finite and the counter does
>> not overflow.  Yet -funsafe-loop-optimizations munges it badly.
>>
>> As is noted in c#6 and patched in c#8, when there's more than one exit from
>> the loop, simply discarding the assumptions for the trip count is "a bit too
>> unsafe".  Richi & Zdenek agreed that disabling the optimization when the
>> loop has > 1 exit was the preferred approach. Alex's patch did just that,
>> but only for the tree optimizers.
>>
>> This patch does essentially the same thing for the RTL loop optimizer. If
>> the candidate loop has > 1 exit, then we don't allow
>> -funsafe-loop-optimizations to drop the assumptions/infinite notes for the
>> RTL loop.
>>
>> This required ensuring that LOOPS_HAVE_RECORDED_EXITS when initializing the
>> loop optimizer.
>>
>> Bootstrapped and regression tested on x86_64-linux-gnu and
>> powerpc64-linux-gnu.  For the latter, pr45122.c flips to a pass.  Given this
>> is covered by the pr45122 testcase, I didn't add a new one.
>>
>> OK for the trunk?
>
> Ok.
>
> Note that I believe we should dump -funsafe-loop-optimizations in favor of
> a per-loop #pragma now that we can properly track such.  Globally it's known
> to miscompile SPEC at least.

Yea, I saw that on IRC and almost went down that path.  Certainly wouldn't
get any argument from me if we were to remove that option.  Sounds like Bin
might want to do that and he'll have my full support.


Jeff



Re: [PATCH][install.texi] Add note against GNAT 4.8 on ARM targets.

2015-12-03 Thread Alan Lawrence

On 16/11/15 15:08, Alan Lawrence wrote:

> This follows from the discussion here:
> https://gcc.gnu.org/ml/gcc/2015-10/msg00082.html .
>
> OK for trunk?
>
> --Alan
>
> gcc/ChangeLog:
>
> doc/install.texi: Add note against GNAT 4.8 on ARM targets.
> ---
>   gcc/doc/install.texi | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index 1fd773e..1ce93d4 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -3481,6 +3481,8 @@ require GNU binutils 2.13 or newer.  Such subtargets include:
>   @code{arm-*-netbsdelf}, @code{arm-*-*linux-*}
>   and @code{arm-*-rtemseabi}.
>
> +Building the Ada frontend commonly fails (an infinite loop executing @code{xsinfo}) if the host compiler is GNAT 4.8.  Host compilers built from the GNAT 4.6, 4.9 or 5 release branches are known to succeed.
> +
>   @html
>   
>   @end html



Ping.



Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-12-03 Thread Jeff Law

On 12/03/2015 07:38 AM, Richard Biener wrote:


> This pass is now enabled by default with -Os but has no limits on the amount of
> stmts it copies.  It also will make all loops with this shape have at least two
> exits (if the resulting loop will be disambiguated the inner loop will
> have two exits).
> Having more than one exit will disable almost all loop optimizations after it.

[ ... ]
split-paths in the queue -- behind addressing a couple of correctness 
issues that are on my plate (not split-paths related).  I'll respond 
fully.  FWIW, I wouldn't lose much sleep if this were disabled by 
default -- without the "sink-common-code-past-phi" stuff we've discussed 
in the past it's fairly hard to justify path-splitting this aggressively.


jeff



Re: [PATCH][install.texi] Add note against GNAT 4.8 on ARM targets.

2015-12-03 Thread Gerald Pfeifer
On Thu, 3 Dec 2015, Alan Lawrence wrote:
>>  doc/install.texi: Add note against GNAT 4.8 on ARM targets.

This looks fine (provided it builds and looks okay).

Just...

>> +Building the Ada frontend commonly fails (an infinite loop executing
>> @code{xsinfo}) if the host compiler is GNAT 4.8.  Host compilers built from
>> the GNAT 4.6, 4.9 or 5 release branches are known to succeed.

...if this is only just one long line, can you please wrap?

Gerald


RE: [PATCH] MIPS/GCC/doc: Reorder `-mcompact-branches='

2015-12-03 Thread Matthew Fortune
Maciej Rozycki  writes:
> Move the `-mcompact-branches=' option out of the middle of a block of
> floating-point options.  The option is not related to FP in any way.
> Place it immediately below other branch instruction selection options.
> 
>   gcc/
>   * doc/invoke.texi (Option Summary) : Reorder
>   `-mcompact-branches='.
>   (MIPS Options): Likewise.
> ---
> 
>  OK to apply?

OK, thanks.

Matthew


Re: [PATCH] Handle OBJ_TYPE_REF in FRE

2015-12-03 Thread Jan Hubicka
> 
> The following patch handles CSEing OBJ_TYPE_REF which was omitted
> because it is a GENERIC expression even on GIMPLE (for whatever

Why is it generic? It is part of the gimple grammar :)

> reason...).  Rather than changing this now the following patch
> simply treats it properly as such.

Thanks for working on this! Will this do code motion, too?
I think you may want to compare the ODR type of obj_type_ref_class
otherwise two otherwise equivalent OBJ_TYPE_REFs may lead to different
optimizations later.  I suppose we can have code of form

if (test)
  OBJ_TYPE_REF1
  ...
else
  OBJ_TYPE_REF2
  ..
where each invokes a method of a different class type but would otherwise
match as equivalent for tree-ssa-sccvn because we ignore pointed-to types.

so doing

OBJ_TYPE_REF1
if (test)
  ...
else
  ...

may lead to wrong code.

Or do you just substitute the operands of OBJ_TYPE_REF? 
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> 
> Note that this does not (yet) substitute OBJ_TYPE_REFs in calls
> with SSA names that have the same value - not sure if that would
> be desired generally (does the devirt machinery cope with that?).

This should work fine.
> 
> Thanks,
> Richard.
> 
> 2015-12-03  Richard Biener  
> 
>   PR tree-optimization/64812
>   * tree-ssa-sccvn.c (vn_get_stmt_kind): Handle OBJ_TYPE_REF.
>   (vn_nary_length_from_stmt): Likewise.
>   (init_vn_nary_op_from_stmt): Likewise.
>   * gimple-match-head.c (maybe_build_generic_op): Likewise.
>   * gimple-pretty-print.c (dump_unary_rhs): Likewise.
> 
>   * g++.dg/tree-ssa/ssa-fre-1.C: New testcase.
> 
> Index: gcc/tree-ssa-sccvn.c
> ===
> *** gcc/tree-ssa-sccvn.c  (revision 231221)
> --- gcc/tree-ssa-sccvn.c  (working copy)
> *** vn_get_stmt_kind (gimple *stmt)
> *** 460,465 
> --- 460,467 
> ? VN_CONSTANT : VN_REFERENCE);
>   else if (code == CONSTRUCTOR)
> return VN_NARY;
> + else if (code == OBJ_TYPE_REF)
> +   return VN_NARY;
>   return VN_NONE;
> }
> default:
> *** vn_nary_length_from_stmt (gimple *stmt)
> *** 2479,2484 
> --- 2481,2487 
> return 1;
>   
>   case BIT_FIELD_REF:
> + case OBJ_TYPE_REF:
> return 3;
>   
>   case CONSTRUCTOR:
> *** init_vn_nary_op_from_stmt (vn_nary_op_t
> *** 2508,2513 
> --- 2511,2517 
> break;
>   
>   case BIT_FIELD_REF:
> + case OBJ_TYPE_REF:
> vno->length = 3;
> vno->op[0] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
> vno->op[1] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 1);
> Index: gcc/gimple-match-head.c
> ===
> *** gcc/gimple-match-head.c   (revision 231221)
> --- gcc/gimple-match-head.c   (working copy)
> *** maybe_build_generic_op (enum tree_code c
> *** 243,248 
> --- 243,249 
> *op0 = build1 (code, type, *op0);
> break;
>   case BIT_FIELD_REF:
> + case OBJ_TYPE_REF:
> *op0 = build3 (code, type, *op0, op1, op2);
> break;
>   default:;
> Index: gcc/gimple-pretty-print.c
> ===
> *** gcc/gimple-pretty-print.c (revision 231221)
> --- gcc/gimple-pretty-print.c (working copy)
> *** dump_unary_rhs (pretty_printer *buffer,
> *** 302,308 
> || TREE_CODE_CLASS (rhs_code) == tcc_reference
> || rhs_code == SSA_NAME
> || rhs_code == ADDR_EXPR
> !   || rhs_code == CONSTRUCTOR)
>   {
> dump_generic_node (buffer, rhs, spc, flags, false);
> break;
> --- 302,309 
> || TREE_CODE_CLASS (rhs_code) == tcc_reference
> || rhs_code == SSA_NAME
> || rhs_code == ADDR_EXPR
> !   || rhs_code == CONSTRUCTOR
> !   || rhs_code == OBJ_TYPE_REF)
>   {
> dump_generic_node (buffer, rhs, spc, flags, false);
> break;
> Index: gcc/testsuite/g++.dg/tree-ssa/ssa-fre-1.C
> ===
> *** gcc/testsuite/g++.dg/tree-ssa/ssa-fre-1.C (revision 0)
> --- gcc/testsuite/g++.dg/tree-ssa/ssa-fre-1.C (working copy)
> ***
> *** 0 
> --- 1,44 
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -fdump-tree-fre2" } */
> + 
> + template <class T> class A
> + {
> +   T *p;
> + 
> + public:
> +   A (T *p1) : p (p1) { p->acquire (); }
> + };
> + 
> + class B
> + {
> + public:
> + virtual void acquire ();
> + };
> + class D : public B
> + {
> + };
> + class F : B
> + {
> +   int mrContext;
> + };
> + class WindowListenerMultiplexer : F, public D
> + {
> +   void acquire () { acquire (); }
> + };
> + class C
> + {
> +   void createPeer () throw ();
> +   WindowListenerMultiplexer maWindowListeners;
> + };
> + class FmXGridPeer
> + {
> + public:
> +

[gomp4] backport fortran array reduction changes

2015-12-03 Thread Cesar Philippidis
This patch backports the recent array reduction changes in trunk to
gomp-4_0-branch. It's mostly straightforward, except I couldn't include
changes to reduction-2.f95 because the gimplifier is reordering the loop
clauses slightly differently in trunk and gomp4. I'm not sure why. Thomas,
that's something to keep in mind next time you do a trunk merge.

I've applied this patch to gomp-4_0-branch.

Cesar
2015-12-03  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (gfc_match_omp_clauses): Allow subarrays for acc reductions.
	(resolve_omp_clauses): Error on any acc reductions on arrays.

	gcc/testsuite/
	* gfortran.dg/goacc/array-reduction.f90: New test.
	* gfortran.dg/goacc/assumed.f95: Update expected diagnostics.
	* gfortran.dg/goacc/coarray.f95: Likewise.
	* gfortran.dg/goacc/coarray_2.f90: Likewise.
	* gfortran.dg/goacc/reduction.f95: Likewise.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 0e87f54..e7f61f2 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -997,7 +997,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 
 	  if (gfc_match_omp_variable_list (" :",
 	   &c->lists[OMP_LIST_REDUCTION],
-	   false, NULL, &head) == MATCH_YES)
+	   false, NULL, &head, openacc)
+	  == MATCH_YES)
 	{
 	  gfc_omp_namelist *n;
 	  if (rop == OMP_REDUCTION_NONE)
@@ -3429,6 +3430,11 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 		   n->sym->name, &n->where);
 	  else
 	n->sym->mark = 1;
+
+	  /* OpenACC does not support reductions on arrays.  */
+	  if (n->sym->as)
+	gfc_error ("Array %qs is not permitted in reduction at %L",
+		   n->sym->name, &n->where);
 	}
 }
   
diff --git a/gcc/testsuite/gfortran.dg/goacc/array-reduction.f90 b/gcc/testsuite/gfortran.dg/goacc/array-reduction.f90
new file mode 100644
index 000..d71c400
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/array-reduction.f90
@@ -0,0 +1,74 @@
+program test
+  implicit none
+  integer a(10), i
+
+  a(:) = 0
+  
+  ! Array reductions.
+  
+  !$acc parallel reduction (+:a) ! { dg-error "Array 'a' is not permitted in reduction" }
+  do i = 1, 10
+ a = a + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel
+  !$acc loop reduction (+:a) ! { dg-error "Array 'a' is not permitted in reduction" }
+  do i = 1, 10
+ a = a + 1
+  end do
+  !$acc end parallel
+
+  !$acc kernels
+  !$acc loop reduction (+:a) ! { dg-error "Array 'a' is not permitted in reduction" }
+  do i = 1, 10
+ a = a + 1
+  end do
+  !$acc end kernels
+
+  ! Subarray reductions.
+  
+  !$acc parallel reduction (+:a(1:5)) ! { dg-error "Array 'a' is not permitted in reduction" }
+  do i = 1, 10
+ a = a + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel
+  !$acc loop reduction (+:a(1:5)) ! { dg-error "Array 'a' is not permitted in reduction" }
+  do i = 1, 10
+ a = a + 1
+  end do
+  !$acc end parallel
+
+  !$acc kernels
+  !$acc loop reduction (+:a(1:5)) ! { dg-error "Array 'a' is not permitted in reduction" }
+  do i = 1, 10
+ a = a + 1
+  end do
+  !$acc end kernels
+
+  ! Reductions on array elements.
+  
+  !$acc parallel reduction (+:a(1)) ! { dg-error "Array 'a' is not permitted in reduction" }
+  do i = 1, 10
+ a(1) = a(1) + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel
+  !$acc loop reduction (+:a(1)) ! { dg-error "Array 'a' is not permitted in reduction" }
+  do i = 1, 10
+ a(1) = a(1) + 1
+  end do
+  !$acc end parallel
+
+  !$acc kernels
+  !$acc loop reduction (+:a(1)) ! { dg-error "Array 'a' is not permitted in reduction" }
+  do i = 1, 10
+ a(1) = a(1) + 1
+  end do
+  !$acc end kernels
+  
+  print *, a
+end program test
diff --git a/gcc/testsuite/gfortran.dg/goacc/assumed.f95 b/gcc/testsuite/gfortran.dg/goacc/assumed.f95
index 3287241..4efe5a2 100644
--- a/gcc/testsuite/gfortran.dg/goacc/assumed.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/assumed.f95
@@ -45,3 +45,6 @@ contains
 !$acc update self (a) ! { dg-error "Assumed rank" }
   end subroutine assumed_rank
 end module test
+
+! { dg-error "Array 'a' is not permitted in reduction" "" { target "*-*-*" } 18 }
+! { dg-error "Array 'a' is not permitted in reduction" "" { target "*-*-*" } 39 }
diff --git a/gcc/testsuite/gfortran.dg/goacc/coarray.f95 b/gcc/testsuite/gfortran.dg/goacc/coarray.f95
index d2f10d5..932e1f7 100644
--- a/gcc/testsuite/gfortran.dg/goacc/coarray.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/coarray.f95
@@ -2,8 +2,6 @@
 ! { dg-additional-options "-fcoarray=single" }
 !
 ! PR fortran/63861
-! { dg-xfail-if "" { *-*-* } }
-! { dg-excess-errors "TODO" }
 
 module test
 contains
@@ -20,7 +18,7 @@ contains
 !$acc end parallel
 !$acc host_data use_device (a)
 !$acc end host_data
-!$acc parallel loop reduction(+:a)
+!$acc parallel loop reduction(+:a) ! { dg-error "Array 'a' is not permitted in reduction" }
 do i = 1,5
 enddo
 !$acc end parallel loop
diff --git a/gcc/testsuite/gfortran.dg/goacc/

[RFA][PATCH] Run CFG cleanups after reassociation as needed

2015-12-03 Thread Jeff Law

This is something I noticed while working on fixing 67816.

Essentially I was seeing trivially true or trivially false conditionals 
left in the IL for DOM to clean up.


While DOM can and will clean that crud up, a trivially true or 
trivially false conditional ought to be detected and cleaned up by 
cleanup_cfg.


It turns out the reassociation pass does not schedule a CFG cleanup even 
in cases where it optimizes a conditional to TRUE or FALSE.


Bubbling up an indicator that we optimized away a conditional and using 
that to trigger a CFG cleanup is trivial.
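
The bubbling described above can be sketched in plain C.  This is a toy
model only, not the actual tree-ssa-reassoc.c code: the toy_* names, the
TOY_TODO_CLEANUP_CFG flag and the block layout are hypothetical stand-ins
for maybe_optimize_range_tests, reassociate_bb/do_reassoc, execute_reassoc
and TODO_cleanup_cfg.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TODO_cleanup_cfg.  */
#define TOY_TODO_CLEANUP_CFG 1u

/* A toy basic block: records whether optimizing its range tests would
   fold the final conditional to a constant true/false.  */
struct toy_bb
{
  bool cond_folds_to_constant;
};

/* Models maybe_optimize_range_tests: returns true if the block's
   conditional was optimized to a constant.  */
static bool
toy_optimize_range_tests (const struct toy_bb *bb)
{
  return bb->cond_folds_to_constant;
}

/* Models reassociate_bb/do_reassoc: accumulate the results of all
   blocks so a single "cleanup needed" answer reaches the caller.  */
static bool
toy_reassociate (const struct toy_bb *bbs, size_t n)
{
  bool cfg_cleanup_needed = false;
  for (size_t i = 0; i < n; i++)
    cfg_cleanup_needed |= toy_optimize_range_tests (&bbs[i]);
  return cfg_cleanup_needed;
}

/* Models execute_reassoc: request a CFG cleanup only when some
   conditional was folded.  */
unsigned int
toy_execute_reassoc (const struct toy_bb *bbs, size_t n)
{
  return toy_reassociate (bbs, n) ? TOY_TODO_CLEANUP_CFG : 0;
}
```

The point of returning a flag instead of always scheduling the cleanup is
that the common case, where nothing folds, pays nothing extra.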


While I have a slight preference to see this fix in GCC 6, if folks 
object and want this to wait for GCC 7 stage1, I'd understand.


Bootstrapped and regression tested on x86_64-linux-gnu.

OK for the trunk?

Thanks,
Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 04dbcb0..61a5e54 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2015-12-03  Jeff Law  
+
+   * tree-ssa-reassoc.c (maybe_optimize_range_tests): Return boolean
+   indicating if a gimple conditional was optimized to true/false.
+   (reassociate_bb): Bubble up return value from
+   maybe_optimize_range_tests.
+   (do_reassoc): Similarly, but for reassociate_bb.
+   (execute_reassoc): Return TODO_cleanup_cfg as needed.
+
 2015-11-27  Jiri Engelthaler  
 
PR driver/68029
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 4e62a06..893aab1 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2015-12-02  Jeff Law  
+
+   * gcc.dg/tree-ssa/reassoc-43.c: New test.
+
 2015-12-02  Andreas Krebbel  
 
* gcc.dg/optimize-bswapdi-1.c: Force using -mzarch on s390 and
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-43.c 
b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-43.c
new file mode 100644
index 000..ea44f30
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-43.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc -w" } */
+
+typedef union tree_node *tree;
+enum cpp_ttype { CPP_COLON, CPP_SEMICOLON, CPP_CLOSE_BRACE, CPP_COMMA };
+enum rid { RID_STATIC = 0, RID_ATTRIBUTE, };
+typedef struct c_token
+{
+  enum cpp_ttype type:8;
+}
+c_token;
+typedef struct c_parser
+{
+  c_token tokens[2];
+  short tokens_avail;
+}
+c_parser;
+__inline__ c_token *
+c_parser_peek_token (c_parser * parser)
+{
+  if (parser->tokens_avail == 0)
+{
+  parser->tokens_avail = 1;
+}
+  return &parser->tokens[0];
+}
+
+__inline__ unsigned char
+c_parser_next_token_is (c_parser * parser, enum cpp_ttype type)
+{
+  return c_parser_peek_token (parser)->type == type;
+}
+
+void
+c_parser_translation_unit (c_parser * parser)
+{
+  tree prefix_attrs;
+  tree all_prefix_attrs;
+  while (1)
+{
+  if (c_parser_next_token_is (parser, CPP_COLON)
+ || c_parser_next_token_is (parser, CPP_COMMA)
+ || c_parser_next_token_is (parser, CPP_SEMICOLON)
+ || c_parser_next_token_is (parser, CPP_CLOSE_BRACE)
+ || c_parser_next_token_is_keyword (parser, RID_ATTRIBUTE))
+   {
+ if (c_parser_next_token_is_keyword (parser, RID_ATTRIBUTE))
+   all_prefix_attrs =
+ chainon (c_parser_attributes (parser), prefix_attrs);
+   }
+}
+}
+/* { dg-final { scan-tree-dump-not "0 != 0" "reassoc2"} } */
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index dfd0da1..315b0bf 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -2976,9 +2976,15 @@ struct inter_bb_range_test_entry
   unsigned int first_idx, last_idx;
 };
 
-/* Inter-bb range test optimization.  */
+/* Inter-bb range test optimization.
 
-static void
+   Returns TRUE if a gimple conditional is optimized to a true/false,
+   otherwise return FALSE.
+
+   This indicates to the caller that it should run a CFG cleanup pass
+   once reassociation is completed.  */
+
+static bool
 maybe_optimize_range_tests (gimple *stmt)
 {
   basic_block first_bb = gimple_bb (stmt);
@@ -2990,6 +2996,7 @@ maybe_optimize_range_tests (gimple *stmt)
   auto_vec ops;
   auto_vec bbinfo;
   bool any_changes = false;
+  bool cfg_cleanup_needed = false;
 
   /* Consider only basic blocks that end with GIMPLE_COND or
  a cast statement satisfying final_range_test_p.  All
@@ -2998,15 +3005,15 @@ maybe_optimize_range_tests (gimple *stmt)
   if (gimple_code (stmt) == GIMPLE_COND)
 {
   if (EDGE_COUNT (first_bb->succs) != 2)
-   return;
+   return cfg_cleanup_needed;
 }
   else if (final_range_test_p (stmt))
 other_bb = single_succ (first_bb);
   else
-return;
+return cfg_cleanup_needed;
 
   if (stmt_could_throw_p (stmt))
-return;
+return cfg_cleanup_needed;
 
   /* As relative ordering of post-dominator sons isn't fixed,
  maybe_optimize_range_tests can be called first on any
@@ -3030,14 +3037,14 @@ maybe_optimize_range_tests (gimple *stmt)
   /* As non-GIMPLE_COND last stmt always terminates the range,
   

C PATCH for c/68668 (grokdeclarator and wrong type of PARM_DECL)

2015-12-03 Thread Marek Polacek
This ought to fix the fallout from PR c/68162 fix.  Here the problem is that
grokdeclarator created a wrong type for PARM_DECL "p".  It created this decl
with type "const int[] *" while it should be "const int *".

I think the problem is that we weren't using TREE_TYPE on orig_qual_type and
thus c_build_qualified_type and subsequent c_build_pointer_type might create
a bogus type.  So when we're transferring const-ness of an array into that of
type pointed to, use TREE_TYPE not only of "type", but even of the orig qual
type.
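
The intended parameter type can be checked from plain C.  A minimal sketch,
assuming a C11 compiler (it relies on _Generic); the helper name
param_is_pointer_to_const_int is made up for illustration.  It uses the same
typedef as the new test and reports whether the adjusted parameter has type
"const int *".

```c
/* Same typedef as in the pr68668.c test: an array of const int.  */
typedef const int T[];

/* A parameter declared with array type is adjusted to a pointer to the
   element type, so P is expected to have type "const int *".
   Returns 1 if it does, 0 otherwise.  */
int
param_is_pointer_to_const_int (T p)
{
  return _Generic (p, const int *: 1, default: 0);
}
```

This only inspects the type the front end assigns to the parameter; it does
not reproduce the ICE itself.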

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-12-03  Marek Polacek  

PR c/68668
* c-decl.c (grokdeclarator): When creating a PARM_DECL of ARRAY_TYPE,
use TREE_TYPE of orig_qual_type.

* gcc.dg/pr68668.c: New test.

diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index 9ad8219..0edff2a 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -6417,6 +6417,8 @@ grokdeclarator (const struct c_declarator *declarator,
  {
/* Transfer const-ness of array into that of type pointed to.  */
type = TREE_TYPE (type);
+   if (orig_qual_type != NULL_TREE)
+ orig_qual_type = TREE_TYPE (orig_qual_type);
if (type_quals)
  type = c_build_qualified_type (type, type_quals, orig_qual_type,
 orig_qual_indirect);
diff --git gcc/testsuite/gcc.dg/pr68668.c gcc/testsuite/gcc.dg/pr68668.c
index e69de29..d144fb6 100644
--- gcc/testsuite/gcc.dg/pr68668.c
+++ gcc/testsuite/gcc.dg/pr68668.c
@@ -0,0 +1,10 @@
+/* PR c/68668 */
+/* { dg-do compile } */
+
+typedef const int T[];
+
+int
+fn1 (T p)
+{
+  return p[0];
+}

Marek


[PATCH] Add options -finstrument-functions-include-{file,function}-list

2015-12-03 Thread Andi Drebes
By default -finstrument-functions instruments all functions. To limit
instrumentation to certain functions or files it is necessary to
specify the complement using -finstrument-functions-exclude-file-list
or -finstrument-functions-exclude-function-list.

The new options -finstrument-functions-include-file-list and
-finstrument-functions-include-function-list make the specification of
the complement unnecessary by allowing the user to limit
instrumentation to a set of file names and functions.
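
The intended decision can be sketched as follows.  This is an illustration
of the semantics described in this mail and in the documentation part of the
patch (substring matching, exclusions taking precedence), not the gimplify.c
code; struct toy_filter, matches_any and toy_should_instrument are
hypothetical names, and treating empty include lists as "instrument
everything" is an assumption based on the default behaviour.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the four option lists.  */
struct toy_filter
{
  const char *const *include_files;
  size_t n_include_files;
  const char *const *include_functions;
  size_t n_include_functions;
  const char *const *exclude_files;
  size_t n_exclude_files;
  const char *const *exclude_functions;
  size_t n_exclude_functions;
};

/* Returns true if any of the N entries of LIST occurs as a substring
   of S.  */
static bool
matches_any (const char *s, const char *const *list, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (strstr (s, list[i]) != NULL)
      return true;
  return false;
}

/* Decide whether function FN defined in FILE gets instrumented.  */
bool
toy_should_instrument (const struct toy_filter *f, const char *file,
                       const char *fn)
{
  /* Exclusions take precedence over inclusions.  */
  if (matches_any (file, f->exclude_files, f->n_exclude_files)
      || matches_any (fn, f->exclude_functions, f->n_exclude_functions))
    return false;

  /* Assumption: with no include list, everything else is instrumented.  */
  if (f->n_include_files == 0 && f->n_include_functions == 0)
    return true;

  /* Otherwise a match in either include list is sufficient.  */
  return (matches_any (file, f->include_files, f->n_include_files)
          || matches_any (fn, f->include_functions, f->n_include_functions));
}
```

With include files "/foo/bar" and "baz" and include function "fn1", a
function g in /foo/bar/a.c and a function fn1 in any file are instrumented,
while an unlisted function in an unlisted file is not.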
---
 gcc/common.opt   | 16 ++-
 gcc/doc/invoke.texi  | 52 
 gcc/gimplify.c   | 51 ---
 gcc/opts.c   | 10 +++
 gcc/testsuite/ChangeLog  | 10 +++
 gcc/testsuite/gcc.dg/instrument-10.c |  7 +
 gcc/testsuite/gcc.dg/instrument-4.c  |  7 +
 gcc/testsuite/gcc.dg/instrument-5.c  |  7 +
 gcc/testsuite/gcc.dg/instrument-6.c  |  7 +
 gcc/testsuite/gcc.dg/instrument-7.c  |  7 +
 gcc/testsuite/gcc.dg/instrument-8.c  |  7 +
 gcc/testsuite/gcc.dg/instrument-9.c  |  7 +
 12 files changed, 172 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/instrument-10.c
 create mode 100644 gcc/testsuite/gcc.dg/instrument-4.c
 create mode 100644 gcc/testsuite/gcc.dg/instrument-5.c
 create mode 100644 gcc/testsuite/gcc.dg/instrument-6.c
 create mode 100644 gcc/testsuite/gcc.dg/instrument-7.c
 create mode 100644 gcc/testsuite/gcc.dg/instrument-8.c
 create mode 100644 gcc/testsuite/gcc.dg/instrument-9.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 3eb520e..ac797b3 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -97,7 +97,7 @@ int flag_gen_aux_info = 0
 Variable
 int flag_shlib
 
-; These two are really VEC(char_p,heap) *.
+; These are really VEC(char_p,heap) *.
 
 Variable
 void *flag_instrument_functions_exclude_functions
@@ -105,6 +105,12 @@ void *flag_instrument_functions_exclude_functions
 Variable
 void *flag_instrument_functions_exclude_files
 
+Variable
+void *flag_instrument_functions_include_functions
+
+Variable
+void *flag_instrument_functions_include_files
+
 ; Generic structs (e.g. templates not explicitly specialized)
 ; may not have a compilation unit associated with them, and so
 ; may need to be treated differently from ordinary structs.
@@ -1477,6 +1483,14 @@ finstrument-functions-exclude-file-list=
 Common RejectNegative Joined
 -finstrument-functions-exclude-file-list=filename,...  Do not instrument 
functions listed in files.
 
+finstrument-functions-include-function-list=
+Common RejectNegative Joined
+-finstrument-functions-include-function-list=name,...  Only instrument listed 
functions.
+
+finstrument-functions-include-file-list=
+Common RejectNegative Joined
+-finstrument-functions-include-file-list=filename,...  Only instrument 
functions listed in files.
+
 fipa-cp
 Common Report Var(flag_ipa_cp) Optimization
 Perform interprocedural constant propagation.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 53f1fe2..ba9a3bd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1150,6 +1150,8 @@ See S/390 and zSeries Options.
 -finhibit-size-directive  -finstrument-functions @gol
 -finstrument-functions-exclude-function-list=@var{sym},@var{sym},@dots{} @gol
 -finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{} @gol
+-finstrument-functions-include-function-list=@var{sym},@var{sym},@dots{} @gol
+-finstrument-functions-include-file-list=@var{file},@var{file},@dots{} @gol
 -fno-common  -fno-ident @gol
 -fpcc-struct-return  -fpic  -fPIC -fpie -fPIE -fno-plt @gol
 -fno-jump-tables @gol
@@ -24529,6 +24531,56 @@ of the function name, it is considered to be a match.  
For C99 and C++
 extended identifiers, the function name must be given in UTF-8, not
 using universal character names.
 
+@item -finstrument-functions-include-file-list=@var{file},@var{file},@dots{}
+@opindex finstrument-functions-include-file-list
+
+Limit function instrumentation to functions from files specified in
+the list. The matching of file names is identical to the matching of
+@option{-finstrument-functions-exclude-file-list}. For example
+
+@smallexample
+-finstrument-functions-include-file-list=/foo/bar,baz
+@end smallexample
+
+@noindent
+includes only functions defined in files whose pathnames contain
+@file{/foo/bar} or @file{baz}. Additional functions can be added by
+using the option
+@option{-finstrument-functions-include-function-list}. For example
+
+@smallexample
+-finstrument-functions-include-file-list=/foo/bar,baz
+-finstrument-functions-include-function-list=fn1,fn2
+@end smallexample
+includes functions defined in files whose pathnames contain
+@file{/foo/bar} or @file{baz} as well as functions whose user-readable
+names contain @code{fn1} or @code{fn2}.
+
+The option can also be combined with exclusions, where exclusions take
+precedence. For example
+
+@smallexample
+-fins

Re: C PATCH for c/68668 (grokdeclarator and wrong type of PARM_DECL)

2015-12-03 Thread Joseph Myers
On Thu, 3 Dec 2015, Marek Polacek wrote:

> This ought to fix the fallout from PR c/68162 fix.  Here the problem is that
> grokdeclarator created a wrong type for PARM_DECL "p".  It created this decl
> with type "const int[] *" while it should be "const int *".
> 
> I think the problem is that we weren't using TREE_TYPE on orig_qual_type and
> thus c_build_qualified_type and subsequent c_build_pointer_type might create
> a bogus type.  So when we're transferring const-ness of an array into that of
> type pointed to, use TREE_TYPE not only of "type", but even of the orig qual
> type.

I think you also need to decrement orig_qual_indirect, which counts the 
number of levels of array type derivation from orig_qual_type.

-- 
Joseph S. Myers
jos...@codesourcery.com


[hsa] Useful checking assert in scan_omp_1_op

2015-12-03 Thread Martin Jambor
Hi,

I have found adding the following checking assert very useful
when debugging omp lowering issues, so I have added it to the hsa
branch.  I hope that nobody will mind, but it of course is not an
essential thing to have if someone does.

Thanks,

Martin

2015-12-03  Martin Jambor  

* omp-low.c (scan_omp_1_op): Add checking assert that we are not
re-mapping to ERROR_MARK.
---
 gcc/omp-low.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 8854df7..05d8901 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3731,7 +3731,11 @@ scan_omp_1_op (tree *tp, int *walk_subtrees, void *data)
 case LABEL_DECL:
 case RESULT_DECL:
   if (ctx)
-   *tp = remap_decl (t, &ctx->cb);
+   {
+ tree repl = remap_decl (t, &ctx->cb);
+ gcc_checking_assert (TREE_CODE (repl) != ERROR_MARK);
+ *tp = repl;
+   }
   break;
 
 default:
-- 
2.6.3



Re: [PATCH] Add options -finstrument-functions-include-{file,function}-list

2015-12-03 Thread Bert Wesarg
Hi,

It would be better to write your own instrumentation plug-in and do the
filtering on your own. The plug-in interface has existed since 4.5, so you
already have a much greater version base that can support your feature than
some future version of GCC which may have this patch. While we didn't
announce it here on GCC, we already maintain such a plug-in in Score-P
[1], and the overhead is also much lower (we also have a runtime
filter), we do not instrument inlined functions and functions from
system headers by default, and we do not need debug symbols to get
function names.

Best,
Bert

[1] www.score-p.org

On Thu, Dec 3, 2015 at 7:06 PM, Andi Drebes  wrote:
> By default -finstrument-functions instruments all functions. To limit
> instrumentation to certain functions or files it is necessary to
> specify the complement using -finstrument-functions-exclude-file-list
> or -finstrument-functions-exclude-function-list.
>
> The new options -finstrument-functions-include-file-list and
> -finstrument-functions-include-function-list make the specification of
> the complement unnecessary by allowing the user to limit
> instrumentation to a set of file names and functions.
> ---
>  gcc/common.opt   | 16 ++-
>  gcc/doc/invoke.texi  | 52 
> 
>  gcc/gimplify.c   | 51 ---
>  gcc/opts.c   | 10 +++
>  gcc/testsuite/ChangeLog  | 10 +++
>  gcc/testsuite/gcc.dg/instrument-10.c |  7 +
>  gcc/testsuite/gcc.dg/instrument-4.c  |  7 +
>  gcc/testsuite/gcc.dg/instrument-5.c  |  7 +
>  gcc/testsuite/gcc.dg/instrument-6.c  |  7 +
>  gcc/testsuite/gcc.dg/instrument-7.c  |  7 +
>  gcc/testsuite/gcc.dg/instrument-8.c  |  7 +
>  gcc/testsuite/gcc.dg/instrument-9.c  |  7 +
>  12 files changed, 172 insertions(+), 16 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/instrument-10.c
>  create mode 100644 gcc/testsuite/gcc.dg/instrument-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/instrument-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/instrument-6.c
>  create mode 100644 gcc/testsuite/gcc.dg/instrument-7.c
>  create mode 100644 gcc/testsuite/gcc.dg/instrument-8.c
>  create mode 100644 gcc/testsuite/gcc.dg/instrument-9.c
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 3eb520e..ac797b3 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -97,7 +97,7 @@ int flag_gen_aux_info = 0
>  Variable
>  int flag_shlib
>
> -; These two are really VEC(char_p,heap) *.
> +; These are really VEC(char_p,heap) *.
>
>  Variable
>  void *flag_instrument_functions_exclude_functions
> @@ -105,6 +105,12 @@ void *flag_instrument_functions_exclude_functions
>  Variable
>  void *flag_instrument_functions_exclude_files
>
> +Variable
> +void *flag_instrument_functions_include_functions
> +
> +Variable
> +void *flag_instrument_functions_include_files
> +
>  ; Generic structs (e.g. templates not explicitly specialized)
>  ; may not have a compilation unit associated with them, and so
>  ; may need to be treated differently from ordinary structs.
> @@ -1477,6 +1483,14 @@ finstrument-functions-exclude-file-list=
>  Common RejectNegative Joined
>  -finstrument-functions-exclude-file-list=filename,...  Do not instrument 
> functions listed in files.
>
> +finstrument-functions-include-function-list=
> +Common RejectNegative Joined
> +-finstrument-functions-include-function-list=name,...  Only instrument 
> listed functions.
> +
> +finstrument-functions-include-file-list=
> +Common RejectNegative Joined
> +-finstrument-functions-include-file-list=filename,...  Only instrument 
> functions listed in files.
> +
>  fipa-cp
>  Common Report Var(flag_ipa_cp) Optimization
>  Perform interprocedural constant propagation.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 53f1fe2..ba9a3bd 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1150,6 +1150,8 @@ See S/390 and zSeries Options.
>  -finhibit-size-directive  -finstrument-functions @gol
>  -finstrument-functions-exclude-function-list=@var{sym},@var{sym},@dots{} @gol
>  -finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{} @gol
> +-finstrument-functions-include-function-list=@var{sym},@var{sym},@dots{} @gol
> +-finstrument-functions-include-file-list=@var{file},@var{file},@dots{} @gol
>  -fno-common  -fno-ident @gol
>  -fpcc-struct-return  -fpic  -fPIC -fpie -fPIE -fno-plt @gol
>  -fno-jump-tables @gol
> @@ -24529,6 +24531,56 @@ of the function name, it is considered to be a 
> match.  For C99 and C++
>  extended identifiers, the function name must be given in UTF-8, not
>  using universal character names.
>
> +@item -finstrument-functions-include-file-list=@var{file},@var{file},@dots{}
> +@opindex finstrument-functions-include-file-list
> +
> +Limit function instrumentation to functions from files specified in
> +the list. The matching of file names is identical to the matchi

[hsa] Make copy_gimple_seq_and_replace_locals copy seqs in omp clauses

2015-12-03 Thread Martin Jambor
Hi,

this is a fix to the last "last" ICE of the hsa branch.  The problem
turned out not to be in the gridification itself but, depending on your
point of view, in the gimple and tree walking infrastructure or in
function copy_gimple_seq_and_replace_locals from tree-inline.c on
which hsa gridification relies.

The issue is that in between gimplification and omplow pass, there can
be gimple sequences attached to OMP_CLAUSE trees that are attached to
omp statements and that are neither copied by gimple_seq_copy nor
walked by walk_gimple_seq.

While the correct solution would probably be to extend tree and gimple
walkers to handle them, that would be a big change.  I talked
with Jakub about this yesterday on IRC and he suggested that I
enhance the internal walkers of copy_gimple_seq_and_replace_locals
to deal with this situation.  Even though that leaves gimple_seq_copy,
walk_gimple_seq and others technically incorrect, that is what I
have already committed to the branch.
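
The shape of the problem can be shown with a toy model.  The sketch below
is not GCC code: struct toy_stmt and both copy functions are hypothetical,
with clause_seq standing in for a gimple sequence attached to an OMP_CLAUSE.
The first copier misses the side sequences, the second duplicates them
explicitly, which is the approach taken for the internal walkers.

```c
#include <stdlib.h>

/* Toy statement: a value, a link to the next statement, and an
   optional side sequence hanging off a "clause".  */
struct toy_stmt
{
  int value;
  struct toy_stmt *next;
  struct toy_stmt *clause_seq;
};

/* Copies the chain of statements but leaves every clause_seq pointing
   at the original side sequence, so copy and original share it.
   Nothing is freed in this sketch.  */
struct toy_stmt *
toy_seq_copy (const struct toy_stmt *seq)
{
  struct toy_stmt *head = NULL, **tail = &head;
  for (; seq != NULL; seq = seq->next)
    {
      struct toy_stmt *copy = malloc (sizeof *copy);
      if (copy == NULL)
        abort ();
      *copy = *seq;
      copy->next = NULL;
      *tail = copy;
      tail = &copy->next;
    }
  return head;
}

/* Copies the chain and then explicitly duplicates each side sequence,
   recursively, so nothing is shared with the original.  */
struct toy_stmt *
toy_seq_copy_with_clauses (const struct toy_stmt *seq)
{
  struct toy_stmt *head = toy_seq_copy (seq);
  for (struct toy_stmt *s = head; s != NULL; s = s->next)
    if (s->clause_seq != NULL)
      s->clause_seq = toy_seq_copy_with_clauses (s->clause_seq);
  return head;
}
```

Any remapping applied to the first kind of copy would either skip the side
sequence or modify the original one, which is why the explicit duplication
is needed.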

Any feedback is of course very much appreciated,

Martin


2015-12-03  Martin Jambor  

* tree-inline.c (duplicate_remap_omp_clause_seq): New function.
(replace_locals_op): Duplicate gimple sequences in OMP clauses.

---
 gcc/tree-inline.c | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index ebab189..15141dc 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -5116,6 +5116,8 @@ mark_local_labels_stmt (gimple_stmt_iterator *gsip,
   return NULL_TREE;
 }
 
+static gimple_seq duplicate_remap_omp_clause_seq (gimple_seq seq,
+ struct walk_stmt_info *wi);
 
 /* Called via walk_gimple_seq by copy_gimple_seq_and_replace_local.
Using the splay_tree pointed to by ST (which is really a `splay_tree'),
@@ -5160,6 +5162,35 @@ replace_locals_op (tree *tp, int *walk_subtrees, void 
*data)
  TREE_OPERAND (expr, 3) = NULL_TREE;
}
 }
+  else if (TREE_CODE (expr) == OMP_CLAUSE)
+{
+  /* Before the omplower pass completes, some OMP clauses can contain
+sequences that are neither copied by gimple_seq_copy nor walked by
+walk_gimple_seq.  To make copy_gimple_seq_and_replace_locals work even
+in those situations, we have to copy and process them explicitely.  */
+
+  if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_LASTPRIVATE)
+   {
+ gimple_seq seq = OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (expr);
+ seq = duplicate_remap_omp_clause_seq (seq, wi);
+ OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (expr) = seq;
+   }
+  else if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_LINEAR)
+   {
+ gimple_seq seq = OMP_CLAUSE_LINEAR_GIMPLE_SEQ (expr);
+ seq = duplicate_remap_omp_clause_seq (seq, wi);
+ OMP_CLAUSE_LINEAR_GIMPLE_SEQ (expr) = seq;
+   }
+  else if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_REDUCTION)
+   {
+ gimple_seq seq = OMP_CLAUSE_REDUCTION_GIMPLE_INIT (expr);
+ seq = duplicate_remap_omp_clause_seq (seq, wi);
+ OMP_CLAUSE_REDUCTION_GIMPLE_INIT (expr) = seq;
+ seq = OMP_CLAUSE_REDUCTION_GIMPLE_MERGE (expr);
+ seq = duplicate_remap_omp_clause_seq (seq, wi);
+ OMP_CLAUSE_REDUCTION_GIMPLE_MERGE (expr) = seq;
+   }
+}
 
   /* Keep iterating.  */
   return NULL_TREE;
@@ -5200,6 +5231,18 @@ replace_locals_stmt (gimple_stmt_iterator *gsip,
   return NULL_TREE;
 }
 
+/* Create a copy of SEQ and remap all decls in it.  */
+
+static gimple_seq
+duplicate_remap_omp_clause_seq (gimple_seq seq, struct walk_stmt_info *wi)
+{
+  /* If there are any labels in OMP sequences, they can be only referred to in
+ the sequence itself and therefore we can do both here.  */
+  walk_gimple_seq (seq, mark_local_labels_stmt, NULL, wi);
+  gimple_seq copy = gimple_seq_copy (seq);
+  walk_gimple_seq (copy, replace_locals_stmt, replace_locals_op, wi);
+  return copy;
+}
 
 /* Copies everything in SEQ and replaces variables and labels local to
current_function_decl.  */
-- 
2.6.3



Re: Add fuzzing coverage support

2015-12-03 Thread Dmitry Vyukov
I've attached updated patch (also reuploaded
https://codereview.appspot.com/280140043).
Fixed ChangeLog.
Added invoke.texi.
Fixed style issues.

The function is defined only in the kernel at the moment. Here is my patch:
https://github.com/dvyukov/linux/commit/f86eda0c895c47ea02ee37e981aeade7b03014d7
It has not been mailed yet, because the kernel asan people asked me to
submit to gcc first, then to the kernel.

It will also be supported by libsanitizer later (Kostya?). But it is
not yet there.
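
For readers who want to try the instrumentation outside the kernel, here
is a minimal user-space sketch of what a definition of the callback could
look like. It is only an illustration, not the kernel or libsanitizer
implementation: the buffer, its size and the names g_pcs, g_num_pcs and
kMaxPcs are made up for this example. Only the callback name and its
void(void) signature come from the patch description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size log of observed program counters.
static constexpr std::size_t kMaxPcs = 1024;
static std::uintptr_t g_pcs[kMaxPcs];
static std::size_t g_num_pcs = 0;

// Callback that instrumented code calls once per basic block.
extern "C" void __sanitizer_cov_trace_pc(void) {
  // The return address identifies the basic block that made the call.
  auto pc = reinterpret_cast<std::uintptr_t>(__builtin_return_address(0));
  if (g_num_pcs < kMaxPcs)
    g_pcs[g_num_pcs++] = pc;  // silently drop PCs once the log is full
}
```

A fuzzer would typically reset g_num_pcs before each input and compare the
recorded PCs afterwards. The file defining the callback should itself be
built without the coverage flag, otherwise the callback would recurse.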

Regarding plugins, we did tsan first as a gcc plugin. It was difficult
to support, difficult to use, and difficult to distribute. I have maintained
this patch for a month, and two people have already complained that it does
not build (because they synched to slightly different revisions).

Index: ChangeLog
===
--- ChangeLog	(revision 231234)
+++ ChangeLog	(working copy)
@@ -1,3 +1,15 @@
+2015-12-03  Dmitry Vyukov  
+
+	* sancov.c: New file.
+	* Makefile.in (OBJS): Add sancov.o.
+	* invoke.texi (-fsanitize-coverage=trace-pc): Describe.
+	* passes.def (sancov_pass): Add.
+	* tree-pass.h  (sancov_pass): Add.
+	* common.opt (-fsanitize-coverage=trace-pc): Add.
+	* sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_PC): Add.
+	* builtins.def (DEF_SANITIZER_BUILTIN): Enable for
+	flag_sanitize_coverage.
+
 2015-12-03  Evandro Menezes  
 
 	* config/aarch64/aarch64-cores.def: Use the Exynos M1 cost model.
@@ -360,7 +372,6 @@
 	* tree-ssa-structalias.c (find_func_aliases_for_builtin_call)
 	(find_func_clobbers, ipa_pta_execute): Handle BUILT_IN_GOACC_PARALLEL.
 
->>> .r231221
 2015-12-02  Segher Boessenkool  
 
 	* config/rs6000/rs6000.md (cstore_si_as_di): New expander.
Index: Makefile.in
===
--- Makefile.in	(revision 231234)
+++ Makefile.in	(working copy)
@@ -1427,6 +1427,7 @@
 	tsan.o \
 	ubsan.o \
 	sanopt.o \
+	sancov.o \
 	tree-call-cdce.o \
 	tree-cfg.o \
 	tree-cfgcleanup.o \
@@ -2400,6 +2401,7 @@
   $(srcdir)/ubsan.c \
   $(srcdir)/tsan.c \
   $(srcdir)/sanopt.c \
+  $(srcdir)/sancov.c \
   $(srcdir)/ipa-devirt.c \
   $(srcdir)/internal-fn.h \
   @all_gtfiles@
Index: builtins.def
===
--- builtins.def	(revision 231234)
+++ builtins.def	(working copy)
@@ -210,7 +210,8 @@
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\
 	   true, true, true, ATTRS, true, \
 	  (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \
-| SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT)))
+| SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT) \
+	   || flag_sanitize_coverage))
 
 #undef DEF_CILKPLUS_BUILTIN
 #define DEF_CILKPLUS_BUILTIN(ENUM, NAME, TYPE, ATTRS)  \
Index: common.opt
===
--- common.opt	(revision 231234)
+++ common.opt	(working copy)
@@ -225,6 +225,11 @@
 Variable
 unsigned int flag_sanitize_recover = SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT | SANITIZE_KERNEL_ADDRESS
 
+fsanitize-coverage=trace-pc
+Common Report Var(flag_sanitize_coverage)
+Enable coverage-guided fuzzing code instrumentation.
+Inserts call to __sanitizer_cov_trace_pc into every basic block.
+
 ; Flag whether a prefix has been added to dump_base_name
 Variable
 bool dump_base_name_prefixed = false
Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 231234)
+++ doc/invoke.texi	(working copy)
@@ -6135,6 +6135,11 @@
 @code{libubsan} library is not needed and is not linked in, so this
 is usable even in freestanding environments.
 
+@item -fsanitize-coverage=trace-pc
+@opindex fsanitize-coverage=trace-pc
+Enable coverage-guided fuzzing code instrumentation.
+Inserts call to __sanitizer_cov_trace_pc into every basic block.
+
 @item -fcheck-pointer-bounds
 @opindex fcheck-pointer-bounds
 @opindex fno-check-pointer-bounds
Index: passes.def
===
--- passes.def	(revision 231234)
+++ passes.def	(working copy)
@@ -237,6 +237,7 @@
   NEXT_PASS (pass_split_crit_edges);
   NEXT_PASS (pass_pre);
   NEXT_PASS (pass_sink_code);
+  NEXT_PASS (pass_sancov);
   NEXT_PASS (pass_asan);
   NEXT_PASS (pass_tsan);
   /* Pass group that runs when 1) enabled, 2) there are loops
@@ -346,6 +347,7 @@
  to forward object-size and builtin folding results properly.  */
   NEXT_PASS (pass_copy_prop);
   NEXT_PASS (pass_dce);
+  NEXT_PASS (pass_sancov);
   NEXT_PASS (pass_asan);
   NEXT_PASS (pass_tsan);
   /* ???  We do want some kind of loop invariant motion, but we possibly
@@ -369,6 +371,7 @@
   NEXT_PASS (pass_lower_vaarg);
   NEXT_PASS (pass_lower_vector);
   NEXT_PASS (pass_lower_complex_O0);
+  NEXT_PASS (pass_sancov_O0);
   NEXT_PASS (pass_asan_O0);
   NEXT_PASS (pass_tsan_O0);
   NEXT_PASS (pass_sanopt);
Index: sancov.c
=

Re: [hsa] Make copy_gimple_seq_and_replace_locals copy seqs in omp clauses

2015-12-03 Thread Jakub Jelinek
On Thu, Dec 03, 2015 at 07:26:20PM +0100, Martin Jambor wrote:
> this is a fix to the last "last" ICE of the hsa branch.  The problem
> turned out not to be in the gridification itself but, depending on your
> point of view, in the gimple and tree walking infrastructure or in
> function copy_gimple_seq_and_replace_locals from tree-inline.c on
> which hsa gridification relies.
> 
> The issue is that in between gimplification and omplow pass, there can
> be gimple sequences attached to OMP_CLAUSE trees that are attached to
> omp statements and that are neither copied by gimple_seq_copy nor
> walked by walk_gimple_seq.
> 
> While the correct solution would probably be to extend tree and gimple
> walkers to handle them, that would be a big change.  I have talked
> with Jakub about this yesterday on the IRC and he suggested that I
> enhance the internal walkers of copy_gimple_seq_and_replace_locals
> to deal with this situation.  Even though that leaves gimple_seq_copy,
> walk_gimple_seq and others technically incorrect, that is what I
> have done in the patch below, which fixes my last ICEs and which I
> have already committed to the branch.

The point is that those gimple_seqs are there only from gimplification
till omplower, and I believe nothing else for now cares about those.
> @@ -5200,6 +5231,18 @@ replace_locals_stmt (gimple_stmt_iterator *gsip,
>return NULL_TREE;
>  }
>  
> +/* Create a copy of SEQ and remap all decls in it.  */
> +
> +static gimple_seq
> +duplicate_remap_omp_clause_seq (gimple_seq seq, struct walk_stmt_info *wi)
> +{

I would have expected an early if (seq == NULL) return NULL; either here,
or in the callers (not doing anything in the common case when it is NULL).
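
As a rough illustration of the early exit, here is a self-contained
sketch. It does not use the real GIMPLE types: Stmt, copy_seq and
duplicate_remap_seq are hypothetical stand-ins, with copy_seq playing the
role of the mark/copy/remap work done in duplicate_remap_omp_clause_seq.

```cpp
#include <cassert>

// Hypothetical stand-in for a GIMPLE statement sequence: a singly linked list.
struct Stmt {
  int id;
  Stmt *next;
};

// Counts how often the comparatively expensive copy path actually runs.
static int copies_performed = 0;

// Deep-copies SEQ (stand-in for the walk + gimple_seq_copy + remap steps).
static Stmt *copy_seq(const Stmt *seq) {
  ++copies_performed;
  Stmt *head = nullptr;
  Stmt **tail = &head;
  for (; seq; seq = seq->next) {
    *tail = new Stmt{seq->id, nullptr};
    tail = &(*tail)->next;
  }
  return head;
}

// Early exit for the common case of a clause with no attached sequence.
static Stmt *duplicate_remap_seq(const Stmt *seq) {
  if (seq == nullptr)
    return nullptr;
  return copy_seq(seq);
}
```

With the guard in the helper, none of the three callers needs its own
check, and the walkers are never set up for an empty sequence.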

Jakub


Re: Add fuzzing coverage support

2015-12-03 Thread Dmitry Vyukov
On Thu, Dec 3, 2015 at 7:34 PM, Dmitry Vyukov  wrote:
> I've attached updated patch (also reuploaded
> https://codereview.appspot.com/280140043).
> Fixed ChangeLog.
> Added invoke.texi.
> Fixed style issues.
>
> The function is defined only in the kernel at the moment. Here is my patch:
> https://github.com/dvyukov/linux/commit/f86eda0c895c47ea02ee37e981aeade7b03014d7
> It has not been mailed yet, because the kernel asan people asked me to
> submit to gcc first, then to the kernel.
>
> It will also be supported by libsanitizer later (Kostya?). But it is
> not yet there.
>
> Regarding plugins, we did tsan first as a gcc plugin. It was difficult
> to support, difficult to use, and difficult to distribute. I have maintained
> this patch for a month, and two people have already complained that it does
> not build (because they synched to slightly different revisions).


Added missing:
  stmt = gsi_stmt (gsi);
I have now actually run the tests and compiled the kernel with it.

Index: ChangeLog
===
--- ChangeLog	(revision 231234)
+++ ChangeLog	(working copy)
@@ -1,3 +1,15 @@
+2015-12-03  Dmitry Vyukov  
+
+	* sancov.c: New file.
+	* Makefile.in (OBJS): Add sancov.o.
+	* invoke.texi (-fsanitize-coverage=trace-pc): Describe.
+	* passes.def (sancov_pass): Add.
+	* tree-pass.h  (sancov_pass): Add.
+	* common.opt (-fsanitize-coverage=trace-pc): Add.
+	* sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_PC): Add.
+	* builtins.def (DEF_SANITIZER_BUILTIN): Enable for
+	flag_sanitize_coverage.
+
 2015-12-03  Evandro Menezes  
 
 	* config/aarch64/aarch64-cores.def: Use the Exynos M1 cost model.
@@ -360,7 +372,6 @@
 	* tree-ssa-structalias.c (find_func_aliases_for_builtin_call)
 	(find_func_clobbers, ipa_pta_execute): Handle BUILT_IN_GOACC_PARALLEL.
 
->>> .r231221
 2015-12-02  Segher Boessenkool  
 
 	* config/rs6000/rs6000.md (cstore_si_as_di): New expander.
Index: Makefile.in
===
--- Makefile.in	(revision 231234)
+++ Makefile.in	(working copy)
@@ -1427,6 +1427,7 @@
 	tsan.o \
 	ubsan.o \
 	sanopt.o \
+	sancov.o \
 	tree-call-cdce.o \
 	tree-cfg.o \
 	tree-cfgcleanup.o \
@@ -2400,6 +2401,7 @@
   $(srcdir)/ubsan.c \
   $(srcdir)/tsan.c \
   $(srcdir)/sanopt.c \
+  $(srcdir)/sancov.c \
   $(srcdir)/ipa-devirt.c \
   $(srcdir)/internal-fn.h \
   @all_gtfiles@
Index: builtins.def
===
--- builtins.def	(revision 231234)
+++ builtins.def	(working copy)
@@ -210,7 +210,8 @@
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\
 	   true, true, true, ATTRS, true, \
 	  (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \
-| SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT)))
+| SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT) \
+	   || flag_sanitize_coverage))
 
 #undef DEF_CILKPLUS_BUILTIN
 #define DEF_CILKPLUS_BUILTIN(ENUM, NAME, TYPE, ATTRS)  \
Index: common.opt
===
--- common.opt	(revision 231234)
+++ common.opt	(working copy)
@@ -225,6 +225,11 @@
 Variable
 unsigned int flag_sanitize_recover = SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT | SANITIZE_KERNEL_ADDRESS
 
+fsanitize-coverage=trace-pc
+Common Report Var(flag_sanitize_coverage)
+Enable coverage-guided fuzzing code instrumentation.
+Inserts call to __sanitizer_cov_trace_pc into every basic block.
+
 ; Flag whether a prefix has been added to dump_base_name
 Variable
 bool dump_base_name_prefixed = false
Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 231234)
+++ doc/invoke.texi	(working copy)
@@ -6135,6 +6135,11 @@
 @code{libubsan} library is not needed and is not linked in, so this
 is usable even in freestanding environments.
 
+@item -fsanitize-coverage=trace-pc
+@opindex fsanitize-coverage=trace-pc
+Enable coverage-guided fuzzing code instrumentation.
+Inserts call to __sanitizer_cov_trace_pc into every basic block.
+
 @item -fcheck-pointer-bounds
 @opindex fcheck-pointer-bounds
 @opindex fno-check-pointer-bounds
Index: passes.def
===
--- passes.def	(revision 231234)
+++ passes.def	(working copy)
@@ -237,6 +237,7 @@
   NEXT_PASS (pass_split_crit_edges);
   NEXT_PASS (pass_pre);
   NEXT_PASS (pass_sink_code);
+  NEXT_PASS (pass_sancov);
   NEXT_PASS (pass_asan);
   NEXT_PASS (pass_tsan);
   /* Pass group that runs when 1) enabled, 2) there are loops
@@ -346,6 +347,7 @@
  to forward object-size and builtin folding results properly.  */
   NEXT_PASS (pass_copy_prop);
   NEXT_PASS (pass_dce);
+  NEXT_PASS (pass_sancov);
   NEXT_PASS (pass_asan);
   NEXT_PASS (pass_tsan);
   /* ???  We do want some kind of loop invariant motion, but we possibly
@@ -369,6 +371,7 @@
   NEXT_PASS (pass_lower_vaarg);
   NEXT_PASS (pass_lower_vector);
   NEXT_PASS (pass_lower_complex_O0);
+  NE

Re: [PATCH] Fix shrink-wrap bug with anticipating into loops (PR67778, PR68634)

2015-12-03 Thread Segher Boessenkool
On Thu, Dec 03, 2015 at 12:35:51PM +0100, Bernd Schmidt wrote:
> On 12/02/2015 07:21 PM, Segher Boessenkool wrote:
> >After shrink-wrapping has found the "tightest fit" for where to place
> >the prologue, it tries to move it earlier (so that frame saves are run
> >earlier) -- but without copying any more basic blocks.
> 
> Another question would be - is there really a good reason to do this at all?

I haven't actually benchmarked it to see if it in fact matters for
performance.  The original code did something similar, but perhaps not
for the same reasons.  The goal is to put the prologue as early as
possible while only putting it on paths that need it (the code before
here puts it as *late* as possible instead).

Moving the prologue earlier gives more free registers (the ones it saved)
in the blocks "skipped", so that late passes have more to work with.
More importantly, moving the prologue and the epilogue further apart
avoids some execution hazards.


Segher


Re: [Fortran, Patch] (RFC, Coarray) Implement TS18508's EVENTS

2015-12-03 Thread Alessandro Fanfarillo
Yes please.

Thanks.

2015-12-02 23:00 GMT+01:00 Steve Kargl :
> Committed as revision 231208.
>
> Alessandro, Tobias, is this a candidate for a commit to
> the 5-branch when it is re-opened?
>
> --
> steve
>
> On Wed, Dec 02, 2015 at 03:16:05PM +0100, Alessandro Fanfarillo wrote:
>> *PING*
>>
>> 2015-11-26 17:51 GMT+01:00 Steve Kargl :
>> > On Wed, Nov 25, 2015 at 06:24:49PM +0100, Alessandro Fanfarillo wrote:
>> >> Dear all,
>> >>
>> >> attached is the previous patch, made compatible with the current trunk.
>> >> The patch also includes the changes introduced in the latest TS 18508.
>> >>
>> >> Built and regtested on x86_64-pc-linux-gnu.
>> >>
>> >> PS: I will add the test cases in a different patch.
>> >>
>> >
>> > I have now built and regression tested the patch on
>> > x86_64-*-freebsd and i386-*-freebsd.  There were no
>> > regressions.  In reading through the patch, nothing
>> > jumped out at me as suspicious/wrong.  Tobias, this
>> > is OK to commit.  If you haven't committed it by Sunday,
>> > I'll do it for you.
>> >
>> > --
>> > steve
>
> --
> Steve


[PATCH] Fix missing range information for "%q+D" format code

2015-12-03 Thread David Malcolm
There are about 220 or so diagnostics in trunk that use "%q+D" in
their format string, which, as well as printing a quoted decl,
overwrites any location_t supplied to the diagnostic, instead using
the location of the associated decl.

During development of the location range patch kit I adjusted
things to use both location&range of the decl for this case, but it
looks like I broke it at some point; in the version in trunk the code is
currently discarding range information, so that just the caret is
printed.

For example:

diagnostic-ranges-1.c:6:7: warning: unused variable 'redundant' [-Wunused-variable]
   int redundant;
       ^

The attached patch updates the handling of %q+D, simplifying
the implementation, and ensuring that it retains the range
information of the decl, giving:

diagnostic-ranges-1.c:6:7: warning: unused variable ‘redundant’ [-Wunused-variable]
   int redundant;
       ^~~~~~~~~

As well as the above fix, the patch adds test coverage, both
- for the specific case above, and
- as a unit test for %q+D via one of the existing test plugins

Successfully bootstrapped&regtested on x86_64-pc-linux-gnu; adds
5 PASS results to gcc.sum.

OK for trunk?

gcc/c-family/ChangeLog:
* c-common.c (c_cpp_error): Update for change to
rich_location::set_range.

gcc/fortran/ChangeLog:
* error.c (gfc_format_decoder): Update for change of
text_info::set_range to text_info::set_location.

gcc/ChangeLog:
* pretty-print.c (text_info::set_range): Rename to...
(text_info::set_location): ...this, converting 2nd param
from source_range to a location_t.
* pretty-print.h (text_info::set_location): Convert
from inline function to external definition.
(text_info::set_range): Delete.

gcc/testsuite/ChangeLog:
* gcc.dg/diagnostic-ranges-1.c: New test file.
* gcc.dg/plugin/diagnostic-test-show-locus-bw.c
(test_percent_q_plus_d): New test function.
* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
(test_show_locus): Rewrite test code using
rich_location::set_range.  Add code to unit-test the "%q+D"
format code.

libcpp/ChangeLog:
* include/line-map.h (rich_location::set_range): Add line_maps *
param; convert param from source_range to source_location.  Drop
"overwrite_loc_p" param.
* line-map.c (rich_location::set_range): Likewise, acting as if
"overwrite_loc_p" were true, and getting range from the location.
---
 gcc/c-family/c-common.c|  4 +---
 gcc/fortran/error.c| 11 -
 gcc/pretty-print.c |  6 ++---
 gcc/pretty-print.h |  9 +---
 gcc/testsuite/gcc.dg/diagnostic-ranges-1.c | 11 +
 .../gcc.dg/plugin/diagnostic-test-show-locus-bw.c  | 12 ++
 .../plugin/diagnostic_plugin_test_show_locus.c | 27 +-
 libcpp/include/line-map.h  |  4 ++--
 libcpp/line-map.c  | 14 ++-
 9 files changed, 64 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/diagnostic-ranges-1.c

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index a8122b3..59cfc19 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -10129,9 +10129,7 @@ c_cpp_error (cpp_reader *pfile ATTRIBUTE_UNUSED, int 
level, int reason,
   gcc_unreachable ();
 }
   if (done_lexing)
-richloc->set_range (0,
-   source_range::from_location (input_location),
-   true, true);
+richloc->set_range (line_table, 0, input_location, true);
   diagnostic_set_info_translated (&diagnostic, msg, ap,
  richloc, dlevel);
   diagnostic_override_option_index (&diagnostic,
diff --git a/gcc/fortran/error.c b/gcc/fortran/error.c
index b4f7020..8f57aff 100644
--- a/gcc/fortran/error.c
+++ b/gcc/fortran/error.c
@@ -939,12 +939,11 @@ gfc_format_decoder (pretty_printer *pp,
/* If location[0] != UNKNOWN_LOCATION means that we already
   processed one of %C/%L.  */
int loc_num = text->get_location (0) == UNKNOWN_LOCATION ? 0 : 1;
-   source_range range
- = source_range::from_location (
- linemap_position_for_loc_and_offset (line_table,
-  loc->lb->location,
-  offset));
-   text->set_range (loc_num, range, true);
+   location_t src_loc
+ = linemap_position_for_loc_and_offset (line_table,
+loc->lb->location,
+offset);
+   text->set_location (loc_num, src_loc, true);
pp_string (pp, result[loc_num]);
return true;
   }
diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index 4a28d3c..3365074 1006

Re: [PATCH] Fix shrink-wrap bug with anticipating into loops (PR67778, PR68634)

2015-12-03 Thread Segher Boessenkool
On Thu, Dec 03, 2015 at 12:31:53PM +0100, Bernd Schmidt wrote:
> On 12/02/2015 07:21 PM, Segher Boessenkool wrote:
> >After shrink-wrapping has found the "tightest fit" for where to place
> >the prologue, it tries to move it earlier (so that frame saves are run
> >earlier) -- but without copying any more basic blocks.
> >
> >Unfortunately a candidate block we select can be inside a loop, and we
> >will still allow it (because the loop always exits via our previously
> >chosen block).
> 
> >So we need to detect this situation.  We can place the prologue at a
> >previous block PRE only if PRE dominates every block reachable from
> >it.  This is a bit hard / expensive to compute, so instead this patch
> >allows a block PRE only if PRE does not post-dominate any of its
> >successors (other than itself).
> 
> Are the two conditions equivalent though?

They are not, one is a subset of the other.  By construction, the block
PRE (the new candidate for getting the prologue) dominates PRO (the
original block to get the prologue), and PRO post-dominates PRE.  Now,
PRE is only suitable if it dominates every block reachable from it,
since otherwise putting the prologue on PRE instead of on PRO requires
duplicating more blocks.

Hrm.  A successor block of PRE could loop back to PRE conditionally,
and go to PRO otherwise.  Rats, what was I thinking.  Thanks for catching
it; I'll have to think of something better.  A bit more factoring will
probably help, we'll see.
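
To make the counterexample concrete, here is a toy model of it. This is
not GCC's dominance machinery: the Graph type and the helpers reaches,
heuristic_accepts and loops_back are invented for this sketch, and
post-dominance is computed by brute-force reachability.

```cpp
#include <cassert>
#include <vector>

// Adjacency list: g[b] holds the successors of block b.
using Graph = std::vector<std::vector<int>>;

// True if TO can be reached from FROM over one or more edges without
// ever entering BLOCKED (pass -1 to block nothing).
static bool reaches(const Graph &g, int from, int to, int blocked = -1) {
  std::vector<bool> seen(g.size(), false);
  std::vector<int> work{from};
  while (!work.empty()) {
    int b = work.back();
    work.pop_back();
    for (int s : g[b]) {
      if (s == blocked || seen[s]) continue;
      if (s == to) return true;
      seen[s] = true;
      work.push_back(s);
    }
  }
  return false;
}

// The check from the patch: accept PRE only if it does not post-dominate
// any successor other than itself.  PRE post-dominates S when S cannot
// get to EXIT_BLOCK without passing through PRE.
static bool heuristic_accepts(const Graph &g, int pre, int exit_block) {
  for (int s : g[pre]) {
    if (s == pre || s == exit_block) continue;
    if (!reaches(g, s, exit_block, pre)) return false;
  }
  return true;
}

// The property that actually matters: some successor of PRE leads back to PRE.
static bool loops_back(const Graph &g, int pre) {
  for (int s : g[pre])
    if (s != pre && reaches(g, s, pre)) return true;
  return false;
}

// Blocks: 0 = ENTRY, 1 = PRE, 2 = S, 3 = PRO, 4 = EXIT.
// S branches back to PRE conditionally and goes on to PRO otherwise.
static Graph counterexample() {
  return {{1}, {2}, {1, 3}, {4}, {}};
}
```

In this graph PRE does not post-dominate S, because S can reach EXIT via
PRO, so the check accepts PRE even though S loops back to it.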

> I think I agree with Jakub that we don't want to do unnecessary work in 
> this piece of code.

I agree as well.

> >/* If we can move PRO back without having to duplicate more blocks, do 
> >so.
> >   We can move back to a block PRE if every path from PRE will 
> >   eventually
> >- need a prologue, that is, PRO is a post-dominator of PRE.  */
> >+ need a prologue, that is, PRO is a post-dominator of PRE.  We might
> >+ need to duplicate PRE if there is any path from a successor of PRE 
> >back
> >+ to PRE, so don't allow that either (but self-loops are fine, as are 
> >any
> >+ other loops entirely dominated by PRE; this in general seems too
> >+ expensive to check for, for such an uncommon case).  */
> 
> The last comment is unclear and I don't know what it wants to tell me.

Yeah, sorry.  Writing text is hard :-)


Segher


Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path

2015-12-03 Thread Jeff Law

On 12/02/2015 08:35 AM, Richard Biener wrote:



The most interesting side effect, and one I haven't fully analyzed yet is an
unexpected jump thread -- which I've traced back to differences in what the
alias oracle is able to find when we walk unaliased vuses.  It makes
no sense that it's unable to find the unaliased vuse in the
simplified CFG, but finds it when we don't remove the unexecutable edge.  As
I said, it makes no sense to me yet and I'm still digging.


The walking of PHI nodes is quite simplistic to avoid doing too much work so
an extra (not executable) edge may confuse it enough.  So this might be
"expected".  Adding a flag on whether EDGE_EXECUTABLE is to be
trusted would be an option (also helping SCCVN).

Found it.  In the CFG with the unexecutable edges _not_ removed there is 
a PHI associated with that edge which provides a dominating unaliased 
vuse.  Once that edge is removed, the PHI arg gets removed and thus we 
can't easily see the unaliased vuse.


So all is working as expected.  It wasn't ever a big issue, I just 
wanted to make sure I thoroughly understood the somewhat 
counter-intuitive result.


Jeff


Re: [PATCH 02/10] Fix g++.dg/cpp0x/nsdmi-template14.C

2015-12-03 Thread Jason Merrill

On 12/03/2015 09:55 AM, David Malcolm wrote:

This patch adds bulletproofing to detect purged tokens, and avoid using
them.

Alternatively, is it OK to access purged tokens for this kind of thing?
If so, would it make more sense to instead leave their locations untouched
when purging them?


I think cp_lexer_previous_token should skip past purged tokens.

Jason



Re: [PATCH 07/10] Fix g++.dg/template/ref3.C

2015-12-03 Thread Jason Merrill

On 12/03/2015 09:55 AM, David Malcolm wrote:

Testcase g++.dg/template/ref3.C:

  1 // PR c++/28341
  2
  3 template struct A {};
  4
  5 template struct B
  6 {
  7   A<(T)0> b; // { dg-error "constant|not a valid" }
  8   A a; // { dg-error "constant|not a valid" }
  9 };
 10
 11 B b;

The output of this test for both c++11 and c++14 is unaffected
by the patch kit:
  g++.dg/template/ref3.C: In instantiation of 'struct B':
  g++.dg/template/ref3.C:11:15:   required from here
  g++.dg/template/ref3.C:7:11: error: '0' is not a valid template argument for type 
'const int&' because it is not an lvalue
  g++.dg/template/ref3.C:8:11: error: '0' is not a valid template argument for type 
'const int&' because it is not an lvalue

However, the c++98 output is changed:

Status quo for c++98:
g++.dg/template/ref3.C: In instantiation of 'struct B':
g++.dg/template/ref3.C:11:15:   required from here
g++.dg/template/ref3.C:7:11: error: a cast to a type other than an integral or 
enumeration type cannot appear in a constant-expression
g++.dg/template/ref3.C:8:11: error: a cast to a type other than an integral or 
enumeration type cannot appear in a constant-expression

(lines 7 and 8 point at the closing semicolons of fields b and a)

With the patchkit for c++98:
g++.dg/template/ref3.C: In instantiation of 'struct B':
g++.dg/template/ref3.C:11:15:   required from here
g++.dg/template/ref3.C:7:5: error: a cast to a type other than an integral or 
enumeration type cannot appear in a constant-expression
g++.dg/template/ref3.C:7:5: error: a cast to a type other than an integral or 
enumeration type cannot appear in a constant-expression

So the 2nd:
   "error: a cast to a type other than an integral or enumeration type cannot appear 
in a constant-expression"
moves from line 8 to line 7 (and both errors now appear earlier in the line, with ranges)

What's happening is that cp_parser_enclosed_template_argument_list
builds a CAST_EXPR, the first time from cp_parser_cast_expression,
the second time from cp_parser_functional_cast; these have locations
representing the correct respective caret&ranges, i.e.:

A<(T)0> b;
  ^~~~

and:

A a;
  ^~~~

Eventually finish_template_type is called for each, to build a RECORD_TYPE,
and we get a cache hit the 2nd time through here in pt.c:
8281  hash = spec_hasher::hash (&elt);
8282  entry = type_specializations->find_with_hash (&elt, hash);
8283
8284  if (entry)
8285    return entry->spec;

due to:
   template_args_equal (ot=, nt=) at ../../src/gcc/cp/pt.c:7778
which calls:
   cp_tree_equal (t1=, t2=) 
at ../../src/gcc/cp/tree.c:2833
and returns equality.

Hence we get a single RECORD_TYPE for the type A<(T)(0)>, and hence
when issuing the errors it uses the TREE_VEC for the first one,
using the location of the first line.


Why does the type sharing affect where the parser gives the error?


I'm not sure what the ideal fix for this is; for now I've worked
around it by updating the dg directives to reflect the new output.

gcc/testsuite/ChangeLog:
* g++.dg/template/ref3.C: Update locations of dg directives.
---
  gcc/testsuite/g++.dg/template/ref3.C | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/template/ref3.C 
b/gcc/testsuite/g++.dg/template/ref3.C
index 976c093..6e568c3 100644
--- a/gcc/testsuite/g++.dg/template/ref3.C
+++ b/gcc/testsuite/g++.dg/template/ref3.C
@@ -4,8 +4,10 @@ template struct A {};

  template struct B
  {
-  A<(T)0> b; // { dg-error "constant|not a valid" }
-  A a; // { dg-error "constant|not a valid" }
+  A<(T)0> b; // { dg-error "constant" "" { target c++98_only } }
+  // { dg-error "not a valid" "" { target c++11 } 7 }
+
+  A a; // { dg-error "not a valid" "" { target c++11 } }
  };

  B b;





Re: [PATCH] Handle OBJ_TYPE_REF in FRE

2015-12-03 Thread Richard Biener
On December 3, 2015 6:40:07 PM GMT+01:00, Jan Hubicka  wrote:
>> 
>> The following patch handles CSEing OBJ_TYPE_REF which was omitted
>> because it is a GENERIC expression even on GIMPLE (for whatever
>
>Why is it generic? It is part of the gimple grammar :)
>
>> reason...).  Rather than changing this now the following patch
>> simply treats it properly as such.
>
>Thanks for working on this! Will this do code motion, too?

It will do PRE, so "yes".

>I think you may want to compare the ODR type of obj_type_ref_class
>otherwise two otherwise equivalent OBJ_TYPE_REFs may lead to different
>optimizations later.  I suppose we can have code of form
>
>if (test)
>  OBJ_TYPE_REF1
>  ...
>else
>  OBJ_TYPE_REF2
>  ..
>where each invokes a method of a different class type but would otherwise
>match as equivalent for tree-ssa-sccvn because we ignore pointed-to
>types.
>so doing
>
>OBJ_TYPE_REF1
>if (test)
>  ...
>else
>  ...
>
>may lead to wrong code.

Can you try generating a testcase?  Because with equal vptr and voffset I can't 
see how that can happen unless some pass extracts information from the pointer 
types without sanity checking with the pointers and offsets.

>Or do you just substitute the operands of OBJ_TYPE_REF? 

No, I value number them.  But yes, the type issue also crossed my mind.  
Meanwhile testing revealed that I need to adjust gimple_expr_type to preserve 
the type of the obj-type-ref, otherwise the devirt machinery ICEs (receiving 
void *). That's also a reason we can't make obj-type-ref a ternary RHS.

>> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>> 
>> Note that this does not (yet) substitute OBJ_TYPE_REFs in calls
>> with SSA names that have the same value - not sure if that would
>> be desired generally (does the devirt machinery cope with that?).
>
>This should work fine.

OK. So with that substituting the direct call later should work as well.

Richard.

>> 
>> Thanks,
>> Richard.
>> 
>> 2015-12-03  Richard Biener  
>> 
>>  PR tree-optimization/64812
>>  * tree-ssa-sccvn.c (vn_get_stmt_kind): Handle OBJ_TYPE_REF.
>>  (vn_nary_length_from_stmt): Likewise.
>>  (init_vn_nary_op_from_stmt): Likewise.
>>  * gimple-match-head.c (maybe_build_generic_op): Likewise.
>>  * gimple-pretty-print.c (dump_unary_rhs): Likewise.
>> 
>>  * g++.dg/tree-ssa/ssa-fre-1.C: New testcase.
>> 
>> Index: gcc/tree-ssa-sccvn.c
>> ===
>> *** gcc/tree-ssa-sccvn.c (revision 231221)
>> --- gcc/tree-ssa-sccvn.c (working copy)
>> *** vn_get_stmt_kind (gimple *stmt)
>> *** 460,465 
>> --- 460,467 
>>? VN_CONSTANT : VN_REFERENCE);
>>  else if (code == CONSTRUCTOR)
>>return VN_NARY;
>> +else if (code == OBJ_TYPE_REF)
>> +  return VN_NARY;
>>  return VN_NONE;
>>}
>>default:
>> *** vn_nary_length_from_stmt (gimple *stmt)
>> *** 2479,2484 
>> --- 2481,2487 
>> return 1;
>>   
>>   case BIT_FIELD_REF:
>> + case OBJ_TYPE_REF:
>> return 3;
>>   
>>   case CONSTRUCTOR:
>> *** init_vn_nary_op_from_stmt (vn_nary_op_t
>> *** 2508,2513 
>> --- 2511,2517 
>> break;
>>   
>>   case BIT_FIELD_REF:
>> + case OBJ_TYPE_REF:
>> vno->length = 3;
>> vno->op[0] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
>> vno->op[1] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 1);
>> Index: gcc/gimple-match-head.c
>> ===
>> *** gcc/gimple-match-head.c  (revision 231221)
>> --- gcc/gimple-match-head.c  (working copy)
>> *** maybe_build_generic_op (enum tree_code c
>> *** 243,248 
>> --- 243,249 
>> *op0 = build1 (code, type, *op0);
>> break;
>>   case BIT_FIELD_REF:
>> + case OBJ_TYPE_REF:
>> *op0 = build3 (code, type, *op0, op1, op2);
>> break;
>>   default:;
>> Index: gcc/gimple-pretty-print.c
>> ===
>> *** gcc/gimple-pretty-print.c(revision 231221)
>> --- gcc/gimple-pretty-print.c(working copy)
>> *** dump_unary_rhs (pretty_printer *buffer,
>> *** 302,308 
>>|| TREE_CODE_CLASS (rhs_code) == tcc_reference
>>|| rhs_code == SSA_NAME
>>|| rhs_code == ADDR_EXPR
>> !  || rhs_code == CONSTRUCTOR)
>>  {
>>dump_generic_node (buffer, rhs, spc, flags, false);
>>break;
>> --- 302,309 
>>|| TREE_CODE_CLASS (rhs_code) == tcc_reference
>>|| rhs_code == SSA_NAME
>>|| rhs_code == ADDR_EXPR
>> !  || rhs_code == CONSTRUCTOR
>> !  || rhs_code == OBJ_TYPE_REF)
>>  {
>>dump_generic_node (buffer, rhs, spc, flags, false);
>>break;
>> Index: gcc/testsuite/g++.dg/tree-ssa/ssa-fre-1.C
>> =

Re: [PATCH 4/4][AArch64] Add cost model for Exynos M1

2015-12-03 Thread Evandro Menezes

On 11/05/2015 06:09 PM, Evandro Menezes wrote:

2015-10-25  Evandro Menezes 

   gcc/

   * config/aarch64/aarch64-cores.def: Use the Exynos M1 cost model.
   * config/aarch64/aarch64.c (exynosm1_addrcost_table): New 
variable.

   (exynosm1_regmove_cost): Likewise.
   (exynosm1_vector_cost): Likewise.
   (exynosm1_tunings): Likewise.
   * config/arm/aarch-cost-tables.h (exynosm1_extra_costs): Likewise.
   * config/arm/arm.c (arm_exynos_m1_tune): Likewise.

This patch adds the cost model for Exynos M1.  This patch depends on a 
couple of previous patches though, 
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00505.html and 
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00538.html


Checked in as r231233.

Thank you,

--
Evandro Menezes



[PATCH] Improve constant vec_perm expansion on i?86 (PR target/68655)

2015-12-03 Thread Jakub Jelinek
Hi!

As discussed in the PR, for some permutations we can get better code
if we try to expand them as if they were permutations in a mode with the
same vector size but a wider vector element.  The first attempt, which
always did this, had mixed results: lots of improvements, lots of
pessimizations.  This one, at least on gcc.dg/vshuf*
{-msse2,-msse4,-mavx,-mavx2,-mavx512f,-mavx512bw}, shows only
improvements.  It tries the original permutation as a single insn;
if that doesn't work, it tries the wider one as a single insn; and then,
as a complete fallback, if we don't have any expansion whatsoever, it
tries the wider one too.
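
To make the "wider vector element" idea concrete, here is a minimal
standalone sketch of the pairing check, modelled on the loop in
canonicalize_vector_int_perm in the patch below.  The name widen_perm and
its plain-array interface are assumptions made for illustration only; the
real function works on struct expand_vec_perm_d and also sets up modes and
operands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not GCC code: try to rewrite a permutation of
   NELT narrow elements as a permutation of NELT / 2 elements that are
   twice as wide.  This is only possible when every pair of selected
   elements starts at an even index and is consecutive, i.e. each pair
   moves one whole wide element.  */
bool
widen_perm (const unsigned char *perm, size_t nelt, unsigned char *wide)
{
  assert (nelt % 2 == 0);

  for (size_t i = 0; i < nelt; i += 2)
    if ((perm[i] & 1) || perm[i + 1] != perm[i] + 1)
      return false;

  /* Each pair (2k, 2k + 1) of narrow indices becomes wide index k.  */
  for (size_t i = 0; i < nelt / 2; i++)
    wide[i] = perm[2 * i] / 2;
  return true;
}
```

For instance, the 8-element permutation { 2, 3, 0, 1, 6, 7, 4, 5 } widens
to the 4-element permutation { 1, 0, 3, 2 }, whereas { 1, 2, 3, 0 } is
rejected because its first pair starts at an odd index.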

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2015-12-03  Jakub Jelinek  

PR target/68655
* config/i386/i386.c (canonicalize_vector_int_perm): New function.
(expand_vec_perm_1): Use it and recurse if everything else
failed.  Use nd.perm instead of perm2.
(expand_vec_perm_even_odd_1): If testing_p, use gen_raw_REG
instead of gen_lowpart for the target.
(ix86_expand_vec_perm_const_1): Use canonicalize_vector_int_perm
and recurse if everything else failed.

* gcc.dg/torture/vshuf-4.inc (TESTS): Add one extra test.
* gcc.dg/torture/vshuf-4.inc (TESTS): Add two extra tests.

--- gcc/config/i386/i386.c.jj   2015-12-02 20:27:00.0 +0100
+++ gcc/config/i386/i386.c  2015-12-03 15:03:13.415764986 +0100
@@ -49365,6 +49365,57 @@ expand_vec_perm_pshufb (struct expand_ve
   return true;
 }
 
+/* For V*[QHS]Imode permutations, check if the same permutation
+   can't be performed in a 2x, 4x or 8x wider inner mode.  */
+
+static bool
+canonicalize_vector_int_perm (const struct expand_vec_perm_d *d,
+ struct expand_vec_perm_d *nd)
+{
+  int i;
+  enum machine_mode mode = VOIDmode;
+
+  switch (d->vmode)
+{
+case V16QImode: mode = V8HImode; break;
+case V32QImode: mode = V16HImode; break;
+case V64QImode: mode = V32HImode; break;
+case V8HImode: mode = V4SImode; break;
+case V16HImode: mode = V8SImode; break;
+case V32HImode: mode = V16SImode; break;
+case V4SImode: mode = V2DImode; break;
+case V8SImode: mode = V4DImode; break;
+case V16SImode: mode = V8DImode; break;
+default: return false;
+}
+  for (i = 0; i < d->nelt; i += 2)
+if ((d->perm[i] & 1) || d->perm[i + 1] != d->perm[i] + 1)
+  return false;
+  nd->vmode = mode;
+  nd->nelt = d->nelt / 2;
+  for (i = 0; i < nd->nelt; i++)
+nd->perm[i] = d->perm[2 * i] / 2;
+  if (GET_MODE_INNER (mode) != DImode)
+canonicalize_vector_int_perm (nd, nd);
+  if (nd != d)
+{
+  nd->one_operand_p = d->one_operand_p;
+  nd->testing_p = d->testing_p;
+  if (d->op0 == d->op1)
+   nd->op0 = nd->op1 = gen_lowpart (nd->vmode, d->op0);
+  else
+   {
+ nd->op0 = gen_lowpart (nd->vmode, d->op0);
+ nd->op1 = gen_lowpart (nd->vmode, d->op1);
+   }
+  if (d->testing_p)
+   nd->target = gen_raw_REG (nd->vmode, LAST_VIRTUAL_REGISTER + 1);
+  else
+   nd->target = gen_reg_rtx (nd->vmode);
+}
+  return true;
+}
+
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to instantiate D
in a single instruction.  */
 
@@ -49372,7 +49423,7 @@ static bool
 expand_vec_perm_1 (struct expand_vec_perm_d *d)
 {
   unsigned i, nelt = d->nelt;
-  unsigned char perm2[MAX_VECT_LEN];
+  struct expand_vec_perm_d nd;
 
   /* Check plain VEC_SELECT first, because AVX has instructions that could
  match both SEL and SEL+CONCAT, but the plain SEL will allow a memory
@@ -49385,10 +49436,10 @@ expand_vec_perm_1 (struct expand_vec_per
 
   for (i = 0; i < nelt; i++)
{
- perm2[i] = d->perm[i] & mask;
- if (perm2[i] != i)
+ nd.perm[i] = d->perm[i] & mask;
+ if (nd.perm[i] != i)
identity_perm = false;
- if (perm2[i])
+ if (nd.perm[i])
broadcast_perm = false;
}
 
@@ -49457,7 +49508,7 @@ expand_vec_perm_1 (struct expand_vec_per
}
}
 
-  if (expand_vselect (d->target, d->op0, perm2, nelt, d->testing_p))
+  if (expand_vselect (d->target, d->op0, nd.perm, nelt, d->testing_p))
return true;
 
   /* There are plenty of patterns in sse.md that are written for
@@ -49468,10 +49519,10 @@ expand_vec_perm_1 (struct expand_vec_per
 every other permutation operand.  */
   for (i = 0; i < nelt; i += 2)
{
- perm2[i] = d->perm[i] & mask;
- perm2[i + 1] = (d->perm[i + 1] & mask) + nelt;
+ nd.perm[i] = d->perm[i] & mask;
+ nd.perm[i + 1] = (d->perm[i + 1] & mask) + nelt;
}
-  if (expand_vselect_vconcat (d->target, d->op0, d->op0, perm2, nelt,
+  if (expand_vselect_vconcat (d->target, d->op0, d->op0, nd.perm, nelt,
  d->testing_p))
return true;
 
@@ -49480,13 +49531,13 @@ expand_vec_perm_1 (struct expand_vec_per
{
  for (i

Re: [PATCH 3b/4][AArch64] Add scheduling model for Exynos M1

2015-12-03 Thread Evandro Menezes

On 11/20/2015 11:17 AM, James Greenhalgh wrote:

On Tue, Nov 10, 2015 at 11:54:00AM -0600, Evandro Menezes wrote:

2015-11-10  Evandro Menezes 

gcc/

* config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model.
* config/aarch64/aarch64.md: Include "exynos-m1.md".
* config/arm/arm-cores.def: Use the Exynos M1 sched model.
* config/arm/arm.md: Include "exynos-m1.md".
* config/arm/arm-tune.md: Regenerated.
* config/arm/exynos-m1.md: New file.

This patch adds the scheduling model for Exynos M1.  It depends on
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01257.html

Bootstrapped on arm-unknown-linux-gnueabihf, aarch64-unknown-linux-gnu.

Please, commit if it's alright.



 From 0b7b6d597e5877c78c4d88e0d4491858555a5364 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 9 Nov 2015 17:18:52 -0600
Subject: [PATCH 2/2] [AArch64] Add scheduling model for Exynos M1

gcc/
* config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model.
* config/aarch64/aarch64.md: Include "exynos-m1.md".

These changes are fine.


* config/arm/arm-cores.def: Use the Exynos M1 sched model.
* config/arm/arm.md: Include "exynos-m1.md".
* config/arm/arm-tune.md: Regenerated.

These changes need an ack from an ARM reviewer.


* config/arm/exynos-m1.md: New file.

I have a few comments on this model.


+;; The Exynos M1 core is modeled as a triple issue pipeline that has
+;; the following functional units.
+
+(define_automaton "exynos_m1_gp")
+(define_automaton "exynos_m1_ls")
+(define_automaton "exynos_m1_fp")
+
+;; 1.  Two pipelines for simple integer operations: A, B
+;; 2.  One pipeline for simple or complex integer operations: C
+
+(define_cpu_unit "em1_xa, em1_xb, em1_xc" "exynos_m1_gp")
+
+(define_reservation "em1_alu" "(em1_xa | em1_xb | em1_xc)")
+(define_reservation "em1_c" "em1_xc")

Is this extra reservation useful, can we not just use em1_xc directly?


+;; 3.  Two asymmetric pipelines for Neon and FP operations: F0, F1
+
+(define_cpu_unit "em1_f0, em1_f1" "exynos_m1_fp")
+
+(define_reservation "em1_fmac" "em1_f0")
+(define_reservation "em1_fcvt" "em1_f0")
+(define_reservation "em1_nalu" "(em1_f0 | em1_f1)")
+(define_reservation "em1_nalu0" "em1_f0")
+(define_reservation "em1_nalu1" "em1_f1")
+(define_reservation "em1_nmisc" "em1_f0")
+(define_reservation "em1_ncrypt" "em1_f0")
+(define_reservation "em1_fadd" "em1_f1")
+(define_reservation "em1_fvar" "em1_f1")
+(define_reservation "em1_fst" "em1_f1")

Same comment here: does this not just obfuscate the interaction between
instruction classes in the description?  I'm not against doing it this way
if you prefer, but it would seem to reduce readability to me. I think there
is also an argument that this increases readability, so it is your choice.


+
+;; 4.  One pipeline for branch operations: BX
+
+(define_cpu_unit "em1_bx" "exynos_m1_gp")
+
+(define_reservation "em1_br" "em1_bx")
+

And again?


+;; 5.  One AGU for loads: L
+;; One AGU for stores and one pipeline for stores: S, SD
+
+(define_cpu_unit "em1_lx" "exynos_m1_ls")
+(define_cpu_unit "em1_sx, em1_sd" "exynos_m1_ls")
+
+(define_reservation "em1_ld" "em1_lx")
+(define_reservation "em1_st" "(em1_sx + em1_sd)")
+
+;; Common occurrences
+(define_reservation "em1_sfst" "(em1_fst + em1_st)")
+(define_reservation "em1_lfst" "(em1_fst + em1_ld)")
+
+;; Branches
+;;
+;; No latency as there is no result
+;; TODO: Unconditional branches use no units;
+;; conditional branches add the BX unit;
+;; indirect branches add the C unit.
+(define_insn_reservation "exynos_m1_branch" 0
+  (and (eq_attr "tune" "exynosm1")
+   (eq_attr "type" "branch"))
+  "em1_br")
+
+(define_insn_reservation "exynos_m1_call" 1
+  (and (eq_attr "tune" "exynosm1")
+   (eq_attr "type" "call"))
+  "em1_alu")
+
+;; Basic ALU
+;;
+;; Simple ALU without shift, non-predicated
+(define_insn_reservation "exynos_m1_alu" 1
+  (and (eq_attr "tune" "exynosm1")
+   (and (not (eq_attr "predicated" "yes"))

(and (eq_attr "predicated" "no")) ?

Likewise throughout the file? Again this is your choice.

This is OK from the AArch64 side, let me know if you plan to change any
of the above, otherwise I'll commit it (or someone else can commit it)
after I see an OK from an ARM reviewer.


ARM ping.

--
Evandro Menezes



[PATCH] Fix reassoc range test vs. value ranges (PR tree-optimization/68671)

2015-12-03 Thread Jakub Jelinek
Hi!

As mentioned in the PR, maybe_optimize_range_tests considers basic blocks
with not just the final GIMPLE_COND (or for last_bb store feeding into PHI),
but also assign stmts that don't trap, don't have side-effects and where
the SSA_NAMEs they set are used only in their own bb.
Now, if we decide to optimize some range test, we can change some conditions
on previous bbs and that means we could execute some basic blocks that
wouldn't be executed in the original program.  As the stmts don't set
anything used in other bbs, they are most likely dead after the
optimization, but the problem on the testcase is that because of the
condition changes in previous bb we end up with incorrect value range
for some SSA_NAME(s).  That can result in the miscompilation of the testcase
on certain targets.

Fixed by resetting the value range info of such SSA_NAMEs.  I believe it
shouldn't be a big deal, they will be mostly dead anyway.
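
For readers unfamiliar with range tests, the following is a rough
illustration of the kind of merging involved, using the comparison from
fn1 in the new testcase with p2 == 1 (so 1 >> p2 is 0).  It is only a
sketch: the function names are made up here, and it shows the source-level
equivalence, not the reassoc implementation or the miscompilation itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Two separate comparisons, each of which would normally end its own
   basic block with a GIMPLE_COND.  */
bool
two_tests (signed char p1)
{
  return p1 < 0 || p1 > 0;
}

/* The single comparison a range test optimization may use instead.
   Code that used to run only when the first test failed can now be
   reached under a different condition, which is why value ranges
   derived from the old condition may no longer hold.  */
bool
one_test (signed char p1)
{
  return p1 != 0;
}

/* Exhaustively confirm that both forms agree for every signed char.  */
void
check_all_values (void)
{
  for (int v = -128; v <= 127; v++)
    assert (two_tests ((signed char) v) == one_test ((signed char) v));
}
```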

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2015-12-03  Jakub Jelinek  

PR tree-optimization/68671
* tree-ssa-reassoc.c (maybe_optimize_range_tests): For basic
blocks starting with the successor of first bb we've modified
and ending with last_bb, reset value ranges of all integral
SSA_NAMEs set in those basic blocks.

* gcc.dg/pr68671.c: New test.

--- gcc/tree-ssa-reassoc.c.jj   2015-11-18 11:22:51.0 +0100
+++ gcc/tree-ssa-reassoc.c  2015-12-03 18:12:08.915210122 +0100
@@ -3204,7 +3204,7 @@ maybe_optimize_range_tests (gimple *stmt
 any_changes = optimize_range_tests (ERROR_MARK, &ops);
   if (any_changes)
 {
-  unsigned int idx;
+  unsigned int idx, max_idx = 0;
   /* update_ops relies on has_single_use predicates returning the
 same values as it did during get_ops earlier.  Additionally it
 never removes statements, only adds new ones and it should walk
@@ -3220,6 +3220,7 @@ maybe_optimize_range_tests (gimple *stmt
{
  tree new_op;
 
+ max_idx = idx;
  stmt = last_stmt (bb);
  new_op = update_ops (bbinfo[idx].op,
   (enum tree_code)
@@ -3289,6 +3290,10 @@ maybe_optimize_range_tests (gimple *stmt
  && ops[bbinfo[idx].first_idx]->op != NULL_TREE)
{
  gcond *cond_stmt = as_a <gcond *> (last_stmt (bb));
+
+ if (idx > max_idx)
+   max_idx = idx;
+
  if (integer_zerop (ops[bbinfo[idx].first_idx]->op))
gimple_cond_make_false (cond_stmt);
  else if (integer_onep (ops[bbinfo[idx].first_idx]->op))
@@ -3305,6 +3310,30 @@ maybe_optimize_range_tests (gimple *stmt
  if (bb == first_bb)
break;
}
+
+  /* The above changes could result in basic blocks after the first
+modified one, up to and including last_bb, to be executed even if
+they would not be in the original program.  If the value ranges of
+assignment lhs' in those bbs were dependent on the conditions
+guarding those basic blocks which now can change, the VRs might
+be incorrect.  As no_side_effect_bb should ensure those SSA_NAMEs
+are only used within the same bb, it should be not a big deal if
+we just reset all the VRs in those bbs.  See PR68671.  */
+  for (bb = last_bb, idx = 0; idx < max_idx; bb = single_pred (bb), idx++)
+   {
+ gimple_stmt_iterator gsi;
+ for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
+   {
+ gimple *g = gsi_stmt (gsi);
+ if (!is_gimple_assign (g))
+   continue;
+ tree lhs = gimple_assign_lhs (g);
+ if (TREE_CODE (lhs) != SSA_NAME)
+   continue;
+ if (INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
+   SSA_NAME_RANGE_INFO (lhs) = NULL;
+   }
+   }
 }
 }
 
--- gcc/testsuite/gcc.dg/pr68671.c.jj   2015-12-03 18:19:24.769104484 +0100
+++ gcc/testsuite/gcc.dg/pr68671.c  2015-12-03 18:19:07.0 +0100
@@ -0,0 +1,23 @@
+/* PR tree-optimization/68671 */
+/* { dg-do run } */
+/* { dg-options " -O2 -fno-tree-dce" } */
+
+volatile int a = -1;
+volatile int b;
+
+static inline int
+fn1 (signed char p1, int p2)
+{
+  return (p1 < 0) || (p1 > (1 >> p2)) ? 0 : (p1 << 1);
+}
+
+int
+main ()
+{
+  signed char c = a;
+  b = fn1 (c, 1);
+  c = ((128 | c) < 0 ? 1 : 0);
+  if (c != 1)
+__builtin_abort ();
+  return 0;
+}

Jakub


Re: [PATCH][AArch64] Replace insn to zero up DF register

2015-12-03 Thread Evandro Menezes

On 11/09/2015 04:59 PM, Evandro Menezes wrote:

Hi, Marcus.

Have you an update from the architecture folks about this?

Thank you,


Marcus?

--
Evandro Menezes


