Re: [RFC][IPA-VRP] Disable setting param of __builtin_constant_p to null

2016-07-25 Thread kugan

Hi,


diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index ecfab1f..23c12b5 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -3759,8 +3759,10 @@ extract_range_basic (value_range *vr, gimple *stmt)
  && SSA_NAME_IS_DEFAULT_DEF (arg)
  && TREE_CODE (SSA_NAME_VAR (arg)) == PARM_DECL)
{
+#if 0
  set_value_range_to_null (vr, type);
  return;
+#endif


It is not the cleanest either, but it is better to test cfun->after_inlining


Thanks. Here is the patch which does this. Bootstrapped and regression 
tested with the rest of the patches in the series. Is this OK for trunk?


Thanks,
Kugan

gcc/ChangeLog:

2016-07-25  Kugan Vivekanandarajah  

* tree-vrp.c (extract_range_basic): Check cfun->after_inlining before
folding call to __builtin_constant_p with parameters to false.
From 4805ea975de0fd3b183b27324df1caa7ff29f887 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Sat, 25 Jun 2016 11:52:57 +1000
Subject: [PATCH 2/7] Prevent setting __builtin_constant_p of param to null
 before inlining in Early VRP

---
 gcc/tree-vrp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index bdfc1b6..edaacf2 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -3763,7 +3763,8 @@ extract_range_basic (value_range *vr, gimple *stmt)
 	  arg = gimple_call_arg (stmt, 0);
 	  if (TREE_CODE (arg) == SSA_NAME
 	  && SSA_NAME_IS_DEFAULT_DEF (arg)
-	  && TREE_CODE (SSA_NAME_VAR (arg)) == PARM_DECL)
+	  && TREE_CODE (SSA_NAME_VAR (arg)) == PARM_DECL
+	  && cfun->after_inlining)
 	{
 	  set_value_range_to_null (vr, type);
 	  return;
-- 
1.9.1
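
For context, a minimal hypothetical example (the function names are made up
and do not come from the patch or the PR) of why folding __builtin_constant_p
on a parameter is only safe after inlining:

  /* Before IPA inlining, "x" inside f is still a default-def PARM_DECL, so
     early VRP must not fold __builtin_constant_p (x) to 0; once f has been
     inlined into g, the argument is the literal 42 and the builtin can
     legitimately fold to 1.  */
  static inline int do_const (int x) { return x * 2; }
  static inline int do_var   (int x) { return x + 1; }

  static inline int f (int x)
  {
    if (__builtin_constant_p (x))
      return do_const (x);
    return do_var (x);
  }

  int g (void)
  {
    return f (42);
  }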



Re: Gimple loop splitting

2016-07-25 Thread Andrew Pinski
On Thu, Nov 12, 2015 at 8:52 AM, Michael Matz  wrote:
> Hello,
>
> this new pass implements loop iteration space splitting for loops that
> contain a conditional that's always true for one part of the iteration
> space and false for the other, i.e. such situations:
>
>   for (i = beg; i < end; i++)
> if (i < p)
>   dothis();
> else
>   dothat();
>
> this is transformed into roughly:
>
>   for (i = beg; i < p; i++)
> dothis();
>   for (; i < end; i++)
> dothat();
>
> Of course, not quite the above as there needs to be provisions for the
> border conditions, if e.g. 'p' is outside the original iteration space, or
> the conditional doesn't directly use the control IV, but some other, or
> the IV runs backwards.  The testcase checks many of these border
> conditions.
>
> This transformation is in itself a good one but can also be an enabler for
> the vectorizer.  It does increase code size, when the loop body contains
> also unconditional code (that one is duplicated), so we only transform hot
> loops.  I'm a bit unsure of the placement of the new pass, or if it should
> be an own pass at all.  Right now I've placed it after unswitching and
> scev_cprop, before loop distribution.  Ideally I think all three, together
> with loop fusion and an gimple unroller should be integrated into one loop
> nest optimizer, alas, we aren't there yet.
>
> I'm planning to work on loop fusion in the future as well, but that's not
> for GCC 6.
>
> I've regstrapped this pass enabled with -O2 on x86-64-linux, without
> regressions.  I've also checked cpu2006 (the non-fortran part) for
> correctness, not yet for performance.  In the end it should probably only
> be enabled for -O3+ (although if the whole loop body is conditional it
> makes sense to also have it with -O2 because code growth is very small
> then).
>
> So, okay for trunk?

Whatever happened to this patch?  I was looking into doing this
myself today, but then I found this patch.
Since it is stage 1 of GCC 7, it might be a good idea to get this patch into GCC.

Thanks,
Andrew

>
>
> Ciao,
> Michael.
> * passes.def (pass_loop_split): Add.
> * timevar.def (TV_LOOP_SPLIT): Add.
> * tree-pass.h (make_pass_loop_split): Declare.
> * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
> * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
> cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h,
> gimple-pretty-print.h, gimple-fold.h, gimplify-me.h.
> (split_at_bb_p, patch_loop_exit, find_or_create_guard_phi,
> split_loop, tree_ssa_split_loops,
> make_pass_loop_split): New functions.
> (pass_data_loop_split): New.
> (pass_loop_split): New.
>
> testsuite/
> * gcc.dg/loop-split.c: New test.
>
> Index: passes.def
> ===
> --- passes.def  (revision 229763)
> +++ passes.def  (working copy)
> @@ -233,6 +233,7 @@ along with GCC; see the file COPYING3.
>   NEXT_PASS (pass_dce);
>   NEXT_PASS (pass_tree_unswitch);
>   NEXT_PASS (pass_scev_cprop);
> + NEXT_PASS (pass_loop_split);
>   NEXT_PASS (pass_record_bounds);
>   NEXT_PASS (pass_loop_distribution);
>   NEXT_PASS (pass_copy_prop);
> Index: timevar.def
> ===
> --- timevar.def (revision 229763)
> +++ timevar.def (working copy)
> @@ -179,6 +179,7 @@ DEFTIMEVAR (TV_LIM   , "
>  DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv")
>  DEFTIMEVAR (TV_SCEV_CONST, "scev constant prop")
>  DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH, "tree loop unswitching")
> +DEFTIMEVAR (TV_LOOP_SPLIT, "loop splitting")
>  DEFTIMEVAR (TV_COMPLETE_UNROLL   , "complete unrolling")
>  DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
>  DEFTIMEVAR (TV_TREE_VECTORIZATION, "tree vectorization")
> Index: tree-pass.h
> ===
> --- tree-pass.h (revision 229763)
> +++ tree-pass.h (working copy)
> @@ -366,6 +366,7 @@ extern gimple_opt_pass *make_pass_tree_n
>  extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
> Index: tree-ssa-loop-manip.h
> ===
> --- tree-ssa-loop-manip.h   (revision 229763)
> +++ tree-ssa-loop-manip.h   (working copy)
> @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
>
>  extern void create_iv (tr

Re: [PATCH] Add a STATIC_ASSERT on sizeof (struct cp_token)

2016-07-25 Thread Richard Biener
On Fri, Jul 22, 2016 at 4:11 PM, Jakub Jelinek  wrote:
> On Fri, Jul 22, 2016 at 10:33:50AM -0400, David Malcolm wrote:
>> gcc/cp/ChangeLog:
>>   * parser.h (struct cp_token): Add a STATIC_ASSERT on the
>>   size of the struct.
>> ---
>>  gcc/cp/parser.h | 9 +
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
>> index ccbace9..8c1de57 100644
>> --- a/gcc/cp/parser.h
>> +++ b/gcc/cp/parser.h
>> @@ -71,6 +71,15 @@ struct GTY (()) cp_token {
>>  "|| (%1.type == CPP_DECLTYPE)"))) u;
>>  };
>>
>> +/* The C++ frontend lexes everything first, and keeps the tokens
>> +   in memory, so there are possibly millions of tokens in memory.
>> +   Ensure that we don't accidentally grow the structure.  */
>> +STATIC_ASSERT (sizeof (cp_token) ==
>> +(2 // "type" and "keyword"
>> + + 1 // "flags"
>> + + 1 // bitfields
>> + + 4 // location_t
>> + + sizeof (void *))); // union
>
> Doesn't that assume way too much on the host data layout?
> This can be compiled with weirdo system compilers or on weirdo hosts,
> I bet we don't really support non-8-bit char hosts, but still this might
> break somewhere...
>
> The formatting is wrong BTW, == shouldn't appear at the end of line.

Maybe restrict it to __GNUC__ and __x86_64__ or so.
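
One possible shape of such a guard, also moving the "==" off the end of the
line per Jakub's formatting comment (untested sketch, not part of the posted
patch):

  /* Sketch only: check the packed layout just on hosts where it is known
     to hold, so unusual host compilers do not trip the assert.  */
  #if defined(__GNUC__) && defined(__x86_64__)
  STATIC_ASSERT (sizeof (cp_token)
                 == (2                    /* "type" and "keyword" */
                     + 1                  /* "flags" */
                     + 1                  /* bitfields */
                     + 4                  /* location_t */
                     + sizeof (void *))); /* union */
  #endif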

Richard.

> Jakub


Re: [PATCH] Minor changes in tree-vrp.c

2016-07-25 Thread Richard Biener
On Sat, Jul 23, 2016 at 2:40 PM, Patrick Palka  wrote:
> 1. When dumping assert details, print loc->expr instead of the bare SSA
> name.  loc->expr is not always equal to the SSA name.  For example we
> sometimes insert an ASSERT_EXPR like
>
>   x_7 = ASSERT_EXPR <x_4(D), (unsigned int) x_4(D) + 4294967295 > 8>;
>
> The diff of the new dump output looks like:
>
> Assertions to be inserted for x_4(D)
> if (_4 <= 8)
>
> BB #3
> EDGE 2->3 2 [39.0%]  (FALSE_VALUE,EXECUTABLE)
>  -  PREDICATE: x_4(D) gt_expr 8
>  +  PREDICATE: (unsigned int) x_4(D) + 4294967295 gt_expr 8
>
> 2. In extract_code_and_val_from_cond_with_ops verify that name is equal to
> either cond_op0 or cond_op1.  If name is not equal to one of these
> operands then its caller register_edge_assert_for will malfunction and
> the wrong assertion will get inserted.
>
> Is this OK to commit after bootstrap + regtesting?

Ok.

Richard.

> gcc/ChangeLog:
>
> * tree-vrp.c (dump_asserts_for): Print loc->expr instead of
> name.
> (extract_code_and_val_from_cond_with_ops): Verify that name is
> either cond_op0 or cond_op1.
> ---
>  gcc/tree-vrp.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index a3068ec..5072370 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -4827,7 +4827,7 @@ dump_asserts_for (FILE *file, tree name)
>   dump_edge_info (file, loc->e, dump_flags, 0);
> }
>fprintf (file, "\n\tPREDICATE: ");
> -  print_generic_expr (file, name, 0);
> +  print_generic_expr (file, loc->expr, 0);
>fprintf (file, " %s ", get_tree_code_name (loc->comp_code));
>print_generic_expr (file, loc->val, 0);
>fprintf (file, "\n\n");
> @@ -5009,13 +5009,15 @@ extract_code_and_val_from_cond_with_ops (tree name, 
> enum tree_code cond_code,
>comp_code = swap_tree_comparison (cond_code);
>val = cond_op0;
>  }
> -  else
> +  else if (name == cond_op0)
>  {
>/* The comparison is of the form NAME COMP VAL, so the
>  comparison code remains unchanged.  */
>comp_code = cond_code;
>val = cond_op1;
>  }
> +  else
> +gcc_unreachable ();
>
>/* Invert the comparison code as necessary.  */
>if (invert)
> --
> 2.9.2.413.g76d2a70
>


Re: [PATCH 5/6] add a constructor to elim_graph

2016-07-25 Thread Richard Biener
On Sun, Jul 24, 2016 at 1:44 PM,   wrote:
> From: Trevor Saunders 
>
> gcc/ChangeLog:

Ok.

Richard.

> 2016-07-24  Trevor Saunders  
>
> * tree-outof-ssa.c (struct elim_graph): Change type of members
> to auto_vec and auto_sbitmap.
> (elim_graph::elim_graph): New constructor.
> (delete_elim_graph): Remove.
> (expand_phi_nodes): Adjust.
> ---
>  gcc/tree-outof-ssa.c | 64 
> +---
>  1 file changed, 16 insertions(+), 48 deletions(-)
>
> diff --git a/gcc/tree-outof-ssa.c b/gcc/tree-outof-ssa.c
> index 5047788..be57ce4 100644
> --- a/gcc/tree-outof-ssa.c
> +++ b/gcc/tree-outof-ssa.c
> @@ -128,23 +128,25 @@ ssa_is_replaceable_p (gimple *stmt)
>
>  struct elim_graph
>  {
> +  elim_graph (var_map map);
> +
>/* Size of the elimination vectors.  */
>int size;
>
>/* List of nodes in the elimination graph.  */
> -  vec nodes;
> +  auto_vec nodes;
>
>/*  The predecessor and successor edge list.  */
> -  vec edge_list;
> +  auto_vec edge_list;
>
>/* Source locus on each edge */
> -  vec edge_locus;
> +  auto_vec edge_locus;
>
>/* Visited vector.  */
> -  sbitmap visited;
> +  auto_sbitmap visited;
>
>/* Stack for visited nodes.  */
> -  vec stack;
> +  auto_vec stack;
>
>/* The variable partition map.  */
>var_map map;
> @@ -153,11 +155,11 @@ struct elim_graph
>edge e;
>
>/* List of constant copies to emit.  These are pushed on in pairs.  */
> -  vec const_dests;
> -  vec const_copies;
> +  auto_vec const_dests;
> +  auto_vec const_copies;
>
>/* Source locations for any constant copies.  */
> -  vec copy_locus;
> +  auto_vec copy_locus;
>  };
>
>
> @@ -392,25 +394,12 @@ insert_part_to_rtx_on_edge (edge e, rtx dest, int src, 
> source_location locus)
>  }
>
>
> -/* Create an elimination graph with SIZE nodes and associated data
> -   structures.  */
> +/* Create an elimination graph for map.  */
>
> -static elim_graph *
> -new_elim_graph (int size)
> +elim_graph::elim_graph (var_map map) :
> +  nodes (30), edge_list (20), edge_locus (10), visited (map->num_partitions),
> +  stack (30), map (map), const_dests (20), const_copies (20), copy_locus (10)
>  {
> -  elim_graph *g = (elim_graph *) xmalloc (sizeof (struct elim_graph));
> -
> -  g->nodes.create (30);
> -  g->const_dests.create (20);
> -  g->const_copies.create (20);
> -  g->copy_locus.create (10);
> -  g->edge_list.create (20);
> -  g->edge_locus.create (10);
> -  g->stack.create (30);
> -
> -  g->visited = sbitmap_alloc (size);
> -
> -  return g;
>  }
>
>
> @@ -425,24 +414,6 @@ clear_elim_graph (elim_graph *g)
>  }
>
>
> -/* Delete elimination graph G.  */
> -
> -static inline void
> -delete_elim_graph (elim_graph *g)
> -{
> -  sbitmap_free (g->visited);
> -  g->stack.release ();
> -  g->edge_list.release ();
> -  g->const_copies.release ();
> -  g->const_dests.release ();
> -  g->nodes.release ();
> -  g->copy_locus.release ();
> -  g->edge_locus.release ();
> -
> -  free (g);
> -}
> -
> -
>  /* Return the number of nodes in graph G.  */
>
>  static inline int
> @@ -925,8 +896,7 @@ void
>  expand_phi_nodes (struct ssaexpand *sa)
>  {
>basic_block bb;
> -  elim_graph *g = new_elim_graph (sa->map->num_partitions);
> -  g->map = sa->map;
> +  elim_graph g (sa->map);
>
>FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb,
>   EXIT_BLOCK_PTR_FOR_FN (cfun), next_bb)
> @@ -935,7 +905,7 @@ expand_phi_nodes (struct ssaexpand *sa)
> edge e;
> edge_iterator ei;
> FOR_EACH_EDGE (e, ei, bb->preds)
> - eliminate_phi (e, g);
> + eliminate_phi (e, &g);
> set_phi_nodes (bb, NULL);
> /* We can't redirect EH edges in RTL land, so we need to do this
>here.  Redirection happens only when splitting is necessary,
> @@ -961,8 +931,6 @@ expand_phi_nodes (struct ssaexpand *sa)
>   ei_next (&ei);
>   }
>}
> -
> -  delete_elim_graph (g);
>  }
>
>
> --
> 2.9.0
>


Re: [PATCH 4/6] remove elim_graph typedef

2016-07-25 Thread Richard Biener
On Sun, Jul 24, 2016 at 1:44 PM,   wrote:
> From: Trevor Saunders 
>
> gcc/ChangeLog:

Ok.

Richard.

> 2016-07-24  Trevor Saunders  
>
> * tree-outof-ssa.c (struct elim_graph): Remove typedef.
> (new_elim_graph): Adjust.
> (clear_elim_graph): Likewise.
> (delete_elim_graph): Likewise.
> (elim_graph_size): Likewise.
> (elim_graph_add_node): Likewise.
> (elim_graph_add_edge): Likewise.
> (elim_graph_remove_succ_edge): Likewise.
> (eliminate_name): Likewise.
> (eliminate_build): Likewise.
> (elim_forward): Likewise.
> (elim_unvisited_predecessor): Likewise.
> (elim_backward): Likewise.
> (elim_create): Likewise.
> (eliminate_phi): Likewise.
> (expand_phi_nodes): Likewise.
> ---
>  gcc/tree-outof-ssa.c | 37 +++--
>  1 file changed, 19 insertions(+), 18 deletions(-)
>
> diff --git a/gcc/tree-outof-ssa.c b/gcc/tree-outof-ssa.c
> index 4125529..5047788 100644
> --- a/gcc/tree-outof-ssa.c
> +++ b/gcc/tree-outof-ssa.c
> @@ -126,7 +126,8 @@ ssa_is_replaceable_p (gimple *stmt)
> rarely more than 6, and in the bootstrap of gcc, the maximum number
> of nodes encountered was 12.  */
>
> -typedef struct _elim_graph {
> +struct elim_graph
> +{
>/* Size of the elimination vectors.  */
>int size;
>
> @@ -157,7 +158,7 @@ typedef struct _elim_graph {
>
>/* Source locations for any constant copies.  */
>vec copy_locus;
> -} *elim_graph;
> +};
>
>
>  /* For an edge E find out a good source location to associate with
> @@ -394,10 +395,10 @@ insert_part_to_rtx_on_edge (edge e, rtx dest, int src, 
> source_location locus)
>  /* Create an elimination graph with SIZE nodes and associated data
> structures.  */
>
> -static elim_graph
> +static elim_graph *
>  new_elim_graph (int size)
>  {
> -  elim_graph g = (elim_graph) xmalloc (sizeof (struct _elim_graph));
> +  elim_graph *g = (elim_graph *) xmalloc (sizeof (struct elim_graph));
>
>g->nodes.create (30);
>g->const_dests.create (20);
> @@ -416,7 +417,7 @@ new_elim_graph (int size)
>  /* Empty elimination graph G.  */
>
>  static inline void
> -clear_elim_graph (elim_graph g)
> +clear_elim_graph (elim_graph *g)
>  {
>g->nodes.truncate (0);
>g->edge_list.truncate (0);
> @@ -427,7 +428,7 @@ clear_elim_graph (elim_graph g)
>  /* Delete elimination graph G.  */
>
>  static inline void
> -delete_elim_graph (elim_graph g)
> +delete_elim_graph (elim_graph *g)
>  {
>sbitmap_free (g->visited);
>g->stack.release ();
> @@ -445,7 +446,7 @@ delete_elim_graph (elim_graph g)
>  /* Return the number of nodes in graph G.  */
>
>  static inline int
> -elim_graph_size (elim_graph g)
> +elim_graph_size (elim_graph *g)
>  {
>return g->nodes.length ();
>  }
> @@ -454,7 +455,7 @@ elim_graph_size (elim_graph g)
>  /* Add NODE to graph G, if it doesn't exist already.  */
>
>  static inline void
> -elim_graph_add_node (elim_graph g, int node)
> +elim_graph_add_node (elim_graph *g, int node)
>  {
>int x;
>int t;
> @@ -469,7 +470,7 @@ elim_graph_add_node (elim_graph g, int node)
>  /* Add the edge PRED->SUCC to graph G.  */
>
>  static inline void
> -elim_graph_add_edge (elim_graph g, int pred, int succ, source_location locus)
> +elim_graph_add_edge (elim_graph *g, int pred, int succ, source_location 
> locus)
>  {
>g->edge_list.safe_push (pred);
>g->edge_list.safe_push (succ);
> @@ -481,7 +482,7 @@ elim_graph_add_edge (elim_graph g, int pred, int succ, 
> source_location locus)
> return the successor node.  -1 is returned if there is no such edge.  */
>
>  static inline int
> -elim_graph_remove_succ_edge (elim_graph g, int node, source_location *locus)
> +elim_graph_remove_succ_edge (elim_graph *g, int node, source_location *locus)
>  {
>int y;
>unsigned x;
> @@ -543,7 +544,7 @@ do {  
>   \
>  /* Add T to elimination graph G.  */
>
>  static inline void
> -eliminate_name (elim_graph g, int T)
> +eliminate_name (elim_graph *g, int T)
>  {
>elim_graph_add_node (g, T);
>  }
> @@ -570,7 +571,7 @@ queue_phi_copy_p (var_map map, tree t)
> G->e.  */
>
>  static void
> -eliminate_build (elim_graph g)
> +eliminate_build (elim_graph *g)
>  {
>tree Ti;
>int p0, pi;
> @@ -619,7 +620,7 @@ eliminate_build (elim_graph g)
>  /* Push successors of T onto the elimination stack for G.  */
>
>  static void
> -elim_forward (elim_graph g, int T)
> +elim_forward (elim_graph *g, int T)
>  {
>int S;
>source_location locus;
> @@ -637,7 +638,7 @@ elim_forward (elim_graph g, int T)
>  /* Return 1 if there unvisited predecessors of T in graph G.  */
>
>  static int
> -elim_unvisited_predecessor (elim_graph g, int T)
> +elim_unvisited_predecessor (elim_graph *g, int T)
>  {
>int P;
>source_location locus;
> @@ -653,7 +654,7 @@ elim_unvisited_predecessor (elim_graph g, int T)
>  /* Process predecesso

Re: [PATCH] Adapt the numbering scheme (PR gcov-profile/64874)

2016-07-25 Thread Martin Liška
On 07/22/2016 03:20 PM, Jakub Jelinek wrote:
> On Fri, Jul 22, 2016 at 01:46:44PM +0200, Martin Liška wrote:
>> As described in the PR, the current numbering scheme in gcov-io.h would
>> overflow in a couple of years.
>> Thus, I'm suggesting to switch from:
>>
>> [major][minor/10][minor%10][release_status]
>>
>> to:
>> [major/10][major%10][minor][release_status]
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Won't this clash with the already released versions?
> Unless you start e.g. using different release_status letters from the past
> (but that is complicated with vendor supplied DEV-PHASE), won't
> say gcc 3.4.0 and gcc 30.4.1 have the same string?
> 
> Wouldn't it be better to just use letters for the major/10?  As GCC major
> didn't reach 10 yet, we only had [0-9][0-9][0-9]. in the past, so I think:
> 
> -  v[0] = (major < 10 ? '0' : 'A' - 10) + major;
> -  v[1] = (minor / 10) + '0';
> -  v[2] = (minor % 10) + '0';
> +  v[0] = (major / 10) + 'A';
> +  v[1] = (major % 10) + '0';
> +  v[2] = minor + '0';
> 
> would be better, then there will be no clash, and the versioning scheme will
> allow until 259.9.
> 
>   Jakub
> 

Hello.

I like the change suggested by Jakub.  I've updated the numbering scheme, as well
as the comments in gcov-io.h.

Final version installed as r238702.

Martin
From 7cdcfec705d90f4d0a04d82617373077280ec495 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 22 Jul 2016 11:54:20 +0200
Subject: [PATCH] Adapt the numbering scheme (PR gcov-profile/64874)

gcc/ChangeLog:

2016-07-22  Martin Liska  

	PR gcov-profile/64874
	* gcov-io.h: Update comment about file format.
	* gcov-iov.c (main): Adapt the numbering scheme.
---
 gcc/gcov-io.h  | 18 +-
 gcc/gcov-iov.c |  6 +++---
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index 3446407..bbf013a 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -63,19 +63,19 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
Although the ident and version are formally 32 bit numbers, they
are derived from 4 character ASCII strings.  The version number
-   consists of the single character major version number, a two
-   character minor version number (leading zero for versions less than
-   10), and a single character indicating the status of the release.
+   consists of a two character major version number
+   (first digit starts from 'A' letter to not to clash with the older
+   numbering scheme), the single character minor version number,
+   and a single character indicating the status of the release.
That will be 'e' experimental, 'p' prerelease and 'r' for release.
Because, by good fortune, these are in alphabetical order, string
collating can be used to compare version strings.  Be aware that
the 'e' designation will (naturally) be unstable and might be
-   incompatible with itself.  For gcc 3.4 experimental, it would be
-   '304e' (0x33303465).  When the major version reaches 10, the
-   letters A-Z will be used.  Assuming minor increments releases every
-   6 months, we have to make a major increment every 50 years.
-   Assuming major increments releases every 5 years, we're ok for the
-   next 155 years -- good enough for me.
+   incompatible with itself.  For gcc 17.0 experimental, it would be
+   'B70e' (0x42373065).  As we currently do not release more than 5 minor
+   releases, the single character should be always fine.  Major number
+   is currently changed roughly every year, which gives us space
+   for next 250 years (maximum allowed number would be 259.9).
 
A record has a tag, length and variable amount of data.
 
diff --git a/gcc/gcov-iov.c b/gcc/gcov-iov.c
index 202f32a..f87445a 100644
--- a/gcc/gcov-iov.c
+++ b/gcc/gcov-iov.c
@@ -58,9 +58,9 @@ main (int argc, char **argv)
   || strcmp (argv[2], "prerelease") == 0)
 phase = '*';
 
-  v[0] = (major < 10 ? '0' : 'A' - 10) + major;
-  v[1] = (minor / 10) + '0';
-  v[2] = (minor % 10) + '0';
+  v[0] = (major / 10) + 'A';
+  v[1] = (major % 10) + '0';
+  v[2] = minor + '0';
   v[3] = phase;
 
   for (ix = 0; ix != 4; ix++)
-- 
2.9.0
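
As a quick worked example of the new scheme (not from the patch itself):
for a GCC 7.1 release build, the new gcov-iov.c logic would produce the
version string as follows.

  #include <stdio.h>

  /* Sketch: encode GCC 7.1 "release" (major = 7, minor = 1, phase = 'r')
     the way the new scheme does.  */
  int main (void)
  {
    int major = 7, minor = 1;
    char phase = 'r';
    char v[5];

    v[0] = (major / 10) + 'A';   /* majors 0..9 -> 'A', 10..19 -> 'B', ...  */
    v[1] = (major % 10) + '0';
    v[2] = minor + '0';
    v[3] = phase;
    v[4] = '\0';

    printf ("%s\n", v);          /* prints "A71r"; the old scheme gave "701r",
                                    so the two never collide.  */
    return 0;
  }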



Re: [PATCH 3/6] add ctor to topo_info

2016-07-25 Thread Richard Biener
On Sun, Jul 24, 2016 at 1:44 PM,   wrote:
> From: Trevor Saunders 
>
> gcc/ChangeLog:
>
> 2016-07-24  Trevor Saunders  
>
> * tree-ssa-structalias.c (struct topo_info): Add constructor,
> and change types of members to auto_vec and auto_sbitmap.
> (init_topo_info): Remove.
> (topo_info::topo_info): New constructor.
> (solve_graph): Adjust.
> ---
>  gcc/tree-ssa-structalias.c | 31 ---
>  1 file changed, 8 insertions(+), 23 deletions(-)
>
> diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
> index 5e3c7d0..94d81ed1 100644
> --- a/gcc/tree-ssa-structalias.c
> +++ b/gcc/tree-ssa-structalias.c
> @@ -1536,36 +1536,21 @@ unify_nodes (constraint_graph_t graph, unsigned int 
> to, unsigned int from,
>
>  struct topo_info
>  {
> +  topo_info ();
> +
>/* sbitmap of visited nodes.  */
> -  sbitmap visited;
> +  auto_sbitmap visited;
>/* Array that stores the topological order of the graph, *in
>   reverse*.  */
> -  vec topo_order;
> +  auto_vec topo_order;
>  };
>
>
>  /* Initialize and return a topological info structure.  */
>
> -static struct topo_info *
> -init_topo_info (void)
> -{
> -  size_t size = graph->size;
> -  struct topo_info *ti = XNEW (struct topo_info);
> -  ti->visited = sbitmap_alloc (size);
> -  bitmap_clear (ti->visited);
> -  ti->topo_order.create (1);
> -  return ti;
> -}
> -
> -
> -/* Free the topological sort info pointed to by TI.  */
> -
> -static void
> -free_topo_info (struct topo_info *ti)
> +topo_info::topo_info () : visited (graph->size), topo_order (1)
>  {
> -  sbitmap_free (ti->visited);
> -  ti->topo_order.release ();
> -  free (ti);
> +  bitmap_clear (visited);
>  }
>
>  /* Visit the graph in topological order, and store the order in the
> @@ -2679,7 +2664,7 @@ solve_graph (constraint_graph_t graph)
>while (!bitmap_empty_p (changed))
>  {
>unsigned int i;
> -  struct topo_info *ti = init_topo_info ();
> +  topo_info *ti = new topo_info ();

I think it would be better to save the repeated allocations and instead just
clear the bitmap and the vector here.  Your patch makes this more
difficult.
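
A rough sketch of that suggestion (my reading of the review, not code from
the series):

  /* Sketch only: allocate the scratch structure once, outside the
     propagation loop, and just reset it on each iteration.  */
  topo_info ti;
  while (!bitmap_empty_p (changed))
    {
      bitmap_clear (ti.visited);
      ti.topo_order.truncate (0);
      /* ... rest of the iteration as before ... */
    }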

Richard.

>stats.iterations++;
>
>bitmap_obstack_initialize (&iteration_obstack);
> @@ -2797,7 +2782,7 @@ solve_graph (constraint_graph_t graph)
> }
> }
> }
> -  free_topo_info (ti);
> +  delete ti;
>bitmap_obstack_release (&iteration_obstack);
>  }
>
> --
> 2.9.0
>


[PATCH] Handle loops with loop->latch == NULL (PR gcov-profile/71868)

2016-07-25 Thread Martin Liška
Hi.

As discussed with Honza, we should sum all edge frequencies when a loop
has multiple latches.
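
For illustration, a hypothetical function (mine, not from the PR) whose loop
header has two incoming back edges, so the loop ends up without a single
latch block:

  /* "head" has two back edges, so loop->latch may be NULL and the latch
     frequency has to be summed over all in-loop predecessors of the
     header.  */
  void
  f (int *a, int n)
  {
    int i = 0;
   head:
    if (i >= n)
      return;
    if (a[i] & 1)
      {
        a[i] = 0;
        i++;
        goto head;      /* back edge 1 */
      }
    else
      {
        a[i] = 2;
        i++;
        goto head;      /* back edge 2 */
      }
  }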

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
From 07dc9092511aedfe2786630d72419b16ef660d0c Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 22 Jul 2016 11:36:52 +0200
Subject: [PATCH] Handle loops with loop->latch == NULL (PR gcov-profile/71868)

gcc/ChangeLog:

2016-07-22  Martin Liska  

	PR gcov-profile/71868
	* cfgloopanal.c (expected_loop_iterations_unbounded): When we
	have a loop with multiple latches, count them all.
---
 gcc/cfgloopanal.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c
index 2739f44..ae08108 100644
--- a/gcc/cfgloopanal.c
+++ b/gcc/cfgloopanal.c
@@ -244,7 +244,7 @@ expected_loop_iterations_unbounded (const struct loop *loop,
   /* If we have no profile at all, use AVG_LOOP_NITER.  */
   if (profile_status_for_fn (cfun) == PROFILE_ABSENT)
 expected = PARAM_VALUE (PARAM_AVG_LOOP_NITER);
-  else if (loop->latch->count || loop->header->count)
+  else if (loop->latch && (loop->latch->count || loop->header->count))
 {
   gcov_type count_in, count_latch;
 
@@ -274,8 +274,9 @@ expected_loop_iterations_unbounded (const struct loop *loop,
   freq_latch = 0;
 
   FOR_EACH_EDGE (e, ei, loop->header->preds)
-	if (e->src == loop->latch)
-	  freq_latch = EDGE_FREQUENCY (e);
+	if (e->src == loop->latch
+	|| flow_bb_inside_loop_p (loop, e->src))
+	  freq_latch += EDGE_FREQUENCY (e);
 	else
 	  freq_in += EDGE_FREQUENCY (e);
 
-- 
2.9.0



[PATCH] Call get_ops just for SSA_NAMEs (PR tree-optimization/71987)

2016-07-25 Thread Martin Liška
Hi.

As other calls of get_ops are guarded with TREE_CODE (x) == SSA_NAME, I guess the
same should be done for the call that causes the ICE.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
From 9dd5cc00eb3c271fad91fede6a9b06df356e001f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 25 Jul 2016 09:19:54 +0200
Subject: [PATCH] Call get_ops just for SSA_NAMEs (PR tree-optimization/71987)

gcc/ChangeLog:

2016-07-25  Martin Liska  

	PR tree-optimization/71987
	* tree-ssa-reassoc.c (maybe_optimize_range_tests): Call get_ops
	just for SSA_NAMEs.

gcc/testsuite/ChangeLog:

2016-07-25  Martin Liska  

	* gcc.dg/torture/pr71987.c: New test.
---
 gcc/testsuite/gcc.dg/torture/pr71987.c | 21 +
 gcc/tree-ssa-reassoc.c | 10 ++
 2 files changed, 27 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr71987.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr71987.c b/gcc/testsuite/gcc.dg/torture/pr71987.c
new file mode 100644
index 000..87d5938
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr71987.c
@@ -0,0 +1,21 @@
+/* PR tree-optimization/71987 */
+
+int a, b, *c, *d;
+
+short fn1 (int p1)
+{
+  return a ? p1 : a;
+}
+
+void fn2 ()
+{
+  int e, *f = &e;
+  b = fn1 (d != &e);
+  c = f;
+}
+
+int main ()
+{
+  fn2 ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index ece2d08..dfaf7d8 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -3519,7 +3519,8 @@ maybe_optimize_range_tests (gimple *stmt)
 	 (or &, corresponding to 1/0 in the phi arguments,
 	 push into ops the individual range test arguments
 	 of the bitwise or resp. and, recursively.  */
-	  if (!get_ops (rhs, code, &ops,
+	  if (TREE_CODE (rhs) == SSA_NAME
+	  && !get_ops (rhs, code, &ops,
 			loop_containing_stmt (stmt))
 	  && (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
 		  != tcc_comparison)
@@ -3537,11 +3538,12 @@ maybe_optimize_range_tests (gimple *stmt)
 	  bb_ent.last_idx++;
 	  bb_ent.op = rhs;
 	}
-	  else if (is_gimple_assign (stmt)
+	  else if (TREE_CODE (lhs) == SSA_NAME
+		   && is_gimple_assign (stmt)
 		   && (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
 		   == tcc_comparison)
-		   &&!get_ops (lhs, code, &ops,
-			   loop_containing_stmt (stmt))
+		   && !get_ops (lhs, code, &ops,
+loop_containing_stmt (stmt))
 		   && has_single_use (lhs))
 	{
 	  operand_entry *oe = operand_entry_pool.allocate ();
-- 
2.9.0



[PATCH] Fix memory leak introduced in r238336

2016-07-25 Thread Martin Liška
Hi.

This is quite obvious change.

I've been waiting for bootstrap and regression tests on ppc64le-redhat-linux.
Ready after it finishes?

Martin
From 2f416d7feca35d9075124f4dc74f3560a18beefb Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 22 Jul 2016 12:46:08 +0200
Subject: [PATCH] Fix memory leak introduced in r238336

gcc/ChangeLog:

2016-07-22  Martin Liska  

	* tree-ssa-loop-niter.c (loop_only_exit_p): Release body array.
---
 gcc/tree-ssa-loop-niter.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index b7d7c32..95bb5cf 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2119,7 +2119,10 @@ loop_only_exit_p (const struct loop *loop, const_edge exit)
 {
   for (bsi = gsi_start_bb (body[i]); !gsi_end_p (bsi); gsi_next (&bsi))
 	if (stmt_can_terminate_bb_p (gsi_stmt (bsi)))
-	  return true;
+	  {
+	free (body);
+	return true;
+	  }
 }
 
   free (body);
-- 
2.9.0



Re: [PR70920] transform (intptr_t) x eq/ne CST to x eq/ne (typeof x) cst

2016-07-25 Thread Richard Biener
On Mon, 25 Jul 2016, Prathamesh Kulkarni wrote:

> Hi Richard,
> The attached patch tries to fix PR70920.
> It adds your pattern from comment 1 in the PR
> (with additional gating on INTEGRAL_TYPE_P to avoid regressing 
> finalize_18.f90)
> and a second pattern, which is the reverse of the first transform.
> I needed to update ssa-dom-branch-1.c because, with the patch applied,
> jump threading removed the second if (i != 0B) block.
> The dumps with and without the patch for ssa-dom-branch-1.c start
> to differ with forwprop1:
> 
> before:
>  :
>   _1 = temp_16(D)->code;
>   _2 = _1 == 42;
>   _3 = (int) _2;
>   _4 = (long int) _3;
>   temp_17 = (struct rtx_def *) _4;
>   if (temp_17 != 0B)
> goto ;
>   else
> goto ;
> 
> after:
> :
>   _1 = temp_16(D)->code;
>   _2 = _1 == 42;
>   _3 = (int) _2;
>   _4 = (long int) _2;
>   temp_17 = (struct rtx_def *) _4;
>   if (_1 == 42)
> goto ;
>   else
> goto ;
> 
> I suppose the transform is correct for above test-case ?
> 
> Then vrp dump shows:
>  Threaded jump 5 --> 9 to 13
>   Threaded jump 8 --> 9 to 13
>   Threaded jump 3 --> 9 to 13
>   Threaded jump 12 --> 9 to 14
> Removing basic block 9
> basic block 9, loop depth 0
>  pred:
> if (i1_10(D) != 0B)
>   goto ;
> else
>   goto ;
>  succ:   10
>  11
> 
> So there remained two instances of if (i1_10 (D) != 0B) in the dom2 dump file,
> and hence I needed to update the test-case.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> OK to commit ?

--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3408,3 +3408,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 { CONSTRUCTOR_ELT (ctor, idx / k)->value; })
(BIT_FIELD_REF { CONSTRUCTOR_ELT (ctor, idx / k)->value; }
   @1 { bitsize_int ((idx % k) * width); })
+
+/* PR70920: Transform (intptr_t)x eq/ne CST to x eq/ne (typeof x) CST.  
*/
+
+(for cmp (ne eq)
+ (simplify
+  (cmp (convert@2 @0) INTEGER_CST@1)
+  (if (POINTER_TYPE_P (TREE_TYPE (@0))
+   && INTEGRAL_TYPE_P (TREE_TYPE (@2)))

you can use @1 here and omit @2.

+   (cmp @0 (convert @1)
+
+/* Reverse of the above case:
+   x has integral_type, CST is a pointer constant.
+   Transform (typeof CST)x eq/ne CST to x eq/ne (typeof x) CST.  */
+
+(for cmp (ne eq)
+ (simplify
+  (cmp (convert @0) @1)
+  (if (POINTER_TYPE_P (TREE_TYPE (@1))
+   && INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(cmp @0 (convert @1)

The second pattern lacks the INTEGER_CST on @1 so it doesn't match
its comment.  Please do not add vertical space between pattern
comment and pattern.

Please place patterns not at the end of match.pd but where similar
transforms are done.  Like after

/* Simplify pointer equality compares using PTA.  */
(for neeq (ne eq)
 (simplify
  (neeq @0 @1)
  (if (POINTER_TYPE_P (TREE_TYPE (@0))
   && ptrs_compare_unequal (@0, @1))
   { neeq == EQ_EXPR ? boolean_false_node : boolean_true_node; })))

please also share the (for ...) for both patterns or merge them
by changing the condition to

  (if ((POINTER_TYPE_P (TREE_TYPE (@0))
&& INTEGRAL_TYPE_P (TREE_TYPE (@1)))
   || (INTEGRAL_TYPE_P (TREE_TYPE (@0))
   && POINTER_TYPE_P (TREE_TYPE (@1
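
Putting those review comments together, the merged pattern might look roughly
like this (untested sketch; the committed form may differ):

(for cmp (ne eq)
 (simplify
  (cmp (convert @0) INTEGER_CST@1)
  (if ((POINTER_TYPE_P (TREE_TYPE (@0))
        && INTEGRAL_TYPE_P (TREE_TYPE (@1)))
       || (INTEGRAL_TYPE_P (TREE_TYPE (@0))
           && POINTER_TYPE_P (TREE_TYPE (@1))))
   (cmp @0 (convert @1)))))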

Richard.

> PS: Writing changelog entries for match.pd is a bit tedious.
> Should we add optional names for pattern so we can refer to them by names
> in the ChangeLog for the more complicated ones ?
> Or maybe just use comments:
> (simplify /* name */ ... ) -;)

That will add the fun of inventing names ;)

> Thanks,
> Prathamesh
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [AArch64][3/3] Migrate aarch64_expand_prologue/epilogue to aarch64_add_constant

2016-07-25 Thread Jiong Wang

On 21/07/16 11:08, Richard Earnshaw (lists) wrote:

On 20/07/16 16:02, Jiong Wang wrote:

Richard,

  Thanks for the review.  Yes, I believe using aarch64_add_constant is
unconditionally safe here, because we have generated a stack tie to clobber
the whole memory, which prevents any instruction that accesses the stack
from being scheduled after it.

  The access-to-deallocated-stack issue was there before and was fixed by

   https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02292.html.

  aarch64_add_constant itself generates the same instruction sequences as
the original code, except that in a few cases it will prefer

   move scratch_reg, #imm
   add sp, sp, scratch_reg

over:

   add sp, sp, #imm_part1
   add sp, sp, #imm_part2





OK, I've had another look at this and I'm happy that we don't
(currently) run into the problem I'm concerned about.

However, this new usage does impose a constraint on aarch64_add_constant
that will need to be respected in future, so please can you add the
following to the comment that precedes that function:

/* ...

This function is sometimes used to adjust the stack pointer, so we
must ensure that it can never cause transient stack deallocation
by writing an invalid value into REGNUM.  */



+  bool frame_related_p = (regnum == SP_REGNUM);

I think it would be better to make the frame-related decision be an
explicit parameter passed to the routine (don't forget SP is not always
the frame pointer).  Then the new uses would pass 'true' and the
existing uses 'false'.

R.


Thanks.  The attachment is the updated patch, which:

  * Adds the new comment above for aarch64_add_constant.
  * Adds a new parameter "frame_related_p" to aarch64_add_constant.

I thought about adding a new gcc assertion as a sanity check of
frame_related_p and REGNUM, but haven't done that, as I found dwarf2cfi.c
already does it.

OK for trunk?

gcc/
2016-07-25  Jiong Wang  

 * config/aarch64/aarch64.c (aarch64_add_constant): New
 parameter "frame_related_p".  Generate CFA annotation when
 it's necessary.
 (aarch64_expand_prologue): Use aarch64_add_constant.
 (aarch64_expand_epilogue): Likewise.
 (aarch64_output_mi_thunk): Pass "false" when calling
 aarch64_add_constant.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 41844a1..ca93f6e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1866,14 +1866,19 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 }
 
 /* Add DELTA onto REGNUM in MODE, using SCRATCHREG to held intermediate value if
-   it is necessary.  */
+   it is necessary.
+
+   This function is sometimes used to adjust the stack pointer, so we must
+   ensure that it can never cause transient stack deallocation by writing an
+   invalid value into REGNUM.  */
 
 static void
 aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
-		  HOST_WIDE_INT delta)
+		  HOST_WIDE_INT delta, bool frame_related_p)
 {
   HOST_WIDE_INT mdelta = abs_hwi (delta);
   rtx this_rtx = gen_rtx_REG (mode, regnum);
+  rtx_insn *insn;
 
   /* Do nothing if mdelta is zero.  */
   if (!mdelta)
@@ -1882,7 +1887,8 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
   /* We only need single instruction if the offset fit into add/sub.  */
   if (aarch64_uimm12_shift (mdelta))
 {
-  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
+  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
   return;
 }
 
@@ -1895,15 +1901,23 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
   HOST_WIDE_INT low_off = mdelta & 0xfff;
 
   low_off = delta < 0 ? -low_off : low_off;
-  emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
-  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
+  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
+  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
   return;
 }
 
   /* Otherwise use generic function to handle all other situations.  */
   rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
   aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
-  emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
+  insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
+  if (frame_related_p)
+{
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
+  rtx adj = plus_constant (mode, this_rtx, delta);
+  add_reg_note (insn , REG_CFA_ADJUST_CFA, gen_rtx_SET (this_rtx, adj));
+}
 }
 
 static bool
@@ -3038,36 +3052,7 @@ aarch64_expand_prologue (void)
   frame_size -= (offset + crtl->outgoing_args_size);
   fp_offset = 0;
 
-  if (frame_size >= 0x100)
-	{
-	  rtx op0 = gen_rtx_REG (Pmode, IP0_REGNUM);
-	  emit_mov

Re: [PATCH] Call get_ops just for SSA_NAMEs (PR tree-optimization/71987)

2016-07-25 Thread kugan

Hi Martin,


On 25/07/16 18:56, Martin Liška wrote:

Hi.

As other calls of get_ops is guarded with TREE_CODE (x) == SSA_NAME, I guess the
same should be done for the call that causes the ICE.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin


0001-Call-get_ops-just-for-SSA_NAMEs-PR-tree-optimization.patch


From 9dd5cc00eb3c271fad91fede6a9b06df356e001f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 25 Jul 2016 09:19:54 +0200
Subject: [PATCH] Call get_ops just for SSA_NAMEs (PR tree-optimization/71987)

gcc/ChangeLog:

2016-07-25  Martin Liska  

PR tree-optimization/71987
* tree-ssa-reassoc.c (maybe_optimize_range_tests): Call get_ops
just for SSA_NAMEs.

gcc/testsuite/ChangeLog:

2016-07-25  Martin Liska  

* gcc.dg/torture/pr71987.c: New test.
---
 gcc/testsuite/gcc.dg/torture/pr71987.c | 21 +
 gcc/tree-ssa-reassoc.c | 10 ++
 2 files changed, 27 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr71987.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr71987.c 
b/gcc/testsuite/gcc.dg/torture/pr71987.c
new file mode 100644
index 000..87d5938
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr71987.c
@@ -0,0 +1,21 @@
+/* PR tree-optimization/71987 */
+
+int a, b, *c, *d;
+
+short fn1 (int p1)
+{
+  return a ? p1 : a;
+}
+
+void fn2 ()
+{
+  int e, *f = &e;
+  b = fn1 (d != &e);
+  c = f;
+}
+
+int main ()
+{
+  fn2 ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index ece2d08..dfaf7d8 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -3519,7 +3519,8 @@ maybe_optimize_range_tests (gimple *stmt)
 (or &, corresponding to 1/0 in the phi arguments,
 push into ops the individual range test arguments
 of the bitwise or resp. and, recursively.  */
- if (!get_ops (rhs, code, &ops,
+ if (TREE_CODE (rhs) == SSA_NAME
+ && !get_ops (rhs, code, &ops,
loop_containing_stmt (stmt))
  && (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
  != tcc_comparison)
@@ -3537,11 +3538,12 @@ maybe_optimize_range_tests (gimple *stmt)
  bb_ent.last_idx++;
  bb_ent.op = rhs;
}
- else if (is_gimple_assign (stmt)
+ else if (TREE_CODE (lhs) == SSA_NAME
+  && is_gimple_assign (stmt)
   && (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
   == tcc_comparison)
-  &&!get_ops (lhs, code, &ops,
-  loop_containing_stmt (stmt))
+  && !get_ops (lhs, code, &ops,
+   loop_containing_stmt (stmt))
   && has_single_use (lhs))
{
  operand_entry *oe = operand_entry_pool.allocate ();


Sorry about the breakage. Since final_range_test_p allows either lhs or 
rhs to be an SSA_NAME (for the different cases it accepts), we should 
indeed check for TREE_CODE being SSA_NAME. Unfortunately it didn't 
trigger in my testing. Let's wait for the maintainers' confirmation.


Thanks for working on this,
Kugan


Re: [PATCH] Call get_ops just for SSA_NAMEs (PR tree-optimization/71987)

2016-07-25 Thread Martin Liška
On 07/25/2016 11:51 AM, kugan wrote:
> Sorry about the breakage. Since final_range_test_p allows either lhs or rhs 
> to be an SSA_NAME (for the different cases it accepts), we should indeed check 
> for TREE_CODE being SSA_NAME. Unfortunately it didn't trigger in my testing. 
> Let's wait for the maintainers' confirmation.
> 
> Thanks for working on this,
> Kugan

You're welcome. I noticed breakage for e.g. Firefox or Inkscape (-O2),
while the Linux kernel or GIMP work fine.

Martin


Re: [PATCH] Teach VRP to register assertions along default switch labels (PR 18046)

2016-07-25 Thread Richard Biener
On Mon, Jul 25, 2016 at 5:38 AM, Patrick Palka  wrote:
> On Fri, 22 Jul 2016, Patrick Palka wrote:
>
>> On Fri, 22 Jul 2016, Patrick Palka wrote:
>>
>> > On Fri, 22 Jul 2016, Patrick Palka wrote:
>> >
>> > > This patch teaches VRP to register along a default switch label
>> > > assertions that corresponds to the anti range of each case label.
>> > >
>> > > Does this look OK to commit after bootstrap + regtest on
>> > > x86_64-pc-linux-gnu?
>> >
>> > Forgot the changelog:
>> >
>> > gcc/ChangeLog:
>> >
>> > PR tree-optimization/18046
>> > * tree-vrp.c (find_switch_asserts): Register edge assertions
>> > for the default label which correspond to the anti-ranges
>> > of each non-case label.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> > PR tree-optimization/18046
>> > * gcc.dg/tree-ssa/ssa-dom-thread-6.c: Bump FSM count to 5.
>> > * gcc.dg/tree-ssa/vrp103.c: New test.
>> > * gcc.dg/tree-ssa/vrp104.c: New test.
>> >
>>
>> The patch failed to bootstrap due to a number -Warray-bounds warnings
>> getting emitted for the autogenerated header file insn-modes.h:
>>
>> In file included from /home/patrick/code/gcc/gcc/machmode.h:24:0,
>>  from /home/patrick/code/gcc/gcc/coretypes.h:391,
>>  from /home/patrick/code/gcc/gcc/lto-streamer-out.c:25:
>> ./insn-modes.h: In function ‘void produce_asm_for_decls()’:
>> ./insn-modes.h:638:36: warning: array subscript is outside array bounds 
>> [-Warray-bounds]
>>  default: return mode_inner[mode];
>>  ~~~^
>>
>> These new warnings seem legitimate.  From what I can tell, this array
>> access along the default switch label will always be out of bounds.  The
>> code it's warning about basically looks like this:
>>
>>   enum A { a, b, c };
>>   int foo (A i)
>>   {
>> int array[3];
>> switch (i)
>> {
>>   case a: return x;
>>   case b: return y;
>>   case c: return z;
>>   default: return array[i];
>> }
>>   }
>>
>> and while the switch does exhaust every possible enumeration value of A,
>> there's nothing stopping the user from passing an arbitrary int to
>> foo() thus triggering the default case.  So this patch suppresses these
>> warnings by making genemit emit an assertion that verifies that mode is
>> within the bounds of the array.  This assertion won't affect performance
>> because the mode_*_inline functions are always called with constant
>> arguments.
>>
>> This version of the patch has the following changes:
>>
>> 1. Fixes the bootstrap failure as mentioned above.
>> 2. Checks if the switch operand is live along the default edge before
>> registering assertions.
>> 3. Combines contiguous case ranges to reduce the number of assert
>> expressions to insert.
>>
>> Bootstrap + regtesting in progress on x86_64-pc-linux-gnu.
>>
>> gcc/ChangeLog:
>>
>>   PR tree-optimization/18046
>>   * genmodes.c (emit_mode_size_inline): Emit an assert that
>>   verifies that mode is a valid array index.
>>   (emit_mode_nuinits_inline): Likewise.
>>   (emit_mode_inner_inline): Likewise.
>>   (emit_mode_unit_size_inline): Likewise.
>>   (emit_mode_unit_precision_inline): Likewise.
>>   * tree-vrp.c (compare_case_label_values): New qsort callback.
>>   (find_switch_asserts): Register edge assertions for
>>   the default label, which correspond to the anti-range of each
>>   non-case label.
>>
>> gcc/testsuite/ChangeLog:
>>
>>   PR tree-optimization/18046
>>   * gcc.dg/tree-ssa/ssa-dom-thread-6.c: Bump FSM count to 5.
>>   * gcc.dg/tree-ssa/vrp103.c: New test.
>>   * gcc.dg/tree-ssa/vrp104.c: New test.
>
> Here's another version of the patch, which has the following changes:
>
> 1. Use wide-int arithmetic directly instead of tree arithmetic.
> 2. Don't bother re-sorting and re-using the ci vector.  Instead just
> inspect gimple_switch_label() directly since cases are already sorted
> there by CASE_LOW.
> 3. Add an insertion limit of 10 and a tunable param.
>
> Bootstrapped + regtested on x86_64-pc-linux-gnu.

Ok.

The only thing I wonder is whether VRP does a good job combining
non-adjacent anti-ranges or if the result is something non-sensible.  ISTR
VRP simply chooses A when combining A and B, which would mean
inserting asserts will only provide better ranges for more than one
assert via equivalences (which might be good enough, of course).
I think the anti-range merge behavior is good, but I for example wonder
about the behavior for ~[0, n], which will end up as [n, +INF] for
unsigned.  Also the combining limitation will make ranges derived
from the switch parameter less useful, like

  switch (x)
  {

  default:
 x++;


where x++ will only be derived from the merged anti-range.

All these are existing VRP range representation limitations of course.

Thanks,
Richard.

> gcc/ChangeLog:
>
> PR tree-optimization/18046
> * genmodes.c (emit_mode_size_inline): Emit an assert that
> ver

Re: [RFC][IPA-VRP] Disable setting param of __builtin_constant_p to null

2016-07-25 Thread Richard Biener
On Mon, Jul 25, 2016 at 8:59 AM, kugan
 wrote:
> Hi,
>
>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>> index ecfab1f..23c12b5 100644
>>> --- a/gcc/tree-vrp.c
>>> +++ b/gcc/tree-vrp.c
>>> @@ -3759,8 +3759,10 @@ extract_range_basic (value_range *vr, gimple
>>> *stmt)
>>>   && SSA_NAME_IS_DEFAULT_DEF (arg)
>>>   && TREE_CODE (SSA_NAME_VAR (arg)) == PARM_DECL)
>>> {
>>> +#if 0
>>>   set_value_range_to_null (vr, type);
>>>   return;
>>> +#endif
>>
>>
>> It is not cleanest either, but better to test cfun->after_inlining
>
>
> Thanks. Here is the patch which does this. Bootstrapped and regression
> tested with the rest of the patches in the series. Is this OK for trunk?

Ok.

Richard.

> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2016-07-25  Kugan Vivekanandarajah  
>
> * tree-vrp.c (extract_range_basic): Check cfun->after_inlining
> before
> folding call to __builtin_constant_p with parameters to false.


Re: [PATCH] Handle loops with loop->latch == NULL (PR gcov-profile/71868)

2016-07-25 Thread Richard Biener
On Mon, Jul 25, 2016 at 10:55 AM, Martin Liška  wrote:
> Hi.
>
> As discussed with Honza, we should sum all edge frequencies when a loop
> has multiple latches.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

-   if (e->src == loop->latch)
- freq_latch = EDGE_FREQUENCY (e);
+   if (e->src == loop->latch
+   || flow_bb_inside_loop_p (loop, e->src))
+ freq_latch += EDGE_FREQUENCY (e);

the e->src == loop->latch condition is redundant now.
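
That is, the loop over the header's predecessors would then read roughly
(sketch):

  FOR_EACH_EDGE (e, ei, loop->header->preds)
    if (flow_bb_inside_loop_p (loop, e->src))
      freq_latch += EDGE_FREQUENCY (e);
    else
      freq_in += EDGE_FREQUENCY (e);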

Richard.

> Martin


Re: [PATCH] Fix memory leak introduced in r238336

2016-07-25 Thread Richard Biener
On Mon, Jul 25, 2016 at 10:58 AM, Martin Liška  wrote:
> Hi.
>
> This is quite obvious change.
>
> I've been waiting for bootstrap and regression tests on ppc64le-redhat-linux.
> Ready after it finishes?

Ok.

Richard.

> Martin


Re: [PATCH] Call get_ops just for SSA_NAMEs (PR tree-optimization/71987)

2016-07-25 Thread Richard Biener
On Mon, Jul 25, 2016 at 10:56 AM, Martin Liška  wrote:
> Hi.
>
> As other calls of get_ops is guarded with TREE_CODE (x) == SSA_NAME, I guess 
> the
> same should be done for the call that causes the ICE.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

- else if (is_gimple_assign (stmt)
+ else if (TREE_CODE (lhs) == SSA_NAME
+  && is_gimple_assign (stmt)
   && (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
   == tcc_comparison)
-  &&!get_ops (lhs, code, &ops,
-  loop_containing_stmt (stmt))
+  && !get_ops (lhs, code, &ops,
+   loop_containing_stmt (stmt))

the check is redundant here (the is_gimple_assign check as well).  If the
RHS code is a tcc_comparison the lhs has to be an SSA name.  Ok with
the is_gimple_assign check removed instead.

Thanks,
Richard.

> Martin


[PATCH] Use std::mt19937, std::thread and std::atomic to simplify tests

2016-07-25 Thread Jonathan Wakely

These tests that hammer shared_ptr and weak_ptr to test the
thread-safety of the reference counting were originally written for
the TR1 smart pointers, and so pre-date std::thread etc.

This refactors them to use C++11 std::thread and std::atomic (with
relaxed memory ordering, since we don't need to know the counter value
until after we've joined all the worker threads).

I also used the std::mt19937 typedef that uses the same parameters as
the test lists explicitly (mt19937 produces uint_fast32_t not unsigned
long, but that doesn't change anything here) and replaced the
non-uniform modulus operation with std::uniform_int_distribution.
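
Concretely, the old non-uniform "rng() % KILL_ONE_IN" check becomes a uniform
draw, along these lines (this mirrors the hunks in the patch below):

  std::mt19937 urbg;
  std::uniform_int_distribution<> dist(0, KILL_ONE_IN - 1);
  // ...
  if (dist(urbg) == 0)   // replaces: if (rng() % KILL_ONE_IN == 0)
    cur_shared->reset();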

To verify that after the changes we still test what we're trying to
test, I changed the shared_ptr code to remove the synchronization in
_M_add_ref_lock and saw the tests fail.

* testsuite/20_util/shared_ptr/thread/default_weaktoshared.cc: Use
std::mt19937, std::thread and std::atomic to simplify test.
* testsuite/20_util/shared_ptr/thread/mutex_weaktoshared.cc: Likewise.

Tested x86_64-linux (8 HT cores) and powerpc64-linux (160 cores).

Committed to trunk.

commit b2f86d9d04f9561b5189eccaa85e5d2b21884de4
Author: Jonathan Wakely 
Date:   Mon Jul 25 10:32:11 2016 +0100

Use std::mt19937, std::thread and std::atomic to simplify tests

* testsuite/20_util/shared_ptr/thread/default_weaktoshared.cc: Use
std::mt19937, std::thread and std::atomic to simplify test.
* testsuite/20_util/shared_ptr/thread/mutex_weaktoshared.cc: Likewise.

diff --git 
a/libstdc++-v3/testsuite/20_util/shared_ptr/thread/default_weaktoshared.cc 
b/libstdc++-v3/testsuite/20_util/shared_ptr/thread/default_weaktoshared.cc
index d5b9d4f..a853307 100644
--- a/libstdc++-v3/testsuite/20_util/shared_ptr/thread/default_weaktoshared.cc
+++ b/libstdc++-v3/testsuite/20_util/shared_ptr/thread/default_weaktoshared.cc
@@ -25,11 +25,11 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
-
-#include 
+#include 
+#include 
+#include 
 
 #ifdef _GLIBCXX_HAVE_UNISTD_H
 #include // To test for _POSIX_THREAD_PRIORITY_SCHEDULING
@@ -50,19 +50,13 @@ const unsigned long HAMMER_REPEAT = 10;
 const unsigned long KILL_ONE_IN = 1000;
 
 struct A
-  {
-static _Atomic_word counter;
-A()
-  {
-   __gnu_cxx::__atomic_add(&counter, 1);
-  }
-~A()
-  {
-   __gnu_cxx::__atomic_add(&counter, -1);
-  }
-  };
+{
+  static std::atomic counter;
+  A() { counter.fetch_add(1, std::memory_order_relaxed); }
+ ~A() { counter.fetch_sub(1, std::memory_order_relaxed); }
+};
 
-_Atomic_word A::counter = 0;
+std::atomic A::counter{ 0 };
 
 typedef std::shared_ptr sp_A_t;
 typedef std::weak_ptr wp_A_t;
@@ -80,16 +74,10 @@ struct shared_and_weak_pools
 { }
 };
 
-void* thread_hammer_and_kill(void* opaque_pools)
+void thread_hammer_and_kill(shared_and_weak_pools& pools)
 {
-  shared_and_weak_pools& pools = 
*static_cast(opaque_pools);
-  // Using the same parameters as in the RNG test cases.
-  std::mersenne_twister_engine<
-unsigned long, 32, 624, 397, 31,
-0x9908b0dful, 11,
-0xul, 7,
-0x9d2c5680ul, 15,
-0xefc6ul, 18, 1812433253ul> rng;
+  std::mt19937 urbg;
+  std::uniform_int_distribution<> dist(0, KILL_ONE_IN - 1);
   
   sp_vector_t::iterator cur_shared = pools.shared_pool.begin();
   wp_vector_t::iterator cur_weak = pools.weak_pool.begin();
@@ -107,26 +95,16 @@ void* thread_hammer_and_kill(void* opaque_pools)
   break;
   }
   
-  if (rng() % KILL_ONE_IN == 0)
+  if (dist(urbg) == 0)
   {
 cur_shared->reset();
 ++cur_shared;
   }
 }
-  return 0;
 }
 
-void* thread_hammer(void* opaque_weak)
+void thread_hammer(wp_vector_t& weak_pool)
 {
-  wp_vector_t& weak_pool = *static_cast(opaque_weak);
-  // Using the same parameters as in the RNG test cases.
-  std::mersenne_twister_engine<
-unsigned long, 32, 624, 397, 31,
-0x9908b0dful, 11,
-0xul, 7,
-0x9d2c5680ul, 15,
-0xefc6ul, 18, 1812433253ul> rng;
-
   wp_vector_t::iterator cur_weak = weak_pool.begin();
 
   for (unsigned int i = 0; i < HAMMER_REPEAT; ++i)
@@ -142,51 +120,38 @@ void* thread_hammer(void* opaque_weak)
   break;
   }
 }
-  return 0;
 }
 
-int
+void
 test01()
 {
   bool test __attribute__((unused)) = true;
   sp_vector_t obj_pool(POOL_SIZE);
   
-  for(sp_vector_t::iterator cur = obj_pool.begin(); cur != obj_pool.end(); 
++cur)
-  {
-cur->reset(new A);
-  }
+  for(auto& obj : obj_pool)
+obj.reset(new A);
+
   // Obtain weak references.
   std::vector weak_pool(HAMMER_MAX_THREADS, 
wp_vector_t(obj_pool.begin(), obj_pool.end()));
   
   // Launch threads with pointer to weak reference.
-  pthread_t threads[HAMMER_MAX_THREADS];
+  std::thread threads[HAMMER_MAX_THREADS];
 #if defined(__sun) && defined(__svr4__) && _XOPEN_VERSION >= 500
   pthread_setconcurrency (HAMMER_MAX_THREADS);
 #endif
   
-  pthread_attr_t tattr;
-  pthread_attr_ini

Re: [PATCH/AARCH64] Add scheduler for vulcan.

2016-07-25 Thread James Greenhalgh
On Wed, Jul 20, 2016 at 03:07:45PM +0530, Virendra Pathak wrote:
> Hi gcc-patches group,
> 
> Please find the patch for adding the basic scheduler for vulcan
> in the aarch64 port.
> 
> Tested the patch by compiling a cross aarch64-linux-gcc,
> bootstrapping native aarch64-unknown-linux-gnu and
> running the gcc regression suite.
> 
> Kindly review and merge the patch to trunk, if the patch is okay.
> 
> There are a few TODOs in this patch which we plan to
> address in a follow-up submission, e.g. crc and crypto
> instructions, and further improving integer & fp load/store
> based on the addressing mode of the address.

Hi Virendra,

Thanks for the patch.  I have some concerns about the size of the
automaton that this description generates. As you can see
(use (automata_option "stats") in the description to enable statistics),
this scheduler description generates an automaton for Vulcan 10x larger than
the second largest description we have for AArch64 (cortex_a53_advsimd):

  Automaton `cortex_a53_advsimd'
 9072 NDFA states,  49572 NDFA arcs
 9072 DFA states,   49572 DFA arcs
 4050 minimal DFA states,   23679 minimal DFA arcs
  368 all insns 11 insn equivalence classes
0 locked states
  28759 transition comb vector els, 44550 trans table els: use simple vect
  44550 min delay table els, compression factor 2

  Automaton `vulcan'
103223 NDFA states,  651918 NDFA arcs
103223 DFA states,   651918 DFA arcs
45857 minimal DFA states,   352255 minimal DFA arcs
  368 all insns 28 insn equivalence classes
0 locked states
  429671 transition comb vector els, 1283996 trans table els: use comb vect
  1283996 min delay table els, compression factor 2

Such a large automaton increases compiler build time and memory consumption,
often for little scheduling benefit.

Normally such a large automaton comes from using a large repeat
expression (*). For example in your modeling of divisions:

> +(define_insn_reservation "vulcan_div" 13
> +  (and (eq_attr "tune" "vulcan")
> +   (eq_attr "type" "sdiv,udiv"))
> +  "vulcan_i1*13")
> +
> +(define_insn_reservation "vulcan_fp_divsqrt_s" 16
> +  (and (eq_attr "tune" "vulcan")
> +   (eq_attr "type" "fdivs,fsqrts"))
> +  "vulcan_f0*8|vulcan_f1*8")
> +
> +(define_insn_reservation "vulcan_fp_divsqrt_d" 23
> +  (and (eq_attr "tune" "vulcan")
> +   (eq_attr "type" "fdivd,fsqrtd"))
> +  "vulcan_f0*12|vulcan_f1*12")

In other pipeline models, we try to keep these repeat numbers low to avoid
the large state-space growth they cause. For example, the Cortex-A57
pipeline model describes them as:

  (define_insn_reservation "cortex_a57_fp_divd" 16
(and (eq_attr "tune" "cortexa57")
 (eq_attr "type" "fdivd, fsqrtd, neon_fp_div_d, neon_fp_sqrt_d"))
"ca57_cx2_block*3")

The lower accuracy is acceptable because of the nature of the scheduling
model. For a machine with an issue rate of "4" like Vulcan, for each cycle the
compiler models it tries to find four instructions to schedule before it
progresses the state of the automaton. If an instruction is modelled as
blocking the "vulcan_i1" unit for 13 cycles, that means the scheduler would
have to find up to 52 other instructions to issue before the next
instruction which would use vulcan_i1. Because scheduling works within
basic-blocks, the chance of finding so many independent instructions is
extremely low, and so you'd never see the benefit of the 13-cycle block.

I tried lowering the repeat expressions as so:

> +(define_insn_reservation "vulcan_div" 13
> +  (and (eq_attr "tune" "vulcan")
> +   (eq_attr "type" "sdiv,udiv"))
> +  "vulcan_i1*3")
> +
> +(define_insn_reservation "vulcan_fp_divsqrt_s" 16
> +  (and (eq_attr "tune" "vulcan")
> +   (eq_attr "type" "fdivs,fsqrts"))
> +  "vulcan_f0*3|vulcan_f1*3")
> +
> +(define_insn_reservation "vulcan_fp_divsqrt_d" 23
> +  (and (eq_attr "tune" "vulcan")
> +   (eq_attr "type" "fdivd,fsqrtd"))
> +  "vulcan_f0*5|vulcan_f1*5")

Which more than halves the size of the generated automaton:

  Automaton `vulcan'
45370 NDFA states,  319261 NDFA arcs
45370 DFA states,   319261 DFA arcs
20150 minimal DFA states,   170824 minimal DFA arcs
  368 all insns 28 insn equivalence classes
0 locked states
  215565 transition comb vector els, 564200 trans table els: use comb vect
  564200 min delay table els, compression factor 2

The other technique we use to reduce the size of the generated automaton is
to split off the AdvSIMD/FP model from the main pipeline description
(the thunderx_main, thunderx_mult, thunderx_divide, and thunderx_simd
 models take this approach even further and achieve very small automata
 as a result)

A change like wiring the vulcan_f0 and vulcan_f1 reservations to be
cpu_units of a new define_automaton "vulcan_advsimd" would cut the size
of the automaton by half again:

  Automaton `vulcan'
 8520 NDFA states,  52754 NDFA arcs
 8520 DFA states,   
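
For reference, a minimal sketch of the kind of split described above --
moving the FP/SIMD units into their own automaton -- might look like the
following (the automaton name is illustrative, not taken from the posted
patch):

  (define_automaton "vulcan_advsimd")

  ;; vulcan_f0/vulcan_f1 now belong to a separate AdvSIMD/FP automaton, so
  ;; reservations that only use these units no longer enlarge the core one.
  (define_cpu_unit "vulcan_f0,vulcan_f1" "vulcan_advsimd")

Reservations such as vulcan_fp_divsqrt_s can then keep using
"vulcan_f0*3|vulcan_f1*3" unchanged.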

[PATCHv2, PING][ARM] -mpure-code option for ARM

2016-07-25 Thread Andre Vieira (lists)
On 11/07/16 17:56, Andre Vieira (lists) wrote:
> On 07/07/16 13:30, mickael guene wrote:
>> Hi Andre,
>>
>>  Another piece of feedback on your purecode patch:
>>  you have to disable the casesi pattern, since it will
>> otherwise generate wrong code with the -mpure-code option.
>>  Indeed, it will generate an 'adr rx, .Lx' (aka
>> 'subs rx, PC, #offset') which will not work in our
>> case since the '.Lx' label is put in an .rodata section,
>> so the offset value is unknown and can be impossible
>> to encode correctly.
>>
>> Regards
>> Mickael
>>
>> On 06/30/2016 04:32 PM, Andre Vieira (lists) wrote:
>>> Hello,
>>>
>>> This patch adds the -mpure-code option for ARM. This option ensures
>>> functions are put into sections that contain only code and no data. To
>>> ensure this throughout compilation we give these sections the ARM
>>> processor-specific ELF section attribute "SHF_ARM_PURECODE". This option
>>> is only supported for non-pic code for armv7-m targets.
>>>
>>> This patch introduces a new target hook 'TARGET_ASM_ELF_FLAGS_NUMERIC'.
>>> This target hook enables a target to use the numeric value for elf
>>> section attributes rather than their alphabetical representation. If
>>> TARGET_ASM_ELF_FLAGS_NUMERIC returns TRUE, the existing
>>> 'default_elf_asm_named_section', will print the numeric value of the
>>> section attributes for the current section. This target hook has two
>>> parameters:
>>> unsigned int FLAGS, the input parameter that tells the function the
>>> current section's attributes;
>>> unsigned int *NUM, used to pass down the numerical representation of the
>>> section's attributes.
>>>
>>> The default implementation for TARGET_ASM_ELF_FLAGS_NUMERIC will return
>>> false, so existing behavior is not changed.
>>>
>>> Bootstrapped and tested for arm-none-linux-gnueabihf. Further tested for
>>> arm-none-eabi with a Cortex-M3 target.
>>>
>>>
>>> gcc/ChangeLog:
>>> 2016-06-30  Andre Vieira  
>>> Terry Guo  
>>>
>>> * target.def (elf_flags_numeric): New target hook.
>>> * targhooks.h (default_asm_elf_flags_numeric): New.
>>> * varasm.c (default_asm_elf_flags_numeric): New.
>>>   (default_elf_asm_named_section): Use new target hook.
>>> * config/arm/arm.opt (mpure-code): New.
>>> * config/arm/arm.h (SECTION_ARM_PURECODE): New.
>>> * config/arm/arm.c (arm_asm_init_sections): Add section
>>>   attribute to default text section if -mpure-code.
>>>   (arm_option_check_internal): Diagnose use of option with
>>>   non supported targets and/or options.
>>>   (arm_asm_elf_flags_numeric): New.
>>>   (arm_function_section): New.
>>>   (arm_elf_section_type_flags): New.
>>> * config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Disable
>>>   for -mpure-code.
>>> * gcc/doc/texi (TARGET_ASM_ELF_FLAGS_NUMERIC): New.
>>> * gcc/doc/texi.in (TARGET_ASM_ELF_FLAGS_NUMERIC): Likewise.
>>>
>>>
>>>
>>> gcc/testsuite/ChangeLog:
>>> 2016-06-30  Andre Vieira  
>>> Terry Guo  
>>>
>>> * gcc.target/arm/pure-code/ffunction-sections.c: New.
>>> * gcc.target/arm/pure-code/no-literal-pool.c: New.
>>> * gcc.target/arm/pure-code/pure-code.exp: New.
>>>
>>
> Hi Sandra, Mickael,
> 
> Thank you for your comments. I changed the description of -mpure-code in
> invoke.texi to better reflect the error message you get wrt supported
> targets.
> 
> As for the target hook description, I hope the text is clearer now. Let
> me know if you think it needs further explanation.
> 
> I also fixed the double '%' in the text string for unnamed text sections
> and disabled the casesi pattern.
> 
> I duplicated the original casesi test
> 'gcc/testsuite/gcc.c-torture/compile/pr46934.c' for pure-code to make
> sure the casesi was disabled and other patterns were selected instead.
> 
> Reran regressions for pure-code.exp for Cortex-M3.
> 
> Cheers,
> Andre
> 
> 
> gcc/ChangeLog:
> 2016-07-11  Andre Vieira  
> Terry Guo  
> 
> * target.def (elf_flags_numeric): New target hook.
> * hooks.c (hook_uint_uintp_false): New generic hook.
> * varasm.c (default_elf_asm_named_section): Use new target hook.
> * config/arm/arm.opt (mpure-code): New.
> * config/arm/arm.h (SECTION_ARM_PURECODE): New.
> * config/arm/arm.c (arm_asm_init_sections): Add section
> attribute to default text section if -mpure-code.
> (arm_option_check_internal): Diagnose use of option with
> non supported targets and/or options.
> (arm_asm_elf_flags_numeric): New.
> (arm_function_section): New.
> (arm_elf_section_type_flags): New.
> * config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Disable
> for -mpure-code.
> * config/arm/arm.md (casesi): Disable pattern for
> -mpure-code.
> * gcc/doc/texi (TARGET_ASM_ELF_FLAGS_NUMERIC): New.
> * gcc/doc/texi.in (TARGET_ASM_ELF_FLAGS_NUMERIC): Likewise.
> 

[Fortran, Patch, pr70524, v1] [5/6/7 Regression] ICE when using -frepack-arrays -Warray-temporaries

2016-07-25 Thread Andre Vehreschild
Hi all,

the attached patch fixes the ICE when the options in the title are
used. Two issues caused this ICE:

1. the error-printing routines relied on locus.nextc, which was not
set by gfc_set_backend_locus() and is now set by the patch
(locally, not in gfc_set_backend_locus()).

2. the locus of the function whose result triggered the warning (and
with it the ICE) was set too late.

Both issues are addressed by the patch, albeit I am not quite sure
whether my solution to 1 (the first chunk in the patch) is OK this
way.

Bootstraps and regtests ok on x86_64-linux-gnu/F23.

Ok for trunk? And with one week delay for gcc-5- and -6-branch?

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


pr70524_1.clog
Description: Binary data
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index e95c8dd..d52cd11 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -6103,7 +6103,12 @@ gfc_trans_dummy_array_bias (gfc_symbol * sym, tree tmpdesc,
   return;
 }
 
+  loc.nextc = NULL;
   gfc_save_backend_locus (&loc);
+  /* loc.nextc is not set by save_backend_locus but the location routines
+ depend on it.  */
+  if (loc.nextc == NULL)
+loc.nextc = loc.lb->line;
   gfc_set_backend_locus (&sym->declared_at);
 
   /* Descriptor type.  */
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 05dfcb4..2a34a4c 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -4087,6 +4087,8 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block)
   else if (proc_sym->as)
 	{
 	  tree result = TREE_VALUE (current_fake_result_decl);
+	  gfc_save_backend_locus (&loc);
+	  gfc_set_backend_locus (&proc_sym->declared_at);
 	  gfc_trans_dummy_array_bias (proc_sym, result, block);
 
 	  /* An automatic character length, pointer array result.  */
@@ -4096,8 +4098,6 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block)
 	  tmp = NULL;
 	  if (proc_sym->ts.deferred)
 		{
-		  gfc_save_backend_locus (&loc);
-		  gfc_set_backend_locus (&proc_sym->declared_at);
 		  gfc_start_block (&init);
 		  tmp = gfc_null_and_pass_deferred_len (proc_sym, &init, &loc);
 		  gfc_add_init_cleanup (block, gfc_finish_block (&init), tmp);
diff --git a/gcc/testsuite/gfortran.dg/dependency_48.f90 b/gcc/testsuite/gfortran.dg/dependency_48.f90
new file mode 100644
index 000..6470019
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dependency_48.f90
@@ -0,0 +1,26 @@
+! { dg-do compile }
+! { dg-options "-frepack-arrays -Warray-temporaries -O" }
+
+! Same as dependency_35 but with repack-arrays
+
+module foo
+  implicit none
+contains
+  pure function bar(i,j) ! { dg-warning "Creating array temporary at \\(1\\)" }
+integer, intent(in) :: i,j
+integer, dimension(2,2) :: bar
+bar = 33
+  end function bar
+end module foo
+
+program main
+  use foo
+  implicit none
+  integer a(2,2), b(2,2),c(2,2), d(2,2), e(2)
+
+  read (*,*) b, c, d
+  a = matmul(b,c) + d
+  a = b + bar(3,4)
+  a = bar(3,4)*5 + b
+  e = sum(b,1) + 3
+end program main


Re: [AArch64][2/14] ARMv8.2-A FP16 one operand vector intrinsics

2016-07-25 Thread James Greenhalgh
On Wed, Jul 20, 2016 at 06:00:34PM +0100, Jiong Wang wrote:
> On 07/07/16 17:14, Jiong Wang wrote:
> >This patch adds ARMv8.2-A FP16 one operand vector intrinsics.
> >
> >We introduced new mode iterators to cover HF modes; qualified patterns
> >which were using old mode iterators are switched to the new ones.
> >
> >We can't simply extend an old iterator like VDQF to cover HF modes,
> >because not all patterns using VDQF have new FP16 support; thus we
> >introduced new, temporary iterators, and only apply the new iterators to
> >those patterns which do have FP16 support.
> 
> I noticed the patchset at
> 
>   https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00308.html
> 
> has some modifications on the standard names "div" and "sqrt", thus there
> are minor conflicts as this patch touches "sqrt" as well.
> 
> This patch resolves the conflict, and the change is to let
> aarch64_emit_approx_sqrt simply return false for V4HFmode and V8HFmode.

This is OK for trunk, with one modification...

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 58a9d695c0ef9e6e1d67030580428699aba05be4..5ed633542efe58763d68fd9bfbb478ae6ef569c3
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -7598,6 +7598,10 @@ bool
>  aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
>  {
>machine_mode mode = GET_MODE (dst);
> +
> +  if (mode == V4HFmode || mode == V8HFmode)
> +return false;
> +

Given that you don't plan to handle any HFmode modes here, I'd prefer:

  if (GET_MODE_INNER (mode) == HFmode)
return false;

That'll save you from updating this again in patch 7/14.

Otherwise, this is OK.

Thanks,
James


> gcc/
> 2016-07-20  Jiong Wang
> 
> * config/aarch64/aarch64-builtins.c (TYPES_BINOP_USS): New.
> * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
> * config/aarch64/aarch64-simd.md (aarch64_rsqrte): Extend to HF 
> modes.
> (neg2): Likewise.
> (abs2): Likewise.
> (2): Likewise.
> (l2): Likewise.
> (2): Likewise.
> (2): Likewise.
> (ftrunc2): Likewise.
> (2): Likewise.
> (sqrt2): Likewise.
> (*sqrt2): Likewise.
> (aarch64_frecpe): Likewise.
> (aarch64_cm): Likewise.
> * config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Return
> false for V4HF and V8HF.
> * config/aarch64/iterators.md (VHSDF, VHSDF_DF, VHSDF_SDF): New.
> (VDQF_COND, fcvt_target, FCVT_TARGET, hcon): Extend mode attribute to 
> HF modes.
> (stype): New.
> * config/aarch64/arm_neon.h (vdup_n_f16): New.
> (vdupq_n_f16): Likewise.
> (vld1_dup_f16): Use vdup_n_f16.
> (vld1q_dup_f16): Use vdupq_n_f16.
> (vabs_f16): New.
> (vabsq_f16): Likewise.
> (vceqz_f16): Likewise.
> (vceqzq_f16): Likewise.
> (vcgez_f16): Likewise.
> (vcgezq_f16): Likewise.
> (vcgtz_f16): Likewise.
> (vcgtzq_f16): Likewise.
> (vclez_f16): Likewise.
> (vclezq_f16): Likewise.
> (vcltz_f16): Likewise.
> (vcltzq_f16): Likewise.
> (vcvt_f16_s16): Likewise.
> (vcvtq_f16_s16): Likewise.
> (vcvt_f16_u16): Likewise.
> (vcvtq_f16_u16): Likewise.
> (vcvt_s16_f16): Likewise.
> (vcvtq_s16_f16): Likewise.
> (vcvt_u16_f16): Likewise.
> (vcvtq_u16_f16): Likewise.
> (vcvta_s16_f16): Likewise.
> (vcvtaq_s16_f16): Likewise.
> (vcvta_u16_f16): Likewise.
> (vcvtaq_u16_f16): Likewise.
> (vcvtm_s16_f16): Likewise.
> (vcvtmq_s16_f16): Likewise.
> (vcvtm_u16_f16): Likewise.
> (vcvtmq_u16_f16): Likewise.
> (vcvtn_s16_f16): Likewise.
> (vcvtnq_s16_f16): Likewise.
> (vcvtn_u16_f16): Likewise.
> (vcvtnq_u16_f16): Likewise.
> (vcvtp_s16_f16): Likewise.
> (vcvtpq_s16_f16): Likewise.
> (vcvtp_u16_f16): Likewise.
> (vcvtpq_u16_f16): Likewise.
> (vneg_f16): Likewise.
> (vnegq_f16): Likewise.
> (vrecpe_f16): Likewise.
> (vrecpeq_f16): Likewise.
> (vrnd_f16): Likewise.
> (vrndq_f16): Likewise.
> (vrnda_f16): Likewise.
> (vrndaq_f16): Likewise.
> (vrndi_f16): Likewise.
> (vrndiq_f16): Likewise.
> (vrndm_f16): Likewise.
> (vrndmq_f16): Likewise.
> (vrndn_f16): Likewise.
> (vrndnq_f16): Likewise.
> (vrndp_f16): Likewise.
> (vrndpq_f16): Likewise.
> (vrndx_f16): Likewise.
> (vrndxq_f16): Likewise.
> (vrsqrte_f16): Likewise.
> (vrsqrteq_f16): Likewise.
> (vsqrt_f16): Likewise.
> (vsqrtq_f16): Likewise.
> 



Re: [AArch64][3/14] ARMv8.2-A FP16 two operands vector intrinsics

2016-07-25 Thread James Greenhalgh
On Wed, Jul 20, 2016 at 06:00:46PM +0100, Jiong Wang wrote:
> On 07/07/16 17:15, Jiong Wang wrote:
> >This patch adds ARMv8.2-A FP16 two operands vector intrinsics.
> 
> The updated patch resolves the conflict with
> 
>https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00309.html
> 
> The change is to let aarch64_emit_approx_div return false for
> V4HFmode and V8HFmode.

As with patch 2/14, please rewrite this hunk:

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 5ed633542efe58763d68fd9bfbb478ae6ef569c3..a7437c04eb936a5e3ebd0bc77eb4afd8c052df28
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -7717,6 +7717,10 @@ bool
>  aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
>  {
>machine_mode mode = GET_MODE (quo);
> +
> +  if (mode == V4HFmode || mode == V8HFmode)
> +return false;
> +

 To:

  if (GET_MODE_INNER (mode) == HFmode)
return false;

Otherwise, this patch is OK for trunk.

Thanks,
James

> gcc/
> 2016-07-20  Jiong Wang
> 
> * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
> * config/aarch64/aarch64-simd.md
> (aarch64_rsqrts): Extend to HF modes.
> (fabd3): Likewise.
> (3): Likewise.
> (3): Likewise.
> (aarch64_p): Likewise.
> (3): Likewise.
> (3): Likewise.
> (3): Likewise.
> (aarch64_faddp): Likewise.
> (aarch64_fmulx): Likewise.
> (aarch64_frecps): Likewise.
> (*aarch64_fac): Rename to aarch64_fac.
> (add3): Extend to HF modes.
> (sub3): Likewise.
> (mul3): Likewise.
> (div3): Likewise.
> (*div3): Likewise.
> * config/aarch64/aarch64.c (aarch64_emit_approx_div): Return
> false for V4HF and V8HF.
> * config/aarch64/iterators.md (VDQ_HSDI, VSDQ_HSDI): New mode
> iterator.
> * config/aarch64/arm_neon.h (vadd_f16): Likewise.
> (vaddq_f16): Likewise.
> (vabd_f16): Likewise.
> (vabdq_f16): Likewise.
> (vcage_f16): Likewise.
> (vcageq_f16): Likewise.
> (vcagt_f16): Likewise.
> (vcagtq_f16): Likewise.
> (vcale_f16): Likewise.
> (vcaleq_f16): Likewise.
> (vcalt_f16): Likewise.
> (vcaltq_f16): Likewise.
> (vceq_f16): Likewise.
> (vceqq_f16): Likewise.
> (vcge_f16): Likewise.
> (vcgeq_f16): Likewise.
> (vcgt_f16): Likewise.
> (vcgtq_f16): Likewise.
> (vcle_f16): Likewise.
> (vcleq_f16): Likewise.
> (vclt_f16): Likewise.
> (vcltq_f16): Likewise.
> (vcvt_n_f16_s16): Likewise.
> (vcvtq_n_f16_s16): Likewise.
> (vcvt_n_f16_u16): Likewise.
> (vcvtq_n_f16_u16): Likewise.
> (vcvt_n_s16_f16): Likewise.
> (vcvtq_n_s16_f16): Likewise.
> (vcvt_n_u16_f16): Likewise.
> (vcvtq_n_u16_f16): Likewise.
> (vdiv_f16): Likewise.
> (vdivq_f16): Likewise.
> (vdup_lane_f16): Likewise.
> (vdup_laneq_f16): Likewise.
> (vdupq_lane_f16): Likewise.
> (vdupq_laneq_f16): Likewise.
> (vdups_lane_f16): Likewise.
> (vdups_laneq_f16): Likewise.
> (vmax_f16): Likewise.
> (vmaxq_f16): Likewise.
> (vmaxnm_f16): Likewise.
> (vmaxnmq_f16): Likewise.
> (vmin_f16): Likewise.
> (vminq_f16): Likewise.
> (vminnm_f16): Likewise.
> (vminnmq_f16): Likewise.
> (vmul_f16): Likewise.
> (vmulq_f16): Likewise.
> (vmulx_f16): Likewise.
> (vmulxq_f16): Likewise.
> (vpadd_f16): Likewise.
> (vpaddq_f16): Likewise.
> (vpmax_f16): Likewise.
> (vpmaxq_f16): Likewise.
> (vpmaxnm_f16): Likewise.
> (vpmaxnmq_f16): Likewise.
> (vpmin_f16): Likewise.
> (vpminq_f16): Likewise.
> (vpminnm_f16): Likewise.
> (vpminnmq_f16): Likewise.
> (vrecps_f16): Likewise.
> (vrecpsq_f16): Likewise.
> (vrsqrts_f16): Likewise.
> (vrsqrtsq_f16): Likewise.
> (vsub_f16): Likewise.
> (vsubq_f16): Likewise.



Re: [PATCH] correct atomic_compare_exchange_n return type (c++/71675)

2016-07-25 Thread Renlin Li

Hi Martin,

I observed the following error:

ERROR: gcc.dg/atomic/pr71675.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects : syntax error in target selector "target c11" for 
" dg-do 3 compile { target c11 } "


It seems we don't have a c11 effective target check available
in dejagnu target-supports.exp.

Thanks,
Renlin


diff --git a/gcc/testsuite/gcc.dg/atomic/pr71675.c 
b/gcc/testsuite/gcc.dg/atomic/pr71675.c
new file mode 100644
index 000..0e344ac
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/atomic/pr71675.c
@@ -0,0 +1,32 @@
+/* PR c++/71675 - __atomic_compare_exchange_n returns wrong type for typed enum
+ */
+/* { dg-do compile { target c11 } } */


Re: [AArch64][4/14] ARMv8.2-A FP16 three operands vector intrinsics

2016-07-25 Thread James Greenhalgh
On Thu, Jul 07, 2016 at 05:16:01PM +0100, Jiong Wang wrote:
> This patch adds ARMv8.2-A FP16 three operands vector intrinsics.
> 
> Three operands intrinsics only contain fma and fms.

OK.

Thanks,
James

> 
> 2016-07-07  Jiong Wang 
> 
> gcc/
> * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
> * config/aarch64/aarch64-simd.md (fma4): Extend to HF modes.
> (fnma4): Likewise.
> * config/aarch64/arm_neon.h (vfma_f16): New.
> (vfmaq_f16): Likewise.
> (vfms_f16): Likewise.
> (vfmsq_f16): Likewise.
> 



[PATCH] Don't call get_working_sets w/ LTO and -fauto-profile (PR, gcov-profile/70993)

2016-07-25 Thread Martin Liška
Hi.

Currently, call to get_working_sets is only called from tree_profiling
(called from 'pass_ipa_tree_profile' and is guarded in gate with 
!flag_auto_profile).
I would like to apply the same logic in lto-cgraph.c.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From dcf13fb3ac8a4c9e85842f147665b527ce71234b Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 22 Jul 2016 13:07:40 +0200
Subject: [PATCH] Don't call get_working_sets w/ LTO and -fauto-profile (PR
 gcov-profile/70993)

gcc/ChangeLog:

2016-07-22  Martin Liska  

	* lto-cgraph.c (input_symtab): Don't call get_working_sets
	if flag_auto_profile is set to true.
---
 gcc/lto-cgraph.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 5cef2ba..2642041 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1867,7 +1867,9 @@ input_symtab (void)
 }
 
   merge_profile_summaries (file_data_vec);
-  get_working_sets ();
+
+  if (!flag_auto_profile)
+get_working_sets ();
 
 
   /* Clear out the aux field that was used to store enough state to
-- 
2.9.0



Re: [AArch64][5/14] ARMv8.2-A FP16 lane vector intrinsics

2016-07-25 Thread James Greenhalgh
On Thu, Jul 07, 2016 at 05:16:28PM +0100, Jiong Wang wrote:
> This patch adds ARMv8.2-A FP16 lane vector intrinsics.
> 
> Lane intrinsics are generally derivatives of multiply intrinsics,
> including multiply-accumulate.  All the necessary backend support for them
> is already there except for fmulx; the implementations are largely a
> combination of existing multiply intrinsics with vdup intrinsics

OK.

Thanks,
James

> 2016-07-07  Jiong Wang 
> 
> gcc/
> * config/aarch64/aarch64-simd.md
> (*aarch64_mulx_elt_to_64v2df): Rename to
> "*aarch64_mulx_elt_from_dup".
> (*aarch64_mul3_elt): Update schedule type.
> (*aarch64_mul3_elt_from_dup): Likewise.
> (*aarch64_fma4_elt_from_dup): Likewise.
> (*aarch64_fnma4_elt_from_dup): Likewise.
> * config/aarch64/iterators.md (VMUL): Supprt half precision
> float modes.
> (f, fp): Support HF modes.
> * config/aarch64/arm_neon.h (vfma_lane_f16): New.
> (vfmaq_lane_f16): Likewise.
> (vfma_laneq_f16): Likewise.
> (vfmaq_laneq_f16): Likewise.
> (vfma_n_f16): Likewise.
> (vfmaq_n_f16): Likewise.
> (vfms_lane_f16): Likewise.
> (vfmsq_lane_f16): Likewise.
> (vfms_laneq_f16): Likewise.
> (vfmsq_laneq_f16): Likewise.
> (vfms_n_f16): Likewise.
> (vfmsq_n_f16): Likewise.
> (vmul_lane_f16): Likewise.
> (vmulq_lane_f16): Likewise.
> (vmul_laneq_f16): Likewise.
> (vmulq_laneq_f16): Likewise.
> (vmul_n_f16): Likewise.
> (vmulq_n_f16): Likewise.
> (vmulx_lane_f16): Likewise.
> (vmulxq_lane_f16): Likewise.
> (vmulx_laneq_f16): Likewise.
> (vmulxq_laneq_f16): Likewise.
> 



Re: [AArch64][6/14] ARMv8.2-A FP16 reduction vector intrinsics

2016-07-25 Thread James Greenhalgh
On Thu, Jul 07, 2016 at 05:16:58PM +0100, Jiong Wang wrote:
> This patch add ARMv8.2-A FP16 reduction vector intrinsics.

OK.

Thanks,
James

> gcc/
> 2016-07-07  Jiong Wang 
> 
> * config/aarch64/arm_neon.h (vmaxv_f16): New.
> (vmaxvq_f16): Likewise.
> (vminv_f16): Likewise.
> (vminvq_f16): Likewise.
> (vmaxnmv_f16): Likewise.
> (vmaxnmvq_f16): Likewise.
> (vminnmv_f16): Likewise.
> (vminnmvq_f16): Likewise.
> * config/aarch64/iterators.md (vp): Support HF modes.



Re: [PATCH 2/3] Run profile feedback tests with autofdo

2016-07-25 Thread Martin Liška
On 07/15/2016 10:37 AM, Bin.Cheng wrote:
> On Thu, Jul 14, 2016 at 11:33 PM, Andi Kleen  wrote:
> I haven't seen that. Unstable in what way?
 Since GCC doesn't support FDO, it runs the below tests as you said:

 PASS: gcc.dg/tree-prof/20041218-1.c compilation,  -g
 UNSUPPORTED: gcc.dg/tree-prof/20041218-1.c: Cannot run create_gcov
 --binary /data/work/trunk/build/gcc/testsuite/gcc/20041218-1.gcda

 Normally, it doesn't create gcov data file, thus the next test is
 unsupported.  I guess that in a parallel test run, the second test picks up
 gcov data files from another process, which results in a random pass.
 Is it possible to not have these when fdo is supported?
>>> Hmm, typo:  s/supported/not supported/.
>>
>> I don't see how this can happen: as you can see the .gcda file name
>> is unique for each test case.
>>
>> I think there may be problems with the perf.data, could add a postfix.
>> But you should only see that with real autofdo.
> Don't know.  I only configured/built GCC on x86_64 with below command lines:
> 
> $ ../gcc/configure --prefix=...
> --enable-languages=c,c++,fortran,go,java,lto,objc,obj-c++
> --with-build-config=bootstrap-O3 --disable-werror
> $ make && make install
> 
> Not sure what the status for autofdo is in this case.  "make check -k"
> is stable for me, but "make check -k -j#" gives unstable result in
> tree-prof.exp tests.  Anything I did wrong?
> 
> Thanks,
> bin
> 

I can confirm that I see the same problem on a Xeon machine (../configure && 
make && make check -k -jX).
I see many:

 g++.dg/tree-prof/pr35545.C: Cannot run create_gcov --binary 
/home/marxin/Programming/gcc/objdir/gcc/testsuite/g++/pr35545.x01 
--profile=perf.data -gcov_version=1 
--gcov=/home/marxin/Programming/gcc/objdir/gcc/testsuite/g++/pr35545.gcda

Thanks,
Martin



Re: [AArch64][7/14] ARMv8.2-A FP16 one operand scalar intrinsics

2016-07-25 Thread James Greenhalgh
On Wed, Jul 20, 2016 at 06:00:53PM +0100, Jiong Wang wrote:
> On 07/07/16 17:17, Jiong Wang wrote:
> >This patch adds ARMv8.2-A FP16 one operand scalar intrinsics.
> >
> >Scalar intrinsics are kept in arm_fp16.h instead of arm_neon.h.
> 
> The updated patch resolves the conflict with
> 
>https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00308.html
> 
> The change is to let aarch64_emit_approx_sqrt return false for HFmode.

OK, but...

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> a7437c04eb936a5e3ebd0bc77eb4afd8c052df28..27866ccd605abec6ea7c9110022f329c9b172ee0
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -7599,7 +7599,7 @@ aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
>  {
>machine_mode mode = GET_MODE (dst);
>  
> -  if (mode == V4HFmode || mode == V8HFmode)
> +  if (mode == HFmode || mode == V4HFmode || mode == V8HFmode)
>  return false;

...if you take my advice on patch 2/14, you won't need this change.

Otherwise, OK.

Thanks,
James

> gcc/
> 2016-07-20  Jiong Wang
> 
> * config.gcc (aarch64*-*-*): Install arm_fp16.h.
> * config/aarch64/aarch64-builtins.c (hi_UP): New.
> * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
> * config/aarch64/aarch64-simd.md (aarch64_frsqrte): Extend to 
> HF mode.
> (aarch64_frecp): Likewise.
> (aarch64_cm): Likewise.
> * config/aarch64/aarch64.md (2): Likewise.
> (l2): Likewise.
> (fix_trunc2): Likewise.
> (sqrt2): Likewise.
> (*sqrt2): Likewise.
> (abs2): Likewise.
> (hf2): New pattern for HF mode.
> (hihf2): Likewise.
> * config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Return
> for HF mode.
> * config/aarch64/arm_neon.h: Include arm_fp16.h.
> * config/aarch64/iterators.md (GPF_F16): New.
> (GPI_F16): Likewise.
> (VHSDF_HSDF): Likewise.
> (w1): Support HF mode.
> (w2): Likewise.
> (v): Likewise.
> (s): Likewise.
> (q): Likewise.
> (Vmtype): Likewise.
> (V_cmp_result): Likewise.
> (fcvt_iesize): Likewise.
> (FCVT_IESIZE): Likewise.
> * config/aarch64/arm_fp16.h: New file.
> (vabsh_f16): New.
> (vceqzh_f16): Likewise.
> (vcgezh_f16): Likewise.
> (vcgtzh_f16): Likewise.
> (vclezh_f16): Likewise.
> (vcltzh_f16): Likewise.
> (vcvth_f16_s16): Likewise.
> (vcvth_f16_s32): Likewise.
> (vcvth_f16_s64): Likewise.
> (vcvth_f16_u16): Likewise.
> (vcvth_f16_u32): Likewise.
> (vcvth_f16_u64): Likewise.
> (vcvth_s16_f16): Likewise.
> (vcvth_s32_f16): Likewise.
> (vcvth_s64_f16): Likewise.
> (vcvth_u16_f16): Likewise.
> (vcvth_u32_f16): Likewise.
> (vcvth_u64_f16): Likewise.
> (vcvtah_s16_f16): Likewise.
> (vcvtah_s32_f16): Likewise.
> (vcvtah_s64_f16): Likewise.
> (vcvtah_u16_f16): Likewise.
> (vcvtah_u32_f16): Likewise.
> (vcvtah_u64_f16): Likewise.
> (vcvtmh_s16_f16): Likewise.
> (vcvtmh_s32_f16): Likewise.
> (vcvtmh_s64_f16): Likewise.
> (vcvtmh_u16_f16): Likewise.
> (vcvtmh_u32_f16): Likewise.
> (vcvtmh_u64_f16): Likewise.
> (vcvtnh_s16_f16): Likewise.
> (vcvtnh_s32_f16): Likewise.
> (vcvtnh_s64_f16): Likewise.
> (vcvtnh_u16_f16): Likewise.
> (vcvtnh_u32_f16): Likewise.
> (vcvtnh_u64_f16): Likewise.
> (vcvtph_s16_f16): Likewise.
> (vcvtph_s32_f16): Likewise.
> (vcvtph_s64_f16): Likewise.
> (vcvtph_u16_f16): Likewise.
> (vcvtph_u32_f16): Likewise.
> (vcvtph_u64_f16): Likewise.
> (vnegh_f16): Likewise.
> (vrecpeh_f16): Likewise.
> (vrecpxh_f16): Likewise.
> (vrndh_f16): Likewise.
> (vrndah_f16): Likewise.
> (vrndih_f16): Likewise.
> (vrndmh_f16): Likewise.
> (vrndnh_f16): Likewise.
> (vrndph_f16): Likewise.
> (vrndxh_f16): Likewise.
> (vrsqrteh_f16): Likewise.
> (vsqrth_f16): Likewise.



Re: [AArch64][8/14] ARMv8.2-A FP16 two operands scalar intrinsics

2016-07-25 Thread James Greenhalgh
On Wed, Jul 20, 2016 at 06:00:58PM +0100, Jiong Wang wrote:
> On 07/07/16 17:17, Jiong Wang wrote:
> >This patch adds ARMv8.2-A FP16 two operands scalar intrinsics.
> 
> The updated patch resolves the conflict with
> 
>https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00309.html
> 
> The change is to let aarch64_emit_approx_div return false for HFmode.

If you take the change I proposed on patch 3/14 you won't need this.

OK otherwise.

Thanks,
James

> gcc/
> 2016-07-20  Jiong Wang
> 
> * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
> * config/aarch64/aarch64.md 
> (hf3): New.
> (hf3): Likewise.
> (add3): Likewise.
> (sub3): Likewise.
> (mul3): Likewise.
> (div3): Likewise.
> (*div3): Likewise.
> (3): Extend to HF.
> * config/aarch64/aarch64.c (aarch64_emit_approx_div): Return
> false for HFmode.
> * config/aarch64/aarch64-simd.md (aarch64_rsqrts): Likewise.
> (fabd3): Likewise.
> (3): Likewise.
> (3): Likewise.
> (aarch64_fmulx): Likewise.
> (aarch64_fac): Likewise.
> (aarch64_frecps): Likewise.
> (hfhi3): New.
> (hihf3): Likewise.
> * config/aarch64/iterators.md (VHSDF_SDF): Delete.
> (VSDQ_HSDI): Support HI.
> (fcvt_target, FCVT_TARGET): Likewise.
> * config/aarch64/arm_fp16.h: (vaddh_f16): New.
> (vsubh_f16): Likewise.
> (vabdh_f16): Likewise.
> (vcageh_f16): Likewise.
> (vcagth_f16): Likewise.
> (vcaleh_f16): Likewise.
> (vcalth_f16): Likewise.(vcleh_f16): Likewise.
> (vclth_f16): Likewise.
> (vcvth_n_f16_s16): Likewise.
> (vcvth_n_f16_s32): Likewise.
> (vcvth_n_f16_s64): Likewise.
> (vcvth_n_f16_u16): Likewise.
> (vcvth_n_f16_u32): Likewise.
> (vcvth_n_f16_u64): Likewise.
> (vcvth_n_s16_f16): Likewise.
> (vcvth_n_s32_f16): Likewise.
> (vcvth_n_s64_f16): Likewise.
> (vcvth_n_u16_f16): Likewise.
> (vcvth_n_u32_f16): Likewise.
> (vcvth_n_u64_f16): Likewise.
> (vdivh_f16): Likewise.
> (vmaxh_f16): Likewise.
> (vmaxnmh_f16): Likewise.
> (vminh_f16): Likewise.
> (vminnmh_f16): Likewise.
> (vmulh_f16): Likewise.
> (vmulxh_f16): Likewise.
> (vrecpsh_f16): Likewise.
> (vrsqrtsh_f16): Likewise.
> 



Re: C++ PATCH for c++/71913 (copy elision choices)

2016-07-25 Thread Renlin Li

Hi Jason,

On 22/07/16 04:01, Jason Merrill wrote:

71913 is a case where unsafe_copy_elision_p was being too
conservative. We can allow copy elision in a new expression; the only
way we could end up initializing a base subobject without knowing it
would be through a placement new, in which case we would already be
using the wrong (complete object) constructor, so copy elision doesn't
make it any worse.
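
An illustrative sketch (not from the patch or the testcase) of the two
situations being contrasted:

  #include <new>

  struct T { T () {} T (const T &) {} };
  struct D : T { };

  T make () { return T (); }

  void f (D *d)
  {
    T *p = new T (make ());                  // ordinary new: the target is a
                                             // complete object, elision is safe
    new (static_cast<T *> (d)) T (make ());  // placement new aimed at what is
                                             // really a base subobject
    delete p;
  }

Only the placement-new form can target a base subobject, and that form
already uses the complete-object constructor, so allowing elision in new
expressions does not make things any worse.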




diff --git a/gcc/testsuite/g++.dg/init/elide5.C 
b/gcc/testsuite/g++.dg/init/elide5.C
new file mode 100644
index 000..0a9978c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/elide5.C
@@ -0,0 +1,27 @@
+// PR c++/71913
+// { dg-do link { target c++11 } }
+
+void* operator new(unsigned long, void* p) { return p; }


g++.dg/init/elide5.C fails on targets whose SIZE_TYPE is not "long 
unsigned int".


testsuite/g++.dg/init/elide5.C:4:42: error: 'operator new' takes type 
'size_t' ('unsigned int') as first parameter [-fpermissive]


I have checked: for most 32-bit architectures or ABIs, SIZE_TYPE is 
"unsigned int"; arm is one of them.


To make this test case portable, would __SIZE_TYPE__ be better in this 
case, instead of "unsigned long", as the type of the first parameter of operator new?
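
For instance, just a sketch of the suggested change:

  void *operator new (__SIZE_TYPE__, void *p) { return p; }

__SIZE_TYPE__ expands to the target's size_t, so the declaration matches on
both 32-bit and 64-bit targets.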


(sorry for the duplicate reply in the bugzilla, I just found the email here)

Regards,
Renlin


Re: [AArch64][9/14] ARMv8.2-A FP16 three operands scalar intrinsics

2016-07-25 Thread James Greenhalgh
On Thu, Jul 07, 2016 at 05:18:15PM +0100, Jiong Wang wrote:
> This patch adds ARMv8.2-A FP16 three operands scalar intrinsics.

OK.

Thanks,
James

> gcc/
> 2016-07-07  Jiong Wang 
> 
> * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
> * config/aarch64/aarch64.md (fma): New for HF.
> (fnma): Likewise.
> * config/aarch64/arm_fp16.h (vfmah_f16): New.
> (vfmsh_f16): Likewise.



Re: [AArch64][10/14] ARMv8.2-A FP16 lane scalar intrinsics

2016-07-25 Thread James Greenhalgh
On Thu, Jul 07, 2016 at 05:18:29PM +0100, Jiong Wang wrote:
> This patch adds ARMv8.2-A FP16 lane scalar intrinsics.

OK.

Thanks,
James

> 
> gcc/
> 2016-07-07  Jiong Wang 
> 
> * config/aarch64/arm_neon.h (vfmah_lane_f16): New.
> (vfmah_laneq_f16): Likewise.
> (vfmsh_lane_f16): Likewise.
> (vfmsh_laneq_f16): Likewise.
> (vmulh_lane_f16): Likewise.
> (vmulh_laneq_f16): Likewise.
> (vmulxh_lane_f16): Likewise.
> (vmulxh_laneq_f16): Likewise.
> 



Re: [RFC][IPA-VRP] Early VRP Implementation

2016-07-25 Thread Richard Biener
On Fri, Jul 22, 2016 at 2:10 PM, kugan
 wrote:
> Hi Richard,
>
> Thanks for the review.
>
> On 18/07/16 21:51, Richard Biener wrote:
>>
>> On Fri, Jul 15, 2016 at 9:33 AM, kugan
>>  wrote:
>>>
>>> Hi Andrew,
>>>
>>> On 15/07/16 17:28, Andrew Pinski wrote:


 On Fri, Jul 15, 2016 at 12:08 AM, kugan
  wrote:
>
>
> Hi Andrew,
>
>> Why separate out early VRP from tree-vrp?  Just a little curious.
>
>
>
>
> It is based on the discussion in
> https://gcc.gnu.org/ml/gcc/2016-01/msg00069.html.
> In summary, conclusion (based on my understanding) was to implement a
> simplified VRP algorithm that doesn't require ASSERT_EXPR insertion.



 But I don't see why you are moving it from tree-vrp.c .  That was my
 question, you pointing to that discussion does not say to split it
 into a new file and expose these interfaces.

>>>
>>> Are you saying that I should keep this part of tree-vrp.c. I am happy to
>>> do
>>> that if this is considered the best approach.
>>
>>
>> Yes, I think that's the best approach.
>>
> Thanks. Moved it into tree-vrp.c now.
>
>> Can you, as a refactoring before your patch, please change VRP to use
>> an alloc-pool
>> for allocating value_range?  The DOM-based VRP will add a lot of
>> malloc/free churn
>> otherwise.
>>
>> Generally watch coding-style such as  missed function comments.
>>
>> As you do a non-iterating VRP and thus do not visit back-edges you don't
>> need
>> to initialize loops or SCEV nor do you need loop-closed SSA.
>>
>> As you do a DOM-based VRP, using SSA propagator stuff like ssa_prop_result
>> doesn't make any sense.
>
>
> I have removed it.
>>
>>
>> +edge evrp_dom_walker::before_dom_children (basic_block bb)
>> +{
>> +  /* If we are going out of scope, restore the old VR.  */
>> +  while (!cond_stack.is_empty ()
>> +&& !dominated_by_p (CDI_DOMINATORS, bb, cond_stack.last
>> ().first))
>> +{
>> +  tree var = cond_stack.last ().second.first;
>> +  value_range *vr = cond_stack.last ().second.second;
>> +  value_range *vr_to_del = get_value_range (var);
>> +  XDELETE (vr_to_del);
>> +  change_value_range (var, vr);
>> +  cond_stack.pop ();
>> +}
>>
>> that should be in after_dom_children I think and use a marker instead
>> of a DOM query.
>> See other examples like DOM itself or SCCVN.
>>
>
> Done.

+/* Restore/Pop all the old VRs maintained in the cond_stack.  */
+
+void evrp_dom_walker::finalize_dom_walker ()
+{
+  while (!cond_stack.is_empty ())
+{
+  tree var = cond_stack.last ().second;
+  pop_value_range (var);
+  cond_stack.pop ();
+}

why is this necessary at all?  Looks like a waste of time (and the
stack should be empty here
anyway).  I think you leak cond_stack as well, I suppose using
auto_vec might work here,
you have to check.

>>
>> + /* Discover VR when condition is true.  */
>> + if (te == e
>> + && !TREE_OVERFLOW_P (op0)
>> + && !TREE_OVERFLOW_P (op1))
>
>
> This is mainly to handle gcc.dg/pr38934.c. This would result in ICE from
> set_value_range otherwise:
>
> gcc_assert ((!TREE_OVERFLOW_P (min) || is_overflow_infinity (min))
>   && (!TREE_OVERFLOW_P (max) || is_overflow_infinity
> (max)));

Ok, instead make sure to remove the overflow flag via

  if (TREE_OVERFLOW_P (op0))
op0 = drop_tree_overflow (op0);
...

it looks like you want to merge the if (e->flags & EDGE_TRUE_VALUE) and FALSE_VALUE
cases and only invert the tree comparison for the false edge.

+ tree cond = build2 (code, boolean_type_node, op0, op1);
+ extract_range_for_var_from_comparison_expr (&vr, op0, cond);

no wasteful tree building here please.  Instead of cond, pass in code,
op0 and op1
as separate arguments.

+ /* If we found any usable VR, set the VR to ssa_name and create a
+PUSH old value in the cond_stack with the old VR.  */
+ if (vr.type == VR_RANGE || vr.type == VR_ANTI_RANGE)
+   {
+ value_range *new_vr = vrp_value_range_pool.allocate ();
+ memset (new_vr, 0, sizeof (*new_vr));
+ *new_vr = vr;
+ cond_stack.safe_push (std::make_pair (bb, op0));

the memset looks redundant given you copy-assing from vr anyway.

You seem to miss the chance to register ranges for x_2 in i_1 < x_2 where
both i_1 and x_2 might have ranges that are useful.

push and pop_value_range should be methods in the dom walker class
and vrp_stack and cond_stack should be a single stack.  You can omit
basic_block if you use a "magic" NULL_TREE var as marker (simply
push a NULL_TREE, NULL pair at the end of before_dom_children).
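
Something along these lines (purely illustrative, with made-up member names):

  /* End of before_dom_children: push a marker for this scope.  */
  stack.safe_push (std::make_pair (NULL_TREE, (value_range *) NULL));

  /* In after_dom_children: unwind to the marker, restoring saved ranges.  */
  while (!stack.is_empty ())
    {
      std::pair<tree, value_range *> e = stack.pop ();
      if (e.first == NULL_TREE)
        break;
      pop_value_range (e.first);   /* restores the saved range e.second  */
    }

With that, the basic_block member and the dominated_by_p query are no longer
needed.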

>>
>> you can use e->flags & EDGE_TRUE_VALUE/EDGE_FALSE_VALUE
>>
>> why do you need those TREE_OVERFLOW checks?
>>
>> + tree cond = build2 (code, boolean_type_node, op0, op1);
>> + tree a = build2 (ASSERT_EXPR, TREE_TYPE (op0), op0, cond);
>> + ex

Re: [PATCH] Don't call get_working_sets w/ LTO and -fauto-profile (PR, gcov-profile/70993)

2016-07-25 Thread Richard Biener
On Mon, Jul 25, 2016 at 1:05 PM, Martin Liška  wrote:
> Hi.
>
> Currently, call to get_working_sets is only called from tree_profiling
> (called from 'pass_ipa_tree_profile' and is guarded in gate with 
> !flag_auto_profile).
> I would like to apply the same logic in lto-cgraph.c.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

Ok.

Richard.

> Martin


Re: [AArch64][0/14] ARMv8.2-A FP16 extension support

2016-07-25 Thread James Greenhalgh
On Thu, Jul 07, 2016 at 05:12:48PM +0100, Jiong Wang wrote:
> Hello,
> 
> As a follow up of
> 
> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01240.html,
> 
> This patch set adds ARMv8.2-A FP16 scalar and vector intrinsics support;
> the gcc middle-end will also be made aware of some standard operations so
> that some instructions can be auto-generated.
> 
> According to ACLE, the ARMv8.2-A FP16 intrinsics for AArch64 are a superset
> of the intrinsics for AArch32, so all those intrinsic-related testcases,
> particularly those under the directory advsimd-intrinsics, are also
> applicable to AArch64.  This patch set has only included those testcases
> that are exclusive to AArch64.
> 
> Jiong Wang (14)
>   ARMv8.2-A FP16 data processing intrinsics
>   ARMv8.2-A FP16 one operand vector intrinsics
>   ARMv8.2-A FP16 two operands vector intrinsics
>   ARMv8.2-A FP16 three operands vector intrinsics
>   ARMv8.2-A FP16 lane vector intrinsics
>   ARMv8.2-A FP16 reduction vector intrinsics
>   ARMv8.2-A FP16 one operand scalar intrinsics
>   ARMv8.2-A FP16 two operands scalar intrinsics
>   ARMv8.2-A FP16 three operands scalar intrinsics
>   ARMv8.2-A FP16 lane scalar intrinsics

At this point, I've OKed the first 10 patches in the series; these represent
the functional changes to the compiler. I'm leaving the testsuite patches
for now, as they depend on testsuite changes that have yet to be approved
for the ARM port.

To save you from having to continue to rebase the functional parts of this
patch while you wait for review of the ARM changes, I would be OK with you
committing them now, on the understanding that you'll continue to check
the testsuite between now and when the testsuite changes are approved,
and that you'll fix any issues that you find.

>   ARMv8.2-A FP16 testsuite selector
>   ARMv8.2-A testsuite for new data movement intrinsics
>   ARMv8.2-A testsuite for new vector intrinsics
>   ARMv8.2-A testsuite for new scalar intrinsics
 
I've taken a brief look through these testsuite changes and they look OK
to me. I'll revisit them properly once I've seen the ARM patches go in.

Thanks,
James



[PATCH] Remove special streaming of builtins

2016-07-25 Thread Richard Biener

So I needed to fix that builtins appearing in BLOCK_VARs and the solution
I came up with accidentially disabled streaming via the special path.
Thus the following patch removes the special-casing completely and makes
the BLOCK_VARs handling work the same way as for regular externs (by
streaming a local copy).  We stream each builtin decl once and then
refer to it via the decl index (which is cheaper than the special
casing).

I'm not 100% sure this solves, for example, the -fno-math-errno inlining
across TUs (it certainly doesn't if you use attribute optimize with
-fno-math-errno), but it eventually should by means of having two
different BUILT_IN_XXX if they have different behavior.  At least
if all relevant bits are set on the function _type_ rather than
the decl which I think we still lto-symtab replace with one
entity during WPA(?)

Well.

LTO bootstrapped and tested on x86_64-unknown-linux-gnu (c,c++,fortran),
bootstrapped on x86_64-unknown-linux-gnu (all), testing in progress.

I might not have caught all fndecl compares.

Will apply to trunk if testing completes.  As said, maybe followup
cleanups possible, at least to lto-opts.c / lto-wrapper.

Richard.

2016-07-25  Richard Biener  

* cgraph.c (cgraph_node::verify_node): Compare against builtin
by using DECL_BUILT_IN_CLASS and DECL_FUNCTION_CODE.
* tree-chkp.c (chkp_gimple_call_builtin_p): Likewise.
* tree-streamer.h (streamer_handle_as_builtin_p): Remove.
(streamer_get_builtin_tree): Likewise.
(streamer_write_builtin): Likewise.
* lto-streamer.h (LTO_builtin_decl): Remove.
* lto-streamer-in.c (lto_read_tree_1): Remove assert.
(lto_input_scc): Remove LTO_builtin_decl handling.
(lto_input_tree_1): Liekwise.
* lto-streamer-out.c (lto_output_tree_1): Remove special
handling of builtins.
(DFS::DFS): Likewise.
* tree-streamer-in.c (streamer_get_builtin_tree): Remove.
* tree-streamer-out.c (pack_ts_function_decl_value_fields): Remove
assert.
(streamer_write_builtin): Remove.

lto/
* lto.c (compare_tree_sccs_1): Remove streamer_handle_as_builtin_p uses.
(unify_scc): Likewise.
(lto_read_decls): Likewise.

Index: gcc/cgraph.c
===
--- gcc/cgraph.c(revision 238590)
+++ gcc/cgraph.c(working copy)
@@ -3136,8 +3136,9 @@ cgraph_node::verify_node (void)
  && !e->speculative
  /* Optimized out calls are redirected to __builtin_unreachable.  */
  && (e->frequency
- || e->callee->decl
-!= builtin_decl_implicit (BUILT_IN_UNREACHABLE))
+ || ! e->callee->decl
+ || DECL_BUILT_IN_CLASS (e->callee->decl) != BUILT_IN_NORMAL
+ || DECL_FUNCTION_CODE (e->callee->decl) != BUILT_IN_UNREACHABLE)
  && (e->frequency
  != compute_call_stmt_bb_frequency (e->caller->decl,
 gimple_bb (e->call_stmt
Index: gcc/lto/lto.c
===
--- gcc/lto/lto.c   (revision 238590)
+++ gcc/lto/lto.c   (working copy)
@@ -1061,12 +1061,6 @@ compare_tree_sccs_1 (tree t1, tree t2, t
TREE_FIXED_CST_PTR (t1), TREE_FIXED_CST_PTR (t2)))
   return false;
 
-
-  /* We want to compare locations up to the point where it makes
- a difference for streaming - thus whether the decl is builtin or not.  */
-  if (CODE_CONTAINS_STRUCT (code, TS_DECL_MINIMAL))
-compare_values (streamer_handle_as_builtin_p);
-
   if (CODE_CONTAINS_STRUCT (code, TS_DECL_COMMON))
 {
   compare_values (DECL_MODE);
@@ -1602,8 +1596,7 @@ unify_scc (struct data_in *data_in, unsi
   streamer.  The others should be singletons, too, and we
   should not merge them in any way.  */
gcc_assert (code != TRANSLATION_UNIT_DECL
-   && code != IDENTIFIER_NODE
-   && !streamer_handle_as_builtin_p (t));
+   && code != IDENTIFIER_NODE);
  }
 
  /* Fixup the streamer cache with the prevailing nodes according
@@ -1710,8 +1703,7 @@ lto_read_decls (struct lto_file_decl_dat
  if (len == 1
  && (TREE_CODE (first) == IDENTIFIER_NODE
  || TREE_CODE (first) == INTEGER_CST
- || TREE_CODE (first) == TRANSLATION_UNIT_DECL
- || streamer_handle_as_builtin_p (first)))
+ || TREE_CODE (first) == TRANSLATION_UNIT_DECL))
continue;
 
  /* Try to unify the SCC with already existing ones.  */
Index: gcc/lto-streamer-in.c
===
--- gcc/lto-streamer-in.c   (revision 238590)
+++ gcc/lto-streamer-in.c   (working copy)
@@ -1302,10 +1302,6 @@ lto_read_tree_1 (struct lto_input

Re: [PATCH] nvptx: do not implicitly enable -ftoplevel-reorder

2016-07-25 Thread Nathan Sidwell

On 07/22/16 11:19, Alexander Monakov wrote:


I hope I've satisfactorily explained the failures you've pointed out (thanks for
the data).  I think I should leave the choice of what to do next (revert the
patch or leave it in and install fixups where appropriate) up to you?


Please revert the nvptx patch.


[Patch, testuite, committed] Fix some more tests that fail for non 32-bit int targets

2016-07-25 Thread Senthil Kumar Selvaraj
Hi,

  The below patch fixes tests that fail for the avr target, because they
  assume ints are at least 32 bits wide and pointers and longs have the
  same size.

  I've required int32plus support for one test, and for the other two,
  I've introduced a cast to intptr_t to avoid the pointer <-> int size
  difference warning.

  Reg tested on avr and x86_64 with no regressions. Committed as
  obvious.

Regards
Senthil

gcc/testsuite/ChangeLog

2016-07-25  Senthil Kumar Selvaraj  

* gcc.dg/torture/pr69352.c (foo): Cast to intptr_t instead of long.
* gcc.dg/torture/pr69771.c: Require int32plus.
* gcc.dg/torture/pr71866.c (inb): Add cast to intptr_t.



Index: gcc.dg/torture/pr69352.c
===
--- gcc.dg/torture/pr69352.c(revision 238707)
+++ gcc.dg/torture/pr69352.c(working copy)
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 
+#include <stdint.h>
+
 int a[10][14], b, c, d, e, f, g, h, i;
 void bar (void);
 int
@@ -13,7 +15,7 @@
   else
 m = 13;
   if (a[x][m])
-l = (long) foo;
+l = (intptr_t) foo;
   a[x][i] = l;
   while (c)
 {
Index: gcc.dg/torture/pr69771.c
===
--- gcc.dg/torture/pr69771.c(revision 238707)
+++ gcc.dg/torture/pr69771.c(working copy)
@@ -1,5 +1,6 @@
 /* PR rtl-optimization/69771 */
 /* { dg-do compile } */
+/* { dg-require-effective-target int32plus } */
 
 unsigned char a = 5, c;
 unsigned short b = 0;
Index: gcc.dg/torture/pr71866.c
===
--- gcc.dg/torture/pr71866.c(revision 238707)
+++ gcc.dg/torture/pr71866.c(working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-additional-options "-ftree-pre -fcode-hoisting" } */
 
+#include <stdint.h>
 typedef unsigned char u8;
 extern unsigned long pci_io_base;
 u8 in_8 (const volatile void *);
@@ -25,7 +26,7 @@
 static inline
 u8 inb (unsigned long port)
 {
-  return readb((volatile void *)pci_io_base + port);
+  return readb((volatile void *)(intptr_t)pci_io_base + port);
 }
 static inline
 void outb (u8 val, unsigned long port)



Re: [PATCH] Adapt the numbering scheme (PR gcov-profile/64874)

2016-07-25 Thread Nathan Sidwell

On 07/25/16 04:42, Martin Liška wrote:


I like the change suggested by Jakub; I've updated the numbering scheme, as well
as comments in gcov-io.h.


ok.  I'm not too fussed about a problem that is 25 years away and would result 
in confusion of code (then) instrumented 30 years ago.



nathan


[patch] libstdc++: fix ext/rope::dump() bug

2016-07-25 Thread Georeth Chow
Hello, maintainers.

Recently, I tried to figure out how the rope container works by
dumping the content. I found that the implementation of rope::dump()
has a misspelling bug which uses a static member function as an enum value.

It seems that the original SGI STL implementation doesn't have this bug.

bug trigger code below.

#include <ext/rope>
int main() {
__gnu_cxx::crope r(100, 'x');
r.dump();
return 0;
}

patch is attached.

Thanks,
Georeth
diff --git a/libstdc++-v3/include/ext/ropeimpl.h 
b/libstdc++-v3/include/ext/ropeimpl.h
index 4316af7..93bd152 100644
--- a/libstdc++-v3/include/ext/ropeimpl.h
+++ b/libstdc++-v3/include/ext/ropeimpl.h
@@ -1117,7 +1117,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  printf("NULL\n");
  return;
}
-  if (_S_concat == __r->_M_tag)
+  if (__detail::_S_concat == __r->_M_tag)
{
  _RopeConcatenation* __c = (_RopeConcatenation*)__r;
  _RopeRep* __left = __c->_M_left;


Re: [PATCH] nvptx: do not implicitly enable -ftoplevel-reorder

2016-07-25 Thread Alexander Monakov
On Mon, 25 Jul 2016, Nathan Sidwell wrote:
> On 07/22/16 11:19, Alexander Monakov wrote:
> 
> > I hope I've satisfactorily explained the failures you've pointed out (thanks
> > for the data).  I think I should leave the choice of what to do next (revert
> > the patch or leave it in and install fixups where appropriate) up to you?
> 
> Please revert the nvptx patch.

Done - svn rev. 238710.  I'd still like to get feedback on the other points I've
raised in that email, though.

Alexander


Re: [patch] Get rid of stack trampolines for nested functions

2016-07-25 Thread Eric Botcazou
> Should UNITS_PER_WORD here be POINTER_SIZE/BITS_PER_UNIT right?

Hmm, yes, presumably, it's the size of the static chain and so a pointer.

> Another UNITS_PER_WORD that I think ought to be
> POINTER_SIZE/BITS_PER_UNIT.  Probably worth a pass over the patch to
> look for this throughout.

Yes, it was very likely enabled only on platforms with word-sized pointers.

> > +ftrampolines
> > +Common Report Var(flag_trampolines) Init(0)
> > +Always generate trampolines for pointers to nested functions
> 
> Even for targets that use procedure descriptors natively?  I'm not sure
> what's the best choice for them -- I could argue either way on this issue.

For targets that use procedure descriptors natively, it's a no-op because the 
entire patch is supposed to be a no-op for them.  I'm going to rephrase.

> I'm probably not well versed enough in this aspect of the various
> targets's ABIs to review the target stuff.  We'll want/need the target
> maintainers to chime in.  That might be easier if the target bits were
> split out into their own patches.

Good idea, I'll do that.

> "number of bytes", don't you mean "number of bits" in the last paragraph?

Yes, that's confused, I'll rephrase too.

> > Index: tree-nested.c
> > ===
> > --- tree-nested.c   (revision 237789)
> > +++ tree-nested.c   (working copy)
> > @@ -21,6 +21,7 @@
> > 
> >  #include "system.h"
> >  #include "coretypes.h"
> >  #include "backend.h"
> > 
> > +#include "target.h"
> 
> Not real happy to see target stuff getting into tree-*.  Is this really
> needed?

It's the target hook mentioned earlier, e.g. for targets that use procedure 
descriptors natively we don't want to create generic descriptors but use the 
regular "degenerated" trampolines.

> I think the biggest issues are the ABI concerns and getting the target
> maintainers to take a looksie.  Everything else looks pretty good.

Thanks for the review.

-- 
Eric Botcazou


Re: [patch] libstdc++: fix ext/rope::dump() bug

2016-07-25 Thread Jonathan Wakely

On 25/07/16 20:26 +0800, Georeth Chow wrote:

Hello, maintainers.

Recently, I tried to figure out how the rope container works by
dumping the content. I found that the implementation of rope::dump()
has a misspelling bug which uses a static member function as an enum value.

It seems that the original SGI STL implementation doesn't have this bug.

bug trigger code below.

#include <ext/rope>
int main() {
   __gnu_cxx::crope r(100, 'x');
   r.dump();
   return 0;
}

patch is attached.


Thanks for the patch, I've committed it to trunk.

   Fix missing qualification in 
   
   2016-07-25  Georeth Chow  
   
   * include/ext/ropeimpl.h (rope<>::_S_dump(_RopeRep*, int)): Qualify

   _S_concat enumerator.
   * testsuite/ext/rope/6.cc: New test.





Re: [patch] Get rid of stack trampolines for nested functions

2016-07-25 Thread Richard Earnshaw (lists)
On 29/06/16 23:08, Eric Botcazou wrote:
> Index: config/aarch64/aarch64.h
> ===
> --- config/aarch64/aarch64.h  (revision 237789)
> +++ config/aarch64/aarch64.h  (working copy)
> @@ -779,6 +779,9 @@ typedef struct
> correctly.  */
>  #define TRAMPOLINE_SECTION text_section
>  
> +/* Use custom descriptors instead of trampolines when possible.  */
> +#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
> +


Eric,

If I understand how this is supposed to work then this is not
future-proof against changes to the architecture.  The bottom two bits
in both AArch32 (arm) and AArch64 are reserved for future use by the
architecture; they must not be used by software for tricks like this.
As has already been seen in AArch32 state, bit-0 is used to indicate the
ARM/Thumb ISA selection.

The patch to arm.h is similarly problematic in this regard.

R.

> Hi,
> 
> this patch implements generic support for the elimination of stack 
> trampolines 
> and, consequently, of the need to make the stack executable when pointers to 
> nested functions are used.  That's done on a per-language and per-target 
> basis 
> (i.e. there is 1 language hook and 1 target hook to parameterize it) and 
> there 
> are no changes whatsoever in code generation if both are not turned on (and 
> the patch implements a -ftrampolines option to let the user override them).
> 
> The idea is based on the fact that, for targets using function descriptors as 
> per their ABI like IA-64, AIX or VMS platforms, stack trampolines 
> "degenerate" 
> into descriptors built at run time on the stack and thus made up of data 
> only, 
> which in turn means that the stack doesn't need to be made executable.
> 
> This descriptor-based scheme is implemented generically for nested functions, 
> i.e. the nested function lowering pass builds generic descriptors instead of 
> trampolines on the stack when encountering pointers to nested functions, 
> which 
> means that there are 2 kinds of pointers to functions and therefore a 
> run-time 
> identification mechanism is needed for indirect calls to distinguish them.
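
An illustrative sketch of the run-time test that paragraph refers to; the
marker bit and descriptor layout here are assumptions for illustration, not
the patch's actual choices, and the choice of bit is exactly what is at
issue above:

  #include <stdint.h>

  struct descriptor
  {
    void (*entry) (void);   /* entry point of the nested function */
    void *static_chain;     /* its environment */
  };

  void
  call_indirect (void *fp)
  {
    if ((uintptr_t) fp & 1)   /* assumed marker: low bit set => descriptor */
      {
        struct descriptor *d
          = (struct descriptor *) ((uintptr_t) fp & ~(uintptr_t) 1);
        /* Real generated code would load d->static_chain into the
           static chain register before jumping to d->entry.  */
        d->entry ();
      }
    else
      ((void (*) (void)) fp) ();   /* ordinary code pointer */
  }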
> 
> Because of that, enabling the support breaks binary compatibility (for code 
> manipulating pointers to nested functions).  That's OK for Ada and nested 
> functions are first-class citizens in the language anyway so we really need 
> this, but not for C so for example Ada doesn't use it at the interface with C 
> (when objects have "convention C" in Ada parlance).
> 
> This was bootstrapped/regtested on x86_64-suse-linux but AdaCore has been 
> using it on native platforms (Linux, Windows, Solaris, etc) for years.
> 
> OK for the mainline?
> 
> 
> 2016-06-29  Eric Botcazou  
> 
>   PR ada/37139
>   PR ada/67205
>   * common.opt (-ftrampolines): New option.
>   * doc/invoke.texi (Code Gen Options): Document it.
>   * doc/tm.texi.in (Trampolines): Add TARGET_CUSTOM_FUNCTION_DESCRIPTORS
>   * doc/tm.texi: Regenerate.
>   * builtins.def: Add init_descriptor and adjust_descriptor.
>   * builtins.c (expand_builtin_init_trampoline): Do not issue a warning
>   on platforms with descriptors.
>   (expand_builtin_init_descriptor): New function.
>   (expand_builtin_adjust_descriptor): Likewise.
>   (expand_builtin) : New case.
>   : Likewise.
>   * calls.c (prepare_call_address): Remove SIBCALLP parameter and add
>   FLAGS parameter.  Deal with indirect calls by descriptor and adjust.
>   Set STATIC_CHAIN_REG_P on the static chain register, if any.
>   (call_expr_flags): Set ECF_BY_DESCRIPTOR for calls by descriptor.
>   (expand_call): Likewise.  Move around call to prepare_call_address
>   and pass all flags to it.
>   * cfgexpand.c (expand_call_stmt): Reinstate CALL_EXPR_BY_DESCRIPTOR.
>   * gimple.h (enum gf_mask): New GF_CALL_BY_DESCRIPTOR value.
>   (gimple_call_set_by_descriptor): New setter.
>   (gimple_call_by_descriptor_p): New getter.
>   * gimple.c (gimple_build_call_from_tree): Set CALL_EXPR_BY_DESCRIPTOR.
>   (gimple_call_flags): Deal with GF_CALL_BY_DESCRIPTOR.
>   * langhooks.h (struct lang_hooks): Add custom_function_descriptors.
>   * langhooks-def.h (LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS): Define.
>   (LANG_HOOKS_INITIALIZER): Add LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS.
>   * rtl.h (STATIC_CHAIN_REG_P): New macro.
>   * rtlanal.c (find_first_parameter_load): Skip static chain registers.
>   * target.def (custom_function_descriptors): New POD hook.
>   * tree.h (FUNC_ADDR_BY_DESCRIPTOR): New flag on ADDR_EXPR.
>   (CALL_EXPR_BY_DESCRIPTOR): New flag on CALL_EXPR.
>   * tree-core.h (ECF_BY_DESCRIPTOR): New mask.
>   Document FUNC_ADDR_BY_DESCRIPTOR and CALL_EXPR_BY_DESCRIPTOR.
>   * tree.c (make_node_stat) : Set function alignment to
>   DEFAULT_FUNCTION_ALIGNMENT instead of FUNCTION_BOUNDARY.
>   (build_common_builtin_nodes): Initiali

Re: [AArch64][3/3] Migrate aarch64_expand_prologue/epilogue to aarch64_add_constant

2016-07-25 Thread Richard Earnshaw (lists)
On 25/07/16 10:34, Jiong Wang wrote:
> On 21/07/16 11:08, Richard Earnshaw (lists) wrote:
>> On 20/07/16 16:02, Jiong Wang wrote:
>>> Richard,
>>>Thanks for the review, yes, I believe using aarch64_add_constant is
>>> unconditionally
>>> safe here.  Because we have generated a stack tie to clobber the whole
>>> memory thus
>>> prevent any instruction which access stack be scheduled after that.
>>>
>>>The access to deallocated stack issue was there and fixed by
>>>
>>>https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02292.html.
>>>
>>>   aarch64_add_constant itself is generating the same instruction
>>> sequences as the
>>> original code, except for a few cases, it will prefer
>>>
>>>move scratch_reg, #imm
>>>add sp, sp, scratch_reg
>>>
>>> than:
>>>add sp, sp, #imm_part1
>>>add sp, sp, #imm_part2
>>>
>>>
>>>
>>>
>> OK, I've had another look at this and I'm happy that we don't
>> (currently) run into the problem I'm concerned about.
>>
>> However, this new usage does impose a constraint on aarch64_add_constant
>> that will need to be respected in future, so please can you add the
>> following to the comment that precedes that function:
>>
>> /* ...
>>
>> This function is sometimes used to adjust the stack pointer, so we
>> must ensure that it can never cause transient stack deallocation
>> by writing an invalid value into REGNUM.  */
>>
>>
>>> +  bool frame_related_p = (regnum == SP_REGNUM);
>> I think it would be better to make the frame-related decision be an
>> explicit parameter passed to the routine (don't forget SP is not always
>> the frame pointer).  Then the new uses would pass 'true' and the
>> existing uses 'false'.
>>
>> R.
> 
> Thanks, attachment is the updated patch which:
> 
>   * Added above new comments for aarch64_add_constant.
>   * One new parameter "frame_related_p" for aarch64_add_constant.
> I thought adding new gcc assertion for sanity check of
> frame_related_p and REGNUM, haven't done that as I found dwarf2cfi.c
> is doing that.
> 
> OK for trunk?

This is fine, thanks.

R.

> 
> gcc/
> 2016-07-25  Jiong Wang  
> 
>  * config/aarch64/aarch64.c (aarch64_add_constant): New
>  parameter "frame_related_p".  Generate CFA annotation when
>  it's necessary.
>  (aarch64_expand_prologue): Use aarch64_add_constant.
>  (aarch64_expand_epilogue): Likewise.
>  (aarch64_output_mi_thunk): Pass "false" when calling
>  aarch64_add_constant.
> 
> 
> update.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 41844a1..ca93f6e 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1866,14 +1866,19 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
>  }
>  
>  /* Add DELTA onto REGNUM in MODE, using SCRATCHREG to held intermediate 
> value if
> -   it is necessary.  */
> +   it is necessary.
> +
> +   This function is sometimes used to adjust the stack pointer, so we must
> +   ensure that it can never cause transient stack deallocation by writing an
> +   invalid value into REGNUM.  */
>  
>  static void
>  aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
> -   HOST_WIDE_INT delta)
> +   HOST_WIDE_INT delta, bool frame_related_p)
>  {
>HOST_WIDE_INT mdelta = abs_hwi (delta);
>rtx this_rtx = gen_rtx_REG (mode, regnum);
> +  rtx_insn *insn;
>  
>/* Do nothing if mdelta is zero.  */
>if (!mdelta)
> @@ -1882,7 +1887,8 @@ aarch64_add_constant (machine_mode mode, int regnum, 
> int scratchreg,
>/* We only need single instruction if the offset fit into add/sub.  */
>if (aarch64_uimm12_shift (mdelta))
>  {
> -  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
> +  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
> +  RTX_FRAME_RELATED_P (insn) = frame_related_p;
>return;
>  }
>  
> @@ -1895,15 +1901,23 @@ aarch64_add_constant (machine_mode mode, int regnum, 
> int scratchreg,
>HOST_WIDE_INT low_off = mdelta & 0xfff;
>  
>low_off = delta < 0 ? -low_off : low_off;
> -  emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
> -  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
> +  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
> +  RTX_FRAME_RELATED_P (insn) = frame_related_p;
> +  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
> +  RTX_FRAME_RELATED_P (insn) = frame_related_p;
>return;
>  }
>  
>/* Otherwise use generic function to handle all other situations.  */
>rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
>aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
> -  emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
> +  insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
> +  if (frame_related_p)
> +{
> +  RTX_FRAME_RELAT

[PATCH 0/7, GCC, V8M] ARMv8-M Security Extensions

2016-07-25 Thread Andre Vieira (lists)

Hello,

This is a respin of a previous patch series for ARMv8-M Security
Extensions. In this version I have removed one patch, rebased the rest
and changed some of them.

This patch series aims at implementing support for ARMv8-M's Security
Extensions. You can find the specification of ARMV8-M Security
Extensions in: ARM®v8-M Security Extensions: Requirements on Development
Tools
(http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html).

We currently:
- do not support passing arguments or returning on the stack for
cmse_nonsecure_{call,entry} functions,
- only test Security Extensions for -mfpu=fpv5-d16 and fpv5-sp-d16 and
only support single and double precision FPU's with d16.

Bootstrapped and tested on arm-none-linux-gnueabihf and tested on
arm-none-eabi with ARMv8-M Baseline and Mainline targets.

Andre Vieira (7):
 Add support for ARMv8-M's Security Extensions flag and intrinsics
 Handling ARMv8-M Security Extension's cmse_nonsecure_entry attribute
 ARMv8-M Security Extension's cmse_nonsecure_entry: __acle_se label and
bxns return
 ARMv8-M Security Extension's cmse_nonsecure_entry: clear registers
 Handling ARMv8-M Security Extension's cmse_nonsecure_call attribute
 ARMv8-M Security Extension's cmse_nonsecure_call: use
__gnu_cmse_nonsecure_call
 Added support for ARMV8-M Security Extension cmse_nonsecure_caller
intrinsic


[PATCH 1/7, GCC, ARM, V8M] Add support for ARMv8-M's Secure Extensions flag and intrinsics

2016-07-25 Thread Andre Vieira (lists)
This patch adds support for the '-mcmse' option to enable ARMv8-M's
Security Extensions and supports the following intrinsics:
cmse_TT
cmse_TT_fptr
cmse_TTT
cmse_TTT_fptr
cmse_TTA
cmse_TTA_fptr
cmse_TTAT
cmse_TTAT_fptr
cmse_check_address_range
cmse_check_pointed_object
cmse_is_nsfptr
cmse_nsfptr_create

It also defines the mandatory cmse_address_info struct and the
__ARM_FEATURE_CMSE macro.
See Chapter 4, Sections 5.2, 5.3 and 5.6 of ARM®v8-M Security
Extensions: Requirements on Development Tools
(http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html).
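
As a rough usage sketch (not part of the patch or its testsuite), code
compiled with -mcmse could use the new header like this to validate a
pointer handed in from the non-secure side; CMSE_NONSECURE is one of the
permission flags the header defines:

#include <arm_cmse.h>
#include <stddef.h>

/* Return P if the N bytes starting at P are accessible to non-secure
   callers, otherwise NULL.  */
void *
validate_ns_buffer (void *p, size_t n)
{
  return cmse_check_address_range (p, n, CMSE_NONSECURE);
}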

*** gcc/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* config.gcc (extra_headers): Added arm_cmse.h.
* config/arm/arm-arches.def (ARM_ARCH):
(armv8-m): Add FL2_CMSE.
(armv8-m.main): Likewise.
(armv8-m.main+dsp): Likewise.
* config/arm/arm-c.c
(arm_cpu_builtins): Added __ARM_FEATURE_CMSE macro.
* config/arm/arm-protos.h
(arm_is_constant_pool_ref): Define FL2_CMSE.
* config/arm/arm.c (arm_arch_cmse): New.
(arm_option_override): New error for unsupported cmse target.
* config/arm/arm.h (arm_arch_cmse): New.
* config/arm/arm.opt (mcmse): New.
* doc/invoke.texi (ARM Options): Add -mcmse.
* config/arm/arm_cmse.h: New file.

*** libgcc/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* config/arm/cmse.c: Likewise.
* config/arm/t-arm (HAVE_CMSE): New.

*** gcc/testsuite/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* gcc.target/arm/cmse/cmse.exp: New.
* gcc.target/arm/cmse/cmse-1.c: New.
* gcc.target/arm/cmse/cmse-12.c: New.
* lib/target-supports.exp
(check_effective_target_arm_cmse_ok): New.
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 
1f75f17877334c2bb61cd16b69539ec7514db8ae..8555bbf19d81b517493c86b38aff31a633ac50eb
 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -320,7 +320,7 @@ arc*-*-*)
 arm*-*-*)
cpu_type=arm
extra_objs="arm-builtins.o aarch-common.o"
-   extra_headers="mmintrin.h arm_neon.h arm_acle.h"
+   extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_cmse.h"
target_type_format_char='%'
c_target_objs="arm-c.o"
cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 
be46521c9eaea54f9ad78a92874567589289dbdf..0e523959551cc3b1da31411ccdd1105b830db845
 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -63,11 +63,11 @@ ARM_ARCH("armv8.1-a+crc",cortexa53, 8A,
  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
 FL2_FOR_ARCH8_1A))
 ARM_ARCH("armv8-m.base", cortexm0, 8M_BASE,
-ARM_FSET_MAKE_CPU1 ( FL_FOR_ARCH8M_BASE))
+ARM_FSET_MAKE (  FL_FOR_ARCH8M_BASE, FL2_CMSE))
 ARM_ARCH("armv8-m.main", cortexm7, 8M_MAIN,
-ARM_FSET_MAKE_CPU1(FL_CO_PROC |  FL_FOR_ARCH8M_MAIN))
+ARM_FSET_MAKE (FL_CO_PROC |  FL_FOR_ARCH8M_MAIN, FL2_CMSE))
 ARM_ARCH("armv8-m.main+dsp", cortexm7, 8M_MAIN,
-ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN))
+ARM_FSET_MAKE (FL_CO_PROC | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN, FL2_CMSE))
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE,   ARM_FSET_MAKE_CPU1 (FL_LDSCHED | 
FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
 ARM_ARCH("iwmmxt2", iwmmxt2,5TE,   ARM_FSET_MAKE_CPU1 (FL_LDSCHED | 
FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
 
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 
b98470fff45b20a5398c2534bc3bb3edfb7bfd01..ad2fb09d1f9ca14300c6283f3831a527db656267
 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -76,6 +76,14 @@ arm_cpu_builtins (struct cpp_reader* pfile)
 
   def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT);
 
+  if (arm_arch8 && !arm_arch_notm)
+{
+  if (arm_arch_cmse && use_cmse)
+   builtin_define_with_int_value ("__ARM_FEATURE_CMSE", 3);
+  else
+   builtin_define ("__ARM_FEATURE_CMSE");
+}
+
   if (TARGET_ARM_FEATURE_LDREX)
 builtin_define_with_int_value ("__ARM_FEATURE_LDREX",
   TARGET_ARM_FEATURE_LDREX);
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 
49c3a92dba80db32b698a0b44ad72d56111c1358..0754fb7252a8b86ffacc0cee4598686752af6e56
 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -391,6 +391,7 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_ARCH6KZ(1 << 31)   /* ARMv6KZ architecture.  */
 
 #define FL2_ARCH8_1   (1 << 0)   /* Architecture 8.1.  */
+#define FL2_CMSE  (1 << 1)   /* ARMv8-M Security Extensions.  */
 
 /* Flags that only effect tuning, not available instructions.  */
 #define FL_TUNE(FL

[PATCH 2/7, GCC, ARM, V8M] Handling ARMv8-M Security Extension's cmse_nonsecure_entry attribute

2016-07-25 Thread Andre Vieira (lists)
This patch adds support for the ARMv8-M Security Extensions
'cmse_nonsecure_entry' attribute. In this patch we implement the
attribute handling and the diagnostics around the attribute. See Section 5.4
of ARM®v8-M Security Extensions
(http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html).
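
As an illustration (these are not the new testcases themselves), with
-mcmse the attribute is accepted on an ordinary function, while a
function whose arguments would spill onto the stack is rejected by the
new checks:

/* OK: marks a secure function that may be called from non-secure state.  */
int __attribute__ ((cmse_nonsecure_entry)) get_status (void);

/* Diagnosed: under the AAPCS the fifth argument would be passed on the
   stack, which is not supported for entry functions.  */
int __attribute__ ((cmse_nonsecure_entry))
too_many (int a, int b, int c, int d, int e);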

*** gcc/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* config/arm/arm.c (arm_handle_cmse_nonsecure_entry): New.
(arm_attribute_table): Added cmse_nonsecure_entry
(arm_compute_func_type): Handle cmse_nonsecure_entry.
(cmse_func_args_or_return_in_stack): New.
(arm_handle_cmse_nonsecure_entry): New.
* config/arm/arm.h (ARM_FT_CMSE_ENTRY): New macro define.
(IS_CMSE_ENTRY): Likewise.

*** gcc/testsuite/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* gcc.target/arm/cmse/cmse-3.c: New.
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 
e3697bbcb425999db31ac2b4f47e14bb3f2ffa89..5307ec8f904230db5ea44150ef471d928926ab6d
 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1373,6 +1373,7 @@ enum reg_class
 #define ARM_FT_VOLATILE(1 << 4) /* Does not return.  */
 #define ARM_FT_NESTED  (1 << 5) /* Embedded inside another func.  */
 #define ARM_FT_STACKALIGN  (1 << 6) /* Called with misaligned stack.  */
+#define ARM_FT_CMSE_ENTRY  (1 << 7) /* ARMv8-M non-secure entry function.  
*/
 
 /* Some macros to test these flags.  */
 #define ARM_FUNC_TYPE(t)   (t & ARM_FT_TYPE_MASK)
@@ -1381,6 +1382,7 @@ enum reg_class
 #define IS_NAKED(t)(t & ARM_FT_NAKED)
 #define IS_NESTED(t)   (t & ARM_FT_NESTED)
 #define IS_STACKALIGN(t)   (t & ARM_FT_STACKALIGN)
+#define IS_CMSE_ENTRY(t)   (t & ARM_FT_CMSE_ENTRY)
 
 
 /* Structure used to hold the function stack frame layout.  Offsets are
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
9903d9cd8c5ff68a2318a643bdf31cf48016eba4..11417ab3c2f7101866ee5d6b100913480e5c336e
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -134,6 +134,7 @@ static tree arm_handle_isr_attribute (tree *, tree, tree, 
int, bool *);
 #if TARGET_DLLIMPORT_DECL_ATTRIBUTES
 static tree arm_handle_notshared_attribute (tree *, tree, tree, int, bool *);
 #endif
+static tree arm_handle_cmse_nonsecure_entry (tree *, tree, tree, int, bool *);
 static void arm_output_function_epilogue (FILE *, HOST_WIDE_INT);
 static void arm_output_function_prologue (FILE *, HOST_WIDE_INT);
 static int arm_comp_type_attributes (const_tree, const_tree);
@@ -343,6 +344,9 @@ static const struct attribute_spec arm_attribute_table[] =
   { "notshared",0, 0, false, true, false, arm_handle_notshared_attribute,
 false },
 #endif
+  /* ARMv8-M Security Extensions support.  */
+  { "cmse_nonsecure_entry", 0, 0, true, false, false,
+arm_handle_cmse_nonsecure_entry, false },
   { NULL,   0, 0, false, false, false, NULL, false }
 };
 
@@ -3633,6 +3637,9 @@ arm_compute_func_type (void)
   else
 type |= arm_isr_value (TREE_VALUE (a));
 
+  if (lookup_attribute ("cmse_nonsecure_entry", attr))
+type |= ARM_FT_CMSE_ENTRY;
+
   return type;
 }
 
@@ -6634,6 +6641,110 @@ arm_handle_notshared_attribute (tree *node,
 }
 #endif
 
+/* This function returns true if a function with declaration FNDECL, name
+   NAME and type FNTYPE uses the stack to pass arguments or return variables
+   and false otherwise.  This is used for functions with the attributes
+   'cmse_nonsecure_call' or 'cmse_nonsecure_entry' and this function will issue
+   diagnostic messages if the stack is used.  */
+
+static bool
+cmse_func_args_or_return_in_stack (tree fndecl, tree name, tree fntype)
+{
+  function_args_iterator args_iter;
+  CUMULATIVE_ARGS args_so_far_v;
+  cumulative_args_t args_so_far;
+  bool first_param = true;
+  tree arg_type, prev_arg_type = NULL_TREE, ret_type;
+
+  /* Error out if any argument is passed on the stack.  */
+  arm_init_cumulative_args (&args_so_far_v, fntype, NULL_RTX, fndecl);
+  args_so_far = pack_cumulative_args (&args_so_far_v);
+  FOREACH_FUNCTION_ARGS (fntype, arg_type, args_iter)
+{
+  rtx arg_rtx;
+  machine_mode arg_mode = TYPE_MODE (arg_type);
+
+  prev_arg_type = arg_type;
+  if (VOID_TYPE_P (arg_type))
+   continue;
+
+  if (!first_param)
+   arm_function_arg_advance (args_so_far, arg_mode, arg_type, true);
+  arg_rtx = arm_function_arg (args_so_far, arg_mode, arg_type, true);
+  if (!arg_rtx
+ || arm_arg_partial_bytes (args_so_far, arg_mode, arg_type, true))
+   {
+ error ("%qE attribute not available to functions with arguments "
+"passed on the stack", name);
+ return true;
+   }
+  first_param = false;
+}
+
+  /* Error out for variadic functions since we cannot control how many
+ arguments will be passed and thus stack could be used.  stdarg_p () is not
+

[PATCH] Fix PR71984

2016-07-25 Thread Richard Biener

The following should fix PR71984 which stems from a bogus (but harmless
for CONCAT) use of GET_MODE_UNIT_SIZE vs. GET_MODE_SIZE (where the
former is really really badly named...).  In the process of fixing it
I also hardened it against VOIDmode which might have occurred for
CONCAT as well given we have CSImode.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

I don't have AVX512 HW to check but by inspection the testcase is fixed.
(and the issue is obvious once you analyzed it)

It was Honza who sneaked in this GET_MODE_UNIT_SIZE use, the bad
name exists since the initial repository version.

Ok for trunk?

Thanks,
Richard.

2016-07-25  Richard Biener  

PR rtl-optimization/71984
* simplify-rtx.c (simplify_subreg): Use GET_MODE_SIZE and prepare
for VOIDmode.

* gcc.dg/torture/pr71984.c: New testcase.

Index: gcc/simplify-rtx.c
===
*** gcc/simplify-rtx.c  (revision 238710)
--- gcc/simplify-rtx.c  (working copy)
*** simplify_subreg (machine_mode outermode,
*** 6116,6122 
unsigned int part_size, final_offset;
rtx part, res;
  
!   part_size = GET_MODE_UNIT_SIZE (GET_MODE (XEXP (op, 0)));
if (byte < part_size)
{
  part = XEXP (op, 0);
--- 6116,6125 
unsigned int part_size, final_offset;
rtx part, res;
  
!   enum machine_mode part_mode = GET_MODE (XEXP (op, 0));
!   if (part_mode == VOIDmode)
!   part_mode = GET_MODE_INNER (GET_MODE (op));
!   part_size = GET_MODE_SIZE (part_mode);
if (byte < part_size)
{
  part = XEXP (op, 0);
*** simplify_subreg (machine_mode outermode,
*** 6131,6137 
if (final_offset + GET_MODE_SIZE (outermode) > part_size)
return NULL_RTX;
  
!   enum machine_mode part_mode = GET_MODE (part);
if (part_mode == VOIDmode)
part_mode = GET_MODE_INNER (GET_MODE (op));
res = simplify_subreg (outermode, part, part_mode, final_offset);
--- 6134,6140 
if (final_offset + GET_MODE_SIZE (outermode) > part_size)
return NULL_RTX;
  
!   part_mode = GET_MODE (part);
if (part_mode == VOIDmode)
part_mode = GET_MODE_INNER (GET_MODE (op));
res = simplify_subreg (outermode, part, part_mode, final_offset);
Index: gcc/testsuite/gcc.dg/torture/pr71984.c
===
*** gcc/testsuite/gcc.dg/torture/pr71984.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr71984.c  (working copy)
***
*** 0 
--- 1,21 
+ /* { dg-do run { target lp64 } } */
+ /* { dg-additional-options "-w -Wno-psabi" } */
+ 
+ typedef unsigned char v64u8 __attribute__((vector_size(64)));
+ typedef unsigned long v64u64 __attribute__((vector_size(64)));
+ typedef unsigned char u8;
+ 
+ static u8 __attribute__ ((noinline, noclone))
+ foo (v64u64 v64u64_0)
+ {
+   return ((v64u8)(v64u64){0, v64u64_0[0]})[13];
+ }
+ 
+ int
+ main ()
+ {
+   u8 x = foo((v64u64){0x0706050403020100UL});
+   if (x != 5)
+ __builtin_abort ();
+   return 0;
+ }


[PATCH 3/7, GCC, ARM, V8M] ARMv8-M Security Extension's cmse_nonsecure_entry: __acle_se label and bxns return

2016-07-25 Thread Andre Vieira (lists)
This patch extends support for the ARMv8-M Security Extensions
'cmse_nonsecure_entry' attribute in two ways:

1) Generate two labels for the function: the regular function name and
one with the function's name prefixed by '__acle_se_'; this triggers
the linker to create a secure gateway veneer for this entry function.
2) Return from cmse_nonsecure_entry marked functions using bxns.

See Section 5.4 of ARM®v8-M Security Extensions
(http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html).
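
A minimal sketch of the intended output (hand-written here, not taken
from an actual build):

/* foo.c, compiled with -mcmse */
int __attribute__ ((cmse_nonsecure_entry))
entry (int x)
{
  return x + 1;
}

/* Expected shape of the assembly: the body is emitted under both the
   regular label and the __acle_se_-prefixed one, and the function
   returns with bxns instead of bx lr:

       .global entry
       .global __acle_se_entry
   entry:
   __acle_se_entry:
       adds    r0, r0, #1
       bxns    lr
*/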


*** gcc/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* config/arm/arm.c (use_return_insn): Change to return with  bxns
when cmse_nonsecure_entry.
(output_return_instruction): Likewise.
(arm_output_function_prologue): Likewise.
(thumb_pop): Likewise.
(thumb_exit): Likewise.
(arm_function_ok_for_sibcall): Disable sibcall for entry functions.
(arm_asm_declare_function_name): New.
* config/arm/arm-protos.h (arm_asm_declare_function_name): New.
* config/arm/elf.h (ASM_DECLARE_FUNCTION_NAME): Redefine to
use arm_asm_declare_function_name.

*** gcc/testsuite/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* gcc.target/arm/cmse/cmse-2.c: New.
* gcc.target/arm/cmse/cmse-4.c: New.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 
0754fb7252a8b86ffacc0cee4598686752af6e56..1d2e35b52f631570450b6c8eaf077e18e9b99203
 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -31,6 +31,7 @@ extern int arm_volatile_func (void);
 extern void arm_expand_prologue (void);
 extern void arm_expand_epilogue (bool);
 extern void arm_declare_function_name (FILE *, const char *, tree);
+extern void arm_asm_declare_function_name (FILE *, const char *, tree);
 extern void thumb2_expand_return (bool);
 extern const char *arm_strip_name_encoding (const char *);
 extern void arm_asm_output_labelref (FILE *, const char *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
11417ab3c2f7101866ee5d6b100913480e5c336e..9fba371768b1eba3a11dc8aa5d6acf8cc30f464d
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3866,6 +3866,11 @@ use_return_insn (int iscond, rtx sibling)
return 0;
 }
 
+  /* ARMv8-M nonsecure entry function need to use bxns to return and thus need
+ several instructions if anything needs to be popped.  */
+  if (saved_int_regs && IS_CMSE_ENTRY (func_type))
+return 0;
+
   /* If there are saved registers but the LR isn't saved, then we need
  two instructions for the return.  */
   if (saved_int_regs && !(saved_int_regs & (1 << LR_REGNUM)))
@@ -6903,6 +6908,11 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
   if (IS_INTERRUPT (func_type))
 return false;
 
+  /* ARMv8-M non-secure entry functions need to return with bxns which is only
+ generated for entry functions themselves.  */
+  if (IS_CMSE_ENTRY (arm_current_func_type ()))
+return false;
+
   if (!VOID_TYPE_P (TREE_TYPE (DECL_RESULT (cfun->decl
 {
   /* Check that the return value locations are the same.  For
@@ -19739,6 +19749,7 @@ output_return_instruction (rtx operand, bool 
really_return, bool reverse,
 (e.g. interworking) then we can load the return address
 directly into the PC.  Otherwise we must load it into LR.  */
   if (really_return
+ && !IS_CMSE_ENTRY (func_type)
  && (IS_INTERRUPT (func_type) || !TARGET_INTERWORK))
return_reg = reg_names[PC_REGNUM];
   else
@@ -19879,8 +19890,10 @@ output_return_instruction (rtx operand, bool 
really_return, bool reverse,
  break;
 
default:
+ if (IS_CMSE_ENTRY (func_type))
+   snprintf (instr, sizeof (instr), "bxns%s\t%%|lr", conditional);
  /* Use bx if it's available.  */
- if (arm_arch5 || arm_arch4t)
+ else if (arm_arch5 || arm_arch4t)
sprintf (instr, "bx%s\t%%|lr", conditional);
  else
sprintf (instr, "mov%s\t%%|pc, %%|lr", conditional);
@@ -19893,6 +19906,42 @@ output_return_instruction (rtx operand, bool 
really_return, bool reverse,
   return "";
 }
 
+/* Output in FILE asm statements needed to declare the NAME of the function
+   defined by its DECL node.  */
+
+void
+arm_asm_declare_function_name (FILE *file, const char *name, tree decl)
+{
+  size_t cmse_name_len;
+  char *cmse_name = 0;
+  char cmse_prefix[] = "__acle_se_";
+
+  if (use_cmse && lookup_attribute ("cmse_nonsecure_entry",
+   DECL_ATTRIBUTES (decl)))
+{
+  cmse_name_len = sizeof (cmse_prefix) + strlen (name);
+  cmse_name = XALLOCAVEC (char, cmse_name_len);
+  snprintf (cmse_name, cmse_name_len, "%s%s", cmse_prefix, name);
+  targetm.asm_out.globalize_label (file, cmse_name);
+}
+
+  if (cmse_name)
+{
+  ARM_DECLARE_FUNCTION_NAME (file, cmse_name, decl);

[PATCH 4/7, GCC, ARM, V8M] ARMv8-M Security Extension's cmse_nonsecure_entry: clear registers

2016-07-25 Thread Andre Vieira (lists)
This patch extends support for the ARMv8-M Security Extensions
'cmse_nonsecure_entry' attribute to safeguard against leak of
information through unbanked registers.

When returning from a nonsecure entry function we clear all caller-saved
registers that are not used to pass return values, by writing either the
LR, in case of general purpose registers, or the value 0, in case of FP
registers. We use the LR to write to APSR and FPSCR too. We currently do
not support entry functions that pass arguments or return variables on
the stack and we diagnose this. This patch relies on the existing code
to make sure callee-saved registers used in cmse_nonsecure_entry
functions are saved and restored, thus retaining their non-secure mode
values; this should already be happening, as it is required by the AAPCS.

This patch also clears padding bits for cmse_nonsecure_entry functions
with struct and union return types. For unions a bit is only considered
a padding bit if it is an unused bit in every field of that union. The
function that calculates these is used in a later patch to do the same
for arguments of cmse_nonsecure_call's.
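
As an illustration of the padding-bit clearing (a hypothetical
little-endian example, not one of the new testcases): for an entry
function returning this type, only r0 carries the return value, and
bits 8-15 of r0 (the padding between 'c' and 's') are cleared on
return, together with the caller-saved registers that do not hold the
return value:

struct two_fields
{
  unsigned char c;    /* bits 0-7 of r0   */
                      /* bits 8-15: padding, cleared */
  unsigned short s;   /* bits 16-31 of r0 */
};

struct two_fields __attribute__ ((cmse_nonsecure_entry)) get_pair (void);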

*** gcc/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* config/arm/arm.c (output_return_instruction): Clear
registers.
(thumb2_expand_return): Likewise.
(thumb1_expand_epilogue): Likewise.
(thumb_exit): Likewise.
(arm_expand_epilogue): Likewise.
(cmse_nonsecure_entry_clear_before_return): New.
(comp_not_to_clear_mask_str_un): New.
(compute_not_to_clear_mask): New.
* config/arm/thumb1.md (*epilogue_insns): Change length attribute.
* config/arm/thumb2.md (*thumb2_return): Likewise.

*** gcc/testsuite/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* gcc.target/arm/cmse/cmse.exp: Test different multilibs separate.
* gcc.target/arm/cmse/struct-1.c: New.
* gcc.target/arm/cmse/bitfield-1.c: New.
* gcc.target/arm/cmse/bitfield-2.c: New.
* gcc.target/arm/cmse/bitfield-3.c: New.
* gcc.target/arm/cmse/baseline/cmse-2.c: Test that registers are
cleared.
* gcc.target/arm/cmse/mainline/soft/cmse-5.c: New.
* gcc.target/arm/cmse/mainline/hard/cmse-5.c: New.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: New.
* gcc.target/arm/cmse/mainline/softfp/cmse-5.c: New.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-5.c: New.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
9fba371768b1eba3a11dc8aa5d6acf8cc30f464d..81a9d9a6fb29d0956a661734d60dd2e44cb554b8
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -17473,6 +17473,279 @@ note_invalid_constants (rtx_insn *insn, HOST_WIDE_INT 
address, int do_pushes)
   return;
 }
 
+/* This function computes the clear mask and PADDING_BITS_TO_CLEAR for structs
+   and unions in the context of ARMv8-M Security Extensions.  It is used as a
+   helper function for both 'cmse_nonsecure_call' and 'cmse_nonsecure_entry'
+   functions.  The PADDING_BITS_TO_CLEAR pointer can be the base to either one
+   or four masks, depending on whether it is being computed for a
+   'cmse_nonsecure_entry' return value or a 'cmse_nonsecure_call' argument
+   respectively.  The tree for the type of the argument or a field within an
+   argument is passed in ARG_TYPE, the current register this argument or field
+   starts in is kept in the pointer REGNO and updated accordingly, the bit this
+   argument or field starts at is passed in STARTING_BIT and the last used bit
+   is kept in LAST_USED_BIT which is also updated accordingly.  */
+
+static unsigned HOST_WIDE_INT
+comp_not_to_clear_mask_str_un (tree arg_type, int * regno,
+  uint32_t * padding_bits_to_clear,
+  unsigned starting_bit, int * last_used_bit)
+
+{
+  unsigned HOST_WIDE_INT not_to_clear_reg_mask = 0;
+
+  if (TREE_CODE (arg_type) == RECORD_TYPE)
+{
+  unsigned current_bit = starting_bit;
+  tree field;
+  long int offset, size;
+
+
+  field = TYPE_FIELDS (arg_type);
+  while (field)
+   {
+ /* The offset within a structure is always an offset from
+the start of that structure.  Make sure we take that into the
+calculation of the register based offset that we use here.  */
+ offset = starting_bit;
+ offset += TREE_INT_CST_ELT (DECL_FIELD_BIT_OFFSET (field), 0);
+ offset %= 32;
+
+ /* This is the actual size of the field, for bitfields this is the
+bitfield width and not the container size.  */
+ size = TREE_INT_CST_ELT (DECL_SIZE (field), 0);
+
+ if (*last_used_bit != offset)
+   {
+ if (offset < *last_used_bit)
+   {
+ /* This field's offset is before the 'last_used_bit', that
+means this field goes on the next register.  So we need to

[PATCH 5/7, GCC, ARM, V8M] Handling ARMv8-M Security Extension's cmse_nonsecure_call attribute

2016-07-25 Thread Andre Vieira (lists)
This patch adds support for the ARMv8-M Security Extensions
'cmse_nonsecure_call' attribute. This attribute may only be used on
function types, and only in combination with the '-mcmse'
compilation flag. See Section 5.5 of ARM®v8-M Security Extensions
(http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html).

We currently do not support cmse_nonsecure_call functions that pass
arguments or return variables on the stack and we diagnose this.
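
For illustration, the attribute is typically applied through a typedef
of a function pointer type, as in the cmse-1.c test further down this
series:

typedef int __attribute__ ((cmse_nonsecure_call)) (*ns_fn_t) (void);

int
call_out (ns_fn_t fp)
{
  return fp ();   /* emitted as a non-secure call, see patch 6/7 */
}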

*** gcc/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* config/arm/arm.c (gimplify.h): New include.
(arm_handle_cmse_nonsecure_call): New.
(arm_attribute_table): Added cmse_nonsecure_call.

*** gcc/testsuite/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* gcc.target/arm/cmse/cmse-3.c: Add tests.
* gcc.target/arm/cmse/cmse-4.c: Add tests.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
81a9d9a6fb29d0956a661734d60dd2e44cb554b8..128baae92e4b65507eb18b679874b1ad24ca7c4a
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -61,6 +61,7 @@
 #include "builtins.h"
 #include "tm-constrs.h"
 #include "rtl-iter.h"
+#include "gimplify.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -135,6 +136,7 @@ static tree arm_handle_isr_attribute (tree *, tree, tree, 
int, bool *);
 static tree arm_handle_notshared_attribute (tree *, tree, tree, int, bool *);
 #endif
 static tree arm_handle_cmse_nonsecure_entry (tree *, tree, tree, int, bool *);
+static tree arm_handle_cmse_nonsecure_call (tree *, tree, tree, int, bool *);
 static void arm_output_function_epilogue (FILE *, HOST_WIDE_INT);
 static void arm_output_function_prologue (FILE *, HOST_WIDE_INT);
 static int arm_comp_type_attributes (const_tree, const_tree);
@@ -347,6 +349,8 @@ static const struct attribute_spec arm_attribute_table[] =
   /* ARMv8-M Security Extensions support.  */
   { "cmse_nonsecure_entry", 0, 0, true, false, false,
 arm_handle_cmse_nonsecure_entry, false },
+  { "cmse_nonsecure_call", 0, 0, true, false, false,
+arm_handle_cmse_nonsecure_call, false },
   { NULL,   0, 0, false, false, false, NULL, false }
 };
 
@@ -6750,6 +6754,78 @@ arm_handle_cmse_nonsecure_entry (tree *node, tree name,
   return NULL_TREE;
 }
 
+
+/* Called upon detection of the use of the cmse_nonsecure_call attribute, this
+   function will check whether the attribute is allowed here and will add the
+   attribute to the function type tree or otherwise issue a diagnostic.  The
+   reason we check this at declaration time is to only allow the use of the
+   attribute with declarations of function pointers and not function
+   declarations.  This function checks NODE is of the expected type and issues
+   diagnostics otherwise using NAME.  If it is not of the expected type
+   *NO_ADD_ATTRS will be set to true.  */
+
+static tree
+arm_handle_cmse_nonsecure_call (tree *node, tree name,
+tree /* args */,
+int /* flags */,
+bool *no_add_attrs)
+{
+  tree decl = NULL_TREE;
+  tree type, fntype, main_variant;
+
+  if (!use_cmse)
+{
+  *no_add_attrs = true;
+  return NULL_TREE;
+}
+
+  if (TREE_CODE (*node) == VAR_DECL || TREE_CODE (*node) == TYPE_DECL)
+{
+  decl = *node;
+  type = TREE_TYPE (decl);
+}
+
+  if (!decl
+  || (!(TREE_CODE (type) == POINTER_TYPE
+   && TREE_CODE (TREE_TYPE (type)) == FUNCTION_TYPE)
+ && TREE_CODE (type) != FUNCTION_TYPE))
+{
+   warning (OPT_Wattributes, "%qE attribute only applies to base type of a 
"
+"function pointer", name);
+   *no_add_attrs = true;
+   return NULL_TREE;
+}
+
+  /* type is either a function pointer, when the attribute is used on a 
function
+   * pointer, or a function type when used in a typedef.  */
+  if (TREE_CODE (type) == FUNCTION_TYPE)
+fntype = type;
+  else
+fntype = TREE_TYPE (type);
+
+  *no_add_attrs |= cmse_func_args_or_return_in_stack (NULL, name, fntype);
+
+  if (*no_add_attrs)
+return NULL_TREE;
+
+  /* Prevent trees being shared among function types with and without
+ cmse_nonsecure_call attribute.  Do however make sure they keep the same
+ main_variant, this is required for correct DIE output.  */
+  main_variant = TYPE_MAIN_VARIANT (fntype);
+  fntype = build_distinct_type_copy (fntype);
+  TYPE_MAIN_VARIANT (fntype) = main_variant;
+  if (TREE_CODE (type) == FUNCTION_TYPE)
+TREE_TYPE (decl) = fntype;
+  else
+TREE_TYPE (type) = fntype;
+
+  /* Construct a type attribute and add it to the function type.  */
+  tree attrs = tree_cons (get_identifier ("cmse_nonsecure_call"), NULL_TREE,
+ TYPE_ATTRIBUTES (fntype));
+  TYPE_ATTRIBUTES (fntype) = attrs;
+  return NULL_TREE;
+}
+
 /* Return 0 if the attributes for two types are incompatible, 1 if they
   

[PATCH 6/7, GCC, ARM, V8M] ARMv8-M Security Extension's cmse_nonsecure_call: use __gnu_cmse_nonsecure_call

2016-07-25 Thread Andre Vieira (lists)
This patch extends support for the ARMv8-M Security Extensions
'cmse_nonsecure_call' to use a new library function
'__gnu_cmse_nonsecure_call'. This library function is responsible for
(without using r0-r3 or d0-d7):
1) saving and clearing all callee-saved registers using the secure stack
2) clearing the LSB of the address passed in r4 and using blxns to
'jump' to it
3) clearing APSR, including the 'ge bits' if DSP is enabled
4) clearing FPSCR if using non-soft float-abi
5) restoring callee-saved registers.

The decision whether to include clearing of the DSP 'ge bits' and of the
floating point registers (single/double precision) depends on the multilib used.

See Section 5.5 of ARM®v8-M Security Extensions
(http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html).
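
A condensed sketch of the intended call sequence (hand-written; the
exact register clearing depends on the multilib):

int
call_nonsecure (ns_fn_t fp)   /* ns_fn_t as in the patch 5/7 example */
{
  /* Instead of a plain "blx" through fp, the compiler is expected to
     clear the caller-saved registers not used for arguments, move the
     target address (LSB still set) into r4 and emit:

        bl      __gnu_cmse_nonsecure_call

     The library routine then saves and clears the callee-saved
     registers, clears the LSB of r4, transfers with blxns, and
     restores the callee-saved registers on the way back.  */
  return fp ();
}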

*** gcc/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* config/arm/arm.c (detect_cmse_nonsecure_call): New.
(cmse_nonsecure_call_clear_caller_saved): New.
(arm_reorg): Use cmse_nonsecure_call_clear_caller_saved.
* config/arm/arm-protos.h (detect_cmse_nonsecure_call): New.
* config/arm/arm.md (call): Handle cmse_nonsecure_entry.
(call_value): Likewise.
(nonsecure_call_internal): New.
(nonsecure_call_value_internal): New.
* config/arm/thumb1.md (*nonsecure_call_reg_thumb1_v5): New.
(*nonsecure_call_value_reg_thumb1_v5): New.
* config/arm/thumb2.md (*nonsecure_call_reg_thumb2): New.
(*nonsecure_call_value_reg_thumb2): New.
* config/arm/unspecs.md (UNSPEC_NONSECURE_MEM): New.

*** libgcc/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* config/arm/cmse_nonsecure_call.S: New.
* config/arm/t-arm: Compile cmse_nonsecure_call.S


*** gcc/testsuite/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* gcc.target/arm/cmse/cmse.exp: Run tests in mainline dir.
* gcc.target/arm/cmse/cmse-9.c: Added some extra tests.
* gcc.target/arm/cmse/baseline/bitfield-4.c: New.
* gcc.target/arm/cmse/baseline/bitfield-5.c: New.
* gcc.target/arm/cmse/baseline/bitfield-6.c: New.
* gcc.target/arm/cmse/baseline/bitfield-7.c: New.
* gcc.target/arm/cmse/baseline/bitfield-8.c: New.
* gcc.target/arm/cmse/baseline/bitfield-9.c: New.
* gcc.target/arm/cmse/baseline/bitfield-and-union-1.c: New.
* gcc.target/arm/cmse/baseline/cmse-11.c: New.
* gcc.target/arm/cmse/baseline/cmse-13.c: New.
* gcc.target/arm/cmse/baseline/cmse-6.c: New.
* gcc.target/arm/cmse/baseline/union-1.c: New.
* gcc.target/arm/cmse/baseline/union-2.c: New.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/hard/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/hard/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/hard/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/soft/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/soft/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/soft/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/softfp/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/softfp/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/softfp/cmse-8.c: New.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 
1d2e35b52f631570450b6c8eaf077e18e9b99203..e7e223f7a932163cba8beeb76c10b4c90eae9234
 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -133,6 +133,7 @@ extern int arm_const_double_inline_cost (rtx);
 extern bool arm_const_double_by_parts (rtx);
 extern bool arm_const_double_by_immediates (rtx);
 extern void arm_emit_call_insn (rtx, rtx, bool);
+bool detect_cmse_nonsecure_call (tree);
 extern const char *output_call (rtx *);
 void arm_emit_movpair (rtx, rtx);
 extern const char *output_mov_long_double_arm_from_arm (rtx *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
128baae92e4b65507eb18b679874b1ad24ca7c4a..fd0bc10083026ce544bf56244ce0eb740295e5d3
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -17822,6 +17822,197 @@ compute_not_to_clear_mask (tree arg_type, rtx 
arg_rtx, int regno,
   return not_to_clear_mask;
 }
 
+/* Saves callee saved registers, clears callee saved registers and caller saved
+   registers not used to pass arguments before a cmse_nonsecure_call.  And
+   restores the callee saved registers after.  */
+
+static void
+cmse_nonsecure_call_clear_caller_saved (void)
+{
+  basic_block bb;
+
+  FOR_EACH_BB_FN (bb, cfun)
+{
+  rtx_insn *insn;
+
+  FOR_BB_INSNS (bb, insn)
+   {
+ uint64_t to_cle

Re: [patch] Get rid of stack trampolines for nested functions

2016-07-25 Thread Eric Botcazou
> If I understand how this is supposed to work then this is not
> future-proof against changes to the architecture.  The bottom two bits
> in both AArch32 (arm) and AArch64 are reserved for future use by the
> architecture; they must not be used by software for tricks like this.

I see, thanks for the info, the value will need to be bumped to 4 then.

From the definition of TARGET_PTRMEMFUNC_VBIT_LOCATION, I gather that the MIPS 
setting is probably incorrect too and ought to be bumped to 2.

-- 
Eric Botcazou


[PATCH 7/7, GCC, ARM, V8M] Added support for ARMV8-M Security Extension cmse_nonsecure_caller intrinsic

2016-07-25 Thread Andre Vieira (lists)
This patch adds support for the ARMv8-M Security Extension's
cmse_nonsecure_caller intrinsic. This intrinsic is used to check whether
an entry function was called from a non-secure state.
See Section 5.4.3 of ARM®v8-M Security Extensions: Requirements on
Development Tools
(http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html)
for further details.

The FIXME in config/arm/arm_cmse.h is for a diagnostic message that is
suggested in the ARMv8-M Security Extensions document mentioned above,
to diagnose the use of the cmse_nonsecure_caller intrinsic outside of
functions with the 'cmse_nonsecure_entry' attribute.  Checking whether
the intrinsic is called from within such functions can easily be done
inside 'arm_expand_builtin'. However, making the warning point to the
right location is more complicated.  The ARMv8-M Security Extensions
specification does mention that such a diagnostic might become
mandatory, so I might have to pick this up later, otherwise it is left
as a potential extra feature.
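
A minimal usage sketch (the real test is in the cmse-1.c hunk below):

int __attribute__ ((cmse_nonsecure_entry))
entry (void)
{
  /* Non-zero when this entry function was reached through the secure
     gateway from non-secure state, zero when called directly from
     secure code.  */
  return cmse_nonsecure_caller ();
}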


*** gcc/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* config/arm/arm-builtins.c (arm_builtins): Define
ARM_BUILTIN_CMSE_NONSECURE_CALLER.
(bdesc_2arg): Add line for cmse_nonsecure_caller.
(arm_expand_builtin): Handle cmse_nonsecure_caller.
* config/arm/arm_cmse.h (cmse_nonsecure_caller): New.

*** gcc/testsuite/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* gcc.target/arm/cmse/cmse-1.c: Add test for
cmse_nonsecure_caller.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
68b2839879f78e8d819444fbc11d2a91f8d6279a..2589ec2d1233f3daff94a1d35ebf63c8a9b93ecf
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -515,6 +515,8 @@ enum arm_builtins
   ARM_BUILTIN_GET_FPSCR,
   ARM_BUILTIN_SET_FPSCR,
 
+  ARM_BUILTIN_CMSE_NONSECURE_CALLER,
+
 #undef CRYPTO1
 #undef CRYPTO2
 #undef CRYPTO3
@@ -1789,6 +1791,17 @@ arm_init_builtins (void)
= add_builtin_function ("__builtin_arm_stfscr", ftype_set_fpscr,
ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, 
NULL_TREE);
 }
+
+  if (arm_arch_cmse)
+{
+  tree ftype_cmse_nonsecure_caller
+   = build_function_type_list (unsigned_type_node, NULL);
+  arm_builtin_decls[ARM_BUILTIN_CMSE_NONSECURE_CALLER]
+   = add_builtin_function ("__builtin_arm_cmse_nonsecure_caller",
+   ftype_cmse_nonsecure_caller,
+   ARM_BUILTIN_CMSE_NONSECURE_CALLER, BUILT_IN_MD,
+   NULL, NULL_TREE);
+}
 }
 
 /* Return the ARM builtin for CODE.  */
@@ -2368,6 +2381,12 @@ arm_expand_builtin (tree exp,
   emit_insn (pat);
   return target;
 
+case ARM_BUILTIN_CMSE_NONSECURE_CALLER:
+  target = gen_reg_rtx (SImode);
+  op0 = arm_return_addr (0, NULL_RTX);
+  emit_insn (gen_addsi3 (target, op0, const1_rtx));
+  return target;
+
 case ARM_BUILTIN_TEXTRMSB:
 case ARM_BUILTIN_TEXTRMUB:
 case ARM_BUILTIN_TEXTRMSH:
diff --git a/gcc/config/arm/arm_cmse.h b/gcc/config/arm/arm_cmse.h
index 
c9afdcbf48160a963fc254795543a1d9e981a215..989d09f9d08268224e745da5a2e0aa85916cd3a9
 100644
--- a/gcc/config/arm/arm_cmse.h
+++ b/gcc/config/arm/arm_cmse.h
@@ -163,6 +163,13 @@ __attribute__ ((__always_inline__))
 cmse_TTAT (void *__p)
 __CMSE_TT_ASM (at)
 
+/* FIXME: diagnose use outside cmse_nonsecure_entry functions.  */
+__extension__ static __inline int __attribute__ ((__always_inline__))
+cmse_nonsecure_caller (void)
+{
+  return __builtin_arm_cmse_nonsecure_caller ();
+}
+
 #define CMSE_AU_NONSECURE  2
 #define CMSE_MPU_NONSECURE 16
 #define CMSE_NONSECURE 18
@@ -177,7 +184,7 @@ __extension__ void *
 cmse_check_address_range (void *, size_t, int);
 
 #define cmse_check_pointed_object(p, f) \
-  ((typeof ((p))) cmse_check_address_range ((p), sizeof (*(p)), (f)))
+  ((typeof ((p))) cmse_check_address_range ((p), sizeof (*(p)), (f)))
 
 #define cmse_nsfptr_create(p) ((typeof ((p))) ((intptr_t) (p) & ~1))
 
diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c 
b/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c
index 
d5b9a2d9d59569de170da814ae660e9fb2b943e7..ddcf12a30a6c1806969d239c448da81ccf49532e
 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c
@@ -65,3 +65,32 @@ int foo (char * p)
 /* { dg-final { scan-assembler-times "ttat " 2 } } */
 /* { dg-final { scan-assembler-times "bl.cmse_check_address_range" 7 } } */
 /* { dg-final { scan-assembler-not "cmse_check_pointed_object" } } */
+
+typedef int (*int_ret_funcptr_t) (void);
+typedef int __attribute__ ((cmse_nonsecure_call)) (*int_ret_nsfuncptr_t) 
(void);
+
+int __attribute__ ((cmse_nonsecure_entry))
+baz (void)
+{
+  return cmse_nonsecure_caller ();
+}
+
+int __attribute__ ((cmse_nonsecure_entry))
+qux (int_ret_funcptr_t int_ret_funcptr)

Re: [patch] Get rid of stack trampolines for nested functions

2016-07-25 Thread Richard Earnshaw (lists)
On 25/07/16 14:25, Eric Botcazou wrote:
>> If I understand how this is supposed to work then this is not
>> future-proof against changes to the architecture.  The bottom two bits
>> in both AArch32 (arm) and AArch64 are reserved for future use by the
>> architecture; they must not be used by software for tricks like this.
> 
> I see, thanks for the info, the value will need to be bumped to 4 then.
> 
> From the definition of TARGET_PTRMEMFUNC_VBIT_LOCATION, I gather that the 
> MIPS 
> setting is probably incorrect too and ought to be bumped to 2.
> 

So does this require that every function have an 8-byte alignment
boundary?  or just those that appear in nested functions?  If the
former, how do you get routines written in assembler, or produced by
third-party compilers, to comply?

R.


[PATCH] do not take mutex in _Unwind_Find_registered_FDE if there is no registered objects

2016-07-25 Thread Gleb Natapov
_Unwind_Find_FDE calls _Unwind_Find_registered_FDE, and that takes the
lock even when there are no registered objects. As far as I can see,
only statically linked applications call the __register_frame_info*
functions, so for dynamically linked executables taking the lock just to
check unseen_objects and seen_objects is a pessimization. Since the
function is called for each thrown exception, this is a lot of unneeded
locking.  This patch checks unseen_objects and seen_objects outside of
the lock and returns early if both are NULL.
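
Condensed, the fast path added by the patch below is:

  /* Nothing was ever registered: the common case for dynamically
     linked executables, which do not call __register_frame_info*.  */
  if (!__atomic_load_n (&unseen_objects, __ATOMIC_ACQUIRE)
      && !__atomic_load_n (&seen_objects, __ATOMIC_ACQUIRE))
    return NULL;

  init_object_mutex_once ();
  __gthread_mutex_lock (&object_mutex);
  /* ... the existing locked search, unchanged except for when
     unseen_objects is advanced ... */

The intent, as the comment in the patch explains, is that the acquire
loads pair with the release implied by the registering thread's mutex
unlock.  The second hunk delays the update of unseen_objects until after
the object has been linked into seen_objects, so a lock-free reader can
never momentarily observe both lists empty while the only registered
object is being moved between them.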

diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index 5b16a1f..41de746 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -1001,6 +1001,13 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
   struct object *ob;
   const fde *f = NULL;
 
+  /* __atomic_write is not used to modify unseen_objects and seen_objects
+ since they are modified in locked sections only and unlock provides
+ release semantics. */
+  if (!__atomic_load_n(&unseen_objects, __ATOMIC_ACQUIRE)
+  && !__atomic_load_n(&seen_objects, __ATOMIC_ACQUIRE))
+  return NULL;
+
   init_object_mutex_once ();
   __gthread_mutex_lock (&object_mutex);
 
@@ -1020,8 +1027,8 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
   while ((ob = unseen_objects))
 {
   struct object **p;
+  struct object *next = ob->next;
 
-  unseen_objects = ob->next;
   f = search_object (ob, pc);
 
   /* Insert the object into the classified list.  */
@@ -1031,6 +1038,8 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
   ob->next = *p;
   *p = ob;
 
+  unseen_objects = next;
+
   if (f)
goto fini;
 }
--
Gleb.


Re: [patch] Get rid of stack trampolines for nested functions

2016-07-25 Thread Eric Botcazou
> So does this require that every function have an 8-byte alignment
> boundary?  or just those that appear in nested functions?  If the
> former, how do you get routines written in assembler, or produced by
> third-party compilers, to comply?

The former.  You need to be able to control what happens at the interface with 
other languages and accept what they send (and do not send them descriptors).
The ISO standard contains an entire annex specifying this interfacing in Ada.

-- 
Eric Botcazou


[PR71078] x / abs(x) -> copysign (1.0, x)

2016-07-25 Thread Prathamesh Kulkarni
Hi,
The attached patch tries to fix PR71078.
I am not sure if I have got the converts right.
I put (convert? @0) and (convert1? (abs @0))
to match cases where the operands' types may
be different from the outermost type, as in the pr71078-3.c
test-case (included in the patch).
Bootstrap+test in progress on x86_64-unknown-linux-gnu.

Thanks,
Prathamesh
diff --git a/gcc/match.pd b/gcc/match.pd
index 21bf617..6c3d6ec 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -391,6 +391,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (mult (abs@1 @0) @1)
  (mult @0 @0))
 
+/* PR71078: x / abs(x) -> copysign (1.0, x) */
+(simplify
+ (rdiv:C (convert? @0) (convert1? (abs @0)))
+  (if (FLOAT_TYPE_P (type)
+   && ! HONOR_NANS (type)
+   && ! HONOR_INFINITIES (type))
+   (switch
+(if (type == float_type_node)
+ (BUILT_IN_COPYSIGNF { build_one_cst (type); } (convert @0)))
+(if (type == double_type_node)
+ (BUILT_IN_COPYSIGN { build_one_cst (type); } (convert @0)))
+(if (type == long_double_type_node)
+ (BUILT_IN_COPYSIGNL { build_one_cst (type); } (convert @0))
+
 /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
 (for coss (COS COSH)
  copysigns (COPYSIGN)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c
new file mode 100644
index 000..6204c14
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-forwprop-details" } */
+
+#include 
+
+float f1(float x)
+{
+  float t1 = fabsf (x);
+  float t2 = x / t1;
+  return t2;
+}
+ 
+double f2(double x)
+{
+  double t1 = fabs (x);
+  double t2 = x / t1;
+  return t2;
+}
+
+long double f3 (long double x)
+{
+  long double t1 = fabsl (x);
+  long double t2 = x / t1;
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump "__builtin_copysignf" "forwprop1" } } */
+/* { dg-final { scan-tree-dump "__builtin_copysign" "forwprop1" } } */
+/* { dg-final { scan-tree-dump "__builtin_copysignl" "forwprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c
new file mode 100644
index 000..96485af
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-forwprop-details" } */
+
+#include 
+
+float f1(float x)
+{
+  float t1 = fabsf (x);
+  float t2 = t1 / x; 
+  return t2;
+}
+ 
+double f2(double x)
+{
+  double t1 = fabs (x);
+  double t2 = t1 / x; 
+  return t2;
+}
+
+long double f3 (long double x)
+{
+  long double t1 = fabsl (x);
+  long double t2 = t1 / x; 
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump "__builtin_copysignf" "forwprop1" } } */
+/* { dg-final { scan-tree-dump "__builtin_copysign" "forwprop1" } } */
+/* { dg-final { scan-tree-dump "__builtin_copysignl" "forwprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71078-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-3.c
new file mode 100644
index 000..8780b6a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-forwprop-details" } */
+
+#include 
+double f(float f)
+{
+  double t1 = fabs(f); 
+  double t2 = f / t1;
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump "__builtin_copysign" "forwprop1" } } */


ChangeLog
Description: Binary data


Re: Gimple loop splitting

2016-07-25 Thread Michael Matz
Hi,

On Sun, 24 Jul 2016, Andrew Pinski wrote:

> What ever happened to this patch?

It got accepted but I deferred inclusion in GCC 6 because it 
was late in the cycle then and performance results didn't show super 
improvements (only looked at cpu2006).  No regressions, but no nice 
speedups either.

> I was looking into doing this myself today but I found this patch. It is 
> stage 1 of GCC 7, it might be a good idea to get this patch into GCC.

Indeed.  If you want to performance-test it on something where you know
it should help, I'm all ears.


Ciao,
Michael.


Re: libgo patch committed: Update to 1.7rc3

2016-07-25 Thread Rainer Orth
Hi Ian,

> I have committed a patch to update libgo to the 1.7rc3 release
> candidate.  This is very close to the upcoming 1.7 release.  As usual
> with libgo updates, the patch is too large to include in this e-mail
> message.  I've appended the changes to the gccgo-specific directories.

this broke Solaris bootstrap:

/vol/gcc/src/hg/trunk/local/libgo/go/os/user/decls_solaris.go:24:46: error: 
reference to undefined identifier 'syscall.group'
 func libc_getgrgid_r(gid syscall.Gid_t, grp *syscall.group, buf *byte, buflen 
syscall.Size_t, result **syscall.group) int
  ^
/vol/gcc/src/hg/trunk/local/libgo/go/os/user/decls_solaris.go:24:104: error: 
reference to undefined identifier 'syscall.group'
 func libc_getgrgid_r(gid syscall.Gid_t, grp *syscall.group, buf *byte, buflen 
syscall.Size_t, result **syscall.group) int

^
make[4]: *** [os/user.lo] Error 1

Easily fixed by the following patch:

diff --git a/libgo/go/os/user/decls_solaris.go b/libgo/go/os/user/decls_solaris.go
--- a/libgo/go/os/user/decls_solaris.go
+++ b/libgo/go/os/user/decls_solaris.go
@@ -21,4 +21,4 @@ func libc_getpwuid_r(uid syscall.Uid_t, 
 func libc_getgrnam_r(name *byte, grp *syscall.Group, buf *byte, buflen syscall.Size_t, result **syscall.Group) int
 
 //extern __posix_getgrgid_r
-func libc_getgrgid_r(gid syscall.Gid_t, grp *syscall.group, buf *byte, buflen syscall.Size_t, result **syscall.group) int
+func libc_getgrgid_r(gid syscall.Gid_t, grp *syscall.Group, buf *byte, buflen syscall.Size_t, result **syscall.Group) int

There are also a couple of testsuite regressions I need to investigate.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH 1/6] add auto_sbitmap class

2016-07-25 Thread Jeff Law

On 07/24/2016 10:28 AM, Richard Biener wrote:

On July 24, 2016 1:44:44 PM GMT+02:00, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2016-07-24  Trevor Saunders  

* sbitmap.h (auto_sbitmap): New class.


OK.

Do we need a private move constructor and assignment operator here too?


jeff



Re: [patch] Get rid of stack trampolines for nested functions

2016-07-25 Thread Jeff Law

On 07/25/2016 06:56 AM, Eric Botcazou wrote:



Another UNITS_PER_WORD that I think ought to be
POINTER_SIZE/BITS_PER_UNIT.  Probably worth a pass over the patch to
look for this throughout.


Yes, it was very likely enabled only on platforms with word-sized pointers.
That's what I suspected.  Thankfully it's not a huge patch and I didn't 
see anything else that would likely trip those oddballs that use 
something other than word size pointers.






+ftrampolines
+Common Report Var(flag_trampolines) Init(0)
+Always generate trampolines for pointers to nested functions


Even for targets that use procedure descriptors natively?  I'm not sure
what's the best choice for them -- I could argue either way on this issue.


For targets that use procedure descriptors natively, it's a no-op because the
entire patch is supposed to be a no-op for them.  I'm going to rephrase.

Ah, OK, it's just misleading docs.  I'll trust you to clean that up.


Not real happy to see target stuff getting into tree-*.  Is this really

needed?


It's the target hook mentioned earlier, e.g. for targets that use procedure
descriptors natively we don't want to create generic descriptors but use the
regular "degenerated" trampolines.
Makes sense.  I see this happening more and more and it makes me wonder 
if we want to draw a hard line in the sand by splitting out the key bits 
necessary to allow these kinds of hook calls, but not direct access to the 
rest of the target bits.  But that's outside the scope of this patch.





I think the biggest issues are the ABI concerns and getting the target
maintainers to take a looksie.  Everything else looks pretty good.


Thanks for the review.
So I think, with the various nits I pointed out, the target-independent 
stuff is good to go on the trunk.  Then you can just iterate with the 
target guys to get those wrapped up.


jeff


Re: [PATCH] Teach VRP to register assertions along default switch labels (PR 18046)

2016-07-25 Thread Jeff Law

On 07/24/2016 11:56 AM, Andrew Pinski wrote:

On Sun, Jul 24, 2016 at 9:49 AM, Patrick Palka  wrote:

On Sat, Jul 23, 2016 at 9:13 PM, kugan
 wrote:



On 23/07/16 05:26, Patrick Palka wrote:


This patch teaches VRP to register along a default switch label
assertions that corresponds to the anti range of each case label.



Hi Patrick,

In case of a larger switch statement with case values that cannot be
combined, you could end up with very large number of ASSERT_EXPRs in the
default basic block. I am not sure if this would have any impact on compile
time/memory usage? If that is the case you might want to punt at some
length?


Hi Kugan,

That sounds like a good idea to me.  So I gathered some info about the
number of assert_exprs inserted in the default bb during GCC bootstrap
with this patch:

Max # of asserts inserted in a default BB: 59
Median #: 2
Average #: 3.33817
Std. dev: 4.76926
95% percentile: 10

Based on the last point I guess 10 would be a good limit?



Make sure this is a parameter so someone can increase it if they want.
But 10 seems like a good default.

Agreed on both points.

jeff



Re: Init df for split pass since some target use REG_NOTE in split pattern

2016-07-25 Thread Jeff Law

On 07/25/2016 12:36 AM, Kito Cheng wrote:

Hi all:

Some targets (for example i386, sh and h8300) use find_regno_note in
split patterns, but the df infrastructure does not seem to be initialized
in the split pass, so they may get a stale note since it is out of date.

ChangeLog
2016-07-25  Kito Cheng 

* gcc/recog.c (split_all_insns): Initialize df for split pass.
(split_all_insns_noflow): Likewise.

Patch was not included/attached to your message.  Please resend with the 
patch and hopefully a testcase which exhibits the problem.


jeff


Re: libgo patch committed: Update to 1.7rc3

2016-07-25 Thread Lynn A. Boger

libgo version bump to indicate the change in Go version?


On 07/22/2016 01:15 PM, Ian Lance Taylor wrote:

I have committed a patch to update libgo to the 1.7rc3 release
candidate.  This is very close to the upcoming 1.7 release.  As usual
with libgo updates, the patch is too large to include in this e-mail
message.  I've appended the changes to the gccgo-specific directories.

Ian




[committed] testsuite: add two more label_values annotations

2016-07-25 Thread Alexander Monakov
Hi,

I've applied the following patch as obvious to mark two more tests as taking
addresses of labels (this cannot work on nvptx).  The issue on pr16973.c was
uncovered in the fallout of my (now reverted) toplevel-reorder patch; the other
test, pr71494.c, fails regardless of that.

2016-07-25  Alexander Monakov  

* gcc.c-torture/execute/pr71494.c: Require label_values.
* gcc.dg/pr16973.c: Ditto.

diff --git a/gcc/testsuite/gcc.c-torture/execute/pr71494.c 
b/gcc/testsuite/gcc.c-torture/execute/pr71494.c
index f962f2c..cee407d 100644
--- a/gcc/testsuite/gcc.c-torture/execute/pr71494.c
+++ b/gcc/testsuite/gcc.c-torture/execute/pr71494.c
@@ -1,4 +1,5 @@
 /* PR middle-end/71494 */
+/* { dg-require-effective-target label_values } */

 int
 main ()
diff --git a/gcc/testsuite/gcc.dg/pr16973.c b/gcc/testsuite/gcc.dg/pr16973.c
index e24c9f8..83274a1 100644
--- a/gcc/testsuite/gcc.dg/pr16973.c
+++ b/gcc/testsuite/gcc.dg/pr16973.c
@@ -3,6 +3,7 @@
to add back the label.   */

 /* { dg-options "" } */
+/* { dg-require-effective-target label_values } */

 void
 f (void)


Re: GCC testsuite maintenance (was: [PATCH] Fix OpenACC vector_length parsing in fortran)

2016-07-25 Thread Joseph Myers
On Fri, 15 Jul 2016, Thomas Schwinge wrote:

> > No, we want to have as little churn as possible in existing tests, the
> > general policy is to add new tests (not just for OpenACC/OpenMP, but for
> > all functionality).
> 
> Hmm, that's something I had not been aware of, and I can't find this
> covered in the documentation.  So, you're basically saying that files in
> the testsuite are write-once, and should not be maintained aside from
> fixing errors, adjusting due to optimization changes, or due to changed
> diagnostics, and the like?  (Of course, I do agree that we shouldn't

Yes, that's my view.  It makes it easier to distinguish regressions from 
new tests that fail (on some platforms) if what a test tests doesn't 
change unnecessarily.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PR70920] transform (intptr_t) x eq/ne CST to x eq/ne (typeof x) cst

2016-07-25 Thread Prathamesh Kulkarni
On 25 July 2016 at 14:32, Richard Biener  wrote:
> On Mon, 25 Jul 2016, Prathamesh Kulkarni wrote:
>
>> Hi Richard,
>> The attached patch tries to fix PR70920.
>> It adds your pattern from comment 1 in the PR
>> (with additional gating on INTEGRAL_TYPE_P to avoid regressing 
>> finalize_18.f90)
>> and second pattern, which is reverse of the first transform.
>> I needed to update ssa-dom-branch-1.c because with patch applied,
>> jump threading removed the second if (i != 0B) block.
>> The dumps with and without patch for ssa-dom-branch-1.c start
>> to differ with forwprop1:
>>
>> before:
>>  :
>>   _1 = temp_16(D)->code;
>>   _2 = _1 == 42;
>>   _3 = (int) _2;
>>   _4 = (long int) _3;
>>   temp_17 = (struct rtx_def *) _4;
>>   if (temp_17 != 0B)
>> goto ;
>>   else
>> goto ;
>>
>> after:
>> :
>>   _1 = temp_16(D)->code;
>>   _2 = _1 == 42;
>>   _3 = (int) _2;
>>   _4 = (long int) _2;
>>   temp_17 = (struct rtx_def *) _4;
>>   if (_1 == 42)
>> goto ;
>>   else
>> goto ;
>>
>> I suppose the transform is correct for above test-case ?
>>
>> Then vrp dump shows:
>>  Threaded jump 5 --> 9 to 13
>>   Threaded jump 8 --> 9 to 13
>>   Threaded jump 3 --> 9 to 13
>>   Threaded jump 12 --> 9 to 14
>> Removing basic block 9
>> basic block 9, loop depth 0
>>  pred:
>> if (i1_10(D) != 0B)
>>   goto ;
>> else
>>   goto ;
>>  succ:   10
>>  11
>>
>> So there remained two instances of if (i1_10 (D) != 0B) in dom2 dump file,
>> and hence needed to update the test-case.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> OK to commit ?
>
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3408,3 +3408,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  { CONSTRUCTOR_ELT (ctor, idx / k)->value; })
> (BIT_FIELD_REF { CONSTRUCTOR_ELT (ctor, idx / k)->value; }
>@1 { bitsize_int ((idx % k) * width); })
> +
> +/* PR70920: Transform (intptr_t)x eq/ne CST to x eq/ne (typeof x) CST.
> */
> +
> +(for cmp (ne eq)
> + (simplify
> +  (cmp (convert@2 @0) INTEGER_CST@1)
> +  (if (POINTER_TYPE_P (TREE_TYPE (@0))
> +   && INTEGRAL_TYPE_P (TREE_TYPE (@2)))
>
> you can use @1 here and omit @2.
>
> +   (cmp @0 (convert @1)
> +
> +/* Reverse of the above case:
> +   x has integral_type, CST is a pointer constant.
> +   Transform (typeof CST)x eq/ne CST to x eq/ne (typeof x) CST.  */
> +
> +(for cmp (ne eq)
> + (simplify
> +  (cmp (convert @0) @1)
> +  (if (POINTER_TYPE_P (TREE_TYPE (@1))
> +   && INTEGRAL_TYPE_P (TREE_TYPE (@0)))
> +(cmp @0 (convert @1)
>
> The second pattern lacks the INTEGER_CST on @1 so it doesn't match
> its comment.  Please do not add vertical space between pattern
> comment and pattern.
>
> Please place patterns not at the end of match.pd but where similar
> transforms are done.  Like after
>
> /* Simplify pointer equality compares using PTA.  */
> (for neeq (ne eq)
>  (simplify
>   (neeq @0 @1)
>   (if (POINTER_TYPE_P (TREE_TYPE (@0))
>&& ptrs_compare_unequal (@0, @1))
>{ neeq == EQ_EXPR ? boolean_false_node : boolean_true_node; })))
>
> please also share the (for ...) for both patterns or merge them
> by changing the condition to
>
>   (if ((POINTER_TYPE_P (TREE_TYPE (@0))
> && INTEGRAL_TYPE_P (TREE_TYPE (@1)))
>|| (INTEGRAL_TYPE_P (TREE_TYPE (@0))
>&& POINTER_TYPE_P (TREE_TYPE (@1
>
Hi,
I've made the suggested changes in this version.
pr70920-4.c (the test-case in the patch) is now folded during ccp instead
of forwprop after merging the two patterns.
Passes bootstrap+test on x86_64-unknown-linux-gnu.
OK for trunk?

Thanks,
Prathamesh
> Richard.
>
>> PS: Writing changelog entries for match.pd is a bit tedious.
>> Should we add optional names for pattern so we can refer to them by names
>> in the ChangeLog for the more complicated ones ?
>> Or maybe just use comments:
>> (simplify /* name */ ... ) -;)
>
> That will add the fun of inventing names ;)
>
>> Thanks,
>> Prathamesh
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)
diff --git a/gcc/match.pd b/gcc/match.pd
index 21bf617..6c2ec82 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2513,6 +2513,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& ptrs_compare_unequal (@0, @1))
{ neeq == EQ_EXPR ? boolean_false_node : boolean_true_node; })))
 
+/* PR70920: Transform (intptr_t)x eq/ne CST to x eq/ne (typeof x) CST.
+   and (typeof ptr_cst) x eq/ne ptr_cst to x eq/ne (typeof x) CST */
+(for cmp (ne eq)
+ (simplify
+  (cmp (convert @0) INTEGER_CST@1)
+  (if ((POINTER_TYPE_P (TREE_TYPE (@0)) && INTEGRAL_TYPE_P (TREE_TYPE (@1)))
+|| (INTEGRAL_TYPE_P (TREE_TYPE (@0)) && POINTER_TYPE_P (TREE_TYPE 
(@1
+   (cmp @0 (convert @1)
+
 /* Non-equality compare simplifications from fold_binary  */
 (for cmp (lt gt le ge)
  /* Comparisons with the highest or lowest possible integer of
diff --git a/gcc/testsuite/gcc.dg/pr70920-1.c b/gcc/testsuite/gcc.dg/pr70920-1.
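
For context, the shape of source the new pattern targets looks roughly
like this (an illustrative sketch, not one of the pr70920-*.c tests
attached to the patch):

#include <stdint.h>

int is_null (void *p)
{
  /* Foldable to "return p == 0;" once the cast is pushed onto the
     constant, i.e. (intptr_t) p == 0  -->  p == (void *) 0.  */
  return (intptr_t) p == 0;
}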

Re: [Patch, testuite, committed] Fix some more tests that fail for non 32-bit int targets

2016-07-25 Thread Mike Stump
On Jul 25, 2016, at 5:00 AM, Senthil Kumar Selvaraj 
 wrote:
> 
>  The below patch fixes tests that fail for the avr target, because they
>  assume ints are at least 32 bits wide and pointers and longs have the
>  same size.
> 
>  I've required int32plus support for one test, and for the other two,
>  I've introduced a cast to intptr_t to avoid the pointer <-> int size
>  difference warning.
> 
>  Reg tested on avr and x86_64 with no regressions. Committed as
>  obvious.

Can you use __INTPTR_TYPE__ instead?  This way, you don't need to add any 
include, and the test case will work in cross environments without any libc, 
which is at times handy.  Thanks.

See grep INTPTR gcc/gcc/testsuite/gcc.dg/*.c for how people use it, and for the 
unsigned version.
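
The idiom is roughly the following (a sketch; the function name is made
up).  The predefined macros avoid any include, so the test also runs for
bare-metal cross targets:

typedef __INTPTR_TYPE__ intptr_t;
typedef __UINTPTR_TYPE__ uintptr_t;

int same_address (void *p, void *q)
{
  return (intptr_t) p == (intptr_t) q;
}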

> 2016-07-25  Senthil Kumar Selvaraj  
> 
>   * gcc.dg/torture/pr69352.c (foo): Cast to intptr_t instead of long.
>   * gcc.dg/torture/pr69771.c: Require int32plus.
>   * gcc.dg/torture/pr71866.c (inb): Add cast to intptr_t.


Re: [PATCH 2/6] use auto_sbitmap in various places

2016-07-25 Thread Jeff Law

On 07/24/2016 05:44 AM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2016-07-24  Trevor Saunders  

* bt-load.c (compute_out): Use auto_sbitmap class.
(link_btr_uses): Likewise.
* cfganal.c (mark_dfs_back_edges): Likewise.
(post_order_compute): Likewise.
(inverted_post_order_compute): Likewise.
(pre_and_rev_post_order_compute_fn): Likewise.
(single_pred_before_succ_order): Likewise.
* cfgexpand.c (pass_expand::execute): Likewise.
* cfgloop.c (verify_loop_structure): Likewise.
* cfgloopmanip.c (fix_bb_placements): Likewise.
(remove_path): Likewise.
(update_dominators_in_loop): Likewise.
* cfgrtl.c (break_superblocks): Likewise.
* ddg.c (check_sccs): Likewise.
(create_ddg_all_sccs): Likewise.
* df-core.c (df_worklist_dataflow): Likewise.
* dse.c (dse_step3): Likewise.
* except.c (eh_region_outermost): Likewise.
* function.c (thread_prologue_and_epilogue_insns): Likewise.
* gcse.c (prune_expressions): Likewise.
(prune_insertions_deletions): Likewise.
* gimple-ssa-backprop.c (backprop::~backprop): Likewise.
* graph.c (draw_cfg_nodes_no_loops): Likewise.
* ira-lives.c (remove_some_program_points_and_update_live_ranges): 
Likewise.
* lcm.c (compute_earliest): Likewise.
(compute_farthest): Likewise.
* loop-unroll.c (unroll_loop_constant_iterations): Likewise.
(unroll_loop_runtime_iterations): Likewise.
(unroll_loop_stupid): Likewise.
* lower-subreg.c (decompose_multiword_subregs): Likewise.
* lra-lives.c: Likewise.
* lra.c (lra): Likewise.
* modulo-sched.c (schedule_reg_moves): Likewise.
(optimize_sc): Likewise.
(get_sched_window): Likewise.
(sms_schedule_by_order): Likewise.
(check_nodes_order): Likewise.
(order_nodes_of_sccs): Likewise.
(order_nodes_in_scc): Likewise.
* recog.c (split_all_insns): Likewise.
* regcprop.c (pass_cprop_hardreg::execute): Likewise.
* reload1.c (reload): Likewise.
* sched-rgn.c (haifa_find_rgns): Likewise.
(split_edges): Likewise.
(compute_trg_info): Likewise.
* sel-sched.c (init_seqno): Likewise.
* store-motion.c (remove_reachable_equiv_notes): Likewise.
* tree-into-ssa.c (update_ssa): Likewise.
* tree-ssa-live.c (live_worklist): Likewise.
* tree-ssa-loop-im.c (fill_always_executed_in): Likewise.
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely):
* Likewise.
(try_peel_loop): Likewise.
* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop):
* Likewise.
* tree-ssa-pre.c (compute_antic): Likewise.
* tree-ssa-reassoc.c (undistribute_ops_list): Likewise.
* tree-stdarg.c (reachable_at_most_once): Likewise.
* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Likewise.
* var-tracking.c (vt_find_locations): Likewise.
While looking at this, I noticed some more.  Check the local sbitmaps in 
ddg.c::find_nodes_on_paths.


I wonder if building a plugin to help find this kind of thing would 
help.  Essentially what we want is a TYPE which we're considering 
converting to an auto_TYPE.  We want to see the declaration, possibly an 
allocation, and an explicit release.  All paths have to pass through an 
explicit release.  The object must not escape.  We can have a whitelist 
of routines where the object can be passed in as a parameter.
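
For reference, the hand-managed pattern such a checker would be looking
for, and the form the series converts it to, is roughly this (sketch
only, not code from the patch; SIZE stands in for whatever extent the
caller needs):

static void
old_style (unsigned int size)
{
  sbitmap visited = sbitmap_alloc (size);
  bitmap_clear (visited);
  /* ... use visited ...  */
  sbitmap_free (visited);
}

static void
new_style (unsigned int size)
{
  auto_sbitmap visited (size);  /* released automatically on scope exit */
  bitmap_clear (visited);
  /* ... use visited ...  */
}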


Given that kind of infrastructure we ought to be able to look at a type 
and say, yes, this seems to make sense to turn into an auto and here's 
all the places that need twiddling for that change.


That infrastructure could also do things like say "X escaped via call Y" 
or a path doesn't release which can further guide the process, or show 
leaks.



diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c
index 5dde66c..6e87a6f 100644
--- a/gcc/modulo-sched.c
+++ b/gcc/modulo-sched.c
So I got a bit lost tracking some of these through the call chains.  I 
could try again, taking notes along the way to make sure I properly 
verified the children don't make a copy of MUST_FOLLOW.  But I'm going to 
assume you did this correctly.



diff --git a/gcc/store-motion.c b/gcc/store-motion.c
index 6d7d37f..1d504e7 100644
--- a/gcc/store-motion.c
+++ b/gcc/store-motion.c
Ick.  Who wrote remove_reachable_equiv_notes?!?  OK, you don't need to 
answer that, I just needed to vent a little.

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index a5f3f71..6247a4c 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -2451,7 +2451,7 @@ compute_antic (void)

has_abnormal_preds seems ripe for a similar change in this function.


diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c
index 5d09879..9088978 100644
--- a/gcc/var-tracking.c
+++ b/gcc/var-tracking.c
@@ -6996,7 +6

[PATCH] Replacing gcc's dependence on libiberty's fnmatch to gnulib's fnmatch

2016-07-25 Thread ayush goel
On top of the previously filed patch for importing gnulib (the link
isn’t available on the archive yet, however this contains some of the
information: 
http://gcc.1065356.n5.nabble.com/Importing-gnulib-into-the-gcc-tree-td1275807.html#a1279573)
now I have replaced another function from libiberty with the
corresponding version from gnulib.
Note that on both OSX and GNU/Linux, fnmatch is already provided by the
system libc, so the copy in libiberty is not used on those systems.
However, since the objective is to replace whatever functions can be
leveraged from gnulib, these changes have been made.

Tested and bootstrapped on x86_64(appledarwin,linux).

Changelog:
2016-07-25 Ayush Goel 

*src/gcc/genattrtab.c: included fnmatch.h
*src/gcc/genautomata.c: included fnmatch.h
*src/gcc/Makefile.in: removed dependency from libiberty’s fnmatch
*src/gnulib/update-gnulib.sh: Added fnmatch to list of modules
imported from gnulib

Wiki link for more details: https://gcc.gnu.org/wiki/replacelibibertywithgnulib

-Ayush


importgnulib_fnmatch.patch
Description: Binary data


Re: [PATCH] Invalidate argument slots for const/pure calls

2016-07-25 Thread Jeff Law

On 07/23/2016 12:32 PM, John David Anglin wrote:

The attached patch fixes PR middle-end/71712.  We need to invalidate the 
argument slots in
const/pure calls as they are owned by the callee.  The caller can't assume they 
are preserved
across the call.  Fix is similar to that for PR71532.

Tested on hppa-unknown-linux-gnu.

Okay for trunk?

Dave
--
John David Anglin   dave.ang...@bell.net


cselib.c.d.txt


2016-08-23  John David Anglin  

PR middle-end/71732
* cselib.c (cselib_process_insn): Invalidate argument slots for
const/pure calls.

OK.
jeff



[PATCH] Replacing gcc's dependence on libiberty's md5 to gnulib's md5

2016-07-25 Thread ayush goel
On top of the previously filed patch for importing gnulib (the link
isn’t available on the archive yet, however this contains some of the
information: 
http://gcc.1065356.n5.nabble.com/Importing-gnulib-into-the-gcc-tree-td1275807.html#a1279573)
now I have replaced another function from libiberty with the
corresponding version from gnulib.

Tested and bootstrapped on x86_64(appledarwin,linux).

Changelog:
2016-07-25 Ayush Goel 

*src/gcc/Makefile.in: replaced header path for md5 from libiberty to
the one in gnulib
*src/gnulib/update-gnulib.sh: imported md5 from gnulib

Wiki link for more details: https://gcc.gnu.org/wiki/replacelibibertywithgnulib

-Ayush


importgnulib_md5.patch
Description: Binary data


Re: [C PATCH] Don't mark C99 inline functions as always_inline (PR c/71969)

2016-07-25 Thread Jeff Law

On 07/23/2016 10:38 AM, Richard Biener wrote:

On July 22, 2016 5:09:07 PM GMT+02:00, Jakub Jelinek  wrote:

Hi!

As Richard reported on IRC, we are marking C99 inline (without extern)
functions as effectively always_inline, even when the intent has been
to do
this only for the GNU extern inline functions (i.e. -fgnu89-inline
extern
inline or extern inline __attribute__((gnu_inline))).

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok
for
trunk?  The alias-11.c change is needed because the inliner decides not
to
inline new_cfi, and as it uses since -std=c11 default C99 inline
semantics
without extern, the body isn't emitted.


LGTM,
Just a note, Jakub is on PTO this week and next week -- he'll commit 
when he returns.  Or if this is time critical, anyone with commit 
privileges can commit on Jakub's behalf.


Jeff



Re: [PATCH] Replacing gcc's dependence on libiberty's fnmatch to gnulib's fnmatch

2016-07-25 Thread ayush goel
The link for that patch importing gnulib inside gcc’s tree:
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01302.html

Apologies for the confusion.

On 25 July 2016 at 10:48:20 PM, ayush goel (ayushgoel1...@gmail.com) wrote:
>
> On top of the previously filed patch for importing gnulib (the link isn’t 
> available on
> the archive yet, however this contains some of the information: 
> http://gcc.1065356.n5.nabble.com/Importing-gnulib-into-the-gcc-tree-td1275807.html#a1279573)
> now I have replaced another function from libiberty with the corresponding 
> version
> from gnulib.
> Even though in both OSX and GNU/Linux, fnmatch is provided by the GNU libc 
> already, so
> the copy in libiberty is not used in your systems. However since the 
> objective is to replace
> whatever functions can be leveraged by gnulib, these changes have been made.
>
> Tested and bootstrapped on x86_64(appledarwin,linux).
>
> Changelog:
> 2016-07-25 Ayush Goel
>
> *src/gcc/genattrtab.c: included
> *src/gcc/genautomata.c: included
> *src/gcc/Makefile.in: removed dependency from libiberty’s fnmatch
> *src/gnulib/update-gnulib.sh: Added fnmatch to list of modules imported from 
> gnulib
>
> Wiki link for more details: 
> https://gcc.gnu.org/wiki/replacelibibertywithgnulib
>
> -Ayush
>
>
>


Re: [PATCH] Replacing gcc's dependence on libiberty's md5 to gnulib's md5

2016-07-25 Thread ayush goel
Link for the patch importing gnulib inside gcc’s tree:
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01302.html

Apologies for the confusion

-Ayush

On 25 July 2016 at 10:52:00 PM, ayush goel (ayushgoel1...@gmail.com) wrote:
> On top of the previously filed patch for importing gnulib (the link isn’t 
> available on
> the archive yet, however this contains some of the information: 
> http://gcc.1065356.n5.nabble.com/Importing-gnulib-into-the-gcc-tree-td1275807.html#a1279573)
> now I have replaced another function from libiberty with the corresponding 
> version
> from gnulib.
>
> Tested and bootstrapped on x86_64(appledarwin,linux).
>
> Changelog:
> 2016-07-25 Ayush Goel
>
> *src/gcc/Makefile.in: replaced header path for md5 from libiberty to the one 
> in gnulib
> *src/gnulib/update-gnulib.sh: imported md5 from gnulib
>
> Wiki link for more details: 
> https://gcc.gnu.org/wiki/replacelibibertywithgnulib
>
> -Ayush
>
>
>


Re: [PATCH] predict.c: merge multi-edges

2016-07-25 Thread Jeff Law

On 07/22/2016 05:38 AM, Martin Liška wrote:

On 07/22/2016 12:53 PM, Segher Boessenkool wrote:

Hi Martin,

On Fri, Jul 22, 2016 at 10:17:51AM +0200, Martin Liška wrote:

 /* We can not predict the probabilities of outgoing edges of bb.  Set them
-   evenly and hope for the best.  */
+   evenly and hope for the best.  If UNLIKELY_EDGES is not null, distribute
+   evel probability for all edges not mentioned in the set.  These edges
+   are given PROB_VERY_UNLIKELY probability.  */


Typo ("evel");


Hello Segher.

The typo is fixed.




+  unsigned unlikely_count = unlikely_edges ? unlikely_edges->elements () : 0;
+
   FOR_EACH_EDGE (e, ei, bb->succs)
 if (!(e->flags & (EDGE_EH | EDGE_FAKE)))
   nedges ++;
+
+  unsigned c = nedges - unlikely_count;


What prevents c from becoming 0?  The sum of outgoing edge probabilities
will be very small then (unless there are very many such noreturn edges,
then the sum is too big, instead).


You are right; I'm sending a second version where I set even probabilities
in case all edges are PROB_VERY_UNLIKELY.  However, I was unable to come
up with a test-case for that:

int main(int argc, char **argv)
{
  switch (argc)
{
case 1:
  __builtin_unreachable();
case 4:
  __builtin_unreachable();
default:
  __builtin_unreachable();
}

  return 10;
}

No predictors are set as 'has_return_edges' == false in 
tree_bb_level_predictions.

Can you turn this into a test as well?

With that change this patch is OK for the trunk.

jeff



Re: [PATCH] correct atomic_compare_exchange_n return type (c++/71675)

2016-07-25 Thread Jeff Law

On 07/25/2016 05:03 AM, Renlin Li wrote:

Hi Martin,

I observed the following error:

ERROR: gcc.dg/atomic/pr71675.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects : syntax error in target selector "target c11" for
" dg-do 3 compile { target c11 } "

It seems we don't have a c11 effective target check available
in dejagnu target-supports.exp.
ISTM like we should either add a c11 effective target check, or force 
the compiler into c11 mode via -std=c11.


Jeff



Re: [PATCH, vec-tails 07/10] Support loop epilogue combining

2016-07-25 Thread Jeff Law

On 07/22/2016 05:36 AM, Richard Biener wrote:

The thing that needs work I think is re-running of if-conversion.
I wonder if we could revamp if-conversion to work on a subset of the 
CFG?   I can see that potentially being useful in other contexts.  Would 
that work for you Richi?


We've already got Bin doing that for DOM...



Also I don't like at
all that we have many variants of vectorizing but somehow the decision which one
to choose is rather unclear.  The way the epilogue vectorization code
is hooked in
is rather awkward and bound to be a maintenance burden (well, maybe a
small one).
I think it's going to be a small one.  I suspect that we really need 
another architecture with masking capabilities to really be able to see 
how the costing models ought to work and bring sanity to that decision.




And last, I doubt there is a case for a masked vectorized loop - I can bet that
doing a non-masked vectorized loop plus a masked epilogue (with no iteration
then!) will be always faster unless you hit the window of very few iterations
(or optimizing for size - in which case vectorizing is questionable on
its own and
disabled IIRC).
Ilya, does this case make a noticeable difference with the ICC 
implementation?
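
For readers following along, a self-contained sketch of the "non-masked
loop plus masked epilogue" shape being discussed; the function names are
made up, and the vector bodies are written as scalar loops purely to keep
the sketch compilable:

#define VF 4

static void body_full (float *a, const float *b)   /* unmasked VF-wide body */
{
  for (int k = 0; k < VF; k++)
    a[k] += b[k];
}

static void body_masked (float *a, const float *b, int live)   /* masked tail */
{
  for (int k = 0; k < live; k++)
    a[k] += b[k];
}

void add (float *a, const float *b, int n)
{
  int i = 0;
  for (; i + VF <= n; i += VF)
    body_full (a + i, b + i);
  if (i < n)            /* the epilogue is not a loop; it runs at most once */
    body_masked (a + i, b + i, n - i);
}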


Jeff



Re: [PATCH] correct atomic_compare_exchange_n return type (c++/71675)

2016-07-25 Thread Jason Merrill
On Mon, Jul 25, 2016 at 1:56 PM, Jeff Law  wrote:
> On 07/25/2016 05:03 AM, Renlin Li wrote:
>>
>> Hi Martin,
>>
>> I observed the following error:
>>
>> ERROR: gcc.dg/atomic/pr71675.c   -O2 -flto -fuse-linker-plugin
>> -fno-fat-lto-objects : syntax error in target selector "target c11" for
>> " dg-do 3 compile { target c11 } "
>>
>> It seems we don't have a c11 effective target check available
>> in dejagnu target-supports.exp.
>
> ISTM like we should either add a c11 effective target check, or force the
> compiler into c11 mode via -std=c11.

I think the C testsuite also doesn't run in multiple modes like C++,
so adding -std=c11 is probably the better choice.

Jason


Re: C++ PATCH for c++/71913 (copy elision choices)

2016-07-25 Thread Jason Merrill
On Mon, Jul 25, 2016 at 7:15 AM, Renlin Li  wrote:
> Hi Jason,
>
> On 22/07/16 04:01, Jason Merrill wrote:
>>
>> 71913 is a case where unsafe_copy_elision_p was being too
>> conservative. We can allow copy elision in a new expression; the only
>> way we could end up initializing a base subobject without knowing it
>> would be through a placement new, in which case we would already be
>> using the wrong (complete object) constructor, so copy elision doesn't
>> make it any worse.
>>
>
>> diff --git a/gcc/testsuite/g++.dg/init/elide5.C
>> b/gcc/testsuite/g++.dg/init/elide5.C
>> new file mode 100644
>> index 000..0a9978c
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/init/elide5.C
>> @@ -0,0 +1,27 @@
>> +// PR c++/71913
>> +// { dg-do link { target c++11 } }
>> +
>> +void* operator new(unsigned long, void* p) { return p; }
>
>
> g++.dg/init/elide5.C fails on target whose SIZE_TYPE is not "long unsigned
> int".
>
> testsuite/g++.dg/init/elide5.C:4:42: error: 'operator new' takes type
> 'size_t' ('unsigned int') as first parameter [-fpermissive]
>
> I have checked, for most 32 bit architectures or ABI, the SIZE_TYPE is
> "unsigned int". arm is one of them.
>
> To make this test case portable, will __SIZE_TYPE__ be better in this case,
> instead of "unsigned long" as first argument of new operator?

Thanks, I'll fix that shortly.

Jason


Re: GCC testsuite maintenance (was: [PATCH] Fix OpenACC vector_length parsing in fortran)

2016-07-25 Thread Mike Stump
On Jul 25, 2016, at 9:37 AM, Joseph Myers  wrote:
> 
> On Fri, 15 Jul 2016, Thomas Schwinge wrote:
> 
>>> No, we want to have as little churn as possible in existing tests, the
>>> general policy is to add new tests (not just for OpenACC/OpenMP, but for
>>> all functionality).
>> 
>> Hmm, that's something I had not been aware of, and I can't find this
>> covered in the documentation.  So, you're basically saying that files in
>> the testsuite are write-once, and should not be maintained aside from
>> fixing errors, adjusting due to optimization changes, or due to changed
>> diagnostics, and the like?  (Of course, I do agree that we shouldn't
> 
> Yes, that's my view.  It makes it easier to distinguish regressions from 
> new tests that fail (on some platforms) if what a test tests doesn't 
> change unnecessarily.

Right.  Roughly, the test suite is a monotonic, slow-moving, ever-increasing 
mass.  Generally we favor new files for new tests and little to no recoding or 
rearrangement of existing stuff.  We do some limited forms of maintenance, for 
example using intptr_t to make a less portable test case more portable, just 
to pick a random recent edit; but these tend to be very minor and very narrowly 
focused.

C++ PATCH for c++/71972 (ICE with constexpr array element self-modification)

2016-07-25 Thread Jason Merrill
Here, when we start to modify an array element, we add a
constructor_elt for it to the CONSTRUCTOR.  When calculating the value
to store there we found this empty element and tried to use it,
leading to an ICE because it was NULL.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit b472cebfc4ce1bc44d2d328f5f693322e04c905b
Author: Jason Merrill 
Date:   Mon Jul 25 09:54:04 2016 -0400

PR c++/71972 - constexpr array self-modification

* constexpr.c (cxx_eval_array_reference): Handle looking for the
value of an element we're currently modifying.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index f139260..47fb39b 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -2129,8 +2129,32 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree 
t,
   else
 found = (i < len);
 
-  if (!found)
+  if (found)
 {
+  tree r;
+  if (TREE_CODE (ary) == CONSTRUCTOR)
+   r = (*CONSTRUCTOR_ELTS (ary))[i].value;
+  else if (TREE_CODE (ary) == VECTOR_CST)
+   r = VECTOR_CST_ELT (ary, i);
+  else if (elem_nchars == 1)
+   r = build_int_cst (cv_unqualified (TREE_TYPE (TREE_TYPE (ary))),
+  TREE_STRING_POINTER (ary)[i]);
+  else
+   {
+ tree type = cv_unqualified (TREE_TYPE (TREE_TYPE (ary)));
+ r = native_interpret_expr (type, (const unsigned char *)
+TREE_STRING_POINTER (ary)
++ i * elem_nchars, elem_nchars);
+   }
+  if (r)
+   /* Don't VERIFY_CONSTANT here.  */
+   return r;
+
+  /* Otherwise the element doesn't have a value yet.  */
+}
+
+  /* Not found.  */
+
   if (TREE_CODE (ary) == CONSTRUCTOR
   && CONSTRUCTOR_NO_IMPLICIT_ZERO (ary))
 {
@@ -2150,23 +2174,6 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree 
t,
   overflow_p);
 }
 
-  if (TREE_CODE (ary) == CONSTRUCTOR)
-return (*CONSTRUCTOR_ELTS (ary))[i].value;
-  else if (TREE_CODE (ary) == VECTOR_CST)
-return VECTOR_CST_ELT (ary, i);
-  else if (elem_nchars == 1)
-return build_int_cst (cv_unqualified (TREE_TYPE (TREE_TYPE (ary))),
- TREE_STRING_POINTER (ary)[i]);
-  else
-{
-  tree type = cv_unqualified (TREE_TYPE (TREE_TYPE (ary)));
-  return native_interpret_expr (type, (const unsigned char *)
- TREE_STRING_POINTER (ary)
- + i * elem_nchars, elem_nchars);
-}
-  /* Don't VERIFY_CONSTANT here.  */
-}
-
 /* Subroutine of cxx_eval_constant_expression.
Attempt to reduce a field access of a value of class type.  */
 
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-array5.C 
b/gcc/testsuite/g++.dg/cpp1y/constexpr-array5.C
new file mode 100644
index 000..3abdd84
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-array5.C
@@ -0,0 +1,13 @@
+// PR c++/71972
+// { dg-do compile { target c++14 } }
+
+typedef int size_t;
+template  struct S {
+  template  constexpr S(const char (&)[M]) : data{} {
+data[0] = data[0];
+  }
+  char data[N];
+};
+int main() {
+  constexpr S<1> s1("");
+}


C++ PATCH for c++/65970 (constexpr infinite loop)

2016-07-25 Thread Jason Merrill
An infinite loop in a constexpr function led to a compiler hang.
Fixed by putting an upper bound on loop iterations in constexpr
evaluation.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 6c93c22a799e5ea5e1cdfe661476cf123ed1a4e8
Author: Jason Merrill 
Date:   Mon Jul 25 10:31:10 2016 -0400

PR c++/65970 - constexpr infinite loop

gcc/c-family/
* c.opt (fconstexpr-loop-limit): New.
gcc/cp/
* constexpr.c (cxx_eval_loop_expr): Count iterations.
* cp-gimplify.c (genericize_cp_loop): Use start_locus even for
infinite loops.

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 8c70152..a5358ed 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1174,6 +1174,10 @@ fconstexpr-depth=
 C++ ObjC++ Joined RejectNegative UInteger Var(max_constexpr_depth) Init(512)
 -fconstexpr-depth= Specify maximum constexpr recursion depth.
 
+fconstexpr-loop-limit=
+C++ ObjC++ Joined RejectNegative UInteger Var(constexpr_loop_limit) 
Init(262144)
+-fconstexpr-loop-limit=Specify maximum constexpr loop 
iteration count.
+
 fdebug-cpp
 C ObjC C++ ObjC++
 Emit debug annotations during preprocessing.
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 47fb39b..6bcb41a 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -3468,6 +3468,7 @@ cxx_eval_loop_expr (const constexpr_ctx *ctx, tree t,
   constexpr_ctx new_ctx = *ctx;
 
   tree body = TREE_OPERAND (t, 0);
+  int count = 0;
   do
 {
   hash_set save_exprs;
@@ -3480,6 +3481,16 @@ cxx_eval_loop_expr (const constexpr_ctx *ctx, tree t,
   for (hash_set::iterator iter = save_exprs.begin();
   iter != save_exprs.end(); ++iter)
new_ctx.values->remove (*iter);
+  if (++count >= constexpr_loop_limit)
+   {
+ if (!ctx->quiet)
+   error_at (EXPR_LOC_OR_LOC (t, input_location),
+ "constexpr loop iteration count exceeds limit of %d "
+ "(use -fconstexpr-loop-limit= to increase the limit)",
+ constexpr_loop_limit);
+ *non_constant_p = true;
+ break;
+   }
 }
   while (!returns (jump_target) && !breaks (jump_target) && !*non_constant_p);
 
diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 59953a6..d9f7cea 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -264,14 +264,7 @@ genericize_cp_loop (tree *stmt_p, location_t start_locus, 
tree cond, tree body,
loop = stmt_list;
 }
   else
-{
-  location_t loc = start_locus;
-  if (!cond || integer_nonzerop (cond))
-   loc = EXPR_LOCATION (expr_first (body));
-  if (loc == UNKNOWN_LOCATION)
-   loc = start_locus;
-  loop = build1_loc (loc, LOOP_EXPR, void_type_node, stmt_list);
-}
+loop = build1_loc (start_locus, LOOP_EXPR, void_type_node, stmt_list);
 
   stmt_list = NULL;
   append_to_statement_list (loop, &stmt_list);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9e0f07e..79c842d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -191,7 +191,8 @@ in the following sections.
 @xref{C++ Dialect Options,,Options Controlling C++ Dialect}.
 @gccoptlist{-fabi-version=@var{n}  -fno-access-control @gol
 -fargs-in-order=@var{n} -fcheck-new @gol
--fconstexpr-depth=@var{n}  -ffriend-injection @gol
+-fconstexpr-depth=@var{n} -fconstexpr-loop-limit=@var{n} @gol
+-ffriend-injection @gol
 -fno-elide-constructors @gol
 -fno-enforce-eh-specs @gol
 -ffor-scope  -fno-for-scope  -fno-gnu-keywords @gol
@@ -2265,6 +2266,12 @@ to @var{n}.  A limit is needed to detect endless 
recursion during
 constant expression evaluation.  The minimum specified by the standard
 is 512.
 
+@item -fconstexpr-loop-limit=@var{n}
+@opindex fconstexpr-loop-limit
+Set the maximum number of iterations for a loop in C++14 constexpr functions
+to @var{n}.  A limit is needed to detect infinite loops during
+constant expression evaluation.  The default is 262144 (1<<18).
+
 @item -fdeduce-init-list
 @opindex fdeduce-init-list
 Enable deduction of a template type parameter as
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-loop6.C 
b/gcc/testsuite/g++.dg/cpp1y/constexpr-loop6.C
new file mode 100644
index 000..e49e531
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-loop6.C
@@ -0,0 +1,11 @@
+// PR c++/65970
+// { dg-do compile { target c++14 } }
+// { dg-options -fconstexpr-loop-limit=5 }
+
+constexpr int foo() {
+  while (true) // { dg-error "-fconstexpr-loop-limit" }
+;
+  return 0;
+}
+
+constexpr int i = foo();   // { dg-message "" }


Re: [PATCH, vec-tails 07/10] Support loop epilogue combining

2016-07-25 Thread Richard Biener
On July 25, 2016 8:01:17 PM GMT+02:00, Jeff Law  wrote:
>On 07/22/2016 05:36 AM, Richard Biener wrote:
>> The thing that needs work I think is re-running of if-conversion.
>I wonder if we could revamp if-conversion to work on a subset of the 
>CFG?   I can see that potentially being useful in other contexts. 
>Would 
>that work for you Richi?

Well, you need to make it not need post-dominators or preserve them (or compute 
"post-dominators" on SESE regions).

What doesn't work with the idea to clone the epilogue using
__builtin_vectorized () for the if- vs. not-if-converted loop?

Richard.

>We've already got Bin doing that for DOM...
>
>
>> Also I don't like at
>> all that we have many variants of vectorizing but somehow the
>decision which one
>> to choose is rather unclear.  The way the epilogue vectorization code
>> is hooked in
>> is rather awkward and bound to be a maintainance burden (well, maybe
>a
>> small one).
>I think it's going to be a small one.  I suspect that we really need 
>another architecture with masking capabilities to really be able to see
>
>how the costing models ought to work and bring sanity to that decision.
>
>>
>> And last, I double there is a case for a masked vectorized loop - I
>can bet that
>> doing a non-masked vectorized loop plus a masked epilogue (with no
>iteration
>> then!) will be always faster unless you hit the window of very few
>iterations
>> (or optimizing for size - in which case vectorizing is questionable
>on
>> its own and
>> disabled IIRC).
>Ilya, does this case make a noticeable difference with the ICC 
>implementation?
>
>Jeff




Re: Init df for split pass since some target use REG_NOTE in split pattern

2016-07-25 Thread Richard Biener
On July 25, 2016 5:56:29 PM GMT+02:00, Jeff Law  wrote:
>On 07/25/2016 12:36 AM, Kito Cheng wrote:
>> Hi all:
>>
>> Some target(for example i386, sh and h8300) use find_regno_note in
>> split pattern but df infrastructure seem not initialize at split
>pass,
>> so it may got wrong note since it's out-of-date.
>>
>> ChangeLog
>> 2016-07-25  Kito Cheng 
>>
>> * gcc/recog.c (split_all_insns): Initialize df for split
>pass.
>> (split_all_insns_noflow): Likewise.
>>
>Patch was not included/attached to your message.  Please resend with
>the 
>patch and hopefully a testcase which exhibits the problem.

And maybe back ends shouldn't look at notes in their splitters?

Richard.

>jeff




C++ PATCH for c++/71833 (member template with two template parameter packs)

2016-07-25 Thread Jason Merrill
My patch for 54440 left coerce_template_parameter_pack mysteriously assuming
that the index of the first argument we want to pack into the trailing
pack somehow matched the index of the parameter pack in the parameter
list.  I don't know what I was thinking.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit f4823136e2e9d96fcf25dbb94fcf18c511533dc0
Author: Jason Merrill 
Date:   Mon Jul 25 13:06:33 2016 -0400

PR c++/71833 - member template with two parameter packs

* pt.c (coerce_template_parameter_pack): Fix logic for
pack index.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index a61f1c8..7dd6b25 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -7403,11 +7403,12 @@ coerce_template_parameter_pack (tree parms,
 
   /* Convert the remaining arguments, which will be a part of the
  parameter pack "parm".  */
+  int first_pack_arg = arg_idx;
   for (; arg_idx < nargs; ++arg_idx)
 {
   tree arg = TREE_VEC_ELT (inner_args, arg_idx);
   tree actual_parm = TREE_VALUE (parm);
-  int pack_idx = arg_idx - parm_idx;
+  int pack_idx = arg_idx - first_pack_arg;
 
   if (packed_parms)
 {
@@ -7436,12 +7437,12 @@ coerce_template_parameter_pack (tree parms,
   TREE_VEC_ELT (packed_args, pack_idx) = arg;
 }
 
-  if (arg_idx - parm_idx < TREE_VEC_LENGTH (packed_args)
+  if (arg_idx - first_pack_arg < TREE_VEC_LENGTH (packed_args)
   && TREE_VEC_LENGTH (packed_args) > 0)
 {
   if (complain & tf_error)
error ("wrong number of template arguments (%d, should be %d)",
-  arg_idx - parm_idx, TREE_VEC_LENGTH (packed_args));
+  arg_idx - first_pack_arg, TREE_VEC_LENGTH (packed_args));
   return error_mark_node;
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-nested1.C 
b/gcc/testsuite/g++.dg/cpp0x/variadic-nested1.C
new file mode 100644
index 000..abfb49a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic-nested1.C
@@ -0,0 +1,9 @@
+// PR c++/71833
+// { dg-do compile { target c++11 } }
+
+template < typename ... Ts > struct A 
+{
+  template < Ts ..., typename ... Us > struct B {};
+};
+
+A <>::B < int > e;


C++ PATCH for c++/71837 (pack expansion in init-capture)

2016-07-25 Thread Jason Merrill
This useless testcase uses a pack expansion as the parenthesized
initializer for an init-capture, which is only valid when the
parameter pack has a single argument.  This patch avoids an ICE by
leaving the TREE_LIST around the pack expansion and dealing with it at
substitution time.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 6929f98b018d844d310ec65d07f5f8ce73f98292
Author: Jason Merrill 
Date:   Mon Jul 25 12:13:27 2016 -0400

PR c++/71837 - pack expansion in init-capture

* lambda.c (add_capture): Leave a pack expansion in a TREE_LIST.
(build_lambda_object): Call build_x_compound_expr_from_list.
* pt.c (tsubst) [DECLTYPE_TYPE]: Likewise.

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 4d6d80fe..9bf8a56 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -79,6 +79,10 @@ build_lambda_object (tree lambda_expr)
  goto out;
}
 
+  if (TREE_CODE (val) == TREE_LIST)
+   val = build_x_compound_expr_from_list (val, ELK_INIT,
+  tf_warning_or_error);
+
   if (DECL_P (val))
mark_used (val);
 
@@ -449,7 +453,9 @@ add_capture (tree lambda, tree id, tree orig_init, bool 
by_reference_p,
   variadic = true;
 }
 
-  if (TREE_CODE (initializer) == TREE_LIST)
+  if (TREE_CODE (initializer) == TREE_LIST
+  /* A pack expansion might end up with multiple elements.  */
+  && !PACK_EXPANSION_P (TREE_VALUE (initializer)))
 initializer = build_x_compound_expr_from_list (initializer, ELK_INIT,
   tf_warning_or_error);
   type = TREE_TYPE (initializer);
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index a61f1c8..30c74ae 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13619,6 +13619,18 @@ tsubst (tree t, tree args, tsubst_flags_t complain, 
tree in_decl)
  /*function_p*/false,
  /*integral_constant_expression*/false);
 
+   if (DECLTYPE_FOR_INIT_CAPTURE (t))
+ {
+   if (type == NULL_TREE)
+ {
+   if (complain & tf_error)
+ error ("empty initializer in lambda init-capture");
+   type = error_mark_node;
+ }
+   else if (TREE_CODE (type) == TREE_LIST)
+ type = build_x_compound_expr_from_list (type, ELK_INIT, complain);
+ }
+
--cp_unevaluated_operand;
--c_inhibit_evaluation_warnings;
 
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-init15.C 
b/gcc/testsuite/g++.dg/cpp1y/lambda-init15.C
new file mode 100644
index 000..8b3e520
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-init15.C
@@ -0,0 +1,13 @@
+// PR c++/71837
+// { dg-do compile { target c++14 } }
+
+template < typename ... Ts > void f (Ts ... args)
+{
+  [ts (args ...)] { return ts; } ();
+}
+
+int main ()
+{ 
+  f (0);
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-init15a.C 
b/gcc/testsuite/g++.dg/cpp1y/lambda-init15a.C
new file mode 100644
index 000..166d650
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-init15a.C
@@ -0,0 +1,14 @@
+// PR c++/71837
+// { dg-do compile { target c++14 } }
+
+template < typename ... Ts > void f (Ts ... args)
+{
+  [ts (args ...)] { return ts; } (); // { dg-error "" }
+}
+
+int main ()
+{ 
+  f ();// { dg-message "required" }
+  f (1,2); // { dg-message "required" }
+  return 0;
+}


[committed] Fix selftest::temp_source_file ctor

2016-07-25 Thread David Malcolm
Temporary source files created by the selftests weren't generated
correctly if they contained a "%" character (seen whilst writing
selftests for the locations-within-string-literals patch I'm
working on).

Root cause was a buggy fprintf in the selftest::temp_source_file ctor.
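
A minimal illustration of the failure mode (not the selftest itself):

#include <stdio.h>

int main (void)
{
  const char *content = "int i = 100 % 7;\n";
  /* Buggy: '%' in CONTENT is interpreted as a conversion specifier.  */
  /* fprintf (stdout, content); */
  /* Fixed: the string is written out verbatim.  */
  fprintf (stdout, "%s", content);
  return 0;
}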

Fixed thusly.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu;
committed to trunk (as r238732), under the "obvious" rule.

gcc/ChangeLog:
* input.c (selftest::temp_source_file::temp_source_file): Fix
missing "%s" in fprintf.
---
 gcc/input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/input.c b/gcc/input.c
index a916597..47845d00 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -1175,7 +1175,7 @@ temp_source_file::temp_source_file (const location &loc, 
const char *suffix,
   if (!out)
 ::selftest::fail_formatted (loc, "unable to open tempfile: %s",
m_filename);
-  fprintf (out, content);
+  fprintf (out, "%s", content);
   fclose (out);
 }
 
-- 
1.8.5.3



Re: Init df for split pass since some target use REG_NOTE in split pattern

2016-07-25 Thread Jeff Law

On 07/25/2016 12:35 PM, Richard Biener wrote:

On July 25, 2016 5:56:29 PM GMT+02:00, Jeff Law  wrote:

On 07/25/2016 12:36 AM, Kito Cheng wrote:

Hi all:

Some target(for example i386, sh and h8300) use find_regno_note in
split pattern but df infrastructure seem not initialize at split

pass,

so it may got wrong note since it's out-of-date.

ChangeLog
2016-07-25  Kito Cheng 

* gcc/recog.c (split_all_insns): Initialize df for split

pass.

(split_all_insns_noflow): Likewise.


Patch was not included/attached to your message.  Please resend with
the
patch and hopefully a testcase which exhibits the problem.


And maybe back ends shouldn't look at notes in their splitters?
It's certainly not ideal.  But I'm not sure what alternatives we have, 
and it's been supported for a very long time.


For example on the H8, there is no shift-by-a-variable-amount.  Instead 
you have to emit it as a loop.  However, we don't expose the loop until 
late in the RTL pipeline (via a splitter).  When we split a variable 
shift into a loop, one of the things we want to know is whether or not 
the register holding the shift count dies.  If it does not die, 
we'll just use it rather than copying it elsewhere.
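
A hand-written sketch (not the actual h8300 backend code) of the check
being described -- and exactly the kind of query that returns stale
answers when the notes are out of date:

static bool
shift_count_dies_p (rtx_insn *insn, rtx count_reg)
{
  /* Does the register holding the shift count die in INSN?  If so, the
     splitter can reuse it as the loop counter instead of copying it.  */
  return REG_P (count_reg)
         && find_regno_note (insn, REG_DEAD, REGNO (count_reg)) != NULL_RTX;
}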


Similarly we can generate a more efficient split for something like 
(ior (ashift (X) (const_int)) for certain integers when we know the 
input dies.   Or for some (add (mult)) expressions.


Jeff




[PATCH] Remove dead call to gimple_bb in slsr_process_phi

2016-07-25 Thread Bill Schmidt
Hi, 

As reported on the gcc mailing list, slsr_process_phi contains a dead call
to gimple_bb.  I looked into why this wasn't noticed before, and concluded
that we don't actually need the call.  To reach this point, the phi argument
must not correspond to a strength-reduction candidate in the table, and must
not be a function argument.  Since even simple copies and casts create entries
in the candidate table, the definition of the phi arg has a non-trivial RHS
that is not of a form useful to strength reduction, and thus there is no
reason to proceed; the phi is itself therefore not a strength reduction
candidate.

The existing logic leaves arg_bb with its initial NULL value, which correctly
terminates processing for this phi.  Thus I am just changing the logic to
make this explicit, and we don't need the call to gimple_bb at all.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
Is this ok for trunk?  (The error is harmless, so I see no reason for a
backport.)

Thanks!
Bill


2016-07-25  Bill Schmidt  

* gimple-ssa-strength-reduction.c (slsr_process_phi): Remove dead
and unnecessary call to gimple_bb.


Index: gcc/gimple-ssa-strength-reduction.c
===
--- gcc/gimple-ssa-strength-reduction.c (revision 238719)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -785,14 +785,10 @@ slsr_process_phi (gphi *phi, bool speed)
savings += stmt_cost (arg_stmt, speed);
}
}
-  else
+  else if (SSA_NAME_IS_DEFAULT_DEF (arg))
{
  derived_base_name = arg;
-
- if (SSA_NAME_IS_DEFAULT_DEF (arg))
-   arg_bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
- else
-   gimple_bb (SSA_NAME_DEF_STMT (arg));
+ arg_bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
}
 
   if (!arg_bb || arg_bb->loop_father != cand_loop)



[PATCH, rs6000] Correct effective target in gcc.target/powerpc/pr63354.c

2016-07-25 Thread Bill Schmidt
Hi,

The subject test case uses the -mprofile-kernel option, which is not
permitted in 32-bit mode.  It currently restricts the effective target
to powerpc64, which is not sufficient when using -m32 on 64-bit hardware.
This patch changes the effective target restriction to lp64 to correct
this.

Tested on powerpc64-unknown-linux-gnu, verified.  Ok for trunk?

Thanks,
Bill


2016-07-25  Bill Schmidt  

* gcc.target/powerpc/pr63354.c: Require lp64 since
-mprofile-kernel is not legal with -m32.


Index: gcc/testsuite/gcc.target/powerpc/pr63354.c
===
--- gcc/testsuite/gcc.target/powerpc/pr63354.c  (revision 236120)
+++ gcc/testsuite/gcc.target/powerpc/pr63354.c  (working copy)
@@ -3,7 +3,7 @@
 
 /* { dg-do compile { target { powerpc*-*-linux* } } } */
 /* { dg-options "-O2 -pg -mprofile-kernel" } */
-/* { dg-require-effective-target powerpc64 } */
+/* { dg-require-effective-target lp64 } */
 /* { dg-final { scan-assembler-not "mtlr" } } */
 
 int foo(void)



Re: Gimple loop splitting v2

2016-07-25 Thread Andrew Pinski
On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz  wrote:
> Hi,
>
> On Tue, 1 Dec 2015, Jeff Law wrote:
>
>> > So, okay for trunk?
>> -ENOPATCH
>
> Sigh :)
> Here it is.


I found one problem with it.
Take:
void f(int *a, int M, int *b)
{
  for(int i = 0; i <= M; i++)
{
   if (i < M)
a[i] = i;
}
}
 CUT ---
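
(For reference, an illustrative, hand-simplified sketch of what the
splitting pass makes of f() above; not actual pass output:)

void f_split (int *a, int M, int *b)
{
  int i = 0;
  for (; i < M; i++)    /* the condition i < M is known true here */
    a[i] = i;
  for (; i <= M; i++)   /* i < M is known false; the body is empty */
    ;
}
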
There are two issues, which the diff below addresses.  The outermost loop's
aux is still set, which causes the vectorizer not to vectorize the loop.
The other issue is that I need to run pass_scev_cprop after pass_loop_split
so that the induction-variable use after the loop goes away and the
vectorizer will work.

Something like (note this is copy and paste from a terminal):
diff --git a/gcc/passes.def b/gcc/passes.def
index c327900..e8d6ea6 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -262,8 +262,8 @@ along with GCC; see the file COPYING3.  If not see
  NEXT_PASS (pass_copy_prop);
  NEXT_PASS (pass_dce);
  NEXT_PASS (pass_tree_unswitch);
- NEXT_PASS (pass_scev_cprop);
  NEXT_PASS (pass_loop_split);
+ NEXT_PASS (pass_scev_cprop);
  NEXT_PASS (pass_record_bounds);
  NEXT_PASS (pass_loop_distribution);
  NEXT_PASS (pass_copy_prop);
diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 5411530..e72ef19 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -592,7 +592,11 @@ tree_ssa_split_loops (void)

   gcc_assert (scev_initialized_p ());
   FOR_EACH_LOOP (loop, 0)
-loop->aux = NULL;
+{
+  loop->aux = NULL;
+  if (loop_outer (loop))
+   loop_outer (loop)->aux = NULL;
+}

   /* Go through all loops starting from innermost.  */
   FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
@@ -631,7 +635,11 @@ tree_ssa_split_loops (void)
 }

   FOR_EACH_LOOP (loop, 0)
-loop->aux = NULL;
+{
+  loop->aux = NULL;
+  if (loop_outer (loop))
+   loop_outer (loop)->aux = NULL;
+}

   if (changed)
 return TODO_cleanup_cfg;
-  CUT -

Thanks,
Andrew


>
>
> Ciao,
> Michael.
> * common.opt (-fsplit-loops): New flag.
> * passes.def (pass_loop_split): Add.
> * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
> (enable_fdo_optimizations): Add loop splitting.
> * timevar.def (TV_LOOP_SPLIT): Add.
> * tree-pass.h (make_pass_loop_split): Declare.
> * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
> * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
> * tree-ssa-loop-split.c: New file.
> * Makefile.in (OBJS): Add tree-ssa-loop-split.o.
> * doc/invoke.texi (fsplit-loops): Document.
> * doc/passes.texi (Loop optimization): Add paragraph about loop
> splitting.
>
> testsuite/
> * gcc.dg/loop-split.c: New test.
>
> Index: common.opt
> ===
> --- common.opt  (revision 231115)
> +++ common.opt  (working copy)
> @@ -2453,6 +2457,10 @@ funswitch-loops
>  Common Report Var(flag_unswitch_loops) Optimization
>  Perform loop unswitching.
>
> +fsplit-loops
> +Common Report Var(flag_split_loops) Optimization
> +Perform loop splitting.
> +
>  funwind-tables
>  Common Report Var(flag_unwind_tables) Optimization
>  Just generate unwind tables for exception handling.
> Index: passes.def
> ===
> --- passes.def  (revision 231115)
> +++ passes.def  (working copy)
> @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
>   NEXT_PASS (pass_dce);
>   NEXT_PASS (pass_tree_unswitch);
>   NEXT_PASS (pass_scev_cprop);
> + NEXT_PASS (pass_loop_split);
>   NEXT_PASS (pass_record_bounds);
>   NEXT_PASS (pass_loop_distribution);
>   NEXT_PASS (pass_copy_prop);
> Index: opts.c
> ===
> --- opts.c  (revision 231115)
> +++ opts.c  (working copy)
> @@ -532,6 +532,7 @@ static const struct default_options defa
> regardless of them being declared inline.  */
>  { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
>  { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 
> 1 },
> +{ OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
>  { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
>  { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
>  { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
> @@ -1411,6 +1412,8 @@ enable_fdo_optimizations (struct gcc_opt
>  opts->x_flag_ipa_cp_alignment = value;
>if (!opts_set->x_flag_predictive_commoning)
>  opts->x_flag_predictive_commoning = value;
> +  if (!opts_set->x_flag_split_loops)
> +opts->x_flag_split_loops = value;
>if (!opts_set->x_flag_unswitch_loops)
>  opts->x_flag_unswitch_loops = value;
>if (!opts_set->x_flag_gcse_after_reload)
> Index: timevar.def
> =

Re: [PATCH, vec-tails 07/10] Support loop epilogue combining

2016-07-25 Thread Jeff Law

On 07/25/2016 12:32 PM, Richard Biener wrote:

On July 25, 2016 8:01:17 PM GMT+02:00, Jeff Law  wrote:

On 07/22/2016 05:36 AM, Richard Biener wrote:

The thing that needs work I think is re-running of if-conversion.

I wonder if we could revamp if-conversion to work on a subset of the
CFG?   I can see that potentially being useful in other contexts.
Would
that work for you Richi?


Well, you need to make it not need post-dominators or preserve them (or compute 
"post-dominators" on SESE regions).
Oh, but it'd be so nice to have DOMs and/or PDOMs on regions.  But 
that's probably out of scope for gcc-7.





What doesn't work with the idea to clone the epilogue using
__builtin_vectorized () for the if- vs. not-if-converted loop?
I must be missing something.   I don't see how 
builtin_vectorized_function helps, but maybe I've got the wrong built-in 
or don't understand what you're suggesting.


It sounds like this is the biggest impediment to moving forward.  So 
let's reset and make sure we're all on the same page here.


Ilya, what's the fundamental reason why we need to run if-conversion 
again?  Yes, I know you want to if-convert the epilogue, but why?


What are the consequences of not doing if-conversion on the epilogue? 
Presumably we miss a vectorization opportunity on the tail.  But that 
may be a reasonable limitation to allow the existing work to move 
forward while you go back and revamp things a little.


Jeff


  1   2   >