Re: [RFC][IPA-VRP] Disable setting param of __builtin_constant_p to null
Hi, diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index ecfab1f..23c12b5 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -3759,8 +3759,10 @@ extract_range_basic (value_range *vr, gimple *stmt) && SSA_NAME_IS_DEFAULT_DEF (arg) && TREE_CODE (SSA_NAME_VAR (arg)) == PARM_DECL) { +#if 0 set_value_range_to_null (vr, type); return; +#endif It is not cleanest either, but better to test cfun->after_inlining Thanks. Here is the patch which does this. Bootstrapped and regression tested with the rest of the patches in the series. Is this OK for trunk? Thanks, Kugan gcc/ChangeLog: 2016-07-25 Kugan Vivekanandarajah * tree-vrp.c (extract_range_basic): Check cfun->after_inlining before folding call to __builtin_constant_p with parameters to false. >From 4805ea975de0fd3b183b27324df1caa7ff29f887 Mon Sep 17 00:00:00 2001 From: Kugan Vivekanandarajah Date: Sat, 25 Jun 2016 11:52:57 +1000 Subject: [PATCH 2/7] Prevent setting __builtin_constant_p of param to null before inlining in Early VRP --- gcc/tree-vrp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index bdfc1b6..edaacf2 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -3763,7 +3763,8 @@ extract_range_basic (value_range *vr, gimple *stmt) arg = gimple_call_arg (stmt, 0); if (TREE_CODE (arg) == SSA_NAME && SSA_NAME_IS_DEFAULT_DEF (arg) - && TREE_CODE (SSA_NAME_VAR (arg)) == PARM_DECL) + && TREE_CODE (SSA_NAME_VAR (arg)) == PARM_DECL + && cfun->after_inlining) { set_value_range_to_null (vr, type); return; -- 1.9.1
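For context, a minimal hypothetical example (not taken from the patch or the PR) of why this fold has to wait for cfun->after_inlining: before inlining, the parameter of bar is not a constant, but once bar is inlined into foo the argument is the literal 16 and __builtin_constant_p should evaluate to 1, selecting the compile-time path.

    static inline int
    bar (int x)
    {
      if (__builtin_constant_p (x))
        return x * 2;    /* compile-time path, only reachable after inlining */
      return x << 1;     /* run-time path */
    }

    int
    foo (void)
    {
      return bar (16);   /* after inlining, __builtin_constant_p (16) is 1 */
    }

Folding the builtin to false while the argument is still a default-definition PARM_DECL would lose that information for good.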
Re: Gimple loop splitting
On Thu, Nov 12, 2015 at 8:52 AM, Michael Matz wrote: > Hello, > > this new pass implements loop iteration space splitting for loops that > contain a conditional that's always true for one part of the iteration > space and false for the other, i.e. such situations: > > for (i = beg; i < end; i++) > if (i < p) > dothis(); > else > dothat(); > > this is transformed into roughly: > > for (i = beg; i < p; i++) > dothis(); > for (; i < end; i++) > dothat(); > > Of course, not quite the above as there needs to be provisions for the > border conditions, if e.g. 'p' is outside the original iteration space, or > the conditional doesn't directly use the control IV, but some other, or > the IV runs backwards. The testcase checks many of these border > conditions. > > This transformation is in itself a good one but can also be an enabler for > the vectorizer. It does increase code size, when the loop body contains > also unconditional code (that one is duplicated), so we only transform hot > loops. I'm a bit unsure of the placement of the new pass, or if it should > be an own pass at all. Right now I've placed it after unswitching and > scev_cprop, before loop distribution. Ideally I think all three, together > with loop fusion and an gimple unroller should be integrated into one loop > nest optimizer, alas, we aren't there yet. > > I'm planning to work on loop fusion in the future as well, but that's not > for GCC 6. > > I've regstrapped this pass enabled with -O2 on x86-64-linux, without > regressions. I've also checked cpu2006 (the non-fortran part) for > correctness, not yet for performance. In the end it should probably only > be enabled for -O3+ (although if the whole loop body is conditional it > makes sense to also have it with -O2 because code growth is very small > then). > > So, okay for trunk? What ever happened to this patch? I was looking into doing this myself today but I found this patch. It is stage 1 of GCC 7, it might be a good idea to get this patch into GCC. Thanks, Andrew > > > Ciao, > Michael. > * passes.def (pass_loop_split): Add. > * timevar.def (TV_LOOP_SPLIT): Add. > * tree-pass.h (make_pass_loop_split): Declare. > * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare. > * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h, > cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h, > gimple-pretty-print.h, gimple-fold.h, gimplify-me.h. > (split_at_bb_p, patch_loop_exit, find_or_create_guard_phi, > split_loop, tree_ssa_split_loops, > make_pass_loop_split): New functions. > (pass_data_loop_split): New. > (pass_loop_split): New. > > testsuite/ > * gcc.dg/loop-split.c: New test. > > Index: passes.def > === > --- passes.def (revision 229763) > +++ passes.def (working copy) > @@ -233,6 +233,7 @@ along with GCC; see the file COPYING3. 
> NEXT_PASS (pass_dce); > NEXT_PASS (pass_tree_unswitch); > NEXT_PASS (pass_scev_cprop); > + NEXT_PASS (pass_loop_split); > NEXT_PASS (pass_record_bounds); > NEXT_PASS (pass_loop_distribution); > NEXT_PASS (pass_copy_prop); > Index: timevar.def > === > --- timevar.def (revision 229763) > +++ timevar.def (working copy) > @@ -179,6 +179,7 @@ DEFTIMEVAR (TV_LIM , " > DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv") > DEFTIMEVAR (TV_SCEV_CONST, "scev constant prop") > DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH, "tree loop unswitching") > +DEFTIMEVAR (TV_LOOP_SPLIT, "loop splitting") > DEFTIMEVAR (TV_COMPLETE_UNROLL , "complete unrolling") > DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops") > DEFTIMEVAR (TV_TREE_VECTORIZATION, "tree vectorization") > Index: tree-pass.h > === > --- tree-pass.h (revision 229763) > +++ tree-pass.h (working copy) > @@ -366,6 +366,7 @@ extern gimple_opt_pass *make_pass_tree_n > extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt); > +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt); > Index: tree-ssa-loop-manip.h > === > --- tree-ssa-loop-manip.h (revision 229763) > +++ tree-ssa-loop-manip.h (working copy) > @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc > > extern void create_iv (tr
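To make the border-condition handling discussed above concrete, here is a rough sketch (illustrative only, not the pass's actual output) of the split form with the split point clamped, so that a P outside the original iteration space leaves the trip counts unchanged:

    extern void dothis (void);
    extern void dothat (void);

    void
    split_form (int beg, int end, int p)
    {
      int split = p < end ? p : end;   /* clamp the split point to the range */
      int i;
      for (i = beg; i < split; i++)
        dothis ();
      for (; i < end; i++)
        dothat ();
    }

If p >= end the second loop runs zero times, and if p <= beg the first one does, matching the behaviour of the original guarded loop in both corner cases.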
Re: [PATCH] Add a STATIC_ASSERT on sizeof (struct cp_token)
On Fri, Jul 22, 2016 at 4:11 PM, Jakub Jelinek wrote: > On Fri, Jul 22, 2016 at 10:33:50AM -0400, David Malcolm wrote: >> gcc/cp/ChangeLog: >> * parser.h (struct cp_token): Add a STATIC_ASSERT on the >> size of the struct. >> --- >> gcc/cp/parser.h | 9 + >> 1 file changed, 9 insertions(+) >> >> diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h >> index ccbace9..8c1de57 100644 >> --- a/gcc/cp/parser.h >> +++ b/gcc/cp/parser.h >> @@ -71,6 +71,15 @@ struct GTY (()) cp_token { >> "|| (%1.type == CPP_DECLTYPE)"))) u; >> }; >> >> +/* The C++ frontend lexes everything first, and keeps the tokens >> + in memory, so there are possibly millions of tokens in memory. >> + Ensure that we don't accidentally grow the structure. */ >> +STATIC_ASSERT (sizeof (cp_token) == >> +(2 // "type" and "keyword" >> + + 1 // "flags" >> + + 1 // bitfields >> + + 4 // location_t >> + + sizeof (void *))); // union > > Doesn't that assume way too much on the host data layout? > This can be compiled with weirdo system compilers or on weirdo hosts, > I bet we don't really support non-8-bit char hosts, but still this might > break somewhere... > > The formatting is wrong BTW, == shouldn't appear at the end of line. Maybe restrict it to __GNUC__ and __x86_64__ or so. Richard. > Jakub
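A sketch of the restriction Richard suggests (illustrative only, reusing the size expression from the patch), so the assert is only active on hosts where the packed layout is known to hold:

    #if defined(__GNUC__) && defined(__x86_64__)
    STATIC_ASSERT (sizeof (cp_token)
                   == (2 + 1 + 1 + 4 + sizeof (void *)));
    #endif

On other host compilers the check simply compiles away, so unusual struct layouts cannot break the bootstrap.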
Re: [PATCH] Minor changes in tree-vrp.c
On Sat, Jul 23, 2016 at 2:40 PM, Patrick Palka wrote: > 1. When dumping assert details, print loc->expr instead of the bare SSA > name. loc->expr is not always equal to the SSA name. For example we > sometimes insert an ASSERT_EXPR like > > x_7 = ASSERT_EXPR 8>; > > The diff of the new dump output looks like: > > Assertions to be inserted for x_4(D) > if (_4 <= 8) > > BB #3 > EDGE 2->3 2 [39.0%] (FALSE_VALUE,EXECUTABLE) > - PREDICATE: x_4(D) gt_expr 8 > + PREDICATE: (unsigned int) x_4(D) + 4294967295 gt_expr 8 > > 2. In extract_code_and_val_from_cond_with_ops verify that name is equal to > either cond_op0 or cond_op1. If name is not equal to one of these > operands then its caller register_edge_assert_for will malfunction and > the wrong assertion will get inserted. > > Is this OK to commit after bootstrap + regtesting? Ok. Richard. > gcc/ChangeLog: > > * tree-vrp.c (dump_asserts_for): Print loc->expr instead of > name. > (extract_code_and_val_from_cond_with_ops): Verify that name is > either cond_op0 or cond_op1. > --- > gcc/tree-vrp.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c > index a3068ec..5072370 100644 > --- a/gcc/tree-vrp.c > +++ b/gcc/tree-vrp.c > @@ -4827,7 +4827,7 @@ dump_asserts_for (FILE *file, tree name) > dump_edge_info (file, loc->e, dump_flags, 0); > } >fprintf (file, "\n\tPREDICATE: "); > - print_generic_expr (file, name, 0); > + print_generic_expr (file, loc->expr, 0); >fprintf (file, " %s ", get_tree_code_name (loc->comp_code)); >print_generic_expr (file, loc->val, 0); >fprintf (file, "\n\n"); > @@ -5009,13 +5009,15 @@ extract_code_and_val_from_cond_with_ops (tree name, > enum tree_code cond_code, >comp_code = swap_tree_comparison (cond_code); >val = cond_op0; > } > - else > + else if (name == cond_op0) > { >/* The comparison is of the form NAME COMP VAL, so the > comparison code remains unchanged. */ >comp_code = cond_code; >val = cond_op1; > } > + else > +gcc_unreachable (); > >/* Invert the comparison code as necessary. */ >if (invert) > -- > 2.9.2.413.g76d2a70 >
Re: [PATCH 5/6] add a constructor to elim_graph
On Sun, Jul 24, 2016 at 1:44 PM, wrote: > From: Trevor Saunders > > gcc/ChangeLog: Ok. Richard. > 2016-07-24 Trevor Saunders > > * tree-outof-ssa.c (struct elim_graph): Change type of members > to auto_vec and auto_sbitmap. > (elim_graph::elim_graph): New constructor. > (delete_elim_graph): Remove. > (expand_phi_nodes): Adjust. > --- > gcc/tree-outof-ssa.c | 64 > +--- > 1 file changed, 16 insertions(+), 48 deletions(-) > > diff --git a/gcc/tree-outof-ssa.c b/gcc/tree-outof-ssa.c > index 5047788..be57ce4 100644 > --- a/gcc/tree-outof-ssa.c > +++ b/gcc/tree-outof-ssa.c > @@ -128,23 +128,25 @@ ssa_is_replaceable_p (gimple *stmt) > > struct elim_graph > { > + elim_graph (var_map map); > + >/* Size of the elimination vectors. */ >int size; > >/* List of nodes in the elimination graph. */ > - vec nodes; > + auto_vec nodes; > >/* The predecessor and successor edge list. */ > - vec edge_list; > + auto_vec edge_list; > >/* Source locus on each edge */ > - vec edge_locus; > + auto_vec edge_locus; > >/* Visited vector. */ > - sbitmap visited; > + auto_sbitmap visited; > >/* Stack for visited nodes. */ > - vec stack; > + auto_vec stack; > >/* The variable partition map. */ >var_map map; > @@ -153,11 +155,11 @@ struct elim_graph >edge e; > >/* List of constant copies to emit. These are pushed on in pairs. */ > - vec const_dests; > - vec const_copies; > + auto_vec const_dests; > + auto_vec const_copies; > >/* Source locations for any constant copies. */ > - vec copy_locus; > + auto_vec copy_locus; > }; > > > @@ -392,25 +394,12 @@ insert_part_to_rtx_on_edge (edge e, rtx dest, int src, > source_location locus) > } > > > -/* Create an elimination graph with SIZE nodes and associated data > - structures. */ > +/* Create an elimination graph for map. */ > > -static elim_graph * > -new_elim_graph (int size) > +elim_graph::elim_graph (var_map map) : > + nodes (30), edge_list (20), edge_locus (10), visited (map->num_partitions), > + stack (30), map (map), const_dests (20), const_copies (20), copy_locus (10) > { > - elim_graph *g = (elim_graph *) xmalloc (sizeof (struct elim_graph)); > - > - g->nodes.create (30); > - g->const_dests.create (20); > - g->const_copies.create (20); > - g->copy_locus.create (10); > - g->edge_list.create (20); > - g->edge_locus.create (10); > - g->stack.create (30); > - > - g->visited = sbitmap_alloc (size); > - > - return g; > } > > > @@ -425,24 +414,6 @@ clear_elim_graph (elim_graph *g) > } > > > -/* Delete elimination graph G. */ > - > -static inline void > -delete_elim_graph (elim_graph *g) > -{ > - sbitmap_free (g->visited); > - g->stack.release (); > - g->edge_list.release (); > - g->const_copies.release (); > - g->const_dests.release (); > - g->nodes.release (); > - g->copy_locus.release (); > - g->edge_locus.release (); > - > - free (g); > -} > - > - > /* Return the number of nodes in graph G. */ > > static inline int > @@ -925,8 +896,7 @@ void > expand_phi_nodes (struct ssaexpand *sa) > { >basic_block bb; > - elim_graph *g = new_elim_graph (sa->map->num_partitions); > - g->map = sa->map; > + elim_graph g (sa->map); > >FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb, > EXIT_BLOCK_PTR_FOR_FN (cfun), next_bb) > @@ -935,7 +905,7 @@ expand_phi_nodes (struct ssaexpand *sa) > edge e; > edge_iterator ei; > FOR_EACH_EDGE (e, ei, bb->preds) > - eliminate_phi (e, g); > + eliminate_phi (e, &g); > set_phi_nodes (bb, NULL); > /* We can't redirect EH edges in RTL land, so we need to do this >here. 
Redirection happens only when splitting is necessary, > @@ -961,8 +931,6 @@ expand_phi_nodes (struct ssaexpand *sa) > ei_next (&ei); > } >} > - > - delete_elim_graph (g); > } > > > -- > 2.9.0 >
Re: [PATCH 4/6] remove elim_graph typedef
On Sun, Jul 24, 2016 at 1:44 PM, wrote: > From: Trevor Saunders > > gcc/ChangeLog: Ok. Richard. > 2016-07-24 Trevor Saunders > > * tree-outof-ssa.c (struct elim_graph): Remove typedef. > (new_elim_graph): Adjust. > (clear_elim_graph): Likewise. > (delete_elim_graph): Likewise. > (elim_graph_size): Likewise. > (elim_graph_add_node): Likewise. > (elim_graph_add_edge): Likewise. > (elim_graph_remove_succ_edge): Likewise. > (eliminate_name): Likewise. > (eliminate_build): Likewise. > (elim_forward): Likewise. > (elim_unvisited_predecessor): Likewise. > (elim_backward): Likewise. > (elim_create): Likewise. > (eliminate_phi): Likewise. > (expand_phi_nodes): Likewise. > --- > gcc/tree-outof-ssa.c | 37 +++-- > 1 file changed, 19 insertions(+), 18 deletions(-) > > diff --git a/gcc/tree-outof-ssa.c b/gcc/tree-outof-ssa.c > index 4125529..5047788 100644 > --- a/gcc/tree-outof-ssa.c > +++ b/gcc/tree-outof-ssa.c > @@ -126,7 +126,8 @@ ssa_is_replaceable_p (gimple *stmt) > rarely more than 6, and in the bootstrap of gcc, the maximum number > of nodes encountered was 12. */ > > -typedef struct _elim_graph { > +struct elim_graph > +{ >/* Size of the elimination vectors. */ >int size; > > @@ -157,7 +158,7 @@ typedef struct _elim_graph { > >/* Source locations for any constant copies. */ >vec copy_locus; > -} *elim_graph; > +}; > > > /* For an edge E find out a good source location to associate with > @@ -394,10 +395,10 @@ insert_part_to_rtx_on_edge (edge e, rtx dest, int src, > source_location locus) > /* Create an elimination graph with SIZE nodes and associated data > structures. */ > > -static elim_graph > +static elim_graph * > new_elim_graph (int size) > { > - elim_graph g = (elim_graph) xmalloc (sizeof (struct _elim_graph)); > + elim_graph *g = (elim_graph *) xmalloc (sizeof (struct elim_graph)); > >g->nodes.create (30); >g->const_dests.create (20); > @@ -416,7 +417,7 @@ new_elim_graph (int size) > /* Empty elimination graph G. */ > > static inline void > -clear_elim_graph (elim_graph g) > +clear_elim_graph (elim_graph *g) > { >g->nodes.truncate (0); >g->edge_list.truncate (0); > @@ -427,7 +428,7 @@ clear_elim_graph (elim_graph g) > /* Delete elimination graph G. */ > > static inline void > -delete_elim_graph (elim_graph g) > +delete_elim_graph (elim_graph *g) > { >sbitmap_free (g->visited); >g->stack.release (); > @@ -445,7 +446,7 @@ delete_elim_graph (elim_graph g) > /* Return the number of nodes in graph G. */ > > static inline int > -elim_graph_size (elim_graph g) > +elim_graph_size (elim_graph *g) > { >return g->nodes.length (); > } > @@ -454,7 +455,7 @@ elim_graph_size (elim_graph g) > /* Add NODE to graph G, if it doesn't exist already. */ > > static inline void > -elim_graph_add_node (elim_graph g, int node) > +elim_graph_add_node (elim_graph *g, int node) > { >int x; >int t; > @@ -469,7 +470,7 @@ elim_graph_add_node (elim_graph g, int node) > /* Add the edge PRED->SUCC to graph G. */ > > static inline void > -elim_graph_add_edge (elim_graph g, int pred, int succ, source_location locus) > +elim_graph_add_edge (elim_graph *g, int pred, int succ, source_location > locus) > { >g->edge_list.safe_push (pred); >g->edge_list.safe_push (succ); > @@ -481,7 +482,7 @@ elim_graph_add_edge (elim_graph g, int pred, int succ, > source_location locus) > return the successor node. -1 is returned if there is no such edge. 
*/ > > static inline int > -elim_graph_remove_succ_edge (elim_graph g, int node, source_location *locus) > +elim_graph_remove_succ_edge (elim_graph *g, int node, source_location *locus) > { >int y; >unsigned x; > @@ -543,7 +544,7 @@ do { > \ > /* Add T to elimination graph G. */ > > static inline void > -eliminate_name (elim_graph g, int T) > +eliminate_name (elim_graph *g, int T) > { >elim_graph_add_node (g, T); > } > @@ -570,7 +571,7 @@ queue_phi_copy_p (var_map map, tree t) > G->e. */ > > static void > -eliminate_build (elim_graph g) > +eliminate_build (elim_graph *g) > { >tree Ti; >int p0, pi; > @@ -619,7 +620,7 @@ eliminate_build (elim_graph g) > /* Push successors of T onto the elimination stack for G. */ > > static void > -elim_forward (elim_graph g, int T) > +elim_forward (elim_graph *g, int T) > { >int S; >source_location locus; > @@ -637,7 +638,7 @@ elim_forward (elim_graph g, int T) > /* Return 1 if there unvisited predecessors of T in graph G. */ > > static int > -elim_unvisited_predecessor (elim_graph g, int T) > +elim_unvisited_predecessor (elim_graph *g, int T) > { >int P; >source_location locus; > @@ -653,7 +654,7 @@ elim_unvisited_predecessor (elim_graph g, int T) > /* Process predecesso
Re: [PATCH] Adapt the numbering scheme (PR gcov-profile/64874)
On 07/22/2016 03:20 PM, Jakub Jelinek wrote: > On Fri, Jul 22, 2016 at 01:46:44PM +0200, Martin Liška wrote: >> As described in the PR, current numbering scheme in gcov-io.h would overflow >> in couple of years. >> Thus, I'm suggesting to switch from: >> >> [major][minor/10][minor%10][release_status] >> >> to: >> [major/10][major%10][minor][release_status] >> >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > > Won't this clash with the already released versions? > Unless you start e.g. using different release_status letters from the past > (but that is complicated with vendor supplied DEV-PHASE), won't > say gcc 3.4.0 and gcc 30.4.1 have the same string? > > Wouldn't it be better to just use letters for the major/10? As GCC major > didn't reach 10 yet, we only had [0-9][0-9][0-9]. in the past, so I think: > > - v[0] = (major < 10 ? '0' : 'A' - 10) + major; > - v[1] = (minor / 10) + '0'; > - v[2] = (minor % 10) + '0'; > + v[0] = (major / 10) + 'A'; > + v[1] = (major % 10) + '0'; > + v[2] = minor + '0'; > > would be better, then there will be no clash, and the versioning scheme will > allow until 259.9. > > Jakub > Hello. I like the change suggested by Jakub, I've updated the numbering scheme, as well as comments in gcov-io.h. Final version installed as r238702. Martin >From 7cdcfec705d90f4d0a04d82617373077280ec495 Mon Sep 17 00:00:00 2001 From: marxin Date: Fri, 22 Jul 2016 11:54:20 +0200 Subject: [PATCH] Adapt the numbering scheme (PR gcov-profile/64874) gcc/ChangeLog: 2016-07-22 Martin Liska PR gcov-profile/64874 * gcov-io.h: Update command about file format. * gcov-iov.c (main): Adapt the numbering scheme. --- gcc/gcov-io.h | 18 +- gcc/gcov-iov.c | 6 +++--- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h index 3446407..bbf013a 100644 --- a/gcc/gcov-io.h +++ b/gcc/gcov-io.h @@ -63,19 +63,19 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see Although the ident and version are formally 32 bit numbers, they are derived from 4 character ASCII strings. The version number - consists of the single character major version number, a two - character minor version number (leading zero for versions less than - 10), and a single character indicating the status of the release. + consists of a two character major version number + (first digit starts from 'A' letter to not to clash with the older + numbering scheme), the single character minor version number, + and a single character indicating the status of the release. That will be 'e' experimental, 'p' prerelease and 'r' for release. Because, by good fortune, these are in alphabetical order, string collating can be used to compare version strings. Be aware that the 'e' designation will (naturally) be unstable and might be - incompatible with itself. For gcc 3.4 experimental, it would be - '304e' (0x33303465). When the major version reaches 10, the - letters A-Z will be used. Assuming minor increments releases every - 6 months, we have to make a major increment every 50 years. - Assuming major increments releases every 5 years, we're ok for the - next 155 years -- good enough for me. + incompatible with itself. For gcc 17.0 experimental, it would be + 'B70e' (0x42373065). As we currently do not release more than 5 minor + releases, the single character should be always fine. Major number + is currently changed roughly every year, which gives us space + for next 250 years (maximum allowed number would be 259.9). 
A record has a tag, length and variable amount of data. diff --git a/gcc/gcov-iov.c b/gcc/gcov-iov.c index 202f32a..f87445a 100644 --- a/gcc/gcov-iov.c +++ b/gcc/gcov-iov.c @@ -58,9 +58,9 @@ main (int argc, char **argv) || strcmp (argv[2], "prerelease") == 0) phase = '*'; - v[0] = (major < 10 ? '0' : 'A' - 10) + major; - v[1] = (minor / 10) + '0'; - v[2] = (minor % 10) + '0'; + v[0] = (major / 10) + 'A'; + v[1] = (major % 10) + '0'; + v[2] = minor + '0'; v[3] = phase; for (ix = 0; ix != 4; ix++) -- 2.9.0
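As a worked example of the new encoding, using the values from the updated comment: for GCC 17.0 experimental, major = 17 and minor = 0, so

    v[0] = (17 / 10) + 'A' = 'B'
    v[1] = (17 % 10) + '0' = '7'
    v[2] = 0 + '0'         = '0'
    v[3] = 'e'

which yields the version string "B70e" (0x42373065). Because 'A'-'Z' collate after '0'-'9', this also sorts after every previously released "[0-9][0-9][0-9]." version string, avoiding the clash Jakub pointed out.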
Re: [PATCH 3/6] add ctor to topo_info
On Sun, Jul 24, 2016 at 1:44 PM, wrote: > From: Trevor Saunders > > gcc/ChangeLog: > > 2016-07-24 Trevor Saunders > > * tree-ssa-structalias.c (struct topo_info): Add constructor, > and change types of members to auto_vec and auto_sbitmap. > (init_topo_info): Remove. > (topo_info::topo_info): New constructor. > (solve_graph): Adjust. > --- > gcc/tree-ssa-structalias.c | 31 --- > 1 file changed, 8 insertions(+), 23 deletions(-) > > diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c > index 5e3c7d0..94d81ed1 100644 > --- a/gcc/tree-ssa-structalias.c > +++ b/gcc/tree-ssa-structalias.c > @@ -1536,36 +1536,21 @@ unify_nodes (constraint_graph_t graph, unsigned int > to, unsigned int from, > > struct topo_info > { > + topo_info (); > + >/* sbitmap of visited nodes. */ > - sbitmap visited; > + auto_sbitmap visited; >/* Array that stores the topological order of the graph, *in > reverse*. */ > - vec topo_order; > + auto_vec topo_order; > }; > > > /* Initialize and return a topological info structure. */ > > -static struct topo_info * > -init_topo_info (void) > -{ > - size_t size = graph->size; > - struct topo_info *ti = XNEW (struct topo_info); > - ti->visited = sbitmap_alloc (size); > - bitmap_clear (ti->visited); > - ti->topo_order.create (1); > - return ti; > -} > - > - > -/* Free the topological sort info pointed to by TI. */ > - > -static void > -free_topo_info (struct topo_info *ti) > +topo_info::topo_info () : visited (graph->size), topo_order (1) > { > - sbitmap_free (ti->visited); > - ti->topo_order.release (); > - free (ti); > + bitmap_clear (visited); > } > > /* Visit the graph in topological order, and store the order in the > @@ -2679,7 +2664,7 @@ solve_graph (constraint_graph_t graph) >while (!bitmap_empty_p (changed)) > { >unsigned int i; > - struct topo_info *ti = init_topo_info (); > + topo_info *ti = new topo_info (); I think it would be better to save the repeated allocations and instead just clear the bitmap and the vector here. Your patch makes this more difficult. Richard. >stats.iterations++; > >bitmap_obstack_initialize (&iteration_obstack); > @@ -2797,7 +2782,7 @@ solve_graph (constraint_graph_t graph) > } > } > } > - free_topo_info (ti); > + delete ti; >bitmap_obstack_release (&iteration_obstack); > } > > -- > 2.9.0 >
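For illustration, a rough sketch (not a committed change) of the reuse Richard is asking for: construct a single topo_info outside the propagation loop and reset its members at the top of each iteration, instead of allocating and freeing a fresh object every time the changed bitmap is non-empty.

      topo_info ti;                      /* constructed once, before the loop */
      while (!bitmap_empty_p (changed))
        {
          bitmap_clear (ti.visited);     /* reset instead of reallocating */
          ti.topo_order.truncate (0);
          /* ... rest of the per-iteration work as before ... */
        }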
[PATCH] Handle loops with loop->latch == NULL (PR gcov-profile/71868)
Hi. As discussed with Honza, we should sum all edge frequencies when a loop has multiple latches. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Ready to be installed? Martin >From 07dc9092511aedfe2786630d72419b16ef660d0c Mon Sep 17 00:00:00 2001 From: marxin Date: Fri, 22 Jul 2016 11:36:52 +0200 Subject: [PATCH] Handle loops with loop->latch == NULL (PR gcov-profile/71868) gcc/ChangeLog: 2016-07-22 Martin Liska PR gcov-profile/71868 * cfgloopanal.c (expected_loop_iterations_unbounded): When we have a function with multiple latches, count them all. --- gcc/cfgloopanal.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c index 2739f44..ae08108 100644 --- a/gcc/cfgloopanal.c +++ b/gcc/cfgloopanal.c @@ -244,7 +244,7 @@ expected_loop_iterations_unbounded (const struct loop *loop, /* If we have no profile at all, use AVG_LOOP_NITER. */ if (profile_status_for_fn (cfun) == PROFILE_ABSENT) expected = PARAM_VALUE (PARAM_AVG_LOOP_NITER); - else if (loop->latch->count || loop->header->count) + else if (loop->latch && (loop->latch->count || loop->header->count)) { gcov_type count_in, count_latch; @@ -274,8 +274,9 @@ expected_loop_iterations_unbounded (const struct loop *loop, freq_latch = 0; FOR_EACH_EDGE (e, ei, loop->header->preds) - if (e->src == loop->latch) - freq_latch = EDGE_FREQUENCY (e); + if (e->src == loop->latch + || flow_bb_inside_loop_p (loop, e->src)) + freq_latch += EDGE_FREQUENCY (e); else freq_in += EDGE_FREQUENCY (e); -- 2.9.0
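For illustration, a hypothetical function whose natural loop has two back edges, so the loop structure has no unique latch block (loop->latch is NULL) and the frequencies of both latch edges need to be summed:

    int
    count_steps (int n)
    {
      int i = 0, steps = 0;
     loop:
      steps++;
      if (i & 1)
        {
          i += 1;
          goto loop;   /* first back edge to the loop header */
        }
      if (i < n)
        {
          i += 3;
          goto loop;   /* second back edge to the loop header */
        }
      return steps;
    }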
[PATCH] Call get_ops just for SSA_NAMEs (PR tree-optimization/71987)
Hi. As other calls of get_ops is guarded with TREE_CODE (x) == SSA_NAME, I guess the same should be done for the call that causes the ICE. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Ready to be installed? Martin >From 9dd5cc00eb3c271fad91fede6a9b06df356e001f Mon Sep 17 00:00:00 2001 From: marxin Date: Mon, 25 Jul 2016 09:19:54 +0200 Subject: [PATCH] Call get_ops just for SSA_NAMEs (PR tree-optimization/71987) gcc/ChangeLog: 2016-07-25 Martin Liska PR tree-optimization/71987 * tree-ssa-reassoc.c (maybe_optimize_range_tests): Call get_ops just for SSA_NAMEs. gcc/testsuite/ChangeLog: 2016-07-25 Martin Liska * gcc.dg/torture/pr71987.c: New test. --- gcc/testsuite/gcc.dg/torture/pr71987.c | 21 + gcc/tree-ssa-reassoc.c | 10 ++ 2 files changed, 27 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/torture/pr71987.c diff --git a/gcc/testsuite/gcc.dg/torture/pr71987.c b/gcc/testsuite/gcc.dg/torture/pr71987.c new file mode 100644 index 000..87d5938 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr71987.c @@ -0,0 +1,21 @@ +/* PR tree-optimization/71987 */ + +int a, b, *c, *d; + +short fn1 (int p1) +{ + return a ? p1 : a; +} + +void fn2 () +{ + int e, *f = &e; + b = fn1 (d != &e); + c = f; +} + +int main () +{ + fn2 (); + return 0; +} diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c index ece2d08..dfaf7d8 100644 --- a/gcc/tree-ssa-reassoc.c +++ b/gcc/tree-ssa-reassoc.c @@ -3519,7 +3519,8 @@ maybe_optimize_range_tests (gimple *stmt) (or &, corresponding to 1/0 in the phi arguments, push into ops the individual range test arguments of the bitwise or resp. and, recursively. */ - if (!get_ops (rhs, code, &ops, + if (TREE_CODE (rhs) == SSA_NAME + && !get_ops (rhs, code, &ops, loop_containing_stmt (stmt)) && (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) != tcc_comparison) @@ -3537,11 +3538,12 @@ maybe_optimize_range_tests (gimple *stmt) bb_ent.last_idx++; bb_ent.op = rhs; } - else if (is_gimple_assign (stmt) + else if (TREE_CODE (lhs) == SSA_NAME + && is_gimple_assign (stmt) && (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison) - &&!get_ops (lhs, code, &ops, - loop_containing_stmt (stmt)) + && !get_ops (lhs, code, &ops, +loop_containing_stmt (stmt)) && has_single_use (lhs)) { operand_entry *oe = operand_entry_pool.allocate (); -- 2.9.0
[PATCH] Fix memory leak introduced in r238336
Hi. This is quite obvious change. I've been waiting for bootstrap and regression tests on ppc64le-redhat-linux. Ready after it finishes? Martin >From 2f416d7feca35d9075124f4dc74f3560a18beefb Mon Sep 17 00:00:00 2001 From: marxin Date: Fri, 22 Jul 2016 12:46:08 +0200 Subject: [PATCH] Fix memory leak introduced in r238336 gcc/ChangeLog: 2016-07-22 Martin Liska * tree-ssa-loop-niter.c (loop_only_exit_p): Release body array. --- gcc/tree-ssa-loop-niter.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c index b7d7c32..95bb5cf 100644 --- a/gcc/tree-ssa-loop-niter.c +++ b/gcc/tree-ssa-loop-niter.c @@ -2119,7 +2119,10 @@ loop_only_exit_p (const struct loop *loop, const_edge exit) { for (bsi = gsi_start_bb (body[i]); !gsi_end_p (bsi); gsi_next (&bsi)) if (stmt_can_terminate_bb_p (gsi_stmt (bsi))) - return true; + { + free (body); + return true; + } } free (body); -- 2.9.0
Re: [PR70920] transform (intptr_t) x eq/ne CST to x eq/ne (typeof x) cst
On Mon, 25 Jul 2016, Prathamesh Kulkarni wrote: > Hi Richard, > The attached patch tries to fix PR70920. > It adds your pattern from comment 1 in the PR > (with additional gating on INTEGRAL_TYPE_P to avoid regressing > finalize_18.f90) > and second pattern, which is reverse of the first transform. > I needed to update ssa-dom-branch-1.c because with patch applied, > jump threading removed the second if (i != 0B) block. > The dumps with and without patch for ssa-dom-branch-1.c start > to differ with forwprop1: > > before: > : > _1 = temp_16(D)->code; > _2 = _1 == 42; > _3 = (int) _2; > _4 = (long int) _3; > temp_17 = (struct rtx_def *) _4; > if (temp_17 != 0B) > goto ; > else > goto ; > > after: > : > _1 = temp_16(D)->code; > _2 = _1 == 42; > _3 = (int) _2; > _4 = (long int) _2; > temp_17 = (struct rtx_def *) _4; > if (_1 == 42) > goto ; > else > goto ; > > I suppose the transform is correct for above test-case ? > > Then vrp dump shows: > Threaded jump 5 --> 9 to 13 > Threaded jump 8 --> 9 to 13 > Threaded jump 3 --> 9 to 13 > Threaded jump 12 --> 9 to 14 > Removing basic block 9 > basic block 9, loop depth 0 > pred: > if (i1_10(D) != 0B) > goto ; > else > goto ; > succ: 10 > 11 > > So there remained two instances of if (i1_10 (D) != 0B) in dom2 dump file, > and hence needed to update the test-case. > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > OK to commit ? --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3408,3 +3408,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) { CONSTRUCTOR_ELT (ctor, idx / k)->value; }) (BIT_FIELD_REF { CONSTRUCTOR_ELT (ctor, idx / k)->value; } @1 { bitsize_int ((idx % k) * width); }) + +/* PR70920: Transform (intptr_t)x eq/ne CST to x eq/ne (typeof x) CST. */ + +(for cmp (ne eq) + (simplify + (cmp (convert@2 @0) INTEGER_CST@1) + (if (POINTER_TYPE_P (TREE_TYPE (@0)) + && INTEGRAL_TYPE_P (TREE_TYPE (@2))) you can use @1 here and omit @2. + (cmp @0 (convert @1) + +/* Reverse of the above case: + x has integral_type, CST is a pointer constant. + Transform (typeof CST)x eq/ne CST to x eq/ne (typeof x) CST. */ + +(for cmp (ne eq) + (simplify + (cmp (convert @0) @1) + (if (POINTER_TYPE_P (TREE_TYPE (@1)) + && INTEGRAL_TYPE_P (TREE_TYPE (@0))) +(cmp @0 (convert @1) The second pattern lacks the INTEGER_CST on @1 so it doesn't match its comment. Please do not add vertical space between pattern comment and pattern. Please place patterns not at the end of match.pd but where similar transforms are done. Like after /* Simplify pointer equality compares using PTA. */ (for neeq (ne eq) (simplify (neeq @0 @1) (if (POINTER_TYPE_P (TREE_TYPE (@0)) && ptrs_compare_unequal (@0, @1)) { neeq == EQ_EXPR ? boolean_false_node : boolean_true_node; }))) please also share the (for ...) for both patterns or merge them by changing the condition to (if ((POINTER_TYPE_P (TREE_TYPE (@0)) && INTEGRAL_TYPE_P (TREE_TYPE (@1))) || (INTEGRAL_TYPE_P (TREE_TYPE (@0)) && POINTER_TYPE_P (TREE_TYPE (@1 Richard. > PS: Writing changelog entries for match.pd is a bit tedious. > Should we add optional names for pattern so we can refer to them by names > in the ChangeLog for the more complicated ones ? > Or maybe just use comments: > (simplify /* name */ ... ) -;) That will add the fun of inventing names ;) > Thanks, > Prathamesh > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
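For reference, the kind of source-level case these patterns target (an illustrative test, not taken from the PR): the comparison through the integer cast should be simplified to a direct pointer comparison, here p == 0.

    int
    f (int *p)
    {
      return (__INTPTR_TYPE__) p == 0;
    }

With the reverse pattern, comparing an integer converted to a pointer type against a pointer constant is likewise rewritten as a comparison on the integer side.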
Re: [AArch64][3/3] Migrate aarch64_expand_prologue/epilogue to aarch64_add_constant
On 21/07/16 11:08, Richard Earnshaw (lists) wrote: On 20/07/16 16:02, Jiong Wang wrote: Richard, Thanks for the review, yes, I believe using aarch64_add_constant is unconditionally safe here. Because we have generated a stack tie to clobber the whole memory thus prevent any instruction which access stack be scheduled after that. The access to deallocated stack issue was there and fixed by https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02292.html. aarch64_add_constant itself is generating the same instruction sequences as the original code, except for a few cases, it will prefer move scratch_reg, #imm add sp, sp, scratch_reg than: add sp, sp, #imm_part1 add sp, sp, #imm_part2 OK, I've had another look at this and I'm happy that we don't (currently) run into the problem I'm concerned about. However, this new usage does impose a constraint on aarch64_add_constant that will need to be respected in future, so please can you add the following to the comment that precedes that function: /* ... This function is sometimes used to adjust the stack pointer, so we must ensure that it can never cause transient stack deallocation by writing an invalid value into REGNUM. */ + bool frame_related_p = (regnum == SP_REGNUM); I think it would be better to make the frame-related decision be an explicit parameter passed to the routine (don't forget SP is not always the frame pointer). Then the new uses would pass 'true' and the existing uses 'false'. R. Thanks, attachment is the updated patch which: * Added above new comments for aarch64_add_constant. * One new parameter "frame_related_p" for aarch64_add_constant. I thought adding new gcc assertion for sanity check of frame_related_p and REGNUM, haven't done that as I found dwarf2cfi.c is doing that. OK for trunk? gcc/ 2016-07-25 Jiong Wang * config/aarch64/aarch64.c (aarch64_add_constant): New parameter "frame_related_p". Generate CFA annotation when it's necessary. (aarch64_expand_prologue): Use aarch64_add_constant. (aarch64_expand_epilogue): Likewise. (aarch64_output_mi_thunk): Pass "false" when calling aarch64_add_constant. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 41844a1..ca93f6e 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1866,14 +1866,19 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) } /* Add DELTA onto REGNUM in MODE, using SCRATCHREG to held intermediate value if - it is necessary. */ + it is necessary. + + This function is sometimes used to adjust the stack pointer, so we must + ensure that it can never cause transient stack deallocation by writing an + invalid value into REGNUM. */ static void aarch64_add_constant (machine_mode mode, int regnum, int scratchreg, - HOST_WIDE_INT delta) + HOST_WIDE_INT delta, bool frame_related_p) { HOST_WIDE_INT mdelta = abs_hwi (delta); rtx this_rtx = gen_rtx_REG (mode, regnum); + rtx_insn *insn; /* Do nothing if mdelta is zero. */ if (!mdelta) @@ -1882,7 +1887,8 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg, /* We only need single instruction if the offset fit into add/sub. */ if (aarch64_uimm12_shift (mdelta)) { - emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta))); + insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta))); + RTX_FRAME_RELATED_P (insn) = frame_related_p; return; } @@ -1895,15 +1901,23 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg, HOST_WIDE_INT low_off = mdelta & 0xfff; low_off = delta < 0 ? 
-low_off : low_off; - emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off))); - emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off))); + insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off))); + RTX_FRAME_RELATED_P (insn) = frame_related_p; + insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off))); + RTX_FRAME_RELATED_P (insn) = frame_related_p; return; } /* Otherwise use generic function to handle all other situations. */ rtx scratch_rtx = gen_rtx_REG (mode, scratchreg); aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode); - emit_insn (gen_add2_insn (this_rtx, scratch_rtx)); + insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx)); + if (frame_related_p) +{ + RTX_FRAME_RELATED_P (insn) = frame_related_p; + rtx adj = plus_constant (mode, this_rtx, delta); + add_reg_note (insn , REG_CFA_ADJUST_CFA, gen_rtx_SET (this_rtx, adj)); +} } static bool @@ -3038,36 +3052,7 @@ aarch64_expand_prologue (void) frame_size -= (offset + crtl->outgoing_args_size); fp_offset = 0; - if (frame_size >= 0x100) - { - rtx op0 = gen_rtx_REG (Pmode, IP0_REGNUM); - emit_mov
Re: [PATCH] Call get_ops just for SSA_NAMEs (PR tree-optimization/71987)
Hi Martin, On 25/07/16 18:56, Martin Liška wrote: Hi. As other calls of get_ops is guarded with TREE_CODE (x) == SSA_NAME, I guess the same should be done for the call that causes the ICE. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Ready to be installed? Martin 0001-Call-get_ops-just-for-SSA_NAMEs-PR-tree-optimization.patch From 9dd5cc00eb3c271fad91fede6a9b06df356e001f Mon Sep 17 00:00:00 2001 From: marxin Date: Mon, 25 Jul 2016 09:19:54 +0200 Subject: [PATCH] Call get_ops just for SSA_NAMEs (PR tree-optimization/71987) gcc/ChangeLog: 2016-07-25 Martin Liska PR tree-optimization/71987 * tree-ssa-reassoc.c (maybe_optimize_range_tests): Call get_ops just for SSA_NAMEs. gcc/testsuite/ChangeLog: 2016-07-25 Martin Liska * gcc.dg/torture/pr71987.c: New test. --- gcc/testsuite/gcc.dg/torture/pr71987.c | 21 + gcc/tree-ssa-reassoc.c | 10 ++ 2 files changed, 27 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/torture/pr71987.c diff --git a/gcc/testsuite/gcc.dg/torture/pr71987.c b/gcc/testsuite/gcc.dg/torture/pr71987.c new file mode 100644 index 000..87d5938 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr71987.c @@ -0,0 +1,21 @@ +/* PR tree-optimization/71987 */ + +int a, b, *c, *d; + +short fn1 (int p1) +{ + return a ? p1 : a; +} + +void fn2 () +{ + int e, *f = &e; + b = fn1 (d != &e); + c = f; +} + +int main () +{ + fn2 (); + return 0; +} diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c index ece2d08..dfaf7d8 100644 --- a/gcc/tree-ssa-reassoc.c +++ b/gcc/tree-ssa-reassoc.c @@ -3519,7 +3519,8 @@ maybe_optimize_range_tests (gimple *stmt) (or &, corresponding to 1/0 in the phi arguments, push into ops the individual range test arguments of the bitwise or resp. and, recursively. */ - if (!get_ops (rhs, code, &ops, + if (TREE_CODE (rhs) == SSA_NAME + && !get_ops (rhs, code, &ops, loop_containing_stmt (stmt)) && (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) != tcc_comparison) @@ -3537,11 +3538,12 @@ maybe_optimize_range_tests (gimple *stmt) bb_ent.last_idx++; bb_ent.op = rhs; } - else if (is_gimple_assign (stmt) + else if (TREE_CODE (lhs) == SSA_NAME + && is_gimple_assign (stmt) && (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison) - &&!get_ops (lhs, code, &ops, - loop_containing_stmt (stmt)) + && !get_ops (lhs, code, &ops, + loop_containing_stmt (stmt)) && has_single_use (lhs)) { operand_entry *oe = operand_entry_pool.allocate (); Sorry about the breakage. Since final_range_test_p allows either lhs or rhs to be SSA_NAME (for the different cases it accepts), we should indeed check for TREE_CODE being SSA_NAME. Unfortunately it didn't trigger in my testing. Lets wait for the maintainers conformation. Thanks for working on this, Kugan
Re: [PATCH] Call get_ops just for SSA_NAMEs (PR tree-optimization/71987)
On 07/25/2016 11:51 AM, kugan wrote:
> Sorry about the breakage. Since final_range_test_p allows either lhs or rhs
> to be SSA_NAME (for the different cases it accepts), we should indeed check
> for TREE_CODE being SSA_NAME. Unfortunately it didn't trigger in my testing.
> Let's wait for the maintainers' confirmation.
>
> Thanks for working on this,
> Kugan

You're welcome. I see the breakage for e.g. Firefox or Inkscape (-O2), while
the Linux kernel or GIMP build fine.

Martin
Re: [PATCH] Teach VRP to register assertions along default switch labels (PR 18046)
On Mon, Jul 25, 2016 at 5:38 AM, Patrick Palka wrote: > On Fri, 22 Jul 2016, Patrick Palka wrote: > >> On Fri, 22 Jul 2016, Patrick Palka wrote: >> >> > On Fri, 22 Jul 2016, Patrick Palka wrote: >> > >> > > This patch teaches VRP to register along a default switch label >> > > assertions that corresponds to the anti range of each case label. >> > > >> > > Does this look OK to commit after bootstrap + regtest on >> > > x86_64-pc-linux-gnu? >> > >> > Forgot the changelog: >> > >> > gcc/ChangeLog: >> > >> > PR tree-optimization/18046 >> > * tree-vrp.c (find_switch_asserts): Register edge assertions >> > for the default label which correspond to the anti-ranges >> > of each non-case label. >> > >> > gcc/testsuite/ChangeLog: >> > >> > PR tree-optimization/18046 >> > * gcc.dg/tree-ssa/ssa-dom-thread-6.c: Bump FSM count to 5. >> > * gcc.dg/tree-ssa/vrp103.c: New test. >> > * gcc.dg/tree-ssa/vrp104.c: New test. >> > >> >> The patch failed to bootstrap due to a number -Warray-bounds warnings >> getting emitted for the autogenerated header file insn-modes.h: >> >> In file included from /home/patrick/code/gcc/gcc/machmode.h:24:0, >> from /home/patrick/code/gcc/gcc/coretypes.h:391, >> from /home/patrick/code/gcc/gcc/lto-streamer-out.c:25: >> ./insn-modes.h: In function ‘void produce_asm_for_decls()’: >> ./insn-modes.h:638:36: warning: array subscript is outside array bounds >> [-Warray-bounds] >> default: return mode_inner[mode]; >> ~~~^ >> >> These new warnings seem legitimate. From what I can tell, this array >> access along the default switch label will always be out of bounds. The >> code it's warning about basically looks like this: >> >> enum A { a, b, c }; >> int foo (A i) >> { >> int array[3]; >> switch (i) >> { >> case a: return x; >> case b: return y; >> case c: return z; >> default: return array[i]; >> } >> } >> >> and while the switch does exhaust every possible enumeration value of A, >> there's nothing stopping the user from passing an arbitrary int to >> foo() thus triggering the default case. So this patch suppresses these >> warnings by making genemit emit an assertion that verifies that mode is >> within the bounds of the array. This assertion won't affect performance >> because the mode_*_inline functions are always called with constant >> arguments. >> >> This version of the patch has the following changes: >> >> 1. Fixes the bootstrap failure as mentioned above. >> 2. Checks if the switch operand is live along the default edge before >> registering assertions. >> 3. Combines contiguous case ranges to reduce the number of assert >> expressions to insert. >> >> Bootstrap + regtesting in progress on x86_64-pc-linux-gnu. >> >> gcc/ChangeLog: >> >> PR tree-optimization/18046 >> * genmodes.c (emit_mode_size_inline): Emit an assert that >> verifies that mode is a valid array index. >> (emit_mode_nuinits_inline): Likewise. >> (emit_mode_inner_inline): Likewise. >> (emit_mode_unit_size_inline): Likewise. >> (emit_mode_unit_precision_inline): Likewise. >> * tree-vrp.c (compare_case_label_values): New qsort callback. >> (find_switch_asserts): Register edge assertions for >> the default label, which correspond to the anti-range of each >> non-case label. >> >> gcc/testsuite/ChangeLog: >> >> PR tree-optimization/18046 >> * gcc.dg/tree-ssa/ssa-dom-thread-6.c: Bump FSM count to 5. >> * gcc.dg/tree-ssa/vrp103.c: New test. >> * gcc.dg/tree-ssa/vrp104.c: New test. > > Here's another version of the patch, which has the following changes: > > 1. 
Use wide-int arithmetic directly instead of tree arithmetic. > 2. Don't bother re-sorting and re-using the ci vector. Instead just > inspect gimple_switch_label() directly since cases are already sorted > there by CASE_LOW. > 3. Add an insertion limit of 10 and a tunable param. > > Bootstrapped + regtested on x86_64-pc-linux-gnu. Ok. The only thing I wonder is whether VRP does a good job combining non-adjacent anti-ranges or if the result is sth non-sensible. ISTR VRP simply chooses A when combining A and B which would mean inserting asserts will only provide better ranges for more than one asserts via equivalences (which might be good enough, of course). I think the anti-range merge behavior is good but I for example wonder about the behavior for ~[0, n] which will end up as [n, +INF] for unsigned. Also the combining limitation will make ranges derived from the switch parameter less useful, like switch (x) { default: x++; where x++ will only be derived from the merged anti-range. All these are existing VRP range representation limitations of course. Thanks, Richard. > gcc/ChangeLog: > > PR tree-optimization/18046 > * genmodes.c (emit_mode_size_inline): Emit an assert that > ver
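To illustrate what the new default-edge assertions buy (a hypothetical example, not one of the new tests): on the default edge VRP now knows i is ~[0, 2], so the range check below folds to false and the call can be removed.

    extern void in_range (void);

    int
    bar (int i)
    {
      switch (i)
        {
        case 0:
        case 1:
        case 2:
          return 0;
        default:
          if (i >= 0 && i <= 2)
            in_range ();   /* dead: contradicts the assertion on the default edge */
          return 1;
        }
    }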
Re: [RFC][IPA-VRP] Disable setting param of __builtin_constant_p to null
On Mon, Jul 25, 2016 at 8:59 AM, kugan wrote:
> Hi,
>
>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>> index ecfab1f..23c12b5 100644
>>> --- a/gcc/tree-vrp.c
>>> +++ b/gcc/tree-vrp.c
>>> @@ -3759,8 +3759,10 @@ extract_range_basic (value_range *vr, gimple *stmt)
>>>	    && SSA_NAME_IS_DEFAULT_DEF (arg)
>>>	    && TREE_CODE (SSA_NAME_VAR (arg)) == PARM_DECL)
>>>	  {
>>> +#if 0
>>>	    set_value_range_to_null (vr, type);
>>>	    return;
>>> +#endif
>>
>> It is not cleanest either, but better to test cfun->after_inlining
>
> Thanks. Here is the patch which does this. Bootstrapped and regression
> tested with the rest of the patches in the series. Is this OK for trunk?

Ok.

Richard.

> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2016-07-25  Kugan Vivekanandarajah
>
>	* tree-vrp.c (extract_range_basic): Check cfun->after_inlining before
>	folding call to __builtin_constant_p with parameters to false.
Re: [PATCH] Handle loops with loop->latch == NULL (PR gcov-profile/71868)
On Mon, Jul 25, 2016 at 10:55 AM, Martin Liška wrote:
> Hi.
>
> As discussed with Honza, we should sum all edge frequencies when a loop
> has multiple latches.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

-	if (e->src == loop->latch)
-	  freq_latch = EDGE_FREQUENCY (e);
+	if (e->src == loop->latch
+	    || flow_bb_inside_loop_p (loop, e->src))
+	  freq_latch += EDGE_FREQUENCY (e);

the e->src == loop->latch condition is redundant now.

Richard.

> Martin
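For illustration, the simplified form being suggested is presumably just (a sketch, not the committed version), since a non-NULL latch is always inside the loop and therefore already satisfies the flow_bb_inside_loop_p test:

	  FOR_EACH_EDGE (e, ei, loop->header->preds)
	    if (flow_bb_inside_loop_p (loop, e->src))
	      freq_latch += EDGE_FREQUENCY (e);
	    else
	      freq_in += EDGE_FREQUENCY (e);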
Re: [PATCH] Fix memory leak introduced in r238336
On Mon, Jul 25, 2016 at 10:58 AM, Martin Liška wrote:
> Hi.
>
> This is quite obvious change.
>
> I've been waiting for bootstrap and regression tests on ppc64le-redhat-linux.
> Ready after it finishes?

Ok.

Richard.

> Martin
Re: [PATCH] Call get_ops just for SSA_NAMEs (PR tree-optimization/71987)
On Mon, Jul 25, 2016 at 10:56 AM, Martin Liška wrote:
> Hi.
>
> As other calls of get_ops is guarded with TREE_CODE (x) == SSA_NAME, I guess
> the same should be done for the call that causes the ICE.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

-      else if (is_gimple_assign (stmt)
+      else if (TREE_CODE (lhs) == SSA_NAME
+	       && is_gimple_assign (stmt)
	       && (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
		   == tcc_comparison)
-	       &&!get_ops (lhs, code, &ops,
-			   loop_containing_stmt (stmt))
+	       && !get_ops (lhs, code, &ops,
+			    loop_containing_stmt (stmt))

the check is redundant here (the is_gimple_assign check as well).  If the
RHS code is a tcc_comparison the lhs has to be an SSA name.

Ok with the is_gimple_assign check removed instead.

Thanks,
Richard.

> Martin
[PATCH] Use std::mt19937, std::thread and std::atomic to simplify tests
These tests that hammer shared_ptr and weak_ptr to test the thread-safety of the reference counting were originally written for the TR1 smart pointers, and so pre-date std::thread etc. This refactors them to use C++11 std::thread and std::atomic (with relaxed memory ordering, since we don't need to know the counter value until after we've joined all the worker threads). I also used the std::mt19937 typedef that uses the same parameters as the test lists explicitly (mt19937 produces uint_fast32_t not unsigned long, but that doesn't change anything here) and replaced the non-uniform modulus operation with std::uniform_int_distribution. To verify that after the changes we still test what we're trying to test, I changed the shared_ptr code to remove the synchronization in _M_add_ref_lock and saw the tests fail. * testsuite/20_util/shared_ptr/thread/default_weaktoshared.cc: Use std::mt19937, std::thread and std::atomic to simplify test. * testsuite/20_util/shared_ptr/thread/mutex_weaktoshared.cc: Likewise. Tested x86_64-linux (8 HT cores) and powerpc64-linux (160 cores). Committed to trunk. commit b2f86d9d04f9561b5189eccaa85e5d2b21884de4 Author: Jonathan Wakely Date: Mon Jul 25 10:32:11 2016 +0100 Use std::mt19937, std::thread and std::atomic to simplify tests * testsuite/20_util/shared_ptr/thread/default_weaktoshared.cc: Use std::mt19937, std::thread and std::atomic to simplify test. * testsuite/20_util/shared_ptr/thread/mutex_weaktoshared.cc: Likewise. diff --git a/libstdc++-v3/testsuite/20_util/shared_ptr/thread/default_weaktoshared.cc b/libstdc++-v3/testsuite/20_util/shared_ptr/thread/default_weaktoshared.cc index d5b9d4f..a853307 100644 --- a/libstdc++-v3/testsuite/20_util/shared_ptr/thread/default_weaktoshared.cc +++ b/libstdc++-v3/testsuite/20_util/shared_ptr/thread/default_weaktoshared.cc @@ -25,11 +25,11 @@ #include #include #include -#include #include #include - -#include +#include +#include +#include #ifdef _GLIBCXX_HAVE_UNISTD_H #include // To test for _POSIX_THREAD_PRIORITY_SCHEDULING @@ -50,19 +50,13 @@ const unsigned long HAMMER_REPEAT = 10; const unsigned long KILL_ONE_IN = 1000; struct A - { -static _Atomic_word counter; -A() - { - __gnu_cxx::__atomic_add(&counter, 1); - } -~A() - { - __gnu_cxx::__atomic_add(&counter, -1); - } - }; +{ + static std::atomic counter; + A() { counter.fetch_add(1, std::memory_order_relaxed); } + ~A() { counter.fetch_sub(1, std::memory_order_relaxed); } +}; -_Atomic_word A::counter = 0; +std::atomic A::counter{ 0 }; typedef std::shared_ptr sp_A_t; typedef std::weak_ptr wp_A_t; @@ -80,16 +74,10 @@ struct shared_and_weak_pools { } }; -void* thread_hammer_and_kill(void* opaque_pools) +void thread_hammer_and_kill(shared_and_weak_pools& pools) { - shared_and_weak_pools& pools = *static_cast(opaque_pools); - // Using the same parameters as in the RNG test cases. 
- std::mersenne_twister_engine< -unsigned long, 32, 624, 397, 31, -0x9908b0dful, 11, -0xul, 7, -0x9d2c5680ul, 15, -0xefc6ul, 18, 1812433253ul> rng; + std::mt19937 urbg; + std::uniform_int_distribution<> dist(0, KILL_ONE_IN - 1); sp_vector_t::iterator cur_shared = pools.shared_pool.begin(); wp_vector_t::iterator cur_weak = pools.weak_pool.begin(); @@ -107,26 +95,16 @@ void* thread_hammer_and_kill(void* opaque_pools) break; } - if (rng() % KILL_ONE_IN == 0) + if (dist(urbg) == 0) { cur_shared->reset(); ++cur_shared; } } - return 0; } -void* thread_hammer(void* opaque_weak) +void thread_hammer(wp_vector_t& weak_pool) { - wp_vector_t& weak_pool = *static_cast(opaque_weak); - // Using the same parameters as in the RNG test cases. - std::mersenne_twister_engine< -unsigned long, 32, 624, 397, 31, -0x9908b0dful, 11, -0xul, 7, -0x9d2c5680ul, 15, -0xefc6ul, 18, 1812433253ul> rng; - wp_vector_t::iterator cur_weak = weak_pool.begin(); for (unsigned int i = 0; i < HAMMER_REPEAT; ++i) @@ -142,51 +120,38 @@ void* thread_hammer(void* opaque_weak) break; } } - return 0; } -int +void test01() { bool test __attribute__((unused)) = true; sp_vector_t obj_pool(POOL_SIZE); - for(sp_vector_t::iterator cur = obj_pool.begin(); cur != obj_pool.end(); ++cur) - { -cur->reset(new A); - } + for(auto& obj : obj_pool) +obj.reset(new A); + // Obtain weak references. std::vector weak_pool(HAMMER_MAX_THREADS, wp_vector_t(obj_pool.begin(), obj_pool.end())); // Launch threads with pointer to weak reference. - pthread_t threads[HAMMER_MAX_THREADS]; + std::thread threads[HAMMER_MAX_THREADS]; #if defined(__sun) && defined(__svr4__) && _XOPEN_VERSION >= 500 pthread_setconcurrency (HAMMER_MAX_THREADS); #endif - pthread_attr_t tattr; - pthread_attr_ini
Re: [PATCH/AARCH64] Add scheduler for vulcan.
On Wed, Jul 20, 2016 at 03:07:45PM +0530, Virendra Pathak wrote: > Hi gcc-patches group, > > Please find the patch for adding the basic scheduler for vulcan > in the aarch64 port. > > Tested the patch with compiling cross aarch64-linux-gcc, > bootstrapped native aarch64-unknown-linux-gnu and > run gcc regression. > > Kindly review and merge the patch to trunk, if the patch is okay. > > There are few TODO in this patch which we have planned to > submit in the next submission e.g. crc and crypto > instructions, further improving integer & fp load/store > based on addressing mode of the address. Hi Virendra, Thanks for the patch, I have some concerns about the size of the automata that this description generates. As you can see (use (automata_option "stats") in the description to enable statistics) this scheduler description generates a 10x larger automata for Vulcan than the second largest description we have for AArch64 (cortex_a53_advsimd): Automaton `cortex_a53_advsimd' 9072 NDFA states, 49572 NDFA arcs 9072 DFA states, 49572 DFA arcs 4050 minimal DFA states, 23679 minimal DFA arcs 368 all insns 11 insn equivalence classes 0 locked states 28759 transition comb vector els, 44550 trans table els: use simple vect 44550 min delay table els, compression factor 2 Automaton `vulcan' 103223 NDFA states, 651918 NDFA arcs 103223 DFA states, 651918 DFA arcs 45857 minimal DFA states, 352255 minimal DFA arcs 368 all insns 28 insn equivalence classes 0 locked states 429671 transition comb vector els, 1283996 trans table els: use comb vect 1283996 min delay table els, compression factor 2 Such a large automaton increases compiler build time and memory consumption, often for little scheduling benefit. Normally such a large automaton comes from using a large repeat expression (*). For example in your modeling of divisions: > +(define_insn_reservation "vulcan_div" 13 > + (and (eq_attr "tune" "vulcan") > + (eq_attr "type" "sdiv,udiv")) > + "vulcan_i1*13") > + > +(define_insn_reservation "vulcan_fp_divsqrt_s" 16 > + (and (eq_attr "tune" "vulcan") > + (eq_attr "type" "fdivs,fsqrts")) > + "vulcan_f0*8|vulcan_f1*8") > + > +(define_insn_reservation "vulcan_fp_divsqrt_d" 23 > + (and (eq_attr "tune" "vulcan") > + (eq_attr "type" "fdivd,fsqrtd")) > + "vulcan_f0*12|vulcan_f1*12") In other pipeline models, we try to keep these repeat numbers low to avoid the large state-space growth they cause. For example, the Cortex-A57 pipeline model describes them as: (define_insn_reservation "cortex_a57_fp_divd" 16 (and (eq_attr "tune" "cortexa57") (eq_attr "type" "fdivd, fsqrtd, neon_fp_div_d, neon_fp_sqrt_d")) "ca57_cx2_block*3") The lower accuracy is acceptable because of the nature of the scheduling model. For a machine with an issue rate of "4" like Vulcan, each cycle the compiler models it tries to find four instructions to schedule, before it progresses the state of the automaton. If an instruction is modelled as blocking the "vulcan_i1" unit for 13 cycles, that means up to 52 instructions that the scheduler would have to find before issuing the next instruction which would use vulcan_i1. Because scheduling works within basic-blocks, the chance of finding so many independent instructions is extremely low, and so you'd never see the benefit of the 13-cycle block. 
I tried lowering the repeat expressions as so: > +(define_insn_reservation "vulcan_div" 13 > + (and (eq_attr "tune" "vulcan") > + (eq_attr "type" "sdiv,udiv")) > + "vulcan_i1*3") > + > +(define_insn_reservation "vulcan_fp_divsqrt_s" 16 > + (and (eq_attr "tune" "vulcan") > + (eq_attr "type" "fdivs,fsqrts")) > + "vulcan_f0*3|vulcan_f1*3") > + > +(define_insn_reservation "vulcan_fp_divsqrt_d" 23 > + (and (eq_attr "tune" "vulcan") > + (eq_attr "type" "fdivd,fsqrtd")) > + "vulcan_f0*5|vulcan_f1*5") Which more than halves the size of the generated automaton: Automaton `vulcan' 45370 NDFA states, 319261 NDFA arcs 45370 DFA states, 319261 DFA arcs 20150 minimal DFA states, 170824 minimal DFA arcs 368 all insns 28 insn equivalence classes 0 locked states 215565 transition comb vector els, 564200 trans table els: use comb vect 564200 min delay table els, compression factor 2 The other technique we use to reduce the size of the generated automaton is to split off the AdvSIMD/FP model from the main pipeline description (the thunderx _main, thunderx_mult, thunderx_divide, and thunderx_simd models take this approach even further and acheieve very small automaton as a result) A change like wiring the vulcan_f0 and vulcan_f1 reservations to be cpu_units of a new define_automaton "vulcan_advsimd" would cut the size of the automaton by half again: Automaton `vulcan' 8520 NDFA states, 52754 NDFA arcs 8520 DFA states,
[PATCHv2, PING][ARM] -mpure-code option for ARM
On 11/07/16 17:56, Andre Vieira (lists) wrote: > On 07/07/16 13:30, mickael guene wrote: >> Hi Andre, >> >> Another feedback on your purecode patch. >> You have to disable casesi pattern since then it will >> generate wrong code with -mpure-code option. >> Indeed it will generate an 'adr rx, .Lx' (aka >> 'subs rx, PC, #offset') which will not work in our >> case since 'Lx' label is put in an .rodata section. >> So offset value is unknown and can be impossible >> to encode correctly. >> >> Regards >> Mickael >> >> On 06/30/2016 04:32 PM, Andre Vieira (lists) wrote: >>> Hello, >>> >>> This patch adds the -mpure-code option for ARM. This option ensures >>> functions are put into sections that contain only code and no data. To >>> ensure this throughout compilation we give these sections the ARM >>> processor-specific ELF section attribute "SHF_ARM_PURECODE". This option >>> is only supported for non-pic code for armv7-m targets. >>> >>> This patch introduces a new target hook 'TARGET_ASM_ELF_FLAGS_NUMERIC'. >>> This target hook enables a target to use the numeric value for elf >>> section attributes rather than their alphabetical representation. If >>> TARGET_ASM_ELF_FLAGS_NUMERIC returns TRUE, the existing >>> 'default_elf_asm_named_section', will print the numeric value of the >>> section attributes for the current section. This target hook has two >>> parameters: >>> unsigned int FLAGS, the input parameter that tells the function the >>> current section's attributes; >>> unsigned int *NUM, used to pass down the numerical representation of the >>> section's attributes. >>> >>> The default implementation for TARGET_ASM_ELF_FLAGS_NUMERIC will return >>> false, so existing behavior is not changed. >>> >>> Bootstrapped and tested for arm-none-linux-gnueabihf. Further tested for >>> arm-none-eabi with a Cortex-M3 target. >>> >>> >>> gcc/ChangeLog: >>> 2016-06-30 Andre Vieira >>> Terry Guo >>> >>> * target.def (elf_flags_numeric): New target hook. >>> * targhooks.h (default_asm_elf_flags_numeric): New. >>> * varasm.c (default_asm_elf_flags_numeric): New. >>> (default_elf_asm_named_section): Use new target hook. >>> * config/arm/arm.opt (mpure-code): New. >>> * config/arm/arm.h (SECTION_ARM_PURECODE): New. >>> * config/arm/arm.c (arm_asm_init_sections): Add section >>> attribute to default text section if -mpure-code. >>> (arm_option_check_internal): Diagnose use of option with >>> non supported targets and/or options. >>> (arm_asm_elf_flags_numeric): New. >>> (arm_function_section): New. >>> (arm_elf_section_type_flags): New. >>> * config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Disable >>> for -mpure-code. >>> * gcc/doc/texi (TARGET_ASM_ELF_FLAGS_NUMERIC): New. >>> * gcc/doc/texi.in (TARGET_ASM_ELF_FLAGS_NUMERIC): Likewise. >>> >>> >>> >>> gcc/testsuite/ChangeLog: >>> 2016-06-30 Andre Vieira >>> Terry Guo >>> >>> * gcc.target/arm/pure-code/ffunction-sections.c: New. >>> * gcc.target/arm/pure-code/no-literal-pool.c: New. >>> * gcc.target/arm/pure-code/pure-code.exp: New. >>> >> > Hi Sandra, Mickael, > > Thank you for your comments. I changed the description of -mpure-code in > invoke.texi to better reflect the error message you get wrt supported > targets. > > As for the target hook description, I hope the text is clearer now. Let > me know if you think it needs further explanation. > > I also fixed the double '%' in the text string for unnamed text sections > and disabled the casesi pattern. 
> > I duplicated the original casesi test > 'gcc/testsuite/gcc.c-torture/compile/pr46934.c' for pure-code to make > sure the casesi was disabled and other patterns were selected instead. > > Reran regressions for pure-code.exp for Cortex-M3. > > Cheers, > Andre > > > gcc/ChangeLog: > 2016-07-11 Andre Vieira > Terry Guo > > * target.def (elf_flags_numeric): New target hook. > * hooks.c (hook_uint_uintp_false): New generic hook. > * varasm.c (default_elf_asm_named_section): Use new target hook. > * config/arm/arm.opt (mpure-code): New. > * config/arm/arm.h (SECTION_ARM_PURECODE): New. > * config/arm/arm.c (arm_asm_init_sections): Add section > attribute to default text section if -mpure-code. > (arm_option_check_internal): Diagnose use of option with > non supported targets and/or options. > (arm_asm_elf_flags_numeric): New. > (arm_function_section): New. > (arm_elf_section_type_flags): New. > * config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Disable > for -mpure-code. > * config/arm/arm.md (casesi): Disable pattern for > -mpure-code. > * gcc/doc/texi (TARGET_ASM_ELF_FLAGS_NUMERIC): New. > * gcc/doc/texi.in (TARGET_ASM_ELF_FLAGS_NUMERIC): Likewise. >
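For illustration (a hypothetical example, not taken from the series): the kind of C input that normally goes through the casesi pattern and a PC-relative address table, and that the new pure-code tests derived from pr46934.c therefore have to cover. A dense switch such as the one below would ordinarily be expanded through casesi; with -mpure-code it must be expanded through other patterns instead, since the address table would live in .rodata and be unreachable via 'adr rx, .Lx'.

int
classify (int i)
{
  switch (i)
    {
    case 0: return 12;
    case 1: return 27;
    case 2: return 3;
    case 3: return 41;
    case 4: return 9;
    case 5: return 60;
    case 6: return 18;
    case 7: return 33;
    default: return -1;
    }
}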
[Fortran, Patch, pr70524, v1] [5/6/7 Regression] ICE when using -frepack-arrays -Warray-temporaries
Hi all, the attached patch fixes the ICE when the options in the title are used. Two issues caused this ICE: 1. the error-printing routines relied on the locus.nextc which was not set by the gfc_set_backend_locus() and is now set by the patch (locally, not in gfc_set_backend_locus()). 2. the locus of the function whose result triggered the warning (and with it the ICE) was set too late. Both issues are addressed by the patch. Albeit I am not quite sure, whether my solution to 1. (the first chunk in the patch) is ok this way. Bootstraps and regtests ok on x86_64-linux-gnu/F23. Ok for trunk? And with one week delay for gcc-5- and -6-branch? Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de pr70524_1.clog Description: Binary data diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c index e95c8dd..d52cd11 100644 --- a/gcc/fortran/trans-array.c +++ b/gcc/fortran/trans-array.c @@ -6103,7 +6103,12 @@ gfc_trans_dummy_array_bias (gfc_symbol * sym, tree tmpdesc, return; } + loc.nextc = NULL; gfc_save_backend_locus (&loc); + /* loc.nextc is not set by save_backend_locus but the location routines + depend on it. */ + if (loc.nextc == NULL) +loc.nextc = loc.lb->line; gfc_set_backend_locus (&sym->declared_at); /* Descriptor type. */ diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c index 05dfcb4..2a34a4c 100644 --- a/gcc/fortran/trans-decl.c +++ b/gcc/fortran/trans-decl.c @@ -4087,6 +4087,8 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block) else if (proc_sym->as) { tree result = TREE_VALUE (current_fake_result_decl); + gfc_save_backend_locus (&loc); + gfc_set_backend_locus (&proc_sym->declared_at); gfc_trans_dummy_array_bias (proc_sym, result, block); /* An automatic character length, pointer array result. */ @@ -4096,8 +4098,6 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block) tmp = NULL; if (proc_sym->ts.deferred) { - gfc_save_backend_locus (&loc); - gfc_set_backend_locus (&proc_sym->declared_at); gfc_start_block (&init); tmp = gfc_null_and_pass_deferred_len (proc_sym, &init, &loc); gfc_add_init_cleanup (block, gfc_finish_block (&init), tmp); diff --git a/gcc/testsuite/gfortran.dg/dependency_48.f90 b/gcc/testsuite/gfortran.dg/dependency_48.f90 new file mode 100644 index 000..6470019 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/dependency_48.f90 @@ -0,0 +1,26 @@ +! { dg-do compile } +! { dg-options "-frepack-arrays -Warray-temporaries -O" } + +! Same as dependency_35 but with repack-arrays + +module foo + implicit none +contains + pure function bar(i,j) ! { dg-warning "Creating array temporary at \\(1\\)" } +integer, intent(in) :: i,j +integer, dimension(2,2) :: bar +bar = 33 + end function bar +end module foo + +program main + use foo + implicit none + integer a(2,2), b(2,2),c(2,2), d(2,2), e(2) + + read (*,*) b, c, d + a = matmul(b,c) + d + a = b + bar(3,4) + a = bar(3,4)*5 + b + e = sum(b,1) + 3 +end program main
Re: [AArch64][2/14] ARMv8.2-A FP16 one operand vector intrinsics
On Wed, Jul 20, 2016 at 06:00:34PM +0100, Jiong Wang wrote: > On 07/07/16 17:14, Jiong Wang wrote: > >This patch add ARMv8.2-A FP16 one operand vector intrinsics. > > > >We introduced new mode iterators to cover HF modes, qualified patterns > >which was using old mode iterators are switched to new ones. > > > >We can't simply extend old iterator like VDQF to conver HF modes, > >because not all patterns using VDQF are with new FP16 support, thus we > >introduced new, temperary iterators, and only apply new iterators on > >those patterns which do have FP16 supports. > > I noticed the patchset at > > https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00308.html > > has some modifications on the standard name "div" and "sqrt", thus there > are minor conflicts as this patch touch "sqrt" as well. > > This patch resolve the conflict and the change is to let > aarch64_emit_approx_sqrt simply return false for V4HFmode and V8HFmode. This is OK for trunk, with one modification... > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index > 58a9d695c0ef9e6e1d67030580428699aba05be4..5ed633542efe58763d68fd9bfbb478ae6ef569c3 > 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -7598,6 +7598,10 @@ bool > aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp) > { >machine_mode mode = GET_MODE (dst); > + > + if (mode == V4HFmode || mode == V8HFmode) > +return false; > + Given that you don't plan to handle any HFmode modes here, I'd prefer: if (GET_MODE_INNER (mode) == HFmode) return false; That'll save you from updating this again in patch 7/14. Otherwise, this is OK. Thanks, James > gcc/ > 2016-07-20 Jiong Wang > > * config/aarch64/aarch64-builtins.c (TYPES_BINOP_USS): New. > * config/aarch64/aarch64-simd-builtins.def: Register new builtins. > * config/aarch64/aarch64-simd.md (aarch64_rsqrte): Extend to HF > modes. > (neg2): Likewise. > (abs2): Likewise. > (2): Likewise. > (l2): Likewise. > (2): Likewise. > (2): Likewise. > (ftrunc2): Likewise. > (2): Likewise. > (sqrt2): Likewise. > (*sqrt2): Likewise. > (aarch64_frecpe): Likewise. > (aarch64_cm): Likewise. > * config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Return > false for V4HF and V8HF. > * config/aarch64/iterators.md (VHSDF, VHSDF_DF, VHSDF_SDF): New. > (VDQF_COND, fcvt_target, FCVT_TARGET, hcon): Extend mode attribute to > HF modes. > (stype): New. > * config/aarch64/arm_neon.h (vdup_n_f16): New. > (vdupq_n_f16): Likewise. > (vld1_dup_f16): Use vdup_n_f16. > (vld1q_dup_f16): Use vdupq_n_f16. > (vabs_f16): New. > (vabsq_f16): Likewise. > (vceqz_f16): Likewise. > (vceqzq_f16): Likewise. > (vcgez_f16): Likewise. > (vcgezq_f16): Likewise. > (vcgtz_f16): Likewise. > (vcgtzq_f16): Likewise. > (vclez_f16): Likewise. > (vclezq_f16): Likewise. > (vcltz_f16): Likewise. > (vcltzq_f16): Likewise. > (vcvt_f16_s16): Likewise. > (vcvtq_f16_s16): Likewise. > (vcvt_f16_u16): Likewise. > (vcvtq_f16_u16): Likewise. > (vcvt_s16_f16): Likewise. > (vcvtq_s16_f16): Likewise. > (vcvt_u16_f16): Likewise. > (vcvtq_u16_f16): Likewise. > (vcvta_s16_f16): Likewise. > (vcvtaq_s16_f16): Likewise. > (vcvta_u16_f16): Likewise. > (vcvtaq_u16_f16): Likewise. > (vcvtm_s16_f16): Likewise. > (vcvtmq_s16_f16): Likewise. > (vcvtm_u16_f16): Likewise. > (vcvtmq_u16_f16): Likewise. > (vcvtn_s16_f16): Likewise. > (vcvtnq_s16_f16): Likewise. > (vcvtn_u16_f16): Likewise. > (vcvtnq_u16_f16): Likewise. > (vcvtp_s16_f16): Likewise. > (vcvtpq_s16_f16): Likewise. > (vcvtp_u16_f16): Likewise. > (vcvtpq_u16_f16): Likewise. 
> (vneg_f16): Likewise. > (vnegq_f16): Likewise. > (vrecpe_f16): Likewise. > (vrecpeq_f16): Likewise. > (vrnd_f16): Likewise. > (vrndq_f16): Likewise. > (vrnda_f16): Likewise. > (vrndaq_f16): Likewise. > (vrndi_f16): Likewise. > (vrndiq_f16): Likewise. > (vrndm_f16): Likewise. > (vrndmq_f16): Likewise. > (vrndn_f16): Likewise. > (vrndnq_f16): Likewise. > (vrndp_f16): Likewise. > (vrndpq_f16): Likewise. > (vrndx_f16): Likewise. > (vrndxq_f16): Likewise. > (vrsqrte_f16): Likewise. > (vrsqrteq_f16): Likewise. > (vsqrt_f16): Likewise. > (vsqrtq_f16): Likewise. >
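For illustration, a minimal user-level sketch of the new one-operand vector intrinsics (hypothetical example, not part of the patch; assumes a toolchain with this series applied and ARMv8.2-A FP16 enabled, e.g. -march=armv8.2-a+fp16):

#include <arm_neon.h>

/* Uses only intrinsics added by this patch: vabsq_f16, vrndnq_f16,
   vcvtq_s16_f16 and vcvtq_f16_s16.  */
float16x8_t
round_abs (float16x8_t x)
{
  float16x8_t a = vabsq_f16 (x);      /* absolute value */
  float16x8_t r = vrndnq_f16 (a);     /* round to nearest, ties to even */
  int16x8_t i = vcvtq_s16_f16 (r);    /* to signed 16-bit integers */
  return vcvtq_f16_s16 (i);           /* and back to half precision */
}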
Re: [AArch64][3/14] ARMv8.2-A FP16 two operands vector intrinsics
On Wed, Jul 20, 2016 at 06:00:46PM +0100, Jiong Wang wrote: > On 07/07/16 17:15, Jiong Wang wrote: > >This patch add ARMv8.2-A FP16 two operands vector intrinsics. > > The updated patch resolve the conflict with > >https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00309.html > > The change is to let aarch64_emit_approx_div return false for > V4HFmode and V8HFmode. As with patch 2/14, please rewrite this hunk: > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index > 5ed633542efe58763d68fd9bfbb478ae6ef569c3..a7437c04eb936a5e3ebd0bc77eb4afd8c052df28 > 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -7717,6 +7717,10 @@ bool > aarch64_emit_approx_div (rtx quo, rtx num, rtx den) > { >machine_mode mode = GET_MODE (quo); > + > + if (mode == V4HFmode || mode == V8HFmode) > +return false; > + To: if (GET_MODE_INNER (mode) == HFmode) return false; Otherwise, this patch is OK for trunk. Thanks, James > gcc/ > 2016-07-20 Jiong Wang > > * config/aarch64/aarch64-simd-builtins.def: Register new builtins. > * config/aarch64/aarch64-simd.md > (aarch64_rsqrts): Extend to HF modes. > (fabd3): Likewise. > (3): Likewise. > (3): Likewise. > (aarch64_p): Likewise. > (3): Likewise. > (3): Likewise. > (3): Likewise. > (aarch64_faddp): Likewise. > (aarch64_fmulx): Likewise. > (aarch64_frecps): Likewise. > (*aarch64_fac): Rename to aarch64_fac. > (add3): Extend to HF modes. > (sub3): Likewise. > (mul3): Likewise. > (div3): Likewise. > (*div3): Likewise. > * config/aarch64/aarch64.c (aarch64_emit_approx_div): Return > false for V4HF and V8HF. > * config/aarch64/iterators.md (VDQ_HSDI, VSDQ_HSDI): New mode > iterator. > * config/aarch64/arm_neon.h (vadd_f16): Likewise. > (vaddq_f16): Likewise. > (vabd_f16): Likewise. > (vabdq_f16): Likewise. > (vcage_f16): Likewise. > (vcageq_f16): Likewise. > (vcagt_f16): Likewise. > (vcagtq_f16): Likewise. > (vcale_f16): Likewise. > (vcaleq_f16): Likewise. > (vcalt_f16): Likewise. > (vcaltq_f16): Likewise. > (vceq_f16): Likewise. > (vceqq_f16): Likewise. > (vcge_f16): Likewise. > (vcgeq_f16): Likewise. > (vcgt_f16): Likewise. > (vcgtq_f16): Likewise. > (vcle_f16): Likewise. > (vcleq_f16): Likewise. > (vclt_f16): Likewise. > (vcltq_f16): Likewise. > (vcvt_n_f16_s16): Likewise. > (vcvtq_n_f16_s16): Likewise. > (vcvt_n_f16_u16): Likewise. > (vcvtq_n_f16_u16): Likewise. > (vcvt_n_s16_f16): Likewise. > (vcvtq_n_s16_f16): Likewise. > (vcvt_n_u16_f16): Likewise. > (vcvtq_n_u16_f16): Likewise. > (vdiv_f16): Likewise. > (vdivq_f16): Likewise. > (vdup_lane_f16): Likewise. > (vdup_laneq_f16): Likewise. > (vdupq_lane_f16): Likewise. > (vdupq_laneq_f16): Likewise. > (vdups_lane_f16): Likewise. > (vdups_laneq_f16): Likewise. > (vmax_f16): Likewise. > (vmaxq_f16): Likewise. > (vmaxnm_f16): Likewise. > (vmaxnmq_f16): Likewise. > (vmin_f16): Likewise. > (vminq_f16): Likewise. > (vminnm_f16): Likewise. > (vminnmq_f16): Likewise. > (vmul_f16): Likewise. > (vmulq_f16): Likewise. > (vmulx_f16): Likewise. > (vmulxq_f16): Likewise. > (vpadd_f16): Likewise. > (vpaddq_f16): Likewise. > (vpmax_f16): Likewise. > (vpmaxq_f16): Likewise. > (vpmaxnm_f16): Likewise. > (vpmaxnmq_f16): Likewise. > (vpmin_f16): Likewise. > (vpminq_f16): Likewise. > (vpminnm_f16): Likewise. > (vpminnmq_f16): Likewise. > (vrecps_f16): Likewise. > (vrecpsq_f16): Likewise. > (vrsqrts_f16): Likewise. > (vrsqrtsq_f16): Likewise. > (vsub_f16): Likewise. > (vsubq_f16): Likewise.
Re: [PATCH] correct atomic_compare_exchange_n return type (c++/71675)
Hi Martin, I observed the following error: ERROR: gcc.dg/atomic/pr71675.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects : syntax error in target selector "target c11" for " dg-do 3 compile { target c11 } " It seems we don't have a c11 effective target check available in dejagnu target-supports.exp. Thanks, Renlin diff --git a/gcc/testsuite/gcc.dg/atomic/pr71675.c b/gcc/testsuite/gcc.dg/atomic/pr71675.c new file mode 100644 index 000..0e344ac --- /dev/null +++ b/gcc/testsuite/gcc.dg/atomic/pr71675.c @@ -0,0 +1,32 @@ +/* PR c++/71675 - __atomic_compare_exchange_n returns wrong type for typed enum + */ +/* { dg-do compile { target c11 } } */
Re: [AArch64][4/14] ARMv8.2-A FP16 three operands vector intrinsics
On Thu, Jul 07, 2016 at 05:16:01PM +0100, Jiong Wang wrote: > This patch add ARMv8.2-A FP16 three operands vector intrinsics. > > Three operands intrinsics only contain fma and fms. OK. Thanks, James > > 2016-07-07 Jiong Wang > > gcc/ > * config/aarch64/aarch64-simd-builtins.def: Register new builtins. > * config/aarch64/aarch64-simd.md (fma4): Extend to HF modes. > (fnma4): Likewise. > * config/aarch64/arm_neon.h (vfma_f16): New. > (vfmaq_f16): Likewise. > (vfms_f16): Likewise. > (vfmsq_f16): Likewise. >
[PATCH] Don't call get_working_sets w/ LTO and -fauto-profile (PR gcov-profile/70993)
Hi. Currently, call to get_working_sets is only called from tree_profiling (called from 'pass_ipa_tree_profile' and is guarded in gate with !flag_auto_profile). I would like to apply the same logic in lto-cgraph.c. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Ready to be installed? Martin >From dcf13fb3ac8a4c9e85842f147665b527ce71234b Mon Sep 17 00:00:00 2001 From: marxin Date: Fri, 22 Jul 2016 13:07:40 +0200 Subject: [PATCH] Don't call get_working_sets w/ LTO and -fauto-profile (PR gcov-profile/70993) gcc/ChangeLog: 2016-07-22 Martin Liska * lto-cgraph.c (input_symtab): Don't call get_working_sets if flag_auto_profile is set to true. --- gcc/lto-cgraph.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c index 5cef2ba..2642041 100644 --- a/gcc/lto-cgraph.c +++ b/gcc/lto-cgraph.c @@ -1867,7 +1867,9 @@ input_symtab (void) } merge_profile_summaries (file_data_vec); - get_working_sets (); + + if (!flag_auto_profile) +get_working_sets (); /* Clear out the aux field that was used to store enough state to -- 2.9.0
Re: [AArch64][5/14] ARMv8.2-A FP16 lane vector intrinsics
On Thu, Jul 07, 2016 at 05:16:28PM +0100, Jiong Wang wrote: > This patch add ARMv8.2-A FP16 lane vector intrinsics. > > Lane intrinsics are generally derivatives of multiply intrinsics, > including multiply accumulate. All necessary backend support for them > are there already except fmulx, the implementions are largely a > combination of existed multiply intrinsics with vdup intrinsics OK. Thanks, James > 2016-07-07 Jiong Wang > > gcc/ > * config/aarch64/aarch64-simd.md > (*aarch64_mulx_elt_to_64v2df): Rename to > "*aarch64_mulx_elt_from_dup". > (*aarch64_mul3_elt): Update schedule type. > (*aarch64_mul3_elt_from_dup): Likewise. > (*aarch64_fma4_elt_from_dup): Likewise. > (*aarch64_fnma4_elt_from_dup): Likewise. > * config/aarch64/iterators.md (VMUL): Supprt half precision > float modes. > (f, fp): Support HF modes. > * config/aarch64/arm_neon.h (vfma_lane_f16): New. > (vfmaq_lane_f16): Likewise. > (vfma_laneq_f16): Likewise. > (vfmaq_laneq_f16): Likewise. > (vfma_n_f16): Likewise. > (vfmaq_n_f16): Likewise. > (vfms_lane_f16): Likewise. > (vfmsq_lane_f16): Likewise. > (vfms_laneq_f16): Likewise. > (vfmsq_laneq_f16): Likewise. > (vfms_n_f16): Likewise. > (vfmsq_n_f16): Likewise. > (vmul_lane_f16): Likewise. > (vmulq_lane_f16): Likewise. > (vmul_laneq_f16): Likewise. > (vmulq_laneq_f16): Likewise. > (vmul_n_f16): Likewise. > (vmulq_n_f16): Likewise. > (vmulx_lane_f16): Likewise. > (vmulxq_lane_f16): Likewise. > (vmulx_laneq_f16): Likewise. > (vmulxq_laneq_f16): Likewise. >
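A similar hypothetical sketch for the new lane forms (again not part of the patch, same toolchain assumptions as above):

#include <arm_neon.h>

/* acc + x * c[1], using the new vfmaq_lane_f16.  */
float16x8_t
fma_by_lane (float16x8_t acc, float16x8_t x, float16x4_t c)
{
  return vfmaq_lane_f16 (acc, x, c, 1);
}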
Re: [AArch64][6/14] ARMv8.2-A FP16 reduction vector intrinsics
On Thu, Jul 07, 2016 at 05:16:58PM +0100, Jiong Wang wrote: > This patch add ARMv8.2-A FP16 reduction vector intrinsics. OK. Thanks, James > gcc/ > 2016-07-07 Jiong Wang > > * config/aarch64/arm_neon.h (vmaxv_f16): New. > (vmaxvq_f16): Likewise. > (vminv_f16): Likewise. > (vminvq_f16): Likewise. > (vmaxnmv_f16): Likewise. > (vmaxnmvq_f16): Likewise. > (vminnmv_f16): Likewise. > (vminnmvq_f16): Likewise. > * config/aarch64/iterators.md (vp): Support HF modes.
Re: [PATCH 2/3] Run profile feedback tests with autofdo
On 07/15/2016 10:37 AM, Bin.Cheng wrote: > On Thu, Jul 14, 2016 at 11:33 PM, Andi Kleen wrote: > I haven't seen that. Unstable in what way? For GCC doesn't support FDO, it run below tests as you said: PASS: gcc.dg/tree-prof/20041218-1.c compilation, -g UNSUPPORTED: gcc.dg/tree-prof/20041218-1.c: Cannot run create_gcov --binary /data/work/trunk/build/gcc/testsuite/gcc/20041218-1.gcda Normally, it doesn't create gcov data file, thus the next test is unsupported. I guess in parallel test, the second test picks up gcov data files from other process, which results in random pass. Is it possible to not have these when fdo is supported? >>> Hmm, typo: s/supported/not supported/. >> >> I don't see how this can happen: as you can see the .gcda file name >> is unique for each test case. >> >> I think there may be problems with the perf.data, could add a postfix. >> But you should only see that with real autofdo. > Don't know. I only configured/built GCC on x86_64 with below command lines: > > $ ../gcc/configure --prefix=... > --enable-languages=c,c++,fortran,go,java,lto,objc,obj-c++ > --with-build-config=bootstrap-O3 --disable-werror > $ make && make install > > Not sure what the status for autofdo is in this case. "make check -k" > is stable for me, but "make check -k -j#" gives unstable result in > tree-prof.exp tests. Anything I did wrong? > > Thanks, > bin > I can confirm that I see the same problem on a Xeon machine (../configure && make && make check -k -jX). I see many: g++.dg/tree-prof/pr35545.C: Cannot run create_gcov --binary /home/marxin/Programming/gcc/objdir/gcc/testsuite/g++/pr35545.x01 --profile=perf.data -gcov_version=1 --gcov=/home/marxin/Programming/gcc/objdir/gcc/testsuite/g++/pr35545.gcda Thanks, Martin
Re: [AArch64][7/14] ARMv8.2-A FP16 one operand scalar intrinsics
On Wed, Jul 20, 2016 at 06:00:53PM +0100, Jiong Wang wrote: > On 07/07/16 17:17, Jiong Wang wrote: > >This patch add ARMv8.2-A FP16 one operand scalar intrinsics > > > >Scalar intrinsics are kept in arm_fp16.h instead of arm_neon.h. > > The updated patch resolve the conflict with > >https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00308.html > > The change is to let aarch64_emit_approx_sqrt return false for HFmode. OK, but... > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index > a7437c04eb936a5e3ebd0bc77eb4afd8c052df28..27866ccd605abec6ea7c9110022f329c9b172ee0 > 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -7599,7 +7599,7 @@ aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp) > { >machine_mode mode = GET_MODE (dst); > > - if (mode == V4HFmode || mode == V8HFmode) > + if (mode == HFmode || mode == V4HFmode || mode == V8HFmode) > return false; ...if you take my advice on patch 2/14, you won't need this change. Otherwise, OK. Thanks, James > gcc/ > 2016-07-20 Jiong Wang > > * config.gcc (aarch64*-*-*): Install arm_fp16.h. > * config/aarch64/aarch64-builtins.c (hi_UP): New. > * config/aarch64/aarch64-simd-builtins.def: Register new builtins. > * config/aarch64/aarch64-simd.md (aarch64_frsqrte): Extend to > HF mode. > (aarch64_frecp): Likewise. > (aarch64_cm): Likewise. > * config/aarch64/aarch64.md (2): Likewise. > (l2): Likewise. > (fix_trunc2): Likewise. > (sqrt2): Likewise. > (*sqrt2): Likewise. > (abs2): Likewise. > (hf2): New pattern for HF mode. > (hihf2): Likewise. > * config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Return > for HF mode. > * config/aarch64/arm_neon.h: Include arm_fp16.h. > * config/aarch64/iterators.md (GPF_F16): New. > (GPI_F16): Likewise. > (VHSDF_HSDF): Likewise. > (w1): Support HF mode. > (w2): Likewise. > (v): Likewise. > (s): Likewise. > (q): Likewise. > (Vmtype): Likewise. > (V_cmp_result): Likewise. > (fcvt_iesize): Likewise. > (FCVT_IESIZE): Likewise. > * config/aarch64/arm_fp16.h: New file. > (vabsh_f16): New. > (vceqzh_f16): Likewise. > (vcgezh_f16): Likewise. > (vcgtzh_f16): Likewise. > (vclezh_f16): Likewise. > (vcltzh_f16): Likewise. > (vcvth_f16_s16): Likewise. > (vcvth_f16_s32): Likewise. > (vcvth_f16_s64): Likewise. > (vcvth_f16_u16): Likewise. > (vcvth_f16_u32): Likewise. > (vcvth_f16_u64): Likewise. > (vcvth_s16_f16): Likewise. > (vcvth_s32_f16): Likewise. > (vcvth_s64_f16): Likewise. > (vcvth_u16_f16): Likewise. > (vcvth_u32_f16): Likewise. > (vcvth_u64_f16): Likewise. > (vcvtah_s16_f16): Likewise. > (vcvtah_s32_f16): Likewise. > (vcvtah_s64_f16): Likewise. > (vcvtah_u16_f16): Likewise. > (vcvtah_u32_f16): Likewise. > (vcvtah_u64_f16): Likewise. > (vcvtmh_s16_f16): Likewise. > (vcvtmh_s32_f16): Likewise. > (vcvtmh_s64_f16): Likewise. > (vcvtmh_u16_f16): Likewise. > (vcvtmh_u32_f16): Likewise. > (vcvtmh_u64_f16): Likewise. > (vcvtnh_s16_f16): Likewise. > (vcvtnh_s32_f16): Likewise. > (vcvtnh_s64_f16): Likewise. > (vcvtnh_u16_f16): Likewise. > (vcvtnh_u32_f16): Likewise. > (vcvtnh_u64_f16): Likewise. > (vcvtph_s16_f16): Likewise. > (vcvtph_s32_f16): Likewise. > (vcvtph_s64_f16): Likewise. > (vcvtph_u16_f16): Likewise. > (vcvtph_u32_f16): Likewise. > (vcvtph_u64_f16): Likewise. > (vnegh_f16): Likewise. > (vrecpeh_f16): Likewise. > (vrecpxh_f16): Likewise. > (vrndh_f16): Likewise. > (vrndah_f16): Likewise. > (vrndih_f16): Likewise. > (vrndmh_f16): Likewise. > (vrndnh_f16): Likewise. > (vrndph_f16): Likewise. > (vrndxh_f16): Likewise. > (vrsqrteh_f16): Likewise. 
> (vsqrth_f16): Likewise.
Re: [AArch64][8/14] ARMv8.2-A FP16 two operands scalar intrinsics
On Wed, Jul 20, 2016 at 06:00:58PM +0100, Jiong Wang wrote: > On 07/07/16 17:17, Jiong Wang wrote: > >This patch add ARMv8.2-A FP16 two operands scalar intrinsics. > > The updated patch resolve the conflict with > >https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00309.html > > The change is to let aarch64_emit_approx_div return false for HFmode. If you take the change I proposed on patch 3/14 you won't need this. OK otherwise. Thanks, James > gcc/ > 2016-07-20 Jiong Wang > > * config/aarch64/aarch64-simd-builtins.def: Register new builtins. > * config/aarch64/aarch64.md > (hf3): New. > (hf3): Likewise. > (add3): Likewise. > (sub3): Likewise. > (mul3): Likewise. > (div3): Likewise. > (*div3): Likewise. > (3): Extend to HF. > * config/aarch64/aarch64.c (aarch64_emit_approx_div): Return > false for HFmode. > * config/aarch64/aarch64-simd.md (aarch64_rsqrts): Likewise. > (fabd3): Likewise. > (3): Likewise. > (3): Likewise. > (aarch64_fmulx): Likewise. > (aarch64_fac): Likewise. > (aarch64_frecps): Likewise. > (hfhi3): New. > (hihf3): Likewise. > * config/aarch64/iterators.md (VHSDF_SDF): Delete. > (VSDQ_HSDI): Support HI. > (fcvt_target, FCVT_TARGET): Likewise. > * config/aarch64/arm_fp16.h: (vaddh_f16): New. > (vsubh_f16): Likewise. > (vabdh_f16): Likewise. > (vcageh_f16): Likewise. > (vcagth_f16): Likewise. > (vcaleh_f16): Likewise. > (vcalth_f16): Likewise.(vcleh_f16): Likewise. > (vclth_f16): Likewise. > (vcvth_n_f16_s16): Likewise. > (vcvth_n_f16_s32): Likewise. > (vcvth_n_f16_s64): Likewise. > (vcvth_n_f16_u16): Likewise. > (vcvth_n_f16_u32): Likewise. > (vcvth_n_f16_u64): Likewise. > (vcvth_n_s16_f16): Likewise. > (vcvth_n_s32_f16): Likewise. > (vcvth_n_s64_f16): Likewise. > (vcvth_n_u16_f16): Likewise. > (vcvth_n_u32_f16): Likewise. > (vcvth_n_u64_f16): Likewise. > (vdivh_f16): Likewise. > (vmaxh_f16): Likewise. > (vmaxnmh_f16): Likewise. > (vminh_f16): Likewise. > (vminnmh_f16): Likewise. > (vmulh_f16): Likewise. > (vmulxh_f16): Likewise. > (vrecpsh_f16): Likewise. > (vrsqrtsh_f16): Likewise. >
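And a hypothetical sketch of the scalar forms from arm_fp16.h (not part of the patch; it combines intrinsics from patches 7/14 and 8/14 and assumes ARMv8.2-A FP16 support is enabled):

#include <arm_fp16.h>

float16_t
scale_plus_abs (float16_t a, float16_t b)
{
  float16_t p = vmulh_f16 (a, b);       /* scalar half-precision multiply */
  float16_t s = vaddh_f16 (p, vabsh_f16 (b));
  return vrndnh_f16 (s);                /* round to nearest, ties to even */
}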
Re: C++ PATCH for c++/71913 (copy elision choices)
Hi Jason, On 22/07/16 04:01, Jason Merrill wrote: 71913 is a case where unsafe_copy_elision_p was being too conservative. We can allow copy elision in a new expression; the only way we could end up initializing a base subobject without knowing it would be through a placement new, in which case we would already be using the wrong (complete object) constructor, so copy elision doesn't make it any worse. diff --git a/gcc/testsuite/g++.dg/init/elide5.C b/gcc/testsuite/g++.dg/init/elide5.C new file mode 100644 index 000..0a9978c --- /dev/null +++ b/gcc/testsuite/g++.dg/init/elide5.C @@ -0,0 +1,27 @@ +// PR c++/71913 +// { dg-do link { target c++11 } } + +void* operator new(unsigned long, void* p) { return p; } g++.dg/init/elide5.C fails on targets whose SIZE_TYPE is not "long unsigned int". testsuite/g++.dg/init/elide5.C:4:42: error: 'operator new' takes type 'size_t' ('unsigned int') as first parameter [-fpermissive] I have checked: for most 32-bit architectures or ABIs, the SIZE_TYPE is "unsigned int", and arm is one of them. To make this test case portable, would __SIZE_TYPE__ be better in this case, instead of "unsigned long", as the first argument of the new operator? (sorry for the duplicate reply in the bugzilla, I just found the email here) Regards, Renlin
Re: [AArch64][9/14] ARMv8.2-A FP16 three operands scalar intrinsics
On Thu, Jul 07, 2016 at 05:18:15PM +0100, Jiong Wang wrote: > This patch add ARMv8.2-A FP16 three operands scalar intrinsics. OK. Thanks, James > gcc/ > 2016-07-07 Jiong Wang > > * config/aarch64/aarch64-simd-builtins.def: Register new builtins. > * config/aarch64/aarch64.md (fma): New for HF. > (fnma): Likewise. > * config/aarch64/arm_fp16.h (vfmah_f16): New. > (vfmsh_f16): Likewise.
Re: [AArch64][10/14] ARMv8.2-A FP16 lane scalar intrinsics
On Thu, Jul 07, 2016 at 05:18:29PM +0100, Jiong Wang wrote: > This patch adds ARMv8.2-A FP16 lane scalar intrinsics. OK. Thanks, James > > gcc/ > 2016-07-07 Jiong Wang > > * config/aarch64/arm_neon.h (vfmah_lane_f16): New. > (vfmah_laneq_f16): Likewise. > (vfmsh_lane_f16): Likewise. > (vfmsh_laneq_f16): Likewise. > (vmulh_lane_f16): Likewise. > (vmulh_laneq_f16): Likewise. > (vmulxh_lane_f16): Likewise. > (vmulxh_laneq_f16): Likewise. >
Re: [RFC][IPA-VRP] Early VRP Implementation
On Fri, Jul 22, 2016 at 2:10 PM, kugan wrote: > Hi Richard, > > Thanks for the review. > > On 18/07/16 21:51, Richard Biener wrote: >> >> On Fri, Jul 15, 2016 at 9:33 AM, kugan >> wrote: >>> >>> Hi Andrew, >>> >>> On 15/07/16 17:28, Andrew Pinski wrote: On Fri, Jul 15, 2016 at 12:08 AM, kugan wrote: > > > Hi Andrew, > >> Why separate out early VRP from tree-vrp? Just a little curious. > > > > > It is based on the discussion in > https://gcc.gnu.org/ml/gcc/2016-01/msg00069.html. > In summary, conclusion (based on my understanding) was to implement a > simplified VRP algorithm that doesn't require ASSERT_EXPR insertion. But I don't see why you are moving it from tree-vrp.c . That was my question, you pointing to that discussion does not say to split it into a new file and expose these interfaces. >>> >>> Are you saying that I should keep this part of tree-vrp.c. I am happy to >>> do >>> that if this is considered the best approach. >> >> >> Yes, I think that's the best approach. >> > Thanks. Moved it into tree-vrp.c now. > >> Can you, as a refactoring before your patch, please change VRP to use >> an alloc-pool >> for allocating value_range? The DOM-based VRP will add a lot of >> malloc/free churn >> otherwise. >> >> Generally watch coding-style such as missed function comments. >> >> As you do a non-iterating VRP and thus do not visit back-edges you don't >> need >> to initialize loops or SCEV nor do you need loop-closed SSA. >> >> As you do a DOM-based VRP using SSA propagator stuff like ssa_prop_result >> doesn't make any sense. > > > I have removed it. >> >> >> +edge evrp_dom_walker::before_dom_children (basic_block bb) >> +{ >> + /* If we are going out of scope, restore the old VR. */ >> + while (!cond_stack.is_empty () >> +&& !dominated_by_p (CDI_DOMINATORS, bb, cond_stack.last >> ().first)) >> +{ >> + tree var = cond_stack.last ().second.first; >> + value_range *vr = cond_stack.last ().second.second; >> + value_range *vr_to_del = get_value_range (var); >> + XDELETE (vr_to_del); >> + change_value_range (var, vr); >> + cond_stack.pop (); >> +} >> >> that should be in after_dom_children I think and use a marker instead >> of a DOM query. >> See other examples like DOM itself or SCCVN. >> > > Done. +/* Restore/Pop all the old VRs maintained in the cond_stack. */ + +void evrp_dom_walker::finalize_dom_walker () +{ + while (!cond_stack.is_empty ()) +{ + tree var = cond_stack.last ().second; + pop_value_range (var); + cond_stack.pop (); +} why is this necessary at all? Looks like a waste of time (and the stack should be empty here anyway). I think you leak cond_stack as well, I suppose using auto_vec might work here, you have to check. >> >> + /* Discover VR when condition is true. */ >> + if (te == e >> + && !TREE_OVERFLOW_P (op0) >> + && !TREE_OVERFLOW_P (op1)) > > > This is mainly to handle gcc.dg/pr38934.c. This would result in ICE from > set_value_range otherwise: > > gcc_assert ((!TREE_OVERFLOW_P (min) || is_overflow_infinity (min)) > && (!TREE_OVERFLOW_P (max) || is_overflow_infinity > (max))); Ok, instead make sure to remove the overflow flag via if (TREE_OVERFLOW_P (op0)) op0 = drop_tree_overflow (op0); ... it looks like you want to merge the if ( & EDGE_TRUE_VALUE) and FALSE_VALUE cases and only invert the tree comparison for the false edge. + tree cond = build2 (code, boolean_type_node, op0, op1); + extract_range_for_var_from_comparison_expr (&vr, op0, cond); no wasteful tree building here please. Instead of cond pass in code, op0 and op1 as separate arguments. 
+ /* If we found any usable VR, set the VR to ssa_name and create a +PUSH old value in the cond_stack with the old VR. */ + if (vr.type == VR_RANGE || vr.type == VR_ANTI_RANGE) + { + value_range *new_vr = vrp_value_range_pool.allocate (); + memset (new_vr, 0, sizeof (*new_vr)); + *new_vr = vr; + cond_stack.safe_push (std::make_pair (bb, op0)); the memset looks redundant given you copy-assign from vr anyway. You seem to miss the chance to register ranges for x_2 in i_1 < x_2 where both i_1 and x_2 might have ranges that are useful. push and pop_value_range should be methods in the dom walker class and vrp_stack and cond_stack should be a single stack. You can omit basic_block if you use a "magic" NULL_TREE var as marker (simply push a NULL_TREE, NULL pair at the end of before_dom_children). >> >> you can use e->flags & EDGE_TRUE_VALUE/EDGE_FALSE_VALUE >> >> why do you need those TREE_OVERFLOW checks? >> >> + tree cond = build2 (code, boolean_type_node, op0, op1); >> + tree a = build2 (ASSERT_EXPR, TREE_TYPE (op0), op0, cond); >> + ex
Re: [PATCH] Don't call get_working_sets w/ LTO and -fauto-profile (PR gcov-profile/70993)
On Mon, Jul 25, 2016 at 1:05 PM, Martin Liška wrote: > Hi. > > Currently, call to get_working_sets is only called from tree_profiling > (called from 'pass_ipa_tree_profile' and is guarded in gate with > !flag_auto_profile). > I would like to apply the same logic in lto-cgraph.c. > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > > Ready to be installed? Ok. Richard. > Martin
Re: [AArch64][0/14] ARMv8.2-A FP16 extension support
On Thu, Jul 07, 2016 at 05:12:48PM +0100, Jiong Wang wrote: > Hello, > > As a follow up of > > https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01240.html, > > This patch set adds ARMv8.2-A FP16 scalar and vector intrinsics support; > the GCC middle-end will also be made aware of some standard operations so that some > instructions can be auto-generated. > > According to ACLE, ARMv8.2-A FP16 intrinsics for AArch64 are a superset of > the intrinsics for AArch32, so all those intrinsic-related testcases, > particularly those under the directory advsimd-intrinsics, are also > applicable to AArch64. This patch set has only included those testcases > that are exclusive for AArch64. > > Jiong Wang (14) > ARMv8.2-A FP16 data processing intrinsics > ARMv8.2-A FP16 one operand vector intrinsics > ARMv8.2-A FP16 two operands vector intrinsics > ARMv8.2-A FP16 three operands vector intrinsics > ARMv8.2-A FP16 lane vector intrinsics > ARMv8.2-A FP16 reduction vector intrinsics > ARMv8.2-A FP16 one operand scalar intrinsics > ARMv8.2-A FP16 two operands scalar intrinsics > ARMv8.2-A FP16 three operands scalar intrinsics > ARMv8.2-A FP16 lane scalar intrinsics At this point, I've OKed the first 10 patches in the series; these represent the functional changes to the compiler. I'm leaving the testsuite patches for now, as they depend on testsuite changes that have yet to be approved for the ARM port. To save you from having to continue to rebase the functional parts of this patch while you wait for review of the ARM changes, I would be OK with you committing them now, on the understanding that you'll continue to check the testsuite in the time between now and when the testsuite changes are approved, and that you'll fix any issues that you find. > ARMv8.2-A FP16 testsuite selector > ARMv8.2-A testsuite for new data movement intrinsics > ARMv8.2-A testsuite for new vector intrinsics > ARMv8.2-A testsuite for new scalar intrinsics I've taken a brief look through these testsuite changes and they look OK to me. I'll revisit them properly once I've seen the ARM patches go in. Thanks, James
[PATCH] Remove special streaming of builtins
So I needed to fix that builtins appearing in BLOCK_VARs and the solution I came up with accidentially disabled streaming via the special path. Thus the following patch removes the special-casing completely and makes the BLOCK_VARs handling work the same way as for regular externs (by streaming a local copy). We stream each builtin decl once and then refer to it via the decl index (which is cheaper than the special casing). I'm not 100% this solves for example the -fno-math-errno inlining across TUs (it certainly doesn't if you use attribute optimize with -fno-math-errno), but it eventually should by means of having two different BUILT_IN_XXX if they have different behavior. At least if all relevant bits are set on the function _type_ rather than the decl which I think we still lto-symtab replace with one entity during WPA(?) Well. LTO bootstrapped and tested on x86_64-unknown-linux-gnu (c,c++,fortran), bootstrapped on x86_64-unknown-linux-gnu (all), testing in progress. I might have not catched all fndecl compares. Will apply to trunk if testing completes. As said, maybe followup cleanups possible, at least to lto-opts.c / lto-wrapper. Richard. 2016-07-25 Richard Biener * cgraph.c (cgraph_node::verify_node): Compare against builtin by using DECL_BUILT_IN_CLASS and DECL_FUNCTION_CODE. * tree-chkp.c (chkp_gimple_call_builtin_p): Likewise. * tree-streamer.h (streamer_handle_as_builtin_p): Remove. (streamer_get_builtin_tree): Likewise. (streamer_write_builtin): Likewise. * lto-streamer.h (LTO_builtin_decl): Remove. * lto-streamer-in.c (lto_read_tree_1): Remove assert. (lto_input_scc): Remove LTO_builtin_decl handling. (lto_input_tree_1): Liekwise. * lto-streamer-out.c (lto_output_tree_1): Remove special handling of builtins. (DFS::DFS): Likewise. * tree-streamer-in.c (streamer_get_builtin_tree): Remove. * tree-streamer-out.c (pack_ts_function_decl_value_fields): Remove assert. (streamer_write_builtin): Remove. lto/ * lto.c (compare_tree_sccs_1): Remove streamer_handle_as_builtin_p uses. (unify_scc): Likewise. (lto_read_decls): Likewise. Index: gcc/cgraph.c === --- gcc/cgraph.c(revision 238590) +++ gcc/cgraph.c(working copy) @@ -3136,8 +3136,9 @@ cgraph_node::verify_node (void) && !e->speculative /* Optimized out calls are redirected to __builtin_unreachable. */ && (e->frequency - || e->callee->decl -!= builtin_decl_implicit (BUILT_IN_UNREACHABLE)) + || ! e->callee->decl + || DECL_BUILT_IN_CLASS (e->callee->decl) != BUILT_IN_NORMAL + || DECL_FUNCTION_CODE (e->callee->decl) != BUILT_IN_UNREACHABLE) && (e->frequency != compute_call_stmt_bb_frequency (e->caller->decl, gimple_bb (e->call_stmt Index: gcc/lto/lto.c === --- gcc/lto/lto.c (revision 238590) +++ gcc/lto/lto.c (working copy) @@ -1061,12 +1061,6 @@ compare_tree_sccs_1 (tree t1, tree t2, t TREE_FIXED_CST_PTR (t1), TREE_FIXED_CST_PTR (t2))) return false; - - /* We want to compare locations up to the point where it makes - a difference for streaming - thus whether the decl is builtin or not. */ - if (CODE_CONTAINS_STRUCT (code, TS_DECL_MINIMAL)) -compare_values (streamer_handle_as_builtin_p); - if (CODE_CONTAINS_STRUCT (code, TS_DECL_COMMON)) { compare_values (DECL_MODE); @@ -1602,8 +1596,7 @@ unify_scc (struct data_in *data_in, unsi streamer. The others should be singletons, too, and we should not merge them in any way. 
*/ gcc_assert (code != TRANSLATION_UNIT_DECL - && code != IDENTIFIER_NODE - && !streamer_handle_as_builtin_p (t)); + && code != IDENTIFIER_NODE); } /* Fixup the streamer cache with the prevailing nodes according @@ -1710,8 +1703,7 @@ lto_read_decls (struct lto_file_decl_dat if (len == 1 && (TREE_CODE (first) == IDENTIFIER_NODE || TREE_CODE (first) == INTEGER_CST - || TREE_CODE (first) == TRANSLATION_UNIT_DECL - || streamer_handle_as_builtin_p (first))) + || TREE_CODE (first) == TRANSLATION_UNIT_DECL)) continue; /* Try to unify the SCC with already existing ones. */ Index: gcc/lto-streamer-in.c === --- gcc/lto-streamer-in.c (revision 238590) +++ gcc/lto-streamer-in.c (working copy) @@ -1302,10 +1302,6 @@ lto_read_tree_1 (struct lto_input
Re: [PATCH] nvptx: do not implicitly enable -ftoplevel-reorder
On 07/22/16 11:19, Alexander Monakov wrote: I hope I've satisfactorily explained the failures you've pointed out (thanks for the data). I think I should leave the choice of what to do next (revert the patch or leave it in and install fixups where appropriate) up to you? Please revert the nvptx patch.
[Patch, testuite, committed] Fix some more tests that fail for non 32-bit int targets
Hi, The below patch fixes tests that fail for the avr target, because they assume ints are atleast 32 bits wide and pointers and longs have the same size. I've required int32plus support for one test, and for the other two, I've introduced a cast to intptr_t to avoid the pointer <-> int size difference warning. Reg tested on avr and x86_64 with no regressions. Committed as obvious. Regards Senthil gcc/testsuite/ChangeLog 2016-07-25 Senthil Kumar Selvaraj * gcc.dg/torture/pr69352.c (foo): Cast to intptr_t instead of long. * gcc.dg/torture/pr69771.c: Require int32plus. * gcc.dg/torture/pr71866.c (inb): Add cast to intptr_t. Index: gcc.dg/torture/pr69352.c === --- gcc.dg/torture/pr69352.c(revision 238707) +++ gcc.dg/torture/pr69352.c(working copy) @@ -1,5 +1,7 @@ /* { dg-do compile } */ +#include + int a[10][14], b, c, d, e, f, g, h, i; void bar (void); int @@ -13,7 +15,7 @@ else m = 13; if (a[x][m]) -l = (long) foo; +l = (intptr_t) foo; a[x][i] = l; while (c) { Index: gcc.dg/torture/pr69771.c === --- gcc.dg/torture/pr69771.c(revision 238707) +++ gcc.dg/torture/pr69771.c(working copy) @@ -1,5 +1,6 @@ /* PR rtl-optimization/69771 */ /* { dg-do compile } */ +/* { dg-require-effective-target int32plus } */ unsigned char a = 5, c; unsigned short b = 0; Index: gcc.dg/torture/pr71866.c === --- gcc.dg/torture/pr71866.c(revision 238707) +++ gcc.dg/torture/pr71866.c(working copy) @@ -1,6 +1,7 @@ /* { dg-do compile } */ /* { dg-additional-options "-ftree-pre -fcode-hoisting" } */ +#include typedef unsigned char u8; extern unsigned long pci_io_base; u8 in_8 (const volatile void *); @@ -25,7 +26,7 @@ static inline u8 inb (unsigned long port) { - return readb((volatile void *)pci_io_base + port); + return readb((volatile void *)(intptr_t)pci_io_base + port); } static inline void outb (u8 val, unsigned long port)
Re: [PATCH] Adapt the numbering scheme (PR gcov-profile/64874)
On 07/25/16 04:42, Martin Liška wrote: I like the change suggested by Jakub, I've updated the numbering scheme, as well as comments in gcov-io.h. ok. I'm not too fussed about a problem that is 25 years away and would result in confusion of code (then) instrumented 30 years ago. nathan
[patch] libstdc++: fix ext/rope::dump() bug
Hello, maintainers. Recently, I tried to figure out how the rope container works by dumping the content. I found that the implementation of rope::dump() has a misspelling bug: it uses a static member function where an enum value was intended. It seems that the original SGI STL implementation doesn't have this bug. Bug-triggering code below. #include <ext/rope> int main() { __gnu_cxx::crope r(100, 'x'); r.dump(); return 0; } patch is attached. Thanks, Georeth diff --git a/libstdc++-v3/include/ext/ropeimpl.h b/libstdc++-v3/include/ext/ropeimpl.h index 4316af7..93bd152 100644 --- a/libstdc++-v3/include/ext/ropeimpl.h +++ b/libstdc++-v3/include/ext/ropeimpl.h @@ -1117,7 +1117,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION printf("NULL\n"); return; } - if (_S_concat == __r->_M_tag) + if (__detail::_S_concat == __r->_M_tag) { _RopeConcatenation* __c = (_RopeConcatenation*)__r; _RopeRep* __left = __c->_M_left;
Re: [PATCH] nvptx: do not implicitly enable -ftoplevel-reorder
On Mon, 25 Jul 2016, Nathan Sidwell wrote: > On 07/22/16 11:19, Alexander Monakov wrote: > > > I hope I've satisfactorily explained the failures you've pointed out (thanks > > for the data). I think I should leave the choice of what to do next (revert > > the patch or leave it in and install fixups where appropriate) up to you? > > Please revert the nvptx patch. Done - svn rev. 238710. I'd still like to get feedback on the other points I've raised in that email, though. Alexander
Re: [patch] Get rid of stack trampolines for nested functions
> Should UNITS_PER_WORD here be POINTER_SIZE/BITS_PER_UNIT right? Hmm, yes, presumably, it's the size of the static chain and so a pointer. > Another UNITS_PER_WORD that I think ought to be > POINTER_SIZE/BITS_PER_UNIT. Probably worth a pass over the patch to > look for this throughout. Yes, it was very likely enabled only on platforms with word-sized pointers. > > +ftrampolines > > +Common Report Var(flag_trampolines) Init(0) > > +Always generate trampolines for pointers to nested functions > > Even for targets that use procedure descriptors natively? I'm not sure > what's the best choice for them -- I could argue either way on this issue. For targets that use procedure descriptors natively, it's a no-op because the entire patch is supposed to be a no-op for them. I'm going to rephrase. > I'm probably not well versed enough in this aspect of the various > targets's ABIs to review the target stuff. We'll want/need the target > maintainers to chime in. That might be easier if the target bits were > split out into their own patches. Good idea, I'll do that. > "number of bytes", don't you mean "number of bits" in the last paragraph? Yes, that's confused, I'll rephrase too. > > Index: tree-nested.c > > === > > --- tree-nested.c (revision 237789) > > +++ tree-nested.c (working copy) > > @@ -21,6 +21,7 @@ > > > > #include "system.h" > > #include "coretypes.h" > > #include "backend.h" > > > > +#include "target.h" > > Not real happy to see target stuff getting into tree-*. Is this really > needed? It's the target hook mentioned earlier, e.g. for targets that use procedure descriptors natively we don't want to create generic descriptors but use the regular "degenerated" trampolines. > I think the biggest issues are the ABI concerns and getting the target > maintainers to take a looksie. Everything else looks pretty good. Thanks for the review. -- Eric Botcazou
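For readers unfamiliar with why nested functions force trampolines at all, a minimal GNU C illustration (hypothetical, not taken from the patch):

#include <stdio.h>

static void
apply (void (*f) (int), int n)
{
  for (int i = 0; i < n; i++)
    f (i);
}

int
main (void)
{
  int k = 3;

  /* Nested function (GNU C extension): it reads 'k' from the enclosing
     frame, so a bare code address is not enough to call it.  */
  void scaled (int i)
  {
    printf ("%d\n", i * k);
  }

  /* Taking its address is what normally makes GCC build an executable
     trampoline on the stack; with the descriptor scheme discussed here
     the pointer refers to a data-only descriptor instead.  */
  apply (scaled, 4);
  return 0;
}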
Re: [patch] libstdc++: fix ext/rope::dump() bug
On 25/07/16 20:26 +0800, Georeth Chow wrote: Hello, maintainers. Recently, I tried to figure out how the rope container works by dumping the content. I found that the implementation of rope::dump() has a misspell bug which use a static member function as a enum value. It seems that the original SGI STL implementation doesn't have this bug. bug trigger code below. #include int main() { __gnu_cxx::crope r(100, 'x'); r.dump(); return 0; } patch is attached. Thanks for the patch, I've committed it to trunk. Fix missing qualification in 2016-07-25 Georeth Chow * include/ext/ropeimpl.h (rope<>::_S_dump(_RopeRep*, int)): Qualify _S_concat enumerator. * testsuite/ext/rope/6.cc: New test.
Re: [patch] Get rid of stack trampolines for nested functions
On 29/06/16 23:08, Eric Botcazou wrote: > Index: config/aarch64/aarch64.h > === > --- config/aarch64/aarch64.h (revision 237789) > +++ config/aarch64/aarch64.h (working copy) > @@ -779,6 +779,9 @@ typedef struct > correctly. */ > #define TRAMPOLINE_SECTION text_section > > +/* Use custom descriptors instead of trampolines when possible. */ > +#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1 > + Eric, If I understand how this is supposed to work then this is not future-proof against changes to the architecture. The bottom two bits in both AArch32 (arm) and AArch64 are reserved for future use by the architecture; they must not be used by software for tricks like this. As has already been seen in AArch32 state, bit-0 is used to indicate the ARM/Thumb ISA selection. The patch to arm.h is similarly problematic in this regard. R. > Hi, > > this patch implements generic support for the elimination of stack > trampolines > and, consequently, of the need to make the stack executable when pointers to > nested functions are used. That's done on a per-language and per-target > basis > (i.e. there is 1 language hook and 1 target hook to parameterize it) and > there > are no changes whatsoever in code generation if both are not turned on (and > the patch implements a -ftrampolines option to let the user override them). > > The idea is based on the fact that, for targets using function descriptors as > per their ABI like IA-64, AIX or VMS platforms, stack trampolines > "degenerate" > into descriptors built at run time on the stack and thus made up of data > only, > which in turn means that the stack doesn't need to be made executable. > > This descriptor-based scheme is implemented generically for nested functions, > i.e. the nested function lowering pass builds generic descriptors instead of > trampolines on the stack when encountering pointers to nested functions, > which > means that there are 2 kinds of pointers to functions and therefore a > run-time > identification mechanism is needed for indirect calls to distinguish them. > > Because of that, enabling the support breaks binary compatibility (for code > manipulating pointers to nested functions). That's OK for Ada and nested > functions are first-class citizens in the language anyway so we really need > this, but not for C so for example Ada doesn't use it at the interface with C > (when objects have "convention C" in Ada parlance). > > This was bootstrapped/regtested on x86_64-suse-linux but AdaCore has been > using it on native platforms (Linux, Windows, Solaris, etc) for years. > > OK for the mainline? > > > 2016-06-29 Eric Botcazou > > PR ada/37139 > PR ada/67205 > * common.opt (-ftrampolines): New option. > * doc/invoke.texi (Code Gen Options): Document it. > * doc/tm.texi.in (Trampolines): Add TARGET_CUSTOM_FUNCTION_DESCRIPTORS > * doc/tm.texi: Regenerate. > * builtins.def: Add init_descriptor and adjust_descriptor. > * builtins.c (expand_builtin_init_trampoline): Do not issue a warning > on platforms with descriptors. > (expand_builtin_init_descriptor): New function. > (expand_builtin_adjust_descriptor): Likewise. > (expand_builtin) : New case. > : Likewise. > * calls.c (prepare_call_address): Remove SIBCALLP parameter and add > FLAGS parameter. Deal with indirect calls by descriptor and adjust. > Set STATIC_CHAIN_REG_P on the static chain register, if any. > (call_expr_flags): Set ECF_BY_DESCRIPTOR for calls by descriptor. > (expand_call): Likewise. Move around call to prepare_call_address > and pass all flags to it. 
> * cfgexpand.c (expand_call_stmt): Reinstate CALL_EXPR_BY_DESCRIPTOR. > * gimple.h (enum gf_mask): New GF_CALL_BY_DESCRIPTOR value. > (gimple_call_set_by_descriptor): New setter. > (gimple_call_by_descriptor_p): New getter. > * gimple.c (gimple_build_call_from_tree): Set CALL_EXPR_BY_DESCRIPTOR. > (gimple_call_flags): Deal with GF_CALL_BY_DESCRIPTOR. > * langhooks.h (struct lang_hooks): Add custom_function_descriptors. > * langhooks-def.h (LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS): Define. > (LANG_HOOKS_INITIALIZER): Add LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS. > * rtl.h (STATIC_CHAIN_REG_P): New macro. > * rtlanal.c (find_first_parameter_load): Skip static chain registers. > * target.def (custom_function_descriptors): New POD hook. > * tree.h (FUNC_ADDR_BY_DESCRIPTOR): New flag on ADDR_EXPR. > (CALL_EXPR_BY_DESCRIPTOR): New flag on CALL_EXPR. > * tree-core.h (ECF_BY_DESCRIPTOR): New mask. > Document FUNC_ADDR_BY_DESCRIPTOR and CALL_EXPR_BY_DESCRIPTOR. > * tree.c (make_node_stat) : Set function alignment to > DEFAULT_FUNCTION_ALIGNMENT instead of FUNCTION_BOUNDARY. > (build_common_builtin_nodes): Initiali
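To make the run-time identification mechanism mentioned in the patch description concrete, here is a rough C sketch of the check an indirect call site performs under the descriptor scheme. The struct layout, function name and the use of bit 0 as the tag are illustrative assumptions only (the real lowering is in calls.c's prepare_call_address per the ChangeLog above), and the choice of a low bit as the tag is exactly what the architectural reservation raised above constrains:

#include <stdint.h>

/* Assumed layout of a run-time descriptor for a nested function:
   the real code address plus the static chain value.  */
struct descriptor
{
  void *entry;
  void *chain;
};

/* Sketch only: descriptor pointers are assumed to carry a tag in bit 0,
   which the scheme relies on never being set in an ordinary code
   address on the targets that enable it.  Returns the real call target
   and the static chain value to install before the call.  */
static void *
resolve_call_target (void *f, void **chain)
{
  uintptr_t addr = (uintptr_t) f;

  if (addr & 1)
    {
      struct descriptor *d = (struct descriptor *) (addr & ~(uintptr_t) 1);
      *chain = d->chain;
      return d->entry;
    }

  *chain = 0;   /* plain function pointer, no static chain needed */
  return f;
}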
Re: [AArch64][3/3] Migrate aarch64_expand_prologue/epilogue to aarch64_add_constant
On 25/07/16 10:34, Jiong Wang wrote: > On 21/07/16 11:08, Richard Earnshaw (lists) wrote: >> On 20/07/16 16:02, Jiong Wang wrote: >>> Richard, >>>Thanks for the review, yes, I believe using aarch64_add_constant is >>> unconditionally >>> safe here. Because we have generated a stack tie to clobber the whole >>> memory thus >>> prevent any instruction which access stack be scheduled after that. >>> >>>The access to deallocated stack issue was there and fixed by >>> >>>https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02292.html. >>> >>> aarch64_add_constant itself is generating the same instruction >>> sequences as the >>> original code, except for a few cases, it will prefer >>> >>>move scratch_reg, #imm >>>add sp, sp, scratch_reg >>> >>> than: >>>add sp, sp, #imm_part1 >>>add sp, sp, #imm_part2 >>> >>> >>> >>> >> OK, I've had another look at this and I'm happy that we don't >> (currently) run into the problem I'm concerned about. >> >> However, this new usage does impose a constraint on aarch64_add_constant >> that will need to be respected in future, so please can you add the >> following to the comment that precedes that function: >> >> /* ... >> >> This function is sometimes used to adjust the stack pointer, so we >> must ensure that it can never cause transient stack deallocation >> by writing an invalid value into REGNUM. */ >> >> >>> + bool frame_related_p = (regnum == SP_REGNUM); >> I think it would be better to make the frame-related decision be an >> explicit parameter passed to the routine (don't forget SP is not always >> the frame pointer). Then the new uses would pass 'true' and the >> existing uses 'false'. >> >> R. > > Thanks, attachment is the updated patch which: > > * Added above new comments for aarch64_add_constant. > * One new parameter "frame_related_p" for aarch64_add_constant. > I thought adding new gcc assertion for sanity check of > frame_related_p and REGNUM, haven't done that as I found dwarf2cfi.c > is doing that. > > OK for trunk? This is fine, thanks. R. > > gcc/ > 2016-07-25 Jiong Wang > > * config/aarch64/aarch64.c (aarch64_add_constant): New > parameter "frame_related_p". Generate CFA annotation when > it's necessary. > (aarch64_expand_prologue): Use aarch64_add_constant. > (aarch64_expand_epilogue): Likewise. > (aarch64_output_mi_thunk): Pass "false" when calling > aarch64_add_constant. > > > update.patch > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 41844a1..ca93f6e 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -1866,14 +1866,19 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) > } > > /* Add DELTA onto REGNUM in MODE, using SCRATCHREG to held intermediate > value if > - it is necessary. */ > + it is necessary. > + > + This function is sometimes used to adjust the stack pointer, so we must > + ensure that it can never cause transient stack deallocation by writing an > + invalid value into REGNUM. */ > > static void > aarch64_add_constant (machine_mode mode, int regnum, int scratchreg, > - HOST_WIDE_INT delta) > + HOST_WIDE_INT delta, bool frame_related_p) > { >HOST_WIDE_INT mdelta = abs_hwi (delta); >rtx this_rtx = gen_rtx_REG (mode, regnum); > + rtx_insn *insn; > >/* Do nothing if mdelta is zero. */ >if (!mdelta) > @@ -1882,7 +1887,8 @@ aarch64_add_constant (machine_mode mode, int regnum, > int scratchreg, >/* We only need single instruction if the offset fit into add/sub. 
*/ >if (aarch64_uimm12_shift (mdelta)) > { > - emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta))); > + insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta))); > + RTX_FRAME_RELATED_P (insn) = frame_related_p; >return; > } > > @@ -1895,15 +1901,23 @@ aarch64_add_constant (machine_mode mode, int regnum, > int scratchreg, >HOST_WIDE_INT low_off = mdelta & 0xfff; > >low_off = delta < 0 ? -low_off : low_off; > - emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off))); > - emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off))); > + insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off))); > + RTX_FRAME_RELATED_P (insn) = frame_related_p; > + insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off))); > + RTX_FRAME_RELATED_P (insn) = frame_related_p; >return; > } > >/* Otherwise use generic function to handle all other situations. */ >rtx scratch_rtx = gen_rtx_REG (mode, scratchreg); >aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode); > - emit_insn (gen_add2_insn (this_rtx, scratch_rtx)); > + insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx)); > + if (frame_related_p) > +{ > + RTX_FRAME_RELAT
[PATCH 0/7, GCC, V8M] ARMv8-M Security Extensions
Hello,

This is a respin of a previous patch series for ARMv8-M Security Extensions. In this version I have removed one patch, rebased the rest and changed some of them.

This patch series aims at implementing support for ARMv8-M's Security Extensions. You can find the specification of ARMv8-M Security Extensions in: ARM®v8-M Security Extensions: Requirements on Development Tools (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html).

We currently:
- do not support passing arguments or returning on the stack for cmse_nonsecure_{call,entry} functions,
- only test Security Extensions for -mfpu=fpv5-d16 and fpv5-sp-d16 and only support single and double precision FPUs with d16.

Bootstrapped and tested on arm-none-linux-gnueabihf and tested on arm-none-eabi with ARMv8-M Baseline and Mainline targets.

Andre Vieira (7):
  Add support for ARMv8-M's Security Extensions flag and intrinsics
  Handling ARMv8-M Security Extension's cmse_nonsecure_entry attribute
  ARMv8-M Security Extension's cmse_nonsecure_entry: __acle_se label and bxns return
  ARMv8-M Security Extension's cmse_nonsecure_entry: clear registers
  Handling ARMv8-M Security Extension's cmse_nonsecure_call attribute
  ARMv8-M Security Extension's cmse_nonsecure_call: use __gnu_cmse_nonsecure_call
  Added support for ARMV8-M Security Extension cmse_nonsecure_caller intrinsic
[PATCH 1/7, GCC, ARM, V8M] Add support for ARMv8-M's Secure Extensions flag and intrinsics
This patch adds the support of the '-mcmse' option to enable ARMv8-M's Security Extensions and supports the following intrinsics: cmse_TT cmse_TT_fptr cmse_TTT cmse_TTT_fptr cmse_TTA cmse_TTA_fptr cmse_TTAT cmse_TTAT_fptr cmse_check_address_range cmse_check_pointed_object cmse_is_nsfptr cmse_nsfptr_create It also defines the mandatory cmse_address_info struct and the __ARM_FEATURE_CMSE macro. See Chapter 4, Sections 5.2, 5.3 and 5.6 of ARM®v8-M Security Extensions: Requirements on Development Tools (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html). *** gcc/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * config.gcc (extra_headers): Added arm_cmse.h. * config/arm/arm-arches.def (ARM_ARCH): (armv8-m): Add FL2_CMSE. (armv8-m.main): Likewise. (armv8-m.main+dsp): Likewise. * config/arm/arm-c.c (arm_cpu_builtins): Added __ARM_FEATURE_CMSE macro. * config/arm/arm-protos.h (arm_is_constant_pool_ref): Define FL2_CMSE. * config/arm.c (arm_arch_cmse): New. (arm_option_override): New error for unsupported cmse target. * config/arm/arm.h (arm_arch_cmse): New. * config/arm/arm.opt (mcmse): New. * doc/invoke.texi (ARM Options): Add -mcmse. * config/arm/arm_cmse.h: New file. *** libgcc/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * config/arm/cmse.c: Likewise. * config/arm/t-arm (HAVE_CMSE): New. *** gcc/testsuite/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * gcc.target/arm/cmse/cmse.exp: New. * gcc.target/arm/cmse/cmse-1.c: New. * gcc.target/arm/cmse/cmse-12.c: New. * lib/target-supports.exp (check_effective_target_arm_cmse_ok): New. diff --git a/gcc/config.gcc b/gcc/config.gcc index 1f75f17877334c2bb61cd16b69539ec7514db8ae..8555bbf19d81b517493c86b38aff31a633ac50eb 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -320,7 +320,7 @@ arc*-*-*) arm*-*-*) cpu_type=arm extra_objs="arm-builtins.o aarch-common.o" - extra_headers="mmintrin.h arm_neon.h arm_acle.h" + extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_cmse.h" target_type_format_char='%' c_target_objs="arm-c.o" cxx_target_objs="arm-c.o" diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def index be46521c9eaea54f9ad78a92874567589289dbdf..0e523959551cc3b1da31411ccdd1105b830db845 100644 --- a/gcc/config/arm/arm-arches.def +++ b/gcc/config/arm/arm-arches.def @@ -63,11 +63,11 @@ ARM_ARCH("armv8.1-a+crc",cortexa53, 8A, ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A, FL2_FOR_ARCH8_1A)) ARM_ARCH("armv8-m.base", cortexm0, 8M_BASE, -ARM_FSET_MAKE_CPU1 ( FL_FOR_ARCH8M_BASE)) +ARM_FSET_MAKE ( FL_FOR_ARCH8M_BASE, FL2_CMSE)) ARM_ARCH("armv8-m.main", cortexm7, 8M_MAIN, -ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_FOR_ARCH8M_MAIN)) +ARM_FSET_MAKE (FL_CO_PROC | FL_FOR_ARCH8M_MAIN, FL2_CMSE)) ARM_ARCH("armv8-m.main+dsp", cortexm7, 8M_MAIN, -ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN)) +ARM_FSET_MAKE (FL_CO_PROC | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN, FL2_CMSE)) ARM_ARCH("iwmmxt", iwmmxt, 5TE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)) ARM_ARCH("iwmmxt2", iwmmxt2,5TE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2)) diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c index b98470fff45b20a5398c2534bc3bb3edfb7bfd01..ad2fb09d1f9ca14300c6283f3831a527db656267 100644 --- a/gcc/config/arm/arm-c.c +++ b/gcc/config/arm/arm-c.c @@ -76,6 +76,14 @@ arm_cpu_builtins (struct cpp_reader* pfile) def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT); + if (arm_arch8 && 
!arm_arch_notm) +{ + if (arm_arch_cmse && use_cmse) + builtin_define_with_int_value ("__ARM_FEATURE_CMSE", 3); + else + builtin_define ("__ARM_FEATURE_CMSE"); +} + if (TARGET_ARM_FEATURE_LDREX) builtin_define_with_int_value ("__ARM_FEATURE_LDREX", TARGET_ARM_FEATURE_LDREX); diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 49c3a92dba80db32b698a0b44ad72d56111c1358..0754fb7252a8b86ffacc0cee4598686752af6e56 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -391,6 +391,7 @@ extern bool arm_is_constant_pool_ref (rtx); #define FL_ARCH6KZ(1 << 31) /* ARMv6KZ architecture. */ #define FL2_ARCH8_1 (1 << 0) /* Architecture 8.1. */ +#define FL2_CMSE (1 << 1) /* ARMv8-M Security Extensions. */ /* Flags that only effect tuning, not available instructions. */ #define FL_TUNE(FL
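As a rough usage sketch (my illustration, not part of the patch or its tests; the function name is made up), secure code compiled with -mcmse can use the new header to validate a pointer handed over from the non-secure side before dereferencing it:

#include <arm_cmse.h>

int
read_nonsecure_word (int *p)
{
  /* cmse_check_pointed_object returns P when the whole object is
     accessible from the non-secure state, a null pointer otherwise.  */
  if (!cmse_check_pointed_object (p, CMSE_NONSECURE))
    return -1;
  return *p;
}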
[PATCH 2/7, GCC, ARM, V8M] Handling ARMv8-M Security Extension's cmse_nonsecure_entry attribute
This patch adds support for the ARMv8-M Security Extensions 'cmse_nonsecure_entry' attribute. In this patch we implement the attribute handling and diagnosis around the attribute. See Section 5.4 of ARM®v8-M Security Extensions (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html). *** gcc/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * config/arm/arm.c (arm_handle_cmse_nonsecure_entry): New. (arm_attribute_table): Added cmse_nonsecure_entry (arm_compute_func_type): Handle cmse_nonsecure_entry. (cmse_func_args_or_return_in_stack): New. (arm_handle_cmse_nonsecure_entry): New. * config/arm/arm.h (ARM_FT_CMSE_ENTRY): New macro define. (IS_CMSE_ENTRY): Likewise. *** gcc/testsuite/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * gcc.target/arm/cmse/cmse-3.c: New. diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index e3697bbcb425999db31ac2b4f47e14bb3f2ffa89..5307ec8f904230db5ea44150ef471d928926ab6d 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -1373,6 +1373,7 @@ enum reg_class #define ARM_FT_VOLATILE(1 << 4) /* Does not return. */ #define ARM_FT_NESTED (1 << 5) /* Embedded inside another func. */ #define ARM_FT_STACKALIGN (1 << 6) /* Called with misaligned stack. */ +#define ARM_FT_CMSE_ENTRY (1 << 7) /* ARMv8-M non-secure entry function. */ /* Some macros to test these flags. */ #define ARM_FUNC_TYPE(t) (t & ARM_FT_TYPE_MASK) @@ -1381,6 +1382,7 @@ enum reg_class #define IS_NAKED(t)(t & ARM_FT_NAKED) #define IS_NESTED(t) (t & ARM_FT_NESTED) #define IS_STACKALIGN(t) (t & ARM_FT_STACKALIGN) +#define IS_CMSE_ENTRY(t) (t & ARM_FT_CMSE_ENTRY) /* Structure used to hold the function stack frame layout. Offsets are diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 9903d9cd8c5ff68a2318a643bdf31cf48016eba4..11417ab3c2f7101866ee5d6b100913480e5c336e 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -134,6 +134,7 @@ static tree arm_handle_isr_attribute (tree *, tree, tree, int, bool *); #if TARGET_DLLIMPORT_DECL_ATTRIBUTES static tree arm_handle_notshared_attribute (tree *, tree, tree, int, bool *); #endif +static tree arm_handle_cmse_nonsecure_entry (tree *, tree, tree, int, bool *); static void arm_output_function_epilogue (FILE *, HOST_WIDE_INT); static void arm_output_function_prologue (FILE *, HOST_WIDE_INT); static int arm_comp_type_attributes (const_tree, const_tree); @@ -343,6 +344,9 @@ static const struct attribute_spec arm_attribute_table[] = { "notshared",0, 0, false, true, false, arm_handle_notshared_attribute, false }, #endif + /* ARMv8-M Security Extensions support. */ + { "cmse_nonsecure_entry", 0, 0, true, false, false, +arm_handle_cmse_nonsecure_entry, false }, { NULL, 0, 0, false, false, false, NULL, false } }; @@ -3633,6 +3637,9 @@ arm_compute_func_type (void) else type |= arm_isr_value (TREE_VALUE (a)); + if (lookup_attribute ("cmse_nonsecure_entry", attr)) +type |= ARM_FT_CMSE_ENTRY; + return type; } @@ -6634,6 +6641,110 @@ arm_handle_notshared_attribute (tree *node, } #endif +/* This function returns true if a function with declaration FNDECL, name + NAME and type FNTYPE uses the stack to pass arguments or return variables + and false otherwise. This is used for functions with the attributes + 'cmse_nonsecure_call' or 'cmse_nonsecure_entry' and this function will issue + diagnostic messages if the stack is used. 
*/ + +static bool +cmse_func_args_or_return_in_stack (tree fndecl, tree name, tree fntype) +{ + function_args_iterator args_iter; + CUMULATIVE_ARGS args_so_far_v; + cumulative_args_t args_so_far; + bool first_param = true; + tree arg_type, prev_arg_type = NULL_TREE, ret_type; + + /* Error out if any argument is passed on the stack. */ + arm_init_cumulative_args (&args_so_far_v, fntype, NULL_RTX, fndecl); + args_so_far = pack_cumulative_args (&args_so_far_v); + FOREACH_FUNCTION_ARGS (fntype, arg_type, args_iter) +{ + rtx arg_rtx; + machine_mode arg_mode = TYPE_MODE (arg_type); + + prev_arg_type = arg_type; + if (VOID_TYPE_P (arg_type)) + continue; + + if (!first_param) + arm_function_arg_advance (args_so_far, arg_mode, arg_type, true); + arg_rtx = arm_function_arg (args_so_far, arg_mode, arg_type, true); + if (!arg_rtx + || arm_arg_partial_bytes (args_so_far, arg_mode, arg_type, true)) + { + error ("%qE attribute not available to functions with arguments " +"passed on the stack", name); + return true; + } + first_param = false; +} + + /* Error out for variadic functions since we cannot control how many + arguments will be passed and thus stack could be used. stdarg_p () is not +
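To make the stack-argument restriction concrete, here is a small sketch of mine (not from the patch): the first entry function is fine because all arguments fit in r0-r3, while the second is expected to be rejected with the new diagnostic about arguments passed on the stack.

/* Compile with -mcmse.  */
int __attribute__ ((cmse_nonsecure_entry))
ok (int a, int b, int c, int d)
{
  return a + b + c + d;
}

int __attribute__ ((cmse_nonsecure_entry))
bad (int a, int b, int c, int d, int e)  /* fifth argument goes on the stack */
{
  return a + e;
}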
[PATCH] Fix PR71984
The following should fix PR71984 which stems from a bogus (but harmless for CONCAT) use of GET_MODE_UNIT_SIZE vs. GET_MODE_SIZE (where the former is really really badly named...). In the process of fixing it I also hardened it against VOIDmode which might have occured for CONCAT as well given we have CSImode. Bootstrap / regtest running on x86_64-unknown-linux-gnu. I don't have AVX512 HW to check but by inspection the testcase is fixed. (and the issue is obvious once you analyzed it) It was Honza who sneaked in this GET_MODE_UNIT_SIZE use, the bad name exists since the initial repository version. Ok for trunk? Thanks, Richard. 2016-07-25 Richard Biener PR rtl-optimization/71984 * simplify-rtx.c (simplify_subreg): Use GET_MODE_SIZE and prepare for VOIDmode. * gcc.dg/torture/pr71984.c: New testcase. Index: gcc/simplify-rtx.c === *** gcc/simplify-rtx.c (revision 238710) --- gcc/simplify-rtx.c (working copy) *** simplify_subreg (machine_mode outermode, *** 6116,6122 unsigned int part_size, final_offset; rtx part, res; ! part_size = GET_MODE_UNIT_SIZE (GET_MODE (XEXP (op, 0))); if (byte < part_size) { part = XEXP (op, 0); --- 6116,6125 unsigned int part_size, final_offset; rtx part, res; ! enum machine_mode part_mode = GET_MODE (XEXP (op, 0)); ! if (part_mode == VOIDmode) ! part_mode = GET_MODE_INNER (GET_MODE (op)); ! part_size = GET_MODE_SIZE (part_mode); if (byte < part_size) { part = XEXP (op, 0); *** simplify_subreg (machine_mode outermode, *** 6131,6137 if (final_offset + GET_MODE_SIZE (outermode) > part_size) return NULL_RTX; ! enum machine_mode part_mode = GET_MODE (part); if (part_mode == VOIDmode) part_mode = GET_MODE_INNER (GET_MODE (op)); res = simplify_subreg (outermode, part, part_mode, final_offset); --- 6134,6140 if (final_offset + GET_MODE_SIZE (outermode) > part_size) return NULL_RTX; ! part_mode = GET_MODE (part); if (part_mode == VOIDmode) part_mode = GET_MODE_INNER (GET_MODE (op)); res = simplify_subreg (outermode, part, part_mode, final_offset); Index: gcc/testsuite/gcc.dg/torture/pr71984.c === *** gcc/testsuite/gcc.dg/torture/pr71984.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr71984.c (working copy) *** *** 0 --- 1,21 + /* { dg-do run { target lp64 } } */ + /* { dg-additional-options "-w -Wno-psabi" } */ + + typedef unsigned char v64u8 __attribute__((vector_size(64))); + typedef unsigned long v64u64 __attribute__((vector_size(64))); + typedef unsigned char u8; + + static u8 __attribute__ ((noinline, noclone)) + foo (v64u64 v64u64_0) + { + return ((v64u8)(v64u64){0, v64u64_0[0]})[13]; + } + + int + main () + { + u8 x = foo((v64u64){0x0706050403020100UL}); + if (x != 5) + __builtin_abort (); + return 0; + }
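For readers unfamiliar with the two macros (my summary, not part of the submission): GET_MODE_SIZE is the size of the whole mode, while GET_MODE_UNIT_SIZE is the size of one element of it, so the two only coincide for scalar modes, which is presumably why the old use was harmless for ordinary CONCATs with scalar parts.

/* Scalar part mode: the two agree.
     GET_MODE_SIZE (DImode)        == 8
     GET_MODE_UNIT_SIZE (DImode)   == 8
   Vector part mode: they differ, the unit size being the element size.
     GET_MODE_SIZE (V2SImode)      == 8
     GET_MODE_UNIT_SIZE (V2SImode) == 4  */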
[PATCH 3/7, GCC, ARM, V8M] ARMv8-M Security Extension's cmse_nonsecure_entry: __acle_se label and bxns return
This patch extends support for the ARMv8-M Security Extensions 'cmse_nonsecure_entry' attribute in two ways: 1) Generate two labels for the function, the regular function name and one with the function's name appended to '__acle_se_', this will trigger the linker to create a secure gateway veneer for this entry function. 2) Return from cmse_nonsecure_entry marked functions using bxns. See Section 5.4 of ARM®v8-M Security Extensions (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html). *** gcc/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * config/arm/arm.c (use_return_insn): Change to return with bxns when cmse_nonsecure_entry. (output_return_instruction): Likewise. (arm_output_function_prologue): Likewise. (thumb_pop): Likewise. (thumb_exit): Likewise. (arm_function_ok_for_sibcall): Disable sibcall for entry functions. (arm_asm_declare_function_name): New. * config/arm/arm-protos.h (arm_asm_declare_function_name): New. * config/arm/elf.h (ASM_DECLARE_FUNCTION_NAME): Redefine to use arm_asm_declare_function_name. *** gcc/testsuite/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * gcc.target/arm/cmse/cmse-2.c: New. * gcc.target/arm/cmse/cmse-4.c: New. diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 0754fb7252a8b86ffacc0cee4598686752af6e56..1d2e35b52f631570450b6c8eaf077e18e9b99203 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -31,6 +31,7 @@ extern int arm_volatile_func (void); extern void arm_expand_prologue (void); extern void arm_expand_epilogue (bool); extern void arm_declare_function_name (FILE *, const char *, tree); +extern void arm_asm_declare_function_name (FILE *, const char *, tree); extern void thumb2_expand_return (bool); extern const char *arm_strip_name_encoding (const char *); extern void arm_asm_output_labelref (FILE *, const char *); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 11417ab3c2f7101866ee5d6b100913480e5c336e..9fba371768b1eba3a11dc8aa5d6acf8cc30f464d 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3866,6 +3866,11 @@ use_return_insn (int iscond, rtx sibling) return 0; } + /* ARMv8-M nonsecure entry function need to use bxns to return and thus need + several instructions if anything needs to be popped. */ + if (saved_int_regs && IS_CMSE_ENTRY (func_type)) +return 0; + /* If there are saved registers but the LR isn't saved, then we need two instructions for the return. */ if (saved_int_regs && !(saved_int_regs & (1 << LR_REGNUM))) @@ -6903,6 +6908,11 @@ arm_function_ok_for_sibcall (tree decl, tree exp) if (IS_INTERRUPT (func_type)) return false; + /* ARMv8-M non-secure entry functions need to return with bxns which is only + generated for entry functions themselves. */ + if (IS_CMSE_ENTRY (arm_current_func_type ())) +return false; + if (!VOID_TYPE_P (TREE_TYPE (DECL_RESULT (cfun->decl { /* Check that the return value locations are the same. For @@ -19739,6 +19749,7 @@ output_return_instruction (rtx operand, bool really_return, bool reverse, (e.g. interworking) then we can load the return address directly into the PC. Otherwise we must load it into LR. */ if (really_return + && !IS_CMSE_ENTRY (func_type) && (IS_INTERRUPT (func_type) || !TARGET_INTERWORK)) return_reg = reg_names[PC_REGNUM]; else @@ -19879,8 +19890,10 @@ output_return_instruction (rtx operand, bool really_return, bool reverse, break; default: + if (IS_CMSE_ENTRY (func_type)) + snprintf (instr, sizeof (instr), "bxns%s\t%%|lr", conditional); /* Use bx if it's available. 
*/ - if (arm_arch5 || arm_arch4t) + else if (arm_arch5 || arm_arch4t) sprintf (instr, "bx%s\t%%|lr", conditional); else sprintf (instr, "mov%s\t%%|pc, %%|lr", conditional); @@ -19893,6 +19906,42 @@ output_return_instruction (rtx operand, bool really_return, bool reverse, return ""; } +/* Output in FILE asm statements needed to declare the NAME of the function + defined by its DECL node. */ + +void +arm_asm_declare_function_name (FILE *file, const char *name, tree decl) +{ + size_t cmse_name_len; + char *cmse_name = 0; + char cmse_prefix[] = "__acle_se_"; + + if (use_cmse && lookup_attribute ("cmse_nonsecure_entry", + DECL_ATTRIBUTES (decl))) +{ + cmse_name_len = sizeof (cmse_prefix) + strlen (name); + cmse_name = XALLOCAVEC (char, cmse_name_len); + snprintf (cmse_name, cmse_name_len, "%s%s", cmse_prefix, name); + targetm.asm_out.globalize_label (file, cmse_name); +} + + if (cmse_name) +{ + ARM_DECLARE_FUNCTION_NAME (file, cmse_name, decl);
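For illustration (a sketch of mine, not taken from the new tests), an entry function like the one below, compiled with -mcmse, should now get both labels and return through bxns:

/* Expected assembly: both "get_flag:" and "__acle_se_get_flag:" labels
   are emitted for the function, and the epilogue uses "bxns lr".  */
static volatile int flag;

int __attribute__ ((cmse_nonsecure_entry))
get_flag (void)
{
  return flag;
}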
[PATCH 4/7, GCC, ARM, V8M] ARMv8-M Security Extension's cmse_nonsecure_entry: clear registers
This patch extends support for the ARMv8-M Security Extensions 'cmse_nonsecure_entry' attribute to safeguard against leak of information through unbanked registers. When returning from a nonsecure entry function we clear all caller-saved registers that are not used to pass return values, by writing either the LR, in case of general purpose registers, or the value 0, in case of FP registers. We use the LR to write to APSR and FPSCR too. We currently do not support entry functions that pass arguments or return variables on the stack and we diagnose this. This patch relies on the existing code to make sure callee-saved registers used in cmse_nonsecure_entry functions are saved and restored thus retaining their nonsecure mode value, this should be happening already as it is required by AAPCS. This patch also clears padding bits for cmse_nonsecure_entry functions with struct and union return types. For unions a bit is only considered a padding bit if it is an unused bit in every field of that union. The function that calculates these is used in a later patch to do the same for arguments of cmse_nonsecure_call's. *** gcc/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * config/arm/arm.c (output_return_instruction): Clear registers. (thumb2_expand_return): Likewise. (thumb1_expand_epilogue): Likewise. (thumb_exit): Likewise. (arm_expand_epilogue): Likewise. (cmse_nonsecure_entry_clear_before_return): New. (comp_not_to_clear_mask_str_un): New. (compute_not_to_clear_mask): New. * config/arm/thumb1.md (*epilogue_insns): Change length attribute. * config/arm/thumb2.md (*thumb2_return): Likewise. *** gcc/testsuite/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * gcc.target/arm/cmse/cmse.exp: Test different multilibs separate. * gcc.target/arm/cmse/struct-1.c: New. * gcc.target/arm/cmse/bitfield-1.c: New. * gcc.target/arm/cmse/bitfield-2.c: New. * gcc.target/arm/cmse/bitfield-3.c: New. * gcc.target/arm/cmse/baseline/cmse-2.c: Test that registers are cleared. * gcc.target/arm/cmse/mainline/soft/cmse-5.c: New. * gcc.target/arm/cmse/mainline/hard/cmse-5.c: New. * gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: New. * gcc.target/arm/cmse/mainline/softfp/cmse-5.c: New. * gcc.target/arm/cmse/mainline/softfp-sp/cmse-5.c: New. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 9fba371768b1eba3a11dc8aa5d6acf8cc30f464d..81a9d9a6fb29d0956a661734d60dd2e44cb554b8 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -17473,6 +17473,279 @@ note_invalid_constants (rtx_insn *insn, HOST_WIDE_INT address, int do_pushes) return; } +/* This function computes the clear mask and PADDING_BITS_TO_CLEAR for structs + and unions in the context of ARMv8-M Security Extensions. It is used as a + helper function for both 'cmse_nonsecure_call' and 'cmse_nonsecure_entry' + functions. The PADDING_BITS_TO_CLEAR pointer can be the base to either one + or four masks, depending on whether it is being computed for a + 'cmse_nonsecure_entry' return value or a 'cmse_nonsecure_call' argument + respectively. The tree for the type of the argument or a field within an + argument is passed in ARG_TYPE, the current register this argument or field + starts in is kept in the pointer REGNO and updated accordingly, the bit this + argument or field starts at is passed in STARTING_BIT and the last used bit + is kept in LAST_USED_BIT which is also updated accordingly. 
*/ + +static unsigned HOST_WIDE_INT +comp_not_to_clear_mask_str_un (tree arg_type, int * regno, + uint32_t * padding_bits_to_clear, + unsigned starting_bit, int * last_used_bit) + +{ + unsigned HOST_WIDE_INT not_to_clear_reg_mask = 0; + + if (TREE_CODE (arg_type) == RECORD_TYPE) +{ + unsigned current_bit = starting_bit; + tree field; + long int offset, size; + + + field = TYPE_FIELDS (arg_type); + while (field) + { + /* The offset within a structure is always an offset from +the start of that structure. Make sure we take that into the +calculation of the register based offset that we use here. */ + offset = starting_bit; + offset += TREE_INT_CST_ELT (DECL_FIELD_BIT_OFFSET (field), 0); + offset %= 32; + + /* This is the actual size of the field, for bitfields this is the +bitfield width and not the container size. */ + size = TREE_INT_CST_ELT (DECL_SIZE (field), 0); + + if (*last_used_bit != offset) + { + if (offset < *last_used_bit) + { + /* This field's offset is before the 'last_used_bit', that +means this field goes on the next register. So we need to
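A small example of the padding-bit situation described above (my sketch, assuming the default little-endian layout; not one of the new tests): the byte between A and B is padding, and on return from the entry function the corresponding bits of r0 must be cleared so that secure state does not leak through them.

/* Compile with -mcmse.  The 4-byte struct is returned in r0.  */
struct reply
{
  unsigned char  a;   /* bits 0-7 of r0                           */
                      /* bits 8-15 are padding, cleared on return */
  unsigned short b;   /* bits 16-31 of r0                         */
};

struct reply __attribute__ ((cmse_nonsecure_entry))
get_reply (void)
{
  struct reply r = { 1, 2 };
  return r;
}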
[PATCH 5/7, GCC, ARM, V8M] Handling ARMv8-M Security Extension's cmse_nonsecure_call attribute
This patch adds support for the ARMv8-M Security Extensions 'cmse_nonsecure_call' attribute. This attribute may only be used for function types and when used in combination with the '-mcmse' compilation flag. See Section 5.5 of ARM®v8-M Security Extensions (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html). We currently do not support cmse_nonsecure_call functions that pass arguments or return variables on the stack and we diagnose this. *** gcc/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * config/arm/arm.c (gimplify.h): New include. (arm_handle_cmse_nonsecure_call): New. (arm_attribute_table): Added cmse_nonsecure_call. *** gcc/testsuite/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * gcc.target/arm/cmse/cmse-3.c: Add tests. * gcc.target/arm/cmse/cmse-4.c: Add tests. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 81a9d9a6fb29d0956a661734d60dd2e44cb554b8..128baae92e4b65507eb18b679874b1ad24ca7c4a 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -61,6 +61,7 @@ #include "builtins.h" #include "tm-constrs.h" #include "rtl-iter.h" +#include "gimplify.h" /* This file should be included last. */ #include "target-def.h" @@ -135,6 +136,7 @@ static tree arm_handle_isr_attribute (tree *, tree, tree, int, bool *); static tree arm_handle_notshared_attribute (tree *, tree, tree, int, bool *); #endif static tree arm_handle_cmse_nonsecure_entry (tree *, tree, tree, int, bool *); +static tree arm_handle_cmse_nonsecure_call (tree *, tree, tree, int, bool *); static void arm_output_function_epilogue (FILE *, HOST_WIDE_INT); static void arm_output_function_prologue (FILE *, HOST_WIDE_INT); static int arm_comp_type_attributes (const_tree, const_tree); @@ -347,6 +349,8 @@ static const struct attribute_spec arm_attribute_table[] = /* ARMv8-M Security Extensions support. */ { "cmse_nonsecure_entry", 0, 0, true, false, false, arm_handle_cmse_nonsecure_entry, false }, + { "cmse_nonsecure_call", 0, 0, true, false, false, +arm_handle_cmse_nonsecure_call, false }, { NULL, 0, 0, false, false, false, NULL, false } }; @@ -6750,6 +6754,78 @@ arm_handle_cmse_nonsecure_entry (tree *node, tree name, return NULL_TREE; } + +/* Called upon detection of the use of the cmse_nonsecure_call attribute, this + function will check whether the attribute is allowed here and will add the + attribute to the function type tree or otherwise issue a diagnostic. The + reason we check this at declaration time is to only allow the use of the + attribute with declarations of function pointers and not function + declarations. This function checks NODE is of the expected type and issues + diagnostics otherwise using NAME. If it is not of the expected type + *NO_ADD_ATTRS will be set to true. 
*/ + +static tree +arm_handle_cmse_nonsecure_call (tree *node, tree name, +tree /* args */, +int /* flags */, +bool *no_add_attrs) +{ + tree decl = NULL_TREE; + tree type, fntype, main_variant; + + if (!use_cmse) +{ + *no_add_attrs = true; + return NULL_TREE; +} + + if (TREE_CODE (*node) == VAR_DECL || TREE_CODE (*node) == TYPE_DECL) +{ + decl = *node; + type = TREE_TYPE (decl); +} + + if (!decl + || (!(TREE_CODE (type) == POINTER_TYPE + && TREE_CODE (TREE_TYPE (type)) == FUNCTION_TYPE) + && TREE_CODE (type) != FUNCTION_TYPE)) +{ + warning (OPT_Wattributes, "%qE attribute only applies to base type of a " +"function pointer", name); + *no_add_attrs = true; + return NULL_TREE; +} + + /* type is either a function pointer, when the attribute is used on a function + * pointer, or a function type when used in a typedef. */ + if (TREE_CODE (type) == FUNCTION_TYPE) +fntype = type; + else +fntype = TREE_TYPE (type); + + *no_add_attrs |= cmse_func_args_or_return_in_stack (NULL, name, fntype); + + if (*no_add_attrs) +return NULL_TREE; + + /* Prevent trees being shared among function types with and without + cmse_nonsecure_call attribute. Do however make sure they keep the same + main_variant, this is required for correct DIE output. */ + main_variant = TYPE_MAIN_VARIANT (fntype); + fntype = build_distinct_type_copy (fntype); + TYPE_MAIN_VARIANT (fntype) = main_variant; + if (TREE_CODE (type) == FUNCTION_TYPE) +TREE_TYPE (decl) = fntype; + else +TREE_TYPE (type) = fntype; + + /* Construct a type attribute and add it to the function type. */ + tree attrs = tree_cons (get_identifier ("cmse_nonsecure_call"), NULL_TREE, + TYPE_ATTRIBUTES (fntype)); + TYPE_ATTRIBUTES (fntype) = attrs; + return NULL_TREE; +} + /* Return 0 if the attributes for two types are incompatible, 1 if they
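A usage sketch (mine, modelled on but not copied from the new tests): the attribute is attached to the function type behind a pointer, typically through a typedef, and applying it to an ordinary function declaration draws the new warning.

/* Compile with -mcmse.  */
typedef void __attribute__ ((cmse_nonsecure_call)) (*ns_cb_t) (void);

void
invoke (ns_cb_t cb)
{
  cb ();   /* a call into the non-secure state */
}

/* Expected to be diagnosed: the attribute only applies to the base type
   of a function pointer.  */
void __attribute__ ((cmse_nonsecure_call)) wrong (void);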
[PATCH 6/7, GCC, ARM, V8M] ARMv8-M Security Extension's cmse_nonsecure_call: use __gnu_cmse_nonsecure_call
This patch extends support for the ARMv8-M Security Extensions 'cmse_nonsecure_call' to use a new library function '__gnu_cmse_nonsecure_call'. This library function is responsible for (without using r0-r3 or d0-d7): 1) saving and clearing all callee-saved registers using the secure stack 2) clearing the LSB of the address passed in r4 and using blxns to 'jump' to it 3) clearing ASPR, including the 'ge bits' if DSP is enabled 4) clearing FPSCR if using non-soft float-abi 5) restoring callee-saved registers. The decisions whether to include DSP 'ge bits' clearing and floating point registers (single/double precision) all depends on the multilib used. See Section 5.5 of ARM®v8-M Security Extensions (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html). *** gcc/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * config/arm/arm.c (detect_cmse_nonsecure_call): New. (cmse_nonsecure_call_clear_caller_saved): New. (arm_reorg): Use cmse_nonsecure_call_clear_caller_saved. * config/arm/arm-protos.h (detect_cmse_nonsecure_call): New. * config/arm/arm.md (call): Handle cmse_nonsecure_entry. (call_value): Likewise. (nonsecure_call_internal): New. (nonsecure_call_value_internal): New. * config/arm/thumb1.md (*nonsecure_call_reg_thumb1_v5): New. (*nonsecure_call_value_reg_thumb1_v5): New. * config/arm/thumb2.md (*nonsecure_call_reg_thumb2): New. (*nonsecure_call_value_reg_thumb2): New. * config/arm/unspecs.md (UNSPEC_NONSECURE_MEM): New. *** libgcc/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * config/arm/cmse_nonsecure_call.S: New. * config/arm/t-arm: Compile cmse_nonsecure_call.S *** gcc/testsuite/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * gcc.target/arm/cmse/cmse.exp: Run tests in mainline dir. * gcc.target/arm/cmse/cmse-9.c: Added some extra tests. * gcc.target/arm/cmse/baseline/bitfield-4.c: New. * gcc.target/arm/cmse/baseline/bitfield-5.c: New. * gcc.target/arm/cmse/baseline/bitfield-6.c: New. * gcc.target/arm/cmse/baseline/bitfield-7.c: New. * gcc.target/arm/cmse/baseline/bitfield-8.c: New. * gcc.target/arm/cmse/baseline/bitfield-9.c: New. * gcc.target/arm/cmse/baseline/bitfield-and-union-1.c: New. * gcc.target/arm/cmse/baseline/cmse-11.c: New. * gcc.target/arm/cmse/baseline/cmse-13.c: New. * gcc.target/arm/cmse/baseline/cmse-6.c: New. * gcc/testsuite/gcc.target/arm/cmse/baseline/union-1.c: New. * gcc/testsuite/gcc.target/arm/cmse/baseline/union-2.c: New. * gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: New. * gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: New. * gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: New. * gcc.target/arm/cmse/mainline/hard/cmse-13.c: New. * gcc.target/arm/cmse/mainline/hard/cmse-7.c: New. * gcc.target/arm/cmse/mainline/hard/cmse-8.c: New. * gcc.target/arm/cmse/mainline/soft/cmse-13.c: New. * gcc.target/arm/cmse/mainline/soft/cmse-7.c: New. * gcc.target/arm/cmse/mainline/soft/cmse-8.c: New. * gcc.target/arm/cmse/mainline/softfp-sp/cmse-7.c: New. * gcc.target/arm/cmse/mainline/softfp-sp/cmse-8.c: New. * gcc.target/arm/cmse/mainline/softfp/cmse-13.c: New. * gcc.target/arm/cmse/mainline/softfp/cmse-7.c: New. * gcc.target/arm/cmse/mainline/softfp/cmse-8.c: New. 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 1d2e35b52f631570450b6c8eaf077e18e9b99203..e7e223f7a932163cba8beeb76c10b4c90eae9234 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -133,6 +133,7 @@ extern int arm_const_double_inline_cost (rtx); extern bool arm_const_double_by_parts (rtx); extern bool arm_const_double_by_immediates (rtx); extern void arm_emit_call_insn (rtx, rtx, bool); +bool detect_cmse_nonsecure_call (tree); extern const char *output_call (rtx *); void arm_emit_movpair (rtx, rtx); extern const char *output_mov_long_double_arm_from_arm (rtx *); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 128baae92e4b65507eb18b679874b1ad24ca7c4a..fd0bc10083026ce544bf56244ce0eb740295e5d3 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -17822,6 +17822,197 @@ compute_not_to_clear_mask (tree arg_type, rtx arg_rtx, int regno, return not_to_clear_mask; } +/* Saves callee saved registers, clears callee saved registers and caller saved + registers not used to pass arguments before a cmse_nonsecure_call. And + restores the callee saved registers after. */ + +static void +cmse_nonsecure_call_clear_caller_saved (void) +{ + basic_block bb; + + FOR_EACH_BB_FN (bb, cfun) +{ + rtx_insn *insn; + + FOR_BB_INSNS (bb, insn) + { + uint64_t to_cle
Re: [patch] Get rid of stack trampolines for nested functions
> If I understand how this is supposed to work then this is not > future-proof against changes to the architecture. The bottom two bits > in both AArch32 (arm) and AArch64 are reserved for future use by the > architecture; they must not be used by software for tricks like this. I see, thanks for the info, the value will need to be bumped to 4 then. From the definition of TARGET_PTRMEMFUNC_VBIT_LOCATION, I gather that the MIPS setting is probably incorrect too and ought to be bumped to 2. -- Eric Botcazou
[PATCH 7/7, GCC, ARM, V8M] Added support for ARMV8-M Security Extension cmse_nonsecure_caller intrinsic
This patch adds support ARMv8-M's Security Extension's cmse_nonsecure_caller intrinsic. This intrinsic is used to check whether an entry function was called from a non-secure state. See Section 5.4.3 of ARM®v8-M Security Extensions: Requirements on Development Tools (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html) for further details. The FIXME in config/arm/arm_cmse.h is for a diagnostic message that is suggested in the ARMv8-M Security Extensions document mentioned above, to diagnose the use of the cmse_nonsecure_caller intrinsic outside of functions with the 'cmse_nonsecure_entry' attribute. Checking whether the intrinsic is called from within such functions can easily be done inside 'arm_expand_builtin'. However, making the warning point to the right location is more complicated. The ARMv8-M Security Extensions specification does mention that such a diagnostic might become mandatory, so I might have to pick this up later, otherwise it is left as a potential extra feature. *** gcc/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * config/arm/arm-builtins.c (arm_builtins): Define ARM_BUILTIN_CMSE_NONSECURE_CALLER. (bdesc_2arg): Add line for cmse_nonsecure_caller. (arm_expand_builtin): Handle cmse_nonsecure_caller. * config/arm/arm_cmse.h (cmse_nonsecure_caller): New. *** gcc/testsuite/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * gcc.target/arm/cmse/cmse-1.c: Add test for cmse_nonsecure_caller. diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 68b2839879f78e8d819444fbc11d2a91f8d6279a..2589ec2d1233f3daff94a1d35ebf63c8a9b93ecf 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -515,6 +515,8 @@ enum arm_builtins ARM_BUILTIN_GET_FPSCR, ARM_BUILTIN_SET_FPSCR, + ARM_BUILTIN_CMSE_NONSECURE_CALLER, + #undef CRYPTO1 #undef CRYPTO2 #undef CRYPTO3 @@ -1789,6 +1791,17 @@ arm_init_builtins (void) = add_builtin_function ("__builtin_arm_stfscr", ftype_set_fpscr, ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE); } + + if (arm_arch_cmse) +{ + tree ftype_cmse_nonsecure_caller + = build_function_type_list (unsigned_type_node, NULL); + arm_builtin_decls[ARM_BUILTIN_CMSE_NONSECURE_CALLER] + = add_builtin_function ("__builtin_arm_cmse_nonsecure_caller", + ftype_cmse_nonsecure_caller, + ARM_BUILTIN_CMSE_NONSECURE_CALLER, BUILT_IN_MD, + NULL, NULL_TREE); +} } /* Return the ARM builtin for CODE. */ @@ -2368,6 +2381,12 @@ arm_expand_builtin (tree exp, emit_insn (pat); return target; +case ARM_BUILTIN_CMSE_NONSECURE_CALLER: + target = gen_reg_rtx (SImode); + op0 = arm_return_addr (0, NULL_RTX); + emit_insn (gen_addsi3 (target, op0, const1_rtx)); + return target; + case ARM_BUILTIN_TEXTRMSB: case ARM_BUILTIN_TEXTRMUB: case ARM_BUILTIN_TEXTRMSH: diff --git a/gcc/config/arm/arm_cmse.h b/gcc/config/arm/arm_cmse.h index c9afdcbf48160a963fc254795543a1d9e981a215..989d09f9d08268224e745da5a2e0aa85916cd3a9 100644 --- a/gcc/config/arm/arm_cmse.h +++ b/gcc/config/arm/arm_cmse.h @@ -163,6 +163,13 @@ __attribute__ ((__always_inline__)) cmse_TTAT (void *__p) __CMSE_TT_ASM (at) +/* FIXME: diagnose use outside cmse_nonsecure_entry functions. 
*/ +__extension__ static __inline int __attribute__ ((__always_inline__)) +cmse_nonsecure_caller (void) +{ + return __builtin_arm_cmse_nonsecure_caller (); +} + #define CMSE_AU_NONSECURE 2 #define CMSE_MPU_NONSECURE 16 #define CMSE_NONSECURE 18 @@ -177,7 +184,7 @@ __extension__ void * cmse_check_address_range (void *, size_t, int); #define cmse_check_pointed_object(p, f) \ - ((typeof ((p))) cmse_check_address_range ((p), sizeof (*(p)), (f))) + ((typeof ((p))) cmse_check_address_range ((p), sizeof (*(p)), (f))) #define cmse_nsfptr_create(p) ((typeof ((p))) ((intptr_t) (p) & ~1)) diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c b/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c index d5b9a2d9d59569de170da814ae660e9fb2b943e7..ddcf12a30a6c1806969d239c448da81ccf49532e 100644 --- a/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c +++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c @@ -65,3 +65,32 @@ int foo (char * p) /* { dg-final { scan-assembler-times "ttat " 2 } } */ /* { dg-final { scan-assembler-times "bl.cmse_check_address_range" 7 } } */ /* { dg-final { scan-assembler-not "cmse_check_pointed_object" } } */ + +typedef int (*int_ret_funcptr_t) (void); +typedef int __attribute__ ((cmse_nonsecure_call)) (*int_ret_nsfuncptr_t) (void); + +int __attribute__ ((cmse_nonsecure_entry)) +baz (void) +{ + return cmse_nonsecure_caller (); +} + +int __attribute__ ((cmse_nonsecure_entry)) +qux (int_ret_funcptr_t int_ret_funcptr)
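A small sketch of the intended use (mine, along the lines of the new cmse-1.c tests but not copied from them; secret_key is a made-up name): an entry function can refuse to hand data out when the call really came from the non-secure state.

#include <arm_cmse.h>

static int secret_key = 42;

int __attribute__ ((cmse_nonsecure_entry))
get_key (void)
{
  /* cmse_nonsecure_caller () is non-zero when this entry function was
     invoked from non-secure code.  */
  if (cmse_nonsecure_caller ())
    return 0;
  return secret_key;
}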
Re: [patch] Get rid of stack trampolines for nested functions
On 25/07/16 14:25, Eric Botcazou wrote: >> If I understand how this is supposed to work then this is not >> future-proof against changes to the architecture. The bottom two bits >> in both AArch32 (arm) and AArch64 are reserved for future use by the >> architecture; they must not be used by software for tricks like this. > > I see, thanks for the info, the value will need to be bumped to 4 then. > > From the definition of TARGET_PTRMEMFUNC_VBIT_LOCATION, I gather that the > MIPS > setting is probably incorrect too and ought to be bumped to 2. > So does this require that every function have an 8-byte alignment boundary? or just those that appear in nested functions? If the former, how do you get routines written in assembler, or produced by third-party compilers, to comply? R.
[PATCH] do not take mutex in _Unwind_Find_registered_FDE if there are no registered objects
_Unwind_Find_FDE calls _Unwind_Find_registered_FDE and it takes the lock even when there are no registered objects. As far as I can see, only statically linked applications call __register_frame_info* functions, so for dynamically linked executables, taking the lock to check unseen_objects and seen_objects is a pessimization. Since the function is called on each thrown exception, this is a lot of unneeded locking. This patch checks unseen_objects and seen_objects outside of the lock and returns earlier if both are NULL. diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c index 5b16a1f..41de746 100644 --- a/libgcc/unwind-dw2-fde.c +++ b/libgcc/unwind-dw2-fde.c @@ -1001,6 +1001,13 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases) struct object *ob; const fde *f = NULL; + /* __atomic_write is not used to modify unseen_objects and seen_objects + since they are modified in locked sections only and unlock provides + release semantics. */ + if (!__atomic_load_n(&unseen_objects, __ATOMIC_ACQUIRE) + && !__atomic_load_n(&seen_objects, __ATOMIC_ACQUIRE)) + return NULL; + init_object_mutex_once (); __gthread_mutex_lock (&object_mutex); @@ -1020,8 +1027,8 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases) while ((ob = unseen_objects)) { struct object **p; + struct object *next = ob->next; - unseen_objects = ob->next; f = search_object (ob, pc); /* Insert the object into the classified list. */ @@ -1031,6 +1038,8 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases) ob->next = *p; *p = ob; + unseen_objects = next; + if (f) goto fini; } -- Gleb.
Re: [patch] Get rid of stack trampolines for nested functions
> So does this require that every function have an 8-byte alignment > boundary? or just those that appear in nested functions? If the > former, how do you get routines written in assembler, or produced by > third-party compilers, to comply? The former. You need to be able to control what happens at the interface with other languages and accept what they send (and do not send them descriptors). The ISO standard contains an entire annex specifying this interfacing in Ada. -- Eric Botcazou
[PR71078] x / abs(x) -> copysign (1.0, x)
Hi, The attached patch tries to fix PR71078. I am not sure if I have got the converts right. I put (convert? @0) and (convert1? (abs @1)) to match for cases when operands's types may be different from outermost type like in pr71078-3.c test-case (included in patch). Bootstrap+test in progress on x86_64-unknown-linux-gnu. Thanks, Prathamesh diff --git a/gcc/match.pd b/gcc/match.pd index 21bf617..6c3d6ec 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -391,6 +391,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (mult (abs@1 @0) @1) (mult @0 @0)) +/* PR71078: x / abs(x) -> copysign (1.0, x) */ +(simplify + (rdiv:C (convert? @0) (convert1? (abs @0))) + (if (FLOAT_TYPE_P (type) + && ! HONOR_NANS (type) + && ! HONOR_INFINITIES (type)) + (switch +(if (type == float_type_node) + (BUILT_IN_COPYSIGNF { build_one_cst (type); } (convert @0))) +(if (type == double_type_node) + (BUILT_IN_COPYSIGN { build_one_cst (type); } (convert @0))) +(if (type == long_double_type_node) + (BUILT_IN_COPYSIGNL { build_one_cst (type); } (convert @0)) + /* cos(copysign(x, y)) -> cos(x). Similarly for cosh. */ (for coss (COS COSH) copysigns (COPYSIGN) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c new file mode 100644 index 000..6204c14 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -fdump-tree-forwprop-details" } */ + +#include + +float f1(float x) +{ + float t1 = fabsf (x); + float t2 = x / t1; + return t2; +} + +double f2(double x) +{ + double t1 = fabs (x); + double t2 = x / t1; + return t2; +} + +long double f3 (long double x) +{ + long double t1 = fabsl (x); + long double t2 = x / t1; + return t2; +} + +/* { dg-final { scan-tree-dump "__builtin_copysignf" "forwprop1" } } */ +/* { dg-final { scan-tree-dump "__builtin_copysign" "forwprop1" } } */ +/* { dg-final { scan-tree-dump "__builtin_copysignl" "forwprop1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c new file mode 100644 index 000..96485af --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -fdump-tree-forwprop-details" } */ + +#include + +float f1(float x) +{ + float t1 = fabsf (x); + float t2 = t1 / x; + return t2; +} + +double f2(double x) +{ + double t1 = fabs (x); + double t2 = t1 / x; + return t2; +} + +long double f3 (long double x) +{ + long double t1 = fabsl (x); + long double t2 = t1 / x; + return t2; +} + +/* { dg-final { scan-tree-dump "__builtin_copysignf" "forwprop1" } } */ +/* { dg-final { scan-tree-dump "__builtin_copysign" "forwprop1" } } */ +/* { dg-final { scan-tree-dump "__builtin_copysignl" "forwprop1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71078-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-3.c new file mode 100644 index 000..8780b6a --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-3.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -fdump-tree-forwprop-details" } */ + +#include +double f(float f) +{ + double t1 = fabs(f); + double t2 = f / t1; + return t2; +} + +/* { dg-final { scan-tree-dump "__builtin_copysign" "forwprop1" } } */ ChangeLog Description: Binary data
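One note on the guards (mine, not part of the submission): the identity x / fabs (x) == copysign (1.0, x) only holds for finite non-zero x; for example x = -0.0 gives -0.0 / 0.0 = NaN on the left but -1.0 on the right, which is why the pattern is restricted to !HONOR_NANS / !HONOR_INFINITIES, i.e. in practice -ffast-math.

/* Expected to fold to __builtin_copysign (1.0, x) at -O2 -ffast-math.  */
double
sign (double x)
{
  return x / __builtin_fabs (x);
}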
Re: Gimple loop splitting
Hi, On Sun, 24 Jul 2016, Andrew Pinski wrote: > What ever happened to this patch? It got accepted but I deferred inclusion in GCC 6 because it was late in the cycle then and performance results didn't show super improvements (only looked at cpu2006). No regressions, but no nice speedups either. > I was looking into doing this myself today but I found this patch. It is > stage 1 of GCC 7, it might be a good idea to get this patch into GCC. Indeed. If you want to performance test it on something you know where it should help, I'm all ears. Ciao, Michael.
Re: libgo patch committed: Update to 1.7rc3
Hi Ian, > I have committed a patch to update libgo to the 1.7rc3 release > candidate. This is very close to the upcoming 1.7 release. As usual > with libgo updates, the patch is too large to include in this e-mail > message. I've appended the changes to the gccgo-specific directories. this broke Solaris bootstrap: /vol/gcc/src/hg/trunk/local/libgo/go/os/user/decls_solaris.go:24:46: error: reference to undefined identifier 'syscall.group' func libc_getgrgid_r(gid syscall.Gid_t, grp *syscall.group, buf *byte, buflen syscall.Size_t, result **syscall.group) int ^ /vol/gcc/src/hg/trunk/local/libgo/go/os/user/decls_solaris.go:24:104: error: reference to undefined identifier 'syscall.group' func libc_getgrgid_r(gid syscall.Gid_t, grp *syscall.group, buf *byte, buflen syscall.Size_t, result **syscall.group) int ^ make[4]: *** [os/user.lo] Error 1 Easily fixed by the following patch: diff --git a/libgo/go/os/user/decls_solaris.go b/libgo/go/os/user/decls_solaris.go --- a/libgo/go/os/user/decls_solaris.go +++ b/libgo/go/os/user/decls_solaris.go @@ -21,4 +21,4 @@ func libc_getpwuid_r(uid syscall.Uid_t, func libc_getgrnam_r(name *byte, grp *syscall.Group, buf *byte, buflen syscall.Size_t, result **syscall.Group) int //extern __posix_getgrgid_r -func libc_getgrgid_r(gid syscall.Gid_t, grp *syscall.group, buf *byte, buflen syscall.Size_t, result **syscall.group) int +func libc_getgrgid_r(gid syscall.Gid_t, grp *syscall.Group, buf *byte, buflen syscall.Size_t, result **syscall.Group) int There are also a couple of testsuite regressions I need to investigate. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH 1/6] add auto_sbitmap class
On 07/24/2016 10:28 AM, Richard Biener wrote: On July 24, 2016 1:44:44 PM GMT+02:00, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders gcc/ChangeLog: 2016-07-24 Trevor Saunders * sbitmap.h (auto_sbitmap): New class. OK. Do we need a private move constructor and assignment operator here too? jeff
Re: [patch] Get rid of stack trampolines for nested functions
On 07/25/2016 06:56 AM, Eric Botcazou wrote: Another UNITS_PER_WORD that I think ought to be POINTER_SIZE/BITS_PER_UNIT. Probably worth a pass over the patch to look for this throughout. Yes, it was very likely enabled only on platforms with word-sized pointers. That's what I suspected. Thankfully it's not a huge patch and I didn't see anything else that would likely trip those oddballs that use something other than word size pointers. +ftrampolines +Common Report Var(flag_trampolines) Init(0) +Always generate trampolines for pointers to nested functions Even for targets that use procedure descriptors natively? I'm not sure what's the best choice for them -- I could argue either way on this issue. For targets that use procedure descriptors natively, it's a no-op because the entire patch is supposed to be a no-op for them. I'm going to rephrase. Ah, OK, it's just misleading docs. I'll trust you to clean that up. Not real happy to see target stuff getting into tree-*. Is this really needed? It's the target hook mentioned earlier, e.g. for targets that use procedure descriptors natively we don't want to create generic descriptors but use the regular "degenerated" trampolines. Makes sense. I see this happening more and more and it makes me wonder if we want to draw a hard line in the sand by splitting out the key bits necessary to allow these kinds of hook calls, but not direct access to the rest of the target bits. But that's outside the scope of this patch. I think the biggest issues are the ABI concerns and getting the target maintainers to take a looksie. Everything else looks pretty good. Thanks for the review. So I think with the various nits I pointed out the target independent stuff is good to go on the trunk. Then you can just iterate with the target guys to get those wrapped up. jeff
Re: [PATCH] Teach VRP to register assertions along default switch labels (PR 18046)
On 07/24/2016 11:56 AM, Andrew Pinski wrote: On Sun, Jul 24, 2016 at 9:49 AM, Patrick Palka wrote: On Sat, Jul 23, 2016 at 9:13 PM, kugan wrote: On 23/07/16 05:26, Patrick Palka wrote: This patch teaches VRP to register along a default switch label assertions that corresponds to the anti range of each case label. Hi Patrick, In case of a larger switch statement with case values that cannot be combined, you could end up with very large number of ASSERT_EXPRs in the default basic block. I am not sure if this would have any impact on compile time/memory usage? If that is the case you might want to punt at some length? Hi Kugan, That sounds like a good idea to me. So I gathered some info about the number of assert_exprs inserted in the default bb during GCC bootstrap with this patch: Max # of asserts inserted in a default BB: 59 Median #: 2 Average #: 3.33817 Std. dev: 4.76926 95% percentile: 10 Based on the last point I guess 10 would be a good limit? Make sure this is a parameter so someone can increase it if they want. But 10 seems like a good default. Agreed on both points. jeff
Re: Init df for split pass since some target use REG_NOTE in split pattern
On 07/25/2016 12:36 AM, Kito Cheng wrote: Hi all: Some target(for example i386, sh and h8300) use find_regno_note in split pattern but df infrastructure seem not initialize at split pass, so it may got wrong note since it's out-of-date. ChangeLog 2016-07-25 Kito Cheng * gcc/recog.c (split_all_insns): Initialize df for split pass. (split_all_insns_noflow): Likewise. Patch was not included/attached to your message. Please resend with the patch and hopefully a testcase which exhibits the problem. jeff
Re: libgo patch committed: Update to 1.7rc3
libgo version bump to indicate the change in Go version? On 07/22/2016 01:15 PM, Ian Lance Taylor wrote: I have committed a patch to update libgo to the 1.7rc3 release candidate. This is very close to the upcoming 1.7 release. As usual with libgo updates, the patch is too large to include in this e-mail message. I've appended the changes to the gccgo-specific directories. Ian
[committed] testsuite: add two more label_values annotations
Hi, I've applied the following patch as obvious to mark two more tests as taking addresses of labels (this cannot work on nvptx). The issue on pr16973.c was uncovered in the fallout of my (now reverted) toplevel-reorder patch; the other test, pr71494.c, fails regardless of that. 2016-07-25 Alexander Monakov * gcc.c-torture/execute/pr71494.c: Require label_values. * gcc.dg/pr16973.c: Ditto. diff --git a/gcc/testsuite/gcc.c-torture/execute/pr71494.c b/gcc/testsuite/gcc.c-torture/execute/pr71494.c index f962f2c..cee407d 100644 --- a/gcc/testsuite/gcc.c-torture/execute/pr71494.c +++ b/gcc/testsuite/gcc.c-torture/execute/pr71494.c @@ -1,4 +1,5 @@ /* PR middle-end/71494 */ +/* { dg-require-effective-target label_values } */ int main () diff --git a/gcc/testsuite/gcc.dg/pr16973.c b/gcc/testsuite/gcc.dg/pr16973.c index e24c9f8..83274a1 100644 --- a/gcc/testsuite/gcc.dg/pr16973.c +++ b/gcc/testsuite/gcc.dg/pr16973.c @@ -3,6 +3,7 @@ to add back the label. */ /* { dg-options "" } */ +/* { dg-require-effective-target label_values } */ void f (void)
Re: GCC testsuite maintenance (was: [PATCH] Fix OpenACC vector_length parsing in fortran)
On Fri, 15 Jul 2016, Thomas Schwinge wrote: > > No, we want to have as little churn as possible in existing tests, the > > general policy is to add new tests (not just for OpenACC/OpenMP, but for > > all functionality). > > Hmm, that's something I had not been aware of, and I can't find this > covered in the documentation. So, you're basically saying that files in > the testsuite are write-once, and should not be maintained aside from > fixing errors, adjusting due to optimization changes, or due to changed > diagnostics, and the like? (Of course, I do agree that we shouldn't Yes, that's my view. It makes it easier to distinguish regressions from new tests that fail (on some platforms) if what a test tests doesn't change unnecessarily. -- Joseph S. Myers jos...@codesourcery.com
Re: [PR70920] transform (intptr_t) x eq/ne CST to x eq/ne (typeof x) cst
On 25 July 2016 at 14:32, Richard Biener wrote: > On Mon, 25 Jul 2016, Prathamesh Kulkarni wrote: > >> Hi Richard, >> The attached patch tries to fix PR70920. >> It adds your pattern from comment 1 in the PR >> (with additional gating on INTEGRAL_TYPE_P to avoid regressing >> finalize_18.f90) >> and second pattern, which is reverse of the first transform. >> I needed to update ssa-dom-branch-1.c because with patch applied, >> jump threading removed the second if (i != 0B) block. >> The dumps with and without patch for ssa-dom-branch-1.c start >> to differ with forwprop1: >> >> before: >> : >> _1 = temp_16(D)->code; >> _2 = _1 == 42; >> _3 = (int) _2; >> _4 = (long int) _3; >> temp_17 = (struct rtx_def *) _4; >> if (temp_17 != 0B) >> goto ; >> else >> goto ; >> >> after: >> : >> _1 = temp_16(D)->code; >> _2 = _1 == 42; >> _3 = (int) _2; >> _4 = (long int) _2; >> temp_17 = (struct rtx_def *) _4; >> if (_1 == 42) >> goto ; >> else >> goto ; >> >> I suppose the transform is correct for above test-case ? >> >> Then vrp dump shows: >> Threaded jump 5 --> 9 to 13 >> Threaded jump 8 --> 9 to 13 >> Threaded jump 3 --> 9 to 13 >> Threaded jump 12 --> 9 to 14 >> Removing basic block 9 >> basic block 9, loop depth 0 >> pred: >> if (i1_10(D) != 0B) >> goto ; >> else >> goto ; >> succ: 10 >> 11 >> >> So there remained two instances of if (i1_10 (D) != 0B) in dom2 dump file, >> and hence needed to update the test-case. >> >> Bootstrapped and tested on x86_64-unknown-linux-gnu. >> OK to commit ? > > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3408,3 +3408,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > { CONSTRUCTOR_ELT (ctor, idx / k)->value; }) > (BIT_FIELD_REF { CONSTRUCTOR_ELT (ctor, idx / k)->value; } >@1 { bitsize_int ((idx % k) * width); }) > + > +/* PR70920: Transform (intptr_t)x eq/ne CST to x eq/ne (typeof x) CST. > */ > + > +(for cmp (ne eq) > + (simplify > + (cmp (convert@2 @0) INTEGER_CST@1) > + (if (POINTER_TYPE_P (TREE_TYPE (@0)) > + && INTEGRAL_TYPE_P (TREE_TYPE (@2))) > > you can use @1 here and omit @2. > > + (cmp @0 (convert @1) > + > +/* Reverse of the above case: > + x has integral_type, CST is a pointer constant. > + Transform (typeof CST)x eq/ne CST to x eq/ne (typeof x) CST. */ > + > +(for cmp (ne eq) > + (simplify > + (cmp (convert @0) @1) > + (if (POINTER_TYPE_P (TREE_TYPE (@1)) > + && INTEGRAL_TYPE_P (TREE_TYPE (@0))) > +(cmp @0 (convert @1) > > The second pattern lacks the INTEGER_CST on @1 so it doesn't match > its comment. Please do not add vertical space between pattern > comment and pattern. > > Please place patterns not at the end of match.pd but where similar > transforms are done. Like after > > /* Simplify pointer equality compares using PTA. */ > (for neeq (ne eq) > (simplify > (neeq @0 @1) > (if (POINTER_TYPE_P (TREE_TYPE (@0)) >&& ptrs_compare_unequal (@0, @1)) >{ neeq == EQ_EXPR ? boolean_false_node : boolean_true_node; }))) > > please also share the (for ...) for both patterns or merge them > by changing the condition to > > (if ((POINTER_TYPE_P (TREE_TYPE (@0)) > && INTEGRAL_TYPE_P (TREE_TYPE (@1))) >|| (INTEGRAL_TYPE_P (TREE_TYPE (@0)) >&& POINTER_TYPE_P (TREE_TYPE (@1 > Hi, Done suggested changes in this version. pr70920-4.c (test-case in patch) is now folded during ccp instead of forwprop after merging the two patterns. Passes bootstrap+test on x86_64-unknown-linux-gnu. OK for trunk ? Thanks, Prathamesh > Richard. > >> PS: Writing changelog entries for match.pd is a bit tedious. 
>> Should we add optional names for pattern so we can refer to them by names >> in the ChangeLog for the more complicated ones ? >> Or maybe just use comments: >> (simplify /* name */ ... ) -;) > > That will add the fun of inventing names ;) > >> Thanks, >> Prathamesh >> > > -- > Richard Biener > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nuernberg) diff --git a/gcc/match.pd b/gcc/match.pd index 21bf617..6c2ec82 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -2513,6 +2513,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && ptrs_compare_unequal (@0, @1)) { neeq == EQ_EXPR ? boolean_false_node : boolean_true_node; }))) +/* PR70920: Transform (intptr_t)x eq/ne CST to x eq/ne (typeof x) CST. + and (typeof ptr_cst) x eq/ne ptr_cst to x eq/ne (typeof x) CST */ +(for cmp (ne eq) + (simplify + (cmp (convert @0) INTEGER_CST@1) + (if ((POINTER_TYPE_P (TREE_TYPE (@0)) && INTEGRAL_TYPE_P (TREE_TYPE (@1))) +|| (INTEGRAL_TYPE_P (TREE_TYPE (@0)) && POINTER_TYPE_P (TREE_TYPE (@1 + (cmp @0 (convert @1) + /* Non-equality compare simplifications from fold_binary */ (for cmp (lt gt le ge) /* Comparisons with the highest or lowest possible integer of diff --git a/gcc/testsuite/gcc.dg/pr70920-1.c b/gcc/testsuite/gcc.dg/pr70920-1.
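For completeness, a tiny example of the first fold (mine, essentially what the pr70920 tests exercise): the integer cast disappears and the comparison is carried out directly on the pointer.

/* Before the new pattern:  (intptr_t) p != 0
   After forwprop/ccp:      p != 0B  */
int
nonnull (int *p)
{
  return (__INTPTR_TYPE__) p != 0;
}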
Re: [Patch, testsuite, committed] Fix some more tests that fail for non-32-bit int targets
On Jul 25, 2016, at 5:00 AM, Senthil Kumar Selvaraj wrote: > > The below patch fixes tests that fail for the avr target, because they > assume ints are at least 32 bits wide and pointers and longs have the > same size. > > I've required int32plus support for one test, and for the other two, > I've introduced a cast to intptr_t to avoid the pointer <-> int size > difference warning. > > Reg tested on avr and x86_64 with no regressions. Committed as > obvious. Can you use __INTPTR_TYPE__ instead? This way, you don't need to add any include, and the test case will work in cross environments without any libc, which is at times handy. Thanks. See grep INTPTR gcc/gcc/testsuite/gcc.dg/*.c for how people use it, and for the unsigned version. > 2016-07-25 Senthil Kumar Selvaraj > > * gcc.dg/torture/pr69352.c (foo): Cast to intptr_t instead of long. > * gcc.dg/torture/pr69771.c: Require int32plus. > * gcc.dg/torture/pr71866.c (inb): Add cast to intptr_t.
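For reference, a sketch of the suggested idiom (the inb signature below is a guess for illustration, not the committed change):

/* The predefined macro avoids any #include, so the test also works
   for bare-metal cross targets without a libc.  */
typedef __INTPTR_TYPE__ intptr_t;

static inline unsigned char
inb (unsigned long port)
{
  /* Casting through intptr_t avoids the pointer <-> int size
     difference warning on targets where the sizes differ.  */
  return *(volatile unsigned char *) (intptr_t) port;
}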
Re: [PATCH 2/6] use auto_sbitmap in various places
On 07/24/2016 05:44 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders gcc/ChangeLog: 2016-07-24 Trevor Saunders * bt-load.c (compute_out): Use auto_sbitmap class. (link_btr_uses): Likewise. * cfganal.c (mark_dfs_back_edges): Likewise. (post_order_compute): Likewise. (inverted_post_order_compute): Likewise. (pre_and_rev_post_order_compute_fn): Likewise. (single_pred_before_succ_order): Likewise. * cfgexpand.c (pass_expand::execute): Likewise. * cfgloop.c (verify_loop_structure): Likewise. * cfgloopmanip.c (fix_bb_placements): Likewise. (remove_path): Likewise. (update_dominators_in_loop): Likewise. * cfgrtl.c (break_superblocks): Likewise. * ddg.c (check_sccs): Likewise. (create_ddg_all_sccs): Likewise. * df-core.c (df_worklist_dataflow): Likewise. * dse.c (dse_step3): Likewise. * except.c (eh_region_outermost): Likewise. * function.c (thread_prologue_and_epilogue_insns): Likewise. * gcse.c (prune_expressions): Likewise. (prune_insertions_deletions): Likewise. * gimple-ssa-backprop.c (backprop::~backprop): Likewise. * graph.c (draw_cfg_nodes_no_loops): Likewise. * ira-lives.c (remove_some_program_points_and_update_live_ranges): Likewise. * lcm.c (compute_earliest): Likewise. (compute_farthest): Likewise. * loop-unroll.c (unroll_loop_constant_iterations): Likewise. (unroll_loop_runtime_iterations): Likewise. (unroll_loop_stupid): Likewise. * lower-subreg.c (decompose_multiword_subregs): Likewise. * lra-lives.c: Likewise. * lra.c (lra): Likewise. * modulo-sched.c (schedule_reg_moves): Likewise. (optimize_sc): Likewise. (get_sched_window): Likewise. (sms_schedule_by_order): Likewise. (check_nodes_order): Likewise. (order_nodes_of_sccs): Likewise. (order_nodes_in_scc): Likewise. * recog.c (split_all_insns): Likewise. * regcprop.c (pass_cprop_hardreg::execute): Likewise. * reload1.c (reload): Likewise. * sched-rgn.c (haifa_find_rgns): Likewise. (split_edges): Likewise. (compute_trg_info): Likewise. * sel-sched.c (init_seqno): Likewise. * store-motion.c (remove_reachable_equiv_notes): Likewise. * tree-into-ssa.c (update_ssa): Likewise. * tree-ssa-live.c (live_worklist): Likewise. * tree-ssa-loop-im.c (fill_always_executed_in): Likewise. * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): * Likewise. (try_peel_loop): Likewise. * tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): * Likewise. * tree-ssa-pre.c (compute_antic): Likewise. * tree-ssa-reassoc.c (undistribute_ops_list): Likewise. * tree-stdarg.c (reachable_at_most_once): Likewise. * tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Likewise. * var-tracking.c (vt_find_locations): Likewise. While looking at this, I noticed some more. Check the local sbitmaps in ddg.c::find_nodes_on_paths. I wonder if building a plugin to help find this kind of thing would help. Essentially what we want is TYPE which we're considering converting to an auto_TYPE. We want to see the declaration, possibly an allocation and an explicit release. All paths have to pass through an explicit release. The object must not escape. We can have a whitelist of routines where the object can be passed in as a parameter. Given that kind of infrastructure we ought to be able to look at a type and say, yes, this seems to make sense to turn into an auto and here's all the places that need twiddling for that change. That infrastructure could also do things like say "X escaped via call Y" or a path doesn't release which can further guide the process, or show leaks. 
diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c index 5dde66c..6e87a6f 100644 --- a/gcc/modulo-sched.c +++ b/gcc/modulo-sched.c So I got a bit lost tracking some of these through the call chains. I could try again, taking notes along the way to make sure I properly verified the children don't make a copy of MUST_FOLLOW. But I'm going to assume you did this correctly. diff --git a/gcc/store-motion.c b/gcc/store-motion.c index 6d7d37f..1d504e7 100644 --- a/gcc/store-motion.c +++ b/gcc/store-motion.c Ick. Who wrote remove_reachable_equiv_notes?!? OK, you don't need to answer that, I just needed to vent a little. diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c index a5f3f71..6247a4c 100644 --- a/gcc/tree-ssa-pre.c +++ b/gcc/tree-ssa-pre.c @@ -2451,7 +2451,7 @@ compute_antic (void) has_abnormal_preds seems ripe for a similar change in this function. diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c index 5d09879..9088978 100644 --- a/gcc/var-tracking.c +++ b/gcc/var-tracking.c @@ -6996,7 +6
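For readers not familiar with the new class, the conversion pattern being applied is roughly this (an illustrative sketch assuming GCC's internal sbitmap.h API, not code from the patch):

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "sbitmap.h"

/* Before: manual allocation and an explicit free on every path.  */
void
example_before (unsigned int size)
{
  sbitmap visited = sbitmap_alloc (size);
  bitmap_clear (visited);
  /* ... use visited ...  */
  sbitmap_free (visited);
}

/* After: the auto_sbitmap destructor releases the storage, so early
   returns cannot leak it and no explicit free is needed.  */
void
example_after (unsigned int size)
{
  auto_sbitmap visited (size);
  bitmap_clear (visited);
  /* ... use visited ...  */
}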
[PATCH] Replacing gcc's dependence on libiberty's fnmatch with gnulib's fnmatch
On top of the previously filed patch for importing gnulib (the link isn’t available on the archive yet, however this contains some of the information: http://gcc.1065356.n5.nabble.com/Importing-gnulib-into-the-gcc-tree-td1275807.html#a1279573) now I have replaced another function from libiberty with the corresponding version from gnulib. In both OSX and GNU/Linux, fnmatch is already provided by the system C library, so the copy in libiberty is not used on those systems. However, since the objective is to replace whatever functions can be leveraged by gnulib, these changes have been made. Tested and bootstrapped on x86_64 (apple-darwin, linux). Changelog: 2016-07-25 Ayush Goel *src/gcc/genattrtab.c: included fnmatch.h *src/gcc/genautomata.c: included fnmatch.h *src/gcc/Makefile.in: removed dependency on libiberty’s fnmatch *src/gnulib/update-gnulib.sh: Added fnmatch to list of modules imported from gnulib Wiki link for more details: https://gcc.gnu.org/wiki/replacelibibertywithgnulib -Ayush importgnulib_fnmatch.patch Description: Binary data
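For context, the interface being switched over is just the POSIX one; a quick standalone sketch (the file names here are arbitrary examples):

#include <fnmatch.h>
#include <stdio.h>

int
main (void)
{
  /* fnmatch returns 0 on a match and FNM_NOMATCH otherwise, whether
     the symbol comes from libc, libiberty, or gnulib.  */
  printf ("%d\n", fnmatch ("*.md", "rs6000.md", 0));
  printf ("%d\n", fnmatch ("*.md", "genattrtab.c", 0));
  return 0;
}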
Re: [PATCH] Invalidate argument slots for const/pure calls
On 07/23/2016 12:32 PM, John David Anglin wrote: The attached patch fixes PR middle-end/71712. We need to invalidate the argument slots in const/pure calls as they are owned by the callee. The caller can't assume they are preserved across the call. Fix is similar to that for PR71532. Tested on hppa-unknown-linux-gnu. Okay for trunk? Dave -- John David Anglin dave.ang...@bell.net cselib.c.d.txt 2016-08-23 John David Anglin PR middle-end/71732 * cselib.c (cselib_process_insn): Invalidate argument slots for const/pure calls. OK. jeff
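To illustrate the underlying point (a sketch only, not the testcase from the PR): with enough arguments, some are passed in caller-allocated stack slots, and on the affected ABIs the callee owns those slots and may scribble on them even if it is const or pure, so the caller cannot keep using the values it stored there.

__attribute__ ((pure, noinline)) static int
sum9 (int a, int b, int c, int d, int e, int f, int g, int h, int i)
{
  return a + b + c + d + e + f + g + h + i;
}

int
caller (int x)
{
  /* The stack argument slots written for the first call are not
     guaranteed to still hold 1..8 when the second call is set up.  */
  return sum9 (x, 1, 2, 3, 4, 5, 6, 7, 8)
	 + sum9 (x, 1, 2, 3, 4, 5, 6, 7, 8);
}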
[PATCH] Replacing gcc's dependence on libiberty's md5 with gnulib's md5
On top of the previously filed patch for importing gnulib (the link isn’t available on the archive yet, however this contains some of the information: http://gcc.1065356.n5.nabble.com/Importing-gnulib-into-the-gcc-tree-td1275807.html#a1279573) now I have replaced another function from libiberty with the corresponding version from gnulib. Tested and bootstrapped on x86_64(appledarwin,linux). Changelog: 2016-07-25 Ayush Goel *src/gcc/Makefile.in: replaced header path for md5 from libiberty to the one in gnulib *src/gnulib/update-gnulib.sh: imported md5 from gnulib Wiki link for more details: https://gcc.gnu.org/wiki/replacelibibertywithgnulib -Ayush importgnulib_md5.patch Description: Binary data
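For reference, both the libiberty and the gnulib md5 descend from the same GNU code and expose the same md5_buffer entry point, so in principle only the include path changes; a small sketch of the interface (assuming the gnulib header is on the include path):

#include <stdio.h>
#include "md5.h"

int
main (void)
{
  unsigned char digest[16];
  md5_buffer ("hello", 5, digest);
  for (int i = 0; i < 16; i++)
    printf ("%02x", digest[i]);
  putchar ('\n');
  return 0;
}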
Re: [C PATCH] Don't mark C99 inline functions as always_inline (PR c/71969)
On 07/23/2016 10:38 AM, Richard Biener wrote: On July 22, 2016 5:09:07 PM GMT+02:00, Jakub Jelinek wrote: Hi! As Richard reported on IRC, we are marking C99 inline (without extern) functions as effectively always_inline, even when the intent has been to do this only for the GNU extern inline functions (i.e. -fgnu89-inline extern inline or extern inline __attribute__((gnu_inline))). Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? The alias-11.c change is needed because the inliner decides not to inline new_cfi, and as it uses since -std=c11 default C99 inline semantics without extern, the body isn't emitted. LGTM, Just a note, Jakub is on PTO this week and next week -- he'll commit when he returns. Or if this is time critical, anyone with commit privileges can commit on Jakub's behalf. Jeff
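For readers following along, a sketch of the two inline models being distinguished (illustrative only, not from the patch or its testcases):

/* C99 "inline" without "extern": this TU provides only an inline
   definition; the out-of-line copy must come from some other TU, so
   treating the function as always_inline is wrong.  */
inline int c99_twice (int x) { return 2 * x; }

/* GNU extern inline (equivalently the gnu_inline attribute): the body
   is only an inline substitute and never produces an out-of-line
   copy, which is why it has historically been treated as effectively
   always_inline.  */
extern inline __attribute__ ((gnu_inline)) int
gnu_twice (int x)
{
  return 2 * x;
}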
Re: [PATCH] Replacing gcc's dependence on libiberty's fnmatch with gnulib's fnmatch
The link for that patch importing gnulib inside gcc’s tree: https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01302.html Apologies for the confusion. On 25 July 2016 at 10:48:20 PM, ayush goel (ayushgoel1...@gmail.com) wrote: > > On top of the previously filed patch for importing gnulib (the link isn’t > available on > the archive yet, however this contains some of the information: > http://gcc.1065356.n5.nabble.com/Importing-gnulib-into-the-gcc-tree-td1275807.html#a1279573) > now I have replaced another function from libiberty with the corresponding > version > from gnulib. > In both OSX and GNU/Linux, fnmatch is already provided by the system C library, so > the copy in libiberty is not used on those systems. However, since the > objective is to replace > whatever functions can be leveraged by gnulib, these changes have been made. > > Tested and bootstrapped on x86_64 (apple-darwin, linux). > > Changelog: > 2016-07-25 Ayush Goel > > *src/gcc/genattrtab.c: included fnmatch.h > *src/gcc/genautomata.c: included fnmatch.h > *src/gcc/Makefile.in: removed dependency on libiberty’s fnmatch > *src/gnulib/update-gnulib.sh: Added fnmatch to list of modules imported from gnulib > > Wiki link for more details: > https://gcc.gnu.org/wiki/replacelibibertywithgnulib > > -Ayush > > >
Re: [PATCH] Replacing gcc's dependence on libiberty's md5 with gnulib's md5
Link for the patch importing gnulib inside gcc’s tree: https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01302.html Apologies for the confusion -Ayush On 25 July 2016 at 10:52:00 PM, ayush goel (ayushgoel1...@gmail.com) wrote: > On top of the previously filed patch for importing gnulib (the link isn’t > available on > the archive yet, however this contains some of the information: > http://gcc.1065356.n5.nabble.com/Importing-gnulib-into-the-gcc-tree-td1275807.html#a1279573) > now I have replaced another function from libiberty with the corresponding > version > from gnulib. > > Tested and bootstrapped on x86_64(appledarwin,linux). > > Changelog: > 2016-07-25 Ayush Goel > > *src/gcc/Makefile.in: replaced header path for md5 from libiberty to the one > in gnulib > *src/gnulib/update-gnulib.sh: imported md5 from gnulib > > Wiki link for more details: > https://gcc.gnu.org/wiki/replacelibibertywithgnulib > > -Ayush > > >
Re: [PATCH] predict.c: merge multi-edges
On 07/22/2016 05:38 AM, Martin Liška wrote: On 07/22/2016 12:53 PM, Segher Boessenkool wrote: Hi Martin, On Fri, Jul 22, 2016 at 10:17:51AM +0200, Martin Liška wrote: /* We can not predict the probabilities of outgoing edges of bb. Set them - evenly and hope for the best. */ + evenly and hope for the best. If UNLIKELY_EDGES is not null, distribute + evel probability for all edges not mentioned in the set. These edges + are given PROB_VERY_UNLIKELY probability. */ Typo ("evel"); Hello Segher. The typo is fixed. + unsigned unlikely_count = unlikely_edges ? unlikely_edges->elements () : 0; + FOR_EACH_EDGE (e, ei, bb->succs) if (!(e->flags & (EDGE_EH | EDGE_FAKE))) nedges ++; + + unsigned c = nedges - unlikely_count; What prevents c from becoming 0? The sum of outgoing edge probabilities will be very small then (unless there are very many such noreturn edges, then the sum is too big, instead). You are right, I'm sending second version where I set even probabilities in case of all edges are PROB_VERY_UNLIKELY. However, I was unable to come up with a test-case for that: int main(int argc, char **argv) { switch (argc) { case 1: __builtin_unreachable(); case 4: __builtin_unreachable(); default: __builtin_unreachable(); } return 10; } No predictors are set as 'has_return_edges' == false in tree_bb_level_predictions. Can you turn this into a test as well? With that change this patch is OK for the trunk. jeff
Re: [PATCH] correct atomic_compare_exchange_n return type (c++/71675)
On 07/25/2016 05:03 AM, Renlin Li wrote: Hi Martin, I observed the following error: ERROR: gcc.dg/atomic/pr71675.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects : syntax error in target selector "target c11" for " dg-do 3 compile { target c11 } " It seems we don't have a c11 effective target check available in dejagnu target-supports.exp. ISTM like we should either add a c11 effective target check, or force the compiler into c11 mode via -std=c11. Jeff
Re: [PATCH, vec-tails 07/10] Support loop epilogue combining
On 07/22/2016 05:36 AM, Richard Biener wrote: The thing that needs work I think is re-running of if-conversion. I wonder if we could revamp if-conversion to work on a subset of the CFG? I can see that potentially being useful in other contexts. Would that work for you Richi? We've already got Bin doing that for DOM... Also I don't like at all that we have many variants of vectorizing but somehow the decision which one to choose is rather unclear. The way the epilogue vectorization code is hooked in is rather awkward and bound to be a maintenance burden (well, maybe a small one). I think it's going to be a small one. I suspect that we really need another architecture with masking capabilities to really be able to see how the costing models ought to work and bring sanity to that decision. And last, I doubt there is a case for a masked vectorized loop - I can bet that doing a non-masked vectorized loop plus a masked epilogue (with no iteration then!) will always be faster unless you hit the window of very few iterations (or optimizing for size - in which case vectorizing is questionable on its own and disabled IIRC). Ilya, does this case make a noticeable difference with the ICC implementation? Jeff
Re: [PATCH] correct atomic_compare_exchange_n return type (c++/71675)
On Mon, Jul 25, 2016 at 1:56 PM, Jeff Law wrote: > On 07/25/2016 05:03 AM, Renlin Li wrote: >> >> Hi Martin, >> >> I observed the following error: >> >> ERROR: gcc.dg/atomic/pr71675.c -O2 -flto -fuse-linker-plugin >> -fno-fat-lto-objects : syntax error in target selector "target c11" for >> " dg-do 3 compile { target c11 } " >> >> It seems we don't have a c11 effective target check available >> in dejagnu target-supports.exp. > > ISTM like we should either add a c11 effective target check, or force the > compiler into c11 mode via -std=c11. I think the C testsuite also doesn't run in multiple modes like C++, so adding -std=c11 is probably the better choice. Jason
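A sketch of what the -std=c11 route looks like in the test header (illustrative directives, not the committed fix):

/* { dg-do compile } */
/* { dg-options "-std=c11" } */

The alternative of writing { target c11 } would first need a check_effective_target_c11 helper added to target-supports.exp.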
Re: C++ PATCH for c++/71913 (copy elision choices)
On Mon, Jul 25, 2016 at 7:15 AM, Renlin Li wrote: > Hi Jason, > > On 22/07/16 04:01, Jason Merrill wrote: >> >> 71913 is a case where unsafe_copy_elision_p was being too >> conservative. We can allow copy elision in a new expression; the only >> way we could end up initializing a base subobject without knowing it >> would be through a placement new, in which case we would already be >> using the wrong (complete object) constructor, so copy elision doesn't >> make it any worse. >> > >> diff --git a/gcc/testsuite/g++.dg/init/elide5.C >> b/gcc/testsuite/g++.dg/init/elide5.C >> new file mode 100644 >> index 000..0a9978c >> --- /dev/null >> +++ b/gcc/testsuite/g++.dg/init/elide5.C >> @@ -0,0 +1,27 @@ >> +// PR c++/71913 >> +// { dg-do link { target c++11 } } >> + >> +void* operator new(unsigned long, void* p) { return p; } > > > g++.dg/init/elide5.C fails on targets whose SIZE_TYPE is not "long unsigned > int". > > testsuite/g++.dg/init/elide5.C:4:42: error: 'operator new' takes type > 'size_t' ('unsigned int') as first parameter [-fpermissive] > > I have checked: for most 32-bit architectures or ABIs, SIZE_TYPE is > "unsigned int"; arm is one of them. > > To make this test case portable, would __SIZE_TYPE__ be better in this case, > instead of "unsigned long", as the first argument of the new operator? Thanks, I'll fix that shortly. Jason
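The portable spelling being suggested is simply this (a sketch; the actual committed fix may differ in details):

// __SIZE_TYPE__ expands to the target's size_t, so this placement
// operator new declaration matches on both 32-bit and 64-bit targets.
void* operator new (__SIZE_TYPE__, void* p) { return p; }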
Re: GCC testsuite maintenance (was: [PATCH] Fix OpenACC vector_length parsing in fortran)
On Jul 25, 2016, at 9:37 AM, Joseph Myers wrote: > > On Fri, 15 Jul 2016, Thomas Schwinge wrote: > >>> No, we want to have as little churn as possible in existing tests, the >>> general policy is to add new tests (not just for OpenACC/OpenMP, but for >>> all functionality). >> >> Hmm, that's something I had not been aware of, and I can't find this >> covered in the documentation. So, you're basically saying that files in >> the testsuite are write-once, and should not be maintained aside from >> fixing errors, adjusting due to optimization changes, or due to changed >> diagnostics, and the like? (Of course, I do agree that we shouldn't > > Yes, that's my view. It makes it easier to distinguish regressions from > new tests that fail (on some platforms) if what a test tests doesn't > change unnecessarily. Right. Roughly the test suite is a monotonic, slow-moving, ever-increasing mass. Generally we favor new files for new tests and little to no recoding or rearrangement of existing stuff. We do some limited forms of maintenance, for example using intptr_t to make a less portable test case more portable, just to pick a random recent edit; but these tend to be very minor and very narrowly focused.
C++ PATCH for c++/71972 (ICE with constexpr array element self-modification)
Here, when we start to modify an array element, we add a constructor_elt for it to the CONSTRUCTOR. When calculating the value to store there we found this empty element and tried to use it, leading to an ICE because it was NULL. Tested x86_64-pc-linux-gnu, applying to trunk. commit b472cebfc4ce1bc44d2d328f5f693322e04c905b Author: Jason Merrill Date: Mon Jul 25 09:54:04 2016 -0400 PR c++/71972 - constexpr array self-modification * constexpr.c (cxx_eval_array_reference): Handle looking for the value of an element we're currently modifying. diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index f139260..47fb39b 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -2129,8 +2129,32 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree t, else found = (i < len); - if (!found) + if (found) { + tree r; + if (TREE_CODE (ary) == CONSTRUCTOR) + r = (*CONSTRUCTOR_ELTS (ary))[i].value; + else if (TREE_CODE (ary) == VECTOR_CST) + r = VECTOR_CST_ELT (ary, i); + else if (elem_nchars == 1) + r = build_int_cst (cv_unqualified (TREE_TYPE (TREE_TYPE (ary))), + TREE_STRING_POINTER (ary)[i]); + else + { + tree type = cv_unqualified (TREE_TYPE (TREE_TYPE (ary))); + r = native_interpret_expr (type, (const unsigned char *) +TREE_STRING_POINTER (ary) ++ i * elem_nchars, elem_nchars); + } + if (r) + /* Don't VERIFY_CONSTANT here. */ + return r; + + /* Otherwise the element doesn't have a value yet. */ +} + + /* Not found. */ + if (TREE_CODE (ary) == CONSTRUCTOR && CONSTRUCTOR_NO_IMPLICIT_ZERO (ary)) { @@ -2150,23 +2174,6 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree t, overflow_p); } - if (TREE_CODE (ary) == CONSTRUCTOR) -return (*CONSTRUCTOR_ELTS (ary))[i].value; - else if (TREE_CODE (ary) == VECTOR_CST) -return VECTOR_CST_ELT (ary, i); - else if (elem_nchars == 1) -return build_int_cst (cv_unqualified (TREE_TYPE (TREE_TYPE (ary))), - TREE_STRING_POINTER (ary)[i]); - else -{ - tree type = cv_unqualified (TREE_TYPE (TREE_TYPE (ary))); - return native_interpret_expr (type, (const unsigned char *) - TREE_STRING_POINTER (ary) - + i * elem_nchars, elem_nchars); -} - /* Don't VERIFY_CONSTANT here. */ -} - /* Subroutine of cxx_eval_constant_expression. Attempt to reduce a field access of a value of class type. */ diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-array5.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-array5.C new file mode 100644 index 000..3abdd84 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-array5.C @@ -0,0 +1,13 @@ +// PR c++/71972 +// { dg-do compile { target c++14 } } + +typedef int size_t; +template struct S { + template constexpr S(const char (&)[M]) : data{} { +data[0] = data[0]; + } + char data[N]; +}; +int main() { + constexpr S<1> s1(""); +}
C++ PATCH for c++/65970 (constexpr infinite loop)
An infinite loop in a constexpr function led to a compiler hang. Fixed by putting an upper bound on loop iterations in constexpr evaluation. Tested x86_64-pc-linux-gnu, applying to trunk. commit 6c93c22a799e5ea5e1cdfe661476cf123ed1a4e8 Author: Jason Merrill Date: Mon Jul 25 10:31:10 2016 -0400 PR c++/65970 - constexpr infinite loop gcc/c-family/ * c.opt (fconstexpr-loop-limit): New. gcc/cp/ * constexpr.c (cxx_eval_loop_expr): Count iterations. * cp-gimplify.c (genericize_cp_loop): Use start_locus even for infinite loops. diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 8c70152..a5358ed 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -1174,6 +1174,10 @@ fconstexpr-depth= C++ ObjC++ Joined RejectNegative UInteger Var(max_constexpr_depth) Init(512) -fconstexpr-depth= Specify maximum constexpr recursion depth. +fconstexpr-loop-limit= +C++ ObjC++ Joined RejectNegative UInteger Var(constexpr_loop_limit) Init(262144) +-fconstexpr-loop-limit=Specify maximum constexpr loop iteration count. + fdebug-cpp C ObjC C++ ObjC++ Emit debug annotations during preprocessing. diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 47fb39b..6bcb41a 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -3468,6 +3468,7 @@ cxx_eval_loop_expr (const constexpr_ctx *ctx, tree t, constexpr_ctx new_ctx = *ctx; tree body = TREE_OPERAND (t, 0); + int count = 0; do { hash_set save_exprs; @@ -3480,6 +3481,16 @@ cxx_eval_loop_expr (const constexpr_ctx *ctx, tree t, for (hash_set::iterator iter = save_exprs.begin(); iter != save_exprs.end(); ++iter) new_ctx.values->remove (*iter); + if (++count >= constexpr_loop_limit) + { + if (!ctx->quiet) + error_at (EXPR_LOC_OR_LOC (t, input_location), + "constexpr loop iteration count exceeds limit of %d " + "(use -fconstexpr-loop-limit= to increase the limit)", + constexpr_loop_limit); + *non_constant_p = true; + break; + } } while (!returns (jump_target) && !breaks (jump_target) && !*non_constant_p); diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c index 59953a6..d9f7cea 100644 --- a/gcc/cp/cp-gimplify.c +++ b/gcc/cp/cp-gimplify.c @@ -264,14 +264,7 @@ genericize_cp_loop (tree *stmt_p, location_t start_locus, tree cond, tree body, loop = stmt_list; } else -{ - location_t loc = start_locus; - if (!cond || integer_nonzerop (cond)) - loc = EXPR_LOCATION (expr_first (body)); - if (loc == UNKNOWN_LOCATION) - loc = start_locus; - loop = build1_loc (loc, LOOP_EXPR, void_type_node, stmt_list); -} +loop = build1_loc (start_locus, LOOP_EXPR, void_type_node, stmt_list); stmt_list = NULL; append_to_statement_list (loop, &stmt_list); diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 9e0f07e..79c842d 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -191,7 +191,8 @@ in the following sections. @xref{C++ Dialect Options,,Options Controlling C++ Dialect}. @gccoptlist{-fabi-version=@var{n} -fno-access-control @gol -fargs-in-order=@var{n} -fcheck-new @gol --fconstexpr-depth=@var{n} -ffriend-injection @gol +-fconstexpr-depth=@var{n} -fconstexpr-loop-limit=@var{n} @gol +-ffriend-injection @gol -fno-elide-constructors @gol -fno-enforce-eh-specs @gol -ffor-scope -fno-for-scope -fno-gnu-keywords @gol @@ -2265,6 +2266,12 @@ to @var{n}. A limit is needed to detect endless recursion during constant expression evaluation. The minimum specified by the standard is 512. +@item -fconstexpr-loop-limit=@var{n} +@opindex fconstexpr-loop-limit +Set the maximum number of iterations for a loop in C++14 constexpr functions +to @var{n}. 
A limit is needed to detect infinite loops during +constant expression evaluation. The default is 262144 (1<<18). + @item -fdeduce-init-list @opindex fdeduce-init-list Enable deduction of a template type parameter as diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-loop6.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-loop6.C new file mode 100644 index 000..e49e531 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-loop6.C @@ -0,0 +1,11 @@ +// PR c++/65970 +// { dg-do compile { target c++14 } } +// { dg-options -fconstexpr-loop-limit=5 } + +constexpr int foo() { + while (true) // { dg-error "-fconstexpr-loop-limit" } +; + return 0; +} + +constexpr int i = foo(); // { dg-message "" }
Re: [PATCH, vec-tails 07/10] Support loop epilogue combining
On July 25, 2016 8:01:17 PM GMT+02:00, Jeff Law wrote: >On 07/22/2016 05:36 AM, Richard Biener wrote: >> The thing that needs work I think is re-running of if-conversion. >I wonder if we could revamp if-conversion to work on a subset of the >CFG? I can see that potentially being useful in other contexts. >Would that work for you Richi? Well, you need to make it not need post-dominators or preserve them (or compute "post-dominators" on SESE regions). What doesn't work with the idea to clone the epilogue using __builtin_vectorized () for the if- vs. not if-converted loop? Richard. >We've already got Bin doing that for DOM... > > >> Also I don't like at >> all that we have many variants of vectorizing but somehow the >decision which one >> to choose is rather unclear. The way the epilogue vectorization code >> is hooked in >> is rather awkward and bound to be a maintenance burden (well, maybe >a >> small one). >I think it's going to be a small one. I suspect that we really need >another architecture with masking capabilities to really be able to see > >how the costing models ought to work and bring sanity to that decision. > >> >> And last, I doubt there is a case for a masked vectorized loop - I >can bet that >> doing a non-masked vectorized loop plus a masked epilogue (with no >iteration >> then!) will always be faster unless you hit the window of very few >iterations >> (or optimizing for size - in which case vectorizing is questionable >on >> its own and >> disabled IIRC). >Ilya, does this case make a noticeable difference with the ICC >implementation? > >Jeff
Re: Init df for split pass since some target use REG_NOTE in split pattern
On July 25, 2016 5:56:29 PM GMT+02:00, Jeff Law wrote: >On 07/25/2016 12:36 AM, Kito Cheng wrote: >> Hi all: >> >> Some targets (for example i386, sh and h8300) use find_regno_note in >> split patterns, but the df infrastructure does not seem to be initialized at the split >pass, >> so they may get wrong notes since the information is out of date. >> >> ChangeLog >> 2016-07-25 Kito Cheng >> >> * gcc/recog.c (split_all_insns): Initialize df for split >pass. >> (split_all_insns_noflow): Likewise. >> >Patch was not included/attached to your message. Please resend with >the >patch and hopefully a testcase which exhibits the problem. And maybe back ends shouldn't look at notes in their splitters? Richard. >jeff
C++ PATCH for c++/71833 (member template with two template parameter packs)
My patch for 54440 coerce_template_parameter_pack mysteriously assumed that the index of the first argument we want to pack into the trailing pack somehow matched the index of the parameter pack in the parameter list. I don't know what I was thinking. Tested x86_64-pc-linux-gnu, applying to trunk. commit f4823136e2e9d96fcf25dbb94fcf18c511533dc0 Author: Jason Merrill Date: Mon Jul 25 13:06:33 2016 -0400 PR c++/71833 - member template with two parameter packs * pt.c (coerce_template_parameter_pack): Fix logic for pack index. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index a61f1c8..7dd6b25 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -7403,11 +7403,12 @@ coerce_template_parameter_pack (tree parms, /* Convert the remaining arguments, which will be a part of the parameter pack "parm". */ + int first_pack_arg = arg_idx; for (; arg_idx < nargs; ++arg_idx) { tree arg = TREE_VEC_ELT (inner_args, arg_idx); tree actual_parm = TREE_VALUE (parm); - int pack_idx = arg_idx - parm_idx; + int pack_idx = arg_idx - first_pack_arg; if (packed_parms) { @@ -7436,12 +7437,12 @@ coerce_template_parameter_pack (tree parms, TREE_VEC_ELT (packed_args, pack_idx) = arg; } - if (arg_idx - parm_idx < TREE_VEC_LENGTH (packed_args) + if (arg_idx - first_pack_arg < TREE_VEC_LENGTH (packed_args) && TREE_VEC_LENGTH (packed_args) > 0) { if (complain & tf_error) error ("wrong number of template arguments (%d, should be %d)", - arg_idx - parm_idx, TREE_VEC_LENGTH (packed_args)); + arg_idx - first_pack_arg, TREE_VEC_LENGTH (packed_args)); return error_mark_node; } diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-nested1.C b/gcc/testsuite/g++.dg/cpp0x/variadic-nested1.C new file mode 100644 index 000..abfb49a --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/variadic-nested1.C @@ -0,0 +1,9 @@ +// PR c++/71833 +// { dg-do compile { target c++11 } } + +template < typename ... Ts > struct A +{ + template < Ts ..., typename ... Us > struct B {}; +}; + +A <>::B < int > e;
C++ PATCH for c++/71837 (pack expansion in init-capture)
This useless testcase uses a pack expansion as the parenthesized initializer for an init-capture, which is only valid when the parameter pack has a single argument. This patch avoids an ICE by leaving the TREE_LIST around the pack expansion and dealing with it at substitution time. Tested x86_64-pc-linux-gnu, applying to trunk. commit 6929f98b018d844d310ec65d07f5f8ce73f98292 Author: Jason Merrill Date: Mon Jul 25 12:13:27 2016 -0400 PR c++/71837 - pack expansion in init-capture * lambda.c (add_capture): Leave a pack expansion in a TREE_LIST. (build_lambda_object): Call build_x_compound_expr_from_list. * pt.c (tsubst) [DECLTYPE_TYPE]: Likewise. diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c index 4d6d80fe..9bf8a56 100644 --- a/gcc/cp/lambda.c +++ b/gcc/cp/lambda.c @@ -79,6 +79,10 @@ build_lambda_object (tree lambda_expr) goto out; } + if (TREE_CODE (val) == TREE_LIST) + val = build_x_compound_expr_from_list (val, ELK_INIT, + tf_warning_or_error); + if (DECL_P (val)) mark_used (val); @@ -449,7 +453,9 @@ add_capture (tree lambda, tree id, tree orig_init, bool by_reference_p, variadic = true; } - if (TREE_CODE (initializer) == TREE_LIST) + if (TREE_CODE (initializer) == TREE_LIST + /* A pack expansion might end up with multiple elements. */ + && !PACK_EXPANSION_P (TREE_VALUE (initializer))) initializer = build_x_compound_expr_from_list (initializer, ELK_INIT, tf_warning_or_error); type = TREE_TYPE (initializer); diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index a61f1c8..30c74ae 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -13619,6 +13619,18 @@ tsubst (tree t, tree args, tsubst_flags_t complain, tree in_decl) /*function_p*/false, /*integral_constant_expression*/false); + if (DECLTYPE_FOR_INIT_CAPTURE (t)) + { + if (type == NULL_TREE) + { + if (complain & tf_error) + error ("empty initializer in lambda init-capture"); + type = error_mark_node; + } + else if (TREE_CODE (type) == TREE_LIST) + type = build_x_compound_expr_from_list (type, ELK_INIT, complain); + } + --cp_unevaluated_operand; --c_inhibit_evaluation_warnings; diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-init15.C b/gcc/testsuite/g++.dg/cpp1y/lambda-init15.C new file mode 100644 index 000..8b3e520 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/lambda-init15.C @@ -0,0 +1,13 @@ +// PR c++/71837 +// { dg-do compile { target c++14 } } + +template < typename ... Ts > void f (Ts ... args) +{ + [ts (args ...)] { return ts; } (); +} + +int main () +{ + f (0); + return 0; +} diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-init15a.C b/gcc/testsuite/g++.dg/cpp1y/lambda-init15a.C new file mode 100644 index 000..166d650 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/lambda-init15a.C @@ -0,0 +1,14 @@ +// PR c++/71837 +// { dg-do compile { target c++14 } } + +template < typename ... Ts > void f (Ts ... args) +{ + [ts (args ...)] { return ts; } (); // { dg-error "" } +} + +int main () +{ + f ();// { dg-message "required" } + f (1,2); // { dg-message "required" } + return 0; +}
[committed] Fix selftest::temp_source_file ctor
Temporary source files created by the selftests weren't generated correctly if they contained a "%" character (seen whilst writing selftests for the locations-within-string-literals patch I'm working on). Root cause was a buggy fprintf in the selftest::temp_source_file ctor. Fixed thusly. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu; committed to trunk (as r238732), under the "obvious" rule. gcc/ChangeLog: * input.c (selftest::temp_source_file::temp_source_file): Fix missing "%s" in fprintf. --- gcc/input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/input.c b/gcc/input.c index a916597..47845d00 100644 --- a/gcc/input.c +++ b/gcc/input.c @@ -1175,7 +1175,7 @@ temp_source_file::temp_source_file (const location &loc, const char *suffix, if (!out) ::selftest::fail_formatted (loc, "unable to open tempfile: %s", m_filename); - fprintf (out, content); + fprintf (out, "%s", content); fclose (out); } -- 1.8.5.3
Re: Init df for split pass since some target use REG_NOTE in split pattern
On 07/25/2016 12:35 PM, Richard Biener wrote: On July 25, 2016 5:56:29 PM GMT+02:00, Jeff Law wrote: On 07/25/2016 12:36 AM, Kito Cheng wrote: Hi all: Some targets (for example i386, sh and h8300) use find_regno_note in split patterns, but the df infrastructure does not seem to be initialized at the split pass, so they may get wrong notes since the information is out of date. ChangeLog 2016-07-25 Kito Cheng * gcc/recog.c (split_all_insns): Initialize df for split pass. (split_all_insns_noflow): Likewise. Patch was not included/attached to your message. Please resend with the patch and hopefully a testcase which exhibits the problem. And maybe back ends shouldn't look at notes in their splitters? It's certainly not ideal. But I'm not sure what alternatives we have, and it's been supported for a very long time. For example on the H8, there is no shift-by-a-variable-amount. Instead you have to emit it as a loop. However, we don't expose the loop until late in the RTL pipeline (via a splitter). When we split a variable shift into a loop, one of the things we want to know is whether or not the register holding the shift count dies. If it does not die, we'll just use it rather than copying it elsewhere. Similarly we can generate a more efficient split for something like (ior (ashift (X) (const_int)) for certain integers when we know the input dies. Or for some (add (mult)) expressions. Jeff
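To make the H8 example concrete, the kind of query such a splitter makes is roughly this (an illustrative sketch assuming GCC's internal RTL API, not actual back-end code):

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "rtl.h"

/* Return true if COUNT_REG dies in INSN, i.e. the expanded shift loop
   may clobber it instead of copying it to a scratch register first.
   If the REG_DEAD notes are stale because df was never initialized
   for the split pass, this can answer wrongly.  */
bool
count_reg_dies_here_p (rtx_insn *insn, rtx count_reg)
{
  return find_regno_note (insn, REG_DEAD, REGNO (count_reg)) != NULL_RTX;
}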
[PATCH] Remove dead call to gimple_bb in slsr_process_phi
Hi, As reported on the gcc mailing list, slsr_process_phi contains a dead call to gimple_bb. I looked into why this wasn't noticed before, and concluded that we don't actually need the call. To reach this point, the phi argument must not correspond to a strength-reduction candidate in the table, and must not be a function argument. Since even simple copies and casts create entries in the candidate table, the definition of the phi arg has a non-trivial RHS that is not of a form useful to strength reduction, and thus there is no reason to proceed; the phi is itself therefore not a strength reduction candidate. The existing logic leaves arg_bb with its initial NULL value, which correctly terminates processing for this phi. Thus I am just changing the logic to make this explicit, and we don't need the call to gimple_bb at all. Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Is this ok for trunk? (The error is harmless, so I see no reason for a backport.) Thanks! Bill 2016-07-25 Bill Schmidt * gimple-ssa-strength-reduction.c (slsr_process_phi): Remove dead and unnecessary call to gimple_bb. Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 238719) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -785,14 +785,10 @@ slsr_process_phi (gphi *phi, bool speed) savings += stmt_cost (arg_stmt, speed); } } - else + else if (SSA_NAME_IS_DEFAULT_DEF (arg)) { derived_base_name = arg; - - if (SSA_NAME_IS_DEFAULT_DEF (arg)) - arg_bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)); - else - gimple_bb (SSA_NAME_DEF_STMT (arg)); + arg_bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)); } if (!arg_bb || arg_bb->loop_father != cand_loop)
[PATCH, rs6000] Correct effective target in gcc.target/powerpc/pr63354.c
Hi, The subject test case uses the -mprofile-kernel option, which is not permitted in 32-bit mode. It currently restricts the effective target to powerpc64, which is not sufficient when using -m32 on 64-bit hardware. This patch changes the effective target restriction to lp64 to correct this. Tested on powerpc64-unknown-linux-gnu, verified. Ok for trunk? Thanks, Bill 2016-07-25 Bill Schmidt * gcc.target/powerpc/pr63354.c: Require lp64 since -mprofile-kernel is not legal with -m32. Index: gcc/testsuite/gcc.target/powerpc/pr63354.c === --- gcc/testsuite/gcc.target/powerpc/pr63354.c (revision 236120) +++ gcc/testsuite/gcc.target/powerpc/pr63354.c (working copy) @@ -3,7 +3,7 @@ /* { dg-do compile { target { powerpc*-*-linux* } } } */ /* { dg-options "-O2 -pg -mprofile-kernel" } */ -/* { dg-require-effective-target powerpc64 } */ +/* { dg-require-effective-target lp64 } */ /* { dg-final { scan-assembler-not "mtlr" } } */ int foo(void)
Re: Gimple loop splitting v2
On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz wrote: > Hi, > > On Tue, 1 Dec 2015, Jeff Law wrote: > >> > So, okay for trunk? >> -ENOPATCH > > Sigh :) > Here it is. I found one problem with it. Take: void f(int *a, int M, int *b) { for(int i = 0; i <= M; i++) { if (i < M) a[i] = i; } } CUT --- There are two issues with the code as below. The outer most loop's aux is still set which causes the vectorizer not to vector the loop. The other issue is I need to run pass_scev_cprop after pass_loop_split to get the induction variable usage after the loop gone so the vectorizer will work. Something like (note this is copy and paste from a terminal): diff --git a/gcc/passes.def b/gcc/passes.def index c327900..e8d6ea6 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -262,8 +262,8 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_dce); NEXT_PASS (pass_tree_unswitch); - NEXT_PASS (pass_scev_cprop); NEXT_PASS (pass_loop_split); + NEXT_PASS (pass_scev_cprop); NEXT_PASS (pass_record_bounds); NEXT_PASS (pass_loop_distribution); NEXT_PASS (pass_copy_prop); diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c index 5411530..e72ef19 100644 --- a/gcc/tree-ssa-loop-split.c +++ b/gcc/tree-ssa-loop-split.c @@ -592,7 +592,11 @@ tree_ssa_split_loops (void) gcc_assert (scev_initialized_p ()); FOR_EACH_LOOP (loop, 0) -loop->aux = NULL; +{ + loop->aux = NULL; + if (loop_outer (loop)) + loop_outer (loop)->aux = NULL; +} /* Go through all loops starting from innermost. */ FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) @@ -631,7 +635,11 @@ tree_ssa_split_loops (void) } FOR_EACH_LOOP (loop, 0) -loop->aux = NULL; +{ + loop->aux = NULL; + if (loop_outer (loop)) + loop_outer (loop)->aux = NULL; +} if (changed) return TODO_cleanup_cfg; - CUT - Thanks, Andrew > > > Ciao, > Michael. > * common.opt (-fsplit-loops): New flag. > * passes.def (pass_loop_split): Add. > * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3. > (enable_fdo_optimizations): Add loop splitting. > * timevar.def (TV_LOOP_SPLIT): Add. > * tree-pass.h (make_pass_loop_split): Declare. > * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare. > * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h, > * tree-ssa-loop-split.c: New file. > * Makefile.in (OBJS): Add tree-ssa-loop-split.o. > * doc/invoke.texi (fsplit-loops): Document. > * doc/passes.texi (Loop optimization): Add paragraph about loop > splitting. > > testsuite/ > * gcc.dg/loop-split.c: New test. > > Index: common.opt > === > --- common.opt (revision 231115) > +++ common.opt (working copy) > @@ -2453,6 +2457,10 @@ funswitch-loops > Common Report Var(flag_unswitch_loops) Optimization > Perform loop unswitching. > > +fsplit-loops > +Common Report Var(flag_split_loops) Optimization > +Perform loop splitting. > + > funwind-tables > Common Report Var(flag_unwind_tables) Optimization > Just generate unwind tables for exception handling. > Index: passes.def > === > --- passes.def (revision 231115) > +++ passes.def (working copy) > @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3. > NEXT_PASS (pass_dce); > NEXT_PASS (pass_tree_unswitch); > NEXT_PASS (pass_scev_cprop); > + NEXT_PASS (pass_loop_split); > NEXT_PASS (pass_record_bounds); > NEXT_PASS (pass_loop_distribution); > NEXT_PASS (pass_copy_prop); > Index: opts.c > === > --- opts.c (revision 231115) > +++ opts.c (working copy) > @@ -532,6 +532,7 @@ static const struct default_options defa > regardless of them being declared inline. 
*/ > { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 }, > { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, > 1 }, > +{ OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 }, > { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 }, > { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 }, > { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 }, > @@ -1411,6 +1412,8 @@ enable_fdo_optimizations (struct gcc_opt > opts->x_flag_ipa_cp_alignment = value; >if (!opts_set->x_flag_predictive_commoning) > opts->x_flag_predictive_commoning = value; > + if (!opts_set->x_flag_split_loops) > +opts->x_flag_split_loops = value; >if (!opts_set->x_flag_unswitch_loops) > opts->x_flag_unswitch_loops = value; >if (!opts_set->x_flag_gcse_after_reload) > Index: timevar.def > =
Re: [PATCH, vec-tails 07/10] Support loop epilogue combining
On 07/25/2016 12:32 PM, Richard Biener wrote: On July 25, 2016 8:01:17 PM GMT+02:00, Jeff Law wrote: On 07/22/2016 05:36 AM, Richard Biener wrote: The thing that needs work I think is re-running of if-conversion. I wonder if we could revamp if-conversion to work on a subset of the CFG? I can see that potentially being useful in other contexts. Would that work for you Richi? Well, you need to make it not need post-dominators or preserve them (or compute "post-dominators" on SESE regions). Oh, but it'd be so nice to have DOMs and/or PDOMs on regions. But that's probably out of scope for gcc-7. What doesn't work with the idea to clone the epilogue using __builtin_vectorized () for the if- vs. not if-converted loop? I must be missing something. I don't see how builtin_vectorized_function helps, but maybe I've got the wrong built-in or don't understand what you're suggesting. It sounds like this is the biggest impediment to moving forward. So let's reset and make sure we're all on the same page here. Ilya, what's the fundamental reason why we need to run if-conversion again? Yes, I know you want to if-convert the epilogue, but why? What are the consequences of not doing if-conversion on the epilogue? Presumably we miss a vectorization opportunity on the tail. But that may be a reasonable limitation to allow the existing work to move forward while you go back and revamp things a little. Jeff