Continuing from https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00228.html
It took longer than expected, but I've finally rebased this on top of the new clone_function_name* API and included the requested optimizations. There are a few remaining spots where we could probably apply similar optimizations: - gcc/multiple_target.c:create_target_clone - gcc/multiple_target.c:create_dispatcher_calls - gcc/omp-expand.c:grid_expand_target_grid_body But I've yet to figure out how these work in detail and with the new API these shouldn't block the main change from being merged. I've also included a small change to rs6000 which I'm pretty sure is safe, but I have no way of testing. I'm not sure what's the consensus on what's more appropriate, but I thought that it would be a good idea to keep these changes as separate patches since only the first one really changes behaviour, while the rest are optimizations. It's conceivable that someone who is backporting this to an older release might want to just backport the core bits of the change. I can re-submit it as one patch if that's more appropriate. Everything in these patches was bootstrapped and regression tested on x86_64. Ok for trunk? - Michael
From 40cb5c888522d69bc42791f0c884dcb9e29eff37 Mon Sep 17 00:00:00 2001 From: Michael Ploujnikov <michael.ploujni...@oracle.com> Date: Thu, 1 Nov 2018 12:57:30 -0400 Subject: [PATCH 1/4] Make function assembly more independent. This is achieved by having clone_function_name assign unique clone numbers for each function independently. gcc: 2018-11-28 Michael Ploujnikov <michael.ploujni...@oracle.com> * cgraphclones.c: Replaced clone_fn_id_num with clone_fn_ids; hash map. (clone_function_name_numbered): Use clone_fn_ids. gcc/testsuite: 2018-11-28 Michael Ploujnikov <michael.ploujni...@oracle.com> * gcc.dg/independent-cloneids-1.c: New test. --- gcc/cgraphclones.c | 10 ++++- gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++ 2 files changed, 46 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c diff --git gcc/cgraphclones.c gcc/cgraphclones.c index e17959c0ca..fdd84b60d3 100644 --- gcc/cgraphclones.c +++ gcc/cgraphclones.c @@ -513,7 +513,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count, return new_node; } -static GTY(()) unsigned int clone_fn_id_num; +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids; /* Return a new assembler name for a clone of decl named NAME. Apart from the string SUFFIX, the new name will end with a unique (for @@ -525,7 +525,13 @@ static GTY(()) unsigned int clone_fn_id_num; tree clone_function_name_numbered (const char *name, const char *suffix) { - return clone_function_name (name, suffix, clone_fn_id_num++); + /* Initialize the function->counter mapping the first time it's + needed. */ + if (!clone_fn_ids) + clone_fn_ids = hash_map<const char *, unsigned int>::create_ggc (64); + unsigned int &suffix_counter = clone_fn_ids->get_or_insert ( + IDENTIFIER_POINTER (get_identifier (name))); + return clone_function_name (name, suffix, suffix_counter++); } /* Return a new assembler name for a clone of DECL. Apart from string diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c gcc/testsuite/gcc.dg/independent-cloneids-1.c new file mode 100644 index 0000000000..3203895492 --- /dev/null +++ gcc/testsuite/gcc.dg/independent-cloneids-1.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone" } */ + +extern int printf (const char *, ...); + +static int __attribute__ ((noinline)) +foo (int arg) +{ + return 7 * arg; +} + +static int __attribute__ ((noinline)) +bar (int arg) +{ + return arg * arg; +} + +int +baz (int arg) +{ + printf("%d\n", bar (3)); + printf("%d\n", bar (4)); + printf("%d\n", foo (5)); + printf("%d\n", foo (6)); + /* adding or removing the following call should not affect foo + function's clone numbering */ + printf("%d\n", bar (7)); + return foo (8); +} + +/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]0:} 1 } } */ +/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]1:} 1 } } */ +/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]2:} 1 } } */ +/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]0:} 1 } } */ +/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]1:} 1 } } */ +/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]2:} 1 } } */ +/* { dg-final { scan-assembler-not {(?n)\m_*foo[.$_]constprop[.$_]3:} } } */ +/* { dg-final { scan-assembler-not {(?n)\m_*foo[.$_]constprop[.$_]4:} } } */ -- 2.19.1
From 80b07ddf059415c896cecb9862517899db3993e9 Mon Sep 17 00:00:00 2001 From: Michael Ploujnikov <michael.ploujni...@oracle.com> Date: Mon, 17 Sep 2018 16:02:27 -0400 Subject: [PATCH 2/4] Minimize clone counter memory usage in create_virtual_clone. Based on Martin Jambour's suggestion: https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00111.html gcc: 2018-11-28 Michael Ploujnikov <michael.ploujni...@oracle.com> * cgraph.h (cgraph_node::create_virtual_clone): Add a new argument: num_suffix. * cgraphclones.c (cgraph_node::create_virtual_clone): Pass num_suffix to clone_function_name. * ipa-cp.c (create_specialized_node): Keep track of clone counters in clone_num_suffixes hash map. (ipcp_driver): Free the counter hash map. * ipa-hsa.c (process_hsa_functions): Keep track of clone counters in hsa_clone_numbers hash map. --- gcc/cgraph.h | 8 +++++--- gcc/cgraphclones.c | 16 +++++++++++----- gcc/ipa-cp.c | 10 +++++++++- gcc/ipa-hsa.c | 9 +++++++-- 4 files changed, 32 insertions(+), 11 deletions(-) diff --git gcc/cgraph.h gcc/cgraph.h index c13d79850f..4c0fb4da05 100644 --- gcc/cgraph.h +++ gcc/cgraph.h @@ -968,11 +968,13 @@ public: cgraph_node *new_inlined_to, bitmap args_to_skip, const char *suffix = NULL); - /* Create callgraph node clone with new declaration. The actual body will - be copied later at compilation stage. */ + /* Create callgraph node clone with new declaration. The actual body will be + copied later at compilation stage. The name of the new clone will be + constructed from the name of the original node, SUFFIX and NUM_SUFFIX. */ cgraph_node *create_virtual_clone (vec<cgraph_edge *> redirect_callers, vec<ipa_replace_map *, va_gc> *tree_map, - bitmap args_to_skip, const char * suffix); + bitmap args_to_skip, const char * suffix, + unsigned num_suffix); /* cgraph node being removed from symbol table; see if its entry can be replaced by other inline clone. */ diff --git gcc/cgraphclones.c gcc/cgraphclones.c index fdd84b60d3..51bdd7680e 100644 --- gcc/cgraphclones.c +++ gcc/cgraphclones.c @@ -595,8 +595,9 @@ clone_function_name (tree decl, const char *suffix) } -/* Create callgraph node clone with new declaration. The actual body will - be copied later at compilation stage. +/* Create callgraph node clone with new declaration. The actual body will be + copied later at compilation stage. The name of the new clone will be + constructed from the name of the original node, SUFFIX and NUM_SUFFIX. TODO: after merging in ipa-sra use function call notes instead of args_to_skip bitmap interface. @@ -604,7 +605,8 @@ clone_function_name (tree decl, const char *suffix) cgraph_node * cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers, vec<ipa_replace_map *, va_gc> *tree_map, - bitmap args_to_skip, const char * suffix) + bitmap args_to_skip, const char * suffix, + unsigned num_suffix) { tree old_decl = decl; cgraph_node *new_node = NULL; @@ -639,8 +641,12 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers, strcpy (name + len + 1, suffix); name[len] = '.'; DECL_NAME (new_decl) = get_identifier (name); - SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name_numbered (old_decl, - suffix)); + SET_DECL_ASSEMBLER_NAME (new_decl, + clone_function_name ( + IDENTIFIER_POINTER ( + DECL_ASSEMBLER_NAME (old_decl)), + suffix, + num_suffix)); SET_DECL_RTL (new_decl, NULL); new_node = create_clone (new_decl, count, false, diff --git gcc/ipa-cp.c gcc/ipa-cp.c index 4471bae11c..e0cd1bc45b 100644 --- gcc/ipa-cp.c +++ gcc/ipa-cp.c @@ -376,6 +376,9 @@ static profile_count max_count; static long overall_size, max_new_size; +/* Node to unique clone suffix number map. */ +static hash_map<cgraph_node *, unsigned> *clone_num_suffixes; + /* Return the param lattices structure corresponding to the Ith formal parameter of the function described by INFO. */ static inline struct ipcp_param_lattices * @@ -3828,8 +3831,11 @@ create_specialized_node (struct cgraph_node *node, } } + unsigned &suffix_counter = clone_num_suffixes->get_or_insert (node); new_node = node->create_virtual_clone (callers, replace_trees, - args_to_skip, "constprop"); + args_to_skip, "constprop", + suffix_counter); + suffix_counter++; bool have_self_recursive_calls = !self_recursive_calls.is_empty (); for (unsigned j = 0; j < self_recursive_calls.length (); j++) @@ -5043,6 +5049,7 @@ ipcp_driver (void) ipa_check_create_node_params (); ipa_check_create_edge_args (); + clone_num_suffixes = new hash_map<cgraph_node *, unsigned>; if (dump_file) { @@ -5064,6 +5071,7 @@ ipcp_driver (void) ipcp_store_vr_results (); /* Free all IPCP structures. */ + delete clone_num_suffixes; free_toporder_info (&topo); delete edge_clone_summaries; edge_clone_summaries = NULL; diff --git gcc/ipa-hsa.c gcc/ipa-hsa.c index 7c6ceaab8f..b5dc3dfef2 100644 --- gcc/ipa-hsa.c +++ gcc/ipa-hsa.c @@ -74,6 +74,7 @@ process_hsa_functions (void) if (hsa_summaries == NULL) hsa_summaries = new hsa_summary_t (symtab); + hash_map<cgraph_node *, unsigned> hsa_clone_numbers; FOR_EACH_DEFINED_FUNCTION (node) { hsa_function_summary *s = hsa_summaries->get (node); @@ -86,9 +87,11 @@ process_hsa_functions (void) { if (!check_warn_node_versionable (node)) continue; + unsigned &clone_number = hsa_clone_numbers.get_or_insert (node); cgraph_node *clone = node->create_virtual_clone (vec <cgraph_edge *> (), - NULL, NULL, "hsa"); + NULL, NULL, "hsa", clone_number); + clone_number++; TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl); clone->externally_visible = node->externally_visible; @@ -107,9 +110,11 @@ process_hsa_functions (void) { if (!check_warn_node_versionable (node)) continue; + unsigned &clone_number = hsa_clone_numbers.get_or_insert (node); cgraph_node *clone = node->create_virtual_clone (vec <cgraph_edge *> (), - NULL, NULL, "hsa"); + NULL, NULL, "hsa", clone_number); + clone_number++; TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl); clone->externally_visible = node->externally_visible; -- 2.19.1
From 5bb67981114c08fbaf78e5cc0f3f8abaa0aee60b Mon Sep 17 00:00:00 2001 From: Michael Ploujnikov <michael.ploujni...@oracle.com> Date: Mon, 26 Nov 2018 14:22:39 -0500 Subject: [PATCH 3/4] Minimize clone counter memory usage in LTO. gcc/lto: 2018-11-28 Michael Ploujnikov <michael.ploujni...@oracle.com> * lto-partition.c (privatize_symbol_name_1): Keep track of non-unique symbol counters in the lto_clone_numbers hash map. (lto_promote_cross_file_statics): Allocate and free the lto_clone_numbers hash map. (lto_promote_statics_nonwpa): Free the lto_clone_numbers hash map. --- gcc/lto/lto-partition.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c index 24e7c23859..3afdcd6f5d 100644 --- gcc/lto/lto-partition.c +++ gcc/lto/lto-partition.c @@ -951,6 +951,9 @@ validize_symbol_for_target (symtab_node *node) } } +/* Maps symbol names to unique lto clone counters. */ +static hash_map<const char *, unsigned> *lto_clone_numbers; + /* Helper for privatize_symbol_name. Mangle NODE symbol name represented by DECL. */ @@ -962,10 +965,13 @@ privatize_symbol_name_1 (symtab_node *node, tree decl) if (must_not_rename (node, name)) return false; + unsigned &clone_number = lto_clone_numbers->get_or_insert ( + IDENTIFIER_POINTER (DECL_NAME (decl))); name = maybe_rewrite_identifier (name); symtab->change_decl_assembler_name (decl, - clone_function_name_numbered ( - name, "lto_priv")); + clone_function_name ( + name, "lto_priv", clone_number)); + clone_number++; if (node->lto_file_data) lto_record_renamed_decl (node->lto_file_data, name, @@ -1157,6 +1163,8 @@ lto_promote_cross_file_statics (void) part->encoder = compute_ltrans_boundary (part->encoder); } + lto_clone_numbers = new hash_map<const char *, unsigned>; + /* Look at boundaries and promote symbols as needed. */ for (i = 0; i < n_sets; i++) { @@ -1187,6 +1195,7 @@ lto_promote_cross_file_statics (void) promote_symbol (node); } } + delete lto_clone_numbers; } /* Rename statics in the whole unit in the case that @@ -1196,9 +1205,12 @@ void lto_promote_statics_nonwpa (void) { symtab_node *node; + + lto_clone_numbers = new hash_map<const char *, unsigned>; FOR_EACH_SYMBOL (node) { rename_statics (NULL, node); validize_symbol_for_target (node); } + delete lto_clone_numbers; } -- 2.19.1
From a822759e422f0cea11640ec3d6d6fba94bcd4ee9 Mon Sep 17 00:00:00 2001 From: Michael Ploujnikov <michael.ploujni...@oracle.com> Date: Wed, 28 Nov 2018 09:24:30 -0500 Subject: [PATCH 4/4] There can be at most one .resolver clone per function gcc: 2018-11-28 Michael Ploujnikov <michael.ploujni...@oracle.com> * config/rs6000/rs6000.c (make_resolver_func): Generate resolver symbol with clone_function_name instead of clone_function_name_numbered. --- gcc/config/rs6000/rs6000.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git gcc/config/rs6000/rs6000.c gcc/config/rs6000/rs6000.c index 4f113cb025..a9d038829b 100644 --- gcc/config/rs6000/rs6000.c +++ gcc/config/rs6000/rs6000.c @@ -36997,7 +36997,7 @@ make_resolver_func (const tree default_decl, { /* Make the resolver function static. The resolver function returns void *. */ - tree decl_name = clone_function_name_numbered (default_decl, "resolver"); + tree decl_name = clone_function_name (default_decl, "resolver"); const char *resolver_name = IDENTIFIER_POINTER (decl_name); tree type = build_function_type_list (ptr_type_node, NULL_TREE); tree decl = build_fn_decl (resolver_name, type); -- 2.19.1
signature.asc
Description: OpenPGP digital signature