On 6/18/19 3:45 AM, Xiong Hu Luo wrote:

Hello.
Thank you for your interest in this area.

> This patch aims to fix PR69678 caused by PGO indirect call profiling bugs.
> Currently the default instrument function can only find the indirect function
> that called more than 50% with an incorrect count number returned.

Can you please explain what you mean by 'an incorrect count number returned'?

> This patch leverages the "--param indir-call-topn-profile=1" and enables
> multiple indirect

Note that I removed indir-call-topn-profile last week, so the patch will not
apply on current trunk. However, I can help you adapt the single-value
counters to support tracking of multiple values.

> targets profiling and use in LTO-WPA and LTO-LTRANS stage, as a result,
> function specialization, profiling, partial devirtualization, inlining and
> cloning could be done successfully based on it.

This decision is definitely a big question for Honza.

> Performance can get improved 3x (1.7 sec -> 0.4 sec) on simple tests.
> Details are:
> 1. When do PGO with indir-call-topn-profile, the gcda data format is not
> supported in ipa-profile pass,

If you take a look at gcc/ipa-profile.c:195, you can see how the probability
is propagated to IPA passes. Why is that not sufficient?

Martin

> so add variables to pass the information through passes, and postpone
> gimple_ic to ipa-profile like default as inline pass will decide whether it
> is benefit to transform indirect call.
> 2. Enable LTO WPA/LTRANS stage multiple indirect call targets analysis for
> profile full support in ipa passes and cgraph_edge functions.
> 3. Fix various hidden speculative call ICEs exposed after enabling this
> feature when running SPEC2017.
> 4. Add 1 in module testcase and 2 cross module testcases.
> 5. TODOs:
> 5.1. Some reference info will be dropped from WPA to LTRANS, so
> reference check will be difficult in LTRANS, need replace the strstr
> with reference compare.
> 5.2. Some duplicate code need be removed as top1 and topn share same
> logic.
> Actually top1 related logic could be eliminated totally as topn includes
> it.
> 5.3. Split patch maybe needed as too big but not sure how many would be
> reasonable.
> 6. Performance result for ppc64le:
> 6.1. Representative test: indir-call-prof-topn.c runtime improved from
> 1.7s to 0.4s.
> 6.2. SPEC2017 peakrate:
> 523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r (+13.33%);
> 525.x264_r (-5.29%).
> No big changes of other benchmarks.
> Option: -Ofast -mcpu=power8
> PASS1_OPTIMIZE: -fprofile-generate --param indir-call-topn-profile=1 -flto
> PASS2_OPTIMIZE: -fprofile-use --param indir-call-topn-profile=1 -flto
> -fprofile-correction
> 6.3. No performance change on PHP benchmark.
> 7. Bootstrap and regression test passed on Power8-LE.
>
> gcc/ChangeLog
>
> 2019-06-17  Xiong Hu Luo  <luo...@linux.ibm.com>
>
> 	PR ipa/69678
> 	* cgraph.c (cgraph_node::get_create): Copy profile_id.
> 	(cgraph_edge::speculative_call_info): Find real
> 	reference for indirect targets.
> 	(cgraph_edge::resolve_speculation): Add speculative code process
> 	for indirect targets.
> 	(cgraph_edge::redirect_call_stmt_to_callee): Likewise.
> 	(cgraph_node::verify_node): Likewise.
> 	* cgraph.h (common_target_ids): New variable.
> 	(common_target_probabilities): Likewise.
> 	(num_of_ics): Likewise.
> 	* cgraphclones.c (cgraph_node::create_clone): Copy profile_id.
> 	* ipa-inline.c (inline_small_functions): Add iterator update.
> 	* ipa-profile.c (ipa_profile_generate_summary): Add indirect
> 	multiple targets logic.
> 	(ipa_profile): Likewise.
> 	* ipa-utils.c (ipa_merge_profiles): Clone speculative src's
> 	referrings to dst.
> 	* ipa.c (process_references): Fix typo.
> 	* lto-cgraph.c (lto_output_edge): Add indirect multiple targets
> 	logic.
> 	(input_edge): Likewise.
> 	* predict.c (dump_prediction): Revome edges count assert to be
> 	precise.
> 	* tree-profile.c (gimple_gen_ic_profiler): Use the new variable
> 	__gcov_indirect_call.counters and __gcov_indirect_call.callee.
> 	(gimple_gen_ic_func_profiler): Likewise.
> 	(pass_ipa_tree_profile::gate): Fix comment typos.
> 	* tree-inline.c (copy_bb): Duplicate all the speculative edges
> 	if indirect call contains multiple speculative targets.
> 	* value-prof.c (check_counter): Proportion the counter for
> 	multiple targets.
> 	(ic_transform_topn): New function.
> 	(gimple_ic_transform): Handle topn case, fix comment typos.
>
> gcc/testsuite/ChangeLog
>
> 2019-06-17  Xiong Hu Luo  <luo...@linux.ibm.com>
>
> 	PR ipa/69678
> 	* gcc.dg/tree-prof/indir-call-prof-topn.c: New testcase.
> 	* gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: New testcase.
> 	* gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c: New testcase.
> 	* gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: New testcase.
> ---
>  gcc/cgraph.c                                  |  38 +++-
>  gcc/cgraph.h                                  |   9 +-
>  gcc/cgraphclones.c                            |   1 +
>  gcc/ipa-inline.c                              |   3 +
>  gcc/ipa-profile.c                             | 185 +++++++++++++++++-
>  gcc/ipa-utils.c                               |   5 +
>  gcc/ipa.c                                     |   2 +-
>  gcc/lto-cgraph.c                              |  38 ++++
>  gcc/predict.c                                 |   1 -
>  .../tree-prof/crossmodule-indir-call-topn-1.c |  35 ++++
>  .../crossmodule-indir-call-topn-1a.c          |  22 +++
>  .../tree-prof/crossmodule-indir-call-topn-2.c |  42 ++++
>  .../gcc.dg/tree-prof/indir-call-prof-topn.c   |  38 ++++
>  gcc/tree-inline.c                             |  97 +++++----
>  gcc/tree-profile.c                            |  12 +-
>  gcc/value-prof.c                              | 146 +++++++++++++-
>  16 files changed, 606 insertions(+), 68 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
>
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index de82316d4b1..0d373a67d1b 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl)
>              fprintf (dump_file, "Introduced new external node "
>                       "(%s) and turned into root of the clone tree.\n",
>                       node->dump_name ());
> +          node->profile_id = first_clone->profile_id;
>          }
>        else if (dump_file)
>          fprintf (dump_file, "Introduced new external node "
> @@ -1110,6 +1111,7 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
>    int i;
>    cgraph_edge *e2;
>    cgraph_edge *e = this;
> +  cgraph_node *referred_node;
>
>    if (!e->indirect_unknown_callee)
>      for (e2 = e->caller->indirect_calls;
> @@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
>          && ((ref->stmt && ref->stmt == e->call_stmt)
>              || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid)))
>        {
> -        reference = ref;
> -        break;
> +        if (e2->indirect_info && e2->indirect_info->num_of_ics)
> +          {
> +            referred_node = dyn_cast<cgraph_node *> (ref->referred);
> +            if (strstr (e->callee->name (), referred_node->name ()))
> +              {
> +                reference = ref;
> +                break;
> +              }
> +          }
> +        else
> +          {
> +            reference = ref;
> +            break;
> +          }
>        }
>
>    /* Speculative edge always consist of all three components - direct edge,
> @@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl)
>           in the functions inlined through it. */
>        }
>        edge->count += e2->count;
> -      edge->speculative = false;
> +      if (edge->indirect_info && edge->indirect_info->num_of_ics)
> +        {
> +          edge->indirect_info->num_of_ics--;
> +          if (edge->indirect_info->num_of_ics == 0)
> +            edge->speculative = false;
> +        }
> +      else
> +        edge->speculative = false;
>        e2->speculative = false;
>        ref->remove_reference ();
>        if (e2->indirect_unknown_callee || e2->inline_failed)
> @@ -1333,7 +1354,14 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
>            e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt,
>                                                       false);
>            e->count = gimple_bb (e->call_stmt)->count;
> -          e2->speculative = false;
> +          if (e2->indirect_info && e2->indirect_info->num_of_ics)
> +            {
> +              e2->indirect_info->num_of_ics--;
> +              if (e2->indirect_info->num_of_ics == 0)
> +                e2->speculative = false;
> +            }
> +          else
> +            e2->speculative = false;
>            e2->count = gimple_bb (e2->call_stmt)->count;
>            ref->speculative = false;
>            ref->stmt = NULL;
> @@ -3407,7 +3435,7 @@ cgraph_node::verify_node (void)
>
>        for (e = callees; e; e = e->next_callee)
>          {
> -          if (!e->aux)
> +          if (!e->aux && !e->speculative)
>              {
>                error ("edge %s->%s has no corresponding call_stmt",
>                       identifier_to_locale (e->caller->name ()),
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index c294602d762..ed0fbc60432 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3. If not see
>  #include "profile-count.h"
>  #include "ipa-ref.h"
>  #include "plugin-api.h"
> +#include "gcov-io.h"
>
>  extern void debuginfo_early_init (void);
>  extern void debuginfo_init (void);
> @@ -1638,11 +1639,17 @@ struct GTY(()) cgraph_indirect_call_info
>    int param_index;
>    /* ECF flags determined from the caller. */
>    int ecf_flags;
> -  /* Profile_id of common target obtrained from profile. */
> +  /* Profile_id of common target obtained from profile. */
>    int common_target_id;
>    /* Probability that call will land in function with COMMON_TARGET_ID. */
>    int common_target_probability;
>
> +  /* Profile_id of common target obtained from profile. */
> +  int common_target_ids[GCOV_ICALL_TOPN_NCOUNTS / 2];
> +  /* Probabilities that call will land in function with COMMON_TARGET_IDS. */
> +  int common_target_probabilities[GCOV_ICALL_TOPN_NCOUNTS / 2];
> +  unsigned num_of_ics;
> +
>    /* Set when the call is a virtual call with the parameter being the
>       associated object pointer rather than a simple direct call. */
>    unsigned polymorphic : 1;
> diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
> index 15f7e119d18..94f424bc10c 100644
> --- a/gcc/cgraphclones.c
> +++ b/gcc/cgraphclones.c
> @@ -467,6 +467,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
>    new_node->icf_merged = icf_merged;
>    new_node->merged_comdat = merged_comdat;
>    new_node->thunk = thunk;
> +  new_node->profile_id = profile_id;
>
>    new_node->clone.tree_map = NULL;
>    new_node->clone.args_to_skip = args_to_skip;
> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> index 360c3de3289..ef2b217b3f9 100644
> --- a/gcc/ipa-inline.c
> +++ b/gcc/ipa-inline.c
> @@ -1866,12 +1866,15 @@ inline_small_functions (void)
>              }
>            if (has_speculative)
>              for (edge = node->callees; edge; edge = next)
> +              {
> +                next = edge->next_callee;
>                if (edge->speculative && !speculation_useful_p (edge,
>                                                               edge->aux != NULL))
>                  {
>                    edge->resolve_speculation ();
>                    update = true;
>                  }
> +              }
>            if (update)
>              {
>                struct cgraph_node *where = node->global.inlined_to
> diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
> index de9563d808c..d04476295a0 100644
> --- a/gcc/ipa-profile.c
> +++ b/gcc/ipa-profile.c
> @@ -168,6 +168,10 @@ ipa_profile_generate_summary (void)
>    struct cgraph_node *node;
>    gimple_stmt_iterator gsi;
>    basic_block bb;
> +  enum hist_type type;
> +
> +  type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? HIST_TYPE_INDIR_CALL_TOPN
> +                                                     : HIST_TYPE_INDIR_CALL;
>
>    hash_table<histogram_hash> hashtable (10);
>
> @@ -186,10 +190,10 @@ ipa_profile_generate_summary (void)
>                histogram_value h;
>                h = gimple_histogram_value_of_type
>                      (DECL_STRUCT_FUNCTION (node->decl),
> -                     stmt, HIST_TYPE_INDIR_CALL);
> +                     stmt, type);
>                /* No need to do sanity check: gimple_ic_transform already
>                   takes away bad histograms. */
> -              if (h)
> +              if (h && type == HIST_TYPE_INDIR_CALL)
>                  {
>                    /* counter 0 is target, counter 1 is number of execution we called target,
>                       counter 2 is total number of executions. */
> @@ -212,6 +216,46 @@ ipa_profile_generate_summary (void)
>                    gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (node->decl),
>                                                   stmt, h);
>                  }
> +              else if (h && type == HIST_TYPE_INDIR_CALL_TOPN)
> +                {
> +                  unsigned j;
> +                  struct cgraph_edge *e = node->get_edge (stmt);
> +                  if (e && !e->indirect_unknown_callee)
> +                    continue;
> +
> +                  e->indirect_info->num_of_ics = 0;
> +                  for (j = 1; j < h->n_counters; j += 2)
> +                    {
> +                      if (h->hvalue.counters[j] == 0)
> +                        continue;
> +
> +                      e->indirect_info->common_target_ids[j / 2]
> +                        = h->hvalue.counters[j];
> +                      e->indirect_info->common_target_probabilities[j / 2]
> +                        = GCOV_COMPUTE_SCALE (
> +                            h->hvalue.counters[j + 1],
> +                            gimple_bb (stmt)->count.ipa ().to_gcov_type ());
> +                      if (e->indirect_info
> +                            ->common_target_probabilities[j / 2]
> +                          > REG_BR_PROB_BASE)
> +                        {
> +                          if (dump_file)
> +                            fprintf (dump_file,
> +                                     "Probability capped to 1\n");
> +                          e->indirect_info
> +                            ->common_target_probabilities[j / 2]
> +                            = REG_BR_PROB_BASE;
> +                        }
> +                      e->indirect_info->num_of_ics++;
> +                    }
> +
> +                  gcc_assert (e->indirect_info->num_of_ics
> +                              <= GCOV_ICALL_TOPN_NCOUNTS / 2);
> +
> +                  gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (
> +                                                   node->decl),
> +                                                 stmt, h);
> +                }
>              }
>            time += estimate_num_insns (stmt, &eni_time_weights);
>            size += estimate_num_insns (stmt, &eni_size_weights);
> @@ -492,6 +536,7 @@ ipa_profile (void)
>    int nindirect = 0, ncommon = 0, nunknown = 0, nuseless = 0, nconverted = 0;
>    int nmismatch = 0, nimpossible = 0;
>    bool node_map_initialized = false;
> +  gcov_type threshold;
>
>    if (dump_file)
>      dump_histogram (dump_file, histogram);
> @@ -500,14 +545,12 @@ ipa_profile (void)
>        overall_time += histogram[i]->count * histogram[i]->time;
>        overall_size += histogram[i]->size;
>      }
> +  threshold = 0;
>    if (overall_time)
>      {
> -      gcov_type threshold;
> -
>        gcc_assert (overall_size);
>
>        cutoff = (overall_time * PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE) + 500)
>                 / 1000;
> -      threshold = 0;
>        for (i = 0; cumulated < cutoff; i++)
>          {
>            cumulated += histogram[i]->count * histogram[i]->time;
> @@ -543,7 +586,7 @@ ipa_profile (void)
>    histogram.release ();
>    histogram_pool.release ();
>
> -  /* Produce speculative calls: we saved common traget from porfiling into
> +  /* Produce speculative calls: we saved common target from profiling into
>       e->common_target_id. Now, at link time, we can look up corresponding
>       function node and produce speculative call. */
>
> @@ -558,7 +601,8 @@ ipa_profile (void)
>          {
>            if (n->count.initialized_p ())
>              nindirect++;
> -          if (e->indirect_info->common_target_id)
> +          if (e->indirect_info->common_target_id
> +              || (e->indirect_info && e->indirect_info->num_of_ics == 1))
>              {
>                if (!node_map_initialized)
>                  init_node_map (false);
> @@ -613,7 +657,7 @@ ipa_profile (void)
>                    if (dump_file)
>                      fprintf (dump_file,
>                               "Not speculating: "
> -                             "parameter count mistmatch\n");
> +                             "parameter count mismatch\n");
>                  }
>                else if (e->indirect_info->polymorphic
>                         && !opt_for_fn (n->decl, flag_devirtualize)
> @@ -655,7 +699,130 @@ ipa_profile (void)
>                    nunknown++;
>                  }
>              }
> -        }
> +          if (e->indirect_info && e->indirect_info->num_of_ics > 1)
> +            {
> +              if (in_lto_p)
> +                {
> +                  if (dump_file)
> +                    {
> +                      fprintf (dump_file,
> +                               "Updating hotness threshold in LTO mode.\n");
> +                      fprintf (dump_file, "Updated min count: %" PRId64 "\n",
> +                               (int64_t) threshold);
> +                    }
> +                  set_hot_bb_threshold (threshold
> +                                        / e->indirect_info->num_of_ics);
> +                }
> +              if (!node_map_initialized)
> +                init_node_map (false);
> +              node_map_initialized = true;
> +              ncommon++;
> +              unsigned speculative = 0;
> +              for (i = 0; i < (int)e->indirect_info->num_of_ics; i++)
> +                {
> +                  n2 = find_func_by_profile_id (
> +                    e->indirect_info->common_target_ids[i]);
> +                  if (n2)
> +                    {
> +                      if (dump_file)
> +                        {
> +                          fprintf (
> +                            dump_file,
> +                            "Indirect call -> direct call from"
> +                            " other module %s => %s, prob %3.2f\n",
> +                            n->dump_name (), n2->dump_name (),
> +                            e->indirect_info->common_target_probabilities[i]
> +                              / (float) REG_BR_PROB_BASE);
> +                        }
> +                      if (e->indirect_info->common_target_probabilities[i]
> +                          < REG_BR_PROB_BASE / 2)
> +                        {
> +                          nuseless++;
> +                          if (dump_file)
> +                            fprintf (
> +                              dump_file,
> +                              "Not speculating: probability is too low.\n");
> +                        }
> +                      else if (!e->maybe_hot_p ())
> +                        {
> +                          nuseless++;
> +                          if (dump_file)
> +                            fprintf (dump_file,
> +                                     "Not speculating: call is cold.\n");
> +                        }
> +                      else if (n2->get_availability () <= AVAIL_INTERPOSABLE
> +                               && n2->can_be_discarded_p ())
> +                        {
> +                          nuseless++;
> +                          if (dump_file)
> +                            fprintf (dump_file,
> +                                     "Not speculating: target is overwritable "
> +                                     "and can be discarded.\n");
> +                        }
> +                      else if (ipa_node_params_sum && ipa_edge_args_sum
> +                               && (!vec_safe_is_empty (
> +                                     IPA_NODE_REF (n2)->descriptors))
> +                               && ipa_get_param_count (IPA_NODE_REF (n2))
> +                                    != ipa_get_cs_argument_count (
> +                                      IPA_EDGE_REF (e))
> +                               && (ipa_get_param_count (IPA_NODE_REF (n2))
> +                                     >= ipa_get_cs_argument_count (
> +                                       IPA_EDGE_REF (e))
> +                                   || !stdarg_p (TREE_TYPE (n2->decl))))
> +                        {
> +                          nmismatch++;
> +                          if (dump_file)
> +                            fprintf (dump_file, "Not speculating: "
> +                                                "parameter count mismatch\n");
> +                        }
> +                      else if (e->indirect_info->polymorphic
> +                               && !opt_for_fn (n->decl, flag_devirtualize)
> +                               && !possible_polymorphic_call_target_p (e, n2))
> +                        {
> +                          nimpossible++;
> +                          if (dump_file)
> +                            fprintf (dump_file,
> +                                     "Not speculating: "
> +                                     "function is not in the polymorphic "
> +                                     "call target list\n");
> +                        }
> +                      else
> +                        {
> +                          /* Target may be overwritable, but profile says that
> +                             control flow goes to this particular implementation
> +                             of N2. Speculate on the local alias to allow
> +                             inlining.
> +                           */
> +                          if (!n2->can_be_discarded_p ())
> +                            {
> +                              cgraph_node *alias;
> +                              alias = dyn_cast<cgraph_node *> (
> +                                n2->noninterposable_alias ());
> +                              if (alias)
> +                                n2 = alias;
> +                            }
> +                          nconverted++;
> +                          e->make_speculative (
> +                            n2, e->count.apply_probability (
> +                                  e->indirect_info
> +                                    ->common_target_probabilities[i]));
> +                          update = true;
> +                          speculative++;
> +                        }
> +                    }
> +                  else
> +                    {
> +                      if (dump_file)
> +                        fprintf (dump_file,
> +                                 "Function with profile-id %i not found.\n",
> +                                 e->indirect_info->common_target_ids[i]);
> +                      nunknown++;
> +                    }
> +                }
> +              if (speculative < e->indirect_info->num_of_ics)
> +                e->indirect_info->num_of_ics = speculative;
> +            }
> +        }
>        if (update)
>          ipa_update_overall_fn_summary (n);
>      }
> diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
> index 79b250c3943..30347691029 100644
> --- a/gcc/ipa-utils.c
> +++ b/gcc/ipa-utils.c
> @@ -587,6 +587,11 @@ ipa_merge_profiles (struct cgraph_node *dst,
>    update_max_bb_count ();
>    compute_function_frequency ();
>    pop_cfun ();
> +  /* When src is speculative, clone the referrings. */
> +  if (src->indirect_call_target)
> +    for (e = src->callers; e; e = e->next_caller)
> +      if (e->callee == src && e->speculative)
> +        dst->clone_referring (src);
>    for (e = dst->callees; e; e = e->next_callee)
>      {
>        if (e->speculative)
> diff --git a/gcc/ipa.c b/gcc/ipa.c
> index 2496694124c..c1fe081a72d 100644
> --- a/gcc/ipa.c
> +++ b/gcc/ipa.c
> @@ -166,7 +166,7 @@ process_references (symtab_node *snode,
>     devirtualization happens. After inlining still keep their declarations
>     around, so we can devirtualize to a direct call.
>
> -   Also try to make trivial devirutalization when no or only one target is
> +   Also try to make trivial devirtualization when no or only one target is
>     possible. */
>
>  static void
> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> index 4dfa2862be3..0c8f547d44e 100644
> --- a/gcc/lto-cgraph.c
> +++ b/gcc/lto-cgraph.c
> @@ -238,6 +238,7 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
>    unsigned int uid;
>    intptr_t ref;
>    struct bitpack_d bp;
> +  unsigned i;
>
>    if (edge->indirect_unknown_callee)
>      streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
> @@ -296,6 +297,25 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
>        if (edge->indirect_info->common_target_id)
>          streamer_write_hwi_stream
>             (ob->main_stream, edge->indirect_info->common_target_probability);
> +
> +      gcc_assert (edge->indirect_info->num_of_ics
> +                  <= GCOV_ICALL_TOPN_NCOUNTS / 2);
> +
> +      streamer_write_hwi_stream (ob->main_stream,
> +                                 edge->indirect_info->num_of_ics);
> +
> +      if (edge->indirect_info->num_of_ics)
> +        {
> +          for (i = 0; i < edge->indirect_info->num_of_ics; i++)
> +            {
> +              streamer_write_hwi_stream (
> +                ob->main_stream, edge->indirect_info->common_target_ids[i]);
> +              if (edge->indirect_info->common_target_ids[i])
> +                streamer_write_hwi_stream (
> +                  ob->main_stream,
> +                  edge->indirect_info->common_target_probabilities[i]);
> +            }
> +        }
>      }
>  }
>
> @@ -1438,6 +1458,7 @@ input_edge (struct lto_input_block *ib, vec<symtab_node *> nodes,
>    cgraph_inline_failed_t inline_failed;
>    struct bitpack_d bp;
>    int ecf_flags = 0;
> +  unsigned i;
>
>    caller = dyn_cast<cgraph_node *> (nodes[streamer_read_hwi (ib)]);
>    if (caller == NULL || caller->decl == NULL_TREE)
> @@ -1488,6 +1509,23 @@ input_edge (struct lto_input_block *ib, vec<symtab_node *> nodes,
>        edge->indirect_info->common_target_id = streamer_read_hwi (ib);
>        if (edge->indirect_info->common_target_id)
>          edge->indirect_info->common_target_probability = streamer_read_hwi (ib);
> +
> +      edge->indirect_info->num_of_ics = streamer_read_hwi (ib);
> +
> +      gcc_assert (edge->indirect_info->num_of_ics
> +                  <= GCOV_ICALL_TOPN_NCOUNTS / 2);
> +
> +      if (edge->indirect_info->num_of_ics)
> +        {
> +          for (i = 0; i < edge->indirect_info->num_of_ics; i++)
> +            {
> +              edge->indirect_info->common_target_ids[i]
> +                = streamer_read_hwi (ib);
> +              if (edge->indirect_info->common_target_ids[i])
> +                edge->indirect_info->common_target_probabilities[i]
> +                  = streamer_read_hwi (ib);
> +            }
> +        }
>      }
>  }
>
> diff --git a/gcc/predict.c b/gcc/predict.c
> index 43ee91a5b13..b7f38891c72 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -763,7 +763,6 @@ dump_prediction (FILE *file, enum br_predictor predictor, int probability,
>        && bb->count.precise_p ()
>        && reason == REASON_NONE)
>      {
> -      gcc_assert (e->count ().precise_p ());
>        fprintf (file, ";;heuristics;%s;%" PRId64 ";%" PRId64 ";%.1f;\n",
>                 predictor_info[predictor].name,
>                 bb->count.to_gcov_type (), e->count ().to_gcov_type (),
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
> new file mode 100644
> index 00000000000..e0a83c2e067
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
> @@ -0,0 +1,35 @@
> +/* { dg-require-effective-target lto } */
> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param indir-call-topn-profile=1" } */
> +
> +#include <stdio.h>
> +
> +typedef int (*fptr) (int);
> +int
> +one (int a);
> +
> +int
> +two (int a);
> +
> +fptr table[] = {&one, &two};
> +
> +int
> +main()
> +{
> +  int i, x;
> +  fptr p = &one;
> +
> +  x = one (3);
> +
> +  for (i = 0; i < 350000000; i++)
> +    {
> +      x = (*p) (3);
> +      p = table[x];
> +    }
> +  printf ("done:%d\n", x);
> +}
> +
> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile_estimate" } } */
> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile_estimate" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
> new file mode 100644
> index 00000000000..a8c6e365fb9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
> @@ -0,0 +1,22 @@
> +/* It seems there is no way to avoid the other source of mulitple
> +   source testcase from being compiled independently. Just avoid
> +   error. */
> +#ifdef DOJOB
> +int
> +one (int a)
> +{
> +  return 1;
> +}
> +
> +int
> +two (int a)
> +{
> +  return 0;
> +}
> +#else
> +int
> +main()
> +{
> +  return 0;
> +}
> +#endif
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
> new file mode 100644
> index 00000000000..aa3887fde83
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
> @@ -0,0 +1,42 @@
> +/* { dg-require-effective-target lto } */
> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param indir-call-topn-profile=1" } */
> +
> +#include <stdio.h>
> +
> +typedef int (*fptr) (int);
> +int
> +one (int a);
> +
> +int
> +two (int a);
> +
> +fptr table[] = {&one, &two};
> +
> +int foo ()
> +{
> +  int i, x;
> +  fptr p = &one;
> +
> +  x = one (3);
> +
> +  for (i = 0; i < 350000000; i++)
> +    {
> +      x = (*p) (3);
> +      p = table[x];
> +    }
> +  return x;
> +}
> +
> +int
> +main()
> +{
> +  int x = foo ();
> +  printf ("done:%d\n", x);
> +}
> +
> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile_estimate" } } */
> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile_estimate" } } */
> +
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
> new file mode 100644
> index 00000000000..951bc7ddd19
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
> @@ -0,0 +1,38 @@
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-options "-O2 -fdump-ipa-profile --param indir-call-topn-profile=1" } */
> +
> +#include <stdio.h>
> +
> +typedef int (*fptr) (int);
> +int
> +one (int a)
> +{
> +  return 1;
> +}
> +
> +int
> +two (int a)
> +{
> +  return 0;
> +}
> +
> +fptr table[] = {&one, &two};
> +
> +int
> +main()
> +{
> +  int i, x;
> +  fptr p = &one;
> +
> +  one (3);
> +
> +  for (i = 0; i < 350000000; i++)
> +    {
> +      x = (*p) (3);
> +      p = table[x];
> +    }
> +  printf ("done:%d\n", x);
> +}
> +
> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile" } } */
> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile" } } */
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index 9017da878b1..f69b31b197e 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -2028,43 +2028,66 @@ copy_bb (copy_body_data *id, basic_block bb,
>              switch (id->transform_call_graph_edges)
>                {
>                case CB_CGE_DUPLICATE:
> -                edge = id->src_node->get_edge (orig_stmt);
> -                if (edge)
> -                  {
> -                    struct cgraph_edge *old_edge = edge;
> -                    profile_count old_cnt = edge->count;
> -                    edge = edge->clone (id->dst_node, call_stmt,
> -                                        gimple_uid (stmt),
> -                                        num, den,
> -                                        true);
> -
> -                    /* Speculative calls consist of two edges - direct and
> -                       indirect. Duplicate the whole thing and distribute
> -                       frequencies accordingly. */
> -                    if (edge->speculative)
> -                      {
> -                        struct cgraph_edge *direct, *indirect;
> -                        struct ipa_ref *ref;
> -
> -                        gcc_assert (!edge->indirect_unknown_callee);
> -                        old_edge->speculative_call_info (direct, indirect, ref);
> -
> -                        profile_count indir_cnt = indirect->count;
> -                        indirect = indirect->clone (id->dst_node, call_stmt,
> -                                                    gimple_uid (stmt),
> -                                                    num, den,
> -                                                    true);
> -
> -                        profile_probability prob
> -                          = indir_cnt.probability_in (old_cnt + indir_cnt);
> -                        indirect->count
> -                          = copy_basic_block->count.apply_probability (prob);
> -                        edge->count = copy_basic_block->count - indirect->count;
> -                        id->dst_node->clone_reference (ref, stmt);
> -                      }
> -                    else
> -                      edge->count = copy_basic_block->count;
> -                  }
> +                {
> +                  edge = id->src_node->get_edge (orig_stmt);
> +                  struct cgraph_edge *old_edge = edge;
> +                  struct cgraph_edge *direct, *indirect;
> +                  bool next_speculative;
> +                  do
> +                    {
> +                      next_speculative = false;
> +                      if (edge)
> +                        {
> +                          profile_count old_cnt = edge->count;
> +                          edge
> +                            = edge->clone (id->dst_node, call_stmt,
> +                                           gimple_uid (stmt), num, den, true);
> +
> +                          /* Speculative calls consist of two edges - direct
> +                             and indirect. Duplicate the whole thing and
> +                             distribute frequencies accordingly. */
> +                          if (edge->speculative)
> +                            {
> +                              struct ipa_ref *ref;
> +
> +                              gcc_assert (!edge->indirect_unknown_callee);
> +                              old_edge->speculative_call_info (direct,
> +                                                               indirect, ref);
> +
> +                              profile_count indir_cnt = indirect->count;
> +                              indirect
> +                                = indirect->clone (id->dst_node, call_stmt,
> +                                                   gimple_uid (stmt), num,
> +                                                   den, true);
> +
> +                              profile_probability prob
> +                                = indir_cnt.probability_in (old_cnt
> +                                                            + indir_cnt);
> +                              indirect->count
> +                                = copy_basic_block->count.apply_probability (
> +                                    prob);
> +                              edge->count
> +                                = copy_basic_block->count - indirect->count;
> +                              id->dst_node->clone_reference (ref, stmt);
> +                            }
> +                          else
> +                            edge->count = copy_basic_block->count;
> +                        }
> +                      /* If the indirect call contains more than one indirect
> +                         targets, need clone all speculative edges here. */
> +                      if (old_edge && old_edge->next_callee
> +                          && old_edge->speculative && indirect
> +                          && indirect->indirect_info
> +                          && indirect->indirect_info->num_of_ics > 1)
> +                        {
> +                          edge = old_edge->next_callee;
> +                          old_edge = old_edge->next_callee;
> +                          if (edge->speculative)
> +                            next_speculative = true;
> +                        }
> +                    }
> +                  while (next_speculative);
> +                }
>                  break;
>
>                case CB_CGE_MOVE_CLONES:
> diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
> index 1c3034aac10..4964dbdebb5 100644
> --- a/gcc/tree-profile.c
> +++ b/gcc/tree-profile.c
> @@ -74,8 +74,8 @@ static GTY(()) tree ic_tuple_callee_field;
>  /* Do initialization work for the edge profiler. */
>
>  /* Add code:
> -   __thread gcov* __gcov_indirect_call_counters; // pointer to actual counter
> -   __thread void* __gcov_indirect_call_callee; // actual callee address
> +   __thread gcov* __gcov_indirect_call.counters; // pointer to actual counter
> +   __thread void* __gcov_indirect_call.callee; // actual callee address
>     __thread int __gcov_function_counter; // time profiler function counter
>  */
>  static void
> @@ -395,7 +395,7 @@ gimple_gen_ic_profiler (histogram_value value, unsigned tag, unsigned base)
>      f_1 = foo;
>      __gcov_indirect_call.counters = &__gcov4.main[0];
>      PROF_9 = f_1;
> -    __gcov_indirect_call_callee = PROF_9;
> +    __gcov_indirect_call.callee = PROF_9;
>      _4 = f_1 ();
>     */
>
> @@ -458,11 +458,11 @@ gimple_gen_ic_func_profiler (void)
>
>    /* Insert code:
>
> -    if (__gcov_indirect_call_callee != NULL)
> +    if (__gcov_indirect_call.callee != NULL)
>        __gcov_indirect_call_profiler_v3 (profile_id, &current_function_decl);
>
>      The function __gcov_indirect_call_profiler_v3 is responsible for
> -    resetting __gcov_indirect_call_callee to NULL. */
> +    resetting __gcov_indirect_call.callee to NULL. */
>
>    gimple_stmt_iterator gsi = gsi_start_bb (cond_bb);
>    void0 = build_int_cst (ptr_type_node, 0);
> @@ -904,7 +904,7 @@ pass_ipa_tree_profile::gate (function *)
>  {
>    /* When profile instrumentation, use or test coverage shall be performed.
>       But for AutoFDO, this there is no instrumentation, thus this pass is
> -     diabled. */
> +     disabled. */
>    return (!in_lto_p && !flag_auto_profile
>            && (flag_branch_probabilities || flag_test_coverage
>                || profile_arc_flag));
> diff --git a/gcc/value-prof.c b/gcc/value-prof.c
> index 5013956cf86..4869ab8ccd6 100644
> --- a/gcc/value-prof.c
> +++ b/gcc/value-prof.c
> @@ -579,8 +579,8 @@ free_histograms (struct function *fn)
>     somehow. */
>
>  static bool
> -check_counter (gimple *stmt, const char * name,
> -               gcov_type *count, gcov_type *all, profile_count bb_count_d)
> +check_counter (gimple *stmt, const char *name, gcov_type *count, gcov_type *all,
> +               profile_count bb_count_d, float ratio = 1.0f)
>  {
>    gcov_type bb_count = bb_count_d.ipa ().to_gcov_type ();
>    if (*all != bb_count || *count > *all)
> @@ -599,7 +599,7 @@ check_counter (gimple *stmt, const char * name,
>                      "count (%d)\n", name, (int)*all, (int)bb_count);
>            *all = bb_count;
>            if (*count > *all)
> -            *count = *all;
> +            *count = *all * ratio;
>            return false;
>          }
>        else
> @@ -1410,9 +1410,132 @@ gimple_ic (gcall *icall_stmt, struct cgraph_node *direct_call,
>    return dcall_stmt;
>  }
>
> +/* If --param=indir-call-topn-profile=1 is specified when compiling, there maybe
> +   multiple indirect targets in histogram. Check every indirect/virtual call
> +   if callee function exists, if not exit, leave it to LTO stage for later
> +   process. Modify code of this indirect call to an if-else structure in
> +   ipa-profile finally. */
> +static bool
> +ic_transform_topn (gimple_stmt_iterator *gsi)
> +{
> +  unsigned j;
> +  gcall *stmt;
> +  histogram_value histogram;
> +  gcov_type val, count, count_all, all, bb_all;
> +  struct cgraph_node *d_call;
> +  profile_count bb_count;
> +
> +  stmt = dyn_cast<gcall *> (gsi_stmt (*gsi));
> +  if (!stmt)
> +    return false;
> +
> +  if (gimple_call_fndecl (stmt) != NULL_TREE)
> +    return false;
> +
> +  if (gimple_call_internal_p (stmt))
> +    return false;
> +
> +  histogram
> +    = gimple_histogram_value_of_type (cfun, stmt, HIST_TYPE_INDIR_CALL_TOPN);
> +  if (!histogram)
> +    return false;
> +
> +  count = 0;
> +  all = 0;
> +  bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type ();
> +  bb_count = gimple_bb (stmt)->count;
> +
> +  /* n_counters need be odd to avoid access violation. */
> +  gcc_assert (histogram->n_counters % 2 == 1);
> +
> +  /* For indirect call topn, accumulate all the counts first. */
> +  for (j = 1; j < histogram->n_counters; j += 2)
> +    {
> +      val = histogram->hvalue.counters[j];
> +      count = histogram->hvalue.counters[j + 1];
> +      if (val)
> +        all += count;
> +    }
> +
> +  count_all = all;
> +  /* Do the indirect call conversion if function body exists, or else leave it
> +     to LTO stage. */
> +  for (j = 1; j < histogram->n_counters; j += 2)
> +    {
> +      val = histogram->hvalue.counters[j];
> +      count = histogram->hvalue.counters[j + 1];
> +      if (val)
> +        {
> +          /* The order of CHECK_COUNTER calls is important
> +             since check_counter can correct the third parameter
> +             and we want to make count <= all <= bb_count. */
> +          if (check_counter (stmt, "ic", &all, &bb_all, bb_count)
> +              || check_counter (stmt, "ic", &count, &all,
> +                                profile_count::from_gcov_type (all),
> +                                (float) count / count_all))
> +            {
> +              gimple_remove_histogram_value (cfun, stmt, histogram);
> +              return false;
> +            }
> +
> +          d_call = find_func_by_profile_id ((int) val);
> +
> +          if (d_call == NULL)
> +            {
> +              if (val)
> +                {
> +                  if (dump_file)
> +                    {
> +                      fprintf (
> +                        dump_file,
> +                        "Indirect call -> direct call from other module");
> +                      print_generic_expr (dump_file, gimple_call_fn (stmt),
> +                                          TDF_SLIM);
> +                      fprintf (dump_file,
> +                               "=> %i (will resolve only with LTO)\n",
> +                               (int) val);
> +                    }
> +                }
> +              return false;
> +            }
> +
> +          if (!check_ic_target (stmt, d_call))
> +            {
> +              if (dump_file)
> +                {
> +                  fprintf (dump_file, "Indirect call -> direct call ");
> +                  print_generic_expr (dump_file, gimple_call_fn (stmt),
> +                                      TDF_SLIM);
> +                  fprintf (dump_file, "=> ");
> +                  print_generic_expr (dump_file, d_call->decl, TDF_SLIM);
> +                  fprintf (dump_file,
> +                           " transformation skipped because of type mismatch");
> +                  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> +                }
> +              gimple_remove_histogram_value (cfun, stmt, histogram);
> +              return false;
> +            }
> +
> +          if (dump_file)
> +            {
> +              fprintf (dump_file, "Indirect call -> direct call ");
> +              print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM);
>
+ fprintf (dump_file, "=> "); > + print_generic_expr (dump_file, d_call->decl, TDF_SLIM); > + fprintf (dump_file, > + " transformation on insn postponed to ipa-profile"); > + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); > + fprintf (dump_file, "hist->count %" PRId64 > + " hist->all %" PRId64"\n", count, all); > + } > + } > + } > + > + return true; > +} > /* > For every checked indirect/virtual call determine if most common pid of > - function/class method has probability more than 50%. If yes modify code of > + function/class method has probability more than 50%. If yes modify code of > this call to: > */ > > @@ -1423,6 +1546,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi) > histogram_value histogram; > gcov_type val, count, all, bb_all; > struct cgraph_node *direct_call; > + enum hist_type type; > > stmt = dyn_cast <gcall *> (gsi_stmt (*gsi)); > if (!stmt) > @@ -1434,18 +1558,24 @@ gimple_ic_transform (gimple_stmt_iterator *gsi) > if (gimple_call_internal_p (stmt)) > return false; > > - histogram = gimple_histogram_value_of_type (cfun, stmt, > HIST_TYPE_INDIR_CALL); > + type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? > HIST_TYPE_INDIR_CALL_TOPN > + : HIST_TYPE_INDIR_CALL; > + > + histogram = gimple_histogram_value_of_type (cfun, stmt, type); > if (!histogram) > return false; > > + if (type == HIST_TYPE_INDIR_CALL_TOPN) > + return ic_transform_topn (gsi); > + > val = histogram->hvalue.counters [0]; > count = histogram->hvalue.counters [1]; > all = histogram->hvalue.counters [2]; > > bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type (); > - /* The order of CHECK_COUNTER calls is important - > + /* The order of CHECK_COUNTER calls is important > since check_counter can correct the third parameter > - and we want to make count <= all <= bb_all. */ > + and we want to make count <= all <= bb_all. 
*/ > if (check_counter (stmt, "ic", &all, &bb_all, gimple_bb (stmt)->count) > || check_counter (stmt, "ic", &count, &all, > profile_count::from_gcov_type (all))) > @@ -1494,7 +1624,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi) > print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM); > fprintf (dump_file, "=> "); > print_generic_expr (dump_file, direct_call->decl, TDF_SLIM); > - fprintf (dump_file, " transformation on insn postponned to > ipa-profile"); > + fprintf (dump_file, " transformation on insn postponed to > ipa-profile"); > print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); > fprintf (dump_file, "hist->count %" PRId64 > " hist->all %" PRId64"\n", count, all); >
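To make the arithmetic in ic_transform_topn () and the new ratio argument of check_counter () easier to follow, here is a minimal standalone C sketch of both. It is not GCC code and makes some assumptions: gcov_type is replaced by int64_t, and topn_total (), clamp_count () and target_share () are hypothetical helpers. The histogram layout (counters[0] skipped, then (value, count) pairs, with value 0 marking an empty slot) is inferred from the loops in the patch; what counters[0] itself holds is not shown in the quoted hunks.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-in for GCC's gcov_type (a 64-bit signed counter).  */
typedef int64_t gcov_type;

/* Hypothetical helper: sum the counts of every recorded target in a top-N
   indirect-call histogram.  Layout inferred from ic_transform_topn:
   counters[0] is not a target, then (value, count) pairs at indices
   1/2, 3/4, ...; a value of 0 marks an unused slot.  */
static gcov_type
topn_total (const gcov_type *counters, unsigned n_counters)
{
  /* Same invariant the patch checks with gcc_assert.  */
  assert (n_counters % 2 == 1);

  gcov_type total = 0;
  for (unsigned j = 1; j < n_counters; j += 2)
    if (counters[j] != 0)
      total += counters[j + 1];
  return total;
}

/* Hypothetical model of the patched tail of check_counter: when a
   per-target count exceeds the corrected total ALL, scale ALL by the
   target's share RATIO instead of assigning ALL outright (the patch
   defaults ratio to 1.0f, which reproduces the old behaviour).  */
static gcov_type
clamp_count (gcov_type count, gcov_type all, float ratio)
{
  if (count > all)
    return (gcov_type) (all * ratio);
  return count;
}

/* Hypothetical helper: one target's share, as the patch computes it with
   (float) count / count_all.  The count_all > 0 guard is added in this
   sketch only; the patch divides unconditionally.  */
static float
target_share (gcov_type count, gcov_type count_all)
{
  return count_all > 0 ? (float) count / count_all : 0.0f;
}
```

With the hypothetical histogram {0, 0x1234, 600, 0x5678, 300, 0, 0}, topn_total () gives 900, so the two targets get shares of 600/900 and 300/900. clamp_count (1000, 800, 0.5f) returns 400 where the unpatched check_counter would have returned 800; presumably the point of the scaling is that a corrected per-target count stays proportional to that target's share instead of claiming the whole corrected total.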