On 6/18/19 3:45 AM, Xiong Hu Luo wrote:

Hello.

Thank you for the interest in the area.

> This patch aims to fix PR69678 caused by PGO indirect call profiling bugs.
> Currently the default instrument function can only find the indirect function
> that called more than 50% with an incorrect count number returned.

Can you please explain what you mean by 'an incorrect count number returned'?

>  This patch
> leverages the "--param indir-call-topn-profile=1" and enables multiple 
> indirect

Note that I've remove indir-call-topn-profile last week, the patch will not 
apply
on current trunk. However, I can help you how to adapt single-value counters
to support tracking of multiple values.

> targets profiling and use in LTO-WPA and LTO-LTRANS stage, as a result, 
> function
> specialization, profiling, partial devirtualization, inlining and cloning 
> could
> be done successfully based on it.

This decision is definitely big question for Honza?

> Performance can get improved 3x (1.7 sec -> 0.4 sec) on simple tests.
> Details are:
>   1.  When do PGO with indir-call-topn-profile, the gcda data format is not
>   supported in ipa-profile pass,

If you take a look at gcc/ipa-profile.c:195 you can see how the probability
is propagated to IPA passes. Why is that not sufficient?

Martin

> so add variables to pass the information
>   through passes, and postpone gimple_ic to ipa-profile like default as inline
>   pass will decide whether it is benefit to transform indirect call.
>   2.  Enable LTO WPA/LTRANS stage multiple indirect call targets analysis for
>   profile full support in ipa passes and cgraph_edge functions.
>   3.  Fix various hidden speculative call ICEs exposed after enabling this
>   feature when running SPEC2017.
>   4.  Add 1 in module testcase and 2 cross module testcases.
>   5.  TODOs:
>     5.1.  Some reference info will be dropped from WPA to LTRANS, so
>     reference check will be difficult in LTRANS, need replace the strstr
>     with reference compare.
>     5.2.  Some duplicate code need be removed as top1 and topn share same 
> logic.
>     Actually top1 related logic could be eliminated totally as topn includes 
> it.
>     5.3.  Split patch maybe needed as too big but not sure how many would be
>     reasonable.
>   6.  Performance result for ppc64le:
>     6.1.  Representative test: indir-call-prof-topn.c runtime improved from
>     1.7s to 0.4s.
>     6.2.  SPEC2017 peakrate:
>         523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r 
> (+13.33%);
>         525.x264_r (-5.29%).
>         No big changes of other benchmarks.
>         Option: -Ofast -mcpu=power8
>         PASS1_OPTIMIZE: -fprofile-generate --param indir-call-topn-profile=1 
> -flto
>         PASS2_OPTIMIZE: -fprofile-use --param indir-call-topn-profile=1 -flto
>         -fprofile-correction
>     6.3.  No performance change on PHP benchmark.
>   7.  Bootstrap and regression test passed on Power8-LE.
> 
> gcc/ChangeLog
> 
>       2019-06-17  Xiong Hu Luo  <luo...@linux.ibm.com>
> 
>       PR ipa/69678
>       * cgraph.c (cgraph_node::get_create): Copy profile_id.
>       (cgraph_edge::speculative_call_info): Find real
>       reference for indirect targets.
>       (cgraph_edge::resolve_speculation): Add speculative code process
>       for indirect targets.
>       (cgraph_edge::redirect_call_stmt_to_callee): Likewise.
>       (cgraph_node::verify_node): Likewise.
>       * cgraph.h (common_target_ids): New variable.
>       (common_target_probabilities): Likewise.
>       (num_of_ics): Likewise.
>       * cgraphclones.c (cgraph_node::create_clone): Copy profile_id.
>       * ipa-inline.c (inline_small_functions): Add iterator update.
>       * ipa-profile.c (ipa_profile_generate_summary): Add indirect
>       multiple targets logic.
>       (ipa_profile): Likewise.
>       * ipa-utils.c (ipa_merge_profiles): Clone speculative src's
>       referrings to dst.
>       * ipa.c (process_references): Fix typo.
>       * lto-cgraph.c (lto_output_edge): Add indirect multiple targets
>       logic.
>       (input_edge): Likewise.
>       * predict.c (dump_prediction): Revome edges count assert to be
>       precise.
>       * tree-profile.c (gimple_gen_ic_profiler): Use the new variable
>       __gcov_indirect_call.counters and __gcov_indirect_call.callee.
>       (gimple_gen_ic_func_profiler): Likewise.
>       (pass_ipa_tree_profile::gate): Fix comment typos.
>       * tree-inline.c (copy_bb): Duplicate all the speculative edges
>       if indirect call contains multiple speculative targets.
>       * value-prof.c (check_counter): Proportion the counter for
>       multiple targets.
>       (ic_transform_topn): New function.
>       (gimple_ic_transform): Handle topn case, fix comment typos.
> 
> gcc/testsuite/ChangeLog
> 
>       2019-06-17  Xiong Hu Luo  <luo...@linux.ibm.com>
> 
>       PR ipa/69678
>       * gcc.dg/tree-prof/indir-call-prof-topn.c: New testcase.
>       * gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: New testcase.
>       * gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c: New testcase.
>       * gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: New testcase.
> ---
>  gcc/cgraph.c                                  |  38 +++-
>  gcc/cgraph.h                                  |   9 +-
>  gcc/cgraphclones.c                            |   1 +
>  gcc/ipa-inline.c                              |   3 +
>  gcc/ipa-profile.c                             | 185 +++++++++++++++++-
>  gcc/ipa-utils.c                               |   5 +
>  gcc/ipa.c                                     |   2 +-
>  gcc/lto-cgraph.c                              |  38 ++++
>  gcc/predict.c                                 |   1 -
>  .../tree-prof/crossmodule-indir-call-topn-1.c |  35 ++++
>  .../crossmodule-indir-call-topn-1a.c          |  22 +++
>  .../tree-prof/crossmodule-indir-call-topn-2.c |  42 ++++
>  .../gcc.dg/tree-prof/indir-call-prof-topn.c   |  38 ++++
>  gcc/tree-inline.c                             |  97 +++++----
>  gcc/tree-profile.c                            |  12 +-
>  gcc/value-prof.c                              | 146 +++++++++++++-
>  16 files changed, 606 insertions(+), 68 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
>  create mode 100644 
> gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
>  create mode 100644 
> gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
> 
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index de82316d4b1..0d373a67d1b 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl)
>       fprintf (dump_file, "Introduced new external node "
>                "(%s) and turned into root of the clone tree.\n",
>                node->dump_name ());
> +      node->profile_id = first_clone->profile_id;
>      }
>    else if (dump_file)
>      fprintf (dump_file, "Introduced new external node "
> @@ -1110,6 +1111,7 @@ cgraph_edge::speculative_call_info (cgraph_edge 
> *&direct,
>    int i;
>    cgraph_edge *e2;
>    cgraph_edge *e = this;
> +  cgraph_node *referred_node;
>  
>    if (!e->indirect_unknown_callee)
>      for (e2 = e->caller->indirect_calls;
> @@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge 
> *&direct,
>       && ((ref->stmt && ref->stmt == e->call_stmt)
>           || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid)))
>        {
> -     reference = ref;
> -     break;
> +     if (e2->indirect_info && e2->indirect_info->num_of_ics)
> +       {
> +         referred_node = dyn_cast<cgraph_node *> (ref->referred);
> +         if (strstr (e->callee->name (), referred_node->name ()))
> +           {
> +             reference = ref;
> +             break;
> +           }
> +       }
> +     else
> +       {
> +         reference = ref;
> +         break;
> +       }
>        }
>  
>    /* Speculative edge always consist of all three components - direct edge,
> @@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl)
>           in the functions inlined through it.  */
>      }
>    edge->count += e2->count;
> -  edge->speculative = false;
> +  if (edge->indirect_info && edge->indirect_info->num_of_ics)
> +    {
> +      edge->indirect_info->num_of_ics--;
> +      if (edge->indirect_info->num_of_ics == 0)
> +     edge->speculative = false;
> +    }
> +  else
> +    edge->speculative = false;
>    e2->speculative = false;
>    ref->remove_reference ();
>    if (e2->indirect_unknown_callee || e2->inline_failed)
> @@ -1333,7 +1354,14 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
>         e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt,
>                                                    false);
>         e->count = gimple_bb (e->call_stmt)->count;
> -       e2->speculative = false;
> +       if (e2->indirect_info && e2->indirect_info->num_of_ics)
> +         {
> +           e2->indirect_info->num_of_ics--;
> +           if (e2->indirect_info->num_of_ics == 0)
> +             e2->speculative = false;
> +         }
> +       else
> +         e2->speculative = false;
>         e2->count = gimple_bb (e2->call_stmt)->count;
>         ref->speculative = false;
>         ref->stmt = NULL;
> @@ -3407,7 +3435,7 @@ cgraph_node::verify_node (void)
>  
>        for (e = callees; e; e = e->next_callee)
>       {
> -       if (!e->aux)
> +       if (!e->aux && !e->speculative)
>           {
>             error ("edge %s->%s has no corresponding call_stmt",
>                    identifier_to_locale (e->caller->name ()),
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index c294602d762..ed0fbc60432 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "profile-count.h"
>  #include "ipa-ref.h"
>  #include "plugin-api.h"
> +#include "gcov-io.h"
>  
>  extern void debuginfo_early_init (void);
>  extern void debuginfo_init (void);
> @@ -1638,11 +1639,17 @@ struct GTY(()) cgraph_indirect_call_info
>    int param_index;
>    /* ECF flags determined from the caller.  */
>    int ecf_flags;
> -  /* Profile_id of common target obtrained from profile.  */
> +  /* Profile_id of common target obtained from profile.  */
>    int common_target_id;
>    /* Probability that call will land in function with COMMON_TARGET_ID.  */
>    int common_target_probability;
>  
> +  /* Profile_id of common target obtained from profile.  */
> +  int common_target_ids[GCOV_ICALL_TOPN_NCOUNTS / 2];
> +  /* Probabilities that call will land in function with COMMON_TARGET_IDS.  
> */
> +  int common_target_probabilities[GCOV_ICALL_TOPN_NCOUNTS / 2];
> +  unsigned num_of_ics;
> +
>    /* Set when the call is a virtual call with the parameter being the
>       associated object pointer rather than a simple direct call.  */
>    unsigned polymorphic : 1;
> diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
> index 15f7e119d18..94f424bc10c 100644
> --- a/gcc/cgraphclones.c
> +++ b/gcc/cgraphclones.c
> @@ -467,6 +467,7 @@ cgraph_node::create_clone (tree new_decl, profile_count 
> prof_count,
>    new_node->icf_merged = icf_merged;
>    new_node->merged_comdat = merged_comdat;
>    new_node->thunk = thunk;
> +  new_node->profile_id = profile_id;
>  
>    new_node->clone.tree_map = NULL;
>    new_node->clone.args_to_skip = args_to_skip;
> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> index 360c3de3289..ef2b217b3f9 100644
> --- a/gcc/ipa-inline.c
> +++ b/gcc/ipa-inline.c
> @@ -1866,12 +1866,15 @@ inline_small_functions (void)
>       }
>        if (has_speculative)
>       for (edge = node->callees; edge; edge = next)
> +     {
> +       next = edge->next_callee;
>         if (edge->speculative && !speculation_useful_p (edge,
>                                                         edge->aux != NULL))
>           {
>             edge->resolve_speculation ();
>             update = true;
>           }
> +     }
>        if (update)
>       {
>         struct cgraph_node *where = node->global.inlined_to
> diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
> index de9563d808c..d04476295a0 100644
> --- a/gcc/ipa-profile.c
> +++ b/gcc/ipa-profile.c
> @@ -168,6 +168,10 @@ ipa_profile_generate_summary (void)
>    struct cgraph_node *node;
>    gimple_stmt_iterator gsi;
>    basic_block bb;
> +  enum hist_type type;
> +
> +  type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? 
> HIST_TYPE_INDIR_CALL_TOPN
> +                                                  : HIST_TYPE_INDIR_CALL;
>  
>    hash_table<histogram_hash> hashtable (10);
>    
> @@ -186,10 +190,10 @@ ipa_profile_generate_summary (void)
>                 histogram_value h;
>                 h = gimple_histogram_value_of_type
>                       (DECL_STRUCT_FUNCTION (node->decl),
> -                      stmt, HIST_TYPE_INDIR_CALL);
> +                      stmt, type);
>                 /* No need to do sanity check: gimple_ic_transform already
>                    takes away bad histograms.  */
> -               if (h)
> +               if (h && type == HIST_TYPE_INDIR_CALL)
>                   {
>                     /* counter 0 is target, counter 1 is number of execution 
> we called target,
>                        counter 2 is total number of executions.  */
> @@ -212,6 +216,46 @@ ipa_profile_generate_summary (void)
>                     gimple_remove_histogram_value (DECL_STRUCT_FUNCTION 
> (node->decl),
>                                                     stmt, h);
>                   }
> +               else if (h && type == HIST_TYPE_INDIR_CALL_TOPN)
> +                 {
> +                   unsigned j;
> +                   struct cgraph_edge *e = node->get_edge (stmt);
> +                   if (e && !e->indirect_unknown_callee)
> +                     continue;
> +
> +                   e->indirect_info->num_of_ics = 0;
> +                   for (j = 1; j < h->n_counters; j += 2)
> +                     {
> +                       if (h->hvalue.counters[j] == 0)
> +                         continue;
> +
> +                       e->indirect_info->common_target_ids[j / 2]
> +                         = h->hvalue.counters[j];
> +                       e->indirect_info->common_target_probabilities[j / 2]
> +                         = GCOV_COMPUTE_SCALE (
> +                           h->hvalue.counters[j + 1],
> +                           gimple_bb (stmt)->count.ipa ().to_gcov_type ());
> +                       if (e->indirect_info
> +                             ->common_target_probabilities[j / 2]
> +                           > REG_BR_PROB_BASE)
> +                         {
> +                           if (dump_file)
> +                             fprintf (dump_file,
> +                                      "Probability capped to 1\n");
> +                           e->indirect_info
> +                             ->common_target_probabilities[j / 2]
> +                             = REG_BR_PROB_BASE;
> +                         }
> +                       e->indirect_info->num_of_ics++;
> +                     }
> +
> +                   gcc_assert (e->indirect_info->num_of_ics
> +                               <= GCOV_ICALL_TOPN_NCOUNTS / 2);
> +
> +                   gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (
> +                                                    node->decl),
> +                                                  stmt, h);
> +                 }
>               }
>             time += estimate_num_insns (stmt, &eni_time_weights);
>             size += estimate_num_insns (stmt, &eni_size_weights);
> @@ -492,6 +536,7 @@ ipa_profile (void)
>    int nindirect = 0, ncommon = 0, nunknown = 0, nuseless = 0, nconverted = 0;
>    int nmismatch = 0, nimpossible = 0;
>    bool node_map_initialized = false;
> +  gcov_type threshold;
>  
>    if (dump_file)
>      dump_histogram (dump_file, histogram);
> @@ -500,14 +545,12 @@ ipa_profile (void)
>        overall_time += histogram[i]->count * histogram[i]->time;
>        overall_size += histogram[i]->size;
>      }
> +  threshold = 0;
>    if (overall_time)
>      {
> -      gcov_type threshold;
> -
>        gcc_assert (overall_size);
>  
>        cutoff = (overall_time * PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE) + 500) 
> / 1000;
> -      threshold = 0;
>        for (i = 0; cumulated < cutoff; i++)
>       {
>         cumulated += histogram[i]->count * histogram[i]->time;
> @@ -543,7 +586,7 @@ ipa_profile (void)
>    histogram.release ();
>    histogram_pool.release ();
>  
> -  /* Produce speculative calls: we saved common traget from porfiling into
> +  /* Produce speculative calls: we saved common target from profiling into
>       e->common_target_id.  Now, at link time, we can look up corresponding
>       function node and produce speculative call.  */
>  
> @@ -558,7 +601,8 @@ ipa_profile (void)
>       {
>         if (n->count.initialized_p ())
>           nindirect++;
> -       if (e->indirect_info->common_target_id)
> +       if (e->indirect_info->common_target_id
> +           || (e->indirect_info && e->indirect_info->num_of_ics == 1))
>           {
>             if (!node_map_initialized)
>               init_node_map (false);
> @@ -613,7 +657,7 @@ ipa_profile (void)
>                     if (dump_file)
>                       fprintf (dump_file,
>                                "Not speculating: "
> -                              "parameter count mistmatch\n");
> +                              "parameter count mismatch\n");
>                   }
>                 else if (e->indirect_info->polymorphic
>                          && !opt_for_fn (n->decl, flag_devirtualize)
> @@ -655,7 +699,130 @@ ipa_profile (void)
>                 nunknown++;
>               }
>           }
> -      }
> +       if (e->indirect_info && e->indirect_info->num_of_ics > 1)
> +         {
> +           if (in_lto_p)
> +             {
> +               if (dump_file)
> +                 {
> +                   fprintf (dump_file,
> +                            "Updating hotness threshold in LTO mode.\n");
> +                   fprintf (dump_file, "Updated min count: %" PRId64 "\n",
> +                            (int64_t) threshold);
> +                 }
> +               set_hot_bb_threshold (threshold
> +                                     / e->indirect_info->num_of_ics);
> +             }
> +           if (!node_map_initialized)
> +             init_node_map (false);
> +           node_map_initialized = true;
> +           ncommon++;
> +           unsigned speculative = 0;
> +           for (i = 0; i < (int)e->indirect_info->num_of_ics; i++)
> +             {
> +               n2 = find_func_by_profile_id (
> +                 e->indirect_info->common_target_ids[i]);
> +               if (n2)
> +                 {
> +                   if (dump_file)
> +                     {
> +                       fprintf (
> +                         dump_file,
> +                         "Indirect call -> direct call from"
> +                         " other module %s => %s, prob %3.2f\n",
> +                         n->dump_name (), n2->dump_name (),
> +                         e->indirect_info->common_target_probabilities[i]
> +                           / (float) REG_BR_PROB_BASE);
> +                     }
> +                   if (e->indirect_info->common_target_probabilities[i]
> +                       < REG_BR_PROB_BASE / 2)
> +                     {
> +                       nuseless++;
> +                       if (dump_file)
> +                         fprintf (
> +                           dump_file,
> +                           "Not speculating: probability is too low.\n");
> +                     }
> +                   else if (!e->maybe_hot_p ())
> +                     {
> +                       nuseless++;
> +                       if (dump_file)
> +                         fprintf (dump_file,
> +                                  "Not speculating: call is cold.\n");
> +                     }
> +                   else if (n2->get_availability () <= AVAIL_INTERPOSABLE
> +                            && n2->can_be_discarded_p ())
> +                     {
> +                       nuseless++;
> +                       if (dump_file)
> +                         fprintf (dump_file,
> +                                  "Not speculating: target is overwritable "
> +                                  "and can be discarded.\n");
> +                     }
> +                   else if (ipa_node_params_sum && ipa_edge_args_sum
> +                            && (!vec_safe_is_empty (
> +                              IPA_NODE_REF (n2)->descriptors))
> +                            && ipa_get_param_count (IPA_NODE_REF (n2))
> +                                 != ipa_get_cs_argument_count (
> +                                   IPA_EDGE_REF (e))
> +                            && (ipa_get_param_count (IPA_NODE_REF (n2))
> +                                  >= ipa_get_cs_argument_count (
> +                                    IPA_EDGE_REF (e))
> +                                || !stdarg_p (TREE_TYPE (n2->decl))))
> +                     {
> +                       nmismatch++;
> +                       if (dump_file)
> +                         fprintf (dump_file, "Not speculating: "
> +                                             "parameter count mismatch\n");
> +                     }
> +                   else if (e->indirect_info->polymorphic
> +                            && !opt_for_fn (n->decl, flag_devirtualize)
> +                            && !possible_polymorphic_call_target_p (e, n2))
> +                     {
> +                       nimpossible++;
> +                       if (dump_file)
> +                         fprintf (dump_file,
> +                                  "Not speculating: "
> +                                  "function is not in the polymorphic "
> +                                  "call target list\n");
> +                     }
> +                   else
> +                     {
> +                       /* Target may be overwritable, but profile says that
> +                          control flow goes to this particular implementation
> +                          of N2.  Speculate on the local alias to allow
> +                          inlining.
> +                          */
> +                       if (!n2->can_be_discarded_p ())
> +                         {
> +                           cgraph_node *alias;
> +                           alias = dyn_cast<cgraph_node *> (
> +                             n2->noninterposable_alias ());
> +                           if (alias)
> +                             n2 = alias;
> +                         }
> +                       nconverted++;
> +                       e->make_speculative (
> +                         n2, e->count.apply_probability (
> +                               e->indirect_info
> +                                 ->common_target_probabilities[i]));
> +                       update = true;
> +                       speculative++;
> +                     }
> +                 }
> +               else
> +                 {
> +                   if (dump_file)
> +                     fprintf (dump_file,
> +                              "Function with profile-id %i not found.\n",
> +                              e->indirect_info->common_target_ids[i]);
> +                   nunknown++;
> +                 }
> +             }
> +           if (speculative < e->indirect_info->num_of_ics)
> +             e->indirect_info->num_of_ics = speculative;
> +         }
> +     }
>         if (update)
>        ipa_update_overall_fn_summary (n);
>       }
> diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
> index 79b250c3943..30347691029 100644
> --- a/gcc/ipa-utils.c
> +++ b/gcc/ipa-utils.c
> @@ -587,6 +587,11 @@ ipa_merge_profiles (struct cgraph_node *dst,
>        update_max_bb_count ();
>        compute_function_frequency ();
>        pop_cfun ();
> +      /* When src is speculative, clone the referrings.  */
> +      if (src->indirect_call_target)
> +     for (e = src->callers; e; e = e->next_caller)
> +       if (e->callee == src && e->speculative)
> +         dst->clone_referring (src);
>        for (e = dst->callees; e; e = e->next_callee)
>       {
>         if (e->speculative)
> diff --git a/gcc/ipa.c b/gcc/ipa.c
> index 2496694124c..c1fe081a72d 100644
> --- a/gcc/ipa.c
> +++ b/gcc/ipa.c
> @@ -166,7 +166,7 @@ process_references (symtab_node *snode,
>     devirtualization happens.  After inlining still keep their declarations
>     around, so we can devirtualize to a direct call.
>  
> -   Also try to make trivial devirutalization when no or only one target is
> +   Also try to make trivial devirtualization when no or only one target is
>     possible.  */
>  
>  static void
> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> index 4dfa2862be3..0c8f547d44e 100644
> --- a/gcc/lto-cgraph.c
> +++ b/gcc/lto-cgraph.c
> @@ -238,6 +238,7 @@ lto_output_edge (struct lto_simple_output_block *ob, 
> struct cgraph_edge *edge,
>    unsigned int uid;
>    intptr_t ref;
>    struct bitpack_d bp;
> +  unsigned i;
>  
>    if (edge->indirect_unknown_callee)
>      streamer_write_enum (ob->main_stream, LTO_symtab_tags, 
> LTO_symtab_last_tag,
> @@ -296,6 +297,25 @@ lto_output_edge (struct lto_simple_output_block *ob, 
> struct cgraph_edge *edge,
>        if (edge->indirect_info->common_target_id)
>       streamer_write_hwi_stream
>          (ob->main_stream, edge->indirect_info->common_target_probability);
> +
> +      gcc_assert (edge->indirect_info->num_of_ics
> +               <= GCOV_ICALL_TOPN_NCOUNTS / 2);
> +
> +      streamer_write_hwi_stream (ob->main_stream,
> +                              edge->indirect_info->num_of_ics);
> +
> +      if (edge->indirect_info->num_of_ics)
> +     {
> +       for (i = 0; i < edge->indirect_info->num_of_ics; i++)
> +         {
> +           streamer_write_hwi_stream (
> +             ob->main_stream, edge->indirect_info->common_target_ids[i]);
> +           if (edge->indirect_info->common_target_ids[i])
> +             streamer_write_hwi_stream (
> +               ob->main_stream,
> +               edge->indirect_info->common_target_probabilities[i]);
> +         }
> +     }
>      }
>  }
>  
> @@ -1438,6 +1458,7 @@ input_edge (struct lto_input_block *ib, vec<symtab_node 
> *> nodes,
>    cgraph_inline_failed_t inline_failed;
>    struct bitpack_d bp;
>    int ecf_flags = 0;
> +  unsigned i;
>  
>    caller = dyn_cast<cgraph_node *> (nodes[streamer_read_hwi (ib)]);
>    if (caller == NULL || caller->decl == NULL_TREE)
> @@ -1488,6 +1509,23 @@ input_edge (struct lto_input_block *ib, 
> vec<symtab_node *> nodes,
>        edge->indirect_info->common_target_id = streamer_read_hwi (ib);
>        if (edge->indirect_info->common_target_id)
>          edge->indirect_info->common_target_probability = streamer_read_hwi 
> (ib);
> +
> +      edge->indirect_info->num_of_ics = streamer_read_hwi (ib);
> +
> +      gcc_assert (edge->indirect_info->num_of_ics
> +               <= GCOV_ICALL_TOPN_NCOUNTS / 2);
> +
> +      if (edge->indirect_info->num_of_ics)
> +     {
> +       for (i = 0; i < edge->indirect_info->num_of_ics; i++)
> +         {
> +           edge->indirect_info->common_target_ids[i]
> +             = streamer_read_hwi (ib);
> +           if (edge->indirect_info->common_target_ids[i])
> +             edge->indirect_info->common_target_probabilities[i]
> +               = streamer_read_hwi (ib);
> +         }
> +     }
>      }
>  }
>  
> diff --git a/gcc/predict.c b/gcc/predict.c
> index 43ee91a5b13..b7f38891c72 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -763,7 +763,6 @@ dump_prediction (FILE *file, enum br_predictor predictor, 
> int probability,
>        && bb->count.precise_p ()
>        && reason == REASON_NONE)
>      {
> -      gcc_assert (e->count ().precise_p ());
>        fprintf (file, ";;heuristics;%s;%" PRId64 ";%" PRId64 ";%.1f;\n",
>              predictor_info[predictor].name,
>              bb->count.to_gcov_type (), e->count ().to_gcov_type (),
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c 
> b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
> new file mode 100644
> index 00000000000..e0a83c2e067
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
> @@ -0,0 +1,35 @@
> +/* { dg-require-effective-target lto } */
> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param 
> indir-call-topn-profile=1" } */
> +
> +#include <stdio.h>
> +
> +typedef int (*fptr) (int);
> +int
> +one (int a);
> +
> +int
> +two (int a);
> +
> +fptr table[] = {&one, &two};
> +
> +int
> +main()
> +{
> +  int i, x;
> +  fptr p = &one;
> +
> +  x = one (3);
> +
> +  for (i = 0; i < 350000000; i++)
> +    {
> +      x = (*p) (3);
> +      p = table[x];
> +    }
> +  printf ("done:%d\n", x);
> +}
> +
> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct 
> call.* one transformation on insn" "profile_estimate" } } */
> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct 
> call.* two transformation on insn" "profile_estimate" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c 
> b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
> new file mode 100644
> index 00000000000..a8c6e365fb9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
> @@ -0,0 +1,22 @@
> +/* It seems there is no way to avoid the other source of mulitple
> +   source testcase from being compiled independently.  Just avoid
> +   error.  */
> +#ifdef DOJOB
> +int
> +one (int a)
> +{
> +  return 1;
> +}
> +
> +int
> +two (int a)
> +{
> +  return 0;
> +}
> +#else
> +int
> +main()
> +{
> +  return 0;
> +}
> +#endif
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c 
> b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
> new file mode 100644
> index 00000000000..aa3887fde83
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
> @@ -0,0 +1,42 @@
> +/* { dg-require-effective-target lto } */
> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param 
> indir-call-topn-profile=1" } */
> +
> +#include <stdio.h>
> +
> +typedef int (*fptr) (int);
> +int
> +one (int a);
> +
> +int
> +two (int a);
> +
> +fptr table[] = {&one, &two};
> +
> +int foo ()
> +{
> +  int i, x;
> +  fptr p = &one;
> +
> +  x = one (3);
> +
> +  for (i = 0; i < 350000000; i++)
> +    {
> +      x = (*p) (3);
> +      p = table[x];
> +    }
> +  return x;
> +}
> +
> +int
> +main()
> +{
> +  int x = foo ();
> +  printf ("done:%d\n", x);
> +}
> +
> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct 
> call.* one transformation on insn" "profile_estimate" } } */
> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct 
> call.* two transformation on insn" "profile_estimate" } } */
> +
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c 
> b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
> new file mode 100644
> index 00000000000..951bc7ddd19
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
> @@ -0,0 +1,38 @@
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-options "-O2 -fdump-ipa-profile --param indir-call-topn-profile=1" } 
> */
> +
> +#include <stdio.h>
> +
> +typedef int (*fptr) (int);
> +int
> +one (int a)
> +{
> +  return 1;
> +}
> +
> +int
> +two (int a)
> +{
> +  return 0;
> +}
> +
> +fptr table[] = {&one, &two};
> +
> +int
> +main()
> +{
> +  int i, x;
> +  fptr p = &one;
> +
> +  one (3);
> +
> +  for (i = 0; i < 350000000; i++)
> +    {
> +      x = (*p) (3);
> +      p = table[x];
> +    }
> +  printf ("done:%d\n", x);
> +}
> +
> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct 
> call.* one transformation on insn" "profile" } } */
> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct 
> call.* two transformation on insn" "profile" } } */
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index 9017da878b1..f69b31b197e 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -2028,43 +2028,66 @@ copy_bb (copy_body_data *id, basic_block bb,
>             switch (id->transform_call_graph_edges)
>               {
>               case CB_CGE_DUPLICATE:
> -               edge = id->src_node->get_edge (orig_stmt);
> -               if (edge)
> -                 {
> -                   struct cgraph_edge *old_edge = edge;
> -                   profile_count old_cnt = edge->count;
> -                   edge = edge->clone (id->dst_node, call_stmt,
> -                                       gimple_uid (stmt),
> -                                       num, den,
> -                                       true);
> -
> -                   /* Speculative calls consist of two edges - direct and
> -                      indirect.  Duplicate the whole thing and distribute
> -                      frequencies accordingly.  */
> -                   if (edge->speculative)
> -                     {
> -                       struct cgraph_edge *direct, *indirect;
> -                       struct ipa_ref *ref;
> -
> -                       gcc_assert (!edge->indirect_unknown_callee);
> -                       old_edge->speculative_call_info (direct, indirect, 
> ref);
> -
> -                       profile_count indir_cnt = indirect->count;
> -                       indirect = indirect->clone (id->dst_node, call_stmt,
> -                                                   gimple_uid (stmt),
> -                                                   num, den,
> -                                                   true);
> -
> -                       profile_probability prob
> -                          = indir_cnt.probability_in (old_cnt + indir_cnt);
> -                       indirect->count
> -                          = copy_basic_block->count.apply_probability (prob);
> -                       edge->count = copy_basic_block->count - 
> indirect->count;
> -                       id->dst_node->clone_reference (ref, stmt);
> -                     }
> -                   else
> -                     edge->count = copy_basic_block->count;
> -                 }
> +               {
> +                 edge = id->src_node->get_edge (orig_stmt);
> +                 struct cgraph_edge *old_edge = edge;
> +                 struct cgraph_edge *direct, *indirect;
> +                 bool next_speculative;
> +                 do
> +                   {
> +                     next_speculative = false;
> +                     if (edge)
> +                       {
> +                         profile_count old_cnt = edge->count;
> +                         edge
> +                           = edge->clone (id->dst_node, call_stmt,
> +                                          gimple_uid (stmt), num, den, true);
> +
> +                         /* Speculative calls consist of two edges - direct
> +                            and indirect.  Duplicate the whole thing and
> +                            distribute frequencies accordingly.  */
> +                         if (edge->speculative)
> +                           {
> +                             struct ipa_ref *ref;
> +
> +                             gcc_assert (!edge->indirect_unknown_callee);
> +                             old_edge->speculative_call_info (direct,
> +                                                              indirect, ref);
> +
> +                             profile_count indir_cnt = indirect->count;
> +                             indirect
> +                               = indirect->clone (id->dst_node, call_stmt,
> +                                                  gimple_uid (stmt), num,
> +                                                  den, true);
> +
> +                             profile_probability prob
> +                               = indir_cnt.probability_in (old_cnt
> +                                                           + indir_cnt);
> +                             indirect->count
> +                               = copy_basic_block->count.apply_probability (
> +                                 prob);
> +                             edge->count
> +                               = copy_basic_block->count - indirect->count;
> +                             id->dst_node->clone_reference (ref, stmt);
> +                           }
> +                         else
> +                           edge->count = copy_basic_block->count;
> +                       }
> +                     /* If the indirect call contains more than one indirect
> +                        targets, need clone all speculative edges here.  */
> +                     if (old_edge && old_edge->next_callee
> +                         && old_edge->speculative && indirect
> +                         && indirect->indirect_info
> +                         && indirect->indirect_info->num_of_ics > 1)
> +                       {
> +                         edge = old_edge->next_callee;
> +                         old_edge = old_edge->next_callee;
> +                         if (edge->speculative)
> +                           next_speculative = true;
> +                       }
> +                   }
> +                 while (next_speculative);
> +               }
>                 break;
>  
>               case CB_CGE_MOVE_CLONES:
> diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
> index 1c3034aac10..4964dbdebb5 100644
> --- a/gcc/tree-profile.c
> +++ b/gcc/tree-profile.c
> @@ -74,8 +74,8 @@ static GTY(()) tree ic_tuple_callee_field;
>  /* Do initialization work for the edge profiler.  */
>  
>  /* Add code:
> -   __thread gcov*    __gcov_indirect_call_counters; // pointer to actual 
> counter
> -   __thread void*    __gcov_indirect_call_callee; // actual callee address
> +   __thread gcov*    __gcov_indirect_call.counters; // pointer to actual 
> counter
> +   __thread void*    __gcov_indirect_call.callee; // actual callee address
>     __thread int __gcov_function_counter; // time profiler function counter
>  */
>  static void
> @@ -395,7 +395,7 @@ gimple_gen_ic_profiler (histogram_value value, unsigned 
> tag, unsigned base)
>        f_1 = foo;
>        __gcov_indirect_call.counters = &__gcov4.main[0];
>        PROF_9 = f_1;
> -      __gcov_indirect_call_callee = PROF_9;
> +      __gcov_indirect_call.callee = PROF_9;
>        _4 = f_1 ();
>     */
>  
> @@ -458,11 +458,11 @@ gimple_gen_ic_func_profiler (void)
>  
>    /* Insert code:
>  
> -     if (__gcov_indirect_call_callee != NULL)
> +     if (__gcov_indirect_call.callee != NULL)
>         __gcov_indirect_call_profiler_v3 (profile_id, &current_function_decl);
>  
>       The function __gcov_indirect_call_profiler_v3 is responsible for
> -     resetting __gcov_indirect_call_callee to NULL.  */
> +     resetting __gcov_indirect_call.callee to NULL.  */
>  
>    gimple_stmt_iterator gsi = gsi_start_bb (cond_bb);
>    void0 = build_int_cst (ptr_type_node, 0);
> @@ -904,7 +904,7 @@ pass_ipa_tree_profile::gate (function *)
>  {
>    /* When profile instrumentation, use or test coverage shall be performed.
>       But for AutoFDO, this there is no instrumentation, thus this pass is
> -     diabled.  */
> +     disabled.  */
>    return (!in_lto_p && !flag_auto_profile
>         && (flag_branch_probabilities || flag_test_coverage
>             || profile_arc_flag));
> diff --git a/gcc/value-prof.c b/gcc/value-prof.c
> index 5013956cf86..4869ab8ccd6 100644
> --- a/gcc/value-prof.c
> +++ b/gcc/value-prof.c
> @@ -579,8 +579,8 @@ free_histograms (struct function *fn)
>     somehow.  */
>  
>  static bool
> -check_counter (gimple *stmt, const char * name,
> -            gcov_type *count, gcov_type *all, profile_count bb_count_d)
> +check_counter (gimple *stmt, const char *name, gcov_type *count, gcov_type 
> *all,
> +            profile_count bb_count_d, float ratio = 1.0f)
>  {
>    gcov_type bb_count = bb_count_d.ipa ().to_gcov_type ();
>    if (*all != bb_count || *count > *all)
> @@ -599,7 +599,7 @@ check_counter (gimple *stmt, const char * name,
>                               "count (%d)\n", name, (int)*all, (int)bb_count);
>         *all = bb_count;
>         if (*count > *all)
> -            *count = *all;
> +         *count = *all * ratio;
>         return false;
>       }
>        else
> @@ -1410,9 +1410,132 @@ gimple_ic (gcall *icall_stmt, struct cgraph_node 
> *direct_call,
>    return dcall_stmt;
>  }
>  
> +/* If --param=indir-call-topn-profile=1 is specified when compiling, there 
> maybe
> +   multiple indirect targets in histogram.  Check every indirect/virtual call
> +   if callee function exists, if not exit, leave it to LTO stage for later
> +   process.  Modify code of this indirect call to an if-else structure in
> +   ipa-profile finally.  */
> +static bool
> +ic_transform_topn (gimple_stmt_iterator *gsi)
> +{
> +  unsigned j;
> +  gcall *stmt;
> +  histogram_value histogram;
> +  gcov_type val, count, count_all, all, bb_all;
> +  struct cgraph_node *d_call;
> +  profile_count bb_count;
> +
> +  stmt = dyn_cast<gcall *> (gsi_stmt (*gsi));
> +  if (!stmt)
> +    return false;
> +
> +  if (gimple_call_fndecl (stmt) != NULL_TREE)
> +    return false;
> +
> +  if (gimple_call_internal_p (stmt))
> +    return false;
> +
> +  histogram
> +    = gimple_histogram_value_of_type (cfun, stmt, HIST_TYPE_INDIR_CALL_TOPN);
> +  if (!histogram)
> +    return false;
> +
> +  count = 0;
> +  all = 0;
> +  bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type ();
> +  bb_count = gimple_bb (stmt)->count;
> +
> +  /* n_counters need be odd to avoid access violation.  */
> +  gcc_assert (histogram->n_counters % 2 == 1);
> +
> +  /* For indirect call topn, accumulate all the counts first.  */
> +  for (j = 1; j < histogram->n_counters; j += 2)
> +    {
> +      val = histogram->hvalue.counters[j];
> +      count = histogram->hvalue.counters[j + 1];
> +      if (val)
> +     all += count;
> +    }
> +
> +  count_all = all;
> +  /* Do the indirect call conversion if function body exists, or else leave 
> it
> +     to LTO stage.  */
> +  for (j = 1; j < histogram->n_counters; j += 2)
> +    {
> +      val = histogram->hvalue.counters[j];
> +      count = histogram->hvalue.counters[j + 1];
> +      if (val)
> +     {
> +       /* The order of CHECK_COUNTER calls is important
> +          since check_counter can correct the third parameter
> +          and we want to make count <= all <= bb_count.  */
> +       if (check_counter (stmt, "ic", &all, &bb_all, bb_count)
> +           || check_counter (stmt, "ic", &count, &all,
> +                             profile_count::from_gcov_type (all),
> +                             (float) count / count_all))
> +         {
> +           gimple_remove_histogram_value (cfun, stmt, histogram);
> +           return false;
> +         }
> +
> +       d_call = find_func_by_profile_id ((int) val);
> +
> +       if (d_call == NULL)
> +         {
> +           if (val)
> +             {
> +               if (dump_file)
> +                 {
> +                   fprintf (
> +                     dump_file,
> +                     "Indirect call -> direct call from other module");
> +                   print_generic_expr (dump_file, gimple_call_fn (stmt),
> +                                       TDF_SLIM);
> +                   fprintf (dump_file,
> +                            "=> %i (will resolve only with LTO)\n",
> +                            (int) val);
> +                 }
> +             }
> +           return false;
> +         }
> +
> +       if (!check_ic_target (stmt, d_call))
> +         {
> +           if (dump_file)
> +             {
> +               fprintf (dump_file, "Indirect call -> direct call ");
> +               print_generic_expr (dump_file, gimple_call_fn (stmt),
> +                                   TDF_SLIM);
> +               fprintf (dump_file, "=> ");
> +               print_generic_expr (dump_file, d_call->decl, TDF_SLIM);
> +               fprintf (dump_file,
> +                        " transformation skipped because of type mismatch");
> +               print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> +             }
> +           gimple_remove_histogram_value (cfun, stmt, histogram);
> +           return false;
> +         }
> +
> +       if (dump_file)
> +       {
> +         fprintf (dump_file, "Indirect call -> direct call ");
> +         print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM);
> +         fprintf (dump_file, "=> ");
> +         print_generic_expr (dump_file, d_call->decl, TDF_SLIM);
> +         fprintf (dump_file,
> +                  " transformation on insn postponed to ipa-profile");
> +         print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> +         fprintf (dump_file, "hist->count %" PRId64
> +             " hist->all %" PRId64"\n", count, all);
> +       }
> +     }
> +    }
> +
> +  return true;
> +}
>  /*
>    For every checked indirect/virtual call determine if most common pid of
> -  function/class method has probability more than 50%. If yes modify code of
> +  function/class method has probability more than 50%.  If yes modify code of
>    this call to:
>   */
>  
> @@ -1423,6 +1546,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>    histogram_value histogram;
>    gcov_type val, count, all, bb_all;
>    struct cgraph_node *direct_call;
> +  enum hist_type type;
>  
>    stmt = dyn_cast <gcall *> (gsi_stmt (*gsi));
>    if (!stmt)
> @@ -1434,18 +1558,24 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>    if (gimple_call_internal_p (stmt))
>      return false;
>  
> -  histogram = gimple_histogram_value_of_type (cfun, stmt, 
> HIST_TYPE_INDIR_CALL);
> +  type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? 
> HIST_TYPE_INDIR_CALL_TOPN
> +                                                  : HIST_TYPE_INDIR_CALL;
> +
> +  histogram = gimple_histogram_value_of_type (cfun, stmt, type);
>    if (!histogram)
>      return false;
>  
> +  if (type == HIST_TYPE_INDIR_CALL_TOPN)
> +      return ic_transform_topn (gsi);
> +
>    val = histogram->hvalue.counters [0];
>    count = histogram->hvalue.counters [1];
>    all = histogram->hvalue.counters [2];
>  
>    bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type ();
> -  /* The order of CHECK_COUNTER calls is important -
> +  /* The order of CHECK_COUNTER calls is important
>       since check_counter can correct the third parameter
> -     and we want to make count <= all <= bb_all. */
> +     and we want to make count <= all <= bb_all.  */
>    if (check_counter (stmt, "ic", &all, &bb_all, gimple_bb (stmt)->count)
>        || check_counter (stmt, "ic", &count, &all,
>                       profile_count::from_gcov_type (all)))
> @@ -1494,7 +1624,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>        print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM);
>        fprintf (dump_file, "=> ");
>        print_generic_expr (dump_file, direct_call->decl, TDF_SLIM);
> -      fprintf (dump_file, " transformation on insn postponned to 
> ipa-profile");
> +      fprintf (dump_file, " transformation on insn postponed to 
> ipa-profile");
>        print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>        fprintf (dump_file, "hist->count %" PRId64
>              " hist->all %" PRId64"\n", count, all);
> 

Reply via email to