> Hi,
> 
> This patch implements the fine-grained AutoFDO optimizations for GCC.
> It uses Linux perf to collect sample profiles, and uses debug info to
> represent the profile. In GCC, it uses the profile to annotate the CFG to
> drive FDO. This can bring 50% to 110% of the speedup derived by
> traditional instrumentation-based FDO (the average is between 70% and
> 80% for many CPU-intensive applications). Compared with traditional FDO,
> AutoFDO does not require instrumentation. It just needs an optimized
> binary with debug info to collect the profile.
> 
> This patch has passed bootstrap and the GCC regression tests, and has
> also been tested with crosstool. Okay for google branches?
> 
> If people upstream find this feature interesting, I'll spend some
> time porting this to trunk and try to open-source the tool that
> generates the profile data file.

I think it is a useful feature, yes (and it has been on my TODO list for quite
some time). Unlike edge profiles, these profiles should also be more
independent of source code/configuration changes.
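A minimal sketch of why debug-info-keyed sample profiles tend to survive source edits. It assumes, as a simplification and not as a description of this patch's file format, that each sample is keyed by its line offset from the function's start line plus a DWARF discriminator, and that a basic block's count is estimated as the maximum count over its statements. We assume afdo_get_bb_count uses a similar heuristic, but its body is not quoted here. All struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical profile record: one entry per sampled source location.
   Keyed by the line offset from the function's start line (plus a
   discriminator), not by absolute line numbers.  */
struct afdo_sample_sketch
{
  unsigned line_offset;
  unsigned discriminator;
  unsigned long long count;
};

/* Return the sampled count for one statement, or 0 if never sampled.  */
static unsigned long long
sample_count_for_stmt (const struct afdo_sample_sketch *samples, size_t n,
                       unsigned fn_start_line, unsigned stmt_line,
                       unsigned discriminator)
{
  assert (stmt_line >= fn_start_line);
  unsigned offset = stmt_line - fn_start_line;
  for (size_t i = 0; i < n; i++)
    if (samples[i].line_offset == offset
        && samples[i].discriminator == discriminator)
      return samples[i].count;
  return 0;
}

/* Estimate a basic block's count as the maximum over its statements:
   every statement in a block runs equally often, so the best-sampled
   one is the least under-counted estimate (assumed heuristic).  */
static unsigned long long
bb_count_sketch (const struct afdo_sample_sketch *samples, size_t n,
                 unsigned fn_start_line, const unsigned *stmt_lines,
                 const unsigned *stmt_discrs, size_t n_stmts)
{
  unsigned long long max_count = 0;
  for (size_t i = 0; i < n_stmts; i++)
    {
      unsigned long long c
        = sample_count_for_stmt (samples, n, fn_start_line,
                                 stmt_lines[i], stmt_discrs[i]);
      if (c > max_count)
        max_count = c;
    }
  return max_count;
}
```

Moving the whole function down a few lines changes every absolute line number but none of the offsets, so the estimate is unchanged. Instrumented edge profiles, by contrast, are tied to the exact CFG that was instrumented.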

Just a few quick questions from a first glance over the patch...
> 
> Dehao
> 
> The patch can also be viewed at:
> 
> http://codereview.appspot.com/6567079
> 
> gcc/ChangeLog.google-4_7:
> 2012-09-28  Dehao Chen  <de...@dehao.com>
> 
> * cgraphbuild.c (build_cgraph_edges): Handle AutoFDO profile.
> (rebuild_cgraph_edges): Likewise.
> * cgraph.c (cgraph_clone_node): Likewise.
> (clone_function_name): Likewise.
> * cgraph.h (cgraph_node): New field.
> * tree-pass.h (pass_ipa_auto_profile): New pass.
> * cfghooks.c (make_forwarder_block): Handle AutoFDO profile.
> * ipa-inline-transform.c (clone_inlined_nodes): Likewise.
> * toplev.c (compile_file): Likewise.
> (process_options): Likewise.
> * debug.h (auto_profile_debug_hooks): New.
> * cgraphunit.c (cgraph_finalize_compilation_unit): Handle AutoFDO
> profile.
> (cgraph_copy_node_for_versioning): Likewise.
> * regs.h (REG_FREQ_FROM_BB): Likewise.
> * gcov-io.h (GCOV_TAG_AFDO_FILE_NAMES): New.
> (GCOV_TAG_AFDO_FUNCTION): New.
> (GCOV_TAG_AFDO_MODULE_GROUPING): New.
> * ira-int.h (REG_FREQ_FROM_EDGE_FREQ): Handle AutoFDO profile.
> * ipa-inline.c (edge_hot_enough_p): Likewise.
> (edge_badness): Likewise.
> (inline_small_functions): Likewise.
> * dwarf2out.c (auto_profile_debug_hooks): New.
> * opts.c (common_handle_option): Handle AutoFDO profile.
> * timevar.def (TV_IPA_AUTOFDO): New.
> * predict.c (compute_function_frequency): Handle AutoFDO profile.
> (rebuild_frequencies): Handle AutoFDO profile.
> * auto-profile.c (struct gcov_callsite_pos): New.
> (struct gcov_callsite): New.
> (struct gcov_stack): New.
> (struct gcov_function): New.
> (struct afdo_bfd_name): New.
> (struct afdo_module): New.
> (afdo_get_filename): New.
> (afdo_get_original_name_size): New.
> (afdo_get_bfd_name): New.
> (afdo_read_bfd_names): New.
> (afdo_stack_hash): New.
> (afdo_stack_eq): New.
> (afdo_function_hash): New.
> (afdo_function_eq): New.
> (afdo_bfd_name_hash): New.
> (afdo_bfd_name_eq): New.
> (afdo_bfd_name_del): New.
> (afdo_module_hash): New.
> (afdo_module_eq): New.
> (afdo_module_num_strings): New.
> (afdo_add_module): New.
> (read_aux_modules): New.
> (get_inline_stack_size_by_stmt): New.
> (get_inline_stack_size_by_edge): New.
> (get_function_name_from_block): New.
> (get_inline_stack_by_stmt): New.
> (get_inline_stack_by_edge): New.
> (afdo_get_function_count): New.
> (afdo_set_current_function_count): New.
> (afdo_add_bfd_name_mapping): New.
> (afdo_add_copy_scale): New.
> (get_stack_count): New.
> (get_stmt_count): New.
> (afdo_get_callsite_count): New.
> (afdo_get_bb_count): New.
> (afdo_annotate_cfg): New.
> (read_profile): New.
> (process_auto_profile): New.
> (init_auto_profile): New.
> (end_auto_profile): New.
> (afdo_find_equiv_class): New.
> (afdo_propagate_single_edge): New.
> (afdo_propagate_multi_edge): New.
> (afdo_propagate_circuit): New.
> (afdo_propagate): New.
> (afdo_calculate_branch_prob): New.
> (auto_profile): New.
> (gate_auto_profile_ipa): New.
> (struct simple_ipa_opt_pass): New.
> * auto-profile.h (init_auto_profile): New.
> (end_auto_profile): New.
> (process_auto_profile): New.
> (afdo_set_current_function_count): New.
> (afdo_add_bfd_name_mapping): New.
> (afdo_add_copy_scale): New.
> (afdo_calculate_branch_prob): New.
> (afdo_get_callsite_count): New.
> (afdo_get_bb_count): New.
> * profile.c (compute_branch_probabilities): Handle AutoFDO profile.
> (branch_prob): Likewise.
> * loop-unroll.c (decide_unroll_runtime_iterations): Likewise.
> * coverage.c (coverage_init): Likewise.
> * tree-ssa-live.c (remove_unused_scope_block_p): Likewise.
> * common.opt (fauto-profile): New.
> * tree-inline.c (copy_bb): Handle AutoFDO profile.
> (copy_edges_for_bb): Likewise.
> (copy_cfg_body): Likewise.
> * tree-profile.c (direct_call_profiling): Likewise.
> (gate_tree_profile_ipa): Likewise.
> * basic-block.h (EDGE_ANNOTATED): New field.
> (BB_ANNOTATED): New field.
> * tree-cfg.c (gimple_merge_blocks): Handle AutoFDO profile.
> * passes.c (init_optimization_passes): Handle AutoFDO profile.
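The ChangeLog lists afdo_find_equiv_class, afdo_propagate_single_edge and afdo_propagate_multi_edge, but their bodies are not quoted in this message. Judging by the names only, the single-edge step is presumably the usual flow-conservation rule: a block's count equals the sum of its incoming (or outgoing) edge counts, so if exactly one edge on a side is unknown, it can be inferred. A minimal sketch of that rule, with hypothetical types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical edge record; count is meaningful only when known.  */
struct edge_sketch
{
  long long count;
  bool known;
};

/* If exactly one edge in EDGES (all incoming or all outgoing edges of a
   block with count BB_COUNT) is unknown, set it to BB_COUNT minus the
   known edges.  Returns true if an edge was updated.  */
static bool
propagate_single_edge_sketch (long long bb_count, struct edge_sketch *edges,
                              size_t n_edges)
{
  long long known_sum = 0;
  struct edge_sketch *unknown = NULL;
  size_t n_unknown = 0;

  for (size_t i = 0; i < n_edges; i++)
    {
      if (edges[i].known)
        known_sum += edges[i].count;
      else
        {
          unknown = &edges[i];
          n_unknown++;
        }
    }

  if (n_unknown != 1)
    return false;

  /* Clamp at zero: sampled counts are noisy and need not balance.  */
  long long rest = bb_count - known_sum;
  unknown->count = rest > 0 ? rest : 0;
  unknown->known = true;
  return true;
}
```

Iterating this over all blocks until nothing changes fills in whatever the samples alone leave undetermined; the clamp is a design choice for noisy data, not something taken from the patch.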

> Index: gcc/cgraphbuild.c
> ===================================================================
> --- gcc/cgraphbuild.c (revision 191813)
> +++ gcc/cgraphbuild.c (working copy)
> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "except.h"
>  #include "l-ipo.h"
>  #include "ipa-inline.h"
> +#include "auto-profile.h"
>  
>  /* Context of record_reference.  */
>  struct record_reference_ctx
> @@ -497,6 +498,9 @@ build_cgraph_edges (void)
>    tree decl;
>    unsigned ix;
>  
> +  if (flag_auto_profile)
> +    afdo_set_current_function_count ();
> +
>    /* Create the callgraph edges and record the nodes referenced by the function.
>       body.  */
>    FOR_EACH_BB (bb)
> @@ -607,8 +611,9 @@ rebuild_cgraph_edges (void)
>    cgraph_node_remove_callees (node);
>    ipa_remove_all_references (&node->ref_list);
>  
> -  node->count = ENTRY_BLOCK_PTR->count;
> -  node->max_bb_count = 0;
> +  if (!flag_auto_profile)
> +    node->count = ENTRY_BLOCK_PTR->count;
> +  node->max_bb_count = node->count;

We could probably read the profile at the same time we read edge profiles,
avoiding the need to maintain it across cgraph builds/rebuilds?
> @@ -2268,6 +2276,9 @@ clone_function_name (tree decl, const char *suffix
>    prefix[len] = '_';
>  #endif
>    ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
> +  if (flag_auto_profile)
> +    afdo_add_bfd_name_mapping (xstrdup (tmp_name),
> +                            xstrdup (lang_hooks.dwarf_name (decl, 0)));

You probably want to unify this with lto_record_renamed_decl.
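For context on that suggestion: the hunk records a mapping from the clone's private assembler name (e.g. something like "foo.constprop.0") to the DWARF name of the original decl, presumably so that profile lookups keyed by source-level names still find a clone's profile. A minimal, hypothetical sketch of such a table, using linear search and a fixed capacity for brevity (the ChangeLog's afdo_bfd_name_hash/afdo_bfd_name_eq suggest the patch uses a hash table instead):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_MAP_MAX 64

/* Hypothetical table: clone assembler name -> original function name.  */
struct name_map_sketch
{
  const char *clone[NAME_MAP_MAX];
  const char *origin[NAME_MAP_MAX];
  size_t n;
};

/* Record CLONE -> ORIGIN.  Returns 0 if the table is full.  */
static int
name_map_add (struct name_map_sketch *m, const char *clone,
              const char *origin)
{
  if (m->n == NAME_MAP_MAX)
    return 0;
  m->clone[m->n] = clone;
  m->origin[m->n] = origin;
  m->n++;
  return 1;
}

/* Return the name to look up in the profile: the original name if NAME
   is a recorded clone, otherwise NAME itself.  */
static const char *
name_map_resolve (const struct name_map_sketch *m, const char *name)
{
  for (size_t i = 0; i < m->n; i++)
    if (strcmp (m->clone[i], name) == 0)
      return m->origin[i];
  return name;
}
```

Any renaming mechanism that already records old/new name pairs could feed the same lookup, which is the appeal of sharing one.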
> Index: gcc/cfghooks.c
> ===================================================================
> --- gcc/cfghooks.c    (revision 191813)
> +++ gcc/cfghooks.c    (working copy)
> @@ -775,6 +775,19 @@ make_forwarder_block (basic_block bb, bool (*redir
>          }
>      }
>  
> +  if (flag_auto_profile)
> +    {
> +      dummy->frequency = 0;
> +      dummy->count = 0;
>      for (ei = ei_start (dummy->preds); (e = ei_safe_edge (ei)); ei_next (&ei))
> +     {
> +       dummy->frequency += EDGE_FREQUENCY (e);
> +       dummy->count += e->count;
> +     }
> +      if (dummy->frequency > REG_BR_PROB_BASE)
> +     dummy->frequency = REG_BR_PROB_BASE;
> +    }
> +

I do not see why the profile update needs to be different here.
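For reference, under -fauto-profile the hunk sets the forwarder block's count to the sum of its incoming edge counts and its frequency to the sum of the incoming EDGE_FREQUENCY values, capped at REG_BR_PROB_BASE. The sketch below restates that with hypothetical stand-in types. The value REG_BR_PROB_BASE = 10000 and the rounding in edge_frequency_sketch follow GCC's definitions of this era as we understand them; treat both as assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define REG_BR_PROB_BASE 10000  /* assumed to match GCC's value */

struct fwd_bb_sketch
{
  int frequency;
  long long count;
};

struct fwd_edge_sketch
{
  const struct fwd_bb_sketch *src;
  int probability;              /* out of REG_BR_PROB_BASE */
  long long count;
};

/* Source block frequency scaled by the edge probability, rounded
   (modeled on EDGE_FREQUENCY).  */
static int
edge_frequency_sketch (const struct fwd_edge_sketch *e)
{
  return (e->src->frequency * e->probability + REG_BR_PROB_BASE / 2)
         / REG_BR_PROB_BASE;
}

/* Recompute the forwarder block's profile from its predecessors, as
   the hunk does when flag_auto_profile is set.  */
static void
recompute_forwarder_sketch (struct fwd_bb_sketch *dummy,
                            const struct fwd_edge_sketch *preds, size_t n)
{
  dummy->frequency = 0;
  dummy->count = 0;
  for (size_t i = 0; i < n; i++)
    {
      dummy->frequency += edge_frequency_sketch (&preds[i]);
      dummy->count += preds[i].count;
    }
  if (dummy->frequency > REG_BR_PROB_BASE)
    dummy->frequency = REG_BR_PROB_BASE;
}
```

Note that the count is never capped while the frequency is, so the two can drift apart when the incoming frequencies overflow the cap.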
> @@ -478,6 +480,9 @@ edge_hot_enough_p (struct cgraph_edge *edge)
>  {
>    if (cgraph_maybe_hot_edge_p (edge))
>      return true;
> +  if (flag_auto_profile
> +      && maybe_hot_count_p (afdo_get_callsite_count (edge)))
> +    return true;

Why are the edge counts and the AFDO counts not the same?
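The hunk makes an edge hot if either of two tests passes: cgraph_maybe_hot_edge_p on the edge's own count, or maybe_hot_count_p on afdo_get_callsite_count (edge). A simplified sketch of that decision, assuming maybe_hot_count_p reduces to comparing a count against sum_max divided by the hot-bb-count-fraction parameter (the real function has further checks), with hypothetical types:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for profile_info->sum_max and the
   hot-bb-count-fraction parameter.  */
struct profile_summary_sketch
{
  long long sum_max;
  int hot_bb_count_fraction;
};

/* Simplified threshold test in the spirit of maybe_hot_count_p.  */
static bool
maybe_hot_count_sketch (const struct profile_summary_sketch *p,
                        long long count)
{
  return count > p->sum_max / p->hot_bb_count_fraction;
}

/* Hot if the regular edge check says so or, under AutoFDO, if the
   sampled call-site count alone clears the threshold.  */
static bool
edge_hot_enough_sketch (const struct profile_summary_sketch *p,
                        bool cgraph_says_hot, bool auto_profile,
                        long long callsite_count)
{
  if (cgraph_says_hot)
    return true;
  return auto_profile && maybe_hot_count_sketch (p, callsite_count);
}
```

If the edge count and the call-site count always agreed, the second test would be redundant, which is the point of the question.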

Honza
