> Hi,
>
> This patch implements the fine-grained AutoFDO optimizations for GCC.
> It uses linux perf to collect sample profiles, and uses debug info to
> represent the profile. In GCC, it uses the profile to annotate CFG to
> drive FDO. This can bring 50% to 110% of the speedup derived by
> traditional instrumentation based FDO. (Average is between 70% to 80%
> for many CPU intensive applications). Comparing with traditional FDO,
> AutoFDO does not require instrumentation. It just needs to have an
> optimized binary with debug info to collect the profile.
>
> This patch has passed bootstrap and gcc regression tests as well as
> tested with crosstool. Okay for google branches?
>
> If people in up-stream find this feature interesting, I'll spend some
> time to port this to trunk and try to opensource the tool to generate
> profile data file.
I think it is a useful feature, yes (it has been on my TODO list for quite
some time). Unlike edge profiles, these profiles should also be more
independent of source code/configuration changes. Just a few quick questions
from a first glance over the patch...

>
> Dehao
>
> The patch can also be viewed from:
>
> http://codereview.appspot.com/6567079
>
> gcc/ChangeLog.google-4_7:
> 2012-09-28 Dehao Chen <de...@dehao.com>
>
>     * cgraphbuild.c (build_cgraph_edges): Handle AutoFDO profile.
>     (rebuild_cgraph_edges): Likewise.
>     * cgraph.c (cgraph_clone_node): Likewise.
>     (clone_function_name): Likewise.
>     * cgraph.h (cgraph_node): New field.
>     * tree-pass.h (pass_ipa_auto_profile): New pass.
>     * cfghooks.c (make_forwarder_block): Handle AutoFDO profile.
>     * ipa-inline-transform.c (clone_inlined_nodes): Likewise.
>     * toplev.c (compile_file): Likewise.
>     (process_options): Likewise.
>     * debug.h (auto_profile_debug_hooks): New.
>     * cgraphunit.c (cgraph_finalize_compilation_unit): Handle AutoFDO
>     profile.
>     (cgraph_copy_node_for_versioning): Likewise.
>     * regs.h (REG_FREQ_FROM_BB): Likewise.
>     * gcov-io.h: (GCOV_TAG_AFDO_FILE_NAMES): New.
>     (GCOV_TAG_AFDO_FUNCTION): New.
>     (GCOV_TAG_AFDO_MODULE_GROUPING): New.
>     * ira-int.h (REG_FREQ_FROM_EDGE_FREQ): Handle AutoFDO profile.
>     * ipa-inline.c (edge_hot_enough_p): Likewise.
>     (edge_badness): Likewise.
>     (inline_small_functions): Likewise.
>     * dwarf2out.c (auto_profile_debug_hooks): New.
>     * opts.c (common_handle_option): Handle AutoFDO profile.
>     * timevar.def (TV_IPA_AUTOFDO): New.
>     * predict.c (compute_function_frequency): Handle AutoFDO profile.
>     (rebuild_frequencies): Handle AutoFDO profile.
>     * auto-profile.c (struct gcov_callsite_pos): New.
>     (struct gcov_callsite): New.
>     (struct gcov_stack): New.
>     (struct gcov_function): New.
>     (struct afdo_bfd_name): New.
>     (struct afdo_module): New.
>     (afdo_get_filename): New.
>     (afdo_get_original_name_size): New.
>     (afdo_get_bfd_name): New.
>     (afdo_read_bfd_names): New.
>     (afdo_stack_hash): New.
>     (afdo_stack_eq): New.
>     (afdo_function_hash): New.
>     (afdo_function_eq): New.
>     (afdo_bfd_name_hash): New.
>     (afdo_bfd_name_eq): New.
>     (afdo_bfd_name_del): New.
>     (afdo_module_hash): New.
>     (afdo_module_eq): New.
>     (afdo_module_num_strings): New.
>     (afdo_add_module): New.
>     (read_aux_modules): New.
>     (get_inline_stack_size_by_stmt): New.
>     (get_inline_stack_size_by_edge): New.
>     (get_function_name_from_block): New.
>     (get_inline_stack_by_stmt): New.
>     (get_inline_stack_by_edge): New.
>     (afdo_get_function_count): New.
>     (afdo_set_current_function_count): New.
>     (afdo_add_bfd_name_mapping): New.
>     (afdo_add_copy_scale): New.
>     (get_stack_count): New.
>     (get_stmt_count): New.
>     (afdo_get_callsite_count): New.
>     (afdo_get_bb_count): New.
>     (afdo_annotate_cfg): New.
>     (read_profile): New.
>     (process_auto_profile): New.
>     (init_auto_profile): New.
>     (end_auto_profile): New.
>     (afdo_find_equiv_class): New.
>     (afdo_propagate_single_edge): New.
>     (afdo_propagate_multi_edge): New.
>     (afdo_propagate_circuit): New.
>     (afdo_propagate): New.
>     (afdo_calculate_branch_prob): New.
>     (auto_profile): New.
>     (gate_auto_profile_ipa): New.
>     (struct simple_ipa_opt_pass): New.
>     * auto-profile.h (init_auto_profile): New.
>     (end_auto_profile): New.
>     (process_auto_profile): New.
>     (afdo_set_current_function_count): New.
>     (afdo_add_bfd_name_mapping): New.
>     (afdo_add_copy_scale): New.
>     (afdo_calculate_branch_prob): New.
>     (afdo_get_callsite_count): New.
>     (afdo_get_bb_count): New.
>     * profile.c (compute_branch_probabilities): Handle AutoFDO profile.
>     (branch_prob): Likewise.
>     * loop-unroll.c (decide_unroll_runtime_iterations): Likewise.
>     * coverage.c (coverage_init): Likewise.
>     * tree-ssa-live.c (remove_unused_scope_block_p): Likewise.
>     * common.opt (fauto-profile): New.
>     * tree-inline.c (copy_bb): Handle AutoFDO profile.
>     (copy_edges_for_bb): Likewise.
>     (copy_cfg_body): Likewise.
>     * tree-profile.c (direct_call_profiling): Likewise.
>     (gate_tree_profile_ipa): Likewise.
>     * basic-block.h (EDGE_ANNOTATED): New field.
>     (BB_ANNOTATED): New field.
>     * tree-cfg.c (gimple_merge_blocks): Handle AutoFDO profile.
>     * passes.c (init_optimization_passes): Handle AutoFDO profile.
> Index: gcc/cgraphbuild.c
> ===================================================================
> --- gcc/cgraphbuild.c (revision 191813)
> +++ gcc/cgraphbuild.c (working copy)
> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3. If not see
>  #include "except.h"
>  #include "l-ipo.h"
>  #include "ipa-inline.h"
> +#include "auto-profile.h"
>
>  /* Context of record_reference. */
>  struct record_reference_ctx
> @@ -497,6 +498,9 @@ build_cgraph_edges (void)
>    tree decl;
>    unsigned ix;
>
> +  if (flag_auto_profile)
> +    afdo_set_current_function_count ();
> +
>    /* Create the callgraph edges and record the nodes referenced by the
> function.
>       body. */
>    FOR_EACH_BB (bb)
> @@ -607,8 +611,9 @@ rebuild_cgraph_edges (void)
>    cgraph_node_remove_callees (node);
>    ipa_remove_all_references (&node->ref_list);
>
> -  node->count = ENTRY_BLOCK_PTR->count;
> -  node->max_bb_count = 0;
> +  if (!flag_auto_profile)
> +    node->count = ENTRY_BLOCK_PTR->count;
> +  node->max_bb_count = node->count;

We could probably read the profile at the same time we read edge profiles,
avoiding the need to maintain it across cgraph builds/rebuilds?

> @@ -2268,6 +2276,9 @@ clone_function_name (tree decl, const char *suffix
>    prefix[len] = '_';
>  #endif
>    ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
> +  if (flag_auto_profile)
> +    afdo_add_bfd_name_mapping (xstrdup (tmp_name),
> +                               xstrdup (lang_hooks.dwarf_name (decl, 0)));

You probably want to unify this with lto_record_renamed_decl.
> Index: gcc/cfghooks.c
> ===================================================================
> --- gcc/cfghooks.c (revision 191813)
> +++ gcc/cfghooks.c (working copy)
> @@ -775,6 +775,19 @@ make_forwarder_block (basic_block bb, bool (*redir
>      }
>    }
>
> +  if (flag_auto_profile)
> +    {
> +      dummy->frequency = 0;
> +      dummy->count = 0;
> +      for (ei = ei_start (dummy->preds); (e = ei_safe_edge (ei)); ei_next
> (&ei))
> +        {
> +          dummy->frequency += EDGE_FREQUENCY (e);
> +          dummy->count += e->count;
> +        }
> +      if (dummy->frequency > REG_BR_PROB_BASE)
> +        dummy->frequency = REG_BR_PROB_BASE;
> +    }
> +

I do not see why the profiles are different here?

> @@ -478,6 +480,9 @@ edge_hot_enough_p (struct cgraph_edge *edge)
>  {
>    if (cgraph_maybe_hot_edge_p (edge))
>      return true;
> +  if (flag_auto_profile
> +      && maybe_hot_count_p (afdo_get_callsite_count (edge)))
> +    return true;

Why are the edge counts and afdo counts not the same?

Honza
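
For readers following the make_forwarder_block hunk quoted above, the
recomputation it adds can be sketched as a standalone C fragment. This is
only an illustration under assumptions: the types `in_edge` and `block`, the
function `recompute_from_preds`, and the constant `PROB_BASE` are
hypothetical stand-ins introduced here, not GCC's real data structures or
the patch's code. It shows the same idea: reset the block's frequency and
count, sum them over the incoming edges, then clamp the frequency.

```c
#include <stddef.h>

/* Hypothetical stand-in for the cap the hunk applies (REG_BR_PROB_BASE
   in the quoted patch).  The value is an assumption for this sketch.  */
#define PROB_BASE 10000

/* Hypothetical, simplified incoming edge: only the two fields summed.  */
struct in_edge
{
  int frequency;
  long long count;
};

/* Hypothetical, simplified basic block.  */
struct block
{
  int frequency;
  long long count;
};

/* Recompute BB's frequency and count as the sum over its N incoming
   edges, then clamp the frequency to PROB_BASE.  */
void
recompute_from_preds (struct block *bb, const struct in_edge *preds, size_t n)
{
  size_t i;

  bb->frequency = 0;
  bb->count = 0;
  for (i = 0; i < n; i++)
    {
      bb->frequency += preds[i].frequency;
      bb->count += preds[i].count;
    }
  if (bb->frequency > PROB_BASE)
    bb->frequency = PROB_BASE;
}
```

In this sketch, a block with two incoming edges of frequency 6000 and 7000
ends up clamped to `PROB_BASE`, while the counts are simply added. Whether
such a recomputation is needed only for AutoFDO is exactly the question
raised about the hunk.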