Hi,
this patch removes early inlining from the afdo pass, since all inlining should now happen from the early inliner. I tested this on SPEC and there are 3 inlines still happening here which are blocked at early-inline time by hitting the large function growth limit. We probably want to bypass that limit; I will look into that incrementally.
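To illustrate how an AFDO-requested inline can still be rejected by a size limit, here is a toy model. It is only a sketch and not GCC code: `CallSite`, `afdo_wants_inline`, `within_growth_limit`, `should_inline` and the `max_size` parameter are all hypothetical names made up for this example, and the size check is a simplified stand-in for the real growth limit.

```cpp
// Toy model only: none of these names exist in GCC.
struct CallSite {
  bool inlined_in_profiled_binary;  // the profile saw this call inlined
  bool callee_hot;                  // callee body was hot in the profiling run
  int caller_size;                  // hypothetical size units
  int callee_size;                  // hypothetical size units
};

// The two conditions listed in the auto-profile.cc toplevel comment:
// both must hold for AFDO to request the inline.
bool afdo_wants_inline(const CallSite &cs) {
  return cs.inlined_in_profiled_binary && cs.callee_hot;
}

// Simplified, assumed stand-in for a large-function growth check.
bool within_growth_limit(const CallSite &cs, int max_size) {
  return cs.caller_size + cs.callee_size <= max_size;
}

// bypass_limit == false models an AFDO inline being blocked by the limit;
// bypass_limit == true models the possible follow-up of ignoring the limit
// for AFDO-requested inlines.
bool should_inline(const CallSite &cs, int max_size, bool bypass_limit) {
  if (!afdo_wants_inline(cs))
    return false;
  return bypass_limit || within_growth_limit(cs, max_size);
}
```

With a call site `CallSite{true, true, 2500, 400}` and an assumed `max_size` of 2700, `should_inline` returns false when the limit is honored and true when it is bypassed.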
This change should hopefully make merging profiles of non-inlined functions easier. It may still make sense to separate the afdo inliner from the early inliner to solve the non-transitivity issues, which is not that hard to do with the current code organization. However, this should be a separate IPA pass rather than another part of the afdo pass, since it can be conceptually separate.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

gcc/ChangeLog:

	* auto-profile.cc: Update toplevel comment.
	(early_inline): Remove.
	(auto_profile): Don't do early inlining.

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 8a1d9f878c6..3f8310e6324 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -76,21 +76,30 @@ along with GCC; see the file COPYING3. If not see
       standalone symbol, or a clone of a function that is inlined into another
       function.
 
-   Phase 2: Early inline + value profile transformation.
-     Early inline uses autofdo_source_profile to find if a callsite is:
+   Phase 2: AFDO inline + value profile transformation.
+     This happens during early optimization.
+     During early inlning AFDO inliner is executed which
+     uses autofdo_source_profile to find if a callsite is:
         * inlined in the profiled binary.
         * callee body is hot in the profiling run.
      If both condition satisfies, early inline will inline the callsite
      regardless of the code growth.
-     Phase 2 is an iterative process. During each iteration, we also check
-     if an indirect callsite is promoted and inlined in the profiling run.
-     If yes, vpt will happen to force promote it and in the next iteration,
-     einline will inline the promoted callsite in the next iteration.
+
+     Performing this early has benefit of doing early optimizations
+     before read IPA passe and getting more "context sensitivity" of
+     the profile read. Profile of inlined functions may differ
+     significantly form one inline instance to another and from the
+     offline version.
+
+     This is controlled by -fauto-profile-inlinig and is independent
+     of -fearly-inlining.
 
    Phase 3: Annotate control flow graph.
      AutoFDO uses a separate pass to:
        * Annotate basic block count
        * Estimate branch probability
+       * Use earlier static profile to fill in the gaps
+         if AFDO profile is ambigous
 
    After the above 3 phases, all profile is readily annotated on the GCC IR.
    AutoFDO tries to reuse all FDO infrastructure as much as possible to make
@@ -2217,18 +2226,6 @@ afdo_annotate_cfg (void)
   free_dominance_info (CDI_POST_DOMINATORS);
 }
 
-/* Wrapper function to invoke early inliner. */
-
-static unsigned int
-early_inline ()
-{
-  compute_fn_summary (cgraph_node::get (current_function_decl), true);
-  unsigned int todo = early_inliner (cfun);
-  if (todo & TODO_update_ssa_any)
-    update_ssa (TODO_update_ssa);
-  return todo;
-}
-
 /* Use AutoFDO profile to annoate the control flow graph.
    Return the todo flag. */
 
@@ -2254,15 +2251,9 @@ auto_profile (void)
 
       push_cfun (DECL_STRUCT_FUNCTION (node->decl));
 
-      unsigned int todo = early_inline ();
       autofdo::afdo_annotate_cfg ();
       compute_function_frequency ();
 
-      /* Local pure-const may imply need to fixup the cfg. */
-      todo |= execute_fixup_cfg ();
-      if (todo & TODO_cleanup_cfg)
-        cleanup_tree_cfg ();
-
       free_dominance_info (CDI_DOMINATORS);
       free_dominance_info (CDI_POST_DOMINATORS);
       cgraph_edge::rebuild_edges ();