Hi Honza, On Sun, Jun 29, 2025 at 10:45 PM Jan Hubicka <hubi...@ucw.cz> wrote: > > > > > > > > On 24 Jun 2025, at 7:43 pm, Jan Hubicka <hubi...@ucw.cz> wrote: > > > > > > External email: Use caution opening links or attachments > > > > > > > > > Hi, > > > this pass removes early-inlining from afdo pass since all inlining > > > should now happen from early inliner. I tedted this on spec and there > > > are 3 inlines happening here which are blocked at early-inline time by > > > hitting large function growth limit. We probably want to bypass that > > > limit, I will look into that incrementaly. > > > > Thanks for doing this. Is the inlining difference here is due to annotation > > that happens in auto-profile pass in the earlier implementation? > > > > One unrelated question about scaling profiles. We seem to scale-up AFDO > > with and_count_scale and scale down local_profile in some other cases. > > Should we instead scale up AFDO profile to local_profile scale. Lot of the > > inlining and other parameters seem to work well with that. > > Hi, > I implemented simple afdo/fdo profile comparator. I think we will need to > figure out > what to print and what to look for. It currently records the afdo, fdo count > pairs > and then try to scale afdo counts to fdo counts printng differences.
I tried your patch on the current master and when I try running as you mentioned, it looks like PGO counters are not read. I am seeing something like (fdo is always zero): update_stmt.part.0/2212 bb 0 (cold) afdo 3400152 (guessed) scaled 0 (guessed) fdo 0 (precise) diff 0, +0.00% update_stmt.part.0/2212 bb 4 (cold) afdo 3400152 (auto FDO) scaled 0 (guessed) fdo 0 (precise) diff 0, +0.00% Also, the patch didn't apply cleanly to the current master. Is the patch complete? Thanks, Kugan > > It prints info as follows: > > brute/32 bb 60 (cold) afdo 0 (auto FDO) scaled 0 (guessed) fdo 4847 (precise) > diff -4847, -100.00% > preds > succs 59 169 > brute/32 bb 61 (hot) afdo 629997 (auto FDO) scaled 24978219 (guessed) fdo > 9332718 (precise) diff 15645501, +167.64% > preds > succs 59 62 > brute/32 bb 62 (hot) afdo 2753633 (auto FDO) scaled 109176468 (guessed) fdo > 37330872 (precise) diff 71845596, +192.46% > preds > succs 61 68 69 63 > brute/32 bb 63 (hot) afdo 2031304 (auto FDO) scaled 80537456 (guessed) fdo > 27998154 (precise) diff 52539302, +187.65% > preds > succs 62 64 > brute/32 bb 64 (very hot) afdo 2661301 (auto FDO) scaled 105515674 (guessed) > fdo 111992616 (precise) diff -6476942, -5.78% > preds > succs 63 67 68 65 > > I.e. function name, bb index, hotness according to afdo (I added very hot > variant as those we want to look at first), afdo count, > attempt to scale it correctly, fdo count and difference. > > So I think very hot with very large negative diff are first to look at. > > jh@shroud:~/cpu2017/benchspec/CPU/548.exchange2_r/build/build_peak_autofdo-cmp-m64.0000> > grep "(very hot).*, -9[0-9]\\." *.profile | sort -t")" -n -k3 > digits_2/30 bb 248 (very hot) afdo 1106295 (auto FDO) scaled 43862556 > (guessed) fdo 2180591954 (precise) diff -2136729398, -97.99% > digits_2/30 bb 276 (very hot) afdo 7848783 (auto FDO) scaled 311189765 > (guessed) fdo 13764381875 (precise) diff -13453192110, -97.74% > digits_2/30 bb 308 (very hot) afdo 8952263 (auto FDO) scaled 354940711 > (guessed) fdo 10821023213 (precise) diff -10466082502, -96.72% > digits_2/30 bb 340 (very hot) afdo 8148862 (auto FDO) scaled 323087343 > (guessed) fdo 7398657585 (precise) diff -7075570242, -95.63% > digits_2/30 bb 4 (very hot) afdo 9774243 (auto FDO) scaled 387530701 > (guessed) fdo 4039694145 (precise) diff -3652163444, -90.41% > hidden_triplets/8 bb 380 (very hot) afdo 153532 (guessed) scaled 6087261 > (guessed) fdo 162401310 (precise) diff -156314049, -96.25% > hidden_triplets/8 bb 381 (very hot) afdo 136644 (guessed) scaled 5417682 > (guessed) fdo 146161179 (precise) diff -140743497, -96.29% > hidden_triplets/8 bb 383 (very hot) afdo 136644 (guessed) scaled 5417682 > (guessed) fdo 146161179 (precise) diff -140743497, -96.29% > hidden_triplets/8 bb 387 (very hot) afdo 53337 (guessed) scaled 2114714 > (guessed) fdo 162209420 (precise) diff -160094706, -98.70% > hidden_triplets/8 bb 388 (very hot) afdo 47470 (guessed) scaled 1882098 > (guessed) fdo 145988478 (precise) diff -144106380, -98.71% > hidden_triplets/8 bb 390 (very hot) afdo 47470 (guessed) scaled 1882098 > (guessed) fdo 145988478 (precise) diff -144106380, -98.71% > new_solver/9 bb 139 (very hot) afdo 365547 (guessed) scaled 14493264 > (guessed) fdo 399160962 (precise) diff -384667698, -96.37% > new_solver/9 bb 140 (very hot) afdo 345441 (guessed) scaled 13696098 > (guessed) fdo 361455947 (precise) diff -347759849, -96.21% > new_solver/9 bb 142 (very hot) afdo 326442 (guessed) scaled 12942823 > (guessed) fdo 352025127 (precise) diff -339082304, -96.32% > new_solver/9 bb 160 (very hot) afdo 365547 (guessed) scaled 14493264 > (guessed) fdo 397240050 (precise) diff -382746786, -96.35% > new_solver/9 bb 161 (very hot) afdo 328992 (guessed) scaled 13043926 > (guessed) fdo 359535035 (precise) diff -346491109, -96.37% > new_solver/9 bb 163 (very hot) afdo 310897 (guessed) scaled 12326492 > (guessed) fdo 350104215 (precise) diff -337777723, -96.48% > new_solver/9 bb 218 (very hot) afdo 365547 (guessed) scaled 14493264 > (guessed) fdo 399683609 (precise) diff -385190345, -96.37% > new_solver/9 bb 219 (very hot) afdo 345442 (guessed) scaled 13696138 > (guessed) fdo 361977164 (precise) diff -348281026, -96.22% > new_solver/9 bb 221 (very hot) afdo 326442 (guessed) scaled 12942823 > (guessed) fdo 352546094 (precise) diff -339603271, -96.33% > new_solver/9 bb 239 (very hot) afdo 365547 (guessed) scaled 14493264 > (guessed) fdo 397175144 (precise) diff -382681880, -96.35% > new_solver/9 bb 240 (very hot) afdo 328992 (guessed) scaled 13043926 > (guessed) fdo 359468699 (precise) diff -346424773, -96.37% > new_solver/9 bb 242 (very hot) afdo 310898 (guessed) scaled 12326532 > (guessed) fdo 350037629 (precise) diff -337711097, -96.48% > new_solver/9 bb 303 (very hot) afdo 124141 (guessed) scaled 4921962 (guessed) > fdo 112036204 (precise) diff -107114242, -95.61% > new_solver/9 bb 304 (very hot) afdo 117313 (guessed) scaled 4651244 (guessed) > fdo 112036204 (precise) diff -107384960, -95.85% > new_solver/9 bb 334 (very hot) afdo 124141 (guessed) scaled 4921962 (guessed) > fdo 112036204 (precise) diff -107114242, -95.61% > new_solver/9 bb 335 (very hot) afdo 117313 (guessed) scaled 4651244 (guessed) > fdo 112036204 (precise) diff -107384960, -95.85% > new_solver/9 bb 369 (very hot) afdo 124141 (guessed) scaled 4921962 (guessed) > fdo 113795352 (precise) diff -108873390, -95.67% > new_solver/9 bb 370 (very hot) afdo 117313 (guessed) scaled 4651244 (guessed) > fdo 113795352 (precise) diff -109144108, -95.91% > new_solver/9 bb 400 (very hot) afdo 124141 (guessed) scaled 4921962 (guessed) > fdo 113795352 (precise) diff -108873390, -95.67% > new_solver/9 bb 401 (very hot) afdo 117313 (guessed) scaled 4651244 (guessed) > fdo 113795352 (precise) diff -109144108, -95.91% > > One needs to do 3 builds > 1) build with -g to train auto-fdo + train run > 2) build with -fauto-profile -fprofile-generate to produce instrumentiton + > train run > 3) build with -fauto-profile -fprofile-use -fdump-tree-profile to produce > stats. > > I use: > > OPTIMIZE = -Ofast -fopt-info-optimized-optall -march=native -mtune=native > -Wno-error=implicit-int -Wno-error=implicit-function-declaration > -Wno-error=declaration-missing-parameter-type -Wno-error=return-mismatch > -Wno-error=int-conversion #-flto=auto > #OPTIMIZE = -Ofast -g -fopt-info-optimized-optall -march=native > -mtune=native --param max-peel-times=128 --param max-peel-branches=256 > --param max-peeled-insns=1024 --param profile-histogram-size-lin=129 > > fdo_pre0 = rm -rf ${benchmark}.data ${benchmark}.gcov; \\ > > fdo_run1 = perf record -e ex_ret_brn_tkn:Pu -c 100000 -b -o ${benchmark}.data > -- ${command}; \\ > create_gcov --binary=${baseexe} --profile=${benchmark}.data > --gcov=current.gcov -gcov_version=2; \\ > if test -e ${benchmark}.gcov ; then profile_merger current.gcov > ${benchmark}.gcov --output_file ${benchmark}.gcov ; else mv current.gcov > ${benchmark}.gcov ; fi \\ > > PASS1_OPTIMIZE = -g -fno-reorder-blocks-and-partition -fno-ipa-icf -fno-lto > PASS2_OPTIMIZE = -fauto-profile=${benchmark}.gcov -fprofile-generate -g > PASS3_OPTIMIZE = -fauto-profile=${benchmark}.gcov > -fdump-ipa-all-details-blocks -fdump-tree-optimized-details-blocks > -fdump-tree-einline-details -g -fprofile-use > > I am flying from China today so will see how much I can play around with the > patch. > Any improvements would be welcome. > > gcc/ChangeLog: > > * auto-profile.cc > (autofdo_source_profile::offline_external_functions): Add > missing newline. > (afdo_annotate_cfg): Update max bb count. > * coverage.cc (coverage_init): Add AUTO_PROFILE parameter. > * coverage.h (coverage_init): Update prototype. > * passes.cc (finish_optimization_passes): Do not call end_branch_prob. > (pass_manager::dump_profile_report): Also watch for afdo. > * profile.cc (struct afdo_fdo_record): New structure. > (compute_branch_probabilities): Record info about afdo/fdo compare > (end_branch_prob): Print the records. > * toplev.cc (do_compile): Initialize both FDO and AFDO if requested. > * tree-profile.cc (tree_profiling): End profiling > (pass_ipa_tree_profile::gate): Also run with auto-profile. > > gcc/lto/ChangeLog: > > * lto-symtab.cc (lto_symtab_merge_symbols_1): > > diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc > index 44e7faa8fee..b6dd552acab 100644 > --- a/gcc/auto-profile.cc > +++ b/gcc/auto-profile.cc > @@ -1308,7 +1308,7 @@ autofdo_source_profile::offline_external_functions () > if (dump_file) > fprintf (dump_file, > "string table in auto-profile contains" > - " duplicated name %s", n1); > + " duplicated name %s\n", n1); > to_symbol_name.put (i, index); > } > continue; > @@ -3119,6 +3119,7 @@ afdo_annotate_cfg (void) > FOR_ALL_BB_FN (bb, cfun) > if (bb->count.quality () == GUESSED_LOCAL) > bb->count = bb->count.global0afdo (); > + update_max_bb_count (); > > loop_optimizer_finalize (); > free_dominance_info (CDI_DOMINATORS); > diff --git a/gcc/coverage.cc b/gcc/coverage.cc > index c0ae76a40ef..af9091c4561 100644 > --- a/gcc/coverage.cc > +++ b/gcc/coverage.cc > @@ -1251,7 +1251,7 @@ coverage_obj_finish (vec<constructor_elt, va_gc> *ctor, > of notes file. */ > > void > -coverage_init (const char *filename) > +coverage_init (const char *filename, bool auto_profile) > { > /* If we are in LTO, the profile will be read from object files. */ > if (in_lto_p) > @@ -1315,7 +1315,7 @@ coverage_init (const char *filename) > strcpy (da_file_name + prefix_len + len, GCOV_DATA_SUFFIX); > > bbg_file_stamp = local_tick; > - if (flag_auto_profile) > + if (auto_profile) > read_autofdo_file (); > else if (flag_branch_probabilities) > read_counts_file (); > diff --git a/gcc/coverage.h b/gcc/coverage.h > index 9afbb605482..1f4b521bdea 100644 > --- a/gcc/coverage.h > +++ b/gcc/coverage.h > @@ -22,7 +22,7 @@ along with GCC; see the file COPYING3. If not see > > #include "gcov-io.h" > > -extern void coverage_init (const char *); > +extern void coverage_init (const char *, bool); > extern void coverage_finish (void); > extern void coverage_remove_note_file (void); > > diff --git a/gcc/passes.cc b/gcc/passes.cc > index 6c67ffe56ba..a33c8d924a5 100644 > --- a/gcc/passes.cc > +++ b/gcc/passes.cc > @@ -355,13 +355,6 @@ finish_optimization_passes (void) > gcc::dump_manager *dumps = m_ctxt->get_dumps (); > > timevar_push (TV_DUMP); > - if (coverage_instrumentation_p () || flag_test_coverage > - || flag_branch_probabilities) > - { > - dumps->dump_start (m_pass_profile_1->static_pass_number, NULL); > - end_branch_prob (); > - dumps->dump_finish (m_pass_profile_1->static_pass_number); > - } > > /* Do whatever is necessary to finish printing the graphs. */ > for (i = TDI_end; (dfi = dumps->get_dump_file_info (i)) != NULL; ++i) > @@ -2036,6 +2029,7 @@ pass_manager::dump_profile_report () const > fprintf (dump_file, "| %12.0f", profile_record[i].time); > /* Time units changes with profile estimate and feedback. */ > if (i == m_pass_profile_1->static_pass_number > + || i == m_pass_ipa_auto_profile_1->static_pass_number > || i == m_pass_ipa_tree_profile_1->static_pass_number) > fprintf (dump_file, "-------------"); > else if (rel_time_change) > diff --git a/gcc/profile.cc b/gcc/profile.cc > index 6234dd2d4e2..ecfc9bdc254 100644 > --- a/gcc/profile.cc > +++ b/gcc/profile.cc > @@ -423,6 +423,20 @@ cmp_stats (const void *ptr1, const void *ptr2) > return 0; > } > > +struct afdo_fdo_record > +{ > + cgraph_node *node; > + struct bb_record > + { > + int index; > + profile_count afdo; > + profile_count fdo; > + vec <int> preds; > + vec <int> succs; > + }; > + vec <bb_record> bbs; > +}; > +static vec <afdo_fdo_record> afdo_fdo_records; > > /* Compute the branch probabilities for the various branches. > Annotate them accordingly. > @@ -472,6 +486,22 @@ compute_branch_probabilities (unsigned cfg_checksum, > unsigned lineno_checksum) > BB_INFO (EXIT_BLOCK_PTR_FOR_FN (cfun))->succ_count = 2; > BB_INFO (ENTRY_BLOCK_PTR_FOR_FN (cfun))->pred_count = 2; > > + afdo_fdo_record record = {cgraph_node::get (current_function_decl), > vNULL};; > + if (dump_file && flag_auto_profile) > + { > + FOR_ALL_BB_FN (bb, cfun) > + { > + record.bbs.safe_push ({bb->index, bb->count.ipa (), > + profile_count::uninitialized (), vNULL, > vNULL}); > + record.bbs.last ().preds.reserve (EDGE_COUNT (bb->preds)); > + for (auto &e : bb->preds) > + record.bbs.last ().succs.safe_push (e->src->index); > + record.bbs.last ().succs.reserve (EDGE_COUNT (bb->succs)); > + for (auto &e : bb->succs) > + record.bbs.last ().succs.safe_push (e->dest->index); > + } > + } > + > num_edges = read_profile_edge_counts (exec_counts); > > if (dump_file) > @@ -744,7 +774,6 @@ compute_branch_probabilities (unsigned cfg_checksum, > unsigned lineno_checksum) > num_branches++; > } > } > - > if (exec_counts > && (bb_gcov_count (ENTRY_BLOCK_PTR_FOR_FN (cfun)) > || !flag_profile_partial_training)) > @@ -812,6 +841,18 @@ compute_branch_probabilities (unsigned cfg_checksum, > unsigned lineno_checksum) > delete edge_gcov_counts; > edge_gcov_counts = NULL; > > + if (dump_file && flag_auto_profile) > + { > + int i = 0; > + FOR_ALL_BB_FN (bb, cfun) > + { > + gcc_checking_assert (record.bbs[i].index == bb->index); > + record.bbs[i].fdo = bb->count.ipa(); > + i++; > + } > + afdo_fdo_records.safe_push (record); > + } > + > update_max_bb_count (); > > if (dump_file) > @@ -1804,6 +1845,49 @@ end_branch_prob (void) > } > fprintf (dump_file, "Total number of conditions: %d\n", > total_num_conds); > + if (afdo_fdo_records.length ()) > + { > + profile_count fdo_sum = profile_count::zero (); > + profile_count afdo_sum = profile_count::zero (); > + for (const auto &r : afdo_fdo_records) > + for (const auto &b : r.bbs) > + if (b.fdo.initialized_p () && b.afdo.initialized_p ()) > + { > + fdo_sum += b.fdo; > + afdo_sum += b.afdo; > + } > + for (auto &r : afdo_fdo_records) > + { > + for (auto &b : r.bbs) > + if (b.fdo.initialized_p () && b.afdo.initialized_p ()) > + { > + profile_count scaled = b.afdo.apply_scale (fdo_sum, > afdo_sum); > + fprintf (dump_file, "%s bb %i (%s) afdo ", > r.node->dump_name (), b.index, > + maybe_hot_count_p (NULL, b.fdo.apply_scale (1, > 1000)) ? "very hot" > + : maybe_hot_count_p (NULL, b.fdo) ? "hot" : > "cold"); > + b.afdo.dump (dump_file); > + fprintf (dump_file, " scaled "); > + scaled.dump (dump_file); > + fprintf (dump_file, " fdo "); > + b.fdo.dump (dump_file); > + fprintf (dump_file, " diff %" PRId64 ", %+2.2f%%\n", > + scaled.to_gcov_type () - b.fdo.to_gcov_type (), > + (scaled.to_gcov_type () - b.fdo.to_gcov_type ()) > * 100.0 > + / MAX (b.fdo.to_gcov_type (), 1)); > + fprintf (dump_file, " preds"); > + for (int val : b.preds) > + fprintf (dump_file, " %i", val); > + b.preds.release (); > + fprintf (dump_file, "\n succs"); > + for (int val : b.succs) > + fprintf (dump_file, " %i", val); > + b.succs.release (); > + fprintf (dump_file, "\n"); > + } > + r.bbs.release (); > + } > + } > + afdo_fdo_records.release (); > } > } > > diff --git a/gcc/toplev.cc b/gcc/toplev.cc > index 00a8ccb7a69..e6eba05ab0b 100644 > --- a/gcc/toplev.cc > +++ b/gcc/toplev.cc > @@ -2197,7 +2197,9 @@ do_compile () > > symtab->initialize (); > init_final (main_input_filename); > - coverage_init (aux_base_name); > + coverage_init (aux_base_name, false); > + if (flag_auto_profile) > + coverage_init (aux_base_name, true); > statistics_init (); > debuginfo_init (); > invoke_plugin_callbacks (PLUGIN_START_UNIT, NULL); > diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc > index fed218eb60b..fe20e84838d 100644 > --- a/gcc/tree-profile.cc > +++ b/gcc/tree-profile.cc > @@ -2031,6 +2031,7 @@ tree_profiling (void) > handle_missing_profiles (); > > del_node_map (); > + end_branch_prob (); > return 0; > } > > @@ -2065,10 +2066,8 @@ public: > bool > pass_ipa_tree_profile::gate (function *) > { > - /* When profile instrumentation, use or test coverage shall be performed. > - But for AutoFDO, this there is no instrumentation, thus this pass is > - disabled. */ > - return (!in_lto_p && !flag_auto_profile > + /* When profile instrumentation, use or test coverage shall be performed. > */ > + return (!in_lto_p > && (flag_branch_probabilities || flag_test_coverage > || coverage_instrumentation_p ()) > && !seen_error ());