> On 24 Jun 2025, at 7:43 pm, Jan Hubicka <hubi...@ucw.cz> wrote:
> >
> > Hi,
> > this pass removes early-inlining from afdo pass since all inlining
> > should now happen from early inliner. I tested this on spec and there
> > are 3 inlines happening here which are blocked at early-inline time by
> > hitting large function growth limit. We probably want to bypass that
> > limit, I will look into that incrementally.
>
> Thanks for doing this. Is the inlining difference here due to annotation
> that happens in auto-profile pass in the earlier implementation?
>
> One unrelated question about scaling profiles. We seem to scale-up AFDO with
> and_count_scale and scale down local_profile in some other cases. Should we
> instead scale up AFDO profile to local_profile scale? A lot of the inlining and
> other parameters seem to work well with that.
Hi,
I implemented a simple afdo/fdo profile comparator. I think we will need to
figure out what to print and what to look for. It currently records the
afdo and fdo count pairs, then tries to scale the afdo counts to the fdo
counts, printing the differences. It prints info as follows:

brute/32 bb 60 (cold) afdo 0 (auto FDO) scaled 0 (guessed) fdo 4847 (precise) diff -4847, -100.00%
 preds
 succs 59 169
brute/32 bb 61 (hot) afdo 629997 (auto FDO) scaled 24978219 (guessed) fdo 9332718 (precise) diff 15645501, +167.64%
 preds
 succs 59 62
brute/32 bb 62 (hot) afdo 2753633 (auto FDO) scaled 109176468 (guessed) fdo 37330872 (precise) diff 71845596, +192.46%
 preds
 succs 61 68 69 63
brute/32 bb 63 (hot) afdo 2031304 (auto FDO) scaled 80537456 (guessed) fdo 27998154 (precise) diff 52539302, +187.65%
 preds
 succs 62 64
brute/32 bb 64 (very hot) afdo 2661301 (auto FDO) scaled 105515674 (guessed) fdo 111992616 (precise) diff -6476942, -5.78%
 preds
 succs 63 67 68 65

I.e. function name, bb index, hotness according to the fdo count (I added
a "very hot" variant, as those are the ones we want to look at first), afdo
count, attempt to scale it correctly, fdo count and difference. So I think
very hot blocks with a large negative diff are the first to look at.

jh@shroud:~/cpu2017/benchspec/CPU/548.exchange2_r/build/build_peak_autofdo-cmp-m64.0000> grep "(very hot).*, -9[0-9]\\." *.profile | sort -t")" -n -k3
digits_2/30 bb 248 (very hot) afdo 1106295 (auto FDO) scaled 43862556 (guessed) fdo 2180591954 (precise) diff -2136729398, -97.99%
digits_2/30 bb 276 (very hot) afdo 7848783 (auto FDO) scaled 311189765 (guessed) fdo 13764381875 (precise) diff -13453192110, -97.74%
digits_2/30 bb 308 (very hot) afdo 8952263 (auto FDO) scaled 354940711 (guessed) fdo 10821023213 (precise) diff -10466082502, -96.72%
digits_2/30 bb 340 (very hot) afdo 8148862 (auto FDO) scaled 323087343 (guessed) fdo 7398657585 (precise) diff -7075570242, -95.63%
digits_2/30 bb 4 (very hot) afdo 9774243 (auto FDO) scaled 387530701 (guessed) fdo 4039694145 (precise) diff -3652163444, -90.41%
hidden_triplets/8 bb 380 (very hot) afdo 153532 (guessed) scaled 6087261 (guessed) fdo 162401310 (precise) diff -156314049, -96.25%
hidden_triplets/8 bb 381 (very hot) afdo 136644 (guessed) scaled 5417682 (guessed) fdo 146161179 (precise) diff -140743497, -96.29%
hidden_triplets/8 bb 383 (very hot) afdo 136644 (guessed) scaled 5417682 (guessed) fdo 146161179 (precise) diff -140743497, -96.29%
hidden_triplets/8 bb 387 (very hot) afdo 53337 (guessed) scaled 2114714 (guessed) fdo 162209420 (precise) diff -160094706, -98.70%
hidden_triplets/8 bb 388 (very hot) afdo 47470 (guessed) scaled 1882098 (guessed) fdo 145988478 (precise) diff -144106380, -98.71%
hidden_triplets/8 bb 390 (very hot) afdo 47470 (guessed) scaled 1882098 (guessed) fdo 145988478 (precise) diff -144106380, -98.71%
new_solver/9 bb 139 (very hot) afdo 365547 (guessed) scaled 14493264 (guessed) fdo 399160962 (precise) diff -384667698, -96.37%
new_solver/9 bb 140 (very hot) afdo 345441 (guessed) scaled 13696098 (guessed) fdo 361455947 (precise) diff -347759849, -96.21%
new_solver/9 bb 142 (very hot) afdo 326442 (guessed) scaled 12942823 (guessed) fdo 352025127 (precise) diff -339082304, -96.32%
new_solver/9 bb 160 (very hot) afdo 365547 (guessed) scaled 14493264 (guessed) fdo 397240050 (precise) diff -382746786, -96.35%
new_solver/9 bb 161 (very hot) afdo 328992 (guessed) scaled 13043926 (guessed) fdo 359535035 (precise) diff -346491109, -96.37%
new_solver/9 bb 163 (very hot) afdo 310897 (guessed) scaled 12326492 (guessed) fdo 350104215 (precise) diff -337777723, -96.48%
new_solver/9 bb 218 (very hot) afdo 365547 (guessed) scaled 14493264 (guessed) fdo 399683609 (precise) diff -385190345, -96.37%
new_solver/9 bb 219 (very hot) afdo 345442 (guessed) scaled 13696138 (guessed) fdo 361977164 (precise) diff -348281026, -96.22%
new_solver/9 bb 221 (very hot) afdo 326442 (guessed) scaled 12942823 (guessed) fdo 352546094 (precise) diff -339603271, -96.33%
new_solver/9 bb 239 (very hot) afdo 365547 (guessed) scaled 14493264 (guessed) fdo 397175144 (precise) diff -382681880, -96.35%
new_solver/9 bb 240 (very hot) afdo 328992 (guessed) scaled 13043926 (guessed) fdo 359468699 (precise) diff -346424773, -96.37%
new_solver/9 bb 242 (very hot) afdo 310898 (guessed) scaled 12326532 (guessed) fdo 350037629 (precise) diff -337711097, -96.48%
new_solver/9 bb 303 (very hot) afdo 124141 (guessed) scaled 4921962 (guessed) fdo 112036204 (precise) diff -107114242, -95.61%
new_solver/9 bb 304 (very hot) afdo 117313 (guessed) scaled 4651244 (guessed) fdo 112036204 (precise) diff -107384960, -95.85%
new_solver/9 bb 334 (very hot) afdo 124141 (guessed) scaled 4921962 (guessed) fdo 112036204 (precise) diff -107114242, -95.61%
new_solver/9 bb 335 (very hot) afdo 117313 (guessed) scaled 4651244 (guessed) fdo 112036204 (precise) diff -107384960, -95.85%
new_solver/9 bb 369 (very hot) afdo 124141 (guessed) scaled 4921962 (guessed) fdo 113795352 (precise) diff -108873390, -95.67%
new_solver/9 bb 370 (very hot) afdo 117313 (guessed) scaled 4651244 (guessed) fdo 113795352 (precise) diff -109144108, -95.91%
new_solver/9 bb 400 (very hot) afdo 124141 (guessed) scaled 4921962 (guessed) fdo 113795352 (precise) diff -108873390, -95.67%
new_solver/9 bb 401 (very hot) afdo 117313 (guessed) scaled 4651244 (guessed) fdo 113795352 (precise) diff -109144108, -95.91%

One needs to do 3 builds:
 1) build with -g to train auto-fdo, plus a train run;
 2) build with -fauto-profile -fprofile-generate to produce instrumentation, plus a train run;
 3) build with -fauto-profile -fprofile-use -fdump-tree-profile to produce stats.

I use:

OPTIMIZE = -Ofast -fopt-info-optimized-optall -march=native -mtune=native -Wno-error=implicit-int -Wno-error=implicit-function-declaration -Wno-error=declaration-missing-parameter-type -Wno-error=return-mismatch -Wno-error=int-conversion #-flto=auto
#OPTIMIZE = -Ofast -g -fopt-info-optimized-optall -march=native -mtune=native --param max-peel-times=128 --param max-peel-branches=256 --param max-peeled-insns=1024 --param profile-histogram-size-lin=129
fdo_pre0 = rm -rf ${benchmark}.data ${benchmark}.gcov; \\
fdo_run1 = perf record -e ex_ret_brn_tkn:Pu -c 100000 -b -o ${benchmark}.data -- ${command}; \\
create_gcov --binary=${baseexe} --profile=${benchmark}.data --gcov=current.gcov -gcov_version=2; \\
if test -e ${benchmark}.gcov ; then profile_merger current.gcov ${benchmark}.gcov --output_file ${benchmark}.gcov ; else mv current.gcov ${benchmark}.gcov ; fi \\
PASS1_OPTIMIZE = -g -fno-reorder-blocks-and-partition -fno-ipa-icf -fno-lto
PASS2_OPTIMIZE = -fauto-profile=${benchmark}.gcov -fprofile-generate -g
PASS3_OPTIMIZE = -fauto-profile=${benchmark}.gcov -fdump-ipa-all-details-blocks -fdump-tree-optimized-details-blocks -fdump-tree-einline-details -g -fprofile-use

I am flying from China today, so we will see how much I can play around
with the patch. Any improvements would be welcome.

gcc/ChangeLog:

	* auto-profile.cc
	(autofdo_source_profile::offline_external_functions): Add missing
	newline.
	(afdo_annotate_cfg): Update max bb count.
	* coverage.cc (coverage_init): Add AUTO_PROFILE parameter.
	* coverage.h (coverage_init): Update prototype.
	* passes.cc (finish_optimization_passes): Do not call end_branch_prob.
	(pass_manager::dump_profile_report): Also watch for afdo.
	* profile.cc (struct afdo_fdo_record): New structure.
	(compute_branch_probabilities): Record info about afdo/fdo compare.
	(end_branch_prob): Print the records.
	* toplev.cc (do_compile): Initialize both FDO and AFDO if requested.
	* tree-profile.cc (tree_profiling): End profiling.
	(pass_ipa_tree_profile::gate): Also run with auto-profile.

gcc/lto/ChangeLog:

	* lto-symtab.cc (lto_symtab_merge_symbols_1):

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 44e7faa8fee..b6dd552acab 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1308,7 +1308,7 @@ autofdo_source_profile::offline_external_functions ()
               if (dump_file)
                 fprintf (dump_file,
                          "string table in auto-profile contains"
-                         " duplicated name %s", n1);
+                         " duplicated name %s\n", n1);
               to_symbol_name.put (i, index);
             }
           continue;
@@ -3119,6 +3119,7 @@ afdo_annotate_cfg (void)
   FOR_ALL_BB_FN (bb, cfun)
     if (bb->count.quality () == GUESSED_LOCAL)
       bb->count = bb->count.global0afdo ();
+  update_max_bb_count ();
 
   loop_optimizer_finalize ();
   free_dominance_info (CDI_DOMINATORS);
diff --git a/gcc/coverage.cc b/gcc/coverage.cc
index c0ae76a40ef..af9091c4561 100644
--- a/gcc/coverage.cc
+++ b/gcc/coverage.cc
@@ -1251,7 +1251,7 @@ coverage_obj_finish (vec<constructor_elt, va_gc> *ctor,
    of notes file.  */
 
 void
-coverage_init (const char *filename)
+coverage_init (const char *filename, bool auto_profile)
 {
   /* If we are in LTO, the profile will be read from object files.  */
   if (in_lto_p)
@@ -1315,7 +1315,7 @@ coverage_init (const char *filename)
   strcpy (da_file_name + prefix_len + len, GCOV_DATA_SUFFIX);
 
   bbg_file_stamp = local_tick;
-  if (flag_auto_profile)
+  if (auto_profile)
     read_autofdo_file ();
   else if (flag_branch_probabilities)
     read_counts_file ();
diff --git a/gcc/coverage.h b/gcc/coverage.h
index 9afbb605482..1f4b521bdea 100644
--- a/gcc/coverage.h
+++ b/gcc/coverage.h
@@ -22,7 +22,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "gcov-io.h"
 
-extern void coverage_init (const char *);
+extern void coverage_init (const char *, bool);
 extern void coverage_finish (void);
 extern void coverage_remove_note_file (void);
 
diff --git a/gcc/passes.cc b/gcc/passes.cc
index 6c67ffe56ba..a33c8d924a5 100644
--- a/gcc/passes.cc
+++ b/gcc/passes.cc
@@ -355,13 +355,6 @@ finish_optimization_passes (void)
   gcc::dump_manager *dumps = m_ctxt->get_dumps ();
 
   timevar_push (TV_DUMP);
-  if (coverage_instrumentation_p () || flag_test_coverage
-      || flag_branch_probabilities)
-    {
-      dumps->dump_start (m_pass_profile_1->static_pass_number, NULL);
-      end_branch_prob ();
-      dumps->dump_finish (m_pass_profile_1->static_pass_number);
-    }
 
   /* Do whatever is necessary to finish printing the graphs.  */
   for (i = TDI_end; (dfi = dumps->get_dump_file_info (i)) != NULL; ++i)
@@ -2036,6 +2029,7 @@ pass_manager::dump_profile_report () const
         fprintf (dump_file, "| %12.0f", profile_record[i].time);
         /* Time units changes with profile estimate and feedback.  */
         if (i == m_pass_profile_1->static_pass_number
+            || i == m_pass_ipa_auto_profile_1->static_pass_number
             || i == m_pass_ipa_tree_profile_1->static_pass_number)
           fprintf (dump_file, "-------------");
         else if (rel_time_change)
diff --git a/gcc/profile.cc b/gcc/profile.cc
index 6234dd2d4e2..ecfc9bdc254 100644
--- a/gcc/profile.cc
+++ b/gcc/profile.cc
@@ -423,6 +423,20 @@ cmp_stats (const void *ptr1, const void *ptr2)
   return 0;
 }
 
+struct afdo_fdo_record
+{
+  cgraph_node *node;
+  struct bb_record
+  {
+    int index;
+    profile_count afdo;
+    profile_count fdo;
+    vec <int> preds;
+    vec <int> succs;
+  };
+  vec <bb_record> bbs;
+};
+static vec <afdo_fdo_record> afdo_fdo_records;
 /* Compute the branch probabilities for the various branches.
    Annotate them accordingly.
 
@@ -472,6 +486,22 @@ compute_branch_probabilities (unsigned cfg_checksum, unsigned lineno_checksum)
   BB_INFO (EXIT_BLOCK_PTR_FOR_FN (cfun))->succ_count = 2;
   BB_INFO (ENTRY_BLOCK_PTR_FOR_FN (cfun))->pred_count = 2;
 
+  afdo_fdo_record record = {cgraph_node::get (current_function_decl), vNULL};
+  if (dump_file && flag_auto_profile)
+    {
+      FOR_ALL_BB_FN (bb, cfun)
+        {
+          record.bbs.safe_push ({bb->index, bb->count.ipa (),
+                                 profile_count::uninitialized (), vNULL, vNULL});
+          record.bbs.last ().preds.reserve (EDGE_COUNT (bb->preds));
+          for (auto &e : bb->preds)
+            record.bbs.last ().preds.safe_push (e->src->index);
+          record.bbs.last ().succs.reserve (EDGE_COUNT (bb->succs));
+          for (auto &e : bb->succs)
+            record.bbs.last ().succs.safe_push (e->dest->index);
+        }
+    }
+
   num_edges = read_profile_edge_counts (exec_counts);
 
   if (dump_file)
@@ -744,7 +774,6 @@ compute_branch_probabilities (unsigned cfg_checksum, unsigned lineno_checksum)
               num_branches++;
             }
         }
-
   if (exec_counts
       && (bb_gcov_count (ENTRY_BLOCK_PTR_FOR_FN (cfun))
           || !flag_profile_partial_training))
@@ -812,6 +841,18 @@ compute_branch_probabilities (unsigned cfg_checksum, unsigned lineno_checksum)
   delete edge_gcov_counts;
   edge_gcov_counts = NULL;
 
+  if (dump_file && flag_auto_profile)
+    {
+      int i = 0;
+      FOR_ALL_BB_FN (bb, cfun)
+        {
+          gcc_checking_assert (record.bbs[i].index == bb->index);
+          record.bbs[i].fdo = bb->count.ipa ();
+          i++;
+        }
+      afdo_fdo_records.safe_push (record);
+    }
+
   update_max_bb_count ();
 
   if (dump_file)
@@ -1804,6 +1845,49 @@ end_branch_prob (void)
         }
       fprintf (dump_file, "Total number of conditions: %d\n",
                total_num_conds);
+      if (afdo_fdo_records.length ())
+        {
+          profile_count fdo_sum = profile_count::zero ();
+          profile_count afdo_sum = profile_count::zero ();
+          for (const auto &r : afdo_fdo_records)
+            for (const auto &b : r.bbs)
+              if (b.fdo.initialized_p () && b.afdo.initialized_p ())
+                {
+                  fdo_sum += b.fdo;
+                  afdo_sum += b.afdo;
+                }
+          for (auto &r : afdo_fdo_records)
+            {
+              for (auto &b : r.bbs)
+                if (b.fdo.initialized_p () && b.afdo.initialized_p ())
+                  {
+                    profile_count scaled = b.afdo.apply_scale (fdo_sum, afdo_sum);
+                    fprintf (dump_file, "%s bb %i (%s) afdo ", r.node->dump_name (), b.index,
+                             maybe_hot_count_p (NULL, b.fdo.apply_scale (1, 1000)) ? "very hot"
+                             : maybe_hot_count_p (NULL, b.fdo) ? "hot" : "cold");
+                    b.afdo.dump (dump_file);
+                    fprintf (dump_file, " scaled ");
+                    scaled.dump (dump_file);
+                    fprintf (dump_file, " fdo ");
+                    b.fdo.dump (dump_file);
+                    fprintf (dump_file, " diff %" PRId64 ", %+2.2f%%\n",
+                             scaled.to_gcov_type () - b.fdo.to_gcov_type (),
+                             (scaled.to_gcov_type () - b.fdo.to_gcov_type ()) * 100.0
+                             / MAX (b.fdo.to_gcov_type (), 1));
+                    fprintf (dump_file, " preds");
+                    for (int val : b.preds)
+                      fprintf (dump_file, " %i", val);
+                    b.preds.release ();
+                    fprintf (dump_file, "\n succs");
+                    for (int val : b.succs)
+                      fprintf (dump_file, " %i", val);
+                    b.succs.release ();
+                    fprintf (dump_file, "\n");
+                  }
+              r.bbs.release ();
+            }
+        }
+      afdo_fdo_records.release ();
     }
 }
 
diff --git a/gcc/toplev.cc b/gcc/toplev.cc
index 00a8ccb7a69..e6eba05ab0b 100644
--- a/gcc/toplev.cc
+++ b/gcc/toplev.cc
@@ -2197,7 +2197,9 @@ do_compile ()
 
           symtab->initialize ();
           init_final (main_input_filename);
-          coverage_init (aux_base_name);
+          coverage_init (aux_base_name, false);
+          if (flag_auto_profile)
+            coverage_init (aux_base_name, true);
           statistics_init ();
           debuginfo_init ();
           invoke_plugin_callbacks (PLUGIN_START_UNIT, NULL);
diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index fed218eb60b..fe20e84838d 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -2031,6 +2031,7 @@ tree_profiling (void)
   handle_missing_profiles ();
 
   del_node_map ();
+  end_branch_prob ();
   return 0;
 }
 
@@ -2065,10 +2066,8 @@ public:
 bool
 pass_ipa_tree_profile::gate (function *)
 {
-  /* When profile instrumentation, use or test coverage shall be performed.
-     But for AutoFDO, this there is no instrumentation, thus this pass is
-     disabled.  */
-  return (!in_lto_p && !flag_auto_profile
+  /* When profile instrumentation, use or test coverage shall be performed.  */
+  return (!in_lto_p
           && (flag_branch_probabilities || flag_test_coverage
               || coverage_instrumentation_p ())
           && !seen_error ());
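For post-processing the dumps outside GCC, the per-block comparison that end_branch_prob prints can be reproduced with a small stand-alone program. This is a hypothetical sketch, not code from the patch. It assumes plain int64_t counts in place of profile_count, scales in double precision instead of using apply_scale, and uses a fixed hot_threshold argument as a rough stand-in for maybe_hot_count_p. The names block_counts, block_diff, compare_profiles, hotness and format_line are made up for illustration. Unlike the patch, which sums only blocks where both counts are initialized, the sketch assumes every count is known.

```cpp
#include <algorithm>
#include <cassert>
#include <cinttypes>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// One basic block seen by both profiles.  Hypothetical stand-in for
// afdo_fdo_record::bb_record, using plain integers instead of profile_count.
struct block_counts
{
  std::string function;  // dump name, e.g. "brute/32"
  int index;             // basic block index
  int64_t afdo;          // count annotated from the AutoFDO profile
  int64_t fdo;           // count read from the instrumented profile
};

struct block_diff
{
  int64_t scaled;   // AFDO count rescaled to the FDO total
  int64_t diff;     // scaled - fdo
  double percent;   // diff * 100 / max (fdo, 1)
};

// Rescale every AFDO count by fdo_sum / afdo_sum so that both profiles have
// the same total, then compare the two counts block by block.
std::vector<block_diff>
compare_profiles (const std::vector<block_counts> &blocks)
{
  int64_t fdo_sum = 0, afdo_sum = 0;
  for (const block_counts &b : blocks)
    {
      fdo_sum += b.fdo;
      afdo_sum += b.afdo;
    }

  std::vector<block_diff> result;
  for (const block_counts &b : blocks)
    {
      // Scale in double precision so afdo * fdo_sum cannot overflow.
      int64_t scaled
        = afdo_sum ? std::llround ((double) b.afdo * fdo_sum / afdo_sum) : 0;
      int64_t diff = scaled - b.fdo;
      double percent = diff * 100.0 / std::max<int64_t> (b.fdo, 1);
      result.push_back ({scaled, diff, percent});
    }
  return result;
}

// Hotness from the FDO count.  HOT_THRESHOLD is an assumed fixed cutoff that
// stands in for maybe_hot_count_p; "very hot" means the count is still hot
// after dividing it by 1000, mirroring apply_scale (1, 1000) in the patch.
const char *
hotness (int64_t fdo, int64_t hot_threshold)
{
  assert (hot_threshold > 0);
  if (fdo / 1000 >= hot_threshold)
    return "very hot";
  if (fdo >= hot_threshold)
    return "hot";
  return "cold";
}

// Format one block like the first line of each dump record (without the
// count-quality annotations such as "(precise)").
std::string
format_line (const block_counts &b, const block_diff &d, int64_t hot_threshold)
{
  char buf[256];
  std::snprintf (buf, sizeof buf,
                 "%s bb %d (%s) afdo %" PRId64 " scaled %" PRId64
                 " fdo %" PRId64 " diff %" PRId64 ", %+2.2f%%",
                 b.function.c_str (), b.index, hotness (b.fdo, hot_threshold),
                 b.afdo, d.scaled, b.fdo, d.diff, d.percent);
  return buf;
}
```

For instance, with AFDO counts 10, 30 and 0 against FDO counts 100, 60 and 40, the AFDO total of 40 is scaled to the FDO total of 200 (a factor of 5). The first block is then reported with diff -50, -50.00%, and the third block, which AFDO never sampled, with -100.00%. Scaling by the ratio of totals keeps the comparison independent of the sampling period, at the cost of letting a few badly mis-sampled hot blocks skew the factor for everyone else.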