Hi Honza,

On Sun, Jun 29, 2025 at 10:45 PM Jan Hubicka <hubi...@ucw.cz> wrote:
>
> >
> >
> > > On 24 Jun 2025, at 7:43 pm, Jan Hubicka <hubi...@ucw.cz> wrote:
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > Hi,
> > > this pass removes early-inlining from afdo pass since all inlining
> > > should now happen from early inliner.  I tedted this on spec and there
> > > are 3 inlines happening here which are blocked at early-inline time by
> > > hitting large function growth limit.  We probably want to bypass that
> > > limit, I will look into that incrementaly.
> >
> > Thanks for doing this. Is the inlining difference here is due to annotation 
> > that happens in auto-profile pass in the earlier implementation?
> >
> > One unrelated question about scaling profiles. We seem to scale-up AFDO  
> > with and_count_scale and scale down local_profile in some other cases. 
> > Should we instead scale up AFDO profile to local_profile scale. Lot of the 
> > inlining and other parameters seem to work well with that.
>
> Hi,
> I implemented simple afdo/fdo profile comparator.  I think we will need to 
> figure out
> what to print and what to look for.  It currently records the afdo, fdo count 
> pairs
> and then try to scale afdo counts to fdo counts printng differences.

I tried your patch on the current master and when I try running as you
mentioned, it looks like PGO counters are not read. I am seeing
something like (fdo is always zero):

update_stmt.part.0/2212 bb 0 (cold) afdo 3400152 (guessed) scaled 0
(guessed) fdo 0 (precise) diff 0, +0.00%
update_stmt.part.0/2212 bb 4 (cold) afdo 3400152 (auto FDO) scaled 0
(guessed) fdo 0 (precise) diff 0, +0.00%

Also, the patch didn't apply cleanly to the current master. Is the
patch complete?

Thanks,
Kugan

>
> It prints info as follows:
>
> brute/32 bb 60 (cold) afdo 0 (auto FDO) scaled 0 (guessed) fdo 4847 (precise) 
> diff -4847, -100.00%
>  preds
>  succs 59 169
> brute/32 bb 61 (hot) afdo 629997 (auto FDO) scaled 24978219 (guessed) fdo 
> 9332718 (precise) diff 15645501, +167.64%
>  preds
>  succs 59 62
> brute/32 bb 62 (hot) afdo 2753633 (auto FDO) scaled 109176468 (guessed) fdo 
> 37330872 (precise) diff 71845596, +192.46%
>  preds
>  succs 61 68 69 63
> brute/32 bb 63 (hot) afdo 2031304 (auto FDO) scaled 80537456 (guessed) fdo 
> 27998154 (precise) diff 52539302, +187.65%
>  preds
>  succs 62 64
> brute/32 bb 64 (very hot) afdo 2661301 (auto FDO) scaled 105515674 (guessed) 
> fdo 111992616 (precise) diff -6476942, -5.78%
>  preds
>  succs 63 67 68 65
>
> I.e. function name, bb index, hotness according to afdo (I added very hot 
> variant as those we want to look at first), afdo count,
> attempt to scale it correctly, fdo count and difference.
>
> So I think very hot with very large negative diff are first to look at.
>
> jh@shroud:~/cpu2017/benchspec/CPU/548.exchange2_r/build/build_peak_autofdo-cmp-m64.0000>
>  grep "(very hot).*, -9[0-9]\\." *.profile | sort -t")" -n -k3
> digits_2/30 bb 248 (very hot) afdo 1106295 (auto FDO) scaled 43862556 
> (guessed) fdo 2180591954 (precise) diff -2136729398, -97.99%
> digits_2/30 bb 276 (very hot) afdo 7848783 (auto FDO) scaled 311189765 
> (guessed) fdo 13764381875 (precise) diff -13453192110, -97.74%
> digits_2/30 bb 308 (very hot) afdo 8952263 (auto FDO) scaled 354940711 
> (guessed) fdo 10821023213 (precise) diff -10466082502, -96.72%
> digits_2/30 bb 340 (very hot) afdo 8148862 (auto FDO) scaled 323087343 
> (guessed) fdo 7398657585 (precise) diff -7075570242, -95.63%
> digits_2/30 bb 4 (very hot) afdo 9774243 (auto FDO) scaled 387530701 
> (guessed) fdo 4039694145 (precise) diff -3652163444, -90.41%
> hidden_triplets/8 bb 380 (very hot) afdo 153532 (guessed) scaled 6087261 
> (guessed) fdo 162401310 (precise) diff -156314049, -96.25%
> hidden_triplets/8 bb 381 (very hot) afdo 136644 (guessed) scaled 5417682 
> (guessed) fdo 146161179 (precise) diff -140743497, -96.29%
> hidden_triplets/8 bb 383 (very hot) afdo 136644 (guessed) scaled 5417682 
> (guessed) fdo 146161179 (precise) diff -140743497, -96.29%
> hidden_triplets/8 bb 387 (very hot) afdo 53337 (guessed) scaled 2114714 
> (guessed) fdo 162209420 (precise) diff -160094706, -98.70%
> hidden_triplets/8 bb 388 (very hot) afdo 47470 (guessed) scaled 1882098 
> (guessed) fdo 145988478 (precise) diff -144106380, -98.71%
> hidden_triplets/8 bb 390 (very hot) afdo 47470 (guessed) scaled 1882098 
> (guessed) fdo 145988478 (precise) diff -144106380, -98.71%
> new_solver/9 bb 139 (very hot) afdo 365547 (guessed) scaled 14493264 
> (guessed) fdo 399160962 (precise) diff -384667698, -96.37%
> new_solver/9 bb 140 (very hot) afdo 345441 (guessed) scaled 13696098 
> (guessed) fdo 361455947 (precise) diff -347759849, -96.21%
> new_solver/9 bb 142 (very hot) afdo 326442 (guessed) scaled 12942823 
> (guessed) fdo 352025127 (precise) diff -339082304, -96.32%
> new_solver/9 bb 160 (very hot) afdo 365547 (guessed) scaled 14493264 
> (guessed) fdo 397240050 (precise) diff -382746786, -96.35%
> new_solver/9 bb 161 (very hot) afdo 328992 (guessed) scaled 13043926 
> (guessed) fdo 359535035 (precise) diff -346491109, -96.37%
> new_solver/9 bb 163 (very hot) afdo 310897 (guessed) scaled 12326492 
> (guessed) fdo 350104215 (precise) diff -337777723, -96.48%
> new_solver/9 bb 218 (very hot) afdo 365547 (guessed) scaled 14493264 
> (guessed) fdo 399683609 (precise) diff -385190345, -96.37%
> new_solver/9 bb 219 (very hot) afdo 345442 (guessed) scaled 13696138 
> (guessed) fdo 361977164 (precise) diff -348281026, -96.22%
> new_solver/9 bb 221 (very hot) afdo 326442 (guessed) scaled 12942823 
> (guessed) fdo 352546094 (precise) diff -339603271, -96.33%
> new_solver/9 bb 239 (very hot) afdo 365547 (guessed) scaled 14493264 
> (guessed) fdo 397175144 (precise) diff -382681880, -96.35%
> new_solver/9 bb 240 (very hot) afdo 328992 (guessed) scaled 13043926 
> (guessed) fdo 359468699 (precise) diff -346424773, -96.37%
> new_solver/9 bb 242 (very hot) afdo 310898 (guessed) scaled 12326532 
> (guessed) fdo 350037629 (precise) diff -337711097, -96.48%
> new_solver/9 bb 303 (very hot) afdo 124141 (guessed) scaled 4921962 (guessed) 
> fdo 112036204 (precise) diff -107114242, -95.61%
> new_solver/9 bb 304 (very hot) afdo 117313 (guessed) scaled 4651244 (guessed) 
> fdo 112036204 (precise) diff -107384960, -95.85%
> new_solver/9 bb 334 (very hot) afdo 124141 (guessed) scaled 4921962 (guessed) 
> fdo 112036204 (precise) diff -107114242, -95.61%
> new_solver/9 bb 335 (very hot) afdo 117313 (guessed) scaled 4651244 (guessed) 
> fdo 112036204 (precise) diff -107384960, -95.85%
> new_solver/9 bb 369 (very hot) afdo 124141 (guessed) scaled 4921962 (guessed) 
> fdo 113795352 (precise) diff -108873390, -95.67%
> new_solver/9 bb 370 (very hot) afdo 117313 (guessed) scaled 4651244 (guessed) 
> fdo 113795352 (precise) diff -109144108, -95.91%
> new_solver/9 bb 400 (very hot) afdo 124141 (guessed) scaled 4921962 (guessed) 
> fdo 113795352 (precise) diff -108873390, -95.67%
> new_solver/9 bb 401 (very hot) afdo 117313 (guessed) scaled 4651244 (guessed) 
> fdo 113795352 (precise) diff -109144108, -95.91%
>
> One needs to do 3 builds
>  1) build with -g to train auto-fdo + train run
>  2) build with -fauto-profile -fprofile-generate to produce instrumentiton + 
> train run
>  3) build with -fauto-profile -fprofile-use -fdump-tree-profile to produce 
> stats.
>
> I use:
>
> OPTIMIZE    = -Ofast  -fopt-info-optimized-optall -march=native -mtune=native 
> -Wno-error=implicit-int  -Wno-error=implicit-function-declaration 
> -Wno-error=declaration-missing-parameter-type -Wno-error=return-mismatch 
> -Wno-error=int-conversion  #-flto=auto
> #OPTIMIZE    = -Ofast -g -fopt-info-optimized-optall -march=native 
> -mtune=native  --param max-peel-times=128 --param max-peel-branches=256 
> --param max-peeled-insns=1024  --param profile-histogram-size-lin=129
>
> fdo_pre0 = rm -rf ${benchmark}.data ${benchmark}.gcov; \\
>
> fdo_run1 = perf record -e ex_ret_brn_tkn:Pu -c 100000 -b -o ${benchmark}.data 
> -- ${command}; \\
>            create_gcov --binary=${baseexe} --profile=${benchmark}.data 
> --gcov=current.gcov -gcov_version=2;  \\
>            if test -e ${benchmark}.gcov ; then profile_merger current.gcov 
> ${benchmark}.gcov --output_file ${benchmark}.gcov ; else mv current.gcov 
> ${benchmark}.gcov ; fi \\
>
> PASS1_OPTIMIZE = -g -fno-reorder-blocks-and-partition  -fno-ipa-icf -fno-lto
> PASS2_OPTIMIZE = -fauto-profile=${benchmark}.gcov  -fprofile-generate -g
> PASS3_OPTIMIZE = -fauto-profile=${benchmark}.gcov  
> -fdump-ipa-all-details-blocks -fdump-tree-optimized-details-blocks 
> -fdump-tree-einline-details -g -fprofile-use
>
> I am flying from China today so will see how much I can play around with the 
> patch.
> Any improvements would be welcome.
>
> gcc/ChangeLog:
>
>         * auto-profile.cc 
> (autofdo_source_profile::offline_external_functions): Add
>         missing newline.
>         (afdo_annotate_cfg): Update max bb count.
>         * coverage.cc (coverage_init): Add AUTO_PROFILE parameter.
>         * coverage.h (coverage_init):  Update prototype.
>         * passes.cc (finish_optimization_passes): Do not call end_branch_prob.
>         (pass_manager::dump_profile_report): Also watch for afdo.
>         * profile.cc (struct afdo_fdo_record): New structure.
>         (compute_branch_probabilities): Record info about afdo/fdo compare
>         (end_branch_prob): Print the records.
>         * toplev.cc (do_compile): Initialize both FDO and AFDO if requested.
>         * tree-profile.cc (tree_profiling): End profiling
>         (pass_ipa_tree_profile::gate): Also run with auto-profile.
>
> gcc/lto/ChangeLog:
>
>         * lto-symtab.cc (lto_symtab_merge_symbols_1):
>
> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
> index 44e7faa8fee..b6dd552acab 100644
> --- a/gcc/auto-profile.cc
> +++ b/gcc/auto-profile.cc
> @@ -1308,7 +1308,7 @@ autofdo_source_profile::offline_external_functions ()
>               if (dump_file)
>                 fprintf (dump_file,
>                          "string table in auto-profile contains"
> -                        " duplicated name %s", n1);
> +                        " duplicated name %s\n", n1);
>               to_symbol_name.put (i, index);
>             }
>           continue;
> @@ -3119,6 +3119,7 @@ afdo_annotate_cfg (void)
>        FOR_ALL_BB_FN (bb, cfun)
>         if (bb->count.quality () == GUESSED_LOCAL)
>           bb->count = bb->count.global0afdo ();
> +      update_max_bb_count ();
>
>        loop_optimizer_finalize ();
>        free_dominance_info (CDI_DOMINATORS);
> diff --git a/gcc/coverage.cc b/gcc/coverage.cc
> index c0ae76a40ef..af9091c4561 100644
> --- a/gcc/coverage.cc
> +++ b/gcc/coverage.cc
> @@ -1251,7 +1251,7 @@ coverage_obj_finish (vec<constructor_elt, va_gc> *ctor,
>     of notes file.  */
>
>  void
> -coverage_init (const char *filename)
> +coverage_init (const char *filename, bool auto_profile)
>  {
>    /* If we are in LTO, the profile will be read from object files.  */
>    if (in_lto_p)
> @@ -1315,7 +1315,7 @@ coverage_init (const char *filename)
>    strcpy (da_file_name + prefix_len + len, GCOV_DATA_SUFFIX);
>
>    bbg_file_stamp = local_tick;
> -  if (flag_auto_profile)
> +  if (auto_profile)
>      read_autofdo_file ();
>    else if (flag_branch_probabilities)
>      read_counts_file ();
> diff --git a/gcc/coverage.h b/gcc/coverage.h
> index 9afbb605482..1f4b521bdea 100644
> --- a/gcc/coverage.h
> +++ b/gcc/coverage.h
> @@ -22,7 +22,7 @@ along with GCC; see the file COPYING3.  If not see
>
>  #include "gcov-io.h"
>
> -extern void coverage_init (const char *);
> +extern void coverage_init (const char *, bool);
>  extern void coverage_finish (void);
>  extern void coverage_remove_note_file (void);
>
> diff --git a/gcc/passes.cc b/gcc/passes.cc
> index 6c67ffe56ba..a33c8d924a5 100644
> --- a/gcc/passes.cc
> +++ b/gcc/passes.cc
> @@ -355,13 +355,6 @@ finish_optimization_passes (void)
>    gcc::dump_manager *dumps = m_ctxt->get_dumps ();
>
>    timevar_push (TV_DUMP);
> -  if (coverage_instrumentation_p () || flag_test_coverage
> -      || flag_branch_probabilities)
> -    {
> -      dumps->dump_start (m_pass_profile_1->static_pass_number, NULL);
> -      end_branch_prob ();
> -      dumps->dump_finish (m_pass_profile_1->static_pass_number);
> -    }
>
>    /* Do whatever is necessary to finish printing the graphs.  */
>    for (i = TDI_end; (dfi = dumps->get_dump_file_info (i)) != NULL; ++i)
> @@ -2036,6 +2029,7 @@ pass_manager::dump_profile_report () const
>             fprintf (dump_file, "| %12.0f", profile_record[i].time);
>             /* Time units changes with profile estimate and feedback.  */
>             if (i == m_pass_profile_1->static_pass_number
> +               || i == m_pass_ipa_auto_profile_1->static_pass_number
>                 || i == m_pass_ipa_tree_profile_1->static_pass_number)
>               fprintf (dump_file, "-------------");
>             else if (rel_time_change)
> diff --git a/gcc/profile.cc b/gcc/profile.cc
> index 6234dd2d4e2..ecfc9bdc254 100644
> --- a/gcc/profile.cc
> +++ b/gcc/profile.cc
> @@ -423,6 +423,20 @@ cmp_stats (const void *ptr1, const void *ptr2)
>    return 0;
>  }
>
> +struct afdo_fdo_record
> +{
> +  cgraph_node *node;
> +  struct bb_record
> +  {
> +    int index;
> +    profile_count afdo;
> +    profile_count fdo;
> +    vec <int> preds;
> +    vec <int> succs;
> +  };
> +  vec <bb_record> bbs;
> +};
> +static vec <afdo_fdo_record> afdo_fdo_records;
>
>  /* Compute the branch probabilities for the various branches.
>     Annotate them accordingly.
> @@ -472,6 +486,22 @@ compute_branch_probabilities (unsigned cfg_checksum, 
> unsigned lineno_checksum)
>    BB_INFO (EXIT_BLOCK_PTR_FOR_FN (cfun))->succ_count = 2;
>    BB_INFO (ENTRY_BLOCK_PTR_FOR_FN (cfun))->pred_count = 2;
>
> +  afdo_fdo_record record = {cgraph_node::get (current_function_decl), 
> vNULL};;
> +  if (dump_file && flag_auto_profile)
> +    {
> +      FOR_ALL_BB_FN (bb, cfun)
> +       {
> +         record.bbs.safe_push ({bb->index, bb->count.ipa (),
> +                               profile_count::uninitialized (), vNULL, 
> vNULL});
> +         record.bbs.last ().preds.reserve (EDGE_COUNT (bb->preds));
> +         for (auto &e : bb->preds)
> +           record.bbs.last ().succs.safe_push (e->src->index);
> +         record.bbs.last ().succs.reserve (EDGE_COUNT (bb->succs));
> +         for (auto &e : bb->succs)
> +           record.bbs.last ().succs.safe_push (e->dest->index);
> +       }
> +    }
> +
>    num_edges = read_profile_edge_counts (exec_counts);
>
>    if (dump_file)
> @@ -744,7 +774,6 @@ compute_branch_probabilities (unsigned cfg_checksum, 
> unsigned lineno_checksum)
>             num_branches++;
>         }
>      }
> -
>    if (exec_counts
>        && (bb_gcov_count (ENTRY_BLOCK_PTR_FOR_FN (cfun))
>           || !flag_profile_partial_training))
> @@ -812,6 +841,18 @@ compute_branch_probabilities (unsigned cfg_checksum, 
> unsigned lineno_checksum)
>    delete edge_gcov_counts;
>    edge_gcov_counts = NULL;
>
> +  if (dump_file && flag_auto_profile)
> +    {
> +      int i = 0;
> +      FOR_ALL_BB_FN (bb, cfun)
> +       {
> +         gcc_checking_assert (record.bbs[i].index == bb->index);
> +         record.bbs[i].fdo = bb->count.ipa();
> +         i++;
> +       }
> +      afdo_fdo_records.safe_push (record);
> +    }
> +
>    update_max_bb_count ();
>
>    if (dump_file)
> @@ -1804,6 +1845,49 @@ end_branch_prob (void)
>         }
>        fprintf (dump_file, "Total number of conditions: %d\n",
>                total_num_conds);
> +      if (afdo_fdo_records.length ())
> +       {
> +         profile_count fdo_sum = profile_count::zero ();
> +         profile_count afdo_sum = profile_count::zero ();
> +         for (const auto &r : afdo_fdo_records)
> +           for (const auto &b : r.bbs)
> +             if (b.fdo.initialized_p () && b.afdo.initialized_p ())
> +               {
> +                 fdo_sum += b.fdo;
> +                 afdo_sum += b.afdo;
> +               }
> +         for (auto &r : afdo_fdo_records)
> +           {
> +             for (auto &b : r.bbs)
> +               if (b.fdo.initialized_p () && b.afdo.initialized_p ())
> +                 {
> +                   profile_count scaled = b.afdo.apply_scale (fdo_sum, 
> afdo_sum);
> +                   fprintf (dump_file, "%s bb %i (%s) afdo ", 
> r.node->dump_name (), b.index,
> +                            maybe_hot_count_p (NULL, b.fdo.apply_scale (1, 
> 1000)) ? "very hot"
> +                            : maybe_hot_count_p (NULL, b.fdo) ?  "hot" : 
> "cold");
> +                   b.afdo.dump (dump_file);
> +                   fprintf (dump_file, " scaled ");
> +                   scaled.dump (dump_file);
> +                   fprintf (dump_file, " fdo ");
> +                   b.fdo.dump (dump_file);
> +                   fprintf (dump_file, " diff %" PRId64 ", %+2.2f%%\n",
> +                            scaled.to_gcov_type () - b.fdo.to_gcov_type (),
> +                            (scaled.to_gcov_type () - b.fdo.to_gcov_type ()) 
> * 100.0
> +                            / MAX (b.fdo.to_gcov_type (), 1));
> +                   fprintf (dump_file, " preds");
> +                   for (int val : b.preds)
> +                     fprintf (dump_file, " %i", val);
> +                   b.preds.release ();
> +                   fprintf (dump_file, "\n succs");
> +                   for (int val : b.succs)
> +                     fprintf (dump_file, " %i", val);
> +                   b.succs.release ();
> +                   fprintf (dump_file, "\n");
> +                 }
> +              r.bbs.release ();
> +            }
> +       }
> +      afdo_fdo_records.release ();
>      }
>  }
>
> diff --git a/gcc/toplev.cc b/gcc/toplev.cc
> index 00a8ccb7a69..e6eba05ab0b 100644
> --- a/gcc/toplev.cc
> +++ b/gcc/toplev.cc
> @@ -2197,7 +2197,9 @@ do_compile ()
>
>           symtab->initialize ();
>            init_final (main_input_filename);
> -          coverage_init (aux_base_name);
> +          coverage_init (aux_base_name, false);
> +         if (flag_auto_profile)
> +            coverage_init (aux_base_name, true);
>            statistics_init ();
>            debuginfo_init ();
>            invoke_plugin_callbacks (PLUGIN_START_UNIT, NULL);
> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index fed218eb60b..fe20e84838d 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -2031,6 +2031,7 @@ tree_profiling (void)
>    handle_missing_profiles ();
>
>    del_node_map ();
> +  end_branch_prob ();
>    return 0;
>  }
>
> @@ -2065,10 +2066,8 @@ public:
>  bool
>  pass_ipa_tree_profile::gate (function *)
>  {
> -  /* When profile instrumentation, use or test coverage shall be performed.
> -     But for AutoFDO, this there is no instrumentation, thus this pass is
> -     disabled.  */
> -  return (!in_lto_p && !flag_auto_profile
> +  /* When profile instrumentation, use or test coverage shall be performed.  
> */
> +  return (!in_lto_p
>           && (flag_branch_probabilities || flag_test_coverage
>               || coverage_instrumentation_p ())
>           && !seen_error ());

Reply via email to