On Thu, Sep 20, 2018 at 6:43 PM Jan Hubicka <hubi...@ucw.cz> wrote:
>
> > On Thu, Sep 20, 2018 at 5:26 PM Jan Hubicka <hubi...@ucw.cz> wrote:
> > >
> > > > On Thu, Sep 20, 2018 at 2:11 AM Martin Liška <mli...@suse.cz> wrote:
> > > > >
> > > > > Hello.
> > > > >
> > > > > I've been working for some time on a patch that simplifies how we set
> > > > > the hotness threshold of basic blocks. Currently, we calculate so 
> > > > > called
> > > > > arc profile histograms that should identify edges that cover 99.9% of 
> > > > > all
> > > > > branching. These edges are then identified as hot. Disadvantage of 
> > > > > the approach
> > > > > is that it comes with significant overhead in run-time and GCC 
> > > > > related code
> > > > > is also not trivial. Moreover, anytime a histogram is merged after an 
> > > > > instrumented
> > > > > run, the resulting histogram is misleading.
> > > > >
> > > > > That said, I decided to simplify it again, remove usage of the 
> > > > > histogram and return
> > > > > to what we have before (--param hot-bb-count-fraction). That 
> > > > > basically says that
> > > > > we consider hot each edge that has execution count bigger than 
> > > > > sum_max / 10.000.
> > > > >
> > > > > Note that LTO+PGO remains untouched as it still uses histogram that 
> > > > > is dynamically
> > > > > calculated by read arc counts.
> > > > Hi,
> > > > Does this affect AutoFDO stuff?  AutoFDO is broken and I am fixing it
> > > > now, on the basis of current code.
> > >
> > > This is indpendent of Auto-FDO. There we probably can define cutoffs for 
> > > hot-cold
> > > partitions in the tool translating global data into per-file data read by 
> > > GCC.
> > > It is great you will take a deper look at autoFDO. it indeed needs work!
> > >
> > > The patch is OK, thank for working on it!  Histograms was added by google 
> > > as
> > > bit of experiment, but I do not think they turned out to be useful. The 
> > > data
> > I did some experiments showing it is somehow useful, for autoFDO.  To
> > which extend it is useful remains a question I need to investigate
> > later.
>
> Indeed auto-FDO has better idea about whole program behaviour. We could revive
> the patch for streaming histograms and reading them to compiler if that turns
> out to be a good idea. I can see that auto-FDO profile data tells you pretty
> clearly where the hot spots are and it is not as easy to recover this 
> information
> from profile annotated CFG becuase of all the transforms we do.
> Lets fix and benchmark auto-FDO first and then we could decide what is best 
> option.
> Putting the stream-in code back should not be hard if it turns out to be 
> useful.
>
> Main problem with current historams with normal FDO is the fact that you need
> to merge them between runs which is technically impossible job to do, so they
> work for programs run once, but not for programs run many times in train runs
> like gcc itself.  It seems to me that for those relaly interested in
> performance it is good idea to switch to LTO and that makes it possible to
> calculate histograms during the linking stage.

honza, thanks very much for detailed explanation.

Thanks,
bin
>
> Honza
> >
> > Thanks,
> > bin
> > > produced by them was not very related to what the IPA profile generation 
> > > produces
> > > and thus it did not seem to match reality very well.
> > >
> > > Honza
> > > >
> > > > Thanks,
> > > > bin
> > > > >
> > > > > Note the statistics of the patch:
> > > > >   19 files changed, 101 insertions(+), 1216 deletions(-)
> > > > >
> > > > > I'm attaching file sizes of SPEC2006 int benchmark.
> > > > >
> > > > > Patch survives testing on x86_64-linux-gnu machine.
> > > > > Ready to be installed?
> > > > >
> > > > > Martin
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > 2018-09-19  Martin Liska  <mli...@suse.cz>
> > > > >
> > > > >         * auto-profile.c (autofdo_source_profile::read): Do not
> > > > >         set sum_all.
> > > > >         (read_profile): Do not add working sets.
> > > > >         (read_autofdo_file): Remove sum_all.
> > > > >         (afdo_callsite_hot_enough_for_early_inline): Remove const
> > > > >         qualifier.
> > > > >         * coverage.c (struct counts_entry): Remove gcov_summary.
> > > > >         (read_counts_file): Read new GCOV_TAG_OBJECT_SUMMARY,
> > > > >         do not support GCOV_TAG_PROGRAM_SUMMARY.
> > > > >         (get_coverage_counts): Remove summary and expected
> > > > >         arguments.
> > > > >         * coverage.h (get_coverage_counts): Likewise.
> > > > >         * doc/gcov-dump.texi: Remove -w option.
> > > > >         * gcov-dump.c (dump_working_sets): Remove.
> > > > >         (main): Do not support '-w' option.
> > > > >         (print_usage): Likewise.
> > > > >         (tag_summary): Likewise.
> > > > >         * gcov-io.c (gcov_write_summary): Do not dump
> > > > >         histogram.
> > > > >         (gcov_read_summary): Likewise.
> > > > >         (gcov_histo_index): Remove.
> > > > >         (gcov_histogram_merge): Likewise.
> > > > >         (compute_working_sets): Likewise.
> > > > >         * gcov-io.h (GCOV_TAG_OBJECT_SUMMARY): Mark
> > > > >         it not obsolete.
> > > > >         (GCOV_TAG_PROGRAM_SUMMARY): Mark it obsolete.
> > > > >         (GCOV_TAG_SUMMARY_LENGTH): Adjust.
> > > > >         (GCOV_HISTOGRAM_SIZE): Remove.
> > > > >         (GCOV_HISTOGRAM_BITVECTOR_SIZE): Likewise.
> > > > >         (struct gcov_summary): Simplify rapidly just
> > > > >         to runs and sum_max fields.
> > > > >         (gcov_histo_index): Remove.
> > > > >         (NUM_GCOV_WORKING_SETS): Likewise.
> > > > >         (compute_working_sets): Likewise.
> > > > >         * gcov-tool.c (print_overlap_usage_message): Remove
> > > > >         trailing empty line.
> > > > >         * gcov.c (read_count_file): Read GCOV_TAG_OBJECT_SUMMARY.
> > > > >         (output_lines): Remove program related line.
> > > > >         * ipa-profile.c (ipa_profile): Do not consider GCOV histogram.
> > > > >         * lto-cgraph.c (output_profile_summary): Do not stream GCOV
> > > > >         histogram.
> > > > >         (input_profile_summary): Do not read it.
> > > > >         (merge_profile_summaries): And do not merge it.
> > > > >         (input_symtab): Do not call removed function.
> > > > >         * modulo-sched.c (sms_schedule): Do not print sum_max.
> > > > >         * params.def (HOT_BB_COUNT_FRACTION): Reincarnate param that 
> > > > > was
> > > > >         removed when histogram method was invented.
> > > > >         (HOT_BB_COUNT_WS_PERMILLE): Mention that it's used only in LTO
> > > > >         mode.
> > > > >         * postreload-gcse.c (eliminate_partially_redundant_load): Fix
> > > > >         GCOV coding style.
> > > > >         * predict.c (get_hot_bb_threshold): Use HOT_BB_COUNT_FRACTION
> > > > >         and dump selected value.
> > > > >         * profile.c (add_working_set): Remove.
> > > > >         (get_working_sets): Likewise.
> > > > >         (find_working_set): Likewise.
> > > > >         (get_exec_counts): Do not work with working sets.
> > > > >         (read_profile_edge_counts): Do not inform as sum_max is 
> > > > > removed.
> > > > >         (compute_branch_probabilities): Likewise.
> > > > >         (compute_value_histograms): Remove argument for call of
> > > > >         get_coverage_counts.
> > > > >         * profile.h: Do not make gcov_summary const.
> > > > >
> > > > > libgcc/ChangeLog:
> > > > >
> > > > > 2018-09-19  Martin Liska  <mli...@suse.cz>
> > > > >
> > > > >         * libgcov-driver.c (crc32_unsigned): Remove.
> > > > >         (gcov_histogram_insert): Likewise.
> > > > >         (gcov_compute_histogram): Likewise.
> > > > >         (compute_summary): Simplify rapidly.
> > > > >         (merge_one_data): Do not handle PROGRAM_SUMMARY tag.
> > > > >         (merge_summary): Rapidly simplify.
> > > > >         (dump_one_gcov): Ignore gcov_summary.
> > > > >         (gcov_do_dump): Do not handle program summary, it's not
> > > > >         used.
> > > > >         * libgcov-util.c (tag_summary): Remove.
> > > > >         (read_gcda_finalize): Fix coding style.
> > > > >         (read_gcda_file): Initialize curr_object_summary.
> > > > >         (compute_summary): Remove.
> > > > >         (calculate_overlap): Remove settings of run_max.
> > > > > ---
> > > > >   gcc/auto-profile.c      |  21 +--
> > > > >   gcc/coverage.c          |  59 +-----
> > > > >   gcc/coverage.h          |   4 +-
> > > > >   gcc/doc/gcov-dump.texi  |   6 +-
> > > > >   gcc/gcov-dump.c         |  81 +-------
> > > > >   gcc/gcov-io.c           | 398 
> > > > > +---------------------------------------
> > > > >   gcc/gcov-io.h           |  71 +------
> > > > >   gcc/gcov-tool.c         |   1 -
> > > > >   gcc/gcov.c              |   7 +-
> > > > >   gcc/ipa-profile.c       |  26 +--
> > > > >   gcc/lto-cgraph.c        | 136 +-------------
> > > > >   gcc/modulo-sched.c      |   8 -
> > > > >   gcc/params.def          |   7 +-
> > > > >   gcc/postreload-gcse.c   |   2 +-
> > > > >   gcc/predict.c           |   9 +-
> > > > >   gcc/profile.c           | 116 +-----------
> > > > >   gcc/profile.h           |   2 +-
> > > > >   libgcc/libgcov-driver.c | 324 ++++----------------------------
> > > > >   libgcc/libgcov-util.c   |  39 +---
> > > > >   19 files changed, 101 insertions(+), 1216 deletions(-)
> > > > >
> > > > >

Reply via email to