> On Thu, Sep 20, 2018 at 5:26 PM Jan Hubicka <hubi...@ucw.cz> wrote: > > > > > On Thu, Sep 20, 2018 at 2:11 AM Martin Liška <mli...@suse.cz> wrote: > > > > > > > > Hello. > > > > > > > > I've been working for some time on a patch that simplifies how we set > > > > the hotness threshold of basic blocks. Currently, we calculate so called > > > > arc profile histograms that should identify edges that cover 99.9% of > > > > all > > > > branching. These edges are then identified as hot. Disadvantage of the > > > > approach > > > > is that it comes with significant overhead in run-time and GCC related > > > > code > > > > is also not trivial. Moreover, anytime a histogram is merged after an > > > > instrumented > > > > run, the resulting histogram is misleading. > > > > > > > > That said, I decided to simplify it again, remove usage of the > > > > histogram and return > > > > to what we have before (--param hot-bb-count-fraction). That basically > > > > says that > > > > we consider hot each edge that has execution count bigger than sum_max > > > > / 10.000. > > > > > > > > Note that LTO+PGO remains untouched as it still uses histogram that is > > > > dynamically > > > > calculated by read arc counts. > > > Hi, > > > Does this affect AutoFDO stuff? AutoFDO is broken and I am fixing it > > > now, on the basis of current code. > > > > This is indpendent of Auto-FDO. There we probably can define cutoffs for > > hot-cold > > partitions in the tool translating global data into per-file data read by > > GCC. > > It is great you will take a deper look at autoFDO. it indeed needs work! > > > > The patch is OK, thank for working on it! Histograms was added by google as > > bit of experiment, but I do not think they turned out to be useful. The data > I did some experiments showing it is somehow useful, for autoFDO. To > which extend it is useful remains a question I need to investigate > later.
Indeed auto-FDO has better idea about whole program behaviour. We could revive the patch for streaming histograms and reading them to compiler if that turns out to be a good idea. I can see that auto-FDO profile data tells you pretty clearly where the hot spots are and it is not as easy to recover this information from profile annotated CFG becuase of all the transforms we do. Lets fix and benchmark auto-FDO first and then we could decide what is best option. Putting the stream-in code back should not be hard if it turns out to be useful. Main problem with current historams with normal FDO is the fact that you need to merge them between runs which is technically impossible job to do, so they work for programs run once, but not for programs run many times in train runs like gcc itself. It seems to me that for those relaly interested in performance it is good idea to switch to LTO and that makes it possible to calculate histograms during the linking stage. Honza > > Thanks, > bin > > produced by them was not very related to what the IPA profile generation > > produces > > and thus it did not seem to match reality very well. > > > > Honza > > > > > > Thanks, > > > bin > > > > > > > > Note the statistics of the patch: > > > > 19 files changed, 101 insertions(+), 1216 deletions(-) > > > > > > > > I'm attaching file sizes of SPEC2006 int benchmark. > > > > > > > > Patch survives testing on x86_64-linux-gnu machine. > > > > Ready to be installed? > > > > > > > > Martin > > > > > > > > gcc/ChangeLog: > > > > > > > > 2018-09-19 Martin Liska <mli...@suse.cz> > > > > > > > > * auto-profile.c (autofdo_source_profile::read): Do not > > > > set sum_all. > > > > (read_profile): Do not add working sets. > > > > (read_autofdo_file): Remove sum_all. > > > > (afdo_callsite_hot_enough_for_early_inline): Remove const > > > > qualifier. > > > > * coverage.c (struct counts_entry): Remove gcov_summary. > > > > (read_counts_file): Read new GCOV_TAG_OBJECT_SUMMARY, > > > > do not support GCOV_TAG_PROGRAM_SUMMARY. > > > > (get_coverage_counts): Remove summary and expected > > > > arguments. > > > > * coverage.h (get_coverage_counts): Likewise. > > > > * doc/gcov-dump.texi: Remove -w option. > > > > * gcov-dump.c (dump_working_sets): Remove. > > > > (main): Do not support '-w' option. > > > > (print_usage): Likewise. > > > > (tag_summary): Likewise. > > > > * gcov-io.c (gcov_write_summary): Do not dump > > > > histogram. > > > > (gcov_read_summary): Likewise. > > > > (gcov_histo_index): Remove. > > > > (gcov_histogram_merge): Likewise. > > > > (compute_working_sets): Likewise. > > > > * gcov-io.h (GCOV_TAG_OBJECT_SUMMARY): Mark > > > > it not obsolete. > > > > (GCOV_TAG_PROGRAM_SUMMARY): Mark it obsolete. > > > > (GCOV_TAG_SUMMARY_LENGTH): Adjust. > > > > (GCOV_HISTOGRAM_SIZE): Remove. > > > > (GCOV_HISTOGRAM_BITVECTOR_SIZE): Likewise. > > > > (struct gcov_summary): Simplify rapidly just > > > > to runs and sum_max fields. > > > > (gcov_histo_index): Remove. > > > > (NUM_GCOV_WORKING_SETS): Likewise. > > > > (compute_working_sets): Likewise. > > > > * gcov-tool.c (print_overlap_usage_message): Remove > > > > trailing empty line. > > > > * gcov.c (read_count_file): Read GCOV_TAG_OBJECT_SUMMARY. > > > > (output_lines): Remove program related line. > > > > * ipa-profile.c (ipa_profile): Do not consider GCOV histogram. > > > > * lto-cgraph.c (output_profile_summary): Do not stream GCOV > > > > histogram. > > > > (input_profile_summary): Do not read it. > > > > (merge_profile_summaries): And do not merge it. > > > > (input_symtab): Do not call removed function. > > > > * modulo-sched.c (sms_schedule): Do not print sum_max. > > > > * params.def (HOT_BB_COUNT_FRACTION): Reincarnate param that was > > > > removed when histogram method was invented. > > > > (HOT_BB_COUNT_WS_PERMILLE): Mention that it's used only in LTO > > > > mode. > > > > * postreload-gcse.c (eliminate_partially_redundant_load): Fix > > > > GCOV coding style. > > > > * predict.c (get_hot_bb_threshold): Use HOT_BB_COUNT_FRACTION > > > > and dump selected value. > > > > * profile.c (add_working_set): Remove. > > > > (get_working_sets): Likewise. > > > > (find_working_set): Likewise. > > > > (get_exec_counts): Do not work with working sets. > > > > (read_profile_edge_counts): Do not inform as sum_max is removed. > > > > (compute_branch_probabilities): Likewise. > > > > (compute_value_histograms): Remove argument for call of > > > > get_coverage_counts. > > > > * profile.h: Do not make gcov_summary const. > > > > > > > > libgcc/ChangeLog: > > > > > > > > 2018-09-19 Martin Liska <mli...@suse.cz> > > > > > > > > * libgcov-driver.c (crc32_unsigned): Remove. > > > > (gcov_histogram_insert): Likewise. > > > > (gcov_compute_histogram): Likewise. > > > > (compute_summary): Simplify rapidly. > > > > (merge_one_data): Do not handle PROGRAM_SUMMARY tag. > > > > (merge_summary): Rapidly simplify. > > > > (dump_one_gcov): Ignore gcov_summary. > > > > (gcov_do_dump): Do not handle program summary, it's not > > > > used. > > > > * libgcov-util.c (tag_summary): Remove. > > > > (read_gcda_finalize): Fix coding style. > > > > (read_gcda_file): Initialize curr_object_summary. > > > > (compute_summary): Remove. > > > > (calculate_overlap): Remove settings of run_max. > > > > --- > > > > gcc/auto-profile.c | 21 +-- > > > > gcc/coverage.c | 59 +----- > > > > gcc/coverage.h | 4 +- > > > > gcc/doc/gcov-dump.texi | 6 +- > > > > gcc/gcov-dump.c | 81 +------- > > > > gcc/gcov-io.c | 398 +--------------------------------------- > > > > gcc/gcov-io.h | 71 +------ > > > > gcc/gcov-tool.c | 1 - > > > > gcc/gcov.c | 7 +- > > > > gcc/ipa-profile.c | 26 +-- > > > > gcc/lto-cgraph.c | 136 +------------- > > > > gcc/modulo-sched.c | 8 - > > > > gcc/params.def | 7 +- > > > > gcc/postreload-gcse.c | 2 +- > > > > gcc/predict.c | 9 +- > > > > gcc/profile.c | 116 +----------- > > > > gcc/profile.h | 2 +- > > > > libgcc/libgcov-driver.c | 324 ++++---------------------------- > > > > libgcc/libgcov-util.c | 39 +--- > > > > 19 files changed, 101 insertions(+), 1216 deletions(-) > > > > > > > >