[PATCH] Don't set full_profile in auto-profile [PR113765]
auto-profile currently doesn't guarantee that it will set probabilities on all edges because of zero basic block counts. Normally those edges just have probabilities set by the preceding profile_estimate pass but under -O0 profile_estimate pass doesn't run. The patch removes setting of full_profile to true in auto-profile. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * auto-profile.cc (afdo_annotate_cfg): Don't set full_profile to true --- gcc/auto-profile.cc | 1 - 1 file changed, 1 deletion(-) diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index e5407d32fbb..de59b94bcb3 100644 --- a/gcc/auto-profile.cc +++ b/gcc/auto-profile.cc @@ -1580,7 +1580,6 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts) } update_max_bb_count (); profile_status_for_fn (cfun) = PROFILE_READ; - cfun->cfg->full_profile = true; if (flag_value_profile_transformations) { gimple_value_profile_transformations (); -- 2.25.1
RE: [EXTERNAL] [PATCH] skip debug stmts when assigning locus discriminators
The fix looks good to me. Will this also fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107169 ? It was also a bad interaction of -gstatement-frontiers and discriminators. Eugene -Original Message- From: Alexandre Oliva Sent: Wednesday, November 8, 2023 7:51 AM To: gcc-patches@gcc.gnu.org Cc: Eugene Rozenfeld Subject: [EXTERNAL] [PATCH] skip debug stmts when assigning locus discriminators c-c++-common/goacc/kernels-loop-g.c has been failing (compare-debug) on i686-linux-gnu since r13-3172, because the implementation enabled debug stmts to cause discriminators to be assigned differently, and the discriminators are printed in the .gkd dumps that -fcompare-debug compares. This patch prevents debug stmts from affecting the discriminators in nondebug stmts, but enables debug stmts to get discriminators just as nondebug stmts would if their line numbers match. I suppose we could arrange for discriminators to be omitted from the -fcompare-debug dumps, but keeping discriminators in sync is probably good to avoid other potential sources of divergence between debug and nondebug. Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and x86_64-. Ok to install? (Eugene, I suppose what's special about this testcase, that may not apply to most other uses of assign_discriminators, is that goacc creates new functions out of already optimized code. I think assign_discriminators may not be suitable for new functions, with code that isn't exactly pristinely in-order. WDYT?) for gcc/ChangeLog * tree-cfg.cc (assign_discriminators): Handle debug stmts. --- gcc/tree-cfg.cc | 16 1 file changed, 16 insertions(+) diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc index 40a6f2a3b529f..a30a2de33a106 100644 --- a/gcc/tree-cfg.cc +++ b/gcc/tree-cfg.cc @@ -1214,6 +1214,22 @@ assign_discriminators (void) { gimple *stmt = gsi_stmt (gsi); + /* Don't allow debug stmts to affect discriminators, but +allow them to take discriminators when they're on the +same line as the preceding nondebug stmt. */ + if (is_gimple_debug (stmt)) + { + if (curr_locus != UNKNOWN_LOCATION + && same_line_p (curr_locus, &curr_locus_e, + gimple_location (stmt))) + { + location_t loc = gimple_location (stmt); + location_t dloc = location_with_discriminator (loc, +curr_discr); + gimple_set_location (stmt, dloc); + } + continue; + } if (curr_locus == UNKNOWN_LOCATION) { curr_locus = gimple_location (stmt); -- Alexandre Oliva, happy hackerhttps://fsfla.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
[PATCH] Fixes for profile count/probability maintenance
Verifier checks have recently been strengthened to check that all counts and probabilities are initialized. The checks fired during autoprofiledbootstrap build and this patch fixes it. gcc/ChangeLog: * auto-profile.cc (afdo_calculate_branch_prob): Fix count comparisons * ipa-utils.cc (ipa_merge_profiles): Guard against zero count when computing probabilities * tree-vect-loop-manip.cc (vect_do_peeling): Guard against zero count when scaling loop profile Tested on x86_64-pc-linux-gnu. --- gcc/auto-profile.cc | 4 ++-- gcc/ipa-utils.cc| 16 +--- gcc/tree-vect-loop-manip.cc | 2 +- 3 files changed, 12 insertions(+), 10 deletions(-) diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index ff3b763945c..3e61f36c29b 100644 --- a/gcc/auto-profile.cc +++ b/gcc/auto-profile.cc @@ -1434,7 +1434,7 @@ afdo_calculate_branch_prob (bb_set *annotated_bb) else total_count += AFDO_EINFO (e)->get_count (); } -if (num_unknown_succ == 0 && total_count > profile_count::zero ()) +if (num_unknown_succ == 0 && total_count > profile_count::zero ().afdo ()) { FOR_EACH_EDGE (e, ei, bb->succs) e->probability @@ -1571,7 +1571,7 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts) DECL_SOURCE_LOCATION (current_function_decl)); afdo_source_profile->mark_annotated (cfun->function_start_locus); afdo_source_profile->mark_annotated (cfun->function_end_locus); - if (max_count > profile_count::zero ()) + if (max_count > profile_count::zero ().afdo ()) { /* Calculate, propagate count and probability information on CFG. */ afdo_calculate_branch_prob (&annotated_bb); diff --git a/gcc/ipa-utils.cc b/gcc/ipa-utils.cc index 956c6294fd7..3aaf7e595df 100644 --- a/gcc/ipa-utils.cc +++ b/gcc/ipa-utils.cc @@ -651,13 +651,15 @@ ipa_merge_profiles (struct cgraph_node *dst, { edge srce = EDGE_SUCC (srcbb, i); edge dste = EDGE_SUCC (dstbb, i); - dste->probability = - dste->probability * dstbb->count.ipa ().probability_in -(dstbb->count.ipa () - + srccount.ipa ()) - + srce->probability * srcbb->count.ipa ().probability_in -(dstbb->count.ipa () - + srccount.ipa ()); + profile_count total = dstbb->count.ipa () + srccount.ipa (); + if (total.nonzero_p ()) + { + dste->probability = + dste->probability * dstbb->count.ipa ().probability_in + (total) + + srce->probability * srcbb->count.ipa ().probability_in + (total); + } } dstbb->count = dstbb->count.ipa () + srccount.ipa (); } diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index 09641901ff1..2608c286e5d 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -3335,7 +3335,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, free (bbs); free (original_bbs); } - else + else if (old_count.nonzero_p ()) scale_loop_profile (epilog, guard_to->count.probability_in (old_count), -1); /* Only need to handle basic block before epilog loop if it's not -- 2.25.1
[PATCH] Remove .PHONY targets when building .fda files during autoprofiledbootstrap
These .PHONY targets are always executed and were breaking `make install` for autoprofiledbootstrap build. gcc/ChangeLog: * c/Make-lang.in: Make create_fdas_for_cc1 target not .PHONY * cp/Make-lang.in: Make create_fdas_for_cc1plus target not .PHONY * lto/Make-lang.in: Make create_fdas_for_lto1 target not .PHONY Tested on x86_64-pc-linux-gnu. --- gcc/c/Make-lang.in | 4 ++-- gcc/cp/Make-lang.in | 4 ++-- gcc/lto/Make-lang.in | 4 ++-- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/gcc/c/Make-lang.in b/gcc/c/Make-lang.in index 79bc0dfd1cf..3ef8a674971 100644 --- a/gcc/c/Make-lang.in +++ b/gcc/c/Make-lang.in @@ -91,8 +91,6 @@ cc1$(exeext): $(C_OBJS) cc1-checksum.o $(BACKEND) $(LIBDEPS) components_in_prev = "bfd opcodes binutils fixincludes gas gcc gmp mpfr mpc isl gold intl ld libbacktrace libcpp libcody libdecnumber libiberty libiberty-linker-plugin libiconv zlib lto-plugin libctf libsframe" components_in_prev_target = "libstdc++-v3 libsanitizer libvtv libgcc libbacktrace libphobos zlib libgomp libatomic" -.PHONY: create_fdas_for_cc1 - cc1.fda: create_fdas_for_cc1 $(PROFILE_MERGER) $(shell ls -ha cc1_*.fda) --output_file cc1.fda -gcov_version 2 @@ -116,6 +114,8 @@ create_fdas_for_cc1: ../stage1-gcc/cc1$(exeext) ../prev-gcc/$(PERF_DATA) $(CREATE_GCOV) -binary ../prev-gcc/cc1$(exeext) -gcov $$profile_name -profile $$perf_path -gcov_version 2; \ fi; \ done; + + $(STAMP) $@ # # Build hooks: diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in index ba5e8766e99..2727fb7f8cc 100644 --- a/gcc/cp/Make-lang.in +++ b/gcc/cp/Make-lang.in @@ -189,8 +189,6 @@ cp/name-lookup.o: $(srcdir)/cp/std-name-hint.h components_in_prev = "bfd opcodes binutils fixincludes gas gcc gmp mpfr mpc isl gold intl ld libbacktrace libcpp libcody libdecnumber libiberty libiberty-linker-plugin libiconv zlib lto-plugin libctf libsframe" components_in_prev_target = "libstdc++-v3 libsanitizer libvtv libgcc libbacktrace libphobos zlib libgomp libatomic" -.PHONY: create_fdas_for_cc1plus - cc1plus.fda: create_fdas_for_cc1plus $(PROFILE_MERGER) $(shell ls -ha cc1plus_*.fda) --output_file cc1plus.fda -gcov_version 2 @@ -214,6 +212,8 @@ create_fdas_for_cc1plus: ../stage1-gcc/cc1plus$(exeext) ../prev-gcc/$(PERF_DATA) $(CREATE_GCOV) -binary ../prev-gcc/cc1plus$(exeext) -gcov $$profile_name -profile $$perf_path -gcov_version 2; \ fi; \ done; + + $(STAMP) $@ # # Build hooks: diff --git a/gcc/lto/Make-lang.in b/gcc/lto/Make-lang.in index 98aa9f4cc39..7dc0a9fef42 100644 --- a/gcc/lto/Make-lang.in +++ b/gcc/lto/Make-lang.in @@ -108,8 +108,6 @@ lto/lto-dump.o: $(LTO_OBJS) components_in_prev = "bfd opcodes binutils fixincludes gas gcc gmp mpfr mpc isl gold intl ld libbacktrace libcpp libcody libdecnumber libiberty libiberty-linker-plugin libiconv zlib lto-plugin libctf libsframe" components_in_prev_target = "libstdc++-v3 libsanitizer libvtv libgcc libbacktrace libphobos zlib libgomp libatomic" -.PHONY: create_fdas_for_lto1 - lto1.fda: create_fdas_for_lto1 $(PROFILE_MERGER) $(shell ls -ha lto1_*.fda) --output_file lto1.fda -gcov_version 2 @@ -134,6 +132,8 @@ create_fdas_for_lto1: ../stage1-gcc/lto1$(exeext) ../prev-gcc/$(PERF_DATA) fi; \ done; + $(STAMP) $@ + # LTO testing is done as part of C/C++/Fortran etc. testing. check-lto: -- 2.25.1
RE: [EXTERNAL] [PATCH] Enable autofdo bootstrap for lto/fortran
This line in gcc/fortran/Make-lang.in looks wrong (copy/paste?): +f95.fda: create_fdas_for_lto1 There are no invocations of $(CREATE_GCOV in gcc/fortran/Make-lang.in so this is incomplete. -Original Message- From: Andi Kleen Sent: Thursday, October 31, 2024 4:19 PM To: gcc-patches@gcc.gnu.org Cc: Eugene Rozenfeld ; Andi Kleen Subject: [EXTERNAL] [PATCH] Enable autofdo bootstrap for lto/fortran From: Andi Kleen When autofdo bootstrap support was originally implemented there were issues with the LTO bootstrap, that is why it wasn't enabled for them. I retested this now and it works on x86_64-linux. Fortran was also missing, not sure why. Also enabled now. gcc/fortran/ChangeLog: * Make-lang.in: Enable autofdo. gcc/lto/ChangeLog: * Make-lang.in: Enable autofdo. --- gcc/fortran/Make-lang.in | 13 +++-- gcc/lto/Make-lang.in | 14 +- 2 files changed, 16 insertions(+), 11 deletions(-) diff --git a/gcc/fortran/Make-lang.in b/gcc/fortran/Make-lang.in index 0be3c6b654b1..7295118185fd 100644 --- a/gcc/fortran/Make-lang.in +++ b/gcc/fortran/Make-lang.in @@ -69,6 +69,15 @@ F95_OBJS = $(F95_PARSER_OBJS) $(FORTRAN_TARGET_OBJS) \ fortran_OBJS = $(F95_OBJS) fortran/gfortranspec.o +ifeq ($(if $(wildcard ../stage_current),$(shell cat \ + ../stage_current)),stageautofeedback) +$(fortran_OBJS): CFLAGS += -fauto-profile=f95.fda +$(fortran_OBJS): f95.fda +endif + +f95.fda: create_fdas_for_lto1 + $(PROFILE_MERGER) $(shell ls -ha f95_*.fda) --output_file f95.fda +-gcov_version 2 + # # Define the names for selecting gfortran in LANGUAGES. fortran: f951$(exeext) @@ -272,7 +281,7 @@ fortran.install-info: $(DESTDIR)$(infodir)/gfortran.info fortran.install-man: $(DESTDIR)$(man1dir)/$(GFORTRAN_INSTALL_NAME)$(man1ext) $(DESTDIR)$(man1dir)/$(GFORTRAN_INSTALL_NAME)$(man1ext): doc/gfortran.1 \ - installdirs + installdirs f95*.fda -rm -f $@ -$(INSTALL_DATA) $< $@ -chmod a-x $@ @@ -293,7 +302,7 @@ fortran.uninstall: # We just have to delete files specific to us. fortran.mostlyclean: - -rm -f gfortran$(exeext) gfortran-cross$(exeext) f951$(exeext) + -rm -f gfortran$(exeext) gfortran-cross$(exeext) f951$(exeext) +f95*.fda -rm -f fortran/*.o fortran.clean: diff --git a/gcc/lto/Make-lang.in b/gcc/lto/Make-lang.in index b62ddcbe0dc9..4f9f21cdfc9e 100644 --- a/gcc/lto/Make-lang.in +++ b/gcc/lto/Make-lang.in @@ -29,15 +29,11 @@ lto_OBJS = $(LTO_OBJS) LTO_DUMP_OBJS = lto/lto-lang.o lto/lto-object.o attribs.o lto/lto-partition.o lto/lto-symtab.o lto/lto-dump.o lto/lto-common.o lto_dump_OBJS = $(LTO_DUMP_OBJS) -# this is only useful in a LTO bootstrap, but this does not work right -# now. Should reenable after this is fixed, but only when LTO bootstrap -# is enabled. - -#ifeq ($(if $(wildcard ../stage_current),$(shell cat \ -# ../stage_current)),stageautofeedback) -#$(LTO_OBJS): CFLAGS += -fauto-profile=lto1.fda -#$(LTO_OBJS): lto1.fda -#endif +ifeq ($(if $(wildcard ../stage_current),$(shell cat \ + ../stage_current)),stageautofeedback) +$(LTO_OBJS): CFLAGS += -fauto-profile=lto1.fda +$(LTO_OBJS): lto1.fda +endif # Rules -- 2.46.2
RE: [EXTERNAL] Re: [PATCH] PR117350: Keep assembler name for abstract decls for autofdo
The patch looks good to me. -Original Message- From: Richard Biener Sent: Wednesday, November 6, 2024 12:01 AM To: Andi Kleen Cc: Jason Merrill ; Andi Kleen ; gcc-patches@gcc.gnu.org; Eugene Rozenfeld ; pins...@gmail.com; Andi Kleen Subject: [EXTERNAL] Re: [PATCH] PR117350: Keep assembler name for abstract decls for autofdo On Tue, Nov 5, 2024 at 6:08 PM Andi Kleen wrote: > > On Tue, Nov 05, 2024 at 09:47:17AM +0100, Richard Biener wrote: > > On Tue, Nov 5, 2024 at 2:02 AM Jason Merrill wrote: > > > > > > On 10/31/24 4:40 PM, Andi Kleen wrote: > > > > From: Andi Kleen > > > > > > > > autofdo looks up inline stacks and tries to match them with the > > > > profile data using their symbol name. Make sure all decls that > > > > can be in a inline stack have a valid assembler name. > > > > > > > > This fixes a bootstrap problem with autoprofiledbootstrap and LTO. > > > > > > OK in a week if no other comments. > > > > Hmm, but DECL_ABSTRACT_P should be only set on entities that generate no > > code. > > > > How does autofdo look them up? Are you sure it's the abstract decl > > autofdo wants to lookup? Or is autofdo processing not serializing > > the compilation and thus it affects code generation on parts that > > have not been processed yet? > > > autofdo tries to match inlines to an inline stack derived from dwarf. > So if something appears in dwarf it has to be in its stack. For the > test case the abstract entity is in the dwarf stack. > > For the inside gcc lookup it walks the BLOCK_SUPERCONTEXT links and > looks at BLOCK_ABSTRACT_ORIGIN and ignores everything that has unknown > location > > Maybe there could be some filtering there, but it would need to be on > both sides, which would be a version break for the file format. Ah, OK - I wasn't aware it uses/requires debug info. It's still a bit odd that only the abstract decl has the correct symbol name. If it's abstract shouldn't it have the same symbol name as an actual entry? Maybe the symbol name noted in the debug info is wrong? I think the only DECL_ABSTRACT_P decls we have are from C++ CTOR/DTOR stuff. In any case, I withdraw my objection. Thanks for explaining, Richard. > -Andi >
RE: [EXTERNAL] [PATCH] Update gcc-auto-profile / gen_autofdo_event.py
The patch looks good to me. Thank you for fixing this, Andi. -Original Message- From: Andi Kleen Sent: Thursday, October 31, 2024 4:37 PM To: gcc-patches@gcc.gnu.org Cc: Eugene Rozenfeld ; Andi Kleen Subject: [EXTERNAL] [PATCH] Update gcc-auto-profile / gen_autofdo_event.py From: Andi Kleen - Fix warnings with newer python versions about bad escapes by making all the python string raw. - Add a fallback for using the builtin perf event list if the CPU model number is unknown. - Regenerate the shipped gcc-auto-profile with the changes. contrib/ChangeLog: * gen_autofdo_event.py: Convert strings to raw. Add fallback to using builtin perf event list. gcc/ChangeLog: * config/i386/gcc-auto-profile: Regenerate. --- contrib/gen_autofdo_event.py | 36 ++-- gcc/config/i386/gcc-auto-profile | 21 --- 2 files changed, 34 insertions(+), 23 deletions(-) diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py index 4c201943b5c7..4e58a5320fff 100755 --- a/contrib/gen_autofdo_event.py +++ b/contrib/gen_autofdo_event.py @@ -112,7 +112,7 @@ for j in u: u.close() if args.script: -print('''#!/bin/sh +print(r'''#!/bin/sh # Profile workload for gcc profile feedback (autofdo) using Linux perf. # Auto generated. To regenerate for new CPUs run # contrib/gen_autofdo_event.py --script --all in gcc source @@ -152,22 +152,26 @@ case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo && for event, mod in eventmap.items(): for m in mod[:-1]: print("model*:\ %s|\\" % m) -print('model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event)) -print('''*) +print(r'model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event)) +print(r'''*) +if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; then +E=br_inst_retired.near_taken:p +else echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script." - exit 1 ;;''') -print("esac") -print("set -x") -print('if ! perf record -e $E -b "$@" ; then') -print(' # PEBS may not actually be working even if the processor supports it') -print(' # (e.g., in a virtual machine). Trying to run without /p.') -print(' set +x') -print(' echo >&2 "Retrying without /p."') -print(' E="$(echo "${E}" | sed -e \'s/\/p/\//\')"') -print(' set -x') -print(' exec perf record -e $E -b "$@"') -print(' set +x') -print('fi') + exit 1 +fi ;;''') +print(r"esac") +print(r"set -x") +print(r'if ! perf record -e $E -b "$@" ; then') +print(r' # PEBS may not actually be working even if the processor supports it') +print(r' # (e.g., in a virtual machine). Trying to run without /p.') +print(r' set +x') +print(r' echo >&2 "Retrying without /p."') +print(r' E="$(echo "${E}" | sed -e \'s/\/p/\//\ -e s/:p//)"') +print(r' set -x') +print(r' exec perf record -e $E -b "$@"') +print(r' set +x') +print(r'fi') if cpufound == 0 and not args.all: sys.exit('CPU %s not found' % cpu) diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile index 04f7d35dcc51..528b34e42400 100755 --- a/gcc/config/i386/gcc-auto-profile +++ b/gcc/config/i386/gcc-auto-profile @@ -82,17 +82,24 @@ model*:\ 126|\ model*:\ 167|\ model*:\ 140|\ model*:\ 141|\ -model*:\ 143|\ -model*:\ 207|\ model*:\ 106|\ -model*:\ 108) E="cpu/event=0xc4,umask=0x20/p$FLAGS" ;; +model*:\ 108|\ +model*:\ 173|\ +model*:\ 174) E="cpu/event=0xc4,umask=0x20/$FLAGS" ;; model*:\ 134|\ model*:\ 150|\ -model*:\ 156|\ -model*:\ 190) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;; +model*:\ 156) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;; model*:\ 143|\ +model*:\ 207) E="cpu/event=0xc4,umask=0x20/p$FLAGS" ;; model*:\ 190) +E="cpu/event=0xc4,umask=0xc0/$FLAGS" ;; model*:\ 190) +E="cpu/event=0xc4,umask=0xfe/$FLAGS" ;; *) +if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; then +E=br_inst_retired.near_taken:p +else echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script." - exit 1 ;; + exit 1 +fi ;; esac set -x if ! perf record -e $E -b "$@" ; then @@ -100,7 +107,7 @@ if ! perf record -e $E -b "$@" ; then # (e.g., in a virtual machine). Trying to run without /p. set +x echo >&2 "Retrying without /p." - E="$(echo "${E}" | sed -e 's/\/p/\//')" + E="$(echo "${E}" | sed -e \'s/\/p/\//\ -e s/:p//)" set -x exec perf record -e $E -b "$@" set +x -- 2.46.2
[PATCH] Fix setting of call graph node AutoFDO count [PR116743]
We are initializing both the call graph node count and the entry block count of the function with the head_count value from the profile. Count propagation algorithm may refine the entry block count and we may end up with a case where the call graph node count is set to 0 but the entry block count is non-zero. That becomes a problem because we have this code in execute_fixup_cfg: profile_count num = node->count; profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count; bool scale = num.initialized_p () && !(num == den); Here if num is 0 but den is not 0, scale becomes true and we lose the counts in if (scale) bb->count = bb->count.apply_scale (num, den); This is what happened the issue reported in PR116743 (a 10% regression in MySQL HAMMERDB tests). 3d9e6767939e9658260e2506e81ec32b37cba041 made an improvement in AutoFDO count propagation, which caused the mismatch between the call graph node count (zero) and the entry block count (non-zero) and subsequent loss of counts as described above. The fix is to update the call graph node count once we've done count propagation. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: PR gcov-profile/116743 * auto-profile.c (afdo_annotate_cfg): Fix mismatch between the call graph node count and the entry block count. --- gcc/auto-profile.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index 5d0e8afb9a1..aa4d1634f01 100644 --- a/gcc/auto-profile.cc +++ b/gcc/auto-profile.cc @@ -1538,8 +1538,6 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts) if (s == NULL) return; - cgraph_node::get (current_function_decl)->count - = profile_count::from_gcov_type (s->head_count ()).afdo (); ENTRY_BLOCK_PTR_FOR_FN (cfun)->count = profile_count::from_gcov_type (s->head_count ()).afdo (); EXIT_BLOCK_PTR_FOR_FN (cfun)->count = profile_count::zero ().afdo (); @@ -1578,6 +1576,8 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts) /* Calculate, propagate count and probability information on CFG. */ afdo_calculate_branch_prob (&annotated_bb); } + cgraph_node::get(current_function_decl)->count + = ENTRY_BLOCK_PTR_FOR_FN(cfun)->count; update_max_bb_count (); profile_status_for_fn (cfun) = PROFILE_READ; if (flag_value_profile_transformations) -- 2.34.1
RE: [EXTERNAL] Re: [PATCH] Fix setting of call graph node AutoFDO count [PR116743]
I committed the patch to trunk. Is it ok to backport to gcc-12, gcc-13, and gcc-14? -Original Message- From: Richard Biener Sent: Monday, January 13, 2025 11:22 PM To: Eugene Rozenfeld Cc: GCC-Patches-ML ; Jan Hubicka ; rvmal...@amazon.com Subject: [EXTERNAL] Re: [PATCH] Fix setting of call graph node AutoFDO count [PR116743] On Mon, Jan 13, 2025 at 10:47 PM Eugene Rozenfeld wrote: > > We are initializing both the call graph node count and > > the entry block count of the function with the head_count value > > from the profile. > > > > Count propagation algorithm may refine the entry block count > > and we may end up with a case where the call graph node count > > is set to 0 but the entry block count is non-zero. That becomes > > a problem because we have this code in execute_fixup_cfg: > > > > profile_count num = node->count; > > profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count; > > bool scale = num.initialized_p () && !(num == den); > > > > Here if num is 0 but den is not 0, scale becomes true and we > > lose the counts in > > > > if (scale) > > bb->count = bb->count.apply_scale (num, den); > > > > This is what happened the issue reported in PR116743 > > (a 10% regression in MySQL HAMMERDB tests). > > 3d9e6767939e9658260e2506e81ec32b37cba041 made an improvement in > > AutoFDO count propagation, which caused the mismatch between > > the call graph node count (zero) and the entry block count (non-zero) > > and subsequent loss of counts as described above. > > > > The fix is to update the call graph node count once we've done count > propagation. > > > > Tested on x86_64-pc-linux-gnu. OK. Thanks, Richard. > > > gcc/ChangeLog: > > PR gcov-profile/116743 > > * auto-profile.c (afdo_annotate_cfg): Fix mismatch > between the call graph node count > > and the entry block count. > > --- > > gcc/auto-profile.cc | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc > > index 5d0e8afb9a1..aa4d1634f01 100644 > > --- a/gcc/auto-profile.cc > > +++ b/gcc/auto-profile.cc > > @@ -1538,8 +1538,6 @@ afdo_annotate_cfg (const stmt_set > &promoted_stmts) > >if (s == NULL) > > return; > > - cgraph_node::get (current_function_decl)->count > > - = profile_count::from_gcov_type (s->head_count ()).afdo (); > >ENTRY_BLOCK_PTR_FOR_FN (cfun)->count > > = profile_count::from_gcov_type (s->head_count ()).afdo (); > >EXIT_BLOCK_PTR_FOR_FN (cfun)->count = profile_count::zero ().afdo > (); > > @@ -1578,6 +1576,8 @@ afdo_annotate_cfg (const stmt_set > &promoted_stmts) > >/* Calculate, propagate count and probability information on > CFG. */ > >afdo_calculate_branch_prob (&annotated_bb); > > } > > + cgraph_node::get(current_function_decl)->count > > + = ENTRY_BLOCK_PTR_FOR_FN(cfun)->count; > >update_max_bb_count (); > >profile_status_for_fn (cfun) = PROFILE_READ; > >if (flag_value_profile_transformations) > > -- > > 2.34.1 > >
RE: [EXTERNAL] [PATCH] Update perf auto profile script
Ok for trunk. Thank you for updating this! Eugene -Original Message- From: Gcc-patches On Behalf Of Andi Kleen via Gcc-patches Sent: Tuesday, May 30, 2023 4:08 AM To: gcc-patches@gcc.gnu.org Cc: Andi Kleen Subject: [EXTERNAL] [PATCH] Update perf auto profile script - Fix gen_autofdo_event: The download URL for the Intel Perfmon Event list has changed, as well as the JSON format. Also it now uses pattern matching to match CPUs. Update the script to support all of this. - Regenerate gcc-auto-profile with the latest published Intel model numbers, so it works with recent systems. - So far it's still broken on hybrid systems --- contrib/gen_autofdo_event.py | 7 --- gcc/config/i386/gcc-auto-profile | 9 - 2 files changed, 12 insertions(+), 4 deletions(-) diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py index ac23b83888db..533c706c090b 100755 --- a/contrib/gen_autofdo_event.py +++ b/contrib/gen_autofdo_event.py @@ -32,8 +32,9 @@ import json import argparse import collections import os +import fnmatch -baseurl = "https://download.01.org/perfmon"; +baseurl = "https://raw.githubusercontent.com/intel/perfmon/main"; target_events = ('BR_INST_RETIRED.NEAR_TAKEN', 'BR_INST_EXEC.TAKEN', @@ -74,7 +75,7 @@ def get_cpustr(): def find_event(eventurl, model): print("Downloading", eventurl, file = sys.stderr) u = urllib.request.urlopen(eventurl) -events = json.loads(u.read()) +events = json.loads(u.read())["Events"] u.close() found = 0 @@ -102,7 +103,7 @@ found = 0 cpufound = 0 for j in u: n = j.rstrip().decode().split(',') -if len(n) >= 4 and (args.all or n[0] == cpu) and n[3] == "core": +if len(n) >= 4 and (args.all or fnmatch.fnmatch(cpu, n[0])) and n[3] == "core": components = n[0].split("-") model = components[2] model = int(model, 16) diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile index 5ab224b041b9..04f7d35dcc51 100755 --- a/gcc/config/i386/gcc-auto-profile +++ b/gcc/config/i386/gcc-auto-profile @@ -43,8 +43,10 @@ model*:\ 47|\ model*:\ 37|\ model*:\ 44) E="cpu/event=0x88,umask=0x40/$FLAGS" ;; model*:\ 55|\ +model*:\ 74|\ model*:\ 77|\ model*:\ 76|\ +model*:\ 90|\ model*:\ 92|\ model*:\ 95|\ model*:\ 87|\ @@ -75,14 +77,19 @@ model*:\ 165|\ model*:\ 166|\ model*:\ 85|\ model*:\ 85) E="cpu/event=0xC4,umask=0x20/p$FLAGS" ;; +model*:\ 125|\ model*:\ 126|\ +model*:\ 167|\ model*:\ 140|\ model*:\ 141|\ model*:\ 143|\ +model*:\ 207|\ model*:\ 106|\ model*:\ 108) E="cpu/event=0xc4,umask=0x20/p$FLAGS" ;; model*:\ 134|\ -model*:\ 150) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;; +model*:\ 150|\ +model*:\ 156|\ +model*:\ 190) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;; *) echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script." exit 1 ;; -- 2.40.1
RE: [PING][PATCH] Add instruction level discriminator support.
One more ping for this patch https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596065.html CC Jason since this changes discriminators emitted in dwarf. Thanks, Eugene -Original Message- From: Eugene Rozenfeld Sent: Monday, June 27, 2022 12:45 PM To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan Hubicka Subject: RE: [PING][PATCH] Add instruction level discriminator support. Another ping for https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596065.html . I got a review from Andi (https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596549.html) but I also need a review from someone who can approve the changes. Thanks, Eugene -Original Message- From: Eugene Rozenfeld Sent: Friday, June 10, 2022 12:03 PM To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan Hubicka Subject: [PING][PATCH] Add instruction level discriminator support. Hello, I'd like to ping this patch: https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596065.html Thanks, Eugene -Original Message- From: Gcc-patches On Behalf Of Eugene Rozenfeld via Gcc-patches Sent: Thursday, June 02, 2022 12:22 AM To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan Hubicka Subject: [EXTERNAL] [PATCH] Add instruction level discriminator support. This is the first in a series of patches to enable discriminator support in AutoFDO. This patch switches to tracking discriminators per statement/instruction instead of per basic block. Tracking per basic block was problematic since not all statements in a basic block needed a discriminator and, also, later optimizations could move statements between basic blocks making correlation during AutoFDO compilation unreliable. Tracking per statement also allows us to assign different discriminators to multiple function calls in the same basic block. A subsequent patch will add that support. The idea of this patch is based on commit 4c311d95cf6d9519c3c20f641cc77af7df491fdf by Dehao Chen in vendors/google/heads/gcc-4_8 but uses a slightly different approach. In Dehao's work special (normally unused) location ids and side tables were used to keep track of locations with discriminators. Things have changed since then and I don't think we have unused location ids anymore. Instead, I made discriminators a part of ad-hoc locations. The difference from Dehao's work also includes support for discriminator reading/writing in lto streaming and in modules. Tested on x86_64-pc-linux-gnu. 0001-Add-instruction-level-discriminator-support.patch Description: 0001-Add-instruction-level-discriminator-support.patch
[committed] MAINTAINERS: Add myself as AutoFDO maintainer
ChangeLog: * MAINTAINERS: Add myself as AutoFDO maintainer. --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 1a37f4419b9..02ced0c43aa 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -239,6 +239,7 @@ tree-ssaAndrew MacLeod tree browser/unparser Sebastian Pop scev, data dependence Sebastian Pop profile feedback Jan Hubicka +AutoFDOEugene Rozenfeld reload Ulrich Weigand RTL optimizers Eric Botcazou instruction combiner Segher Boessenkool @@ -603,7 +604,6 @@ Craig Rodrigues Erven Rohou Ira Rosen Yvan Roux -Eugene Rozenfeld Silvius Rus Matthew Sachs Ankur Saini -- 2.25.1
RE: [EXTERNAL] [PATCH] contrib: modernize gen_autofdo_event.py
The changes look good to me. Also adding Andi, the author of the script. Eugene -Original Message- From: Gcc-patches On Behalf Of Xi Ruoyao via Gcc-patches Sent: Sunday, June 26, 2022 11:15 PM To: gcc-patches@gcc.gnu.org Subject: [EXTERNAL] [PATCH] contrib: modernize gen_autofdo_event.py Python 2 has been EOL'ed for two years. egrep has been deprecated for many years and the next grep release will start to print warning if it is used. -E option may be unsupported by some non-POSIX grep implementations, but gcc-auto-profile won't work on non-Linux systems anyway. contrib/ChangeLog: * gen_autofdo_event.py: Port to Python 3, and use grep -E instead of egrep. gcc/ChangeLog: * config/i386/gcc-auto-profile: Regenerate. --- contrib/gen_autofdo_event.py | 80 gcc/config/i386/gcc-auto-profile | 31 +++-- 2 files changed, 57 insertions(+), 54 deletions(-) diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py index 1eb6f1d6d85..7da2876530d 100755 --- a/contrib/gen_autofdo_event.py +++ b/contrib/gen_autofdo_event.py @@ -1,4 +1,4 @@ -#!/usr/bin/python +#!/usr/bin/python3 # Generate Intel taken branches Linux perf event script for autofdo profiling. # Copyright (C) 2016 Free Software Foundation, Inc. @@ -26,18 +26,19 @@ # Requires internet (https) access. This may require setting up a proxy # with export https_proxy=... # -import urllib2 +import urllib.request import sys import json import argparse import collections +import os baseurl = "https://download.01.org/perfmon"; -target_events = (u'BR_INST_RETIRED.NEAR_TAKEN', - u'BR_INST_EXEC.TAKEN', - u'BR_INST_RETIRED.TAKEN_JCC', - u'BR_INST_TYPE_RETIRED.COND_TAKEN') +target_events = ('BR_INST_RETIRED.NEAR_TAKEN', + 'BR_INST_EXEC.TAKEN', + 'BR_INST_RETIRED.TAKEN_JCC', + 'BR_INST_TYPE_RETIRED.COND_TAKEN') ap = argparse.ArgumentParser() ap.add_argument('--all', '-a', help='Print for all CPUs', action='store_true') @@ -71,47 +72,46 @@ def get_cpustr(): return "%s-%d-%X" % tuple(cpu)[:3] def find_event(eventurl, model): -print >>sys.stderr, "Downloading", eventurl -u = urllib2.urlopen(eventurl) +print("Downloading", eventurl, file = sys.stderr) +u = urllib.request.urlopen(eventurl) events = json.loads(u.read()) u.close() found = 0 for j in events: -if j[u'EventName'] in target_events: -event = "cpu/event=%s,umask=%s/" % (j[u'EventCode'], j[u'UMask']) -if u'PEBS' in j and j[u'PEBS'] > 0: +if j['EventName'] in target_events: +event = "cpu/event=%s,umask=%s/" % (j['EventCode'], j['UMask']) +if 'PEBS' in j and int(j['PEBS']) > 0: event += "p" if args.script: eventmap[event].append(model) else: -print j[u'EventName'], "event for model", model, "is", event +print(j['EventName'], "event for model", model, "is", + event) found += 1 return found if not args.all: -cpu = get_cpu_str() +cpu = get_cpustr() if not cpu: sys.exit("Unknown CPU type") url = baseurl + "/mapfile.csv" -print >>sys.stderr, "Downloading", url -u = urllib2.urlopen(url) +print("Downloading", url, file = sys.stderr) u = +urllib.request.urlopen(url) found = 0 cpufound = 0 for j in u: -n = j.rstrip().split(',') +n = j.rstrip().decode().split(',') if len(n) >= 4 and (args.all or n[0] == cpu) and n[3] == "core": -if args.all: -components = n[0].split("-") -model = components[2] -model = int(model, 16) +components = n[0].split("-") +model = components[2] +model = int(model, 16) cpufound += 1 found += find_event(baseurl + n[2], model) u.close() if args.script: -print '''#!/bin/sh +print('''#!/bin/sh # Profile workload for gcc profile feedback (autofdo) using Linux perf. # Auto generated. To regenerate for new CPUs run # contrib/gen_autofdo_event.py --script --all in gcc source @@ -146,27 +146,27 @@ if grep -q hypervisor /proc/cpuinfo ; then echo >&2 "Warning: branch profiling may not be functional in VMs" fi -case `egrep -q "^cpu family\s*: 6" /proc/cpuinfo && - egrep "^model\s*:" /proc/cpuinfo | head -n1` in''' -for event, mod in eventmap.iteritems(): +case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo && + grep -E "^model\s*:" /proc/cpuinfo | head -n1` in''') +for event, mod in eventmap.items(): for m in mod[:-1]: -print "model*:\ %s|\\" % m -print 'model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event) -print '''*) +print("model*:\ %s|\\" % m) +print('model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event)) +print('''*) echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py
[PATCH] Fix collection and processing of autoprofile data for target libs
cc1, cc1plus, and lto built during STAGEautoprofile need to be built with debug info since they are used to build target libs. -gtoggle was turning off debug info for this stage. create_gcov should be passed prev-gcc/cc1, prev-gcc/cc1plus, and prev-gcc/lto instead of stage1-gcc/cc1, stage1-gcc/cc1plus, and stage1-gcc/lto when processing profile data collected while building target libraries. Tested on x86_64-pc-linux-gnu. ChangeLog: * Makefile.in: Remove -gtoggle for STAGEautoprofile * Makefile.tpl: Remove -gtoggle for STAGEautoprofile gcc/c/ChangeLog: * c/Make-lang.in: Pass correct stage cc1 when processing profile data collected while building target libraries gcc/cp/ChangeLog: * cp/Make-lang.in: Pass correct stage cc1plus when processing profile data collected while building target libraries gcc/lto/ChangeLog: * lto/Make-lang.in: Pass correct stage lto when processing profile data collected while building target libraries --- Makefile.in | 2 +- Makefile.tpl | 2 +- gcc/c/Make-lang.in | 4 ++-- gcc/cp/Make-lang.in | 4 ++-- gcc/lto/Make-lang.in | 2 +- 5 files changed, 7 insertions(+), 7 deletions(-) diff --git a/Makefile.in b/Makefile.in index b559454cc90..61e5faf550f 100644 --- a/Makefile.in +++ b/Makefile.in @@ -635,7 +635,7 @@ STAGEtrain_TFLAGS = $(filter-out -fchecking=1,$(STAGE3_TFLAGS)) STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use -fprofile-reproducible=parallel-runs STAGEfeedback_TFLAGS = $(STAGE4_TFLAGS) -STAGEautoprofile_CFLAGS = $(STAGE2_CFLAGS) -g +STAGEautoprofile_CFLAGS = $(filter-out -gtoggle,$(STAGE2_CFLAGS)) -g STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS) STAGEautofeedback_CFLAGS = $(STAGE3_CFLAGS) diff --git a/Makefile.tpl b/Makefile.tpl index 6bcee3021c9..3a5b7ed3c92 100644 --- a/Makefile.tpl +++ b/Makefile.tpl @@ -558,7 +558,7 @@ STAGEtrain_TFLAGS = $(filter-out -fchecking=1,$(STAGE3_TFLAGS)) STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use -fprofile-reproducible=parallel-runs STAGEfeedback_TFLAGS = $(STAGE4_TFLAGS) -STAGEautoprofile_CFLAGS = $(STAGE2_CFLAGS) -g +STAGEautoprofile_CFLAGS = $(filter-out -gtoggle,$(STAGE2_CFLAGS)) -g STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS) STAGEautofeedback_CFLAGS = $(STAGE3_CFLAGS) diff --git a/gcc/c/Make-lang.in b/gcc/c/Make-lang.in index 20840aceab6..79bc0dfd1cf 100644 --- a/gcc/c/Make-lang.in +++ b/gcc/c/Make-lang.in @@ -113,10 +113,10 @@ create_fdas_for_cc1: ../stage1-gcc/cc1$(exeext) ../prev-gcc/$(PERF_DATA) echo $$perf_path; \ if [ -f $$perf_path ]; then \ profile_name=cc1_$$component_in_prev_target.fda; \ - $(CREATE_GCOV) -binary ../stage1-gcc/cc1$(exeext) -gcov $$profile_name -profile $$perf_path -gcov_version 2; \ + $(CREATE_GCOV) -binary ../prev-gcc/cc1$(exeext) -gcov $$profile_name -profile $$perf_path -gcov_version 2; \ fi; \ done; -# +# # Build hooks: c.info: diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in index c08ee91447e..ba5e8766e99 100644 --- a/gcc/cp/Make-lang.in +++ b/gcc/cp/Make-lang.in @@ -211,10 +211,10 @@ create_fdas_for_cc1plus: ../stage1-gcc/cc1plus$(exeext) ../prev-gcc/$(PERF_DATA) echo $$perf_path; \ if [ -f $$perf_path ]; then \ profile_name=cc1plus_$$component_in_prev_target.fda; \ - $(CREATE_GCOV) -binary ../stage1-gcc/cc1plus$(exeext) -gcov $$profile_name -profile $$perf_path -gcov_version 2; \ + $(CREATE_GCOV) -binary ../prev-gcc/cc1plus$(exeext) -gcov $$profile_name -profile $$perf_path -gcov_version 2; \ fi; \ done; -# +# # Build hooks: c++.all.cross: g++-cross$(exeext) diff --git a/gcc/lto/Make-lang.in b/gcc/lto/Make-lang.in index 4f6025100a3..98aa9f4cc39 100644 --- a/gcc/lto/Make-lang.in +++ b/gcc/lto/Make-lang.in @@ -130,7 +130,7 @@ create_fdas_for_lto1: ../stage1-gcc/lto1$(exeext) ../prev-gcc/$(PERF_DATA) echo $$perf_path; \ if [ -f $$perf_path ]; then \ profile_name=lto1_$$component_in_prev_target.fda; \ - $(CREATE_GCOV) -binary ../stage1-gcc/lto1$(exeext) -gcov $$profile_name -profile $$perf_path -gcov_version 2; \ + $(CREATE_GCOV) -binary ../prev-gcc/lto1$(exeext) -gcov $$profile_name -profile $$perf_path -gcov_version 2; \ fi; \ done; -- 2.25.1
[PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap
When we collect just user events for autofdo with lbr we get some events where branch sources are kernel addresses and branch targets are user addresses. Without kernel MMAP events create_gcov can't make sense of kernel addresses. Currently create_gcov fails if it can't map at least 95% of events. We sometimes get below this threshold with just user events. The change is to collect both user events and kernel events. Tested on x86_64-pc-linux-gnu. ChangeLog: * Makefile.in: Collect both kernel and user events for autofdo * Makefile.tpl: Collect both kernel and user events for autofdo gcc/testsuite/ChangeLog: * lib/target-supports.exp: Collect both kernel and user events for autofdo --- Makefile.in | 2 +- Makefile.tpl | 2 +- gcc/testsuite/lib/target-supports.exp | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/Makefile.in b/Makefile.in index f19a9db621e..04307ca561b 100644 --- a/Makefile.in +++ b/Makefile.in @@ -404,7 +404,7 @@ MAKEINFO = @MAKEINFO@ EXPECT = @EXPECT@ RUNTEST = @RUNTEST@ -AUTO_PROFILE = gcc-auto-profile -c 1000 +AUTO_PROFILE = gcc-auto-profile --all -c 1000 # This just becomes part of the MAKEINFO definition passed down to # sub-makes. It lets flags be given on the command line while still diff --git a/Makefile.tpl b/Makefile.tpl index 3a5b7ed3c92..d0fe7e2fb77 100644 --- a/Makefile.tpl +++ b/Makefile.tpl @@ -407,7 +407,7 @@ MAKEINFO = @MAKEINFO@ EXPECT = @EXPECT@ RUNTEST = @RUNTEST@ -AUTO_PROFILE = gcc-auto-profile -c 1000 +AUTO_PROFILE = gcc-auto-profile --all -c 1000 # This just becomes part of the MAKEINFO definition passed down to # sub-makes. It lets flags be given on the command line while still diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 4d04df2a709..b16853d76df 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -704,7 +704,7 @@ proc check_effective_target_keeps_null_pointer_checks { } { # this allows parallelism of 16 and higher of parallel gcc-auto-profile proc profopt-perf-wrapper { } { global srcdir -return "$srcdir/../config/i386/gcc-auto-profile -m8 " +return "$srcdir/../config/i386/gcc-auto-profile --all -m8 " } # Return true if profiling is supported on the target. -- 2.25.1
RE: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap
I don't run this with elevated privileges but I set /proc/sys/kernel/kptr_restrict to 0. Setting that does require elevated privileges. If that's not acceptable, the only fix I can think of is to make that event mapping threshold percentage a parameter to create_gcov and pass something low enough. 80% instead of the current threshold of 95% should work, although it's a bit fragile. Eugene -Original Message- From: Sam James Sent: Friday, June 30, 2023 1:59 AM To: Richard Biener Cc: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org Subject: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap [You don't often get email from s...@gentoo.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] Richard Biener via Gcc-patches writes: > On Fri, Jun 30, 2023 at 7:28 AM Eugene Rozenfeld via Gcc-patches > wrote: >> >> When we collect just user events for autofdo with lbr we get some >> events where branch sources are kernel addresses and branch targets >> are user addresses. Without kernel MMAP events create_gcov can't make >> sense of kernel addresses. Currently create_gcov fails if it can't >> map at least 95% of events. We sometimes get below this threshold with just >> user events. The change is to collect both user events and kernel events. > > Does this require elevated privileges? Can we instead "fix" create_gcov here? Right, requiring privileges for this is going to be a no-go for a lot of builders. In a distro context, for example, it means we can't consider autofdo at all.
RE: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap
I also set /proc/sys/kernel/perf_event_paranoid to 1 instead of the default 2. -Original Message- From: Gcc-patches On Behalf Of Eugene Rozenfeld via Gcc-patches Sent: Friday, June 30, 2023 2:44 PM To: Sam James ; Richard Biener Cc: gcc-patches@gcc.gnu.org Subject: RE: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap I don't run this with elevated privileges but I set /proc/sys/kernel/kptr_restrict to 0. Setting that does require elevated privileges. If that's not acceptable, the only fix I can think of is to make that event mapping threshold percentage a parameter to create_gcov and pass something low enough. 80% instead of the current threshold of 95% should work, although it's a bit fragile. Eugene -Original Message- From: Sam James Sent: Friday, June 30, 2023 1:59 AM To: Richard Biener Cc: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org Subject: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap [You don't often get email from s...@gentoo.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] Richard Biener via Gcc-patches writes: > On Fri, Jun 30, 2023 at 7:28 AM Eugene Rozenfeld via Gcc-patches > wrote: >> >> When we collect just user events for autofdo with lbr we get some >> events where branch sources are kernel addresses and branch targets >> are user addresses. Without kernel MMAP events create_gcov can't make >> sense of kernel addresses. Currently create_gcov fails if it can't >> map at least 95% of events. We sometimes get below this threshold with just >> user events. The change is to collect both user events and kernel events. > > Does this require elevated privileges? Can we instead "fix" create_gcov here? Right, requiring privileges for this is going to be a no-go for a lot of builders. In a distro context, for example, it means we can't consider autofdo at all.
RE: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap
There is no warning and perf /uk succeeds when kptr_restrict is set to 1 and perf_event_paranoid set to 2. However, create_gcov may fail since it won't be able to understand kernel addresses and it requires at least 95% of events to be successfully mapped. If I set both kptr_restrict and perf_event_paranoid to 1, then I do get warnings from perf (but it still succeeds and exits with a 0 code). And, of course create_gcov will also fail to map some events since it won't understand kernel addresses. WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted, check /proc/sys/kernel/kptr_restrict and /proc/sys/kernel/perf_event_paranoid. Samples in kernel functions may not be resolved if a suitable vmlinux file is not found in the buildid cache or in the vmlinux path. Samples in kernel modules won't be resolved at all. If some relocation was applied (e.g. kexec) symbols may be misresolved even with a suitable vmlinux or kallsyms file. Couldn't record kernel reference relocation symbol Symbol resolution may be skewed if relocation was used (e.g. kexec). Check /proc/kallsyms permission or run as root. [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.037 MB /home/erozen/gcc1_objdir/gcc/testsuite/gcc/indir-call-prof.perf.data (86 samples) ] Eugene -Original Message- From: Richard Biener Sent: Monday, July 3, 2023 12:47 AM To: Eugene Rozenfeld Cc: Sam James ; gcc-patches@gcc.gnu.org Subject: Re: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap On Sat, Jul 1, 2023 at 12:05 AM Eugene Rozenfeld wrote: > > I also set /proc/sys/kernel/perf_event_paranoid to 1 instead of the default 2. Does the perf attempt fail when the privileges are not adjusted and you specify --all? I see it adds /uk as flags, when I do > perf record -e instructions//uk ./a.out it doesn't complain in any way with > cat /proc/sys/kernel/kptr_restrict 1 > cat /proc/sys/kernel/perf_event_paranoid 2 so in case the 'kernel' side is simply ignored when profiling there isn't permitted/possible then I guess the patch is OK? Can you confirm? Thanks, Richard. > -Original Message- > From: Gcc-patches > On Behalf Of > Eugene Rozenfeld via Gcc-patches > Sent: Friday, June 30, 2023 2:44 PM > To: Sam James ; Richard Biener > > Cc: gcc-patches@gcc.gnu.org > Subject: RE: [EXTERNAL] Re: [PATCH] Collect both user and kernel > events for autofdo tests and autoprofiledbootstrap > > I don't run this with elevated privileges but I set > /proc/sys/kernel/kptr_restrict to 0. Setting that does require elevated > privileges. > > If that's not acceptable, the only fix I can think of is to make that event > mapping threshold percentage a parameter to create_gcov and pass something > low enough. 80% instead of the current threshold of 95% should work, although > it's a bit fragile. > > Eugene > > -Original Message- > From: Sam James > Sent: Friday, June 30, 2023 1:59 AM > To: Richard Biener > Cc: Eugene Rozenfeld ; > gcc-patches@gcc.gnu.org > Subject: [EXTERNAL] Re: [PATCH] Collect both user and kernel events > for autofdo tests and autoprofiledbootstrap > > [You don't often get email from s...@gentoo.org. Learn why this is > important at https://aka.ms/LearnAboutSenderIdentification ] > > Richard Biener via Gcc-patches writes: > > > On Fri, Jun 30, 2023 at 7:28 AM Eugene Rozenfeld via Gcc-patches > > wrote: > >> > >> When we collect just user events for autofdo with lbr we get some > >> events where branch sources are kernel addresses and branch targets > >> are user addresses. Without kernel MMAP events create_gcov can't > >> make sense of kernel addresses. Currently create_gcov fails if it > >> can't map at least 95% of events. We sometimes get below this threshold > >> with just user events. The change is to collect both user events and > >> kernel events. > > > > Does this require elevated privileges? Can we instead "fix" create_gcov > > here? > > Right, requiring privileges for this is going to be a no-go for a lot of > builders. In a distro context, for example, it means we can't consider > autofdo at all.
RE: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build
Hi Jeff, I revived profile_merger tool in http://github.com/google/autofdo and re-worked the patch to merge profiles for compiling the libraries. Please take a look at the attached patch. Thanks, Eugene -Original Message- From: Jeff Law Sent: Tuesday, November 22, 2022 10:16 PM To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org; Andi Kleen Subject: Re: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build [You don't often get email from jeffreya...@gmail.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] On 11/22/22 14:20, Eugene Rozenfeld wrote: > I took another look at this. We actually collect perf data when building the > libraries. So, we have ./prev-gcc/perf.data, ./prev-libcpp/perf.data, > ./prev-libiberty/perf.data, etc. But when creating gcov data for > -fauto-profile build of cc1plus or cc1 we only use ./prev-gcc/perf.data . So, > a better solution would be either having a single perf.data for all builds > (gcc and libraries) or merging perf.data files before attempting > autostagefeedback. What would you recommend? ISTM that if neither approach loses data, then they're functionally equivalent -- meaning that we can select whichever is easier to wire into our build system. A single perf.data might serialize the build. So perhaps separate, then merge right before autostagefeedback. But I'm willing to go with whatever you think is best. Jeff 0001-Fix-autoprofiledbootstrap-build.patch Description: 0001-Fix-autoprofiledbootstrap-build.patch
RE: [EXTERNAL] Re: [PATCH][PUSHED] Fix cfg maintenance after inlining in AutoFDO
>> free_dominance_info (CDI_DOMINATORS); @@ -1674,7 +1677,7 @@ >> auto_profile (void) >> pop_cfun (); >>} >> >> - return TODO_rebuild_cgraph_edges; >> + return 0; >This change isn't mentioned - was it accidential? No, it wasn't accidental. There is no reason to return TODO_rebuild_cgraph_edges since we called cgraph_edge::rebuild_edges () after each early_inline (). Here is more context before that diff: todo |= early_inline (); autofdo::afdo_annotate_cfg (promoted_stmts); compute_function_frequency (); /* Local pure-const may imply need to fixup the cfg. */ todo |= execute_fixup_cfg (); if (todo & TODO_cleanup_cfg) cleanup_tree_cfg (); free_dominance_info (CDI_DOMINATORS); free_dominance_info (CDI_POST_DOMINATORS); cgraph_edge::rebuild_edges (); compute_fn_summary (cgraph_node::get (current_function_decl), true); pop_cfun (); } return 0; Thanks, Eugene -Original Message- From: Richard Biener Sent: Monday, May 8, 2023 11:40 PM To: Eugene Rozenfeld Cc: gcc-patches@gcc.gnu.org Subject: [EXTERNAL] Re: [PATCH][PUSHED] Fix cfg maintenance after inlining in AutoFDO On Tue, May 9, 2023 at 12:27 AM Eugene Rozenfeld via Gcc-patches wrote: > > Todo from early_inliner needs to be propagated so that > cleanup_tree_cfg () is called if necessary. > > This bug was causing an assert in get_loop_body during ipa-sra in > autoprofiledbootstrap build since loops weren't fixed up and one of > the loops had num_nodes set to 0. > > Tested on x86_64-pc-linux-gnu. > > gcc/ChangeLog: > > * auto-profile.cc (auto_profile): Check todo from early_inline > to see if cleanup_tree_vfg needs to be called. _cfg > (early_inline): Return todo from early_inliner. > --- > gcc/auto-profile.cc | 21 - > 1 file changed, 12 insertions(+), 9 deletions(-) > > diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index > 360c42c4b89..e3af3555e75 100644 > --- a/gcc/auto-profile.cc > +++ b/gcc/auto-profile.cc > @@ -1589,13 +1589,14 @@ afdo_annotate_cfg (const stmt_set > &promoted_stmts) > > /* Wrapper function to invoke early inliner. */ > > -static void > +static unsigned int > early_inline () > { >compute_fn_summary (cgraph_node::get (current_function_decl), > true); > - unsigned todo = early_inliner (cfun); > + unsigned int todo = early_inliner (cfun); >if (todo & TODO_update_ssa_any) > update_ssa (TODO_update_ssa); > + return todo; > } > > /* Use AutoFDO profile to annoate the control flow graph. > @@ -1651,20 +1652,22 @@ auto_profile (void) > function before annotation, so the profile inside bar@loc_foo2 > will be useful. */ > autofdo::stmt_set promoted_stmts; > +unsigned int todo = 0; > for (int i = 0; i < 10; i++) >{ > -if (!flag_value_profile_transformations > -|| !autofdo::afdo_vpt_for_early_inline (&promoted_stmts)) > - break; > -early_inline (); > + if (!flag_value_profile_transformations > + || !autofdo::afdo_vpt_for_early_inline (&promoted_stmts)) > + break; > + todo |= early_inline (); >} > > -early_inline (); > +todo |= early_inline (); > autofdo::afdo_annotate_cfg (promoted_stmts); > compute_function_frequency (); > > /* Local pure-const may imply need to fixup the cfg. */ > -if (execute_fixup_cfg () & TODO_cleanup_cfg) > +todo |= execute_fixup_cfg (); > +if (todo & TODO_cleanup_cfg) >cleanup_tree_cfg (); > > free_dominance_info (CDI_DOMINATORS); @@ -1674,7 +1677,7 @@ > auto_profile (void) > pop_cfun (); >} > > - return TODO_rebuild_cgraph_edges; > + return 0; This change isn't mentioned - was it accidential? Otherwise looks OK. Thanks, Richard. > } > } /* namespace autofdo. */ > > -- > 2.25.1 >
[PATCH] Fixes and workarounds for warnings during autoprofiledbootstrap build
autoprofiledbootstrap build produces new warnings since inlining decisions are different from other builds. This patch contains fixes and workarounds for those warnings. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * config/i386/i386-expand.cc (expand_vec_perm_interleave2): Work around -Wstringop-overflow false positive during autoprofiledbootstrap * ipa-devirt.cc (debug_tree_odr_name): Fix for -Wformat-overflow warning during autoprofiledbootstrap * lra-eliminations.cc (setup_can_eliminate): Work around -Wmaybe-uninitialized false positive during autoprofiledbootstrap * opts-common.cc (candidates_list_and_hint): Work around -Wstringop-overflow false positive during autoprofiledbootstrap * tree-ssa-ccp.cc (bit_value_unop): Work around -Wmaybe-uninitialized false positive during autoprofiledbootstrap * wide-int.h (wi::copy): Work around -Wmaybe-uninitialized false positive during autoprofiledbootstrap --- gcc/config/i386/i386-expand.cc | 11 +++ gcc/ipa-devirt.cc | 3 ++- gcc/lra-eliminations.cc| 11 +++ gcc/opts-common.cc | 1 + gcc/tree-ssa-ccp.cc| 11 +++ gcc/wide-int.h | 11 +++ 6 files changed, 47 insertions(+), 1 deletion(-) diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index 634fe61ba79..be9f912775b 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -20419,6 +20419,13 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d) static bool expand_vec_perm_interleave3 (struct expand_vec_perm_d *d); +/* Work around -Wstringop-overflow false positive during autoprofiledbootstrap. */ + +# if GCC_VERSION >= 7001 +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wstringop-overflow" +#endif + /* A subroutine of ix86_expand_vec_perm_const_1. Try to simplify a two vector permutation into a single vector permutation by using an interleave operation to merge the vectors. */ @@ -20737,6 +20744,10 @@ expand_vec_perm_interleave2 (struct expand_vec_perm_d *d) return true; } +# if GCC_VERSION >= 7001 +#pragma GCC diagnostic pop +#endif + /* A subroutine of ix86_expand_vec_perm_const_1. Try to simplify a single vector cross-lane permutation into vpermq followed by any of the single insn permutations. */ diff --git a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc index 819860258d1..36ea266e834 100644 --- a/gcc/ipa-devirt.cc +++ b/gcc/ipa-devirt.cc @@ -4033,7 +4033,8 @@ debug_tree_odr_name (tree type, bool demangle) odr = cplus_demangle (odr, opts); } - fprintf (stderr, "%s\n", odr); + if (odr != NULL) +fprintf (stderr, "%s\n", odr); } /* Register ODR enum so we later stream record about its values. */ diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc index 4220639..05e2a7e0d68 100644 --- a/gcc/lra-eliminations.cc +++ b/gcc/lra-eliminations.cc @@ -138,6 +138,13 @@ lra_debug_elim_table (void) print_elim_table (stderr); } +/* Work around -Wmaybe-uninitialized false positive during autoprofiledbootstrap. */ + +# if GCC_VERSION >= 4007 +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wmaybe-uninitialized" +#endif + /* Setup possibility of elimination in elimination table element EP to VALUE. Setup FRAME_POINTER_NEEDED if elimination from frame pointer to stack pointer is not possible anymore. */ @@ -152,6 +159,10 @@ setup_can_eliminate (class lra_elim_table *ep, bool value) REGNO_POINTER_ALIGN (HARD_FRAME_POINTER_REGNUM) = 0; } +# if GCC_VERSION >= 4007 +#pragma GCC diagnostic pop +#endif + /* Map: eliminable "from" register -> its current elimination, or NULL if none. The elimination table may contain more than one elimination for the same hard register, but this map specifies diff --git a/gcc/opts-common.cc b/gcc/opts-common.cc index 23ddcaa3b55..0bb8e34e2b0 100644 --- a/gcc/opts-common.cc +++ b/gcc/opts-common.cc @@ -1388,6 +1388,7 @@ candidates_list_and_hint (const char *arg, char *&str, p[len] = ' '; p += len + 1; } + gcc_assert(p > str); p[-1] = '\0'; return find_closest_string (arg, &candidates); } diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc index 03a984f2adf..a54e5a90464 100644 --- a/gcc/tree-ssa-ccp.cc +++ b/gcc/tree-ssa-ccp.cc @@ -1976,6 +1976,13 @@ bit_value_binop (enum tree_code code, signop sgn, int width, } } +/* Work around -Wmaybe-uninitialized false positive during autoprofiledbootstrap. */ + +# if GCC_VERSION >= 4007 +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wmaybe-uninitialized" +#endif + /* Return the propagation value when applying the operation CODE to the value RHS yielding type TYPE. */ @@ -2011,6 +2018,10 @@ bit_value_unop (enum tree_code code, tree type, tree rhs) return val; } +# if GCC_VERSION >= 4007 +#pragma GCC diagnostic pop +
RE: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings during autoprofiledbootstrap build
> I cannot find a call to this debug function on trunk. How exactly did this > trigger a warning? Here is the command during autoprofiledbootstrap build that resulted in a warning: ~/gcc1_objdir/gcc$ /home/erozen/gcc1_objdir/./prev-gcc/xg++ -B/home/erozen/gcc1_objdir/./prev-gcc/ -B/home/erozen/GCC1/x86_64-pc-linux-gnu/bin/ -nostdinc++ -B/home/erozen/gcc1_objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/home/erozen/gcc1_objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -I/home/erozen/gcc1_objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/home/erozen/gcc1_objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I/home/erozen/gcc1/libstdc++-v3/libsupc++ -L/home/erozen/gcc1_objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -L/home/erozen/gcc1_objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c -g -O2 -fchecking=1 -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -fauto-profile=cc1plus.fda -I. -I. -I/home/erozen/gcc1_objdir/../gcc1/gcc -I/home/erozen/gcc1_objdir/../gcc1/gcc/. -I/home/erozen/gcc1_objdir/../gcc1/gcc/../include -I/home/erozen/gcc1_objdir/../gcc1/gcc/../libcpp/include -I/home/erozen/gcc1_objdir/../gcc1/gcc/../libcody -I/home/erozen/gcc1_objdir/./gmp -I/home/erozen/gcc1/gmp -I/home/erozen/gcc1_objdir/./mpfr/src -I/home/erozen/gcc1/mpfr/src -I/home/erozen/gcc1/mpc/src -I/home/erozen/gcc1_objdir/../gcc1/gcc/../libdecnumber -I/home/erozen/gcc1_objdir/../gcc1/gcc/../libdecnumber/bid -I../libdecnumber -I/home/erozen/gcc1_objdir/../gcc1/gcc/../libbacktrace -I/home/erozen/gcc1_objdir/./isl/include -I/home/erozen/gcc1/isl/include -o ipa-devirt.o -MT ipa-devirt.o -MMD -MP -MF ./.deps/ipa-devirt.TPo /home/erozen/gcc1_objdir/../gcc1/gcc/ipa-devirt.cc /home/erozen/gcc1_objdir/../gcc1/gcc/ipa-devirt.cc: In function 'void debug_tree_odr_name(tree, bool)': /home/erozen/gcc1_objdir/../gcc1/gcc/ipa-devirt.cc:4037:23: error: '%s' directive argument is null [-Werror=format-overflow=] 4037 | fprintf (stderr, "%s\n", odr); > In any case, IMHO the function should rather print something that makes it > clear that an odr name could not be obtained rather than printing nothing. > I also think that if we want to handle the case, we should do it before also > possibly passing odr to demangler. I'll modify the fix unless we end up suppressing warnings as errors for autoprofiledbootstrap as Richard suggested. Thanks, Martin
RE: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings during autoprofiledbootstrap build
I'm ok with disabling warnings as errors for autoprofiledbootstrap. What's the proper way to do that? Searching for "--disable-werror" I see matches in lib configure files but not in gcc files. Thanks, Eugene -Original Message- From: Richard Biener Sent: Tuesday, May 9, 2023 11:40 PM To: Eugene Rozenfeld Cc: gcc-patches@gcc.gnu.org Subject: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings during autoprofiledbootstrap build On Wed, May 10, 2023 at 3:38 AM Eugene Rozenfeld via Gcc-patches wrote: > > autoprofiledbootstrap build produces new warnings since inlining > decisions are different from other builds. This patch contains fixes > and workarounds for those warnings. > > Tested on x86_64-pc-linux-gnu. Rather than this would it make sense to add --disable-werror to autoprofiledbootstrap configs like we do for others? I also wonder how "stable" the afdo bootstrap inlining decisions are, so applying these workarounds may not be sustainable? > gcc/ChangeLog: > > * config/i386/i386-expand.cc (expand_vec_perm_interleave2): Work > around > -Wstringop-overflow false positive during autoprofiledbootstrap > * ipa-devirt.cc (debug_tree_odr_name): Fix for -Wformat-overflow > warning during autoprofiledbootstrap > * lra-eliminations.cc (setup_can_eliminate): Work around > -Wmaybe-uninitialized false positive during autoprofiledbootstrap > * opts-common.cc (candidates_list_and_hint): Work around > -Wstringop-overflow false positive during autoprofiledbootstrap > * tree-ssa-ccp.cc (bit_value_unop): Work around -Wmaybe-uninitialized > false positive during autoprofiledbootstrap > * wide-int.h (wi::copy): Work around -Wmaybe-uninitialized false > positive during autoprofiledbootstrap > --- > gcc/config/i386/i386-expand.cc | 11 +++ > gcc/ipa-devirt.cc | 3 ++- > gcc/lra-eliminations.cc| 11 +++ > gcc/opts-common.cc | 1 + > gcc/tree-ssa-ccp.cc| 11 +++ > gcc/wide-int.h | 11 +++ > 6 files changed, 47 insertions(+), 1 deletion(-) > > diff --git a/gcc/config/i386/i386-expand.cc > b/gcc/config/i386/i386-expand.cc index 634fe61ba79..be9f912775b 100644 > --- a/gcc/config/i386/i386-expand.cc > +++ b/gcc/config/i386/i386-expand.cc > @@ -20419,6 +20419,13 @@ expand_vec_perm_pblendv (struct > expand_vec_perm_d *d) > > static bool expand_vec_perm_interleave3 (struct expand_vec_perm_d > *d); > > +/* Work around -Wstringop-overflow false positive during > +autoprofiledbootstrap. */ > + > +# if GCC_VERSION >= 7001 > +#pragma GCC diagnostic push > +#pragma GCC diagnostic ignored "-Wstringop-overflow" > +#endif > + > /* A subroutine of ix86_expand_vec_perm_const_1. Try to simplify > a two vector permutation into a single vector permutation by using > an interleave operation to merge the vectors. */ @@ -20737,6 > +20744,10 @@ expand_vec_perm_interleave2 (struct expand_vec_perm_d *d) >return true; > } > > +# if GCC_VERSION >= 7001 > +#pragma GCC diagnostic pop > +#endif > + > /* A subroutine of ix86_expand_vec_perm_const_1. Try to simplify > a single vector cross-lane permutation into vpermq followed > by any of the single insn permutations. */ diff --git > a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc index 819860258d1..36ea266e834 > 100644 > --- a/gcc/ipa-devirt.cc > +++ b/gcc/ipa-devirt.cc > @@ -4033,7 +4033,8 @@ debug_tree_odr_name (tree type, bool demangle) >odr = cplus_demangle (odr, opts); > } > > - fprintf (stderr, "%s\n", odr); > + if (odr != NULL) > +fprintf (stderr, "%s\n", odr); > } > > /* Register ODR enum so we later stream record about its values. */ > diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc index > 4220639..05e2a7e0d68 100644 > --- a/gcc/lra-eliminations.cc > +++ b/gcc/lra-eliminations.cc > @@ -138,6 +138,13 @@ lra_debug_elim_table (void) >print_elim_table (stderr); > } > > +/* Work around -Wmaybe-uninitialized false positive during > +autoprofiledbootstrap. */ > + > +# if GCC_VERSION >= 4007 > +#pragma GCC diagnostic push > +#pragma GCC diagnostic ignored "-Wmaybe-uninitialized" > +#endif > + > /* Setup possibility of elimination in elimination table element EP to > VALUE. Setup FRAME_POINTER_NEEDED if elimination from frame > pointer to stack pointer is not possible anymore. */ @@ -152,6 > +159,10 @@ setup_can_eliminate (class lra_elim_table *ep, bool value) > REGNO_POINTER_ALIGN (HARD_FRAME_POINTER_REGNUM) = 0; } > &g
RE: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings during autoprofiledbootstrap build
Thank you, Richard. I went with your suggestion. New patch: [PATCH] Disable warnings as errors for STAGEautofeedback. Compilation during STAGEautofeedback produces additional warnings since inlining decisions with -fauto-profile are different from other builds. This patches disables warnings as errors for STAGEautofeedback. Tested on x86_64-pc-linux-gnu. ChangeLog: * Makefile.in: Disable warnings as errors for STAGEautofeedback --- Makefile.in | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/Makefile.in b/Makefile.in index 33f3c862557..4c14c73ea61 100644 --- a/Makefile.in +++ b/Makefile.in @@ -590,8 +590,7 @@ STAGEautofeedback_CXXFLAGS = $(CXXFLAGS) STAGEautofeedback_CXXFLAGS = $(STAGEautofeedback_CFLAGS) @endif target-libstdc++-v3-bootstrap STAGEautofeedback_TFLAGS = $(STAGE_TFLAGS) -STAGEautofeedback_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS) - +STAGEautofeedback_CONFIGURE_FLAGS = $(filter-out --enable-werror-always,$(STAGE_CONFIGURE_FLAGS)) # By default, C and C++ are the only stage1 languages, because they are the # only ones we require to build with the bootstrap compiler, and also the -- 2.25.1 Thanks, Eugene -Original Message- From: Richard Biener Sent: Thursday, May 11, 2023 1:58 AM To: Eugene Rozenfeld Cc: gcc-patches@gcc.gnu.org Subject: Re: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings during autoprofiledbootstrap build On Thu, May 11, 2023 at 4:23 AM Eugene Rozenfeld wrote: > > I'm ok with disabling warnings as errors for autoprofiledbootstrap. What's > the proper way to do that? Searching for "--disable-werror" I see matches in > lib configure files but not in gcc files. We have --with-build-config selecting things like bootstrap-O3 and configure then disables werror by default if the build config is anything other than the default or bootstrap-debug. Of course profiledbootstrap and autoprofiledbootstrap are not build configs but make targets - that makes it more difficult (or impossible) to use the --disable-werror machinery here. There is STAGE_CONFIGURE_FLAGS=@stage2_werror_flag@ so it might be possible to filter out --enable-werror-always from STAGEautofeedback_CONFIGURE_FLAGS? Richard. > Thanks, > > Eugene > > -Original Message- > From: Richard Biener > Sent: Tuesday, May 9, 2023 11:40 PM > To: Eugene Rozenfeld > Cc: gcc-patches@gcc.gnu.org > Subject: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings > during autoprofiledbootstrap build > > On Wed, May 10, 2023 at 3:38 AM Eugene Rozenfeld via Gcc-patches > wrote: > > > > autoprofiledbootstrap build produces new warnings since inlining > > decisions are different from other builds. This patch contains fixes > > and workarounds for those warnings. > > > > Tested on x86_64-pc-linux-gnu. > > Rather than this would it make sense to add --disable-werror to > autoprofiledbootstrap configs like we do for others? I also wonder how > "stable" the afdo bootstrap inlining decisions are, so applying these > workarounds may not be sustainable? > > > gcc/ChangeLog: > > > > * config/i386/i386-expand.cc (expand_vec_perm_interleave2): Work > > around > > -Wstringop-overflow false positive during autoprofiledbootstrap > > * ipa-devirt.cc (debug_tree_odr_name): Fix for -Wformat-overflow > > warning during autoprofiledbootstrap > > * lra-eliminations.cc (setup_can_eliminate): Work around > > -Wmaybe-uninitialized false positive during autoprofiledbootstrap > > * opts-common.cc (candidates_list_and_hint): Work around > > -Wstringop-overflow false positive during autoprofiledbootstrap > > * tree-ssa-ccp.cc (bit_value_unop): Work around > > -Wmaybe-uninitialized > > false positive during autoprofiledbootstrap > > * wide-int.h (wi::copy): Work around -Wmaybe-uninitialized false > > positive during autoprofiledbootstrap > > --- > > gcc/config/i386/i386-expand.cc | 11 +++ > > gcc/ipa-devirt.cc | 3 ++- > > gcc/lra-eliminations.cc| 11 +++ > > gcc/opts-common.cc | 1 + > > gcc/tree-ssa-ccp.cc| 11 +++ > > gcc/wide-int.h | 11 +++ > > 6 files changed, 47 insertions(+), 1 deletion(-) > > > > diff --git a/gcc/config/i386/i386-expand.cc > > b/gcc/config/i386/i386-expand.cc index 634fe61ba79..be9f912775b > > 100644 > > --- a/gcc/config/i386/i386-expand.cc > > +++ b/gcc/config/i386/i386-expand.cc > > @@ -20419,6 +20419,13 @@ expand_vec_perm_pblendv (struct > > expand_vec_perm_d *d) > > > > static bool expand_vec_perm_interlea
RE: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings during autoprofiledbootstrap build
Thank you for catching this, Thomas! I modified Makefile.tmp and regenerated Makefile.in. Here is the patch I pushed: [PATCH] Disable warnings as errors for STAGEautofeedback. Compilation during STAGEautofeedback produces additional warnings since inlining decisions with -fauto-profile are different from other builds. This patches disables warnings as errors for STAGEautofeedback. Tested on x86_64-pc-linux-gnu. ChangeLog: * Makefile.tpl: Disable warnings as errors for STAGEautofeedback * Makefile.in: Regenerate --- Makefile.in | 8 +--- Makefile.tpl | 3 +++ 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/Makefile.in b/Makefile.in index a89bac02351..b559454cc90 100644 --- a/Makefile.in +++ b/Makefile.in @@ -590,9 +590,8 @@ STAGEautofeedback_CXXFLAGS = $(CXXFLAGS) STAGEautofeedback_CXXFLAGS = $(STAGEautofeedback_CFLAGS) @endif target-libstdc++-v3-bootstrap STAGEautofeedback_TFLAGS = $(STAGE_TFLAGS) -# Disable warnings as errors since inlining decisions with -fauto-profile -# may result in additional warnings. -STAGEautofeedback_CONFIGURE_FLAGS = $(filter-out --enable-werror-always,$(STAGE_CONFIGURE_FLAGS)) +STAGEautofeedback_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS) + # By default, C and C++ are the only stage1 languages, because they are the # only ones we require to build with the bootstrap compiler, and also the @@ -641,6 +640,9 @@ STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS) STAGEautofeedback_CFLAGS = $(STAGE3_CFLAGS) STAGEautofeedback_TFLAGS = $(STAGE3_TFLAGS) +# Disable warnings as errors since inlining decisions with -fauto-profile +# may result in additional warnings. +STAGEautofeedback_CONFIGURE_FLAGS = $(filter-out --enable-werror-always,$(STAGE_CONFIGURE_FLAGS)) do-compare = @do_compare@ do-compare3 = $(do-compare) diff --git a/Makefile.tpl b/Makefile.tpl index 9d8ef9cf678..6bcee3021c9 100644 --- a/Makefile.tpl +++ b/Makefile.tpl @@ -563,6 +563,9 @@ STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS) STAGEautofeedback_CFLAGS = $(STAGE3_CFLAGS) STAGEautofeedback_TFLAGS = $(STAGE3_TFLAGS) +# Disable warnings as errors since inlining decisions with -fauto-profile +# may result in additional warnings. +STAGEautofeedback_CONFIGURE_FLAGS = $(filter-out --enable-werror-always,$(STAGE_CONFIGURE_FLAGS)) do-compare = @do_compare@ do-compare3 = $(do-compare) -- 2.25.1 Eugene -Original Message- From: Thomas Schwinge Sent: Wednesday, May 17, 2023 12:05 AM To: Richard Biener ; Eugene Rozenfeld Cc: gcc-patches@gcc.gnu.org Subject: Re: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings during autoprofiledbootstrap build Hi! On 2023-05-15T09:30:35+0200, Richard Biener via Gcc-patches wrote: > On Fri, May 12, 2023 at 10:35 PM Eugene Rozenfeld > wrote: >> >> Thank you, Richard. I went with your suggestion. New patch: >> >> >> [PATCH] Disable warnings as errors for STAGEautofeedback. >> >> Compilation during STAGEautofeedback produces additional warnings >> since inlining decisions with -fauto-profile are different from other >> builds. >> >> This patches disables warnings as errors for STAGEautofeedback. > > Can you add a comment before the filtering? > > Otherwise looks good to me - please leave others 24h to comment before > you commit. >> --- a/Makefile.in >> +++ b/Makefile.in >> @@ -590,8 +590,7 @@ STAGEautofeedback_CXXFLAGS = $(CXXFLAGS) >> STAGEautofeedback_CXXFLAGS = $(STAGEautofeedback_CFLAGS) @endif >> target-libstdc++-v3-bootstrap STAGEautofeedback_TFLAGS = >> $(STAGE_TFLAGS) -STAGEautofeedback_CONFIGURE_FLAGS = >> $(STAGE_CONFIGURE_FLAGS) >> - >> +STAGEautofeedback_CONFIGURE_FLAGS = $(filter-out >> +--enable-werror-always,$(STAGE_CONFIGURE_FLAGS)) That's not how it works; the next person running 'autogen Makefile.def' to regenerate 'Makefile.in' is going to undo those changes. Instead, modify 'Makefile.def', 'Makefile.tpl', and then 'autogen Makefile.def'. Grüße Thomas >> -Original Message- >> From: Richard Biener >> Sent: Thursday, May 11, 2023 1:58 AM >> To: Eugene Rozenfeld >> Cc: gcc-patches@gcc.gnu.org >> Subject: Re: [EXTERNAL] Re: [PATCH] Fixes and workarounds for >> warnings during autoprofiledbootstrap build >> >> On Thu, May 11, 2023 at 4:23 AM Eugene Rozenfeld >> wrote: >> > >> > I'm ok with disabling warnings as errors for autoprofiledbootstrap. What's >> > the proper way to do that? Searching for "--disable-werror" I see matches >> > in lib configure files but not in gcc files. >> >> We have --with-build-config selecting things like bootstrap-O3 and configure >> then disables werror by default if the build config is anything other than
RE: [EXTERNAL] Re: [PATCH] gcov-profile/71672 Fix indirect call inlining with AutoFDO
Thank you for the reviews, Andy and Richard. I split up the patch into 4 commits and pushed to trunk. Eugene -Original Message- From: Richard Biener Sent: Monday, August 2, 2021 2:57 AM To: Eugene Rozenfeld Cc: gcc-patches@gcc.gnu.org; mli...@suse.cz; Andi Kleen Subject: [EXTERNAL] Re: [PATCH] gcov-profile/71672 Fix indirect call inlining with AutoFDO On Fri, Jul 30, 2021 at 9:09 AM Eugene Rozenfeld via Gcc-patches wrote: > > This patch has the following changes: > > 1. The main fix is in auto-profile.c: the histogram value for >indirect calls was incorrectly set up. That is fixed now. > > 2. Several tests now have -fdump-ipa-afdo-optimized instead of -fdump-ipa-afdo >in dg-options so that the expected output can be found. > > 3. I increased the number of iterations in several tests so that perf can have >enough sampling events. > > 4. indir-call-prof-2.c has -fno-early-inlining but AutoFDO can't work without >early inlining (it needs to match the inlining of the profiled binary). >I changed profopt.exp to always pass -fearly-inlining for AutoFDO. >With that the indirect call inlining in indir-call-prof-2.c happens in the > early inliner >so I changed the dg-final-use-autofdo. > > 5. create_gcov tool doesn't currently support dwarf 5 so I made a change in > profopt.exp >to pass -gdwarf-4 when compiling the binary to profile. > > 6. I updated the invocation of create_gcov in profopt.exp to pass > -gcov_version=2. >I recently made a change to create_gcov to support version 2: > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith > ub.com%2Fgoogle%2Fautofdo%2Fpull%2F117&data=04%7C01%7CEugene.Rozen > feld%40microsoft.com%7C92927d4029754d0d6b4708d9559be06d%7C72f988bf86f1 > 41af91ab2d7cd011db47%7C1%7C0%7C637634950245832767%7CUnknown%7CTWFpbGZs > b3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D > %7C1000&sdata=Ex1OpS0gt9dpsBVIK71k7hvjJbfIkN%2BlRr%2BYD86%2FqEs%3D > &reserved=0 > > 7. I removed useless -o perf.data from the invocation of gcc-auto-profile in >target-supports.exp. > > With these changes the tests checking indirect call inlining in gcc.dg > and g++.dg are passing. OK. Thanks, Richard. > gcc/ChangeLog: > PR gcov-profile/71672 > * auto-profile.c (afdo_indirect_call): Fix the setup of the > historgram value for indirect calls. > > gcc/testsuite/ChangeLog: > PR gcov-profile/71672 > * g++.dg/tree-prof/indir-call-prof.C: Fix options, increase the > number of iterations. > * g++.dg/tree-prof/morefunc.C: Fix options, increase the number of > iterations. > * g++.dg/tree-prof/reorder.C: Fix options, increase the number of > iterations. > * gcc.dg/tree-prof/indir-call-prof-2.c: Fix options, fix > dg-final-use-autofdo, increase the number of iterations. > * gcc.dg/tree-prof/indir-call-prof.c: Fix options. > * lib/profopt.exp: Pass gdwarf-4 when compiling binary to profile; > pass -fearly-inlining when compiling with AutoFDO; pass -gcov_version=2 to > create_gcov. > * lib/target-supports.exp: Remove unnecessary -o perf.data passed to > gcc-auto-profile. > --- > gcc/auto-profile.c | 13 + > gcc/testsuite/g++.dg/tree-prof/indir-call-prof.C | 4 ++-- > gcc/testsuite/g++.dg/tree-prof/morefunc.C | 7 --- > gcc/testsuite/g++.dg/tree-prof/reorder.C | 6 +++--- > gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c | 8 > gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c | 2 +- > gcc/testsuite/lib/profopt.exp | 6 +++--- > gcc/testsuite/lib/target-supports.exp | 2 +- > 8 files changed, 27 insertions(+), 21 deletions(-) > > diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c index > b23b82b2df4..4c1fc6b536b 100644 > --- a/gcc/auto-profile.c > +++ b/gcc/auto-profile.c > @@ -1009,13 +1009,18 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, > const icall_target_map &map, > >histogram_value hist = gimple_alloc_histogram_value ( >cfun, HIST_TYPE_INDIR_CALL, stmt, callee); > - hist->n_counters = 3; > + hist->n_counters = 4; >hist->hvalue.counters = XNEWVEC (gcov_type, hist->n_counters); >gimple_add_histogram_value (cfun, stmt, hist); > > - hist->hvalue.counters[0] = direct_call->profile_id; > - hist->hvalue.counters[1] = max_iter->second; > - hist->hvalue.counters[2] = total; > + // Total counter > + hist->hvalue.counters[0] = total; > + // Number of value/counter pairs > + hist->hvalue.counters[1] = 1; > + // Value > + hist->hvalue.
[PATCH][PUSHED] Fix for an AutoFDO test.
After https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=c17975d81aaed49ff759c20c68b31304a6953d58 the expected inlining in indir-call-prof-2.c test happens during afdo phase instead of einline. This patch adjusts the test accordingly. gcc/testsuite/ChangeLog: * gcc.dg/tree-prof/indir-call-prof-2.c: Fix dg-final-use-autofdo. --- gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c index 594c3f34d57..1d64d9f3f62 100644 --- a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c +++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c @@ -1,4 +1,4 @@ -/* { dg-options "-O2 -fno-early-inlining -fdump-ipa-profile-optimized -fdump-tree-einline-optimized" } */ +/* { dg-options "-O2 -fno-early-inlining -fdump-ipa-profile-optimized -fdump-ipa-afdo-optimized" } */ volatile int one; static int add1 (int val) @@ -31,5 +31,5 @@ main (void) } /* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* add1 .will resolve by ipa-profile" "profile"} } */ /* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* sub1 .will resolve by ipa-profile" "profile"} } */ -/* { dg-final-use-autofdo { scan-tree-dump "Inlining add1/1 into main/4." "einline"} } */ -/* { dg-final-use-autofdo { scan-tree-dump "Inlining sub1/2 into main/4." "einline"} } */ +/* { dg-final-use-autofdo { scan-ipa-dump "Inlining add1/1 into main/4." "afdo"} } */ +/* { dg-final-use-autofdo { scan-ipa-dump "Inlining sub1/2 into main/4." "afdo"} } */ -- 2.25.1
RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator support.
Hi Jason, Do you have any more feedback for this patch? Thanks, Eugene -Original Message- From: Eugene Rozenfeld Sent: Thursday, September 08, 2022 5:46 PM To: Jason Merrill ; gcc-patches@gcc.gnu.org Cc: Andi Kleen ; Jan Hubicka Subject: RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator support. Jason, Thank for your suggestion. The patch is updated (attached). Thanks, Eugene -Original Message- From: Jason Merrill Sent: Thursday, September 08, 2022 10:26 AM To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org Cc: Andi Kleen ; Jan Hubicka Subject: Re: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator support. On 9/1/22 18:22, Eugene Rozenfeld wrote: > Jason, > > I made another small change in addressing your feedback (attached). > > Thanks, > > Eugene > > -Original Message- > From: Gcc-patches > On Behalf Of > Eugene Rozenfeld via Gcc-patches > Sent: Thursday, September 01, 2022 1:49 PM > To: Jason Merrill ; gcc-patches@gcc.gnu.org > Cc: Andi Kleen ; Jan Hubicka > Subject: RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator > support. > > Jason, > > Thank you for your review. You are correct, we only need to check > has_discriminator for the statement that's on the same line. > I updated the patch (attached). > > Thanks, > > Eugene > > -Original Message- > From: Jason Merrill > Sent: Thursday, August 18, 2022 6:55 PM > To: Eugene Rozenfeld ; > gcc-patches@gcc.gnu.org > Cc: Andi Kleen ; Jan Hubicka > Subject: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator > support. > > On 8/3/22 17:25, Eugene Rozenfeld wrote: >> One more ping for this patch >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc. >> gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data= >> 0 >> 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da8 >> 1 >> 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C63796470932569195 >> 1 >> %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6 >> I >> k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2d >> T >> DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0 >> >> CC Jason since this changes discriminators emitted in dwarf. >> >> Thanks, >> >> Eugene >> >> -Original Message- >> From: Eugene Rozenfeld >> Sent: Monday, June 27, 2022 12:45 PM >> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan >> Hubicka >> Subject: RE: [PING][PATCH] Add instruction level discriminator support. >> >> Another ping for >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=05%7C01%7Ceugene.rozenfeld%40microsoft.com%7Cc7c9fab519184eee0bb808da91bf2fea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637982547579085499%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Kj3YWJrDqjX%2B0Ml3At5P8NRWc1Xs6mbI%2F1vCk9%2BLaQs%3D&reserved=0 >> . >> >> I got a review from Andi >> (https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596549.html&data=05%7C01%7Ceugene.rozenfeld%40microsoft.com%7Cc7c9fab519184eee0bb808da91bf2fea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637982547579085499%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=NBzFtD0mH7EYKxsVWfqZgSpQmUG18SIt01XRYUlwwW4%3D&reserved=0) >> but I also need a review from someone who can approve the changes. >> >> Thanks, >> >> Eugene >> >> -Original Message- >> From: Eugene Rozenfeld >> Sent: Friday, June 10, 2022 12:03 PM >> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan >> Hubicka >> Subject: [PING][PATCH] Add instruction level discriminator support. >> >> Hello, >> >> I'd like to ping this patch: >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc. >> gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data= >> 0 >> 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da8 >> 1 >> 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C63796470932569195 >> 1 >> %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6 >> I >> k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2d >> T >> DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0 >> >> Thanks, >> >> Eugene >> >> -Orig
[PATCH] Fix profile count comparison.
The comparison was incorrect when the counts weren't PRECISE. For example, crossmodule-indir-call-topn-1.c was failing with AutoFDO: when count_sum is 0 with quality AFDO, count_sum > profile_count::zero() evaluates to true. Taking that branch then leads to an assert in the call to to_sreal(). Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * ipa-cp.cc (good_cloning_opportunity_p): Fix profile count comparison. --- gcc/ipa-cp.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc index 543a9334e2c..66bba71c068 100644 --- a/gcc/ipa-cp.cc +++ b/gcc/ipa-cp.cc @@ -3338,9 +3338,9 @@ good_cloning_opportunity_p (struct cgraph_node *node, sreal time_benefit, ipa_node_params *info = ipa_node_params_sum->get (node); int eval_threshold = opt_for_fn (node->decl, param_ipa_cp_eval_threshold); - if (count_sum > profile_count::zero ()) + if (count_sum.nonzero_p ()) { - gcc_assert (base_count > profile_count::zero ()); + gcc_assert (base_count.nonzero_p ()); sreal factor = count_sum.probability_in (base_count).to_sreal (); sreal evaluation = (time_benefit * factor) / size_cost; evaluation = incorporate_penalties (node, info, evaluation); -- 2.25.1
[PATCH][PUSHED] Fix AutoFDO tests to not look for hot/cold splitting.
AutoFDO counts are not reliable and we are currently not performing hot/cold splitting based on them. This change adjusts several tree-prof tests not to check for hot/cold splitting when run with AutoFDO. gcc/testsuite/ChangeLog: * gcc.dg/tree-prof/cold_partition_label.c: Don't check for hot/cold splitting with AutoFDO. * gcc.dg/tree-prof/section-attr-1.c: Don't check for hot/cold splitting with AutoFDO. * gcc.dg/tree-prof/section-attr-2.c: Don't check for hot/cold splitting with AutoFDO. * gcc.dg/tree-prof/section-attr-3.c: Don't check for hot/cold splitting with AutoFDO. --- gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c | 4 ++-- gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c | 4 ++-- gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c | 4 ++-- gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c | 4 ++-- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c index 511b61067c0..b85e6c1f93d 100644 --- a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c +++ b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c @@ -43,6 +43,6 @@ main (int argc, char *argv[]) return 0; } -/* { dg-final-use { scan-assembler "foo\[._\]+cold" { target *-*-linux* *-*-gnu* } } } */ -/* { dg-final-use { scan-assembler "size\[ \ta-zA-Z0-0\]+foo\[._\]+cold" { target *-*-linux* *-*-gnu* } } } */ +/* { dg-final-use-not-autofdo { scan-assembler "foo\[._\]+cold" { target *-*-linux* *-*-gnu* } } } */ +/* { dg-final-use-not-autofdo { scan-assembler "size\[ \ta-zA-Z0-0\]+foo\[._\]+cold" { target *-*-linux* *-*-gnu* } } } */ /* { dg-final-use { scan-tree-dump-not "Invalid sum" "optimized"} } */ diff --git a/gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c b/gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c index 2087d0d2059..5376de14a2f 100644 --- a/gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c +++ b/gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c @@ -52,5 +52,5 @@ foo (int path) } } -/* { dg-final-use { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target *-*-linux* *-*-gnu* } } } */ -/* { dg-final-use { scan-assembler {.section[\t ]*__TEXT,__text_cold[^\n]*[\n\r]+_foo.cold:} { target *-*-darwin* } } } */ +/* { dg-final-use-not-autofdo { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target *-*-linux* *-*-gnu* } } } */ +/* { dg-final-use-not-autofdo { scan-assembler {.section[\t ]*__TEXT,__text_cold[^\n]*[\n\r]+_foo.cold:} { target *-*-darwin* } } } */ diff --git a/gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c b/gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c index b02526beaea..90de2c08ca4 100644 --- a/gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c +++ b/gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c @@ -51,5 +51,5 @@ foo (int path) } } -/* { dg-final-use { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target *-*-linux* *-*-gnu* } } } */ -/* { dg-final-use { scan-assembler {.section[\t ]*__TEXT,__text_cold[^\n]*[\n\r]+_foo.cold:} { target *-*-darwin* } } } */ +/* { dg-final-use-not-autofdo { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target *-*-linux* *-*-gnu* } } } */ +/* { dg-final-use-not-autofdo { scan-assembler {.section[\t ]*__TEXT,__text_cold[^\n]*[\n\r]+_foo.cold:} { target *-*-darwin* } } } */ diff --git a/gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c b/gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c index da064070653..29a48f05feb 100644 --- a/gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c +++ b/gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c @@ -52,5 +52,5 @@ foo (int path) } } -/* { dg-final-use { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target *-*-linux* *-*-gnu* } } } */ -/* { dg-final-use { scan-assembler {.section[\t ]*__TEXT,__text_cold[^\n]*[\n\r]+_foo.cold:} { target *-*-darwin* } } } */ +/* { dg-final-use-not-autofdo { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target *-*-linux* *-*-gnu* } } } */ +/* { dg-final-use-not-autofdo { scan-assembler {.section[\t ]*__TEXT,__text_cold[^\n]*[\n\r]+_foo.cold:} { target *-*-darwin* } } } */ -- 2.25.1
[PATCH] Emit discriminators for inlined call sites.
This change is based on commit 9fa26998a63d4b22b637ed8702520819e408a694 by Dehao Chen in vendors/google/heads/gcc-4_8. gcc/ChangeLog: * dwarf2out.cc (add_call_src_coords_attributes): Emit discriminators for inlined call sites. --- gcc/dwarf2out.cc | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc index 2df75904022..e81044b8c48 100644 --- a/gcc/dwarf2out.cc +++ b/gcc/dwarf2out.cc @@ -24783,7 +24783,8 @@ add_call_src_coords_attributes (tree stmt, dw_die_ref die) if (RESERVED_LOCATION_P (BLOCK_SOURCE_LOCATION (stmt))) return; - expanded_location s = expand_location (BLOCK_SOURCE_LOCATION (stmt)); + location_t locus = BLOCK_SOURCE_LOCATION (stmt); + expanded_location s = expand_location (locus); if (dwarf_version >= 3 || !dwarf_strict) { @@ -24791,6 +24792,9 @@ add_call_src_coords_attributes (tree stmt, dw_die_ref die) add_AT_unsigned (die, DW_AT_call_line, s.line); if (debug_column_info && s.column) add_AT_unsigned (die, DW_AT_call_column, s.column); + unsigned discr = get_discriminator_from_loc (locus); + if (discr != 0) + add_AT_unsigned (die, DW_AT_GNU_discriminator, discr); } } -- 2.25.1
[PATCH] Set discriminators for call stmts on the same line within the same basic block
This change is based on commit 1e6c4a7a8fb8e20545bb9f9032d3854f3f794c18 by Dehao Chen in vendors/google/heads/gcc-4_8. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * tree-cfg.cc (assign_discriminators): Set discriminators for call stmts on the same line within the same basic block. --- gcc/tree-cfg.cc | 31 +++ 1 file changed, 31 insertions(+) diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc index ade66c54499..8e2a3a5f6c6 100644 --- a/gcc/tree-cfg.cc +++ b/gcc/tree-cfg.cc @@ -1203,8 +1203,39 @@ assign_discriminators (void) { edge e; edge_iterator ei; + gimple_stmt_iterator gsi; gimple *last = last_stmt (bb); location_t locus = last ? gimple_location (last) : UNKNOWN_LOCATION; + location_t curr_locus = UNKNOWN_LOCATION; + int curr_discr = 0; + + /* Traverse the basic block, if two function calls within a basic block + are mapped to the same line, assign a new discriminator because a call + stmt could be a split point of a basic block. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + expanded_location curr_locus_e; + if (curr_locus == UNKNOWN_LOCATION) + { + curr_locus = gimple_location (stmt); + curr_locus_e = expand_location (curr_locus); + } + else if (!same_line_p (curr_locus, &curr_locus_e, gimple_location (stmt))) + { + curr_locus = gimple_location (stmt); + curr_locus_e = expand_location (curr_locus); + curr_discr = 0; + } + else if (curr_discr != 0) + { + gimple_set_location (stmt, location_with_discriminator ( + gimple_location (stmt), curr_discr)); + } + /* Allocate a new discriminator for CALL stmt. */ + if (gimple_code (stmt) == GIMPLE_CALL) + curr_discr = next_discriminator_for_locus (curr_locus); + } if (locus == UNKNOWN_LOCATION) continue; -- 2.25.1
Re: [PATCH] Set discriminators for call stmts on the same line within the same basic block
Thank you for the review Jason. I fixed formatting and updated the commit description: Call statements are possible split points of a basic block so they may end up in different basic blocks by the time pass_ipa_auto_profile executes. This change will also simplify call site lookups since now location with discriminator will uniquely identify the call site (no callee function name is needed). This change is based on commit 1e6c4a7a8fb8e20545bb9f9032d3854f3f794c18 by Dehao Chen in vendors/google/heads/gcc-4_8. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * tree-cfg.cc (assign_discriminators): Set discriminators for call stmts on the same line within the same basic block. --- gcc/tree-cfg.cc | 32 1 file changed, 32 insertions(+) diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc index ade66c54499..f6a465f4c91 100644 --- a/gcc/tree-cfg.cc +++ b/gcc/tree-cfg.cc @@ -1203,8 +1203,40 @@ assign_discriminators (void) { edge e; edge_iterator ei; + gimple_stmt_iterator gsi; gimple *last = last_stmt (bb); location_t locus = last ? gimple_location (last) : UNKNOWN_LOCATION; + location_t curr_locus = UNKNOWN_LOCATION; + int curr_discr = 0; + + /* Traverse the basic block, if two function calls within a basic block + are mapped to the same line, assign a new discriminator because a call + stmt could be a split point of a basic block. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + expanded_location curr_locus_e; + if (curr_locus == UNKNOWN_LOCATION) + { + curr_locus = gimple_location (stmt); + curr_locus_e = expand_location (curr_locus); + } + else if (!same_line_p (curr_locus, &curr_locus_e, gimple_location (stmt))) + { + curr_locus = gimple_location (stmt); + curr_locus_e = expand_location (curr_locus); + curr_discr = 0; + } + else if (curr_discr != 0) + { + location_t loc = gimple_location (stmt); + location_t dloc = location_with_discriminator (loc, curr_discr); + gimple_set_location (stmt, dloc); + } + /* Allocate a new discriminator for CALL stmt. */ + if (gimple_code (stmt) == GIMPLE_CALL) + curr_discr = next_discriminator_for_locus (curr_locus); + } if (locus == UNKNOWN_LOCATION) continue; -- 2.25.1 -Original Message- From: Jason Merrill Sent: Tuesday, October 04, 2022 3:21 PM To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org Subject: [EXTERNAL] Re: [PATCH] Set discriminators for call stmts on the same line within the same basic block On 10/3/22 02:08, Eugene Rozenfeld wrote: > This change is based on commit > 1e6c4a7a8fb8e20545bb9f9032d3854f3f794c18 > by Dehao Chen in vendors/google/heads/gcc-4_8. > > Tested on x86_64-pc-linux-gnu. Brief rationale for the change? > gcc/ChangeLog: > * tree-cfg.cc (assign_discriminators): Set discriminators for call > stmts > on the same line within the same basic block. > --- > gcc/tree-cfg.cc | 31 +++ > 1 file changed, 31 insertions(+) > > diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc index > ade66c54499..8e2a3a5f6c6 100644 > --- a/gcc/tree-cfg.cc > +++ b/gcc/tree-cfg.cc > @@ -1203,8 +1203,39 @@ assign_discriminators (void) > { > edge e; > edge_iterator ei; > + gimple_stmt_iterator gsi; > gimple *last = last_stmt (bb); > location_t locus = last ? gimple_location (last) : > UNKNOWN_LOCATION; > + location_t curr_locus = UNKNOWN_LOCATION; > + int curr_discr = 0; > + > + /* Traverse the basic block, if two function calls within a basic block > + are mapped to the same line, assign a new discriminator because a call > + stmt could be a split point of a basic block. */ > + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) > + { > + gimple *stmt = gsi_stmt (gsi); > + expanded_location curr_locus_e; > + if (curr_locus == UNKNOWN_LOCATION) > + { > + curr_locus = gimple_location (stmt); > + curr_locus_e = expand_location (curr_locus); > + } > + else if (!same_line_p (curr_locus, &curr_locus_e, gimple_location > (stmt))) > + { > + curr_locus = gimple_location (stmt); > + curr_locus_e = expand_location (curr_locus); > + curr_discr = 0; > + } > + else if (curr_discr != 0) > + { > + gimple_set_location (stmt, location_with_discrim
[PATCH][ICE] Fix for PR107193.
The bug was introduced in f30e9fd33e56a5a721346ea6140722e1b193db42. A variable (cur_locus_e) was incorrectly declared inside a loop. I also moved two other declarations (last and locus) down to make the code more clear. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: PR debug/107193 * tree-cfg.cc (assign_discriminators): Move declaration of cur_locus_e out of the loop. --- gcc/tree-cfg.cc | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc index 41f2925665f..ae781871a19 100644 --- a/gcc/tree-cfg.cc +++ b/gcc/tree-cfg.cc @@ -1204,9 +1204,8 @@ assign_discriminators (void) edge e; edge_iterator ei; gimple_stmt_iterator gsi; - gimple *last = last_stmt (bb); - location_t locus = last ? gimple_location (last) : UNKNOWN_LOCATION; location_t curr_locus = UNKNOWN_LOCATION; + expanded_location curr_locus_e = {}; int curr_discr = 0; /* Traverse the basic block, if two function calls within a basic block @@ -1215,7 +1214,7 @@ assign_discriminators (void) for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) { gimple *stmt = gsi_stmt (gsi); - expanded_location curr_locus_e; + if (curr_locus == UNKNOWN_LOCATION) { curr_locus = gimple_location (stmt); @@ -1238,6 +1237,8 @@ assign_discriminators (void) curr_discr = next_discriminator_for_locus (curr_locus); } + gimple *last = last_stmt (bb); + location_t locus = last ? gimple_location (last) : UNKNOWN_LOCATION; if (locus == UNKNOWN_LOCATION) continue; -- 2.25.1
Re: [EXTERNAL] Re: [PATCH] Set discriminators for call stmts on the same line within the same basic block
I sent a patch that fixes a bug introduced by this patch: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603203.html What you are seeing could have been caused by the same bug since it involves an uninitialized variable. Eugene On Oct 10, 2022, at 5:54 PM, David Edelsohn wrote: This patch causes a bootstrap comparison failure on AIX. It apparently does not cause a failure on PPC64BE Linux with the same ABI, so I suspect that the failure may be related to the way that function aliases are implemented on AIX, which doesn't have ELF symbol alias semantics. "This change will also simplify call site lookups since now location with discriminator will uniquely identify the call site (no callee function name is needed)." I will open a PR with more information about the comparison difference now that I have a work-around to bring AIX back to a bootstrappable state. Any thoughts about what could be going wrong? Thanks, David
[PATCH] AutoFDO: don't set param_early_inliner_max_iterations to 10.
param_early_inliner_max_iterations specifies the maximum number of nested indirect inlining iterations performed by early inliner. Normally, the default value is 1. For AutoFDO this parameter was also used as the number of iteration for its indirect call promotion loop and the default value was set to 10. While it makes sense to have 10 in the indirect call promotion loop (we want to make the IR match the profiled binary before actual annotation) there is no reason to have a special default value for the regular early inliner. This change removes the special AutoFDO default value setting for param_early_inliner_max_iterations while keeping 10 as the number of iterations for the AutoFDO indirect call promotion loop. This change improves a simple fibonacci benchmark in AutoFDO mode by 15% on x86_64-pc-linux-gnu. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * auto-profile.cc (auto_profile): Hard-code the number of iterations (10). gcc/ChangeLog: * opt.cc (common_handle_option): Don't set param_early_inliner_max_iterations to 10 for AutoFDO. --- gcc/auto-profile.cc | 3 +-- gcc/opts.cc | 2 -- 2 files changed, 1 insertion(+), 4 deletions(-) diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index 0bfaae7b091..c7cee639c85 100644 --- a/gcc/auto-profile.cc +++ b/gcc/auto-profile.cc @@ -1644,8 +1644,7 @@ auto_profile (void) function before annotation, so the profile inside bar@loc_foo2 will be useful. */ autofdo::stmt_set promoted_stmts; -for (int i = 0; i < opt_for_fn (node->decl, - param_early_inliner_max_iterations); i++) +for (int i = 0; i < 10; i++) { if (!flag_value_profile_transformations || !autofdo::afdo_vpt_for_early_inline (&promoted_stmts)) diff --git a/gcc/opts.cc b/gcc/opts.cc index 17e1884f0e4..f6f6a8e1709 100644 --- a/gcc/opts.cc +++ b/gcc/opts.cc @@ -2899,8 +2899,6 @@ common_handle_option (struct gcc_options *opts, case OPT_fauto_profile: enable_fdo_optimizations (opts, opts_set, value); SET_OPTION_IF_UNSET (opts, opts_set, flag_profile_correction, value); - SET_OPTION_IF_UNSET (opts, opts_set, - param_early_inliner_max_iterations, 10); break; case OPT_fprofile_generate_: -- 2.25.1
RE: [EXTERNAL] Re: [PATCH] AutoFDO: don't set param_early_inliner_max_iterations to 10.
Thank you for the review Richard. Is it ok to commit this change to trunk now or should I wait till GCC 13 development starts? This seems like a very low risk change. Thanks, Eugene -Original Message- From: Richard Biener Sent: Monday, January 31, 2022 1:23 AM To: Eugene Rozenfeld Cc: gcc-patches@gcc.gnu.org Subject: [EXTERNAL] Re: [PATCH] AutoFDO: don't set param_early_inliner_max_iterations to 10. On Sat, Jan 29, 2022 at 12:24 AM Eugene Rozenfeld via Gcc-patches wrote: > > param_early_inliner_max_iterations specifies the maximum number of > nested indirect inlining iterations performed by early inliner. > Normally, the default value is 1. > > For AutoFDO this parameter was also used as the number of iteration > for its indirect call promotion loop and the default value was set to 10. > While it makes sense to have 10 in the indirect call promotion loop > (we want to make the IR match the profiled binary before actual > annotation) there is no reason to have a special default value for the > regular early inliner. > > This change removes the special AutoFDO default value setting for > param_early_inliner_max_iterations while keeping 10 as the number of > iterations for the AutoFDO indirect call promotion loop. > > This change improves a simple fibonacci benchmark in AutoFDO mode by > 15% on x86_64-pc-linux-gnu. > > Tested on x86_64-pc-linux-gnu. OK. Richard. > gcc/ChangeLog: > * auto-profile.cc (auto_profile): Hard-code the number of iterations > (10). > > gcc/ChangeLog: > * opt.cc (common_handle_option): Don't set > param_early_inliner_max_iterations to 10 for AutoFDO. > --- > gcc/auto-profile.cc | 3 +-- > gcc/opts.cc | 2 -- > 2 files changed, 1 insertion(+), 4 deletions(-) > > diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index > 0bfaae7b091..c7cee639c85 100644 > --- a/gcc/auto-profile.cc > +++ b/gcc/auto-profile.cc > @@ -1644,8 +1644,7 @@ auto_profile (void) > function before annotation, so the profile inside bar@loc_foo2 > will be useful. */ > autofdo::stmt_set promoted_stmts; > -for (int i = 0; i < opt_for_fn (node->decl, > - param_early_inliner_max_iterations); i++) > +for (int i = 0; i < 10; i++) >{ > if (!flag_value_profile_transformations > || !autofdo::afdo_vpt_for_early_inline (&promoted_stmts)) > diff --git a/gcc/opts.cc b/gcc/opts.cc index 17e1884f0e4..f6f6a8e1709 > 100644 > --- a/gcc/opts.cc > +++ b/gcc/opts.cc > @@ -2899,8 +2899,6 @@ common_handle_option (struct gcc_options *opts, > case OPT_fauto_profile: >enable_fdo_optimizations (opts, opts_set, value); >SET_OPTION_IF_UNSET (opts, opts_set, flag_profile_correction, value); > - SET_OPTION_IF_UNSET (opts, opts_set, > - param_early_inliner_max_iterations, 10); >break; > > case OPT_fprofile_generate_: > -- > 2.25.1
[PATCH] AutoFDO: Don't try to promote indirect calls that result in recursive direct calls
AutoFDO tries to promote and inline all indirect calls that were promoted and inlined in the original binary and that are still hot. In the included test case, the promotion results in a direct call that is a recursive call. inline_call and optimize_inline_calls can't handle recursive calls at this stage. Currently, inline_call fails with a segmentation fault. This change leaves the indirect call alone if promotion will result in a recursive call. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * auto-profile.cc (afdo_indirect_call): Don't attempt to promote indirect calls that will result in direct recursive calls. gcc/testsuite/g++.dg/tree-prof/ChangeLog: * indir-call-recursive-inlining.C : New test. --- gcc/auto-profile.cc | 40 -- .../tree-prof/indir-call-recursive-inlining.C | 54 +++ 2 files changed, 78 insertions(+), 16 deletions(-) create mode 100644 gcc/testsuite/g++.dg/tree-prof/indir-call-recursive-inlining.C diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index c7cee639c85..2b34b80b82d 100644 --- a/gcc/auto-profile.cc +++ b/gcc/auto-profile.cc @@ -975,7 +975,7 @@ read_profile (void) * after annotation, we just need to mark, and let follow-up logic to decide if it needs to promote and inline. */ -static void +static bool afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map, bool transform) { @@ -983,12 +983,12 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map, tree callee; if (map.size () == 0) -return; +return false; gcall *stmt = dyn_cast (gs); if (!stmt || gimple_call_internal_p (stmt) || gimple_call_fndecl (stmt) != NULL_TREE) -return; +return false; gcov_type total = 0; icall_target_map::const_iterator max_iter = map.end (); @@ -1003,7 +1003,7 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map, struct cgraph_node *direct_call = cgraph_node::get_for_asmname ( get_identifier (afdo_string_table->get_name (max_iter->first))); if (direct_call == NULL || !direct_call->profile_id) -return; +return false; callee = gimple_call_fn (stmt); @@ -1013,20 +1013,27 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map, hist->hvalue.counters = XNEWVEC (gcov_type, hist->n_counters); gimple_add_histogram_value (cfun, stmt, hist); - // Total counter + /* Total counter */ hist->hvalue.counters[0] = total; - // Number of value/counter pairs + /* Number of value/counter pairs */ hist->hvalue.counters[1] = 1; - // Value + /* Value */ hist->hvalue.counters[2] = direct_call->profile_id; - // Counter + /* Counter */ hist->hvalue.counters[3] = max_iter->second; if (!transform) -return; +return false; + + cgraph_node* current_function_node = cgraph_node::get (current_function_decl); + + /* If the direct call is a recursive call, don't promote it since + we are not set up to inline recursive calls at this stage. */ + if (direct_call == current_function_node) +return false; struct cgraph_edge *indirect_edge - = cgraph_node::get (current_function_decl)->get_edge (stmt); + = current_function_node->get_edge (stmt); if (dump_file) { @@ -1040,13 +1047,13 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map, { if (dump_file) fprintf (dump_file, " not transforming\n"); - return; + return false; } if (DECL_STRUCT_FUNCTION (direct_call->decl) == NULL) { if (dump_file) fprintf (dump_file, " no declaration\n"); - return; + return false; } if (dump_file) @@ -1063,16 +1070,17 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map, cgraph_edge::redirect_call_stmt_to_callee (new_edge); gimple_remove_histogram_value (cfun, stmt, hist); inline_call (new_edge, true, NULL, NULL, false); + return true; } /* From AutoFDO profiles, find values inside STMT for that we want to measure histograms and adds them to list VALUES. */ -static void +static bool afdo_vpt (gimple_stmt_iterator *gsi, const icall_target_map &map, bool transform) { - afdo_indirect_call (gsi, map, transform); + return afdo_indirect_call (gsi, map, transform); } typedef std::set bb_set; @@ -1498,8 +1506,8 @@ afdo_vpt_for_early_inline (stmt_set *promoted_stmts) { /* Promote the indirect call and update the promoted_stmts. */ promoted_stmts->insert (stmt); -afdo_vpt (&gsi, info.targets, true); -has_vpt = true; +if (afdo_vpt (&gsi, info.targets, true)) + has_vpt = true; } } } diff --git a/gcc/testsuite/g++.dg/tree-prof/indir-call-recursive-inlining.C b/gcc/testsuite/g++.dg/tree-prof/indir-call-recursive-inlining.C new file mode
[PATCH] Improve AutoFDO count propagation algorithm
When a basic block A has been annotated with a count and it has only one successor (or predecessor) B, we can propagate the A's count to B. The algorithm without this change could leave B without an annotation if B had other unannotated predecessors (or successors). For example, in the test case I added, the loop header block was left unannotated, which prevented loop unrolling. gcc/ChangeLog: * auto-profile.c (afdo_propagate_edge): Improve count propagation algorithm. gcc/testsuite/ChangeLog: * gcc.dg/tree-prof/init-array.c: New test for unrolling inner loops. --- gcc/auto-profile.c | 20 +- gcc/testsuite/gcc.dg/tree-prof/init-array.c | 43 + 2 files changed, 61 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-prof/init-array.c diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c index 4c1fc6b536b..dfcd68113aa 100644 --- a/gcc/auto-profile.c +++ b/gcc/auto-profile.c @@ -1192,7 +1192,8 @@ afdo_find_equiv_class (bb_set *annotated_bb) /* If a basic block's count is known, and only one of its in/out edges' count is unknown, its count can be calculated. Meanwhile, if all of the in/out edges' counts are known, then the basic block's unknown count can also be - calculated. + calculated. Also, if a block has a single predecessor or successor, the block's + count can be propagated to that predecessor or successor. IS_SUCC is true if out edges of a basic blocks are examined. Update ANNOTATED_BB accordingly. Return TRUE if any basic block/edge count is changed. */ @@ -1208,6 +1209,7 @@ afdo_propagate_edge (bool is_succ, bb_set *annotated_bb) edge e, unknown_edge = NULL; edge_iterator ei; int num_unknown_edge = 0; +int num_edge = 0; profile_count total_known_count = profile_count::zero ().afdo (); FOR_EACH_EDGE (e, ei, is_succ ? bb->succs : bb->preds) @@ -1217,6 +1219,7 @@ afdo_propagate_edge (bool is_succ, bb_set *annotated_bb) num_unknown_edge++, unknown_edge = e; else total_known_count += AFDO_EINFO (e)->get_count (); + num_edge++; } /* Be careful not to annotate block with no successor in special cases. */ @@ -1230,7 +1233,20 @@ afdo_propagate_edge (bool is_succ, bb_set *annotated_bb) else if (num_unknown_edge == 1 && is_bb_annotated (bb, *annotated_bb)) { if (bb->count > total_known_count) - AFDO_EINFO (unknown_edge)->set_count (bb->count - total_known_count); + { + profile_count new_count = bb->count - total_known_count; + AFDO_EINFO(unknown_edge)->set_count(new_count); + if (num_edge == 1) + { + basic_block succ_or_pred_bb = is_succ ? unknown_edge->dest : unknown_edge->src; + if (new_count > succ_or_pred_bb->count) + { + succ_or_pred_bb->count = new_count; + if (!is_bb_annotated (succ_or_pred_bb, *annotated_bb)) + set_bb_annotated (succ_or_pred_bb, annotated_bb); + } + } + } else AFDO_EINFO (unknown_edge)->set_count (profile_count::zero().afdo ()); AFDO_EINFO (unknown_edge)->set_annotated (); diff --git a/gcc/testsuite/gcc.dg/tree-prof/init-array.c b/gcc/testsuite/gcc.dg/tree-prof/init-array.c new file mode 100644 index 000..0f7a5c84481 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-prof/init-array.c @@ -0,0 +1,43 @@ +/* { dg-options "-O3 -fdump-tree-cunrolli-details" } */ + +static int s[10][10][10]; +static int d[10][10][10]; + +__attribute__((noipa)) +int array() +{ + int i; + register int j, k; + for (i = 0; i < 10; i++) + for (j = 0; j < 10; j++) + for (k = 0; k < 10; k++) + d[i][j][k] = s[i][j][k]; + + return(0); +} + +__attribute__((noipa)) +void TestBench() +{ + for (int i = 0; i < 15; ++i) + { + array(); + } +} + +int main(int argc, char *argv[]) +{ + + TestBench(); + + if (d[9][9][9] == 0 && s[9][9][9] == 0) + { + return 0; + } + else + { + return -1; + } +} + +/* { dg-final-use { scan-tree-dump-times "loop with 10 iterations completely unrolled" 2 "cunrolli"} } */ -- 2.25.1
[PATCH] Don't print discriminators for -fcompare-debug.
With -gstatement-frontiers we may end up with different IR coming from the front end with and without debug information turned on. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100733 for details. That may result in differences in discriminator values and -fcompare-debug failures. This patch disables printing of discriminators when the dump is intended for -fcompare-debug comparison and reverses the workaround in a test. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: PR debug/107231 PR debug/107169 * print-rtl.cc (print_rtx_operand_code_i): Don't print discriminators for -fdebug-compare. gcc/testsuite/ChangeLog: * c-c++-common/ubsan/pr85213.c: Reverse the workaround for discriminators. --- gcc/print-rtl.cc | 13 ++--- gcc/testsuite/c-c++-common/ubsan/pr85213.c | 7 +-- 2 files changed, 11 insertions(+), 9 deletions(-) diff --git a/gcc/print-rtl.cc b/gcc/print-rtl.cc index e115f987173..0476f3d7e79 100644 --- a/gcc/print-rtl.cc +++ b/gcc/print-rtl.cc @@ -453,10 +453,17 @@ rtx_writer::print_rtx_operand_code_i (const_rtx in_rtx, int idx) expanded_location xloc = insn_location (in_insn); fprintf (m_outfile, " \"%s\":%i:%i", xloc.file, xloc.line, xloc.column); - int discriminator = insn_discriminator (in_insn); - if (discriminator) - fprintf (m_outfile, " discrim %d", discriminator); + /* Don't print discriminators for -fcompare-debug since the IR +coming from the front end may be different with and without +debug information turned on. That may result in different +discriminator values. */ + if (!(dump_flags & TDF_COMPARE_DEBUG)) + { + int discriminator = insn_discriminator (in_insn); + if (discriminator) + fprintf (m_outfile, " discrim %d", discriminator); + } } #endif } diff --git a/gcc/testsuite/c-c++-common/ubsan/pr85213.c b/gcc/testsuite/c-c++-common/ubsan/pr85213.c index e903e976f2c..8a6be81d20f 100644 --- a/gcc/testsuite/c-c++-common/ubsan/pr85213.c +++ b/gcc/testsuite/c-c++-common/ubsan/pr85213.c @@ -1,11 +1,6 @@ /* PR sanitizer/85213 */ /* { dg-do compile } */ -/* Pass -gno-statement-frontiers to work around - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100733 : - without it the IR coming from the front end may be different with and without - debug information turned on. That may cause e.g., different discriminator values - and -fcompare-debug failures. */ -/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug -gno-statement-frontiers" } */ +/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug" } */ int foo (int x) -- 2.25.1
RE: [r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64
That commit had a bug that was fixed in https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=80f414e6d73f9f1683f93d83ce63a6a482e54bee Was that fix included in your GCC build? From: Jiang, Haochen Sent: Sunday, October 16, 2022 8:09 PM To: gcc-patches@gcc.gnu.org; Eugene Rozenfeld ; Jiang, Haochen ; gcc-regress...@gcc.gnu.org Subject: [EXTERNAL] [r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64 You don't often get email from haochen.ji...@intel.com<mailto:haochen.ji...@intel.com>. Learn why this is important<https://aka.ms/LearnAboutSenderIdentification> On Linux/x86_64, f30e9fd33e56a5a721346ea6140722e1b193db42 is the first bad commit commit f30e9fd33e56a5a721346ea6140722e1b193db42 Author: Eugene Rozenfeld mailto:ero...@microsoft.com>> Date: Thu Apr 21 16:43:24 2022 -0700 Set discriminators for call stmts on the same line within the same basic block. caused FAIL: libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) with GCC configured with ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2288/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m64\ -march=cascadelake}'"
RE: [EXTERNAL] Re: [PATCH] Don't print discriminators for -fcompare-debug.
Yes, -gstatement-frontiers is the root cause here but the new approach to discriminators is especially prone to this. I added the workaround to pr85213.c in my original discriminator patch but now two more -fcompare-debug bugs were opened (PR107231 and PR107169). I suspect we'll keep getting more. So I'd like to disable printing discriminators in -fcompare-debug dums until -gstatement-frontier issue is fixed. Eugene -Original Message- From: Richard Biener Sent: Monday, October 17, 2022 12:06 AM To: Eugene Rozenfeld Cc: gcc-patches@gcc.gnu.org; Jason Merrill Subject: [EXTERNAL] Re: [PATCH] Don't print discriminators for -fcompare-debug. On Sun, Oct 16, 2022 at 10:25 PM Eugene Rozenfeld via Gcc-patches wrote: > > With -gstatement-frontiers we may end up with different IR coming from > the front end with and without debug information turned on. > See > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fbugzilla%2Fshow_bug.cgi%3Fid%3D100733&data=05%7C01%7CEugene.Rozenfeld%40microsoft.com%7C5d3df88ec7e14f5eec2708dab00e0440%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638015871510301049%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=2JjQHAgDi6%2Fet1vowA1IRcdInJMkkjuva9DbM5rHawc%3D&reserved=0 > for details. > That may result in differences in discriminator values and > -fcompare-debug failures. > > This patch disables printing of discriminators when the dump is > intended for -fcompare-debug comparison and reverses the workaround in a test. I don't think this is the correct approach. -gstatement-frontiers is known to be prone to these issues and is the one to blame here. I think the bugs should be SUSPENDED until -gstatement-frontiers is fixed or at least disabled by default (IIRC Jakub tried that but failed last time) > Tested on x86_64-pc-linux-gnu. > > gcc/ChangeLog: > PR debug/107231 > PR debug/107169 > * print-rtl.cc (print_rtx_operand_code_i): Don't print discriminators > for -fdebug-compare. > > gcc/testsuite/ChangeLog: > > * c-c++-common/ubsan/pr85213.c: Reverse the workaround for > discriminators. > --- > gcc/print-rtl.cc | 13 ++--- > gcc/testsuite/c-c++-common/ubsan/pr85213.c | 7 +-- > 2 files changed, 11 insertions(+), 9 deletions(-) > > diff --git a/gcc/print-rtl.cc b/gcc/print-rtl.cc index > e115f987173..0476f3d7e79 100644 > --- a/gcc/print-rtl.cc > +++ b/gcc/print-rtl.cc > @@ -453,10 +453,17 @@ rtx_writer::print_rtx_operand_code_i (const_rtx in_rtx, > int idx) > expanded_location xloc = insn_location (in_insn); > fprintf (m_outfile, " \"%s\":%i:%i", xloc.file, xloc.line, >xloc.column); > - int discriminator = insn_discriminator (in_insn); > - if (discriminator) > - fprintf (m_outfile, " discrim %d", discriminator); > > + /* Don't print discriminators for -fcompare-debug since the IR > +coming from the front end may be different with and without > +debug information turned on. That may result in different > +discriminator values. */ > + if (!(dump_flags & TDF_COMPARE_DEBUG)) > + { > + int discriminator = insn_discriminator (in_insn); > + if (discriminator) > + fprintf (m_outfile, " discrim %d", discriminator); > + } > } > #endif > } > diff --git a/gcc/testsuite/c-c++-common/ubsan/pr85213.c > b/gcc/testsuite/c-c++-common/ubsan/pr85213.c > index e903e976f2c..8a6be81d20f 100644 > --- a/gcc/testsuite/c-c++-common/ubsan/pr85213.c > +++ b/gcc/testsuite/c-c++-common/ubsan/pr85213.c > @@ -1,11 +1,6 @@ > /* PR sanitizer/85213 */ > /* { dg-do compile } */ > -/* Pass -gno-statement-frontiers to work around > - > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fbugzilla%2Fshow_bug.cgi%3Fid%3D100733&data=05%7C01%7CEugene.Rozenfeld%40microsoft.com%7C5d3df88ec7e14f5eec2708dab00e0440%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638015871510301049%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=2JjQHAgDi6%2Fet1vowA1IRcdInJMkkjuva9DbM5rHawc%3D&reserved=0 > : > - without it the IR coming from the front end may be different with and > without > - debug information turned on. That may cause e.g., different discriminator > values > - and -fcompare-debug failures. */ > -/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug > -gno-statement-frontiers" } */ > +/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug" } */ > > int > foo (int x) > -- > 2.25.1
RE: [EXTERNAL] RE: [r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors)
Yes, I received that one. The root cause is the -gstatement-frontiers issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100733 . I submitted a workaround patch for that ( https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603673.html ) but it hasn't been approved yet. Another workaround is passing -gno-statement-frontiers to the test. Eugene -Original Message- From: Thomas Schwinge Sent: Monday, October 17, 2022 3:46 AM To: Eugene Rozenfeld Cc: haochen.ji...@intel.com; gcc-patches@gcc.gnu.org; gcc-regress...@gcc.gnu.org Subject: [EXTERNAL] RE: [r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64 Hi! On 2022-10-17T06:11:01+, "Jiang, Haochen via Gcc-patches" wrote: > I just checkout to your commit and the test still got failed. > > It is reporting like this: > xgcc: error: > /export/users2/haochenj/src/gcc/master/./libgomp/testsuite/libgomp.oac > c-c++/../libgomp.oacc-c-c++-common/kernels-loop-g.c: '-fcompare-debug' > failure (length) Right. I had filed <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2FPR107231&data=05%7C01%7CEugene.Rozenfeld%40microsoft.com%7C05c1cd8b5db4455fe3de08dab02ce66e%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638016004172688130%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=BfdPnXz5%2Bxr8NwIcBvaOnkFPyzfLQBbyrbMc777EIsk%3D&reserved=0> "[13 Regression] c-c++-common/goacc/kernels-loop-g.c: '-fcompare-debug' failure (length)" with you in CC: . (Have you not received that one?) Grüße Thomas > Also fix a typo in manually sending, should be this to reproduce > > To reproduce: > > $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check > RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c > --target_board='unix{-m32}'" > $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check > RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c > --target_board='unix{-m32\ -march=cascadelake}'" > $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check > RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c > --target_board='unix{-m64}'" > $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check > RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c > --target_board='unix{-m64\ -march=cascadelake}'" > > BRs, > Haochen > > From: Jiang, Haochen > Sent: Monday, October 17, 2022 1:41 PM > To: Eugene Rozenfeld ; > gcc-patches@gcc.gnu.org; gcc-regress...@gcc.gnu.org > Subject: RE: [r13-3172 Regression] > FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c > -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 > (test for excess errors) on Linux/x86_64 > > If that has been fixed, just ignore that mail. > > It is run through by a script and got the result few days ago. > However, the sendmail service was down on that machine and I just > noticed that issue. So I sent that result manually today in case that is not > fixed. > > Sorry for the disturb! > > BRs, > Haochen > > From: Eugene Rozenfeld > mailto:eugene.rozenf...@microsoft.com> > > > Sent: Monday, October 17, 2022 1:23 PM > To: Jiang, Haochen > mailto:haochen.ji...@intel.com>>; > gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>; > gcc-regress...@gcc.gnu.org<mailto:gcc-regress...@gcc.gnu.org> > Subject: RE: [r13-3172 Regression] > FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c > -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 > (test for excess errors) on Linux/x86_64 > > That commit had a bug that was fixed in > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc. > gnu.org%2Fgit%2F%3Fp%3Dgcc.git%3Ba%3Dcommit%3Bh%3D80f414e6d73f9f1683f9 > 3d83ce63a6a482e54bee&data=05%7C01%7CEugene.Rozenfeld%40microsoft.c > om%7C05c1cd8b5db4455fe3de08dab02ce66e%7C72f988bf86f141af91ab2d7cd011db > 47%7C1%7C0%7C638016004172688130%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLj > AwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&a > mp;sdata=50kAEt5C%2F8DO%2BvuyNvE7c1ydxriGeQwY5G%2Bqs%2FgSGE0%3D&re > served=0 > > Was that fix included in your GCC build? > > From: Jiang, Haochen > mailto:haochen.ji...@intel.com>> > Sent: Sunday, October 16, 2022 8:09 PM > To: gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>; Eugene > Rozenfeld > mailto:e
[PATCH][PUSHED] Start using discriminators in AutoFDO
Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * auto-profile.cc (get_combined_location): Include discriminator in the returned combined location. (read_function_instance): Read discriminators from profiles. --- gcc/auto-profile.cc | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index ca48404eaf1..97307321cbf 100644 --- a/gcc/auto-profile.cc +++ b/gcc/auto-profile.cc @@ -363,7 +363,8 @@ get_combined_location (location_t loc, tree decl) /* TODO: allow more bits for line and less bits for discriminator. */ if (LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl) >= (1<<16)) warning_at (loc, OPT_Woverflow, "offset exceeds 16 bytes"); - return ((LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl)) << 16); + return ((LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl)) << 16) +| get_discriminator_from_loc (loc); } /* Return the function decl of a given lexical BLOCK. */ @@ -652,7 +653,7 @@ function_instance::read_function_instance (function_instance_stack *stack, for (unsigned i = 0; i < num_pos_counts; i++) { - unsigned offset = gcov_read_unsigned () & 0x; + unsigned offset = gcov_read_unsigned (); unsigned num_targets = gcov_read_unsigned (); gcov_type count = gcov_read_counter (); s->pos_counts[offset].count = count; -- 2.25.1
[PATCH][PUSHED] Don't force DWARF4 for AutoFDO tests
Support for DWARF5 was added to create_gcov in https://github.com/google/autofdo so we no longer need to force DWARF4 for AutoFDO tests. Tested on x86_64-pc-linux-gnu. gcc/testsuite/ChangeLog: * lib/profopt.exp: Don't force DWARF4 for AutoFDO tests --- gcc/testsuite/lib/profopt.exp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/lib/profopt.exp b/gcc/testsuite/lib/profopt.exp index ac7712a5b42..fe88d2fce37 100644 --- a/gcc/testsuite/lib/profopt.exp +++ b/gcc/testsuite/lib/profopt.exp @@ -289,7 +289,7 @@ proc auto-profopt-execute { src } { return } set profile_wrapper [profopt-perf-wrapper] -set profile_option "-gdwarf-4 -DFOR_AUTOFDO_TESTING" +set profile_option "-g -DFOR_AUTOFDO_TESTING" set feedback_option "-fauto-profile -DFOR_AUTOFDO_TESTING -fearly-inlining" set run_autofdo 1 profopt-execute $src -- 2.25.1
RE: [EXTERNAL] [PATCH] contrib: modernize gen_autofdo_event.py
The patch is approved. Eugene -Original Message- From: Andi Kleen Sent: Friday, August 05, 2022 11:29 PM To: Eugene Rozenfeld ; Xi Ruoyao ; gcc-patches@gcc.gnu.org Subject: Re: [EXTERNAL] [PATCH] contrib: modernize gen_autofdo_event.py On 8/6/2022 1:07 AM, Eugene Rozenfeld wrote: > The changes look good to me. Also adding Andi, the author of the script. Looks all good to me too. -Andi
RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator support.
Jason, Thank you for your review. You are correct, we only need to check has_discriminator for the statement that's on the same line. I updated the patch (attached). Thanks, Eugene -Original Message- From: Jason Merrill Sent: Thursday, August 18, 2022 6:55 PM To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org Cc: Andi Kleen ; Jan Hubicka Subject: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator support. On 8/3/22 17:25, Eugene Rozenfeld wrote: > One more ping for this patch > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc. > gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=0 > 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da81 > 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637964709325691951 > %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6I > k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2dT > DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0 > > CC Jason since this changes discriminators emitted in dwarf. > > Thanks, > > Eugene > > -Original Message- > From: Eugene Rozenfeld > Sent: Monday, June 27, 2022 12:45 PM > To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan > Hubicka > Subject: RE: [PING][PATCH] Add instruction level discriminator support. > > Another ping for > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=05%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da8185dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637964709325691951%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2dTDYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0 > . > > I got a review from Andi > (https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596549.html&data=05%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da8185dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637964709325691951%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=se6x1LD0GQyFz%2B28gdVqsye3Aw8kPoMRhVQO1BSPg6I%3D&reserved=0) > but I also need a review from someone who can approve the changes. > > Thanks, > > Eugene > > -Original Message- > From: Eugene Rozenfeld > Sent: Friday, June 10, 2022 12:03 PM > To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan > Hubicka > Subject: [PING][PATCH] Add instruction level discriminator support. > > Hello, > > I'd like to ping this patch: > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc. > gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=0 > 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da81 > 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637964709325691951 > %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6I > k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2dT > DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0 > > Thanks, > > Eugene > > -Original Message- > From: Gcc-patches > On Behalf Of > Eugene Rozenfeld via Gcc-patches > Sent: Thursday, June 02, 2022 12:22 AM > To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan > Hubicka > Subject: [EXTERNAL] [PATCH] Add instruction level discriminator support. > > This is the first in a series of patches to enable discriminator support in > AutoFDO. > > This patch switches to tracking discriminators per statement/instruction > instead of per basic block. Tracking per basic block was problematic since > not all statements in a basic block needed a discriminator and, also, later > optimizations could move statements between basic blocks making correlation > during AutoFDO compilation unreliable. Tracking per statement also allows us > to assign different discriminators to multiple function calls in the same > basic block. A subsequent patch will add that support. > > The idea of this patch is based on commit > 4c311d95cf6d9519c3c20f641cc77af7df491fdf > by Dehao Chen in vendors/google/heads/gcc-4_8 but uses a slightly different > approach. In Dehao's work special (normally unused) location ids and side > tables were used to keep track of locations with discriminators. Things have > changed since then and I don't think we have unused location ids anymore. > Instead, I made discriminators a part of ad-hoc locations. > > The difference from Dehao's work also includes support for discriminator > reading/writing in lto streaming and in modules. &g
RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator support.
Jason, I made another small change in addressing your feedback (attached). Thanks, Eugene -Original Message- From: Gcc-patches On Behalf Of Eugene Rozenfeld via Gcc-patches Sent: Thursday, September 01, 2022 1:49 PM To: Jason Merrill ; gcc-patches@gcc.gnu.org Cc: Andi Kleen ; Jan Hubicka Subject: RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator support. Jason, Thank you for your review. You are correct, we only need to check has_discriminator for the statement that's on the same line. I updated the patch (attached). Thanks, Eugene -Original Message- From: Jason Merrill Sent: Thursday, August 18, 2022 6:55 PM To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org Cc: Andi Kleen ; Jan Hubicka Subject: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator support. On 8/3/22 17:25, Eugene Rozenfeld wrote: > One more ping for this patch > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc. > gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=0 > 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da81 > 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637964709325691951 > %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6I > k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2dT > DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0 > > CC Jason since this changes discriminators emitted in dwarf. > > Thanks, > > Eugene > > -Original Message- > From: Eugene Rozenfeld > Sent: Monday, June 27, 2022 12:45 PM > To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan > Hubicka > Subject: RE: [PING][PATCH] Add instruction level discriminator support. > > Another ping for > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=05%7C01%7CEugene.Rozenfeld%40microsoft.com%7Cf217ebc45428465857bd08da8c5b6fb2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637976621612503972%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=b0kTdzWRyiwdtcEFasyNlKv1vj%2FqwnipN3776C7xWcg%3D&reserved=0 > . > > I got a review from Andi > (https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596549.html&data=05%7C01%7CEugene.Rozenfeld%40microsoft.com%7Cf217ebc45428465857bd08da8c5b6fb2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637976621612503972%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=qxjBUCcGiKXtR4%2BOJq%2FFQN11C2M6BBurTguOBOjWJDw%3D&reserved=0) > but I also need a review from someone who can approve the changes. > > Thanks, > > Eugene > > -Original Message- > From: Eugene Rozenfeld > Sent: Friday, June 10, 2022 12:03 PM > To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan > Hubicka > Subject: [PING][PATCH] Add instruction level discriminator support. > > Hello, > > I'd like to ping this patch: > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc. > gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=0 > 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da81 > 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637964709325691951 > %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6I > k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2dT > DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0 > > Thanks, > > Eugene > > -Original Message- > From: Gcc-patches > On Behalf Of > Eugene Rozenfeld via Gcc-patches > Sent: Thursday, June 02, 2022 12:22 AM > To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan > Hubicka > Subject: [EXTERNAL] [PATCH] Add instruction level discriminator support. > > This is the first in a series of patches to enable discriminator support in > AutoFDO. > > This patch switches to tracking discriminators per statement/instruction > instead of per basic block. Tracking per basic block was problematic since > not all statements in a basic block needed a discriminator and, also, later > optimizations could move statements between basic blocks making correlation > during AutoFDO compilation unreliable. Tracking per statement also allows us > to assign different discriminators to multiple function calls in the same > basic block. A subsequent patch will add that support. > > The idea of this patch is based on commit > 4c311d95cf6d9519c3c20f641cc77af7df491fdf > by Dehao Chen in vendors/google/heads/gcc-4_8 but uses a slightly different > approach. In Dehao's work special (normally unused) location ids and side > tables
RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator support.
Jason, Thank for your suggestion. The patch is updated (attached). Thanks, Eugene -Original Message- From: Jason Merrill Sent: Thursday, September 08, 2022 10:26 AM To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org Cc: Andi Kleen ; Jan Hubicka Subject: Re: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator support. On 9/1/22 18:22, Eugene Rozenfeld wrote: > Jason, > > I made another small change in addressing your feedback (attached). > > Thanks, > > Eugene > > -Original Message- > From: Gcc-patches > On Behalf Of > Eugene Rozenfeld via Gcc-patches > Sent: Thursday, September 01, 2022 1:49 PM > To: Jason Merrill ; gcc-patches@gcc.gnu.org > Cc: Andi Kleen ; Jan Hubicka > Subject: RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator > support. > > Jason, > > Thank you for your review. You are correct, we only need to check > has_discriminator for the statement that's on the same line. > I updated the patch (attached). > > Thanks, > > Eugene > > -Original Message- > From: Jason Merrill > Sent: Thursday, August 18, 2022 6:55 PM > To: Eugene Rozenfeld ; > gcc-patches@gcc.gnu.org > Cc: Andi Kleen ; Jan Hubicka > Subject: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator > support. > > On 8/3/22 17:25, Eugene Rozenfeld wrote: >> One more ping for this patch >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc. >> gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data= >> 0 >> 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da8 >> 1 >> 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C63796470932569195 >> 1 >> %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6 >> I >> k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2d >> T >> DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0 >> >> CC Jason since this changes discriminators emitted in dwarf. >> >> Thanks, >> >> Eugene >> >> -Original Message- >> From: Eugene Rozenfeld >> Sent: Monday, June 27, 2022 12:45 PM >> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan >> Hubicka >> Subject: RE: [PING][PATCH] Add instruction level discriminator support. >> >> Another ping for >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=05%7C01%7Ceugene.rozenfeld%40microsoft.com%7Cc7c9fab519184eee0bb808da91bf2fea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637982547579085499%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Kj3YWJrDqjX%2B0Ml3At5P8NRWc1Xs6mbI%2F1vCk9%2BLaQs%3D&reserved=0 >> . >> >> I got a review from Andi >> (https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596549.html&data=05%7C01%7Ceugene.rozenfeld%40microsoft.com%7Cc7c9fab519184eee0bb808da91bf2fea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637982547579085499%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=NBzFtD0mH7EYKxsVWfqZgSpQmUG18SIt01XRYUlwwW4%3D&reserved=0) >> but I also need a review from someone who can approve the changes. >> >> Thanks, >> >> Eugene >> >> -Original Message- >> From: Eugene Rozenfeld >> Sent: Friday, June 10, 2022 12:03 PM >> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan >> Hubicka >> Subject: [PING][PATCH] Add instruction level discriminator support. >> >> Hello, >> >> I'd like to ping this patch: >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc. >> gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data= >> 0 >> 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da8 >> 1 >> 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C63796470932569195 >> 1 >> %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6 >> I >> k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2d >> T >> DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0 >> >> Thanks, >> >> Eugene >> >> -Original Message- >> From: Gcc-patches >> On Behalf Of >> Eugene Rozenfeld via Gcc-patches >> Sent: Thursday, June 02, 2022 12:22 AM >> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan >> Hubicka >> Subject: [EXTERNAL] [PATCH] Add instruction level discriminator support. >&
[PATCH] Guard against applying scale with 0 denominator
Calling count.apply_scale with a 0 denominator causes an assert. This change guards against that. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * tree-loop-vect-manip.cc (vect_do_peeling): Guard against applying scale with 0 denominator. --- gcc/tree-vect-loop-manip.cc | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index 1d4337eb261..db54ae69e45 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -2989,10 +2989,11 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, get lost if we scale down to 0. */ basic_block *bbs = get_loop_body (epilog); for (unsigned int i = 0; i < epilog->num_nodes; i++) - bbs[i]->count = bbs[i]->count.apply_scale -(bbs[i]->count, - bbs[i]->count.apply_probability - (prob_vector)); + if (bbs[i]->count.nonzero_p ()) + bbs[i]->count = bbs[i]->count.apply_scale + (bbs[i]->count, + bbs[i]->count.apply_probability + (prob_vector)); free (bbs); } -- 2.25.1
RE: [EXTERNAL] Re: [PATCH] Guard against applying scale with 0 denominator
In my case this is not exactly what the FIXME in the comment above says. The count is 0 even before the initial scaling happens. I hit this case with some changes I'm working on to enable per-instruction discriminators for AutoFDO. I looked into saving/restoring the old counts but it would be awkward to do. The initial scaling happens here: if (skip_vector) { split_edge (loop_preheader_edge (loop)); /* Due to the order in which we peel prolog and epilog, we first propagate probability to the whole loop. The purpose is to avoid adjusting probabilities of both prolog and vector loops separately. Note in this case, the probability of epilog loop needs to be scaled back later. */ basic_block bb_before_loop = loop_preheader_edge (loop)->src; if (prob_vector.initialized_p ()) { scale_bbs_frequencies (&bb_before_loop, 1, prob_vector); scale_loop_profile (loop, prob_vector, 0); } } The scaling happens before we do the cloning for the epilog so to save/restore the counts we would need to maintain a mapping between the original basic blocks and the cloned basic blocks in the epilog. I'd like to get my simple fix in since it makes things better even if it doesn't address the issue mentioned In the FIXME. -Original Message- From: Richard Biener Sent: Monday, May 09, 2022 12:42 AM To: Eugene Rozenfeld ; Jan Hubicka Cc: gcc-patches@gcc.gnu.org Subject: [EXTERNAL] Re: [PATCH] Guard against applying scale with 0 denominator On Fri, May 6, 2022 at 10:32 PM Eugene Rozenfeld via Gcc-patches wrote: > > Calling count.apply_scale with a 0 denominator causes an assert. > This change guards against that. > > Tested on x86_64-pc-linux-gnu. > > gcc/ChangeLog: > * tree-loop-vect-manip.cc (vect_do_peeling): Guard against applying > scale with 0 denominator. > --- > gcc/tree-vect-loop-manip.cc | 9 + > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc > index 1d4337eb261..db54ae69e45 100644 > --- a/gcc/tree-vect-loop-manip.cc > +++ b/gcc/tree-vect-loop-manip.cc > @@ -2989,10 +2989,11 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree > niters, tree nitersm1, > get lost if we scale down to 0. */ > basic_block *bbs = get_loop_body (epilog); > for (unsigned int i = 0; i < epilog->num_nodes; i++) > - bbs[i]->count = bbs[i]->count.apply_scale > -(bbs[i]->count, > - bbs[i]->count.apply_probability > - (prob_vector)); > + if (bbs[i]->count.nonzero_p ()) > + bbs[i]->count = bbs[i]->count.apply_scale > + (bbs[i]->count, > + bbs[i]->count.apply_probability > + (prob_vector)); So exactly what the FIXME in the comment above says happens. It might be better to save/restore the old counts if the intent is to get them back. I'm not exactly sure where the other scaling happens though. Richard. > free (bbs); > } > > -- > 2.25.1
RE: [EXTERNAL] Re: [PATCH] Guard against applying scale with 0 denominator
Thank you for the feedback Richard. I attached a patch that saves/restores counts if the epilog doesn't use a scalar loop. Eugene -Original Message- From: Richard Biener Sent: Thursday, May 12, 2022 12:34 AM To: Eugene Rozenfeld Cc: Jan Hubicka ; gcc-patches@gcc.gnu.org Subject: Re: [EXTERNAL] Re: [PATCH] Guard against applying scale with 0 denominator On Thu, May 12, 2022 at 3:37 AM Eugene Rozenfeld wrote: > > In my case this is not exactly what the FIXME in the comment above > says. The count is 0 even before the initial scaling happens. I hit this case > with some changes I'm working on to enable per-instruction discriminators for > AutoFDO. > > I looked into saving/restoring the old counts but it would be awkward to do. > The initial scaling happens here: > > if (skip_vector) > { > split_edge (loop_preheader_edge (loop)); > > /* Due to the order in which we peel prolog and epilog, we first > propagate probability to the whole loop. The purpose is to > avoid adjusting probabilities of both prolog and vector loops > separately. Note in this case, the probability of epilog loop > needs to be scaled back later. */ > basic_block bb_before_loop = loop_preheader_edge (loop)->src; > if (prob_vector.initialized_p ()) > { > scale_bbs_frequencies (&bb_before_loop, 1, prob_vector); > scale_loop_profile (loop, prob_vector, 0); > } > } > > The scaling happens before we do the cloning for the epilog so to > save/restore the counts we would need to maintain a mapping between the > original basic blocks and the cloned basic blocks in the epilog. There is one already - after the epilogue is copied you can use get_bb_original (epilouge_bb) to get at the block it was copied from. It could also be that we can rely on the basic-block order to be preserved between the copies (I _think_ that would work ... but then assert () for this using get_bb_original ()). That means the simplest fix would be to have an auto_vec and initialize that from the BB counts in loop body order (we also have exactly two BBs in all peeled loops ... But note we only scaled the scalar if-converted loop but eventually used the not if-coverted copy for the epilogue (if not vectorizing the epilogue), so indiscriminate scaling back looks wrong in some cases (I'd have to double-check the details here). > I'd like to get my simple fix in since it makes things better even if > it doesn't address the issue mentioned In the FIXME. But don't you need to check that bbs[i]->count.apply_probability (prob_vector) is not zero instead of checking that bbs[i].count is not zero? Richard. > -Original Message- > From: Richard Biener > Sent: Monday, May 09, 2022 12:42 AM > To: Eugene Rozenfeld ; Jan Hubicka > > Cc: gcc-patches@gcc.gnu.org > Subject: [EXTERNAL] Re: [PATCH] Guard against applying scale with 0 > denominator > > On Fri, May 6, 2022 at 10:32 PM Eugene Rozenfeld via Gcc-patches > wrote: > > > > Calling count.apply_scale with a 0 denominator causes an assert. > > This change guards against that. > > > > Tested on x86_64-pc-linux-gnu. > > > > gcc/ChangeLog: > > * tree-loop-vect-manip.cc (vect_do_peeling): Guard against applying > > scale with 0 denominator. > > --- > > gcc/tree-vect-loop-manip.cc | 9 + > > 1 file changed, 5 insertions(+), 4 deletions(-) > > > > diff --git a/gcc/tree-vect-loop-manip.cc > > b/gcc/tree-vect-loop-manip.cc index 1d4337eb261..db54ae69e45 100644 > > --- a/gcc/tree-vect-loop-manip.cc > > +++ b/gcc/tree-vect-loop-manip.cc > > @@ -2989,10 +2989,11 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree > > niters, tree nitersm1, > > get lost if we scale down to 0. */ > > basic_block *bbs = get_loop_body (epilog); > > for (unsigned int i = 0; i < epilog->num_nodes; i++) > > - bbs[i]->count = bbs[i]->count.apply_scale > > -(bbs[i]->count, > > - bbs[i]->count.apply_probability > > - (prob_vector)); > > + if (bbs[i]->count.nonzero_p ()) > > + bbs[i]->count = bbs[i]->count.apply_scale > > + (bbs[i]->count, > > + bbs[i]->count.apply_probability > > + (prob_vector)); > > So exactly what the FIXME in the comment above says happens. It > might be better > to save/restore the old counts if the intent is to get them back. I'm not > exactly sure where the other scaling happens though. > > Richard. > > > > > free (bbs); > > } > > > > -- > > 2.25.1 0001-Fix-profile-count-maintenance-in-vectorizer-peeling.patch Description: 0001-Fix-profile-count-maintenance-in-vectorizer-peeling.patch
[PATCH] [tree-optimization] Optimize two patterns with three xors.
Simplify (a ^ b) & ((b ^ c) ^ a) --> (a ^ b) & ~c. int f(int a, int b, int c) { return (a ^ b) & ((b ^ c) ^ a); } Code without the patch: moveax,edx xoreax,esi xoreax,edi xoredi,esi andeax,edi ret Code with the patch: xoredi,esi andn eax,edx,edi ret Simplify (a ^ b) | ((b ^ c) ^ a) --> (a ^ b) | c. int g(int a, int b, int c) { return (a ^ b) | ((b ^ c) ^ a); } Code without the patch: moveax,edx xoreax,esi xoreax,edi xoredi,esi or eax,edi ret Code with the patch: xoredi,esi moveax,edi or eax,edx ret This fixes PR96671. Tested on x86_64-pc-linux-gnu. 0001-Optimize-two-patterns-with-three-xors.patch Description: 0001-Optimize-two-patterns-with-three-xors.patch
Re: [PATCH] [tree-optimization] Optimize two patterns with three xors.
Thank you for the review Richard! I re-worked the patch based on your suggestions (attached). I made the change to reuse the first bit_xor in both patterns and I added :s to the last xor in the first pattern. For the second pattern I didn't add :s because I think the simplification is beneficial even if the second or third bit_xor has more than one use since we are simplifying them to just a single operand (@2). If that is incorrect, please explain why. Eugene -Original Message- From: Richard Biener Sent: Monday, November 16, 2020 4:11 AM To: Eugene Rozenfeld Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [tree-optimization] Optimize two patterns with three xors. On Thu, Nov 12, 2020 at 2:53 AM Eugene Rozenfeld via Gcc-patches wrote: > > Simplify (a ^ b) & ((b ^ c) ^ a) --> (a ^ b) & ~c. > > int f(int a, int b, int c) > { > return (a ^ b) & ((b ^ c) ^ a); > } > > Code without the patch: > moveax,edx > xoreax,esi > xoreax,edi > xoredi,esi > andeax,edi > ret > > Code with the patch: > xoredi,esi > andn eax,edx,edi > ret > > Simplify (a ^ b) | ((b ^ c) ^ a) --> (a ^ b) | c. > int g(int a, int b, int c) > { > return (a ^ b) | ((b ^ c) ^ a); > } > > Code without the patch: > moveax,edx > xoreax,esi > xoreax,edi > xoredi,esi > or eax,edi > ret > > Code with the patch: > xoredi,esi > moveax,edi > or eax,edx > ret > > This fixes PR96671. +/* (a ^ b) & ((b ^ c) ^ a) --> (a ^ b) & ~c */ (simplify (bit_and:c +(bit_xor:c @0 @1) (bit_xor:cs (bit_xor:c @1 @2) @0)) (bit_and (bit_xor +@0 @1) (bit_not @2))) you should be able to re-use the first bit_xor by doing +/* (a ^ b) & ((b ^ c) ^ a) --> (a ^ b) & ~c */ (simplify (bit_and:c +(bit_xor:c@3 @0 @1) (bit_xor:cs (bit_xor:c @1 @2) @0)) (bit_and @3 +(bit_not @2))) For completeness the last (bit_xor:c @1 @2) should also have a :s +/* (a ^ b) | ((b ^ c) ^ a) --> (a ^ b) | c */ (simplify (bit_ior:c +(bit_xor:c @0 @1) (bit_xor:c (bit_xor:c @1 @2) @0)) (bit_ior (bit_xor +@0 @1) @2)) completely misses :s here and the same comment about the re-use of the existing xor applies. Otherwise looks OK to me, sorry for the delay. Thanks, Richard. > Tested on x86_64-pc-linux-gnu. > 0001-Optimize-two-patterns-with-three-xors.patch Description: 0001-Optimize-two-patterns-with-three-xors.patch
[PATCH 1/1] Fix a typo in an AutoFDO error string
Trivial change, committed to trunk. [PATCH 1/1] Fix a typo in an AutoFDO error string gcc/ChangeLog: * auto-profile.c (read_profile): fix a typo in an error string --- gcc/auto-profile.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c index 2a6d9a1fc24..a4601243dc9 100644 --- a/gcc/auto-profile.c +++ b/gcc/auto-profile.c @@ -939,7 +939,7 @@ read_profile (void) unsigned version = gcov_read_unsigned (); if (version != AUTO_PROFILE_VERSION) { - error ("AutoFDO profile version %u does match %u", + error ("AutoFDO profile version %u does not match %u", version, AUTO_PROFILE_VERSION); return; } -- 2.25.1
[PATCH] gcov-profile/71672 Fix indirect call inlining with AutoFDO
This patch has the following changes: 1. The main fix is in auto-profile.c: the histogram value for indirect calls was incorrectly set up. That is fixed now. 2. Several tests now have -fdump-ipa-afdo-optimized instead of -fdump-ipa-afdo in dg-options so that the expected output can be found. 3. I increased the number of iterations in several tests so that perf can have enough sampling events. 4. indir-call-prof-2.c has -fno-early-inlining but AutoFDO can't work without early inlining (it needs to match the inlining of the profiled binary). I changed profopt.exp to always pass -fearly-inlining for AutoFDO. With that the indirect call inlining in indir-call-prof-2.c happens in the early inliner so I changed the dg-final-use-autofdo. 5. create_gcov tool doesn't currently support dwarf 5 so I made a change in profopt.exp to pass -gdwarf-4 when compiling the binary to profile. 6. I updated the invocation of create_gcov in profopt.exp to pass -gcov_version=2. I recently made a change to create_gcov to support version 2: https://github.com/google/autofdo/pull/117 7. I removed useless -o perf.data from the invocation of gcc-auto-profile in target-supports.exp. With these changes the tests checking indirect call inlining in gcc.dg and g++.dg are passing. gcc/ChangeLog: PR gcov-profile/71672 * auto-profile.c (afdo_indirect_call): Fix the setup of the historgram value for indirect calls. gcc/testsuite/ChangeLog: PR gcov-profile/71672 * g++.dg/tree-prof/indir-call-prof.C: Fix options, increase the number of iterations. * g++.dg/tree-prof/morefunc.C: Fix options, increase the number of iterations. * g++.dg/tree-prof/reorder.C: Fix options, increase the number of iterations. * gcc.dg/tree-prof/indir-call-prof-2.c: Fix options, fix dg-final-use-autofdo, increase the number of iterations. * gcc.dg/tree-prof/indir-call-prof.c: Fix options. * lib/profopt.exp: Pass gdwarf-4 when compiling binary to profile; pass -fearly-inlining when compiling with AutoFDO; pass -gcov_version=2 to create_gcov. * lib/target-supports.exp: Remove unnecessary -o perf.data passed to gcc-auto-profile. --- gcc/auto-profile.c | 13 + gcc/testsuite/g++.dg/tree-prof/indir-call-prof.C | 4 ++-- gcc/testsuite/g++.dg/tree-prof/morefunc.C | 7 --- gcc/testsuite/g++.dg/tree-prof/reorder.C | 6 +++--- gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c | 8 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c | 2 +- gcc/testsuite/lib/profopt.exp | 6 +++--- gcc/testsuite/lib/target-supports.exp | 2 +- 8 files changed, 27 insertions(+), 21 deletions(-) diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c index b23b82b2df4..4c1fc6b536b 100644 --- a/gcc/auto-profile.c +++ b/gcc/auto-profile.c @@ -1009,13 +1009,18 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map, histogram_value hist = gimple_alloc_histogram_value ( cfun, HIST_TYPE_INDIR_CALL, stmt, callee); - hist->n_counters = 3; + hist->n_counters = 4; hist->hvalue.counters = XNEWVEC (gcov_type, hist->n_counters); gimple_add_histogram_value (cfun, stmt, hist); - hist->hvalue.counters[0] = direct_call->profile_id; - hist->hvalue.counters[1] = max_iter->second; - hist->hvalue.counters[2] = total; + // Total counter + hist->hvalue.counters[0] = total; + // Number of value/counter pairs + hist->hvalue.counters[1] = 1; + // Value + hist->hvalue.counters[2] = direct_call->profile_id; + // Counter + hist->hvalue.counters[3] = max_iter->second; if (!transform) return; diff --git a/gcc/testsuite/g++.dg/tree-prof/indir-call-prof.C b/gcc/testsuite/g++.dg/tree-prof/indir-call-prof.C index 3374744613e..b45417106d0 100644 --- a/gcc/testsuite/g++.dg/tree-prof/indir-call-prof.C +++ b/gcc/testsuite/g++.dg/tree-prof/indir-call-prof.C @@ -1,4 +1,4 @@ -/* { dg-options "-O2 -fdump-tree-optimized -fdump-ipa-profile-optimized -fdump-ipa-afdo" } */ +/* { dg-options "-O2 -fdump-tree-optimized -fdump-ipa-profile-optimized -fdump-ipa-afdo-optimized" } */ struct A { A () {} @@ -26,7 +26,7 @@ main (void) int i; - for (i = 0; i < 100; i++) + for (i = 0; i < 1000; i++) { p = (A *)wrap ((void *)&a); p->AA (); diff --git a/gcc/testsuite/g++.dg/tree-prof/morefunc.C b/gcc/testsuite/g++.dg/tree-prof/morefunc.C index 621d09aec5b..96e0073ca8f 100644 --- a/gcc/testsuite/g++.dg/tree-prof/morefunc.C +++ b/gcc/testsuite/g++.dg/tree-prof/morefunc.C @@ -1,4 +1,5 @@ -/* { dg-options "-O2 -fno-devirtualize --param=profile-func-internal-id=0 -fdump-ipa-profile-optimized -fdump-ipa-afdo -Wno-attributes -Wno-coverage-mismatch -Wno-missing-profile" } */ +/* { dg-options "-O2 -fno-devirtualize --param=profile-func-internal-id=0 -fdump-ipa-profile-optimized -fdump-ipa-afdo-optimized -Wno-attribu
RE: [EXTERNAL] Check that passes do not forget to define profile
Hi Jan, These new checks are too strong for AutoFDO. For example, the edge probabilities are not guaranteed to be initialized (see afdo_calculate_branch_prob). This currently breaks autoprofiledbootstrap build. I suggest removing cfun->cfg->full_profile = true; from auto-profile.cc. Eugene -Original Message- From: Gcc-patches On Behalf Of Jan Hubicka via Gcc-patches Sent: Thursday, August 24, 2023 6:15 AM To: gcc-patches@gcc.gnu.org Subject: [EXTERNAL] Check that passes do not forget to define profile Hi, this patch extends verifier to check that all probabilities and counts are initialized if profile is supposed to be present. This is a bit complicated by the posibility that we inline !flag_guess_branch_probability function into function with profile defined and in this case we need to stop verification. For this reason I added flag to cfg structure tracking this. Bootstrapped/regtested x86_64-linux, comitted. gcc/ChangeLog: * cfg.h (struct control_flow_graph): New field full_profile. * auto-profile.cc (afdo_annotate_cfg): Set full_profile to true. * cfg.cc (init_flow): Set full_profile to false. * graphite.cc (graphite_transform_loops): Set full_profile to false. * lto-streamer-in.cc (input_cfg): Initialize full_profile flag. * predict.cc (pass_profile::execute): Set full_profile to true. * symtab-thunks.cc (expand_thunk): Set full_profile to true. * tree-cfg.cc (gimple_verify_flow_info): Verify that profile is full if full_profile is set. * tree-inline.cc (initialize_cfun): Initialize full_profile. (expand_call_inline): Combine full_profile. diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index e3af3555e75..ff3b763945c 100644 --- a/gcc/auto-profile.cc +++ b/gcc/auto-profile.cc @@ -1578,6 +1578,7 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts) } update_max_bb_count (); profile_status_for_fn (cfun) = PROFILE_READ; + cfun->cfg->full_profile = true; if (flag_value_profile_transformations) { gimple_value_profile_transformations (); diff --git a/gcc/cfg.cc b/gcc/cfg.cc index 9eb9916f61a..b7865f14e7f 100644 --- a/gcc/cfg.cc +++ b/gcc/cfg.cc @@ -81,6 +81,7 @@ init_flow (struct function *the_fun) = ENTRY_BLOCK_PTR_FOR_FN (the_fun); the_fun->cfg->edge_flags_allocated = EDGE_ALL_FLAGS; the_fun->cfg->bb_flags_allocated = BB_ALL_FLAGS; + the_fun->cfg->full_profile = false; } /* Helper function for remove_edge and free_cffg. Frees edge structure diff --git a/gcc/cfg.h b/gcc/cfg.h index a0e944979c8..53e2553012c 100644 --- a/gcc/cfg.h +++ b/gcc/cfg.h @@ -78,6 +78,9 @@ struct GTY(()) control_flow_graph { /* Dynamically allocated edge/bb flags. */ int edge_flags_allocated; int bb_flags_allocated; + + /* Set if the profile is computed on every edge and basic block. */ + bool full_profile; }; diff --git a/gcc/graphite.cc b/gcc/graphite.cc index 19f8975ffa2..2b387d5b016 100644 --- a/gcc/graphite.cc +++ b/gcc/graphite.cc @@ -512,6 +512,8 @@ graphite_transform_loops (void) if (changed) { + /* FIXME: Graphite does not update profile meaningfully currently. */ + cfun->cfg->full_profile = false; cleanup_tree_cfg (); profile_status_for_fn (cfun) = PROFILE_ABSENT; release_recorded_exits (cfun); diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc index 0cce14414ca..d3128fcebe4 100644 --- a/gcc/lto-streamer-in.cc +++ b/gcc/lto-streamer-in.cc @@ -1030,6 +1030,7 @@ input_cfg (class lto_input_block *ib, class data_in *data_in, basic_block p_bb; unsigned int i; int index; + bool full_profile = false; init_empty_tree_cfg_for_function (fn); @@ -1071,6 +1072,8 @@ input_cfg (class lto_input_block *ib, class data_in *data_in, data_in->location_cache.input_location_and_block (&e->goto_locus, &bp, ib, data_in); e->probability = profile_probability::stream_in (ib); + if (!e->probability.initialized_p ()) + full_profile = false; } @@ -1145,6 +1148,7 @@ input_cfg (class lto_input_block *ib, class data_in *data_in, /* Rebuild the loop tree. */ flow_loops_find (loops); + cfun->cfg->full_profile = full_profile; } diff --git a/gcc/predict.cc b/gcc/predict.cc index 5a1a561cc24..396746cbfd1 100644 --- a/gcc/predict.cc +++ b/gcc/predict.cc @@ -4131,6 +4131,7 @@ pass_profile::execute (function *fun) scev_initialize (); tree_estimate_probability (false); + cfun->cfg->full_profile = true; if (nb_loops > 1) scev_finalize (); diff --git a/gcc/symtab-thunks.cc b/gcc/symtab-thunks.cc index 4c04235c41b..23ead0d2138 100644 --- a/gcc/symtab-thunks.cc +++ b/gcc/symtab-thunks.cc @@ -648,6 +648,7 @@ expand_thunk (cgraph_node *node, bool output_asm_thunks, ? PROFILE_READ : PROFILE_GUESSED; /* FIXME: C++ FE should stop setting TREE_AS
Re: Optimize combination of comparisons to dec+compare
Richard, > Do we already handle x < y || x <= CST to x <= y - CST? That is an invalid transformation: e.g., consider x=3, y=4, CST=2. Can you please clarify? Thanks, Eugene -Original Message- From: Richard Biener Sent: Thursday, December 10, 2020 12:21 AM To: Eugene Rozenfeld Cc: gcc-patches@gcc.gnu.org Subject: Re: Optimize combination of comparisons to dec+compare On Thu, Dec 10, 2020 at 1:52 AM Eugene Rozenfeld via Gcc-patches wrote: > > This patch adds a pattern for optimizing x < y || x == XXX_MIN to x <= > y-1 if y is an integer with TYPE_OVERFLOW_WRAPS. Do we already handle x < y || x <= CST to x <= y - CST? That is, the XXX_MIN case is just a special-case of generic anti-range testing? For anti-range testing with signed types we pun to unsigned when possible. > This fixes pr96674. > > Tested on x86_64-pc-linux-gnu. > > For this function > > bool f(unsigned a, unsigned b) > { > return (b == 0) | (a < b); > } > > the code without the patch is > > test esi,esi > sete al > cmpesi,edi > seta dl > or eax,edx > ret > > the code with the patch is > > subesi,0x1 > cmpesi,edi > setae al > ret > > Eugene > > gcc/ > PR tree-optimization/96674 > * match.pd: New pattern x < y || x == XXX_MIN --> x <= y - 1 > > gcc/testsuite > * gcc.dg/pr96674.c: New test. >
RE: Optimize combination of comparisons to dec+compare
Re-sending my question and re-attaching the patch. Richard, can you please clarify your feedback? Thanks, Eugene -Original Message- From: Gcc-patches On Behalf Of Eugene Rozenfeld via Gcc-patches Sent: Tuesday, December 15, 2020 2:06 PM To: Richard Biener Cc: gcc-patches@gcc.gnu.org Subject: [EXTERNAL] Re: Optimize combination of comparisons to dec+compare Richard, > Do we already handle x < y || x <= CST to x <= y - CST? That is an invalid transformation: e.g., consider x=3, y=4, CST=2. Can you please clarify? Thanks, Eugene -Original Message- From: Richard Biener Sent: Thursday, December 10, 2020 12:21 AM To: Eugene Rozenfeld Cc: gcc-patches@gcc.gnu.org Subject: Re: Optimize combination of comparisons to dec+compare On Thu, Dec 10, 2020 at 1:52 AM Eugene Rozenfeld via Gcc-patches wrote: > > This patch adds a pattern for optimizing x < y || x == XXX_MIN to x <= > y-1 if y is an integer with TYPE_OVERFLOW_WRAPS. Do we already handle x < y || x <= CST to x <= y - CST? That is, the XXX_MIN case is just a special-case of generic anti-range testing? For anti-range testing with signed types we pun to unsigned when possible. > This fixes pr96674. > > Tested on x86_64-pc-linux-gnu. > > For this function > > bool f(unsigned a, unsigned b) > { > return (b == 0) | (a < b); > } > > the code without the patch is > > test esi,esi > sete al > cmpesi,edi > seta dl > or eax,edx > ret > > the code with the patch is > > subesi,0x1 > cmpesi,edi > setae al > ret > > Eugene > > gcc/ > PR tree-optimization/96674 > * match.pd: New pattern x < y || x == XXX_MIN --> x <= y - 1 > > gcc/testsuite > * gcc.dg/pr96674.c: New test. > 0001-Optimize-combination-of-comparisons-to-dec-compare.patch Description: 0001-Optimize-combination-of-comparisons-to-dec-compare.patch
[PATCH][tree-optimization]Optimize combination of comparisons to dec+compare
Ping. -Original Message- From: Eugene Rozenfeld Sent: Tuesday, December 22, 2020 3:01 PM To: Richard Biener ; gcc-patches@gcc.gnu.org Subject: RE: Optimize combination of comparisons to dec+compare Re-sending my question and re-attaching the patch. Richard, can you please clarify your feedback? Thanks, Eugene -Original Message- From: Gcc-patches On Behalf Of Eugene Rozenfeld via Gcc-patches Sent: Tuesday, December 15, 2020 2:06 PM To: Richard Biener Cc: gcc-patches@gcc.gnu.org Subject: [EXTERNAL] Re: Optimize combination of comparisons to dec+compare Richard, > Do we already handle x < y || x <= CST to x <= y - CST? That is an invalid transformation: e.g., consider x=3, y=4, CST=2. Can you please clarify? Thanks, Eugene -Original Message- From: Richard Biener Sent: Thursday, December 10, 2020 12:21 AM To: Eugene Rozenfeld Cc: gcc-patches@gcc.gnu.org Subject: Re: Optimize combination of comparisons to dec+compare On Thu, Dec 10, 2020 at 1:52 AM Eugene Rozenfeld via Gcc-patches wrote: > > This patch adds a pattern for optimizing x < y || x == XXX_MIN to x <= > y-1 if y is an integer with TYPE_OVERFLOW_WRAPS. Do we already handle x < y || x <= CST to x <= y - CST? That is, the XXX_MIN case is just a special-case of generic anti-range testing? For anti-range testing with signed types we pun to unsigned when possible. > This fixes pr96674. > > Tested on x86_64-pc-linux-gnu. > > For this function > > bool f(unsigned a, unsigned b) > { > return (b == 0) | (a < b); > } > > the code without the patch is > > test esi,esi > sete al > cmpesi,edi > seta dl > or eax,edx > ret > > the code with the patch is > > subesi,0x1 > cmpesi,edi > setae al > ret > > Eugene > > gcc/ > PR tree-optimization/96674 > * match.pd: New pattern x < y || x == XXX_MIN --> x <= y - 1 > > gcc/testsuite > * gcc.dg/pr96674.c: New test. > 0001-Optimize-combination-of-comparisons-to-dec-compare.patch Description: 0001-Optimize-combination-of-comparisons-to-dec-compare.patch
RE: [EXTERNAL] Re: [PATCH][tree-optimization]Optimize combination of comparisons to dec+compare
I got more feedback for the patch from Gabriel Ravier and Jakub Jelinek in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96674 and re-worked it accordingly. The changes from the previous patch are: 1. Switched the tests to use __attribute__((noipa)) instead of __attribute__((noinline)) . 2. Fixed a type in the pattern comment. 3. Added :c for top-level bit_ior expression. 4. Added :s for the subexpressions. 5. Added a pattern for the negated expression: x >= y && y != XXX_MIN --> x > y - 1 and the corresponding tests. The new patch is attached. Eugene -Original Message- From: Richard Biener Sent: Tuesday, January 5, 2021 4:21 AM To: Eugene Rozenfeld Cc: gcc-patches@gcc.gnu.org Subject: [EXTERNAL] Re: [PATCH][tree-optimization]Optimize combination of comparisons to dec+compare On Mon, Jan 4, 2021 at 9:50 PM Eugene Rozenfeld wrote: > > Ping. > > -Original Message- > From: Eugene Rozenfeld > Sent: Tuesday, December 22, 2020 3:01 PM > To: Richard Biener ; > gcc-patches@gcc.gnu.org > Subject: RE: Optimize combination of comparisons to dec+compare > > Re-sending my question and re-attaching the patch. > > Richard, can you please clarify your feedback? Hmm, OK. The patch is OK. Thanks, Richard. > Thanks, > > Eugene > > -Original Message- > From: Gcc-patches On Behalf Of > Eugene Rozenfeld via Gcc-patches > Sent: Tuesday, December 15, 2020 2:06 PM > To: Richard Biener > Cc: gcc-patches@gcc.gnu.org > Subject: [EXTERNAL] Re: Optimize combination of comparisons to > dec+compare > > Richard, > > > Do we already handle x < y || x <= CST to x <= y - CST? > > That is an invalid transformation: e.g., consider x=3, y=4, CST=2. > Can you please clarify? > > Thanks, > > Eugene > > -Original Message- > From: Richard Biener > Sent: Thursday, December 10, 2020 12:21 AM > To: Eugene Rozenfeld > Cc: gcc-patches@gcc.gnu.org > Subject: Re: Optimize combination of comparisons to dec+compare > > On Thu, Dec 10, 2020 at 1:52 AM Eugene Rozenfeld via Gcc-patches > wrote: > > > > This patch adds a pattern for optimizing x < y || x == XXX_MIN to x > > <= > > y-1 if y is an integer with TYPE_OVERFLOW_WRAPS. > > Do we already handle x < y || x <= CST to x <= y - CST? > That is, the XXX_MIN case is just a special-case of generic anti-range > testing? For anti-range testing with signed types we pun to unsigned when > possible. > > > This fixes pr96674. > > > > Tested on x86_64-pc-linux-gnu. > > > > For this function > > > > bool f(unsigned a, unsigned b) > > { > > return (b == 0) | (a < b); > > } > > > > the code without the patch is > > > > test esi,esi > > sete al > > cmpesi,edi > > seta dl > > or eax,edx > > ret > > > > the code with the patch is > > > > subesi,0x1 > > cmpesi,edi > > setae al > > ret > > > > Eugene > > > > gcc/ > > PR tree-optimization/96674 > > * match.pd: New pattern x < y || x == XXX_MIN --> x <= y - 1 > > > > gcc/testsuite > > * gcc.dg/pr96674.c: New test. > > 0002-Optimize-combination-of-comparisons-to-dec-compare.patch Description: 0002-Optimize-combination-of-comparisons-to-dec-compare.patch
RE: [EXTERNAL] Re: [PATCH][tree-optimization]Optimize combination of comparisons to dec+compare
Richard, Can you please commit this patch for me? I don't have write access yet, I'm still working on getting copyright assignment/disclaimer signed by my employer. Thanks, Eugene -Original Message- From: Richard Biener Sent: Friday, January 15, 2021 3:55 AM To: Eugene Rozenfeld Cc: gabrav...@gmail.com; ja...@gcc.gnu.org; gcc-patches@gcc.gnu.org Subject: Re: [EXTERNAL] Re: [PATCH][tree-optimization]Optimize combination of comparisons to dec+compare On Thu, Jan 14, 2021 at 10:04 PM Eugene Rozenfeld wrote: > > I got more feedback for the patch from Gabriel Ravier and Jakub Jelinek in > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fbugzilla%2Fshow_bug.cgi%3Fid%3D96674&data=04%7C01%7CEugene.Rozenfeld%40microsoft.com%7Cf4b1e41de6b4469fd3bb08d8b94c5a20%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637463084866262315%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=%2Fc77uS%2BNlNGmXwOmK729BByEW0VDiq1HEe8BA7DpI30%3D&reserved=0 > and re-worked it accordingly. > > The changes from the previous patch are: > 1. Switched the tests to use __attribute__((noipa)) instead of > __attribute__((noinline)) . > 2. Fixed a type in the pattern comment. > 3. Added :c for top-level bit_ior expression. > 4. Added :s for the subexpressions. > 5. Added a pattern for the negated expression: > x >= y && y != XXX_MIN --> x > y - 1 > and the corresponding tests. > > The new patch is attached. OK. Thanks, Richard. > Eugene > > -----Original Message- > From: Richard Biener > Sent: Tuesday, January 5, 2021 4:21 AM > To: Eugene Rozenfeld > Cc: gcc-patches@gcc.gnu.org > Subject: [EXTERNAL] Re: [PATCH][tree-optimization]Optimize combination > of comparisons to dec+compare > > On Mon, Jan 4, 2021 at 9:50 PM Eugene Rozenfeld > wrote: > > > > Ping. > > > > -Original Message- > > From: Eugene Rozenfeld > > Sent: Tuesday, December 22, 2020 3:01 PM > > To: Richard Biener ; > > gcc-patches@gcc.gnu.org > > Subject: RE: Optimize combination of comparisons to dec+compare > > > > Re-sending my question and re-attaching the patch. > > > > Richard, can you please clarify your feedback? > > Hmm, OK. > > The patch is OK. > > Thanks, > Richard. > > > > Thanks, > > > > Eugene > > > > -Original Message- > > From: Gcc-patches On Behalf Of > > Eugene Rozenfeld via Gcc-patches > > Sent: Tuesday, December 15, 2020 2:06 PM > > To: Richard Biener > > Cc: gcc-patches@gcc.gnu.org > > Subject: [EXTERNAL] Re: Optimize combination of comparisons to > > dec+compare > > > > Richard, > > > > > Do we already handle x < y || x <= CST to x <= y - CST? > > > > That is an invalid transformation: e.g., consider x=3, y=4, CST=2. > > Can you please clarify? > > > > Thanks, > > > > Eugene > > > > -Original Message- > > From: Richard Biener > > Sent: Thursday, December 10, 2020 12:21 AM > > To: Eugene Rozenfeld > > Cc: gcc-patches@gcc.gnu.org > > Subject: Re: Optimize combination of comparisons to dec+compare > > > > On Thu, Dec 10, 2020 at 1:52 AM Eugene Rozenfeld via Gcc-patches > > wrote: > > > > > > This patch adds a pattern for optimizing x < y || x == XXX_MIN to > > > x <= > > > y-1 if y is an integer with TYPE_OVERFLOW_WRAPS. > > > > Do we already handle x < y || x <= CST to x <= y - CST? > > That is, the XXX_MIN case is just a special-case of generic anti-range > > testing? For anti-range testing with signed types we pun to unsigned when > > possible. > > > > > This fixes pr96674. > > > > > > Tested on x86_64-pc-linux-gnu. > > > > > > For this function > > > > > > bool f(unsigned a, unsigned b) > > > { > > > return (b == 0) | (a < b); > > > } > > > > > > the code without the patch is > > > > > > test esi,esi > > > sete al > > > cmpesi,edi > > > seta dl > > > or eax,edx > > > ret > > > > > > the code with the patch is > > > > > > subesi,0x1 > > > cmpesi,edi > > > setae al > > > ret > > > > > > Eugene > > > > > > gcc/ > > > PR tree-optimization/96674 > > > * match.pd: New pattern x < y || x == XXX_MIN --> x <= y - 1 > > > > > > gcc/testsuite > > > * gcc.dg/pr96674.c: New test. > > >
RE: [EXTERNAL] Re: [PATCH] [tree-optimization] Optimize two patterns with three xors.
Thank you for installing my patch Jeff! Yes, I intend to contribute regularly. I'm working on getting copyright assignment/disclaimer paperwork approved by my employer. I'll apply for commit privs after that. Eugene -Original Message- From: Jeff Law Sent: Wednesday, November 18, 2020 11:33 AM To: Richard Biener ; Eugene Rozenfeld Cc: gcc-patches@gcc.gnu.org Subject: [EXTERNAL] Re: [PATCH] [tree-optimization] Optimize two patterns with three xors. On 11/17/20 12:57 AM, Richard Biener via Gcc-patches wrote: > On Tue, Nov 17, 2020 at 3:19 AM Eugene Rozenfeld > wrote: >> Thank you for the review Richard! >> >> I re-worked the patch based on your suggestions (attached). >> I made the change to reuse the first bit_xor in both patterns and I added :s >> to the last xor in the first pattern. >> For the second pattern I didn't add :s because I think the simplification is >> beneficial even if the second or third bit_xor has more than one use since >> we are simplifying them to just a single operand (@2). If that is incorrect, >> please explain why. > Ah, true, that's correct. > > The patch is OK. I've installed this on the trunk. Eugene, if you're going to contribute regularly you should probably go ahead and get commit privs so that you can commit ACK's patches yourself. There should be a link to a form from this page: https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fgitwrite.html&data=04%7C01%7Ceugene.rozenfeld%40microsoft.com%7Ca31f5335968749cdfe1708d88bf8bfb2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637413247775369684%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=BfQGAtU%2F8IJ8%2BRcvuNI8qKWgnC9oPFtWQhXo1RzlBTU%3D&reserved=0 Jeff
[PATCH] [tree-optimization] Optimize or+and+or pattern to and+or
Simplify ((b | c) & a) | b to (a & c) | b. This fixes PR96679. Tested on x86_64-pc-linux-gnu. int f(int a, int b, int c) { return ((b | c) & a) | b; } Code without the patch: or edx,esi andedx,edi moveax,edx or eax,esi ret Code with the patch: andedi,edx moveax,edi or eax,esi ret Eugene 0001-Optimize-or-and-or-pattern-to-and-or.patch Description: 0001-Optimize-or-and-or-pattern-to-and-or.patch
[PATCH] [tree-optimization] Optimize max/min pattern with comparison
Make the following simplifications: X <= MAX(X, Y) -> true X > MAX(X, Y) -> false X >= MIN(X, Y) -> true X < MIN(X, Y) -> false This fixes PR96708. Tested on x86_64-pc-linux-gnu. bool f(int a, int b) { int tmp = (a < b) ? b : a; return tmp >= a; } Code without the patch: vmovd xmm0,edi vmovd xmm1,esi vpmaxsd xmm0,xmm0,xmm1 vmovd eax,xmm0 cmpeax,edi setge al ret Code with the patch: moveax,0x1 ret Eugene 0001-Optimize-max-pattern-with-comparison.patch Description: 0001-Optimize-max-pattern-with-comparison.patch
Re: [PATCH] [tree-optimization] Optimize max/min pattern with comparison
Thank you for the review Jeff. I don't need to look at the opcode to know the result. The pattern will be matched only in these 4 cases: X <= MAX(X, Y) -> true X > MAX(X, Y) -> false X >= MIN(X, Y) -> true X < MIN(X, Y) -> false So, the result will be true for GE_EXPR and LE_EXPR and false otherwise. I added two test files: one for positive cases and one for negative cases. The updated patch is attached. Thanks, Eugene -Original Message- From: Jeff Law Sent: Monday, November 30, 2020 9:51 AM To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [tree-optimization] Optimize max/min pattern with comparison On 11/25/20 3:04 PM, Eugene Rozenfeld via Gcc-patches wrote: > Make the following simplifications: > X <= MAX(X, Y) -> true > X > MAX(X, Y) -> false > X >= MIN(X, Y) -> true > X < MIN(X, Y) -> false > > This fixes PR96708. > > Tested on x86_64-pc-linux-gnu. > > bool f(int a, int b) > { > int tmp = (a < b) ? b : a; > return tmp >= a; > } > > Code without the patch: > > vmovd xmm0,edi > vmovd xmm1,esi > vpmaxsd xmm0,xmm0,xmm1 > vmovd eax,xmm0 > cmpeax,edi > setge al > ret > > Code with the patch: > > moveax,0x1 > ret > > Eugene > > 0001-Optimize-max-pattern-with-comparison.patch > > From f6391c197b670b516238ac7707512c1358336520 Mon Sep 17 00:00:00 2001 > From: Eugene Rozenfeld > Date: Sat, 21 Nov 2020 01:08:50 -0800 > Subject: [PATCH] Optimize max pattern with comparison > > Make the following simplifications: > X <= MAX(X, Y) -> true > X > MAX(X, Y) -> false > X >= MIN(X, Y) -> true > X < MIN(X, Y) -> false > > This fixes PR96708. > > gcc/ > * match.pd : New patterns. > --- > gcc/match.pd | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/gcc/match.pd b/gcc/match.pd index > cbb4bf0b32d..75237741946 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -2851,6 +2851,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >(cmp (minmax @0 INTEGER_CST@1) INTEGER_CST@2) >(comb (cmp @0 @2) (cmp @1 @2 > > +/* X <= MAX(X, Y) -> true > + X > MAX(X, Y) -> false > + X >= MIN(X, Y) -> true > + X < MIN(X, Y) -> false */ > +(for minmax (min min max max ) > + cmp(ge lt le gt ) > + (simplify > + (cmp @0 (minmax:c @0 @1)) > + { constant_boolean_node (cmp == GE_EXPR || cmp == LE_EXPR, type); } > +)) Don't you need to look at the opcode (MIN vs MAX) vs CMP to know the result? I'd really like to see some tests for the testsuite.   In particular I'd like to see positive tests where we should apply the optimization and negative tests when we should not apply the optimization. I also wonder if there's value in handling this in Ranger and/or DOM. Though I'd probably wait to see if fixing in match.pd is sufficient to cover the cases I'm thinking of in Ranger & DOM. Jeff 0001-Optimize-min-and-max-patterns-with-comparison.patch Description: 0001-Optimize-min-and-max-patterns-with-comparison.patch
Optimize combination of comparisons to dec+compare
This patch adds a pattern for optimizing x < y || x == XXX_MIN to x <= y-1 if y is an integer with TYPE_OVERFLOW_WRAPS. This fixes pr96674. Tested on x86_64-pc-linux-gnu. For this function bool f(unsigned a, unsigned b) { return (b == 0) | (a < b); } the code without the patch is test esi,esi sete al cmpesi,edi seta dl or eax,edx ret the code with the patch is subesi,0x1 cmpesi,edi setae al ret Eugene gcc/ PR tree-optimization/96674 * match.pd: New pattern x < y || x == XXX_MIN --> x <= y - 1 gcc/testsuite * gcc.dg/pr96674.c: New test. 0001-Optimize-combination-of-comparisons-to-dec-compare.patch Description: 0001-Optimize-combination-of-comparisons-to-dec-compare.patch
[PATCH] [tree-optimization] Fix for PR97223
This patch adds a pattern for folding x < (short) ((unsigned short)x + const) to x <= SHORT_MAX - const (and similarly for other integral types) if const is not 0. as described in PR97223. For example, without this patch the x86_64-pc-linux code generated for this function bool f(char x) { return x < (char)(x + 12); } is leaeax,[rdi+0xc] cmpal,dil setg al ret With the patch the code is cmpdil,0x73 setle al ret Tested on x86_64-pc-linux. Eugene 0001-Add-a-tree-optimization-described-in-PR97223.patch Description: 0001-Add-a-tree-optimization-described-in-PR97223.patch
RE: [EXTERNAL] Re: [PATCH] [tree-optimization] Fix for PR97223
Thank you for the review Richard! I re-worked the patch based on your suggestions. I combined the two patterns. Neither one requires a signedness check as long as the type of the 'add' has overflow wrap semantics. I had to modify the regular expression in no-strict-overflow-4.c test. In that test the following function is compiled with -fno-strict-overflow : int foo (int i) { return i + 1 > i; } We now optimize this function so that the tree-optimized dump has ;; Function foo (foo, funcdef_no=0, decl_uid=1931, cgraph_uid=1, symbol_order=0) foo (int i) { _Bool _1; int _3; [local count: 1073741824]: _1 = i_2(D) != 2147483647; _3 = (int) _1; return _3; } This is a correct optimization since -fno-strict-overflow implies -fwrapv. Eugene -Original Message- From: Richard Biener Sent: Tuesday, October 27, 2020 2:23 AM To: Eugene Rozenfeld Cc: gcc-patches@gcc.gnu.org Subject: [EXTERNAL] Re: [PATCH] [tree-optimization] Fix for PR97223 On Sat, Oct 24, 2020 at 2:20 AM Eugene Rozenfeld via Gcc-patches wrote: > > This patch adds a pattern for folding > x < (short) ((unsigned short)x + const) to > x <= SHORT_MAX - const > (and similarly for other integral types) if const is not 0. > as described in PR97223. > > For example, without this patch the x86_64-pc-linux code generated for > this function > > bool f(char x) > { > return x < (char)(x + 12); > } > > is > > leaeax,[rdi+0xc] > cmpal,dil > setg al > ret > > With the patch the code is > > cmpdil,0x73 > setle al > ret > > Tested on x86_64-pc-linux. +/* Similar to the previous pattern but with additional casts. */ (for +cmp (lt le ge gt) + out (gt gt le le) + (simplify + (cmp:c (convert@3 (plus@2 (convert@4 @0) INTEGER_CST@1)) @0) + (if (!TYPE_UNSIGNED (TREE_TYPE (@0)) + && types_match (TREE_TYPE (@0), TREE_TYPE (@3)) + && types_match (TREE_TYPE (@4), unsigned_type_for (TREE_TYPE (@0))) + && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@4)) + && wi::to_wide (@1) != 0 + && single_use (@2)) + (with { unsigned int prec = TYPE_PRECISION (TREE_TYPE (@0)); } +(out @0 { wide_int_to_tree (TREE_TYPE (@0), + wi::max_value (prec, SIGNED) + - wi::to_wide (@1)); }) I think it's reasonable but the comment can be made more precise. In particular I wonder why we require a signed comparison here while the previous pattern requires an unsigned comparison. It might be an artifact and the restriction instead only applies to the plus? Note that + && types_match (TREE_TYPE (@4), unsigned_type_for (TREE_TYPE + (@0))) unsigned_type_for should be avoided since it's quite expensive. May I suggest && TYPE_UNSIGNED (TREE_TYPE (@4)) && tree_nop_conversion_p (TREE_TYPE (@4), TREE_TYPE (@0)) instead? I originally wondered if "but with additional casts" could be done in a single pattern via (convert? ...) uses but then I noticed the strange difference in the comparison signedness requirement ... Richard. > Eugene > 0001-Add-a-tree-optimization-described-in-PR97223.patch Description: 0001-Add-a-tree-optimization-described-in-PR97223.patch
[PATCH] [tree-optimization] Fix for PR96701
This patch adds a pattern for folding x >> x to 0 as described in PR96701. Without this patch the x86_64-pc-linux-gnu code generated for this function int foo (int i) { return i >> i; } is movecx,edi saredi,cl test edi,edi setne al ret With the patch the code is xoreax,eax ret Tested on x86_64-pc-linux-gnu. Eugene 0001-Optimize-self-right-shift-to-0.patch Description: 0001-Optimize-self-right-shift-to-0.patch
[PATCH] [tree-optimization] Fix for PR96701
This patch adds a pattern for folding x >> x to 0 as described in PR96701. Without this patch the x86_64-pc-linux-gnu code generated for this function int foo (int i) { return i >> i; } is mov ecx,edi sar edi,cl test edi,edi setne al ret With the patch the code is xor eax,eax ret Tested on x86_64-pc-linux-gnu. Eugene 0001-Optimize-self-right-shift-to-0.patch Description: 0001-Optimize-self-right-shift-to-0.patch
[PATCH][PUSHED] Fix cfg maintenance after inlining in AutoFDO
Todo from early_inliner needs to be propagated so that cleanup_tree_cfg () is called if necessary. This bug was causing an assert in get_loop_body during ipa-sra in autoprofiledbootstrap build since loops weren't fixed up and one of the loops had num_nodes set to 0. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * auto-profile.cc (auto_profile): Check todo from early_inline to see if cleanup_tree_vfg needs to be called. (early_inline): Return todo from early_inliner. --- gcc/auto-profile.cc | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index 360c42c4b89..e3af3555e75 100644 --- a/gcc/auto-profile.cc +++ b/gcc/auto-profile.cc @@ -1589,13 +1589,14 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts) /* Wrapper function to invoke early inliner. */ -static void +static unsigned int early_inline () { compute_fn_summary (cgraph_node::get (current_function_decl), true); - unsigned todo = early_inliner (cfun); + unsigned int todo = early_inliner (cfun); if (todo & TODO_update_ssa_any) update_ssa (TODO_update_ssa); + return todo; } /* Use AutoFDO profile to annoate the control flow graph. @@ -1651,20 +1652,22 @@ auto_profile (void) function before annotation, so the profile inside bar@loc_foo2 will be useful. */ autofdo::stmt_set promoted_stmts; +unsigned int todo = 0; for (int i = 0; i < 10; i++) { -if (!flag_value_profile_transformations -|| !autofdo::afdo_vpt_for_early_inline (&promoted_stmts)) - break; -early_inline (); + if (!flag_value_profile_transformations + || !autofdo::afdo_vpt_for_early_inline (&promoted_stmts)) + break; + todo |= early_inline (); } -early_inline (); +todo |= early_inline (); autofdo::afdo_annotate_cfg (promoted_stmts); compute_function_frequency (); /* Local pure-const may imply need to fixup the cfg. */ -if (execute_fixup_cfg () & TODO_cleanup_cfg) +todo |= execute_fixup_cfg (); +if (todo & TODO_cleanup_cfg) cleanup_tree_cfg (); free_dominance_info (CDI_DOMINATORS); @@ -1674,7 +1677,7 @@ auto_profile (void) pop_cfun (); } - return TODO_rebuild_cgraph_edges; + return 0; } } /* namespace autofdo. */ -- 2.25.1
[PATCH] Add instruction level discriminator support.
This is the first in a series of patches to enable discriminator support in AutoFDO. This patch switches to tracking discriminators per statement/instruction instead of per basic block. Tracking per basic block was problematic since not all statements in a basic block needed a discriminator and, also, later optimizations could move statements between basic blocks making correlation during AutoFDO compilation unreliable. Tracking per statement also allows us to assign different discriminators to multiple function calls in the same basic block. A subsequent patch will add that support. The idea of this patch is based on commit 4c311d95cf6d9519c3c20f641cc77af7df491fdf by Dehao Chen in vendors/google/heads/gcc-4_8 but uses a slightly different approach. In Dehao's work special (normally unused) location ids and side tables were used to keep track of locations with discriminators. Things have changed since then and I don't think we have unused location ids anymore. Instead, I made discriminators a part of ad-hoc locations. The difference from Dehao's work also includes support for discriminator reading/writing in lto streaming and in modules. Tested on x86_64-pc-linux-gnu. 0001-Add-instruction-level-discriminator-support.patch Description: 0001-Add-instruction-level-discriminator-support.patch
[PING][PATCH] Add instruction level discriminator support.
Hello, I'd like to ping this patch: https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596065.html Thanks, Eugene -Original Message- From: Gcc-patches On Behalf Of Eugene Rozenfeld via Gcc-patches Sent: Thursday, June 02, 2022 12:22 AM To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan Hubicka Subject: [EXTERNAL] [PATCH] Add instruction level discriminator support. This is the first in a series of patches to enable discriminator support in AutoFDO. This patch switches to tracking discriminators per statement/instruction instead of per basic block. Tracking per basic block was problematic since not all statements in a basic block needed a discriminator and, also, later optimizations could move statements between basic blocks making correlation during AutoFDO compilation unreliable. Tracking per statement also allows us to assign different discriminators to multiple function calls in the same basic block. A subsequent patch will add that support. The idea of this patch is based on commit 4c311d95cf6d9519c3c20f641cc77af7df491fdf by Dehao Chen in vendors/google/heads/gcc-4_8 but uses a slightly different approach. In Dehao's work special (normally unused) location ids and side tables were used to keep track of locations with discriminators. Things have changed since then and I don't think we have unused location ids anymore. Instead, I made discriminators a part of ad-hoc locations. The difference from Dehao's work also includes support for discriminator reading/writing in lto streaming and in modules. Tested on x86_64-pc-linux-gnu. 0001-Add-instruction-level-discriminator-support.patch Description: 0001-Add-instruction-level-discriminator-support.patch
RE: [EXTERNAL] Re: [PATCH] Add instruction level discriminator support.
Thank you for the review Andi. Yes, the corresponding reader change is @@ -15671,8 +15673,9 @@ module_state::read_location (bytes_in &sec) const if (range.m_start == UNKNOWN_LOCATION) range.m_start = locus; range.m_finish = read_location (sec); + unsigned discriminator = sec.u (); if (locus != loc && range.m_start != loc && range.m_finish != loc) - locus = get_combined_adhoc_loc (line_table, locus, range, NULL); + locus = get_combined_adhoc_loc (line_table, locus, range, NULL, discriminator); } break; -Original Message- From: Andi Kleen Sent: Sunday, June 12, 2022 4:53 PM To: Eugene Rozenfeld via Gcc-patches Cc: Jan Hubicka ; Eugene Rozenfeld Subject: [EXTERNAL] Re: [PATCH] Add instruction level discriminator support. Eugene Rozenfeld via Gcc-patches writes: > { > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index > d1dc73724d1..5ed6b7b0f94 100644 > --- a/gcc/cp/module.cc > +++ b/gcc/cp/module.cc > @@ -15587,6 +15587,8 @@ module_state::write_location (bytes_out &sec, > location_t loc) > range.m_start = UNKNOWN_LOCATION; >write_location (sec, range.m_start); >write_location (sec, range.m_finish); > + unsigned discriminator = get_discriminator_from_adhoc_loc (line_table, > loc); > + sec.u (discriminator); I hope this has a corresponding reader change, wasn't fully clear. Should it use some common function? The patch looks good to me, but I cannot approve. -Andi
RE: [PING][PATCH] Add instruction level discriminator support.
Another ping for https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596065.html . I got a review from Andi (https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596549.html) but I also need a review from someone who can approve the changes. Thanks, Eugene -Original Message- From: Eugene Rozenfeld Sent: Friday, June 10, 2022 12:03 PM To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan Hubicka Subject: [PING][PATCH] Add instruction level discriminator support. Hello, I'd like to ping this patch: https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596065.html Thanks, Eugene -Original Message- From: Gcc-patches On Behalf Of Eugene Rozenfeld via Gcc-patches Sent: Thursday, June 02, 2022 12:22 AM To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan Hubicka Subject: [EXTERNAL] [PATCH] Add instruction level discriminator support. This is the first in a series of patches to enable discriminator support in AutoFDO. This patch switches to tracking discriminators per statement/instruction instead of per basic block. Tracking per basic block was problematic since not all statements in a basic block needed a discriminator and, also, later optimizations could move statements between basic blocks making correlation during AutoFDO compilation unreliable. Tracking per statement also allows us to assign different discriminators to multiple function calls in the same basic block. A subsequent patch will add that support. The idea of this patch is based on commit 4c311d95cf6d9519c3c20f641cc77af7df491fdf by Dehao Chen in vendors/google/heads/gcc-4_8 but uses a slightly different approach. In Dehao's work special (normally unused) location ids and side tables were used to keep track of locations with discriminators. Things have changed since then and I don't think we have unused location ids anymore. Instead, I made discriminators a part of ad-hoc locations. The difference from Dehao's work also includes support for discriminator reading/writing in lto streaming and in modules. Tested on x86_64-pc-linux-gnu. 0001-Add-instruction-level-discriminator-support.patch Description: 0001-Add-instruction-level-discriminator-support.patch
[COMMITTED] MAINTAINERS: Add myself for write after approval
ChangeLog: 2021-03-12 Eugene Rozenfeld * MAINTAINERS (Write After Approval): Add myself. 0001-MAINTAINERS-Add-myself-for-write-after-approval.patch Description: 0001-MAINTAINERS-Add-myself-for-write-after-approval.patch
RE: [EXTERNAL] [PATCH] gcov: Use system IO buffering
The commit from this patch (https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=23eb66d1d46a34cb28c4acbdf8a1deb80a7c5a05) changed the semantics of gcov_read_string and gcov_write_string. Before this change the string storage was as described in gcov-io.h: "Strings are padded with 1 to 4 NUL bytes, to bring the length up to a multiple of 4. The number of 4 bytes is stored, followed by the padded string." After this change the number before the string indicates the number of bytes (not words) and there is no padding. Was this file format change intentional? It breaks AutoFDO because create_gcov produces strings in the format specified in gcov-io.h. Thanks, Eugene -Original Message- From: Gcc-patches On Behalf Of Martin Liška Sent: Wednesday, April 21, 2021 12:52 AM To: gcc-patches@gcc.gnu.org Subject: [EXTERNAL] [PATCH] gcov: Use system IO buffering Hey. I/O buffering in gcov seems duplicite to what modern C library can provide. The patch is a simplification and can provide easier interface for system that don't have a filesystem and would like using GCOV. I'm going to install the patch after 11.1 if there are no objections. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Thanks, Martin gcc/ChangeLog: * gcov-io.c (gcov_write_block): Remove. (gcov_write_words): Likewise. (gcov_read_words): Re-implement using gcov_read_bytes. (gcov_allocate): Remove. (GCOV_BLOCK_SIZE): Likewise. (struct gcov_var): Remove most of the fields. (gcov_position): Implement with ftell. (gcov_rewrite): Remove setting of start and offset fields. (from_file): Re-format. (gcov_open): Remove setbuf call. It should not be needed. (gcov_close): Remove internal buffer handling. (gcov_magic): Use __builtin_bswap32. (gcov_write_counter): Use directly gcov_write_unsigned. (gcov_write_string): Use direct fwrite and do not round to 4 bytes. (gcov_seek): Use directly fseek. (gcov_write_tag): Use gcov_write_unsigned directly. (gcov_write_length): Likewise. (gcov_write_tag_length): Likewise. (gcov_read_bytes): Use directly fread. (gcov_read_unsigned): Use gcov_read_words. (gcov_read_counter): Likewise. (gcov_read_string): Use gcov_read_bytes. * gcov-io.h (GCOV_WORD_SIZE): Adjust to reflect that size is not in bytes, but words (4B). (GCOV_TAG_FUNCTION_LENGTH): Likewise. (GCOV_TAG_ARCS_LENGTH): Likewise. (GCOV_TAG_ARCS_NUM): Likewise. (GCOV_TAG_COUNTER_LENGTH): Likewise. (GCOV_TAG_COUNTER_NUM): Likewise. (GCOV_TAG_SUMMARY_LENGTH): Likewise. libgcc/ChangeLog: * libgcov-driver.c: Fix GNU coding style. --- gcc/gcov-io.c | 282 +--- gcc/gcov-io.h | 17 ++- libgcc/libgcov-driver.c | 6 +- 3 files changed, 76 insertions(+), 229 deletions(-) diff --git a/gcc/gcov-io.c b/gcc/gcov-io.c index 80c9082a649..bd2316dedab 100644 --- a/gcc/gcov-io.c +++ b/gcc/gcov-io.c @@ -27,40 +27,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see /* Routines declared in gcov-io.h. This file should be #included by another source file, after having #included gcov-io.h. */ -#if !IN_GCOV -static void gcov_write_block (unsigned); -static gcov_unsigned_t *gcov_write_words (unsigned); -#endif -static const gcov_unsigned_t *gcov_read_words (unsigned); -#if !IN_LIBGCOV -static void gcov_allocate (unsigned); -#endif - -/* Optimum number of gcov_unsigned_t's read from or written to disk. */ -#define GCOV_BLOCK_SIZE (1 << 10) +static gcov_unsigned_t *gcov_read_words (void *buffer, unsigned); struct gcov_var { FILE *file; - gcov_position_t start; /* Position of first byte of block */ - unsigned offset; /* Read/write position within the block. */ - unsigned length; /* Read limit in the block. */ - unsigned overread; /* Number of words overread. */ int error; /* < 0 overflow, > 0 disk error. */ - int mode;/* < 0 writing, > 0 reading */ + int mode;/* < 0 writing, > 0 reading. */ int endian; /* Swap endianness. */ -#if IN_LIBGCOV - /* Holds one block plus 4 bytes, thus all coverage reads & writes - fit within this buffer and we always can transfer GCOV_BLOCK_SIZE - to and from the disk. libgcov never backtracks and only writes 4 - or 8 byte objects. */ - gcov_unsigned_t buffer[GCOV_BLOCK_SIZE + 1]; -#else - /* Holds a variable length block, as the compiler can write - strings and needs to backtrack. */ - size_t alloc; - gcov_unsigned_t *buffer; -#endif } gcov_var; /* Save the current position in the gcov file. */ @@ -71,8 +45,7 @@ static inline gcov_position_t gcov_position (void) { - gcov_nonruntime_assert (gcov_var.mode
RE: [EXTERNAL] [PATCH] gcov: Use system IO buffering
Thank you for your reply Martin! AUTO_PROFILE_VERSION should also be changed. Then create_gcov can be updated to support both the old format and the new format. Eugene -Original Message- From: Martin Liška Sent: Thursday, June 17, 2021 2:38 AM To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org Subject: Re: [EXTERNAL] [PATCH] gcov: Use system IO buffering On 6/17/21 3:59 AM, Eugene Rozenfeld wrote: > |The commit from this patch > (https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fgit%2F%3Fp%3Dgcc.git%3Ba%3Dcommit%3Bh%3D23eb66d1d46a34cb28c4acbdf8a1deb80a7c5a05&data=04%7C01%7CEugene.Rozenfeld%40microsoft.com%7C508d63026ea84be211cc08d9317395bb%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637595194782996821%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=UG%2B41tXMZ94%2Ff80qCnmq%2BtZsFkLXc9NrdWF8KXwPjnk%3D&reserved=0) > changed the semantics of gcov_read_string and gcov_write_string. Before this > change the string storage was as described in gcov-io.h: "Strings are padded > with 1 to 4 NUL bytes, to bring the length up to a multiple of 4. The number > of 4 bytes is stored, followed by the padded string." After this change the > number before the string indicates the number of bytes (not words) and there > is no padding. Was this file format change intentional? Hello. Thanks for heads up! Yes, the change was intentional and I'm going to update documentation entry in gcov-io.h. > It breaks AutoFDO because create_gcov produces strings in the format > specified in gcov-io.h. Thanks, Eugene Sure, that needs to be adjusted. Martin
RE: [PATCH][pushed] gcov: update documentation entry about string format
Thank you for updating the documentation Martin. The following line can now be removed: padding: | char:0 | char:0 char:0 | char:0 char:0 char:0 Eugene -Original Message- From: Gcc-patches On Behalf Of Martin Liška Sent: Thursday, June 17, 2021 2:40 AM To: gcc-patches@gcc.gnu.org Subject: [EXTERNAL] [PATCH][pushed] gcov: update documentation entry about string format gcc/ChangeLog: * gcov-io.h: Update documentation entry about string format. --- gcc/gcov-io.h | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h index f7584eb9679..ff92afe63df 100644 --- a/gcc/gcov-io.h +++ b/gcc/gcov-io.h @@ -42,15 +42,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see Numbers are recorded in the 32 bit unsigned binary form of the endianness of the machine generating the file. 64 bit numbers are - stored as two 32 bit numbers, the low part first. Strings are - padded with 1 to 4 NUL bytes, to bring the length up to a multiple - of 4. The number of 4 bytes is stored, followed by the padded + stored as two 32 bit numbers, the low part first. + The number of bytes is stored, followed by the string. Zero length and NULL strings are simply stored as a length of zero (they have no trailing NUL or padding). int32: byte3 byte2 byte1 byte0 | byte0 byte1 byte2 byte3 int64: int32:low int32:high - string: int32:0 | int32:length char* char:0 padding + string: int32:0 | int32:length char* char:0 padding: | char:0 | char:0 char:0 | char:0 char:0 char:0 item: int32 | int64 | string -- 2.32.0
[PATCH] Update gen_autofdo_event.py and gcc-auto-profile
gen_autofdo_event.py was stumbling on models with stepping so I updated the script to handle this case similar to the code in https://github.com/andikleen/pmu-tools/blob/c6a5f63aede19def8886d6a8b74d7a55c38ca947/event_download.py The second change was to tolerate cases when the CPU supports PEBS but the perf command with /p fails. This can happen in, e.g., a virtual machine. I regenerated gcc-auto-profile using the updated script. contrib/ChangeLog: * gen_autofdo_event.py: handle stepping, non-working PEBS gcc/ChangeLog: * config/i386/gcc-auto-profile: regenerate --- contrib/gen_autofdo_event.py | 54 ++-- gcc/config/i386/gcc-auto-profile | 41 +++- 2 files changed, 71 insertions(+), 24 deletions(-) diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py index c97460c61c6..1eb6f1d6d85 100755 --- a/contrib/gen_autofdo_event.py +++ b/contrib/gen_autofdo_event.py @@ -46,20 +46,29 @@ args = ap.parse_args() eventmap = collections.defaultdict(list) -def get_cpu_str(): -with open('/proc/cpuinfo', 'r') as c: -vendor, fam, model = None, None, None -for j in c: -n = j.split() -if n[0] == 'vendor_id': -vendor = n[2] -elif n[0] == 'model' and n[1] == ':': -model = int(n[2]) -elif n[0] == 'cpu' and n[1] == 'family': -fam = int(n[3]) -if vendor and fam and model: -return "%s-%d-%X" % (vendor, fam, model), model -return None, None +def get_cpustr(): +cpuinfo = os.getenv("CPUINFO") +if cpuinfo is None: +cpuinfo = '/proc/cpuinfo' +f = open(cpuinfo, 'r') +cpu = [None, None, None, None] +for j in f: +n = j.split() +if n[0] == 'vendor_id': +cpu[0] = n[2] +elif n[0] == 'model' and n[1] == ':': +cpu[2] = int(n[2]) +elif n[0] == 'cpu' and n[1] == 'family': +cpu[1] = int(n[3]) +elif n[0] == 'stepping' and n[1] == ':': +cpu[3] = int(n[2]) +if all(v is not None for v in cpu): +break +# stepping for SKX only +stepping = cpu[0] == "GenuineIntel" and cpu[1] == 6 and cpu[2] == 0x55 +if stepping: +return "%s-%d-%X-%X" % tuple(cpu) +return "%s-%d-%X" % tuple(cpu)[:3] def find_event(eventurl, model): print >>sys.stderr, "Downloading", eventurl @@ -81,7 +90,7 @@ def find_event(eventurl, model): return found if not args.all: -cpu, model = get_cpu_str() +cpu = get_cpu_str() if not cpu: sys.exit("Unknown CPU type") @@ -94,7 +103,8 @@ for j in u: n = j.rstrip().split(',') if len(n) >= 4 and (args.all or n[0] == cpu) and n[3] == "core": if args.all: -vendor, fam, model = n[0].split("-") +components = n[0].split("-") +model = components[2] model = int(model, 16) cpufound += 1 found += find_event(baseurl + n[2], model) @@ -146,7 +156,17 @@ case `egrep -q "^cpu family\s*: 6" /proc/cpuinfo && echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script." exit 1 ;;''' print "esac" -print 'exec perf record -e $E -b "$@"' +print "set -x" +print 'if ! perf record -e $E -b "$@" ; then' +print ' # PEBS may not actually be working even if the processor supports it' +print ' # (e.g., in a virtual machine). Trying to run without /p.' +print ' set +x' +print ' echo >&2 "Retrying without /p."' +print ' E="$(echo "${E}" | sed -e \'s/\/p/\//\')"' +print ' set -x' +print ' exec perf record -e $E -b "$@"' +print ' set +x' +print 'fi' if cpufound == 0 and not args.all: sys.exit('CPU %s not found' % cpu) diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile index 5da5c63cd84..56f64cbff1f 100755 --- a/gcc/config/i386/gcc-auto-profile +++ b/gcc/config/i386/gcc-auto-profile @@ -1,7 +1,7 @@ #!/bin/sh -# profile workload for gcc profile feedback (autofdo) using Linux perf -# auto generated. to regenerate for new CPUs run -# contrib/gen_autofdo_event.py --shell --all in gcc source +# Profile workload for gcc profile feedback (autofdo) using Linux perf. +# Auto generated. To regenerate for new CPUs run +# contrib/gen_autofdo_event.py --script --all in gcc source # usages: # gcc-auto-profile program (profile program and children) @@ -10,7 +10,7 @@ # gcc-auto-profile --kernel -a sleep X (profile kernel) # gcc-auto-profile --all -a sleep X(profile kernel and user space) -# identify branches taken event for CPU +# Identify branches taken event for CPU. # FLAGS=u @@ -37,7 +37,12 @@ case `egrep -q "^cpu family\s*: 6" /proc/cpuinfo && egrep "^model\s*:" /proc/cpuinfo | head -n1` in model*:\ 55|\ model*:\ 77|\ -model*:\ 76) E="cpu/event=0xC4,umask=0xFE/p$FLAGS" ;; +model*:\ 76|\ +model*:\ 92|\ +
[PATCH] Fix count comparison in ipa-cp
The existing comparison was incorrect for non-PRECISE counts (e.g., AFDO): we could end up with a 0 base_count, which could lead to asserts, e.g., in good_cloning_opportunity_p. gcc/ChangeLog: * ipa-cp.cc (ipcp_propagate_stage): Fix profile count comparison. --- gcc/ipa-cp.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc index d2bcd5e5e69..9df8b456759 100644 --- a/gcc/ipa-cp.cc +++ b/gcc/ipa-cp.cc @@ -4225,7 +4225,7 @@ ipcp_propagate_stage (class ipa_topo_info *topo) for (cgraph_edge *cs = node->callees; cs; cs = cs->next_callee) { profile_count count = cs->count.ipa (); - if (!(count > profile_count::zero ())) + if (!count.nonzero_p ()) continue; enum availability avail; -- 2.25.1
[PATCH] Fix autoprofiledbootstrap build
1. Fix gcov version 2. Don't attempt to create an autoprofile file for cc1 since cc1plus (not cc1) is not invoked when building cc1 3. Fix documentation typo Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * c/Make-lang.in: Don't attempt to create an autoprofile file for cc1 * cp/Make-lang.in: Fix gcov version * lto/Make-lang.in: Fix gcov version * doc/install.texi: Fix documentation typo --- gcc/c/Make-lang.in | 15 +-- gcc/cp/Make-lang.in | 2 +- gcc/doc/install.texi | 2 +- gcc/lto/Make-lang.in | 2 +- 4 files changed, 4 insertions(+), 17 deletions(-) diff --git a/gcc/c/Make-lang.in b/gcc/c/Make-lang.in index 9bd9c0ea123..ba33ec03bf0 100644 --- a/gcc/c/Make-lang.in +++ b/gcc/c/Make-lang.in @@ -62,12 +62,6 @@ c_OBJS = $(C_OBJS) cc1-checksum.o c/gccspec.o # Use strict warnings for this front end. c-warn = $(STRICT_WARN) -ifeq ($(if $(wildcard ../stage_current),$(shell cat \ - ../stage_current)),stageautofeedback) -$(C_OBJS): ALL_COMPILERFLAGS += -fauto-profile=cc1.fda -$(C_OBJS): cc1.fda -endif - # compute checksum over all object files and the options # re-use the checksum from the prev-final stage so it passes # the bootstrap comparison and allows comparing of the cc1 binary @@ -88,9 +82,6 @@ cc1$(exeext): $(C_OBJS) cc1-checksum.o $(BACKEND) $(LIBDEPS) cc1-checksum.o $(BACKEND) $(LIBS) $(BACKENDLIBS) @$(call LINK_PROGRESS,$(INDEX.c),end) -cc1.fda: ../stage1-gcc/cc1$(exeext) ../prev-gcc/$(PERF_DATA) - $(CREATE_GCOV) -binary ../stage1-gcc/cc1$(exeext) -gcov cc1.fda -profile ../prev-gcc/$(PERF_DATA) -gcov_version 1 - # # Build hooks: @@ -180,7 +171,6 @@ c.mostlyclean: -rm -f cc1$(exeext) -rm -f c/*$(objext) -rm -f c/*$(coverageexts) - -rm -f cc1.fda c.clean: c.distclean: -rm -f c/config.status c/Makefile @@ -201,7 +191,4 @@ c.stageprofile: stageprofile-start -mv c/*$(objext) stageprofile/c c.stagefeedback: stagefeedback-start -mv c/*$(objext) stagefeedback/c -c.autostageprofile: autostageprofile-start - -mv c/*$(objext) autostageprofile/c -c.autostagefeedback: autostagefeedback-start - -mv c/*$(objext) autostagefeedback/c + diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in index 291835d326e..49e5cd66912 100644 --- a/gcc/cp/Make-lang.in +++ b/gcc/cp/Make-lang.in @@ -178,7 +178,7 @@ endif cp/name-lookup.o: $(srcdir)/cp/std-name-hint.h cc1plus.fda: ../stage1-gcc/cc1plus$(exeext) ../prev-gcc/$(PERF_DATA) - $(CREATE_GCOV) -binary ../stage1-gcc/cc1plus$(exeext) -gcov cc1plus.fda -profile ../prev-gcc/$(PERF_DATA) -gcov_version 1 + $(CREATE_GCOV) -binary ../stage1-gcc/cc1plus$(exeext) -gcov cc1plus.fda -profile ../prev-gcc/$(PERF_DATA) -gcov_version 2 # # Build hooks: diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index c1876f24a84..61a483bc410 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -3059,7 +3059,7 @@ It is recommended to only use GCC for this. On Linux/x86_64 hosts with some restrictions (no virtualization) it is also possible to do autofdo build with @samp{make -autoprofiledback}. This uses Linux perf to sample branches in the +autoprofiledbootstrap}. This uses Linux perf to sample branches in the binary and then rebuild it with feedback derived from the profile. Linux perf and the @code{autofdo} toolkit needs to be installed for this. diff --git a/gcc/lto/Make-lang.in b/gcc/lto/Make-lang.in index a2dcf0dfc12..3ee748489ac 100644 --- a/gcc/lto/Make-lang.in +++ b/gcc/lto/Make-lang.in @@ -106,7 +106,7 @@ $(LTO_DUMP_EXE): $(LTO_DUMP_OBJS) $(BACKEND) $(LIBDEPS) $(lto2.prev) lto/lto-dump.o: $(LTO_OBJS) lto1.fda: ../prev-gcc/lto1$(exeext) ../prev-gcc/$(PERF_DATA) - $(CREATE_GCOV) -binary ../prev-gcc/lto1$(exeext) -gcov lto1.fda -profile ../prev-gcc/$(PERF_DATA) -gcov_version 1 + $(CREATE_GCOV) -binary ../prev-gcc/lto1$(exeext) -gcov lto1.fda -profile ../prev-gcc/$(PERF_DATA) -gcov_version 2 # LTO testing is done as part of C/C++/Fortran etc. testing. check-lto: -- 2.25.1
RE: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build
I took another look at this. We actually collect perf data when building the libraries. So, we have ./prev-gcc/perf.data, ./prev-libcpp/perf.data, ./prev-libiberty/perf.data, etc. But when creating gcov data for -fauto-profile build of cc1plus or cc1 we only use ./prev-gcc/perf.data . So, a better solution would be either having a single perf.data for all builds (gcc and libraries) or merging perf.data files before attempting autostagefeedback. What would you recommend? Thanks, Eugene -Original Message- From: Jeff Law Sent: Tuesday, November 22, 2022 12:01 PM To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org; Andi Kleen Subject: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build [You don't often get email from jeffreya...@gmail.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] On 11/21/22 14:57, Eugene Rozenfeld via Gcc-patches wrote: > 1. Fix gcov version > 2. Don't attempt to create an autoprofile file for cc1 since cc1plus > (not cc1) is not invoked when building cc1 3. Fix documentation typo > > Tested on x86_64-pc-linux-gnu. > > gcc/ChangeLog: > > * c/Make-lang.in: Don't attempt to create an autoprofile file for cc1 > * cp/Make-lang.in: Fix gcov version > * lto/Make-lang.in: Fix gcov version > * doc/install.texi: Fix documentation typo Just to be 100% sure. While the compiler is built with cc1plus, various runtime libraries are still build with the C compiler and thus would use cc1. AFAICT it looks like we don't try to build the runtime libraries to get any data about the behavior of the C compiler. Can you confirm? Assuming that's correct, this is fine for the trunk. Thanks, Jeff
RE: [EXTERNAL] Re: [PATCH] Fix count comparison in ipa-cp
I initially ran into this while reviving autoprofiledbootstrap build. I was able to create a simple reliable test for this bug and captured it in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108000 I also included the test case in the updated patch below. Eugene = The existing comparison was incorrect for non-PRECISE counts (e.g., AFDO): we could end up with a 0 base_count, which could lead to asserts, e.g., in good_cloning_opportunity_p. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: PR ipa/108000 * ipa-cp.cc (ipcp_propagate_stage): Fix profile count comparison gcc/testsuite * gcc.dg/tree-prof/pr108000.c: Regression test --- gcc/ipa-cp.cc | 2 +- gcc/testsuite/gcc.dg/tree-prof/pr108000.c | 93 +++ 2 files changed, 94 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/tree-prof/pr108000.c diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc index d5230c7c5e6..cc031ebed0f 100644 --- a/gcc/ipa-cp.cc +++ b/gcc/ipa-cp.cc @@ -4225,7 +4225,7 @@ ipcp_propagate_stage (class ipa_topo_info *topo) for (cgraph_edge *cs = node->callees; cs; cs = cs->next_callee) { profile_count count = cs->count.ipa (); - if (!(count > profile_count::zero ())) + if (!count.nonzero_p ()) continue; enum availability avail; diff --git a/gcc/testsuite/gcc.dg/tree-prof/pr108000.c b/gcc/testsuite/gcc.dg/tree-prof/pr108000.c new file mode 100644 index 000..c59ea799748 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-prof/pr108000.c @@ -0,0 +1,93 @@ +/* { dg-options "-O2" } */ + +#include + +volatile int flag; +const int array_size = 10; +int* array; +int iterations = 1000; + +#define BAR(num) \ +int __attribute__((noinline)) \ +bar##num (int i, int j) \ +{ \ + if (i == 0) \ +return 2*num - 1; \ + else \ +return 2*num; \ +} + +BAR(1) +BAR(2) +BAR(3) +BAR(4) +BAR(5) +BAR(6) +BAR(7) +BAR(8) +BAR(9) +BAR(10) +BAR(11) +BAR(12) +BAR(13) +BAR(14) +BAR(15) +BAR(16) +BAR(17) +BAR(18) +BAR(19) + +int __attribute__((noinline)) +foo () +{ + switch (flag) + { + case 1: + return bar1 (0, 0); + case 2: + return bar2 (0, 0); + case 3: + return bar3 (0, 0); + case 4: + return bar4 (0, 0); + case 5: + return bar5 (0, 0); + case 6: + return bar6 (0, 0); + case 7: + return bar7 (0, 0); + case 8: + return bar8 (0, 0); + case 9: + return bar9 (0, 0); + case 10: + return bar10 (0, 0); + case 11: + return bar11 (0, 0); + case 12: + return bar12 (0, 0); + case 13: + return bar13 (0, 0); + case 14: + return bar14 (0, 0); + case 15: + return bar15 (0, 0); + case 16: + return bar16 (0, 0); + case 17: + return bar17 (0, 0); + case 18: + return bar18 (0, 0); + default: + return bar19(0, 0); + } +} + +int +main () +{ + flag = 0; + array = calloc(array_size, sizeof(int)); + for (int i = 0, j = 0; i < iterations; ++i, j = (j + 1) % 10) +array[j] = foo (); +} -- 2.25.1 -Original Message- From: Jeff Law Sent: Tuesday, November 22, 2022 12:03 PM To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org Subject: [EXTERNAL] Re: [PATCH] Fix count comparison in ipa-cp [You don't often get email from jeffreya...@gmail.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] On 11/21/22 14:26, Eugene Rozenfeld via Gcc-patches wrote: > The existing comparison was incorrect for non-PRECISE counts (e.g., > AFDO): we could end up with a 0 base_count, which could lead to > asserts, e.g., in good_cloning_opportunity_p. > > gcc/ChangeLog: > > * ipa-cp.cc (ipcp_propagate_stage): Fix profile count comparison. OK. Probably somewhat painful to pull together a reliable test for this, right? Jeff
RE: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build
Ping for https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613974.html Thanks, Eugene -Original Message- From: Eugene Rozenfeld Sent: Tuesday, March 14, 2023 2:21 PM To: Jeff Law ; gcc-patches@gcc.gnu.org; Andi Kleen Subject: RE: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build Hi Jeff, I revived profile_merger tool in http://github.com/google/autofdo and re-worked the patch to merge profiles for compiling the libraries. Please take a look at the attached patch. Thanks, Eugene -Original Message- From: Jeff Law Sent: Tuesday, November 22, 2022 10:16 PM To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org; Andi Kleen Subject: Re: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build [You don't often get email from jeffreya...@gmail.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] On 11/22/22 14:20, Eugene Rozenfeld wrote: > I took another look at this. We actually collect perf data when building the > libraries. So, we have ./prev-gcc/perf.data, ./prev-libcpp/perf.data, > ./prev-libiberty/perf.data, etc. But when creating gcov data for > -fauto-profile build of cc1plus or cc1 we only use ./prev-gcc/perf.data . So, > a better solution would be either having a single perf.data for all builds > (gcc and libraries) or merging perf.data files before attempting > autostagefeedback. What would you recommend? ISTM that if neither approach loses data, then they're functionally equivalent -- meaning that we can select whichever is easier to wire into our build system. A single perf.data might serialize the build. So perhaps separate, then merge right before autostagefeedback. But I'm willing to go with whatever you think is best. Jeff