from:"Eugene Rozenfeld"

[PATCH] Don't set full_profile in auto-profile [PR113765]

2024-03-27 Thread Eugene Rozenfeld

auto-profile currently doesn't guarantee that it will set probabilities
on all edges because of zero basic block counts. Normally those edges
just have probabilities set by the preceding profile_estimate pass but
under -O0 profile_estimate pass doesn't run. The patch removes setting
of full_profile to true in auto-profile.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
* auto-profile.cc (afdo_annotate_cfg): Don't set full_profile to true
---
 gcc/auto-profile.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index e5407d32fbb..de59b94bcb3 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1580,7 +1580,6 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts)
 }
   update_max_bb_count ();
   profile_status_for_fn (cfun) = PROFILE_READ;
-  cfun->cfg->full_profile = true;
   if (flag_value_profile_transformations)
 {
   gimple_value_profile_transformations ();
-- 
2.25.1

RE: [EXTERNAL] [PATCH] skip debug stmts when assigning locus discriminators

2023-11-08 Thread Eugene Rozenfeld

The fix looks good to me. Will this also fix 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107169 ? It was also a bad 
interaction of -gstatement-frontiers and discriminators.

Eugene

-Original Message-
From: Alexandre Oliva 
Sent: Wednesday, November 8, 2023 7:51 AM
To: gcc-patches@gcc.gnu.org
Cc: Eugene Rozenfeld 
Subject: [EXTERNAL] [PATCH] skip debug stmts when assigning locus discriminators

c-c++-common/goacc/kernels-loop-g.c has been failing (compare-debug)
on i686-linux-gnu since r13-3172, because the implementation enabled debug 
stmts to cause discriminators to be assigned differently, and the 
discriminators are printed in the .gkd dumps that -fcompare-debug compares.

This patch prevents debug stmts from affecting the discriminators in nondebug 
stmts, but enables debug stmts to get discriminators just as nondebug stmts 
would if their line numbers match.

I suppose we could arrange for discriminators to be omitted from the 
-fcompare-debug dumps, but keeping discriminators in sync is probably good to 
avoid other potential sources of divergence between debug and nondebug.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and x86_64-.  
Ok to install?

(Eugene, I suppose what's special about this testcase, that may not apply to 
most other uses of assign_discriminators, is that goacc creates new functions 
out of already optimized code.  I think assign_discriminators may not be 
suitable for new functions, with code that isn't exactly pristinely in-order.  
WDYT?)

for  gcc/ChangeLog

* tree-cfg.cc (assign_discriminators): Handle debug stmts.
---
 gcc/tree-cfg.cc |   16 
 1 file changed, 16 insertions(+)

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc index 
40a6f2a3b529f..a30a2de33a106 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -1214,6 +1214,22 @@ assign_discriminators (void)
{
  gimple *stmt = gsi_stmt (gsi);

+ /* Don't allow debug stmts to affect discriminators, but
+allow them to take discriminators when they're on the
+same line as the preceding nondebug stmt.  */
+ if (is_gimple_debug (stmt))
+   {
+ if (curr_locus != UNKNOWN_LOCATION
+ && same_line_p (curr_locus, &curr_locus_e,
+ gimple_location (stmt)))
+   {
+ location_t loc = gimple_location (stmt);
+ location_t dloc = location_with_discriminator (loc,
+curr_discr);
+ gimple_set_location (stmt, dloc);
+   }
+ continue;
+   }
  if (curr_locus == UNKNOWN_LOCATION)
{
  curr_locus = gimple_location (stmt);

--
Alexandre Oliva, happy hackerhttps://fsfla.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity Excluding 
neuro-others for not behaving ""normal"" is *not* inclusive

[PATCH] Fixes for profile count/probability maintenance

2023-09-19 Thread Eugene Rozenfeld

Verifier checks have recently been strengthened to check that
all counts and probabilities are initialized. The checks fired
during autoprofiledbootstrap build and this patch fixes it.

gcc/ChangeLog:

* auto-profile.cc (afdo_calculate_branch_prob): Fix count comparisons
* ipa-utils.cc (ipa_merge_profiles): Guard against zero count when
computing probabilities
* tree-vect-loop-manip.cc (vect_do_peeling): Guard against zero count
when scaling loop profile

Tested on x86_64-pc-linux-gnu.

---
 gcc/auto-profile.cc |  4 ++--
 gcc/ipa-utils.cc| 16 +---
 gcc/tree-vect-loop-manip.cc |  2 +-
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index ff3b763945c..3e61f36c29b 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1434,7 +1434,7 @@ afdo_calculate_branch_prob (bb_set *annotated_bb)
   else
 total_count += AFDO_EINFO (e)->get_count ();
 }
-if (num_unknown_succ == 0 && total_count > profile_count::zero ())
+if (num_unknown_succ == 0 && total_count > profile_count::zero ().afdo ())
   {
FOR_EACH_EDGE (e, ei, bb->succs)
  e->probability
@@ -1571,7 +1571,7 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts)
   DECL_SOURCE_LOCATION (current_function_decl));
   afdo_source_profile->mark_annotated (cfun->function_start_locus);
   afdo_source_profile->mark_annotated (cfun->function_end_locus);
-  if (max_count > profile_count::zero ())
+  if (max_count > profile_count::zero ().afdo ())
 {
   /* Calculate, propagate count and probability information on CFG.  */
   afdo_calculate_branch_prob (&annotated_bb);
diff --git a/gcc/ipa-utils.cc b/gcc/ipa-utils.cc
index 956c6294fd7..3aaf7e595df 100644
--- a/gcc/ipa-utils.cc
+++ b/gcc/ipa-utils.cc
@@ -651,13 +651,15 @@ ipa_merge_profiles (struct cgraph_node *dst,
{
  edge srce = EDGE_SUCC (srcbb, i);
  edge dste = EDGE_SUCC (dstbb, i);
- dste->probability =
-   dste->probability * dstbb->count.ipa ().probability_in
-(dstbb->count.ipa ()
- + srccount.ipa ())
-   + srce->probability * srcbb->count.ipa ().probability_in
-(dstbb->count.ipa ()
- + srccount.ipa ());
+ profile_count total = dstbb->count.ipa () + srccount.ipa ();
+ if (total.nonzero_p ())
+   {
+ dste->probability =
+   dste->probability * dstbb->count.ipa ().probability_in
+   (total)
+   + srce->probability * srcbb->count.ipa ().probability_in
+   (total);
+   }
}
  dstbb->count = dstbb->count.ipa () + srccount.ipa ();
}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 09641901ff1..2608c286e5d 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3335,7 +3335,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
  free (bbs);
  free (original_bbs);
}
- else
+ else if (old_count.nonzero_p ())
scale_loop_profile (epilog, guard_to->count.probability_in 
(old_count), -1);

  /* Only need to handle basic block before epilog loop if it's not
--
2.25.1

[PATCH] Remove .PHONY targets when building .fda files during autoprofiledbootstrap

2023-09-19 Thread Eugene Rozenfeld

These .PHONY targets are always executed and were breaking `make install`
for autoprofiledbootstrap build.

gcc/ChangeLog:

* c/Make-lang.in: Make create_fdas_for_cc1 target not .PHONY
* cp/Make-lang.in: Make create_fdas_for_cc1plus target not .PHONY
* lto/Make-lang.in: Make create_fdas_for_lto1 target not .PHONY

Tested on x86_64-pc-linux-gnu.

---
 gcc/c/Make-lang.in   | 4 ++--
 gcc/cp/Make-lang.in  | 4 ++--
 gcc/lto/Make-lang.in | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/c/Make-lang.in b/gcc/c/Make-lang.in
index 79bc0dfd1cf..3ef8a674971 100644
--- a/gcc/c/Make-lang.in
+++ b/gcc/c/Make-lang.in
@@ -91,8 +91,6 @@ cc1$(exeext): $(C_OBJS) cc1-checksum.o $(BACKEND) $(LIBDEPS)
 components_in_prev = "bfd opcodes binutils fixincludes gas gcc gmp mpfr mpc 
isl gold intl ld libbacktrace libcpp libcody libdecnumber libiberty 
libiberty-linker-plugin libiconv zlib lto-plugin libctf libsframe"
 components_in_prev_target = "libstdc++-v3 libsanitizer libvtv libgcc 
libbacktrace libphobos zlib libgomp libatomic"

-.PHONY: create_fdas_for_cc1
-
 cc1.fda: create_fdas_for_cc1
$(PROFILE_MERGER) $(shell ls -ha cc1_*.fda) --output_file cc1.fda 
-gcov_version 2

@@ -116,6 +114,8 @@ create_fdas_for_cc1: ../stage1-gcc/cc1$(exeext) 
../prev-gcc/$(PERF_DATA)
$(CREATE_GCOV) -binary ../prev-gcc/cc1$(exeext) -gcov 
$$profile_name -profile $$perf_path -gcov_version 2; \  fi; \
done;
+
+   $(STAMP) $@
 #
 # Build hooks:

diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in
index ba5e8766e99..2727fb7f8cc 100644
--- a/gcc/cp/Make-lang.in
+++ b/gcc/cp/Make-lang.in
@@ -189,8 +189,6 @@ cp/name-lookup.o: $(srcdir)/cp/std-name-hint.h
 components_in_prev = "bfd opcodes binutils fixincludes gas gcc gmp mpfr mpc 
isl gold intl ld libbacktrace libcpp libcody libdecnumber libiberty 
libiberty-linker-plugin libiconv zlib lto-plugin libctf libsframe"
 components_in_prev_target = "libstdc++-v3 libsanitizer libvtv libgcc 
libbacktrace libphobos zlib libgomp libatomic"

-.PHONY: create_fdas_for_cc1plus
-
 cc1plus.fda: create_fdas_for_cc1plus
$(PROFILE_MERGER) $(shell ls -ha cc1plus_*.fda) --output_file 
cc1plus.fda -gcov_version 2

@@ -214,6 +212,8 @@ create_fdas_for_cc1plus: ../stage1-gcc/cc1plus$(exeext) 
../prev-gcc/$(PERF_DATA)
$(CREATE_GCOV) -binary ../prev-gcc/cc1plus$(exeext) -gcov 
$$profile_name -profile $$perf_path -gcov_version 2; \
  fi; \
done;
+
+   $(STAMP) $@
 #
 # Build hooks:

diff --git a/gcc/lto/Make-lang.in b/gcc/lto/Make-lang.in
index 98aa9f4cc39..7dc0a9fef42 100644
--- a/gcc/lto/Make-lang.in
+++ b/gcc/lto/Make-lang.in
@@ -108,8 +108,6 @@ lto/lto-dump.o: $(LTO_OBJS)
 components_in_prev = "bfd opcodes binutils fixincludes gas gcc gmp mpfr mpc 
isl gold intl ld libbacktrace libcpp libcody libdecnumber libiberty 
libiberty-linker-plugin libiconv zlib lto-plugin libctf libsframe"
 components_in_prev_target = "libstdc++-v3 libsanitizer libvtv libgcc 
libbacktrace libphobos zlib libgomp libatomic"

-.PHONY: create_fdas_for_lto1
-
 lto1.fda: create_fdas_for_lto1
$(PROFILE_MERGER) $(shell ls -ha lto1_*.fda) --output_file lto1.fda 
-gcov_version 2

@@ -134,6 +132,8 @@ create_fdas_for_lto1: ../stage1-gcc/lto1$(exeext) 
../prev-gcc/$(PERF_DATA)
  fi; \
done;

+   $(STAMP) $@
+
 # LTO testing is done as part of C/C++/Fortran etc. testing.
 check-lto:

--
2.25.1

RE: [EXTERNAL] [PATCH] Enable autofdo bootstrap for lto/fortran

2024-11-08 Thread Eugene Rozenfeld

This line in gcc/fortran/Make-lang.in looks wrong (copy/paste?):

+f95.fda: create_fdas_for_lto1

There are no invocations of $(CREATE_GCOV in gcc/fortran/Make-lang.in so this 
is incomplete.

-Original Message-
From: Andi Kleen  
Sent: Thursday, October 31, 2024 4:19 PM
To: gcc-patches@gcc.gnu.org
Cc: Eugene Rozenfeld ; Andi Kleen 

Subject: [EXTERNAL] [PATCH] Enable autofdo bootstrap for lto/fortran

From: Andi Kleen 

When autofdo bootstrap support was originally implemented there were issues 
with the LTO bootstrap, that is why it wasn't enabled for them. I retested this 
now and it works on x86_64-linux.

Fortran was also missing, not sure why. Also enabled now.

gcc/fortran/ChangeLog:

* Make-lang.in: Enable autofdo.

gcc/lto/ChangeLog:

* Make-lang.in: Enable autofdo.
---
 gcc/fortran/Make-lang.in | 13 +++--
 gcc/lto/Make-lang.in | 14 +-
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/gcc/fortran/Make-lang.in b/gcc/fortran/Make-lang.in index 
0be3c6b654b1..7295118185fd 100644
--- a/gcc/fortran/Make-lang.in
+++ b/gcc/fortran/Make-lang.in
@@ -69,6 +69,15 @@ F95_OBJS = $(F95_PARSER_OBJS) $(FORTRAN_TARGET_OBJS) \
 
 fortran_OBJS = $(F95_OBJS) fortran/gfortranspec.o
 
+ifeq ($(if $(wildcard ../stage_current),$(shell cat \
+  ../stage_current)),stageautofeedback)
+$(fortran_OBJS): CFLAGS += -fauto-profile=f95.fda
+$(fortran_OBJS): f95.fda
+endif
+
+f95.fda: create_fdas_for_lto1
+   $(PROFILE_MERGER) $(shell ls -ha f95_*.fda) --output_file f95.fda 
+-gcov_version 2
+
 #

 # Define the names for selecting gfortran in LANGUAGES.
 fortran: f951$(exeext)
@@ -272,7 +281,7 @@ fortran.install-info: $(DESTDIR)$(infodir)/gfortran.info
 fortran.install-man: $(DESTDIR)$(man1dir)/$(GFORTRAN_INSTALL_NAME)$(man1ext)
 
 $(DESTDIR)$(man1dir)/$(GFORTRAN_INSTALL_NAME)$(man1ext): doc/gfortran.1 \
-   installdirs
+   installdirs f95*.fda
-rm -f $@
-$(INSTALL_DATA) $< $@
-chmod a-x $@
@@ -293,7 +302,7 @@ fortran.uninstall:
 # We just have to delete files specific to us.
 
 fortran.mostlyclean:
-   -rm -f gfortran$(exeext) gfortran-cross$(exeext) f951$(exeext)
+   -rm -f gfortran$(exeext) gfortran-cross$(exeext) f951$(exeext) 
+f95*.fda
-rm -f fortran/*.o
 
 fortran.clean:
diff --git a/gcc/lto/Make-lang.in b/gcc/lto/Make-lang.in index 
b62ddcbe0dc9..4f9f21cdfc9e 100644
--- a/gcc/lto/Make-lang.in
+++ b/gcc/lto/Make-lang.in
@@ -29,15 +29,11 @@ lto_OBJS = $(LTO_OBJS)  LTO_DUMP_OBJS = lto/lto-lang.o 
lto/lto-object.o attribs.o lto/lto-partition.o lto/lto-symtab.o lto/lto-dump.o 
lto/lto-common.o  lto_dump_OBJS = $(LTO_DUMP_OBJS)
 
-# this is only useful in a LTO bootstrap, but this does not work right -# now. 
Should reenable after this is fixed, but only when LTO bootstrap -# is enabled.
-
-#ifeq ($(if $(wildcard ../stage_current),$(shell cat \ -#  
../stage_current)),stageautofeedback)
-#$(LTO_OBJS): CFLAGS += -fauto-profile=lto1.fda
-#$(LTO_OBJS): lto1.fda
-#endif
+ifeq ($(if $(wildcard ../stage_current),$(shell cat \
+  ../stage_current)),stageautofeedback)
+$(LTO_OBJS): CFLAGS += -fauto-profile=lto1.fda
+$(LTO_OBJS): lto1.fda
+endif
 
 # Rules
 
--
2.46.2

RE: [EXTERNAL] Re: [PATCH] PR117350: Keep assembler name for abstract decls for autofdo

2024-11-08 Thread Eugene Rozenfeld

The patch looks good to me.

-Original Message-
From: Richard Biener  
Sent: Wednesday, November 6, 2024 12:01 AM
To: Andi Kleen 
Cc: Jason Merrill ; Andi Kleen ; 
gcc-patches@gcc.gnu.org; Eugene Rozenfeld ; 
pins...@gmail.com; Andi Kleen 
Subject: [EXTERNAL] Re: [PATCH] PR117350: Keep assembler name for abstract 
decls for autofdo

On Tue, Nov 5, 2024 at 6:08 PM Andi Kleen  wrote:
>
> On Tue, Nov 05, 2024 at 09:47:17AM +0100, Richard Biener wrote:
> > On Tue, Nov 5, 2024 at 2:02 AM Jason Merrill  wrote:
> > >
> > > On 10/31/24 4:40 PM, Andi Kleen wrote:
> > > > From: Andi Kleen 
> > > >
> > > > autofdo looks up inline stacks and tries to match them with the 
> > > > profile data using their symbol name. Make sure all decls that 
> > > > can be in a inline stack have a valid assembler name.
> > > >
> > > > This fixes a bootstrap problem with autoprofiledbootstrap and LTO.
> > >
> > > OK in a week if no other comments.
> >
> > Hmm, but DECL_ABSTRACT_P should be only set on entities that generate no 
> > code.
> >
> > How does autofdo look them up?  Are you sure it's the abstract decl 
> > autofdo wants to lookup?  Or is autofdo processing not serializing 
> > the compilation and thus it affects code generation on parts that 
> > have not been processed yet?
>
>
> autofdo tries to match inlines to an inline stack derived from dwarf.
> So if something appears in dwarf it has to be in its stack. For the 
> test case the abstract entity is in the dwarf stack.
>
> For the inside gcc lookup it walks the BLOCK_SUPERCONTEXT links and 
> looks at BLOCK_ABSTRACT_ORIGIN and ignores everything that has unknown 
> location
>
> Maybe there could be some filtering there, but it would need to be on 
> both sides, which would be a version break for the file format.

Ah, OK - I wasn't aware it uses/requires debug info.  It's still a bit odd that 
only the abstract decl has the correct symbol name.  If it's abstract shouldn't 
it have the same symbol name as an actual entry?  Maybe the symbol name noted 
in the debug info is wrong?  I think the only DECL_ABSTRACT_P decls we have are 
from C++ CTOR/DTOR stuff.

In any case, I withdraw my objection.

Thanks for explaining,
Richard.

> -Andi
>

RE: [EXTERNAL] [PATCH] Update gcc-auto-profile / gen_autofdo_event.py

2024-11-08 Thread Eugene Rozenfeld

The patch looks good to me. Thank you for fixing this, Andi.

-Original Message-
From: Andi Kleen  
Sent: Thursday, October 31, 2024 4:37 PM
To: gcc-patches@gcc.gnu.org
Cc: Eugene Rozenfeld ; Andi Kleen 

Subject: [EXTERNAL] [PATCH] Update gcc-auto-profile / gen_autofdo_event.py

From: Andi Kleen 

- Fix warnings with newer python versions about bad escapes by making all the 
python string raw.
- Add a fallback for using the builtin perf event list if the CPU model number 
is unknown.
- Regenerate the shipped gcc-auto-profile with the changes.

contrib/ChangeLog:

* gen_autofdo_event.py: Convert strings to raw.
Add fallback to using builtin perf event list.

gcc/ChangeLog:

* config/i386/gcc-auto-profile: Regenerate.
---
 contrib/gen_autofdo_event.py | 36 ++--
 gcc/config/i386/gcc-auto-profile | 21 ---
 2 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py index 
4c201943b5c7..4e58a5320fff 100755
--- a/contrib/gen_autofdo_event.py
+++ b/contrib/gen_autofdo_event.py
@@ -112,7 +112,7 @@ for j in u:
 u.close()
 
 if args.script:
-print('''#!/bin/sh
+print(r'''#!/bin/sh
 # Profile workload for gcc profile feedback (autofdo) using Linux perf.
 # Auto generated. To regenerate for new CPUs run  # 
contrib/gen_autofdo_event.py --script --all in gcc source @@ -152,22 +152,26 @@ 
case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
 for event, mod in eventmap.items():
 for m in mod[:-1]:
 print("model*:\ %s|\\" % m)
-print('model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event))
-print('''*)
+print(r'model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event))
+print(r'''*)
+if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; 
then
+E=br_inst_retired.near_taken:p
+else
 echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to 
update script."
-   exit 1 ;;''')
-print("esac")
-print("set -x")
-print('if ! perf record -e $E -b "$@" ; then')
-print('  # PEBS may not actually be working even if the processor supports 
it')
-print('  # (e.g., in a virtual machine). Trying to run without /p.')
-print('  set +x')
-print('  echo >&2 "Retrying without /p."')
-print('  E="$(echo "${E}" | sed -e \'s/\/p/\//\')"')
-print('  set -x')
-print('  exec perf record -e $E -b "$@"')
-print(' set +x')
-print('fi')
+ exit 1
+fi ;;''')
+print(r"esac")
+print(r"set -x")
+print(r'if ! perf record -e $E -b "$@" ; then')
+print(r'  # PEBS may not actually be working even if the processor 
supports it')
+print(r'  # (e.g., in a virtual machine). Trying to run without /p.')
+print(r'  set +x')
+print(r'  echo >&2 "Retrying without /p."')
+print(r'  E="$(echo "${E}" | sed -e \'s/\/p/\//\ -e s/:p//)"')
+print(r'  set -x')
+print(r'  exec perf record -e $E -b "$@"')
+print(r' set +x')
+print(r'fi')
 
 if cpufound == 0 and not args.all:
 sys.exit('CPU %s not found' % cpu)
diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
index 04f7d35dcc51..528b34e42400 100755
--- a/gcc/config/i386/gcc-auto-profile
+++ b/gcc/config/i386/gcc-auto-profile
@@ -82,17 +82,24 @@ model*:\ 126|\
 model*:\ 167|\
 model*:\ 140|\
 model*:\ 141|\
-model*:\ 143|\
-model*:\ 207|\
 model*:\ 106|\
-model*:\ 108) E="cpu/event=0xc4,umask=0x20/p$FLAGS" ;;
+model*:\ 108|\
+model*:\ 173|\
+model*:\ 174) E="cpu/event=0xc4,umask=0x20/$FLAGS" ;;
 model*:\ 134|\
 model*:\ 150|\
-model*:\ 156|\
-model*:\ 190) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;;
+model*:\ 156) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;; model*:\ 143|\ 
+model*:\ 207) E="cpu/event=0xc4,umask=0x20/p$FLAGS" ;; model*:\ 190) 
+E="cpu/event=0xc4,umask=0xc0/$FLAGS" ;; model*:\ 190) 
+E="cpu/event=0xc4,umask=0xfe/$FLAGS" ;;
 *)
+if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; 
then
+E=br_inst_retired.near_taken:p
+else
 echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to 
update script."
-   exit 1 ;;
+ exit 1
+fi ;;
 esac
 set -x
 if ! perf record -e $E -b "$@" ; then
@@ -100,7 +107,7 @@ if ! perf record -e $E -b "$@" ; then
   # (e.g., in a virtual machine). Trying to run without /p.
   set +x
   echo >&2 "Retrying without /p."
-  E="$(echo "${E}" | sed -e 's/\/p/\//')"
+  E="$(echo "${E}" | sed -e \'s/\/p/\//\ -e s/:p//)"
   set -x
   exec perf record -e $E -b "$@"
  set +x
--
2.46.2

[PATCH] Fix setting of call graph node AutoFDO count [PR116743]

2025-01-13 Thread Eugene Rozenfeld

We are initializing both the call graph node count and
the entry block count of the function with the head_count value
from the profile.

Count propagation algorithm may refine the entry block count
and we may end up with a case where the call graph node count
is set to 0 but the entry block count is non-zero. That becomes
a problem because we have this code in execute_fixup_cfg:

profile_count num = node->count;
profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
bool scale = num.initialized_p () && !(num == den);

Here if num is 0 but den is not 0, scale becomes true and we
lose the counts in

if (scale)
  bb->count = bb->count.apply_scale (num, den);

This is what happened the issue reported in PR116743
(a 10% regression in MySQL HAMMERDB tests).
3d9e6767939e9658260e2506e81ec32b37cba041 made an improvement in
AutoFDO count propagation, which caused the mismatch between
the call graph node count (zero) and the entry block count (non-zero)
and subsequent loss of counts as described above.

The fix is to update the call graph node count once we've done count 
propagation.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
PR gcov-profile/116743
* auto-profile.c (afdo_annotate_cfg): Fix mismatch between the 
call graph node count
and the entry block count.
---
gcc/auto-profile.cc | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 5d0e8afb9a1..aa4d1634f01 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1538,8 +1538,6 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts)
   if (s == NULL)
 return;
-  cgraph_node::get (current_function_decl)->count
- = profile_count::from_gcov_type (s->head_count ()).afdo ();
   ENTRY_BLOCK_PTR_FOR_FN (cfun)->count
  = profile_count::from_gcov_type (s->head_count ()).afdo ();
   EXIT_BLOCK_PTR_FOR_FN (cfun)->count = profile_count::zero ().afdo ();
@@ -1578,6 +1576,8 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts)
   /* Calculate, propagate count and probability information on CFG.  */
   afdo_calculate_branch_prob (&annotated_bb);
 }
+  cgraph_node::get(current_function_decl)->count
+  = ENTRY_BLOCK_PTR_FOR_FN(cfun)->count;
   update_max_bb_count ();
   profile_status_for_fn (cfun) = PROFILE_READ;
   if (flag_value_profile_transformations)
--
2.34.1

RE: [EXTERNAL] Re: [PATCH] Fix setting of call graph node AutoFDO count [PR116743]

2025-01-15 Thread Eugene Rozenfeld

I committed the patch to trunk. Is it ok to backport to gcc-12, gcc-13, and 
gcc-14?

-Original Message-
From: Richard Biener  
Sent: Monday, January 13, 2025 11:22 PM
To: Eugene Rozenfeld 
Cc: GCC-Patches-ML ; Jan Hubicka ; 
rvmal...@amazon.com
Subject: [EXTERNAL] Re: [PATCH] Fix setting of call graph node AutoFDO count 
[PR116743]

On Mon, Jan 13, 2025 at 10:47 PM Eugene Rozenfeld 
 wrote:
>
> We are initializing both the call graph node count and
>
> the entry block count of the function with the head_count value
>
> from the profile.
>
>
>
> Count propagation algorithm may refine the entry block count
>
> and we may end up with a case where the call graph node count
>
> is set to 0 but the entry block count is non-zero. That becomes
>
> a problem because we have this code in execute_fixup_cfg:
>
>
>
> profile_count num = node->count;
>
> profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
>
> bool scale = num.initialized_p () && !(num == den);
>
>
>
> Here if num is 0 but den is not 0, scale becomes true and we
>
> lose the counts in
>
>
>
> if (scale)
>
>   bb->count = bb->count.apply_scale (num, den);
>
>
>
> This is what happened the issue reported in PR116743
>
> (a 10% regression in MySQL HAMMERDB tests).
>
> 3d9e6767939e9658260e2506e81ec32b37cba041 made an improvement in
>
> AutoFDO count propagation, which caused the mismatch between
>
> the call graph node count (zero) and the entry block count (non-zero)
>
> and subsequent loss of counts as described above.
>
>
>
> The fix is to update the call graph node count once we've done count 
> propagation.
>
>
>
> Tested on x86_64-pc-linux-gnu.

OK.

Thanks,
Richard.

>
>
> gcc/ChangeLog:
>
> PR gcov-profile/116743
>
> * auto-profile.c (afdo_annotate_cfg): Fix mismatch 
> between the call graph node count
>
> and the entry block count.
>
> ---
>
> gcc/auto-profile.cc | 4 ++--
>
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>
>
> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
>
> index 5d0e8afb9a1..aa4d1634f01 100644
>
> --- a/gcc/auto-profile.cc
>
> +++ b/gcc/auto-profile.cc
>
> @@ -1538,8 +1538,6 @@ afdo_annotate_cfg (const stmt_set 
> &promoted_stmts)
>
>if (s == NULL)
>
>  return;
>
> -  cgraph_node::get (current_function_decl)->count
>
> - = profile_count::from_gcov_type (s->head_count ()).afdo ();
>
>ENTRY_BLOCK_PTR_FOR_FN (cfun)->count
>
>   = profile_count::from_gcov_type (s->head_count ()).afdo ();
>
>EXIT_BLOCK_PTR_FOR_FN (cfun)->count = profile_count::zero ().afdo 
> ();
>
> @@ -1578,6 +1576,8 @@ afdo_annotate_cfg (const stmt_set 
> &promoted_stmts)
>
>/* Calculate, propagate count and probability information on 
> CFG.  */
>
>afdo_calculate_branch_prob (&annotated_bb);
>
>  }
>
> +  cgraph_node::get(current_function_decl)->count
>
> +  = ENTRY_BLOCK_PTR_FOR_FN(cfun)->count;
>
>update_max_bb_count ();
>
>profile_status_for_fn (cfun) = PROFILE_READ;
>
>if (flag_value_profile_transformations)
>
> --
>
> 2.34.1
>
>

RE: [EXTERNAL] [PATCH] Update perf auto profile script

2023-06-05 Thread Eugene Rozenfeld via Gcc-patches

Ok for trunk. Thank you for updating this!

Eugene

-Original Message-
From: Gcc-patches  On 
Behalf Of Andi Kleen via Gcc-patches
Sent: Tuesday, May 30, 2023 4:08 AM
To: gcc-patches@gcc.gnu.org
Cc: Andi Kleen 
Subject: [EXTERNAL] [PATCH] Update perf auto profile script

- Fix gen_autofdo_event: The download URL for the Intel Perfmon Event
  list has changed, as well as the JSON format.
  Also it now uses pattern matching to match CPUs. Update the script to support 
all of this.
- Regenerate gcc-auto-profile with the latest published Intel model
  numbers, so it works with recent systems.
- So far it's still broken on hybrid systems
---
 contrib/gen_autofdo_event.py | 7 ---
 gcc/config/i386/gcc-auto-profile | 9 -
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py index 
ac23b83888db..533c706c090b 100755
--- a/contrib/gen_autofdo_event.py
+++ b/contrib/gen_autofdo_event.py
@@ -32,8 +32,9 @@ import json
 import argparse
 import collections
 import os
+import fnmatch

-baseurl = "https://download.01.org/perfmon";
+baseurl = "https://raw.githubusercontent.com/intel/perfmon/main";

 target_events = ('BR_INST_RETIRED.NEAR_TAKEN',
  'BR_INST_EXEC.TAKEN',
@@ -74,7 +75,7 @@ def get_cpustr():
 def find_event(eventurl, model):
 print("Downloading", eventurl, file = sys.stderr)
 u = urllib.request.urlopen(eventurl)
-events = json.loads(u.read())
+events = json.loads(u.read())["Events"]
 u.close()

 found = 0
@@ -102,7 +103,7 @@ found = 0
 cpufound = 0
 for j in u:
 n = j.rstrip().decode().split(',')
-if len(n) >= 4 and (args.all or n[0] == cpu) and n[3] == "core":
+if len(n) >= 4 and (args.all or fnmatch.fnmatch(cpu, n[0])) and n[3] == 
"core":
 components = n[0].split("-")
 model = components[2]
 model = int(model, 16)
diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
index 5ab224b041b9..04f7d35dcc51 100755
--- a/gcc/config/i386/gcc-auto-profile
+++ b/gcc/config/i386/gcc-auto-profile
@@ -43,8 +43,10 @@ model*:\ 47|\
 model*:\ 37|\
 model*:\ 44) E="cpu/event=0x88,umask=0x40/$FLAGS" ;;  model*:\ 55|\
+model*:\ 74|\
 model*:\ 77|\
 model*:\ 76|\
+model*:\ 90|\
 model*:\ 92|\
 model*:\ 95|\
 model*:\ 87|\
@@ -75,14 +77,19 @@ model*:\ 165|\
 model*:\ 166|\
 model*:\ 85|\
 model*:\ 85) E="cpu/event=0xC4,umask=0x20/p$FLAGS" ;;
+model*:\ 125|\
 model*:\ 126|\
+model*:\ 167|\
 model*:\ 140|\
 model*:\ 141|\
 model*:\ 143|\
+model*:\ 207|\
 model*:\ 106|\
 model*:\ 108) E="cpu/event=0xc4,umask=0x20/p$FLAGS" ;;  model*:\ 134|\ 
-model*:\ 150) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;;
+model*:\ 150|\
+model*:\ 156|\
+model*:\ 190) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;;
 *)
 echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to 
update script."
exit 1 ;;
--
2.40.1

RE: [PING][PATCH] Add instruction level discriminator support.

2022-08-03 Thread Eugene Rozenfeld via Gcc-patches

One more ping for this patch 
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596065.html

CC Jason since this changes discriminators emitted in dwarf.

Thanks,

Eugene

-Original Message-
From: Eugene Rozenfeld 
Sent: Monday, June 27, 2022 12:45 PM
To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan Hubicka 

Subject: RE: [PING][PATCH] Add instruction level discriminator support.

Another ping for 
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596065.html .

I got a review from Andi 
(https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596549.html) but I also 
need a review from someone who can approve the changes.

Thanks,

Eugene

-Original Message-
From: Eugene Rozenfeld 
Sent: Friday, June 10, 2022 12:03 PM
To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan Hubicka 

Subject: [PING][PATCH] Add instruction level discriminator support.

Hello,

I'd like to ping this patch: 
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596065.html

Thanks,

Eugene

-Original Message-
From: Gcc-patches  On 
Behalf Of Eugene Rozenfeld via Gcc-patches
Sent: Thursday, June 02, 2022 12:22 AM
To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan Hubicka 

Subject: [EXTERNAL] [PATCH] Add instruction level discriminator support.

This is the first in a series of patches to enable discriminator support in 
AutoFDO.

This patch switches to tracking discriminators per statement/instruction 
instead of per basic block. Tracking per basic block was problematic since not 
all statements in a basic block needed a discriminator and, also, later 
optimizations could move statements between basic blocks making correlation 
during AutoFDO compilation unreliable. Tracking per statement also allows us to 
assign different discriminators to multiple function calls in the same basic 
block. A subsequent patch will add that support.

The idea of this patch is based on commit 
4c311d95cf6d9519c3c20f641cc77af7df491fdf
by Dehao Chen in vendors/google/heads/gcc-4_8 but uses a slightly different 
approach. In Dehao's work special (normally unused) location ids and side 
tables were used to keep track of locations with discriminators. Things have 
changed since then and I don't think we have unused location ids anymore. 
Instead, I made discriminators a part of ad-hoc locations.

The difference from Dehao's work also includes support for discriminator 
reading/writing in lto streaming and in modules.

Tested on x86_64-pc-linux-gnu.


0001-Add-instruction-level-discriminator-support.patch
Description: 0001-Add-instruction-level-discriminator-support.patch

[committed] MAINTAINERS: Add myself as AutoFDO maintainer

2022-08-04 Thread Eugene Rozenfeld via Gcc-patches

ChangeLog:

* MAINTAINERS: Add myself as AutoFDO maintainer.
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1a37f4419b9..02ced0c43aa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -239,6 +239,7 @@ tree-ssaAndrew MacLeod  

 tree browser/unparser  Sebastian Pop   
 scev, data dependence  Sebastian Pop   
 profile feedback   Jan Hubicka 
+AutoFDOEugene Rozenfeld
 reload Ulrich Weigand  
 RTL optimizers Eric Botcazou   
 instruction combiner   Segher Boessenkool  
@@ -603,7 +604,6 @@ Craig Rodrigues 

 Erven Rohou
 Ira Rosen  
 Yvan Roux  
-Eugene Rozenfeld   
 Silvius Rus
 Matthew Sachs  
 Ankur Saini
-- 
2.25.1

RE: [EXTERNAL] [PATCH] contrib: modernize gen_autofdo_event.py

2022-08-05 Thread Eugene Rozenfeld via Gcc-patches

The changes look good to me. Also adding Andi, the author of the script.

Eugene

-Original Message-
From: Gcc-patches  On 
Behalf Of Xi Ruoyao via Gcc-patches
Sent: Sunday, June 26, 2022 11:15 PM
To: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] [PATCH] contrib: modernize gen_autofdo_event.py

Python 2 has been EOL'ed for two years.  egrep has been deprecated for many 
years and the next grep release will start to print warning if it is used.

-E option may be unsupported by some non-POSIX grep implementations, but 
gcc-auto-profile won't work on non-Linux systems anyway.

contrib/ChangeLog:

* gen_autofdo_event.py: Port to Python 3, and use grep -E
instead of egrep.

gcc/ChangeLog:

* config/i386/gcc-auto-profile: Regenerate.
---
 contrib/gen_autofdo_event.py | 80 
 gcc/config/i386/gcc-auto-profile | 31 +++--
 2 files changed, 57 insertions(+), 54 deletions(-)

diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py index 
1eb6f1d6d85..7da2876530d 100755
--- a/contrib/gen_autofdo_event.py
+++ b/contrib/gen_autofdo_event.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python
+#!/usr/bin/python3
 # Generate Intel taken branches Linux perf event script for autofdo profiling.
 
 # Copyright (C) 2016 Free Software Foundation, Inc.
@@ -26,18 +26,19 @@
 # Requires internet (https) access. This may require setting up a proxy  # 
with export https_proxy=...
 #
-import urllib2
+import urllib.request
 import sys
 import json
 import argparse
 import collections
+import os
 
 baseurl = "https://download.01.org/perfmon";
 
-target_events = (u'BR_INST_RETIRED.NEAR_TAKEN',
- u'BR_INST_EXEC.TAKEN',
- u'BR_INST_RETIRED.TAKEN_JCC',
- u'BR_INST_TYPE_RETIRED.COND_TAKEN')
+target_events = ('BR_INST_RETIRED.NEAR_TAKEN',
+ 'BR_INST_EXEC.TAKEN',
+ 'BR_INST_RETIRED.TAKEN_JCC',
+ 'BR_INST_TYPE_RETIRED.COND_TAKEN')
 
 ap = argparse.ArgumentParser()
 ap.add_argument('--all', '-a', help='Print for all CPUs', action='store_true') 
@@ -71,47 +72,46 @@ def get_cpustr():
 return "%s-%d-%X" % tuple(cpu)[:3]
 
 def find_event(eventurl, model):
-print >>sys.stderr, "Downloading", eventurl
-u = urllib2.urlopen(eventurl)
+print("Downloading", eventurl, file = sys.stderr)
+u = urllib.request.urlopen(eventurl)
 events = json.loads(u.read())
 u.close()
 
 found = 0
 for j in events:
-if j[u'EventName'] in target_events:
-event = "cpu/event=%s,umask=%s/" % (j[u'EventCode'], j[u'UMask'])
-if u'PEBS' in j and j[u'PEBS'] > 0:
+if j['EventName'] in target_events:
+event = "cpu/event=%s,umask=%s/" % (j['EventCode'], j['UMask'])
+if 'PEBS' in j and int(j['PEBS']) > 0:
 event += "p"
 if args.script:
 eventmap[event].append(model)
 else:
-print j[u'EventName'], "event for model", model, "is", event
+print(j['EventName'], "event for model", model, "is", 
+ event)
 found += 1
 return found
 
 if not args.all:
-cpu = get_cpu_str()
+cpu = get_cpustr()
 if not cpu:
 sys.exit("Unknown CPU type")
 
 url = baseurl + "/mapfile.csv"
-print >>sys.stderr, "Downloading", url
-u = urllib2.urlopen(url)
+print("Downloading", url, file = sys.stderr) u = 
+urllib.request.urlopen(url)
 found = 0
 cpufound = 0
 for j in u:
-n = j.rstrip().split(',')
+n = j.rstrip().decode().split(',')
 if len(n) >= 4 and (args.all or n[0] == cpu) and n[3] == "core":
-if args.all:
-components = n[0].split("-")
-model = components[2]
-model = int(model, 16)
+components = n[0].split("-")
+model = components[2]
+model = int(model, 16)
 cpufound += 1
 found += find_event(baseurl + n[2], model)
 u.close()
 
 if args.script:
-print '''#!/bin/sh
+print('''#!/bin/sh
 # Profile workload for gcc profile feedback (autofdo) using Linux perf.
 # Auto generated. To regenerate for new CPUs run  # 
contrib/gen_autofdo_event.py --script --all in gcc source @@ -146,27 +146,27 @@ 
if grep -q hypervisor /proc/cpuinfo ; then
   echo >&2 "Warning: branch profiling may not be functional in VMs"
 fi
 
-case `egrep -q "^cpu family\s*: 6" /proc/cpuinfo &&
-  egrep "^model\s*:" /proc/cpuinfo | head -n1` in'''
-for event, mod in eventmap.iteritems():
+case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
+  grep -E "^model\s*:" /proc/cpuinfo | head -n1` in''')
+for event, mod in eventmap.items():
 for m in mod[:-1]:
-print "model*:\ %s|\\" % m
-print 'model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event)
-print '''*)
+print("model*:\ %s|\\" % m)
+print('model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event))
+print('''*)
 echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py

[PATCH] Fix collection and processing of autoprofile data for target libs

2023-06-27 Thread Eugene Rozenfeld via Gcc-patches

cc1, cc1plus, and lto built during STAGEautoprofile need to be built with
debug info since they are used to build target libs. -gtoggle was
turning off debug info for this stage.

create_gcov should be passed prev-gcc/cc1, prev-gcc/cc1plus, and prev-gcc/lto
instead of stage1-gcc/cc1, stage1-gcc/cc1plus, and stage1-gcc/lto when
processing profile data collected while building target libraries.

Tested on x86_64-pc-linux-gnu.

ChangeLog:

* Makefile.in: Remove -gtoggle for STAGEautoprofile
* Makefile.tpl: Remove -gtoggle for STAGEautoprofile

gcc/c/ChangeLog:

* c/Make-lang.in: Pass correct stage cc1 when processing
profile data collected while building target libraries

gcc/cp/ChangeLog:

* cp/Make-lang.in: Pass correct stage cc1plus when processing
profile data collected while building target libraries

gcc/lto/ChangeLog:

* lto/Make-lang.in: Pass correct stage lto when processing
profile data collected while building target libraries
---
 Makefile.in  | 2 +-
 Makefile.tpl | 2 +-
 gcc/c/Make-lang.in   | 4 ++--
 gcc/cp/Make-lang.in  | 4 ++--
 gcc/lto/Make-lang.in | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index b559454cc90..61e5faf550f 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -635,7 +635,7 @@ STAGEtrain_TFLAGS = $(filter-out 
-fchecking=1,$(STAGE3_TFLAGS))
 STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use 
-fprofile-reproducible=parallel-runs
 STAGEfeedback_TFLAGS = $(STAGE4_TFLAGS)
 
-STAGEautoprofile_CFLAGS = $(STAGE2_CFLAGS) -g
+STAGEautoprofile_CFLAGS = $(filter-out -gtoggle,$(STAGE2_CFLAGS)) -g
 STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS)
 
 STAGEautofeedback_CFLAGS = $(STAGE3_CFLAGS)
diff --git a/Makefile.tpl b/Makefile.tpl
index 6bcee3021c9..3a5b7ed3c92 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -558,7 +558,7 @@ STAGEtrain_TFLAGS = $(filter-out 
-fchecking=1,$(STAGE3_TFLAGS))
 STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use 
-fprofile-reproducible=parallel-runs
 STAGEfeedback_TFLAGS = $(STAGE4_TFLAGS)
 
-STAGEautoprofile_CFLAGS = $(STAGE2_CFLAGS) -g
+STAGEautoprofile_CFLAGS = $(filter-out -gtoggle,$(STAGE2_CFLAGS)) -g
 STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS)
 
 STAGEautofeedback_CFLAGS = $(STAGE3_CFLAGS)
diff --git a/gcc/c/Make-lang.in b/gcc/c/Make-lang.in
index 20840aceab6..79bc0dfd1cf 100644
--- a/gcc/c/Make-lang.in
+++ b/gcc/c/Make-lang.in
@@ -113,10 +113,10 @@ create_fdas_for_cc1: ../stage1-gcc/cc1$(exeext) 
../prev-gcc/$(PERF_DATA)
  echo $$perf_path; \
  if [ -f $$perf_path ]; then \
profile_name=cc1_$$component_in_prev_target.fda; \
-   $(CREATE_GCOV) -binary ../stage1-gcc/cc1$(exeext) -gcov 
$$profile_name -profile $$perf_path -gcov_version 2; \
+   $(CREATE_GCOV) -binary ../prev-gcc/cc1$(exeext) -gcov 
$$profile_name -profile $$perf_path -gcov_version 2; \
  fi; \
done;
-#

+#
 # Build hooks:
 
 c.info:
diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in
index c08ee91447e..ba5e8766e99 100644
--- a/gcc/cp/Make-lang.in
+++ b/gcc/cp/Make-lang.in
@@ -211,10 +211,10 @@ create_fdas_for_cc1plus: ../stage1-gcc/cc1plus$(exeext) 
../prev-gcc/$(PERF_DATA)
  echo $$perf_path; \
  if [ -f $$perf_path ]; then \
profile_name=cc1plus_$$component_in_prev_target.fda; \
-   $(CREATE_GCOV) -binary ../stage1-gcc/cc1plus$(exeext) -gcov 
$$profile_name -profile $$perf_path -gcov_version 2; \
+   $(CREATE_GCOV) -binary ../prev-gcc/cc1plus$(exeext) -gcov 
$$profile_name -profile $$perf_path -gcov_version 2; \
  fi; \
done;
-#

+#
 # Build hooks:
 
 c++.all.cross: g++-cross$(exeext)
diff --git a/gcc/lto/Make-lang.in b/gcc/lto/Make-lang.in
index 4f6025100a3..98aa9f4cc39 100644
--- a/gcc/lto/Make-lang.in
+++ b/gcc/lto/Make-lang.in
@@ -130,7 +130,7 @@ create_fdas_for_lto1: ../stage1-gcc/lto1$(exeext) 
../prev-gcc/$(PERF_DATA)
  echo $$perf_path; \
  if [ -f $$perf_path ]; then \
profile_name=lto1_$$component_in_prev_target.fda; \
-   $(CREATE_GCOV) -binary ../stage1-gcc/lto1$(exeext) -gcov 
$$profile_name -profile $$perf_path -gcov_version 2; \
+   $(CREATE_GCOV) -binary ../prev-gcc/lto1$(exeext) -gcov 
$$profile_name -profile $$perf_path -gcov_version 2; \
  fi; \
done;
 
-- 
2.25.1

[PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap

2023-06-29 Thread Eugene Rozenfeld via Gcc-patches

When we collect just user events for autofdo with lbr we get some events where 
branch
sources are kernel addresses and branch targets are user addresses. Without 
kernel MMAP
events create_gcov can't make sense of kernel addresses. Currently create_gcov 
fails if
it can't map at least 95% of events. We sometimes get below this threshold with 
just
user events. The change is to collect both user events and kernel events.

Tested on x86_64-pc-linux-gnu.

ChangeLog:

* Makefile.in: Collect both kernel and user events for autofdo
* Makefile.tpl: Collect both kernel and user events for autofdo

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Collect both kernel and user events for 
autofdo
---
 Makefile.in   | 2 +-
 Makefile.tpl  | 2 +-
 gcc/testsuite/lib/target-supports.exp | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index f19a9db621e..04307ca561b 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -404,7 +404,7 @@ MAKEINFO = @MAKEINFO@
 EXPECT = @EXPECT@
 RUNTEST = @RUNTEST@
 
-AUTO_PROFILE = gcc-auto-profile -c 1000
+AUTO_PROFILE = gcc-auto-profile --all -c 1000
 
 # This just becomes part of the MAKEINFO definition passed down to
 # sub-makes.  It lets flags be given on the command line while still
diff --git a/Makefile.tpl b/Makefile.tpl
index 3a5b7ed3c92..d0fe7e2fb77 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -407,7 +407,7 @@ MAKEINFO = @MAKEINFO@
 EXPECT = @EXPECT@
 RUNTEST = @RUNTEST@
 
-AUTO_PROFILE = gcc-auto-profile -c 1000
+AUTO_PROFILE = gcc-auto-profile --all -c 1000
 
 # This just becomes part of the MAKEINFO definition passed down to
 # sub-makes.  It lets flags be given on the command line while still
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 4d04df2a709..b16853d76df 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -704,7 +704,7 @@ proc check_effective_target_keeps_null_pointer_checks { } {
 # this allows parallelism of 16 and higher of parallel gcc-auto-profile
 proc profopt-perf-wrapper { } {
 global srcdir
-return "$srcdir/../config/i386/gcc-auto-profile -m8 "
+return "$srcdir/../config/i386/gcc-auto-profile --all -m8 "
 }
 
 # Return true if profiling is supported on the target.
-- 
2.25.1

RE: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap

2023-06-30 Thread Eugene Rozenfeld via Gcc-patches

I don't run this with elevated privileges but I set 
/proc/sys/kernel/kptr_restrict to 0. Setting that does require elevated 
privileges.

If that's not acceptable, the only fix I can think of is to make that event 
mapping threshold percentage a parameter to create_gcov and pass something low 
enough. 80% instead of the current threshold of 95% should work, although it's 
a bit fragile.

Eugene

-Original Message-
From: Sam James  
Sent: Friday, June 30, 2023 1:59 AM
To: Richard Biener 
Cc: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for autofdo 
tests and autoprofiledbootstrap

[You don't often get email from s...@gentoo.org. Learn why this is important at 
https://aka.ms/LearnAboutSenderIdentification ]

Richard Biener via Gcc-patches  writes:

> On Fri, Jun 30, 2023 at 7:28 AM Eugene Rozenfeld via Gcc-patches 
>  wrote:
>>
>> When we collect just user events for autofdo with lbr we get some 
>> events where branch sources are kernel addresses and branch targets 
>> are user addresses. Without kernel MMAP events create_gcov can't make 
>> sense of kernel addresses. Currently create_gcov fails if it can't 
>> map at least 95% of events. We sometimes get below this threshold with just 
>> user events. The change is to collect both user events and kernel events.
>
> Does this require elevated privileges?  Can we instead "fix" create_gcov here?

Right, requiring privileges for this is going to be a no-go for a lot of 
builders. In a distro context, for example, it means we can't consider autofdo 
at all.

RE: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap

2023-06-30 Thread Eugene Rozenfeld via Gcc-patches

I also set /proc/sys/kernel/perf_event_paranoid to 1 instead of the default 2.

-Original Message-
From: Gcc-patches  On 
Behalf Of Eugene Rozenfeld via Gcc-patches
Sent: Friday, June 30, 2023 2:44 PM
To: Sam James ; Richard Biener 
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for 
autofdo tests and autoprofiledbootstrap

I don't run this with elevated privileges but I set 
/proc/sys/kernel/kptr_restrict to 0. Setting that does require elevated 
privileges.

If that's not acceptable, the only fix I can think of is to make that event 
mapping threshold percentage a parameter to create_gcov and pass something low 
enough. 80% instead of the current threshold of 95% should work, although it's 
a bit fragile.

Eugene

-Original Message-
From: Sam James 
Sent: Friday, June 30, 2023 1:59 AM
To: Richard Biener 
Cc: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for autofdo 
tests and autoprofiledbootstrap

[You don't often get email from s...@gentoo.org. Learn why this is important at 
https://aka.ms/LearnAboutSenderIdentification ]

Richard Biener via Gcc-patches  writes:

> On Fri, Jun 30, 2023 at 7:28 AM Eugene Rozenfeld via Gcc-patches 
>  wrote:
>>
>> When we collect just user events for autofdo with lbr we get some 
>> events where branch sources are kernel addresses and branch targets 
>> are user addresses. Without kernel MMAP events create_gcov can't make 
>> sense of kernel addresses. Currently create_gcov fails if it can't 
>> map at least 95% of events. We sometimes get below this threshold with just 
>> user events. The change is to collect both user events and kernel events.
>
> Does this require elevated privileges?  Can we instead "fix" create_gcov here?

Right, requiring privileges for this is going to be a no-go for a lot of 
builders. In a distro context, for example, it means we can't consider autofdo 
at all.

RE: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap

2023-07-05 Thread Eugene Rozenfeld via Gcc-patches

There is no warning and perf /uk succeeds when kptr_restrict is set to 1 and 
perf_event_paranoid set to 2. However, create_gcov may fail since it won't be 
able to understand kernel addresses and it requires at least 95% of events to 
be successfully mapped.

If I set both kptr_restrict and perf_event_paranoid to 1, then I do get 
warnings from perf (but it still succeeds and exits with a 0 code). And, of 
course create_gcov will also fail to map some events since it won't understand 
kernel addresses.

WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict and /proc/sys/kernel/perf_event_paranoid.

Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.

Samples in kernel modules won't be resolved at all.

If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.

Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.037 MB 
/home/erozen/gcc1_objdir/gcc/testsuite/gcc/indir-call-prof.perf.data (86 
samples) ]

Eugene

-Original Message-
From: Richard Biener  
Sent: Monday, July 3, 2023 12:47 AM
To: Eugene Rozenfeld 
Cc: Sam James ; gcc-patches@gcc.gnu.org
Subject: Re: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for 
autofdo tests and autoprofiledbootstrap

On Sat, Jul 1, 2023 at 12:05 AM Eugene Rozenfeld 
 wrote:
>
> I also set /proc/sys/kernel/perf_event_paranoid to 1 instead of the default 2.

Does the perf attempt fail when the privileges are not adjusted and you specify 
--all?  I see it adds /uk as flags, when I do

> perf record -e instructions//uk ./a.out

it doesn't complain in any way with

> cat /proc/sys/kernel/kptr_restrict
1
> cat /proc/sys/kernel/perf_event_paranoid
2

so in case the 'kernel' side is simply ignored when profiling there isn't 
permitted/possible then I guess the patch is OK?

Can you confirm?

Thanks,
Richard.

> -Original Message-
> From: Gcc-patches 
>  On Behalf Of 
> Eugene Rozenfeld via Gcc-patches
> Sent: Friday, June 30, 2023 2:44 PM
> To: Sam James ; Richard Biener 
> 
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [EXTERNAL] Re: [PATCH] Collect both user and kernel 
> events for autofdo tests and autoprofiledbootstrap
>
> I don't run this with elevated privileges but I set 
> /proc/sys/kernel/kptr_restrict to 0. Setting that does require elevated 
> privileges.
>
> If that's not acceptable, the only fix I can think of is to make that event 
> mapping threshold percentage a parameter to create_gcov and pass something 
> low enough. 80% instead of the current threshold of 95% should work, although 
> it's a bit fragile.
>
> Eugene
>
> -Original Message-
> From: Sam James 
> Sent: Friday, June 30, 2023 1:59 AM
> To: Richard Biener 
> Cc: Eugene Rozenfeld ; 
> gcc-patches@gcc.gnu.org
> Subject: [EXTERNAL] Re: [PATCH] Collect both user and kernel events 
> for autofdo tests and autoprofiledbootstrap
>
> [You don't often get email from s...@gentoo.org. Learn why this is 
> important at https://aka.ms/LearnAboutSenderIdentification ]
>
> Richard Biener via Gcc-patches  writes:
>
> > On Fri, Jun 30, 2023 at 7:28 AM Eugene Rozenfeld via Gcc-patches 
> >  wrote:
> >>
> >> When we collect just user events for autofdo with lbr we get some 
> >> events where branch sources are kernel addresses and branch targets 
> >> are user addresses. Without kernel MMAP events create_gcov can't 
> >> make sense of kernel addresses. Currently create_gcov fails if it 
> >> can't map at least 95% of events. We sometimes get below this threshold 
> >> with just user events. The change is to collect both user events and 
> >> kernel events.
> >
> > Does this require elevated privileges?  Can we instead "fix" create_gcov 
> > here?
>
> Right, requiring privileges for this is going to be a no-go for a lot of 
> builders. In a distro context, for example, it means we can't consider 
> autofdo at all.

RE: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build

2023-03-14 Thread Eugene Rozenfeld via Gcc-patches

Hi Jeff,

I revived profile_merger tool in http://github.com/google/autofdo and re-worked 
the patch to merge profiles for compiling the libraries.

Please take a look at the attached patch.

Thanks,

Eugene

-Original Message-
From: Jeff Law  
Sent: Tuesday, November 22, 2022 10:16 PM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org; 
Andi Kleen 
Subject: Re: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build

[You don't often get email from jeffreya...@gmail.com. Learn why this is 
important at https://aka.ms/LearnAboutSenderIdentification ]

On 11/22/22 14:20, Eugene Rozenfeld wrote:
> I took another look at this. We actually collect perf data when building the 
> libraries. So, we have ./prev-gcc/perf.data, ./prev-libcpp/perf.data, 
> ./prev-libiberty/perf.data, etc. But when creating gcov data for  
> -fauto-profile build of cc1plus or cc1 we only use ./prev-gcc/perf.data . So, 
> a better solution would be either having a single perf.data for all builds 
> (gcc and libraries) or merging perf.data files before attempting 
> autostagefeedback. What would you recommend?

ISTM that if neither approach loses data, then they're functionally equivalent 
-- meaning that we can select whichever is easier to wire into our build system.

A single perf.data might serialize the build.  So perhaps separate, then merge 
right before autostagefeedback.

But I'm willing to go with whatever you think is best.

Jeff

0001-Fix-autoprofiledbootstrap-build.patch
Description: 0001-Fix-autoprofiledbootstrap-build.patch

RE: [EXTERNAL] Re: [PATCH][PUSHED] Fix cfg maintenance after inlining in AutoFDO

2023-05-09 Thread Eugene Rozenfeld via Gcc-patches

>>  free_dominance_info (CDI_DOMINATORS); @@ -1674,7 +1677,7 @@ 
>> auto_profile (void)
>>  pop_cfun ();
>>}
>>
>> -  return TODO_rebuild_cgraph_edges;
>> +  return 0;

>This change isn't mentioned - was it accidential?

No, it wasn't accidental. There is no reason to return 
TODO_rebuild_cgraph_edges since we called cgraph_edge::rebuild_edges () after 
each early_inline ().
Here is more context before that diff:

todo |= early_inline ();
autofdo::afdo_annotate_cfg (promoted_stmts);
compute_function_frequency ();

/* Local pure-const may imply need to fixup the cfg.  */
todo |= execute_fixup_cfg ();
if (todo & TODO_cleanup_cfg)
  cleanup_tree_cfg ();

free_dominance_info (CDI_DOMINATORS);
free_dominance_info (CDI_POST_DOMINATORS);
cgraph_edge::rebuild_edges ();
compute_fn_summary (cgraph_node::get (current_function_decl), true);
pop_cfun ();
  }

  return 0;

Thanks,

Eugene

-Original Message-
From: Richard Biener  
Sent: Monday, May 8, 2023 11:40 PM
To: Eugene Rozenfeld 
Cc: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: [PATCH][PUSHED] Fix cfg maintenance after inlining in 
AutoFDO

On Tue, May 9, 2023 at 12:27 AM Eugene Rozenfeld via Gcc-patches 
 wrote:
>
> Todo from early_inliner needs to be propagated so that 
> cleanup_tree_cfg () is called if necessary.
>
> This bug was causing an assert in get_loop_body during ipa-sra in 
> autoprofiledbootstrap build since loops weren't fixed up and one of 
> the loops had num_nodes set to 0.
>
> Tested on x86_64-pc-linux-gnu.
>
> gcc/ChangeLog:
>
> * auto-profile.cc (auto_profile): Check todo from early_inline
> to see if cleanup_tree_vfg needs to be called.

_cfg

> (early_inline): Return todo from early_inliner.
> ---
>  gcc/auto-profile.cc | 21 -
>  1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index 
> 360c42c4b89..e3af3555e75 100644
> --- a/gcc/auto-profile.cc
> +++ b/gcc/auto-profile.cc
> @@ -1589,13 +1589,14 @@ afdo_annotate_cfg (const stmt_set 
> &promoted_stmts)
>
>  /* Wrapper function to invoke early inliner.  */
>
> -static void
> +static unsigned int
>  early_inline ()
>  {
>compute_fn_summary (cgraph_node::get (current_function_decl), 
> true);
> -  unsigned todo = early_inliner (cfun);
> +  unsigned int todo = early_inliner (cfun);
>if (todo & TODO_update_ssa_any)
>  update_ssa (TODO_update_ssa);
> +  return todo;
>  }
>
>  /* Use AutoFDO profile to annoate the control flow graph.
> @@ -1651,20 +1652,22 @@ auto_profile (void)
> function before annotation, so the profile inside bar@loc_foo2
> will be useful.  */
>  autofdo::stmt_set promoted_stmts;
> +unsigned int todo = 0;
>  for (int i = 0; i < 10; i++)
>{
> -if (!flag_value_profile_transformations
> -|| !autofdo::afdo_vpt_for_early_inline (&promoted_stmts))
> -  break;
> -early_inline ();
> +   if (!flag_value_profile_transformations
> +   || !autofdo::afdo_vpt_for_early_inline (&promoted_stmts))
> + break;
> +   todo |= early_inline ();
>}
>
> -early_inline ();
> +todo |= early_inline ();
>  autofdo::afdo_annotate_cfg (promoted_stmts);
>  compute_function_frequency ();
>
>  /* Local pure-const may imply need to fixup the cfg.  */
> -if (execute_fixup_cfg () & TODO_cleanup_cfg)
> +todo |= execute_fixup_cfg ();
> +if (todo & TODO_cleanup_cfg)
>cleanup_tree_cfg ();
>
>  free_dominance_info (CDI_DOMINATORS); @@ -1674,7 +1677,7 @@ 
> auto_profile (void)
>  pop_cfun ();
>}
>
> -  return TODO_rebuild_cgraph_edges;
> +  return 0;

This change isn't mentioned - was it accidential?

Otherwise looks OK.

Thanks,
Richard.

>  }
>  } /* namespace autofdo.  */
>
> --
> 2.25.1
>

[PATCH] Fixes and workarounds for warnings during autoprofiledbootstrap build

2023-05-09 Thread Eugene Rozenfeld via Gcc-patches

autoprofiledbootstrap build produces new warnings since inlining decisions
are different from other builds. This patch contains fixes and workarounds
for those warnings.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:

* config/i386/i386-expand.cc (expand_vec_perm_interleave2): Work around
-Wstringop-overflow false positive during autoprofiledbootstrap
* ipa-devirt.cc (debug_tree_odr_name): Fix for -Wformat-overflow
warning during autoprofiledbootstrap
* lra-eliminations.cc (setup_can_eliminate): Work around
-Wmaybe-uninitialized false positive during autoprofiledbootstrap
* opts-common.cc (candidates_list_and_hint): Work around
-Wstringop-overflow false positive during autoprofiledbootstrap
* tree-ssa-ccp.cc (bit_value_unop): Work around -Wmaybe-uninitialized
false positive during autoprofiledbootstrap
* wide-int.h (wi::copy): Work around -Wmaybe-uninitialized false
positive during autoprofiledbootstrap
---
 gcc/config/i386/i386-expand.cc | 11 +++
 gcc/ipa-devirt.cc  |  3 ++-
 gcc/lra-eliminations.cc| 11 +++
 gcc/opts-common.cc |  1 +
 gcc/tree-ssa-ccp.cc| 11 +++
 gcc/wide-int.h | 11 +++
 6 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 634fe61ba79..be9f912775b 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -20419,6 +20419,13 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
 
 static bool expand_vec_perm_interleave3 (struct expand_vec_perm_d *d);
 
+/* Work around -Wstringop-overflow false positive during 
autoprofiledbootstrap.  */
+
+# if GCC_VERSION >= 7001
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wstringop-overflow"
+#endif
+
 /* A subroutine of ix86_expand_vec_perm_const_1.  Try to simplify
a two vector permutation into a single vector permutation by using
an interleave operation to merge the vectors.  */
@@ -20737,6 +20744,10 @@ expand_vec_perm_interleave2 (struct expand_vec_perm_d 
*d)
   return true;
 }
 
+# if GCC_VERSION >= 7001
+#pragma GCC diagnostic pop
+#endif
+
 /* A subroutine of ix86_expand_vec_perm_const_1.  Try to simplify
a single vector cross-lane permutation into vpermq followed
by any of the single insn permutations.  */
diff --git a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc
index 819860258d1..36ea266e834 100644
--- a/gcc/ipa-devirt.cc
+++ b/gcc/ipa-devirt.cc
@@ -4033,7 +4033,8 @@ debug_tree_odr_name (tree type, bool demangle)
   odr = cplus_demangle (odr, opts);
 }
 
-  fprintf (stderr, "%s\n", odr);
+  if (odr != NULL)
+fprintf (stderr, "%s\n", odr);
 }
 
 /* Register ODR enum so we later stream record about its values.  */
diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 4220639..05e2a7e0d68 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -138,6 +138,13 @@ lra_debug_elim_table (void)
   print_elim_table (stderr);
 }
 
+/* Work around -Wmaybe-uninitialized false positive during 
autoprofiledbootstrap.  */
+
+# if GCC_VERSION >= 4007
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
+#endif
+
 /* Setup possibility of elimination in elimination table element EP to
VALUE.  Setup FRAME_POINTER_NEEDED if elimination from frame
pointer to stack pointer is not possible anymore.  */
@@ -152,6 +159,10 @@ setup_can_eliminate (class lra_elim_table *ep, bool value)
 REGNO_POINTER_ALIGN (HARD_FRAME_POINTER_REGNUM) = 0;
 }
 
+# if GCC_VERSION >= 4007
+#pragma GCC diagnostic pop
+#endif
+
 /* Map: eliminable "from" register -> its current elimination,
or NULL if none.  The elimination table may contain more than
one elimination for the same hard register, but this map specifies
diff --git a/gcc/opts-common.cc b/gcc/opts-common.cc
index 23ddcaa3b55..0bb8e34e2b0 100644
--- a/gcc/opts-common.cc
+++ b/gcc/opts-common.cc
@@ -1388,6 +1388,7 @@ candidates_list_and_hint (const char *arg, char *&str,
   p[len] = ' ';
   p += len + 1;
 }
+  gcc_assert(p > str);
   p[-1] = '\0';
   return find_closest_string (arg, &candidates);
 }
diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
index 03a984f2adf..a54e5a90464 100644
--- a/gcc/tree-ssa-ccp.cc
+++ b/gcc/tree-ssa-ccp.cc
@@ -1976,6 +1976,13 @@ bit_value_binop (enum tree_code code, signop sgn, int 
width,
 }
 }
 
+/* Work around -Wmaybe-uninitialized false positive during 
autoprofiledbootstrap.  */
+
+# if GCC_VERSION >= 4007
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
+#endif
+
 /* Return the propagation value when applying the operation CODE to
the value RHS yielding type TYPE.  */
 
@@ -2011,6 +2018,10 @@ bit_value_unop (enum tree_code code, tree type, tree rhs)
   return val;
 }
 
+# if GCC_VERSION >= 4007
+#pragma GCC diagnostic pop
+

RE: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings during autoprofiledbootstrap build

2023-05-10 Thread Eugene Rozenfeld via Gcc-patches



> I cannot find a call to this debug function on trunk.  How exactly did this 
> trigger a warning?

Here is the command during autoprofiledbootstrap build that resulted in a 
warning:

~/gcc1_objdir/gcc$ /home/erozen/gcc1_objdir/./prev-gcc/xg++ 
-B/home/erozen/gcc1_objdir/./prev-gcc/ 
-B/home/erozen/GCC1/x86_64-pc-linux-gnu/bin/ -nostdinc++ 
-B/home/erozen/gcc1_objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs 
-B/home/erozen/gcc1_objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs
  
-I/home/erozen/gcc1_objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu
  -I/home/erozen/gcc1_objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/include  
-I/home/erozen/gcc1/libstdc++-v3/libsupc++ 
-L/home/erozen/gcc1_objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs 
-L/home/erozen/gcc1_objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs
  -fno-PIE -c   -g -O2 -fchecking=1 -DIN_GCC -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Wconditionally-supported 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H 
-fauto-profile=cc1plus.fda -I. -I. -I/home/erozen/gcc1_objdir/../gcc1/gcc 
-I/home/erozen/gcc1_objdir/../gcc1/gcc/. 
-I/home/erozen/gcc1_objdir/../gcc1/gcc/../include  
-I/home/erozen/gcc1_objdir/../gcc1/gcc/../libcpp/include 
-I/home/erozen/gcc1_objdir/../gcc1/gcc/../libcody 
-I/home/erozen/gcc1_objdir/./gmp -I/home/erozen/gcc1/gmp 
-I/home/erozen/gcc1_objdir/./mpfr/src -I/home/erozen/gcc1/mpfr/src 
-I/home/erozen/gcc1/mpc/src  
-I/home/erozen/gcc1_objdir/../gcc1/gcc/../libdecnumber 
-I/home/erozen/gcc1_objdir/../gcc1/gcc/../libdecnumber/bid -I../libdecnumber 
-I/home/erozen/gcc1_objdir/../gcc1/gcc/../libbacktrace 
-I/home/erozen/gcc1_objdir/./isl/include -I/home/erozen/gcc1/isl/include  -o 
ipa-devirt.o -MT ipa-devirt.o -MMD -MP -MF ./.deps/ipa-devirt.TPo 
/home/erozen/gcc1_objdir/../gcc1/gcc/ipa-devirt.cc
/home/erozen/gcc1_objdir/../gcc1/gcc/ipa-devirt.cc: In function 'void 
debug_tree_odr_name(tree, bool)':
/home/erozen/gcc1_objdir/../gcc1/gcc/ipa-devirt.cc:4037:23: error: '%s' 
directive argument is null [-Werror=format-overflow=]
 4037 | fprintf (stderr, "%s\n", odr);


> In any case, IMHO the function should rather print something that makes it 
> clear that an odr name could not be obtained rather than printing nothing.

> I also think that if we want to handle the case, we should do it before also 
> possibly passing odr to demangler.

I'll modify the fix unless we end up suppressing warnings as errors for 
autoprofiledbootstrap as Richard suggested.

Thanks,

Martin

RE: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings during autoprofiledbootstrap build

2023-05-10 Thread Eugene Rozenfeld via Gcc-patches

I'm ok with disabling warnings as errors for autoprofiledbootstrap. What's the 
proper way to do that? Searching for "--disable-werror" I see matches in lib 
configure files but not in gcc files.

Thanks,

Eugene

-Original Message-
From: Richard Biener  
Sent: Tuesday, May 9, 2023 11:40 PM
To: Eugene Rozenfeld 
Cc: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings during 
autoprofiledbootstrap build

On Wed, May 10, 2023 at 3:38 AM Eugene Rozenfeld via Gcc-patches 
 wrote:
>
> autoprofiledbootstrap build produces new warnings since inlining 
> decisions are different from other builds. This patch contains fixes 
> and workarounds for those warnings.
>
> Tested on x86_64-pc-linux-gnu.

Rather than this would it make sense to add --disable-werror to 
autoprofiledbootstrap configs like we do for others?  I also wonder how 
"stable" the afdo bootstrap inlining decisions are, so applying these 
workarounds may not be sustainable?

> gcc/ChangeLog:
>
> * config/i386/i386-expand.cc (expand_vec_perm_interleave2): Work 
> around
> -Wstringop-overflow false positive during autoprofiledbootstrap
> * ipa-devirt.cc (debug_tree_odr_name): Fix for -Wformat-overflow
> warning during autoprofiledbootstrap
> * lra-eliminations.cc (setup_can_eliminate): Work around
> -Wmaybe-uninitialized false positive during autoprofiledbootstrap
> * opts-common.cc (candidates_list_and_hint): Work around
> -Wstringop-overflow false positive during autoprofiledbootstrap
> * tree-ssa-ccp.cc (bit_value_unop): Work around -Wmaybe-uninitialized
> false positive during autoprofiledbootstrap
> * wide-int.h (wi::copy): Work around -Wmaybe-uninitialized false
> positive during autoprofiledbootstrap
> ---
>  gcc/config/i386/i386-expand.cc | 11 +++
>  gcc/ipa-devirt.cc  |  3 ++-
>  gcc/lra-eliminations.cc| 11 +++
>  gcc/opts-common.cc |  1 +
>  gcc/tree-ssa-ccp.cc| 11 +++
>  gcc/wide-int.h | 11 +++
>  6 files changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/i386-expand.cc 
> b/gcc/config/i386/i386-expand.cc index 634fe61ba79..be9f912775b 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -20419,6 +20419,13 @@ expand_vec_perm_pblendv (struct 
> expand_vec_perm_d *d)
>
>  static bool expand_vec_perm_interleave3 (struct expand_vec_perm_d 
> *d);
>
> +/* Work around -Wstringop-overflow false positive during 
> +autoprofiledbootstrap.  */
> +
> +# if GCC_VERSION >= 7001
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wstringop-overflow"
> +#endif
> +
>  /* A subroutine of ix86_expand_vec_perm_const_1.  Try to simplify
> a two vector permutation into a single vector permutation by using
> an interleave operation to merge the vectors.  */ @@ -20737,6 
> +20744,10 @@ expand_vec_perm_interleave2 (struct expand_vec_perm_d *d)
>return true;
>  }
>
> +# if GCC_VERSION >= 7001
> +#pragma GCC diagnostic pop
> +#endif
> +
>  /* A subroutine of ix86_expand_vec_perm_const_1.  Try to simplify
> a single vector cross-lane permutation into vpermq followed
> by any of the single insn permutations.  */ diff --git 
> a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc index 819860258d1..36ea266e834 
> 100644
> --- a/gcc/ipa-devirt.cc
> +++ b/gcc/ipa-devirt.cc
> @@ -4033,7 +4033,8 @@ debug_tree_odr_name (tree type, bool demangle)
>odr = cplus_demangle (odr, opts);
>  }
>
> -  fprintf (stderr, "%s\n", odr);
> +  if (odr != NULL)
> +fprintf (stderr, "%s\n", odr);
>  }
>
>  /* Register ODR enum so we later stream record about its values.  */ 
> diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc index 
> 4220639..05e2a7e0d68 100644
> --- a/gcc/lra-eliminations.cc
> +++ b/gcc/lra-eliminations.cc
> @@ -138,6 +138,13 @@ lra_debug_elim_table (void)
>print_elim_table (stderr);
>  }
>
> +/* Work around -Wmaybe-uninitialized false positive during 
> +autoprofiledbootstrap.  */
> +
> +# if GCC_VERSION >= 4007
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
> +#endif
> +
>  /* Setup possibility of elimination in elimination table element EP to
> VALUE.  Setup FRAME_POINTER_NEEDED if elimination from frame
> pointer to stack pointer is not possible anymore.  */ @@ -152,6 
> +159,10 @@ setup_can_eliminate (class lra_elim_table *ep, bool value)
>  REGNO_POINTER_ALIGN (HARD_FRAME_POINTER_REGNUM) = 0;  }
>
&g

RE: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings during autoprofiledbootstrap build

2023-05-12 Thread Eugene Rozenfeld via Gcc-patches

Thank you, Richard. I went with your suggestion. New patch:


[PATCH] Disable warnings as errors for STAGEautofeedback.

Compilation during STAGEautofeedback produces additional warnings
since inlining decisions with -fauto-profile are different from
other builds.

This patches disables warnings as errors for STAGEautofeedback.

Tested on x86_64-pc-linux-gnu.

ChangeLog:

* Makefile.in: Disable warnings as errors for STAGEautofeedback
---
 Makefile.in | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index 33f3c862557..4c14c73ea61 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -590,8 +590,7 @@ STAGEautofeedback_CXXFLAGS = $(CXXFLAGS)
 STAGEautofeedback_CXXFLAGS = $(STAGEautofeedback_CFLAGS)
 @endif target-libstdc++-v3-bootstrap
 STAGEautofeedback_TFLAGS = $(STAGE_TFLAGS)
-STAGEautofeedback_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
-
+STAGEautofeedback_CONFIGURE_FLAGS = $(filter-out 
--enable-werror-always,$(STAGE_CONFIGURE_FLAGS))
 
 # By default, C and C++ are the only stage1 languages, because they are the
 # only ones we require to build with the bootstrap compiler, and also the
-- 
2.25.1

Thanks,

Eugene

-Original Message-
From: Richard Biener  
Sent: Thursday, May 11, 2023 1:58 AM
To: Eugene Rozenfeld 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings during 
autoprofiledbootstrap build

On Thu, May 11, 2023 at 4:23 AM Eugene Rozenfeld 
 wrote:
>
> I'm ok with disabling warnings as errors for autoprofiledbootstrap. What's 
> the proper way to do that? Searching for "--disable-werror" I see matches in 
> lib configure files but not in gcc files.

We have --with-build-config selecting things like bootstrap-O3 and configure 
then disables werror by default if the build config is anything other than the 
default or bootstrap-debug.

Of course profiledbootstrap and autoprofiledbootstrap are not build configs but 
make targets - that makes it more difficult (or impossible) to use the 
--disable-werror machinery here.

There is

STAGE_CONFIGURE_FLAGS=@stage2_werror_flag@

so it might be possible to filter out --enable-werror-always from 
STAGEautofeedback_CONFIGURE_FLAGS?

Richard.

> Thanks,
>
> Eugene
>
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, May 9, 2023 11:40 PM
> To: Eugene Rozenfeld 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings 
> during autoprofiledbootstrap build
>
> On Wed, May 10, 2023 at 3:38 AM Eugene Rozenfeld via Gcc-patches 
>  wrote:
> >
> > autoprofiledbootstrap build produces new warnings since inlining 
> > decisions are different from other builds. This patch contains fixes 
> > and workarounds for those warnings.
> >
> > Tested on x86_64-pc-linux-gnu.
>
> Rather than this would it make sense to add --disable-werror to 
> autoprofiledbootstrap configs like we do for others?  I also wonder how 
> "stable" the afdo bootstrap inlining decisions are, so applying these 
> workarounds may not be sustainable?
>
> > gcc/ChangeLog:
> >
> > * config/i386/i386-expand.cc (expand_vec_perm_interleave2): Work 
> > around
> > -Wstringop-overflow false positive during autoprofiledbootstrap
> > * ipa-devirt.cc (debug_tree_odr_name): Fix for -Wformat-overflow
> > warning during autoprofiledbootstrap
> > * lra-eliminations.cc (setup_can_eliminate): Work around
> > -Wmaybe-uninitialized false positive during autoprofiledbootstrap
> > * opts-common.cc (candidates_list_and_hint): Work around
> > -Wstringop-overflow false positive during autoprofiledbootstrap
> > * tree-ssa-ccp.cc (bit_value_unop): Work around 
> > -Wmaybe-uninitialized
> > false positive during autoprofiledbootstrap
> > * wide-int.h (wi::copy): Work around -Wmaybe-uninitialized false
> > positive during autoprofiledbootstrap
> > ---
> >  gcc/config/i386/i386-expand.cc | 11 +++
> >  gcc/ipa-devirt.cc  |  3 ++-
> >  gcc/lra-eliminations.cc| 11 +++
> >  gcc/opts-common.cc |  1 +
> >  gcc/tree-ssa-ccp.cc| 11 +++
> >  gcc/wide-int.h | 11 +++
> >  6 files changed, 47 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/i386/i386-expand.cc 
> > b/gcc/config/i386/i386-expand.cc index 634fe61ba79..be9f912775b 
> > 100644
> > --- a/gcc/config/i386/i386-expand.cc
> > +++ b/gcc/config/i386/i386-expand.cc
> > @@ -20419,6 +20419,13 @@ expand_vec_perm_pblendv (struct 
> > expand_vec_perm_d *d)
> >
> >  static bool expand_vec_perm_interlea

RE: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings during autoprofiledbootstrap build

2023-05-17 Thread Eugene Rozenfeld via Gcc-patches

Thank you for catching this, Thomas!
I modified Makefile.tmp and regenerated Makefile.in.

Here is the patch I pushed:

[PATCH] Disable warnings as errors for STAGEautofeedback.

Compilation during STAGEautofeedback produces additional warnings
since inlining decisions with -fauto-profile are different from
other builds.

This patches disables warnings as errors for STAGEautofeedback.

Tested on x86_64-pc-linux-gnu.

ChangeLog:

* Makefile.tpl: Disable warnings as errors for STAGEautofeedback
* Makefile.in: Regenerate
---
 Makefile.in  | 8 +---
 Makefile.tpl | 3 +++
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index a89bac02351..b559454cc90 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -590,9 +590,8 @@ STAGEautofeedback_CXXFLAGS = $(CXXFLAGS)
 STAGEautofeedback_CXXFLAGS = $(STAGEautofeedback_CFLAGS)
 @endif target-libstdc++-v3-bootstrap
 STAGEautofeedback_TFLAGS = $(STAGE_TFLAGS)
-# Disable warnings as errors since inlining decisions with -fauto-profile
-# may result in additional warnings.
-STAGEautofeedback_CONFIGURE_FLAGS = $(filter-out 
--enable-werror-always,$(STAGE_CONFIGURE_FLAGS))
+STAGEautofeedback_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
+
 
 # By default, C and C++ are the only stage1 languages, because they are the
 # only ones we require to build with the bootstrap compiler, and also the
@@ -641,6 +640,9 @@ STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS)
 
 STAGEautofeedback_CFLAGS = $(STAGE3_CFLAGS)
 STAGEautofeedback_TFLAGS = $(STAGE3_TFLAGS)
+# Disable warnings as errors since inlining decisions with -fauto-profile
+# may result in additional warnings.
+STAGEautofeedback_CONFIGURE_FLAGS = $(filter-out 
--enable-werror-always,$(STAGE_CONFIGURE_FLAGS))
 
 do-compare = @do_compare@
 do-compare3 = $(do-compare)
diff --git a/Makefile.tpl b/Makefile.tpl
index 9d8ef9cf678..6bcee3021c9 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -563,6 +563,9 @@ STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS)
 
 STAGEautofeedback_CFLAGS = $(STAGE3_CFLAGS)
 STAGEautofeedback_TFLAGS = $(STAGE3_TFLAGS)
+# Disable warnings as errors since inlining decisions with -fauto-profile
+# may result in additional warnings.
+STAGEautofeedback_CONFIGURE_FLAGS = $(filter-out 
--enable-werror-always,$(STAGE_CONFIGURE_FLAGS))
 
 do-compare = @do_compare@
 do-compare3 = $(do-compare)
-- 
2.25.1

Eugene
-Original Message-
From: Thomas Schwinge  
Sent: Wednesday, May 17, 2023 12:05 AM
To: Richard Biener ; Eugene Rozenfeld 

Cc: gcc-patches@gcc.gnu.org
Subject: Re: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings during 
autoprofiledbootstrap build

Hi!

On 2023-05-15T09:30:35+0200, Richard Biener via Gcc-patches 
 wrote:
> On Fri, May 12, 2023 at 10:35 PM Eugene Rozenfeld 
>  wrote:
>>
>> Thank you, Richard. I went with your suggestion. New patch:
>>
>>
>> [PATCH] Disable warnings as errors for STAGEautofeedback.
>>
>> Compilation during STAGEautofeedback produces additional warnings 
>> since inlining decisions with -fauto-profile are different from other 
>> builds.
>>
>> This patches disables warnings as errors for STAGEautofeedback.
>
> Can you add a comment before the filtering?
>
> Otherwise looks good to me - please leave others 24h to comment before 
> you commit.

>> --- a/Makefile.in
>> +++ b/Makefile.in
>> @@ -590,8 +590,7 @@ STAGEautofeedback_CXXFLAGS = $(CXXFLAGS)  
>> STAGEautofeedback_CXXFLAGS = $(STAGEautofeedback_CFLAGS)  @endif 
>> target-libstdc++-v3-bootstrap  STAGEautofeedback_TFLAGS = 
>> $(STAGE_TFLAGS) -STAGEautofeedback_CONFIGURE_FLAGS = 
>> $(STAGE_CONFIGURE_FLAGS)
>> -
>> +STAGEautofeedback_CONFIGURE_FLAGS = $(filter-out 
>> +--enable-werror-always,$(STAGE_CONFIGURE_FLAGS))

That's not how it works; the next person running 'autogen Makefile.def'
to regenerate 'Makefile.in' is going to undo those changes.  Instead, modify 
'Makefile.def', 'Makefile.tpl', and then 'autogen Makefile.def'.


Grüße
 Thomas


>> -Original Message-
>> From: Richard Biener 
>> Sent: Thursday, May 11, 2023 1:58 AM
>> To: Eugene Rozenfeld 
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [EXTERNAL] Re: [PATCH] Fixes and workarounds for 
>> warnings during autoprofiledbootstrap build
>>
>> On Thu, May 11, 2023 at 4:23 AM Eugene Rozenfeld 
>>  wrote:
>> >
>> > I'm ok with disabling warnings as errors for autoprofiledbootstrap. What's 
>> > the proper way to do that? Searching for "--disable-werror" I see matches 
>> > in lib configure files but not in gcc files.
>>
>> We have --with-build-config selecting things like bootstrap-O3 and configure 
>> then disables werror by default if the build config is anything other than

RE: [EXTERNAL] Re: [PATCH] gcov-profile/71672 Fix indirect call inlining with AutoFDO

2021-08-03 Thread Eugene Rozenfeld via Gcc-patches

Thank you for the reviews, Andy and Richard.

I split up the patch into 4 commits and pushed to trunk.

Eugene

-Original Message-
From: Richard Biener  
Sent: Monday, August 2, 2021 2:57 AM
To: Eugene Rozenfeld 
Cc: gcc-patches@gcc.gnu.org; mli...@suse.cz; Andi Kleen 
Subject: [EXTERNAL] Re: [PATCH] gcov-profile/71672 Fix indirect call inlining 
with AutoFDO

On Fri, Jul 30, 2021 at 9:09 AM Eugene Rozenfeld via Gcc-patches 
 wrote:
>
> This patch has the following changes:
>
> 1. The main fix is in auto-profile.c: the histogram value for
>indirect calls was incorrectly set up. That is fixed now.
>
> 2. Several tests now have -fdump-ipa-afdo-optimized instead of -fdump-ipa-afdo
>in dg-options so that the expected output can be found.
>
> 3. I increased the number of iterations in several tests so that perf can have
>enough sampling events.
>
> 4. indir-call-prof-2.c has -fno-early-inlining but AutoFDO can't work without
>early inlining (it needs to match the inlining of the profiled binary).
>I changed profopt.exp to always pass -fearly-inlining for AutoFDO.
>With that the indirect call inlining in indir-call-prof-2.c happens in the 
> early inliner
>so I changed the dg-final-use-autofdo.
>
> 5. create_gcov tool doesn't currently support dwarf 5 so I made a change in 
> profopt.exp
>to pass -gdwarf-4 when compiling the binary to profile.
>
> 6. I updated the invocation of create_gcov in profopt.exp to pass 
> -gcov_version=2.
>I recently made a change to create_gcov to support version 2:
>
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> ub.com%2Fgoogle%2Fautofdo%2Fpull%2F117&data=04%7C01%7CEugene.Rozen
> feld%40microsoft.com%7C92927d4029754d0d6b4708d9559be06d%7C72f988bf86f1
> 41af91ab2d7cd011db47%7C1%7C0%7C637634950245832767%7CUnknown%7CTWFpbGZs
> b3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D
> %7C1000&sdata=Ex1OpS0gt9dpsBVIK71k7hvjJbfIkN%2BlRr%2BYD86%2FqEs%3D
> &reserved=0
>
> 7. I removed useless -o perf.data from the invocation of gcc-auto-profile in
>target-supports.exp.
>
> With these changes the tests checking indirect call inlining in gcc.dg 
> and g++.dg are passing.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
> PR gcov-profile/71672
> * auto-profile.c (afdo_indirect_call): Fix the setup of the 
> historgram value for indirect calls.
>
> gcc/testsuite/ChangeLog:
> PR gcov-profile/71672
> * g++.dg/tree-prof/indir-call-prof.C: Fix options, increase the 
> number of iterations.
> * g++.dg/tree-prof/morefunc.C: Fix options, increase the number of 
> iterations.
> * g++.dg/tree-prof/reorder.C: Fix options, increase the number of 
> iterations.
> * gcc.dg/tree-prof/indir-call-prof-2.c: Fix options, fix 
> dg-final-use-autofdo, increase the number of iterations.
> * gcc.dg/tree-prof/indir-call-prof.c: Fix options.
> * lib/profopt.exp: Pass gdwarf-4 when compiling binary to profile; 
> pass -fearly-inlining when compiling with AutoFDO; pass -gcov_version=2 to 
> create_gcov.
> * lib/target-supports.exp: Remove unnecessary -o perf.data passed to 
> gcc-auto-profile.
> ---
>  gcc/auto-profile.c | 13 +
>  gcc/testsuite/g++.dg/tree-prof/indir-call-prof.C   |  4 ++--
>  gcc/testsuite/g++.dg/tree-prof/morefunc.C  |  7 ---
>  gcc/testsuite/g++.dg/tree-prof/reorder.C   |  6 +++---
>  gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c |  8 
>  gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c   |  2 +-
>  gcc/testsuite/lib/profopt.exp  |  6 +++---
>  gcc/testsuite/lib/target-supports.exp  |  2 +-
>  8 files changed, 27 insertions(+), 21 deletions(-)
>
> diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c index 
> b23b82b2df4..4c1fc6b536b 100644
> --- a/gcc/auto-profile.c
> +++ b/gcc/auto-profile.c
> @@ -1009,13 +1009,18 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, 
> const icall_target_map &map,
>
>histogram_value hist = gimple_alloc_histogram_value (
>cfun, HIST_TYPE_INDIR_CALL, stmt, callee);
> -  hist->n_counters = 3;
> +  hist->n_counters = 4;
>hist->hvalue.counters = XNEWVEC (gcov_type, hist->n_counters);
>gimple_add_histogram_value (cfun, stmt, hist);
>
> -  hist->hvalue.counters[0] = direct_call->profile_id;
> -  hist->hvalue.counters[1] = max_iter->second;
> -  hist->hvalue.counters[2] = total;
> +  // Total counter
> +  hist->hvalue.counters[0] = total;
> +  // Number of value/counter pairs
> +  hist->hvalue.counters[1] = 1;
> +  // Value
> +  hist->hvalue.

[PATCH][PUSHED] Fix for an AutoFDO test.

2022-09-16 Thread Eugene Rozenfeld via Gcc-patches

After 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=c17975d81aaed49ff759c20c68b31304a6953d58
the expected inlining in indir-call-prof-2.c test happens during afdo phase 
instead of einline.
This patch adjusts the test accordingly.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/indir-call-prof-2.c: Fix dg-final-use-autofdo.
---
 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c 
b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c
index 594c3f34d57..1d64d9f3f62 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -fno-early-inlining -fdump-ipa-profile-optimized 
-fdump-tree-einline-optimized" } */
+/* { dg-options "-O2 -fno-early-inlining -fdump-ipa-profile-optimized 
-fdump-ipa-afdo-optimized" } */
 volatile int one;
 static int
 add1 (int val)
@@ -31,5 +31,5 @@ main (void)
 }
 /* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* 
add1 .will resolve by ipa-profile" "profile"} } */
 /* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* 
sub1 .will resolve by ipa-profile" "profile"} } */
-/* { dg-final-use-autofdo { scan-tree-dump "Inlining add1/1 into main/4." 
"einline"} } */
-/* { dg-final-use-autofdo { scan-tree-dump "Inlining sub1/2 into main/4." 
"einline"} } */
+/* { dg-final-use-autofdo { scan-ipa-dump "Inlining add1/1 into main/4." 
"afdo"} } */
+/* { dg-final-use-autofdo { scan-ipa-dump "Inlining sub1/2 into main/4." 
"afdo"} } */
--
2.25.1

RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator support.

2022-09-19 Thread Eugene Rozenfeld via Gcc-patches

Hi Jason,

Do you have any more feedback for this patch?

Thanks,

Eugene

-Original Message-
From: Eugene Rozenfeld 
Sent: Thursday, September 08, 2022 5:46 PM
To: Jason Merrill ; gcc-patches@gcc.gnu.org
Cc: Andi Kleen ; Jan Hubicka 
Subject: RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator 
support.

Jason,

Thank for your suggestion. The patch is updated (attached).

Thanks,

Eugene

-Original Message-
From: Jason Merrill 
Sent: Thursday, September 08, 2022 10:26 AM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org
Cc: Andi Kleen ; Jan Hubicka 
Subject: Re: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator 
support.

On 9/1/22 18:22, Eugene Rozenfeld wrote:
> Jason,
> 
> I made another small change in addressing your feedback (attached).
> 
> Thanks,
> 
> Eugene
> 
> -Original Message-
> From: Gcc-patches
>  On Behalf Of 
> Eugene Rozenfeld via Gcc-patches
> Sent: Thursday, September 01, 2022 1:49 PM
> To: Jason Merrill ; gcc-patches@gcc.gnu.org
> Cc: Andi Kleen ; Jan Hubicka 
> Subject: RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator 
> support.
> 
> Jason,
> 
> Thank you for your review. You are correct, we only need to check 
> has_discriminator for the statement that's on the same line.
> I updated the patch (attached).
> 
> Thanks,
> 
> Eugene
> 
> -Original Message-
> From: Jason Merrill 
> Sent: Thursday, August 18, 2022 6:55 PM
> To: Eugene Rozenfeld ; 
> gcc-patches@gcc.gnu.org
> Cc: Andi Kleen ; Jan Hubicka 
> Subject: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator 
> support.
> 
> On 8/3/22 17:25, Eugene Rozenfeld wrote:
>> One more ping for this patch
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.
>> gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=
>> 0
>> 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da8
>> 1
>> 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C63796470932569195
>> 1
>> %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6
>> I
>> k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2d
>> T
>> DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0
>>
>> CC Jason since this changes discriminators emitted in dwarf.
>>
>> Thanks,
>>
>> Eugene
>>
>> -Original Message-
>> From: Eugene Rozenfeld
>> Sent: Monday, June 27, 2022 12:45 PM
>> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan 
>> Hubicka 
>> Subject: RE: [PING][PATCH] Add instruction level discriminator support.
>>
>> Another ping for 
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=05%7C01%7Ceugene.rozenfeld%40microsoft.com%7Cc7c9fab519184eee0bb808da91bf2fea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637982547579085499%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Kj3YWJrDqjX%2B0Ml3At5P8NRWc1Xs6mbI%2F1vCk9%2BLaQs%3D&reserved=0
>>  .
>>
>> I got a review from Andi 
>> (https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596549.html&data=05%7C01%7Ceugene.rozenfeld%40microsoft.com%7Cc7c9fab519184eee0bb808da91bf2fea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637982547579085499%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=NBzFtD0mH7EYKxsVWfqZgSpQmUG18SIt01XRYUlwwW4%3D&reserved=0)
>>  but I also need a review from someone who can approve the changes.
>>
>> Thanks,
>>
>> Eugene
>>
>> -Original Message-
>> From: Eugene Rozenfeld
>> Sent: Friday, June 10, 2022 12:03 PM
>> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan 
>> Hubicka 
>> Subject: [PING][PATCH] Add instruction level discriminator support.
>>
>> Hello,
>>
>> I'd like to ping this patch:
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.
>> gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=
>> 0
>> 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da8
>> 1
>> 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C63796470932569195
>> 1
>> %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6
>> I
>> k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2d
>> T
>> DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0
>>
>> Thanks,
>>
>> Eugene
>>
>> -Orig

[PATCH] Fix profile count comparison.

2022-09-23 Thread Eugene Rozenfeld via Gcc-patches

The comparison was incorrect when the counts weren't PRECISE.
For example, crossmodule-indir-call-topn-1.c was failing
with AutoFDO: when count_sum is 0 with quality AFDO,
count_sum > profile_count::zero() evaluates to true. Taking that
branch then leads to an assert in the call to to_sreal().

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:

* ipa-cp.cc (good_cloning_opportunity_p): Fix profile count comparison.
---
 gcc/ipa-cp.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 543a9334e2c..66bba71c068 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -3338,9 +3338,9 @@ good_cloning_opportunity_p (struct cgraph_node *node, 
sreal time_benefit,

   ipa_node_params *info = ipa_node_params_sum->get (node);
   int eval_threshold = opt_for_fn (node->decl, param_ipa_cp_eval_threshold);
-  if (count_sum > profile_count::zero ())
+  if (count_sum.nonzero_p ())
 {
-  gcc_assert (base_count > profile_count::zero ());
+  gcc_assert (base_count.nonzero_p ());
   sreal factor = count_sum.probability_in (base_count).to_sreal ();
   sreal evaluation = (time_benefit * factor) / size_cost;
   evaluation = incorporate_penalties (node, info, evaluation);
--
2.25.1

[PATCH][PUSHED] Fix AutoFDO tests to not look for hot/cold splitting.

2022-09-27 Thread Eugene Rozenfeld via Gcc-patches

AutoFDO counts are not reliable and we are currently not
performing hot/cold splitting based on them. This change adjusts
several tree-prof tests not to check for hot/cold splitting
when run with AutoFDO.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/cold_partition_label.c: Don't check for hot/cold 
splitting with AutoFDO.
* gcc.dg/tree-prof/section-attr-1.c: Don't check for hot/cold splitting 
with AutoFDO.
* gcc.dg/tree-prof/section-attr-2.c: Don't check for hot/cold splitting 
with AutoFDO.
* gcc.dg/tree-prof/section-attr-3.c: Don't check for hot/cold splitting 
with AutoFDO.
---
 gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c | 4 ++--
 gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c   | 4 ++--
 gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c   | 4 ++--
 gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c   | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c 
b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
index 511b61067c0..b85e6c1f93d 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
@@ -43,6 +43,6 @@ main (int argc, char *argv[])
   return 0;
 }
 
-/* { dg-final-use { scan-assembler "foo\[._\]+cold" { target *-*-linux* 
*-*-gnu* } } } */
-/* { dg-final-use { scan-assembler "size\[ \ta-zA-Z0-0\]+foo\[._\]+cold" { 
target *-*-linux* *-*-gnu* } } } */
+/* { dg-final-use-not-autofdo { scan-assembler "foo\[._\]+cold" { target 
*-*-linux* *-*-gnu* } } } */
+/* { dg-final-use-not-autofdo { scan-assembler "size\[ 
\ta-zA-Z0-0\]+foo\[._\]+cold" { target *-*-linux* *-*-gnu* } } } */
 /* { dg-final-use { scan-tree-dump-not "Invalid sum" "optimized"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c 
b/gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c
index 2087d0d2059..5376de14a2f 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c
@@ -52,5 +52,5 @@ foo (int path)
 }
 }
 
-/* { dg-final-use { scan-assembler "\.section\[\t 
\]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target 
*-*-linux* *-*-gnu* } } } */
-/* { dg-final-use { scan-assembler {.section[\t 
]*__TEXT,__text_cold[^\n]*[\n\r]+_foo.cold:} { target *-*-darwin* } } } */
+/* { dg-final-use-not-autofdo { scan-assembler "\.section\[\t 
\]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target 
*-*-linux* *-*-gnu* } } } */
+/* { dg-final-use-not-autofdo { scan-assembler {.section[\t 
]*__TEXT,__text_cold[^\n]*[\n\r]+_foo.cold:} { target *-*-darwin* } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c 
b/gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c
index b02526beaea..90de2c08ca4 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c
@@ -51,5 +51,5 @@ foo (int path)
 }
 }
 
-/* { dg-final-use { scan-assembler "\.section\[\t 
\]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target 
*-*-linux* *-*-gnu* } } } */
-/* { dg-final-use { scan-assembler {.section[\t 
]*__TEXT,__text_cold[^\n]*[\n\r]+_foo.cold:} { target *-*-darwin* } } } */
+/* { dg-final-use-not-autofdo { scan-assembler "\.section\[\t 
\]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target 
*-*-linux* *-*-gnu* } } } */
+/* { dg-final-use-not-autofdo { scan-assembler {.section[\t 
]*__TEXT,__text_cold[^\n]*[\n\r]+_foo.cold:} { target *-*-darwin* } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c 
b/gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c
index da064070653..29a48f05feb 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c
@@ -52,5 +52,5 @@ foo (int path)
 }
 }
 
-/* { dg-final-use { scan-assembler "\.section\[\t 
\]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target 
*-*-linux* *-*-gnu* } } } */
-/* { dg-final-use { scan-assembler {.section[\t 
]*__TEXT,__text_cold[^\n]*[\n\r]+_foo.cold:} { target *-*-darwin* } } } */
+/* { dg-final-use-not-autofdo { scan-assembler "\.section\[\t 
\]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target 
*-*-linux* *-*-gnu* } } } */
+/* { dg-final-use-not-autofdo { scan-assembler {.section[\t 
]*__TEXT,__text_cold[^\n]*[\n\r]+_foo.cold:} { target *-*-darwin* } } } */
-- 
2.25.1

[PATCH] Emit discriminators for inlined call sites.

2022-09-29 Thread Eugene Rozenfeld via Gcc-patches

This change is based on commit 9fa26998a63d4b22b637ed8702520819e408a694
by Dehao Chen in vendors/google/heads/gcc-4_8.

gcc/ChangeLog:

* dwarf2out.cc (add_call_src_coords_attributes): Emit discriminators 
for inlined call sites.
---
 gcc/dwarf2out.cc | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 2df75904022..e81044b8c48 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -24783,7 +24783,8 @@ add_call_src_coords_attributes (tree stmt, dw_die_ref 
die)
   if (RESERVED_LOCATION_P (BLOCK_SOURCE_LOCATION (stmt)))
 return;

-  expanded_location s = expand_location (BLOCK_SOURCE_LOCATION (stmt));
+  location_t locus = BLOCK_SOURCE_LOCATION (stmt);
+  expanded_location s = expand_location (locus);

   if (dwarf_version >= 3 || !dwarf_strict)
 {
@@ -24791,6 +24792,9 @@ add_call_src_coords_attributes (tree stmt, dw_die_ref 
die)
   add_AT_unsigned (die, DW_AT_call_line, s.line);
   if (debug_column_info && s.column)
add_AT_unsigned (die, DW_AT_call_column, s.column);
+  unsigned discr = get_discriminator_from_loc (locus);
+   if (discr != 0)
+ add_AT_unsigned (die, DW_AT_GNU_discriminator, discr);
 }
 }

--
2.25.1

[PATCH] Set discriminators for call stmts on the same line within the same basic block

2022-10-02 Thread Eugene Rozenfeld via Gcc-patches

This change is based on commit 1e6c4a7a8fb8e20545bb9f9032d3854f3f794c18
by Dehao Chen in vendors/google/heads/gcc-4_8.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
* tree-cfg.cc (assign_discriminators): Set discriminators for call stmts
on the same line within the same basic block.
---
 gcc/tree-cfg.cc | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index ade66c54499..8e2a3a5f6c6 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -1203,8 +1203,39 @@ assign_discriminators (void)
 {
   edge e;
   edge_iterator ei;
+  gimple_stmt_iterator gsi;
   gimple *last = last_stmt (bb);
   location_t locus = last ? gimple_location (last) : UNKNOWN_LOCATION;
+  location_t curr_locus = UNKNOWN_LOCATION;
+  int curr_discr = 0;
+
+  /* Traverse the basic block, if two function calls within a basic block
+   are mapped to the same line, assign a new discriminator because a call
+   stmt could be a split point of a basic block.  */
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+   {
+ gimple *stmt = gsi_stmt (gsi);
+ expanded_location curr_locus_e;
+ if (curr_locus == UNKNOWN_LOCATION)
+   {
+ curr_locus = gimple_location (stmt);
+ curr_locus_e = expand_location (curr_locus);
+   }
+ else if (!same_line_p (curr_locus, &curr_locus_e, gimple_location 
(stmt)))
+   {
+ curr_locus = gimple_location (stmt);
+ curr_locus_e = expand_location (curr_locus);
+ curr_discr = 0;
+   }
+ else if (curr_discr != 0)
+   {
+ gimple_set_location (stmt, location_with_discriminator (
+ gimple_location (stmt), curr_discr));
+   }
+ /* Allocate a new discriminator for CALL stmt.  */
+ if (gimple_code (stmt) == GIMPLE_CALL)
+   curr_discr = next_discriminator_for_locus (curr_locus);
+   }

   if (locus == UNKNOWN_LOCATION)
continue;
--
2.25.1

Re: [PATCH] Set discriminators for call stmts on the same line within the same basic block

2022-10-06 Thread Eugene Rozenfeld via Gcc-patches

Thank you for the review Jason.

I fixed formatting and updated the commit description:


Call statements are possible split points of a basic block so they may end up
in different basic blocks by the time pass_ipa_auto_profile executes.

This change will also simplify call site lookups since now location with 
discriminator
will uniquely identify the call site (no callee function name is needed).

This change is based on commit 1e6c4a7a8fb8e20545bb9f9032d3854f3f794c18
by Dehao Chen in vendors/google/heads/gcc-4_8.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
* tree-cfg.cc (assign_discriminators): Set discriminators for call stmts
on the same line within the same basic block.
---
 gcc/tree-cfg.cc | 32 
 1 file changed, 32 insertions(+)

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index ade66c54499..f6a465f4c91 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -1203,8 +1203,40 @@ assign_discriminators (void)
 {
   edge e;
   edge_iterator ei;
+  gimple_stmt_iterator gsi;
   gimple *last = last_stmt (bb);
   location_t locus = last ? gimple_location (last) : UNKNOWN_LOCATION;
+  location_t curr_locus = UNKNOWN_LOCATION;
+  int curr_discr = 0;
+
+  /* Traverse the basic block, if two function calls within a basic block
+   are mapped to the same line, assign a new discriminator because a call
+   stmt could be a split point of a basic block.  */
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+   {
+ gimple *stmt = gsi_stmt (gsi);
+ expanded_location curr_locus_e;
+ if (curr_locus == UNKNOWN_LOCATION)
+   {
+ curr_locus = gimple_location (stmt);
+ curr_locus_e = expand_location (curr_locus);
+   }
+ else if (!same_line_p (curr_locus, &curr_locus_e, gimple_location 
(stmt)))
+   {
+ curr_locus = gimple_location (stmt);
+ curr_locus_e = expand_location (curr_locus);
+ curr_discr = 0;
+   }
+ else if (curr_discr != 0)
+   {
+ location_t loc = gimple_location (stmt);
+ location_t dloc = location_with_discriminator (loc, curr_discr);
+ gimple_set_location (stmt, dloc);
+   }
+ /* Allocate a new discriminator for CALL stmt.  */
+ if (gimple_code (stmt) == GIMPLE_CALL)
+   curr_discr = next_discriminator_for_locus (curr_locus);
+   }
 
   if (locus == UNKNOWN_LOCATION)
continue;
-- 
2.25.1

-Original Message-
From: Jason Merrill  
Sent: Tuesday, October 04, 2022 3:21 PM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: [PATCH] Set discriminators for call stmts on the same 
line within the same basic block

On 10/3/22 02:08, Eugene Rozenfeld wrote:
> This change is based on commit 
> 1e6c4a7a8fb8e20545bb9f9032d3854f3f794c18
> by Dehao Chen in vendors/google/heads/gcc-4_8.
> 
> Tested on x86_64-pc-linux-gnu.

Brief rationale for the change?

> gcc/ChangeLog:
>  * tree-cfg.cc (assign_discriminators): Set discriminators for call 
> stmts
>  on the same line within the same basic block.
> ---
>   gcc/tree-cfg.cc | 31 +++
>   1 file changed, 31 insertions(+)
> 
> diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc index 
> ade66c54499..8e2a3a5f6c6 100644
> --- a/gcc/tree-cfg.cc
> +++ b/gcc/tree-cfg.cc
> @@ -1203,8 +1203,39 @@ assign_discriminators (void)
>   {
> edge e;
> edge_iterator ei;
> +  gimple_stmt_iterator gsi;
> gimple *last = last_stmt (bb);
> location_t locus = last ? gimple_location (last) : 
> UNKNOWN_LOCATION;
> +  location_t curr_locus = UNKNOWN_LOCATION;
> +  int curr_discr = 0;
> +
> +  /* Traverse the basic block, if two function calls within a basic block
> +   are mapped to the same line, assign a new discriminator because a call
> +   stmt could be a split point of a basic block.  */
> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +   {
> + gimple *stmt = gsi_stmt (gsi);
> + expanded_location curr_locus_e;
> + if (curr_locus == UNKNOWN_LOCATION)
> +   {
> + curr_locus = gimple_location (stmt);
> + curr_locus_e = expand_location (curr_locus);
> +   }
> + else if (!same_line_p (curr_locus, &curr_locus_e, gimple_location 
> (stmt)))
> +   {
> + curr_locus = gimple_location (stmt);
> + curr_locus_e = expand_location (curr_locus);
> + curr_discr = 0;
> +   }
> + else if (curr_discr != 0)
> +   {
> + gimple_set_location (stmt, location_with_discrim

[PATCH][ICE] Fix for PR107193.

2022-10-10 Thread Eugene Rozenfeld via Gcc-patches

The bug was introduced in f30e9fd33e56a5a721346ea6140722e1b193db42.
A variable (cur_locus_e) was incorrectly declared inside a loop.
I also moved two other declarations (last and locus) down to make
the code more clear.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
PR debug/107193
* tree-cfg.cc (assign_discriminators): Move declaration of cur_locus_e
out of the loop.
---
 gcc/tree-cfg.cc | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 41f2925665f..ae781871a19 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -1204,9 +1204,8 @@ assign_discriminators (void)
   edge e;
   edge_iterator ei;
   gimple_stmt_iterator gsi;
-  gimple *last = last_stmt (bb);
-  location_t locus = last ? gimple_location (last) : UNKNOWN_LOCATION;
   location_t curr_locus = UNKNOWN_LOCATION;
+  expanded_location curr_locus_e = {};
   int curr_discr = 0;
 
   /* Traverse the basic block, if two function calls within a basic block
@@ -1215,7 +1214,7 @@ assign_discriminators (void)
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
{
  gimple *stmt = gsi_stmt (gsi);
- expanded_location curr_locus_e;
+
  if (curr_locus == UNKNOWN_LOCATION)
{
  curr_locus = gimple_location (stmt);
@@ -1238,6 +1237,8 @@ assign_discriminators (void)
curr_discr = next_discriminator_for_locus (curr_locus);
}
 
+  gimple *last = last_stmt (bb);
+  location_t locus = last ? gimple_location (last) : UNKNOWN_LOCATION;
   if (locus == UNKNOWN_LOCATION)
continue;
 
-- 
2.25.1

Re: [EXTERNAL] Re: [PATCH] Set discriminators for call stmts on the same line within the same basic block

2022-10-10 Thread Eugene Rozenfeld via Gcc-patches

I sent a patch that fixes a bug introduced by this patch: 
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603203.html

What you are seeing could have been caused by the same bug since it involves an 
uninitialized variable.

Eugene


On Oct 10, 2022, at 5:54 PM, David Edelsohn  wrote:


This patch causes a bootstrap comparison failure on AIX.  It apparently does 
not cause a failure on PPC64BE Linux with the same ABI, so I suspect that the 
failure may be related to the way that function aliases are implemented on AIX, 
which doesn't have ELF symbol alias semantics.

"This change will also simplify call site lookups since now location with 
discriminator will uniquely identify the call site (no callee function name is 
needed)."

I will open a PR with more information about the comparison difference now that 
I have a work-around to bring AIX back to a bootstrappable state.  Any thoughts 
about what could be going wrong?

Thanks, David

[PATCH] AutoFDO: don't set param_early_inliner_max_iterations to 10.

2022-01-28 Thread Eugene Rozenfeld via Gcc-patches

param_early_inliner_max_iterations specifies the maximum number
of nested indirect inlining iterations performed by early inliner.
Normally, the default value is 1.

For AutoFDO this parameter was also used as the number of iteration for
its indirect call promotion loop and the default value was set to 10.
While it makes sense to have 10 in the indirect call promotion loop
(we want to make the IR match the profiled binary before actual annotation)
there is no reason to have a special default value for the
regular early inliner.

This change removes the special AutoFDO default value setting for
param_early_inliner_max_iterations while keeping 10 as the number of
iterations for the AutoFDO indirect call promotion loop.

This change improves a simple fibonacci benchmark in AutoFDO mode
by 15% on x86_64-pc-linux-gnu.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
* auto-profile.cc (auto_profile): Hard-code the number of iterations 
(10).

gcc/ChangeLog:
* opt.cc (common_handle_option): Don't set 
param_early_inliner_max_iterations to 10 for AutoFDO.
---
 gcc/auto-profile.cc | 3 +--
 gcc/opts.cc | 2 --
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 0bfaae7b091..c7cee639c85 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1644,8 +1644,7 @@ auto_profile (void)
function before annotation, so the profile inside bar@loc_foo2
will be useful.  */
 autofdo::stmt_set promoted_stmts;
-for (int i = 0; i < opt_for_fn (node->decl,
-   param_early_inliner_max_iterations); i++)
+for (int i = 0; i < 10; i++)
   {
 if (!flag_value_profile_transformations
 || !autofdo::afdo_vpt_for_early_inline (&promoted_stmts))
diff --git a/gcc/opts.cc b/gcc/opts.cc
index 17e1884f0e4..f6f6a8e1709 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -2899,8 +2899,6 @@ common_handle_option (struct gcc_options *opts,
 case OPT_fauto_profile:
   enable_fdo_optimizations (opts, opts_set, value);
   SET_OPTION_IF_UNSET (opts, opts_set, flag_profile_correction, value);
-  SET_OPTION_IF_UNSET (opts, opts_set,
-  param_early_inliner_max_iterations, 10);
   break;

 case OPT_fprofile_generate_:
--
2.25.1

RE: [EXTERNAL] Re: [PATCH] AutoFDO: don't set param_early_inliner_max_iterations to 10.

2022-01-31 Thread Eugene Rozenfeld via Gcc-patches

Thank you for the review Richard. Is it ok to commit this change to trunk now 
or should I wait till GCC 13 development starts?
This seems like a very low risk change.

Thanks,

Eugene

-Original Message-
From: Richard Biener  
Sent: Monday, January 31, 2022 1:23 AM
To: Eugene Rozenfeld 
Cc: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: [PATCH] AutoFDO: don't set 
param_early_inliner_max_iterations to 10.

On Sat, Jan 29, 2022 at 12:24 AM Eugene Rozenfeld via Gcc-patches 
 wrote:
>
> param_early_inliner_max_iterations specifies the maximum number of 
> nested indirect inlining iterations performed by early inliner.
> Normally, the default value is 1.
>
> For AutoFDO this parameter was also used as the number of iteration 
> for its indirect call promotion loop and the default value was set to 10.
> While it makes sense to have 10 in the indirect call promotion loop 
> (we want to make the IR match the profiled binary before actual 
> annotation) there is no reason to have a special default value for the 
> regular early inliner.
>
> This change removes the special AutoFDO default value setting for 
> param_early_inliner_max_iterations while keeping 10 as the number of 
> iterations for the AutoFDO indirect call promotion loop.
>
> This change improves a simple fibonacci benchmark in AutoFDO mode by 
> 15% on x86_64-pc-linux-gnu.
>
> Tested on x86_64-pc-linux-gnu.

OK.

Richard.

> gcc/ChangeLog:
> * auto-profile.cc (auto_profile): Hard-code the number of iterations 
> (10).
>
> gcc/ChangeLog:
> * opt.cc (common_handle_option): Don't set 
> param_early_inliner_max_iterations to 10 for AutoFDO.
> ---
>  gcc/auto-profile.cc | 3 +--
>  gcc/opts.cc | 2 --
>  2 files changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index 
> 0bfaae7b091..c7cee639c85 100644
> --- a/gcc/auto-profile.cc
> +++ b/gcc/auto-profile.cc
> @@ -1644,8 +1644,7 @@ auto_profile (void)
> function before annotation, so the profile inside bar@loc_foo2
> will be useful.  */
>  autofdo::stmt_set promoted_stmts;
> -for (int i = 0; i < opt_for_fn (node->decl,
> -   param_early_inliner_max_iterations); i++)
> +for (int i = 0; i < 10; i++)
>{
>  if (!flag_value_profile_transformations
>  || !autofdo::afdo_vpt_for_early_inline (&promoted_stmts)) 
> diff --git a/gcc/opts.cc b/gcc/opts.cc index 17e1884f0e4..f6f6a8e1709 
> 100644
> --- a/gcc/opts.cc
> +++ b/gcc/opts.cc
> @@ -2899,8 +2899,6 @@ common_handle_option (struct gcc_options *opts,
>  case OPT_fauto_profile:
>enable_fdo_optimizations (opts, opts_set, value);
>SET_OPTION_IF_UNSET (opts, opts_set, flag_profile_correction, value);
> -  SET_OPTION_IF_UNSET (opts, opts_set,
> -  param_early_inliner_max_iterations, 10);
>break;
>
>  case OPT_fprofile_generate_:
> --
> 2.25.1

[PATCH] AutoFDO: Don't try to promote indirect calls that result in recursive direct calls

2022-02-08 Thread Eugene Rozenfeld via Gcc-patches

AutoFDO tries to promote and inline all indirect calls that were promoted
and inlined in the original binary and that are still hot. In the included
test case, the promotion results in a direct call that is a recursive call.
inline_call and optimize_inline_calls can't handle recursive calls at this 
stage.
Currently, inline_call fails with a segmentation fault.

This change leaves the indirect call alone if promotion will result in a 
recursive call.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
* auto-profile.cc (afdo_indirect_call): Don't attempt to promote 
indirect calls
that will result in direct recursive calls.

gcc/testsuite/g++.dg/tree-prof/ChangeLog:
* indir-call-recursive-inlining.C : New test.
---
 gcc/auto-profile.cc   | 40 --
 .../tree-prof/indir-call-recursive-inlining.C | 54 +++
 2 files changed, 78 insertions(+), 16 deletions(-)
 create mode 100644 
gcc/testsuite/g++.dg/tree-prof/indir-call-recursive-inlining.C

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index c7cee639c85..2b34b80b82d 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -975,7 +975,7 @@ read_profile (void)
  * after annotation, we just need to mark, and let follow-up logic to
decide if it needs to promote and inline.  */

-static void
+static bool
 afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map,
 bool transform)
 {
@@ -983,12 +983,12 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const 
icall_target_map &map,
   tree callee;

   if (map.size () == 0)
-return;
+return false;
   gcall *stmt = dyn_cast  (gs);
   if (!stmt
   || gimple_call_internal_p (stmt)
   || gimple_call_fndecl (stmt) != NULL_TREE)
-return;
+return false;

   gcov_type total = 0;
   icall_target_map::const_iterator max_iter = map.end ();
@@ -1003,7 +1003,7 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const 
icall_target_map &map,
   struct cgraph_node *direct_call = cgraph_node::get_for_asmname (
   get_identifier (afdo_string_table->get_name (max_iter->first)));
   if (direct_call == NULL || !direct_call->profile_id)
-return;
+return false;

   callee = gimple_call_fn (stmt);

@@ -1013,20 +1013,27 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const 
icall_target_map &map,
   hist->hvalue.counters = XNEWVEC (gcov_type, hist->n_counters);
   gimple_add_histogram_value (cfun, stmt, hist);

-  // Total counter
+  /* Total counter */
   hist->hvalue.counters[0] = total;
-  // Number of value/counter pairs
+  /* Number of value/counter pairs */
   hist->hvalue.counters[1] = 1;
-  // Value
+  /* Value */
   hist->hvalue.counters[2] = direct_call->profile_id;
-  // Counter
+  /* Counter */
   hist->hvalue.counters[3] = max_iter->second;

   if (!transform)
-return;
+return false;
+
+  cgraph_node* current_function_node = cgraph_node::get 
(current_function_decl);
+
+  /* If the direct call is a recursive call, don't promote it since
+ we are not set up to inline recursive calls at this stage. */
+  if (direct_call == current_function_node)
+return false;

   struct cgraph_edge *indirect_edge
-  = cgraph_node::get (current_function_decl)->get_edge (stmt);
+  = current_function_node->get_edge (stmt);

   if (dump_file)
 {
@@ -1040,13 +1047,13 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const 
icall_target_map &map,
 {
   if (dump_file)
 fprintf (dump_file, " not transforming\n");
-  return;
+  return false;
 }
   if (DECL_STRUCT_FUNCTION (direct_call->decl) == NULL)
 {
   if (dump_file)
 fprintf (dump_file, " no declaration\n");
-  return;
+  return false;
 }

   if (dump_file)
@@ -1063,16 +1070,17 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const 
icall_target_map &map,
   cgraph_edge::redirect_call_stmt_to_callee (new_edge);
   gimple_remove_histogram_value (cfun, stmt, hist);
   inline_call (new_edge, true, NULL, NULL, false);
+  return true;
 }

 /* From AutoFDO profiles, find values inside STMT for that we want to measure
histograms and adds them to list VALUES.  */

-static void
+static bool
 afdo_vpt (gimple_stmt_iterator *gsi, const icall_target_map &map,
   bool transform)
 {
-  afdo_indirect_call (gsi, map, transform);
+  return afdo_indirect_call (gsi, map, transform);
 }

 typedef std::set bb_set;
@@ -1498,8 +1506,8 @@ afdo_vpt_for_early_inline (stmt_set *promoted_stmts)
   {
 /* Promote the indirect call and update the promoted_stmts.  */
 promoted_stmts->insert (stmt);
-afdo_vpt (&gsi, info.targets, true);
-has_vpt = true;
+if (afdo_vpt (&gsi, info.targets, true))
+  has_vpt = true;
   }
   }
   }
diff --git a/gcc/testsuite/g++.dg/tree-prof/indir-call-recursive-inlining.C 
b/gcc/testsuite/g++.dg/tree-prof/indir-call-recursive-inlining.C
new file mode

[PATCH] Improve AutoFDO count propagation algorithm

2021-12-02 Thread Eugene Rozenfeld via Gcc-patches

When a basic block A has been annotated with a count and it has only one
successor (or predecessor) B, we can propagate the A's count to B.
The algorithm without this change could leave B without an annotation if B had
other unannotated predecessors (or successors). For example, in the test case I 
added,
the loop header block was left unannotated, which prevented loop unrolling.

gcc/ChangeLog:
* auto-profile.c (afdo_propagate_edge): Improve count propagation 
algorithm.

gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/init-array.c: New test for unrolling inner loops.
---
 gcc/auto-profile.c  | 20 +-
 gcc/testsuite/gcc.dg/tree-prof/init-array.c | 43 +
 2 files changed, 61 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/init-array.c

diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c
index 4c1fc6b536b..dfcd68113aa 100644
--- a/gcc/auto-profile.c
+++ b/gcc/auto-profile.c
@@ -1192,7 +1192,8 @@ afdo_find_equiv_class (bb_set *annotated_bb)
 /* If a basic block's count is known, and only one of its in/out edges' count
is unknown, its count can be calculated. Meanwhile, if all of the in/out
edges' counts are known, then the basic block's unknown count can also be
-   calculated.
+   calculated. Also, if a block has a single predecessor or successor, the 
block's
+   count can be propagated to that predecessor or successor.
IS_SUCC is true if out edges of a basic blocks are examined.
Update ANNOTATED_BB accordingly.
Return TRUE if any basic block/edge count is changed.  */
@@ -1208,6 +1209,7 @@ afdo_propagate_edge (bool is_succ, bb_set *annotated_bb)
 edge e, unknown_edge = NULL;
 edge_iterator ei;
 int num_unknown_edge = 0;
+int num_edge = 0;
 profile_count total_known_count = profile_count::zero ().afdo ();

 FOR_EACH_EDGE (e, ei, is_succ ? bb->succs : bb->preds)
@@ -1217,6 +1219,7 @@ afdo_propagate_edge (bool is_succ, bb_set *annotated_bb)
  num_unknown_edge++, unknown_edge = e;
else
  total_known_count += AFDO_EINFO (e)->get_count ();
+   num_edge++;
   }

 /* Be careful not to annotate block with no successor in special cases.  */
@@ -1230,7 +1233,20 @@ afdo_propagate_edge (bool is_succ, bb_set *annotated_bb)
 else if (num_unknown_edge == 1 && is_bb_annotated (bb, *annotated_bb))
   {
if (bb->count > total_known_count)
- AFDO_EINFO (unknown_edge)->set_count (bb->count - total_known_count);
+ {
+ profile_count new_count = bb->count - total_known_count;
+ AFDO_EINFO(unknown_edge)->set_count(new_count);
+ if (num_edge == 1)
+   {
+ basic_block succ_or_pred_bb = is_succ ? unknown_edge->dest : 
unknown_edge->src;
+ if (new_count > succ_or_pred_bb->count)
+   {
+ succ_or_pred_bb->count = new_count;
+ if (!is_bb_annotated (succ_or_pred_bb, *annotated_bb))
+   set_bb_annotated (succ_or_pred_bb, annotated_bb);
+   }
+   }
+  }
else
  AFDO_EINFO (unknown_edge)->set_count (profile_count::zero().afdo ());
AFDO_EINFO (unknown_edge)->set_annotated ();
diff --git a/gcc/testsuite/gcc.dg/tree-prof/init-array.c 
b/gcc/testsuite/gcc.dg/tree-prof/init-array.c
new file mode 100644
index 000..0f7a5c84481
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/init-array.c
@@ -0,0 +1,43 @@
+/* { dg-options "-O3 -fdump-tree-cunrolli-details" } */
+
+static int s[10][10][10];
+static int d[10][10][10];
+
+__attribute__((noipa))
+int array()
+{
+   int i;
+   register int j, k;
+   for (i = 0; i < 10; i++)
+   for (j = 0; j < 10; j++)
+   for (k = 0; k < 10; k++)
+   d[i][j][k] = s[i][j][k];
+
+   return(0);
+}
+
+__attribute__((noipa))
+void TestBench()
+{
+   for (int i = 0; i < 15; ++i)
+   {
+  array();
+   }
+}
+
+int main(int argc, char *argv[])
+{
+
+   TestBench();
+
+   if (d[9][9][9] == 0 && s[9][9][9] == 0)
+   {
+   return 0;
+   }
+   else
+   {
+   return -1;
+   }
+}
+
+/* { dg-final-use { scan-tree-dump-times "loop with 10 iterations completely 
unrolled" 2 "cunrolli"} } */
--
2.25.1

[PATCH] Don't print discriminators for -fcompare-debug.

2022-10-16 Thread Eugene Rozenfeld via Gcc-patches

With -gstatement-frontiers we may end up with different IR
coming from the front end with and without debug information turned on.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100733 for details.
That may result in differences in discriminator values and -fcompare-debug
failures.

This patch disables printing of discriminators when the dump is intended
for -fcompare-debug comparison and reverses the workaround in a test.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
PR debug/107231
PR debug/107169
* print-rtl.cc (print_rtx_operand_code_i): Don't print discriminators
for -fdebug-compare.

gcc/testsuite/ChangeLog:

* c-c++-common/ubsan/pr85213.c: Reverse the workaround for 
discriminators.
---
 gcc/print-rtl.cc   | 13 ++---
 gcc/testsuite/c-c++-common/ubsan/pr85213.c |  7 +--
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/gcc/print-rtl.cc b/gcc/print-rtl.cc
index e115f987173..0476f3d7e79 100644
--- a/gcc/print-rtl.cc
+++ b/gcc/print-rtl.cc
@@ -453,10 +453,17 @@ rtx_writer::print_rtx_operand_code_i (const_rtx in_rtx, 
int idx)
  expanded_location xloc = insn_location (in_insn);
  fprintf (m_outfile, " \"%s\":%i:%i", xloc.file, xloc.line,
   xloc.column);
- int discriminator = insn_discriminator (in_insn);
-   if (discriminator)
- fprintf (m_outfile, " discrim %d", discriminator);
 
+ /* Don't print discriminators for -fcompare-debug since the IR
+coming from the front end may be different with and without
+debug information turned on. That may result in different
+discriminator values. */
+ if (!(dump_flags & TDF_COMPARE_DEBUG))
+   {
+ int discriminator = insn_discriminator (in_insn);
+ if (discriminator)
+   fprintf (m_outfile, " discrim %d", discriminator);
+   }
}
 #endif
 }
diff --git a/gcc/testsuite/c-c++-common/ubsan/pr85213.c 
b/gcc/testsuite/c-c++-common/ubsan/pr85213.c
index e903e976f2c..8a6be81d20f 100644
--- a/gcc/testsuite/c-c++-common/ubsan/pr85213.c
+++ b/gcc/testsuite/c-c++-common/ubsan/pr85213.c
@@ -1,11 +1,6 @@
 /* PR sanitizer/85213 */
 /* { dg-do compile } */
-/* Pass -gno-statement-frontiers to work around
-   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100733 :
-   without it the IR coming from the front end may be different with and 
without
-   debug information turned on. That may cause e.g., different discriminator 
values
-   and -fcompare-debug failures. */
-/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug 
-gno-statement-frontiers" } */
+/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug" } */
 
 int
 foo (int x)
-- 
2.25.1

RE: [r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64

2022-10-16 Thread Eugene Rozenfeld via Gcc-patches

That commit had a bug that was fixed in 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=80f414e6d73f9f1683f93d83ce63a6a482e54bee

Was that fix included in your GCC build?

From: Jiang, Haochen 
Sent: Sunday, October 16, 2022 8:09 PM
To: gcc-patches@gcc.gnu.org; Eugene Rozenfeld ; 
Jiang, Haochen ; gcc-regress...@gcc.gnu.org
Subject: [EXTERNAL] [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

You don't often get email from 
haochen.ji...@intel.com<mailto:haochen.ji...@intel.com>. Learn why this is 
important<https://aka.ms/LearnAboutSenderIdentification>
On Linux/x86_64,

f30e9fd33e56a5a721346ea6140722e1b193db42 is the first bad commit
commit f30e9fd33e56a5a721346ea6140722e1b193db42
Author: Eugene Rozenfeld mailto:ero...@microsoft.com>>
Date:   Thu Apr 21 16:43:24 2022 -0700

Set discriminators for call stmts on the same line within the same basic 
block.

caused

FAIL: libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for 
excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2288/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64\ -march=cascadelake}'"

RE: [EXTERNAL] Re: [PATCH] Don't print discriminators for -fcompare-debug.

2022-10-17 Thread Eugene Rozenfeld via Gcc-patches

Yes, -gstatement-frontiers is the root cause here but the new approach to 
discriminators is especially prone to this. I added the workaround to pr85213.c 
in my original discriminator patch but now two more -fcompare-debug bugs were 
opened (PR107231 and PR107169). I suspect we'll keep getting more. So I'd like 
to disable printing discriminators in -fcompare-debug dums until 
-gstatement-frontier issue is fixed.

Eugene

-Original Message-
From: Richard Biener  
Sent: Monday, October 17, 2022 12:06 AM
To: Eugene Rozenfeld 
Cc: gcc-patches@gcc.gnu.org; Jason Merrill 
Subject: [EXTERNAL] Re: [PATCH] Don't print discriminators for -fcompare-debug.

On Sun, Oct 16, 2022 at 10:25 PM Eugene Rozenfeld via Gcc-patches 
 wrote:
>
> With -gstatement-frontiers we may end up with different IR coming from 
> the front end with and without debug information turned on.
> See 
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fbugzilla%2Fshow_bug.cgi%3Fid%3D100733&data=05%7C01%7CEugene.Rozenfeld%40microsoft.com%7C5d3df88ec7e14f5eec2708dab00e0440%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638015871510301049%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=2JjQHAgDi6%2Fet1vowA1IRcdInJMkkjuva9DbM5rHawc%3D&reserved=0
>  for details.
> That may result in differences in discriminator values and 
> -fcompare-debug failures.
>
> This patch disables printing of discriminators when the dump is 
> intended for -fcompare-debug comparison and reverses the workaround in a test.

I don't think this is the correct approach.  -gstatement-frontiers is known to 
be prone to these issues and is the one to blame here.  I think the bugs should 
be SUSPENDED until -gstatement-frontiers is fixed or at least disabled by 
default (IIRC Jakub tried that but failed last time)

> Tested on x86_64-pc-linux-gnu.
>
> gcc/ChangeLog:
> PR debug/107231
> PR debug/107169
> * print-rtl.cc (print_rtx_operand_code_i): Don't print discriminators
> for -fdebug-compare.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/ubsan/pr85213.c: Reverse the workaround for 
> discriminators.
> ---
>  gcc/print-rtl.cc   | 13 ++---
>  gcc/testsuite/c-c++-common/ubsan/pr85213.c |  7 +--
>  2 files changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/print-rtl.cc b/gcc/print-rtl.cc index 
> e115f987173..0476f3d7e79 100644
> --- a/gcc/print-rtl.cc
> +++ b/gcc/print-rtl.cc
> @@ -453,10 +453,17 @@ rtx_writer::print_rtx_operand_code_i (const_rtx in_rtx, 
> int idx)
>   expanded_location xloc = insn_location (in_insn);
>   fprintf (m_outfile, " \"%s\":%i:%i", xloc.file, xloc.line,
>xloc.column);
> - int discriminator = insn_discriminator (in_insn);
> -   if (discriminator)
> - fprintf (m_outfile, " discrim %d", discriminator);
>
> + /* Don't print discriminators for -fcompare-debug since the IR
> +coming from the front end may be different with and without
> +debug information turned on. That may result in different
> +discriminator values. */
> + if (!(dump_flags & TDF_COMPARE_DEBUG))
> +   {
> + int discriminator = insn_discriminator (in_insn);
> + if (discriminator)
> +   fprintf (m_outfile, " discrim %d", discriminator);
> +   }
> }
>  #endif
>  }
> diff --git a/gcc/testsuite/c-c++-common/ubsan/pr85213.c 
> b/gcc/testsuite/c-c++-common/ubsan/pr85213.c
> index e903e976f2c..8a6be81d20f 100644
> --- a/gcc/testsuite/c-c++-common/ubsan/pr85213.c
> +++ b/gcc/testsuite/c-c++-common/ubsan/pr85213.c
> @@ -1,11 +1,6 @@
>  /* PR sanitizer/85213 */
>  /* { dg-do compile } */
> -/* Pass -gno-statement-frontiers to work around
> -   
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fbugzilla%2Fshow_bug.cgi%3Fid%3D100733&data=05%7C01%7CEugene.Rozenfeld%40microsoft.com%7C5d3df88ec7e14f5eec2708dab00e0440%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638015871510301049%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=2JjQHAgDi6%2Fet1vowA1IRcdInJMkkjuva9DbM5rHawc%3D&reserved=0
>  :
> -   without it the IR coming from the front end may be different with and 
> without
> -   debug information turned on. That may cause e.g., different discriminator 
> values
> -   and -fcompare-debug failures. */
> -/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug 
> -gno-statement-frontiers" } */
> +/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug" } */
>
>  int
>  foo (int x)
> --
> 2.25.1

RE: [EXTERNAL] RE: [r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors)

2022-10-17 Thread Eugene Rozenfeld via Gcc-patches

Yes, I received that one. The root cause is the -gstatement-frontiers issue 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100733 . I submitted a workaround 
patch for that ( 
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603673.html ) but it 
hasn't been approved yet. Another workaround is passing 
-gno-statement-frontiers to the test.

Eugene

-Original Message-
From: Thomas Schwinge  
Sent: Monday, October 17, 2022 3:46 AM
To: Eugene Rozenfeld 
Cc: haochen.ji...@intel.com; gcc-patches@gcc.gnu.org; gcc-regress...@gcc.gnu.org
Subject: [EXTERNAL] RE: [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

Hi!

On 2022-10-17T06:11:01+, "Jiang, Haochen via Gcc-patches" 
 wrote:
> I just checkout to your commit and the test still got failed.
>
> It is reporting like this:
> xgcc: error: 
> /export/users2/haochenj/src/gcc/master/./libgomp/testsuite/libgomp.oac
> c-c++/../libgomp.oacc-c-c++-common/kernels-loop-g.c: '-fcompare-debug' 
> failure (length)

Right.  I had filed 
<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2FPR107231&data=05%7C01%7CEugene.Rozenfeld%40microsoft.com%7C05c1cd8b5db4455fe3de08dab02ce66e%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638016004172688130%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=BfdPnXz5%2Bxr8NwIcBvaOnkFPyzfLQBbyrbMc777EIsk%3D&reserved=0>
 "[13 Regression]
c-c++-common/goacc/kernels-loop-g.c: '-fcompare-debug' failure (length)"
with you in CC: .  (Have you not received that
one?)


Grüße
 Thomas


> Also fix a typo in manually sending, should be this to reproduce
>
> To reproduce:
>
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
> RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
> --target_board='unix{-m32}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
> RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
> RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
> --target_board='unix{-m64}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
> RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
> --target_board='unix{-m64\ -march=cascadelake}'"
>
> BRs,
> Haochen
>
> From: Jiang, Haochen
> Sent: Monday, October 17, 2022 1:41 PM
> To: Eugene Rozenfeld ; 
> gcc-patches@gcc.gnu.org; gcc-regress...@gcc.gnu.org
> Subject: RE: [r13-3172 Regression] 
> FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 
> (test for excess errors) on Linux/x86_64
>
> If that has been fixed, just ignore that mail.
>
> It is run through by a script and got the result few days ago. 
> However, the sendmail service was down on that machine and I just 
> noticed that issue. So I sent that result manually today in case that is not 
> fixed.
>
> Sorry for the disturb!
>
> BRs,
> Haochen
>
> From: Eugene Rozenfeld 
> mailto:eugene.rozenf...@microsoft.com>
> >
> Sent: Monday, October 17, 2022 1:23 PM
> To: Jiang, Haochen 
> mailto:haochen.ji...@intel.com>>; 
> gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>; 
> gcc-regress...@gcc.gnu.org<mailto:gcc-regress...@gcc.gnu.org>
> Subject: RE: [r13-3172 Regression] 
> FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 
> (test for excess errors) on Linux/x86_64
>
> That commit had a bug that was fixed in 
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.
> gnu.org%2Fgit%2F%3Fp%3Dgcc.git%3Ba%3Dcommit%3Bh%3D80f414e6d73f9f1683f9
> 3d83ce63a6a482e54bee&data=05%7C01%7CEugene.Rozenfeld%40microsoft.c
> om%7C05c1cd8b5db4455fe3de08dab02ce66e%7C72f988bf86f141af91ab2d7cd011db
> 47%7C1%7C0%7C638016004172688130%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLj
> AwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&a
> mp;sdata=50kAEt5C%2F8DO%2BvuyNvE7c1ydxriGeQwY5G%2Bqs%2FgSGE0%3D&re
> served=0
>
> Was that fix included in your GCC build?
>
> From: Jiang, Haochen 
> mailto:haochen.ji...@intel.com>>
> Sent: Sunday, October 16, 2022 8:09 PM
> To: gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>; Eugene 
> Rozenfeld 
> mailto:e

[PATCH][PUSHED] Start using discriminators in AutoFDO

2022-10-25 Thread Eugene Rozenfeld via Gcc-patches

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:

* auto-profile.cc (get_combined_location): Include discriminator in the
returned combined location.
(read_function_instance): Read discriminators from profiles.
---
 gcc/auto-profile.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index ca48404eaf1..97307321cbf 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -363,7 +363,8 @@ get_combined_location (location_t loc, tree decl)
   /* TODO: allow more bits for line and less bits for discriminator.  */
   if (LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl) >= (1<<16))
 warning_at (loc, OPT_Woverflow, "offset exceeds 16 bytes");
-  return ((LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl)) << 16);
+  return ((LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl)) << 16)
+| get_discriminator_from_loc (loc);
 }
 
 /* Return the function decl of a given lexical BLOCK.  */
@@ -652,7 +653,7 @@ function_instance::read_function_instance 
(function_instance_stack *stack,
 
   for (unsigned i = 0; i < num_pos_counts; i++)
 {
-  unsigned offset = gcov_read_unsigned () & 0x;
+  unsigned offset = gcov_read_unsigned ();
   unsigned num_targets = gcov_read_unsigned ();
   gcov_type count = gcov_read_counter ();
   s->pos_counts[offset].count = count;
-- 
2.25.1

[PATCH][PUSHED] Don't force DWARF4 for AutoFDO tests

2022-10-25 Thread Eugene Rozenfeld via Gcc-patches

Support for DWARF5 was added to create_gcov in
https://github.com/google/autofdo so we no longer need
to force DWARF4 for AutoFDO tests.

Tested on x86_64-pc-linux-gnu.

gcc/testsuite/ChangeLog:
* lib/profopt.exp: Don't force DWARF4 for AutoFDO tests
---
 gcc/testsuite/lib/profopt.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/profopt.exp b/gcc/testsuite/lib/profopt.exp
index ac7712a5b42..fe88d2fce37 100644
--- a/gcc/testsuite/lib/profopt.exp
+++ b/gcc/testsuite/lib/profopt.exp
@@ -289,7 +289,7 @@ proc auto-profopt-execute { src } {
 return
 }
 set profile_wrapper [profopt-perf-wrapper]
-set profile_option "-gdwarf-4 -DFOR_AUTOFDO_TESTING"
+set profile_option "-g -DFOR_AUTOFDO_TESTING"
 set feedback_option "-fauto-profile -DFOR_AUTOFDO_TESTING -fearly-inlining"
 set run_autofdo 1
 profopt-execute $src
-- 
2.25.1

RE: [EXTERNAL] [PATCH] contrib: modernize gen_autofdo_event.py

2022-08-26 Thread Eugene Rozenfeld via Gcc-patches

The patch is approved.

Eugene

-Original Message-
From: Andi Kleen  
Sent: Friday, August 05, 2022 11:29 PM
To: Eugene Rozenfeld ; Xi Ruoyao 
; gcc-patches@gcc.gnu.org
Subject: Re: [EXTERNAL] [PATCH] contrib: modernize gen_autofdo_event.py


On 8/6/2022 1:07 AM, Eugene Rozenfeld wrote:
> The changes look good to me. Also adding Andi, the author of the script.


Looks all good to me too.


-Andi

RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator support.

2022-09-01 Thread Eugene Rozenfeld via Gcc-patches

Jason,

Thank you for your review. You are correct, we only need to check 
has_discriminator for the statement that's on the same line.
I updated the patch (attached).

Thanks,

Eugene

-Original Message-
From: Jason Merrill  
Sent: Thursday, August 18, 2022 6:55 PM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org
Cc: Andi Kleen ; Jan Hubicka 
Subject: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator 
support.

On 8/3/22 17:25, Eugene Rozenfeld wrote:
> One more ping for this patch 
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.
> gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=0
> 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da81
> 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637964709325691951
> %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6I
> k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2dT
> DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0
> 
> CC Jason since this changes discriminators emitted in dwarf.
> 
> Thanks,
> 
> Eugene
> 
> -Original Message-
> From: Eugene Rozenfeld
> Sent: Monday, June 27, 2022 12:45 PM
> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan 
> Hubicka 
> Subject: RE: [PING][PATCH] Add instruction level discriminator support.
> 
> Another ping for 
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=05%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da8185dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637964709325691951%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2dTDYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0
>  .
> 
> I got a review from Andi 
> (https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596549.html&data=05%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da8185dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637964709325691951%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=se6x1LD0GQyFz%2B28gdVqsye3Aw8kPoMRhVQO1BSPg6I%3D&reserved=0)
>  but I also need a review from someone who can approve the changes.
> 
> Thanks,
> 
> Eugene
> 
> -Original Message-
> From: Eugene Rozenfeld
> Sent: Friday, June 10, 2022 12:03 PM
> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan 
> Hubicka 
> Subject: [PING][PATCH] Add instruction level discriminator support.
> 
> Hello,
> 
> I'd like to ping this patch: 
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.
> gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=0
> 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da81
> 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637964709325691951
> %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6I
> k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2dT
> DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0
> 
> Thanks,
> 
> Eugene
> 
> -Original Message-
> From: Gcc-patches 
>  On Behalf Of 
> Eugene Rozenfeld via Gcc-patches
> Sent: Thursday, June 02, 2022 12:22 AM
> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan 
> Hubicka 
> Subject: [EXTERNAL] [PATCH] Add instruction level discriminator support.
> 
> This is the first in a series of patches to enable discriminator support in 
> AutoFDO.
> 
> This patch switches to tracking discriminators per statement/instruction 
> instead of per basic block. Tracking per basic block was problematic since 
> not all statements in a basic block needed a discriminator and, also, later 
> optimizations could move statements between basic blocks making correlation 
> during AutoFDO compilation unreliable. Tracking per statement also allows us 
> to assign different discriminators to multiple function calls in the same 
> basic block. A subsequent patch will add that support.
> 
> The idea of this patch is based on commit 
> 4c311d95cf6d9519c3c20f641cc77af7df491fdf
> by Dehao Chen in vendors/google/heads/gcc-4_8 but uses a slightly different 
> approach. In Dehao's work special (normally unused) location ids and side 
> tables were used to keep track of locations with discriminators. Things have 
> changed since then and I don't think we have unused location ids anymore. 
> Instead, I made discriminators a part of ad-hoc locations.
> 
> The difference from Dehao's work also includes support for discriminator 
> reading/writing in lto streaming and in modules.
&g

RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator support.

2022-09-01 Thread Eugene Rozenfeld via Gcc-patches

Jason,

I made another small change in addressing your feedback (attached).

Thanks,

Eugene

-Original Message-
From: Gcc-patches  On 
Behalf Of Eugene Rozenfeld via Gcc-patches
Sent: Thursday, September 01, 2022 1:49 PM
To: Jason Merrill ; gcc-patches@gcc.gnu.org
Cc: Andi Kleen ; Jan Hubicka 
Subject: RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator 
support.

Jason,

Thank you for your review. You are correct, we only need to check 
has_discriminator for the statement that's on the same line.
I updated the patch (attached).

Thanks,

Eugene

-Original Message-
From: Jason Merrill 
Sent: Thursday, August 18, 2022 6:55 PM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org
Cc: Andi Kleen ; Jan Hubicka 
Subject: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator 
support.

On 8/3/22 17:25, Eugene Rozenfeld wrote:
> One more ping for this patch
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.
> gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=0
> 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da81
> 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637964709325691951
> %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6I
> k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2dT
> DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0
> 
> CC Jason since this changes discriminators emitted in dwarf.
> 
> Thanks,
> 
> Eugene
> 
> -Original Message-
> From: Eugene Rozenfeld
> Sent: Monday, June 27, 2022 12:45 PM
> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan 
> Hubicka 
> Subject: RE: [PING][PATCH] Add instruction level discriminator support.
> 
> Another ping for 
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=05%7C01%7CEugene.Rozenfeld%40microsoft.com%7Cf217ebc45428465857bd08da8c5b6fb2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637976621612503972%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=b0kTdzWRyiwdtcEFasyNlKv1vj%2FqwnipN3776C7xWcg%3D&reserved=0
>  .
> 
> I got a review from Andi 
> (https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596549.html&data=05%7C01%7CEugene.Rozenfeld%40microsoft.com%7Cf217ebc45428465857bd08da8c5b6fb2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637976621612503972%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=qxjBUCcGiKXtR4%2BOJq%2FFQN11C2M6BBurTguOBOjWJDw%3D&reserved=0)
>  but I also need a review from someone who can approve the changes.
> 
> Thanks,
> 
> Eugene
> 
> -Original Message-
> From: Eugene Rozenfeld
> Sent: Friday, June 10, 2022 12:03 PM
> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan 
> Hubicka 
> Subject: [PING][PATCH] Add instruction level discriminator support.
> 
> Hello,
> 
> I'd like to ping this patch: 
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.
> gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=0
> 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da81
> 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637964709325691951
> %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6I
> k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2dT
> DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0
> 
> Thanks,
> 
> Eugene
> 
> -Original Message-
> From: Gcc-patches
>  On Behalf Of 
> Eugene Rozenfeld via Gcc-patches
> Sent: Thursday, June 02, 2022 12:22 AM
> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan 
> Hubicka 
> Subject: [EXTERNAL] [PATCH] Add instruction level discriminator support.
> 
> This is the first in a series of patches to enable discriminator support in 
> AutoFDO.
> 
> This patch switches to tracking discriminators per statement/instruction 
> instead of per basic block. Tracking per basic block was problematic since 
> not all statements in a basic block needed a discriminator and, also, later 
> optimizations could move statements between basic blocks making correlation 
> during AutoFDO compilation unreliable. Tracking per statement also allows us 
> to assign different discriminators to multiple function calls in the same 
> basic block. A subsequent patch will add that support.
> 
> The idea of this patch is based on commit 
> 4c311d95cf6d9519c3c20f641cc77af7df491fdf
> by Dehao Chen in vendors/google/heads/gcc-4_8 but uses a slightly different 
> approach. In Dehao's work special (normally unused) location ids and side 
> tables

RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator support.

2022-09-08 Thread Eugene Rozenfeld via Gcc-patches

Jason,

Thank for your suggestion. The patch is updated (attached).

Thanks,

Eugene

-Original Message-
From: Jason Merrill  
Sent: Thursday, September 08, 2022 10:26 AM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org
Cc: Andi Kleen ; Jan Hubicka 
Subject: Re: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator 
support.

On 9/1/22 18:22, Eugene Rozenfeld wrote:
> Jason,
> 
> I made another small change in addressing your feedback (attached).
> 
> Thanks,
> 
> Eugene
> 
> -Original Message-
> From: Gcc-patches 
>  On Behalf Of 
> Eugene Rozenfeld via Gcc-patches
> Sent: Thursday, September 01, 2022 1:49 PM
> To: Jason Merrill ; gcc-patches@gcc.gnu.org
> Cc: Andi Kleen ; Jan Hubicka 
> Subject: RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator 
> support.
> 
> Jason,
> 
> Thank you for your review. You are correct, we only need to check 
> has_discriminator for the statement that's on the same line.
> I updated the patch (attached).
> 
> Thanks,
> 
> Eugene
> 
> -Original Message-
> From: Jason Merrill 
> Sent: Thursday, August 18, 2022 6:55 PM
> To: Eugene Rozenfeld ; 
> gcc-patches@gcc.gnu.org
> Cc: Andi Kleen ; Jan Hubicka 
> Subject: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator 
> support.
> 
> On 8/3/22 17:25, Eugene Rozenfeld wrote:
>> One more ping for this patch
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.
>> gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=
>> 0
>> 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da8
>> 1
>> 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C63796470932569195
>> 1 
>> %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6
>> I 
>> k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2d
>> T
>> DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0
>>
>> CC Jason since this changes discriminators emitted in dwarf.
>>
>> Thanks,
>>
>> Eugene
>>
>> -Original Message-
>> From: Eugene Rozenfeld
>> Sent: Monday, June 27, 2022 12:45 PM
>> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan 
>> Hubicka 
>> Subject: RE: [PING][PATCH] Add instruction level discriminator support.
>>
>> Another ping for 
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=05%7C01%7Ceugene.rozenfeld%40microsoft.com%7Cc7c9fab519184eee0bb808da91bf2fea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637982547579085499%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Kj3YWJrDqjX%2B0Ml3At5P8NRWc1Xs6mbI%2F1vCk9%2BLaQs%3D&reserved=0
>>  .
>>
>> I got a review from Andi 
>> (https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596549.html&data=05%7C01%7Ceugene.rozenfeld%40microsoft.com%7Cc7c9fab519184eee0bb808da91bf2fea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637982547579085499%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=NBzFtD0mH7EYKxsVWfqZgSpQmUG18SIt01XRYUlwwW4%3D&reserved=0)
>>  but I also need a review from someone who can approve the changes.
>>
>> Thanks,
>>
>> Eugene
>>
>> -Original Message-
>> From: Eugene Rozenfeld
>> Sent: Friday, June 10, 2022 12:03 PM
>> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan 
>> Hubicka 
>> Subject: [PING][PATCH] Add instruction level discriminator support.
>>
>> Hello,
>>
>> I'd like to ping this patch:
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.
>> gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.html&data=
>> 0
>> 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da8
>> 1
>> 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C63796470932569195
>> 1 
>> %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6
>> I 
>> k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=K%2BMx6jelnED3n%2Be2d
>> T
>> DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3D&reserved=0
>>
>> Thanks,
>>
>> Eugene
>>
>> -Original Message-
>> From: Gcc-patches
>>  On Behalf Of 
>> Eugene Rozenfeld via Gcc-patches
>> Sent: Thursday, June 02, 2022 12:22 AM
>> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan 
>> Hubicka 
>> Subject: [EXTERNAL] [PATCH] Add instruction level discriminator support.
>&

[PATCH] Guard against applying scale with 0 denominator

2022-05-06 Thread Eugene Rozenfeld via Gcc-patches

Calling count.apply_scale with a 0 denominator causes an assert.
This change guards against that.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
* tree-loop-vect-manip.cc (vect_do_peeling): Guard against applying 
scale with 0 denominator.
---
 gcc/tree-vect-loop-manip.cc | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 1d4337eb261..db54ae69e45 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -2989,10 +2989,11 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
 get lost if we scale down to 0.  */
  basic_block *bbs = get_loop_body (epilog);
  for (unsigned int i = 0; i < epilog->num_nodes; i++)
-   bbs[i]->count = bbs[i]->count.apply_scale
-(bbs[i]->count,
- bbs[i]->count.apply_probability
-   (prob_vector));
+   if (bbs[i]->count.nonzero_p ())
+ bbs[i]->count = bbs[i]->count.apply_scale
+  (bbs[i]->count,
+   bbs[i]->count.apply_probability
+ (prob_vector));
  free (bbs);
}

--
2.25.1

RE: [EXTERNAL] Re: [PATCH] Guard against applying scale with 0 denominator

2022-05-11 Thread Eugene Rozenfeld via Gcc-patches

In my case this is not exactly what the FIXME in the comment above says. The 
count is 0 even before
the initial scaling happens. I hit this case with some changes I'm working on 
to enable per-instruction discriminators for AutoFDO.

I looked into saving/restoring the old counts but it would be awkward to do. 
The initial scaling happens here:

if (skip_vector)
{
  split_edge (loop_preheader_edge (loop));

  /* Due to the order in which we peel prolog and epilog, we first
 propagate probability to the whole loop.  The purpose is to
 avoid adjusting probabilities of both prolog and vector loops
 separately.  Note in this case, the probability of epilog loop
 needs to be scaled back later.  */
  basic_block bb_before_loop = loop_preheader_edge (loop)->src;
  if (prob_vector.initialized_p ())
{
  scale_bbs_frequencies (&bb_before_loop, 1, prob_vector);
  scale_loop_profile (loop, prob_vector, 0);
}
}

The scaling happens before we do the cloning for the epilog so to save/restore 
the counts
we would need to maintain a mapping between the original basic blocks and the 
cloned basic blocks in the epilog.

I'd like to get my simple fix in since it makes things better even if it 
doesn't address the issue mentioned
In the FIXME.

-Original Message-
From: Richard Biener  
Sent: Monday, May 09, 2022 12:42 AM
To: Eugene Rozenfeld ; Jan Hubicka 

Cc: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: [PATCH] Guard against applying scale with 0 denominator

On Fri, May 6, 2022 at 10:32 PM Eugene Rozenfeld via Gcc-patches 
 wrote:
>
> Calling count.apply_scale with a 0 denominator causes an assert.
> This change guards against that.
>
> Tested on x86_64-pc-linux-gnu.
>
> gcc/ChangeLog:
> * tree-loop-vect-manip.cc (vect_do_peeling): Guard against applying 
> scale with 0 denominator.
> ---
>  gcc/tree-vect-loop-manip.cc | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc 
> index 1d4337eb261..db54ae69e45 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -2989,10 +2989,11 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
> niters, tree nitersm1,
>  get lost if we scale down to 0.  */
>   basic_block *bbs = get_loop_body (epilog);
>   for (unsigned int i = 0; i < epilog->num_nodes; i++)
> -   bbs[i]->count = bbs[i]->count.apply_scale
> -(bbs[i]->count,
> - bbs[i]->count.apply_probability
> -   (prob_vector));
> +   if (bbs[i]->count.nonzero_p ())
> + bbs[i]->count = bbs[i]->count.apply_scale
> +  (bbs[i]->count,
> +   bbs[i]->count.apply_probability
> + (prob_vector));

So exactly what the FIXME in the comment above says happens.   It
might be better
to save/restore the old counts if the intent is to get them back.  I'm not 
exactly sure where the other scaling happens though.

Richard.

>   free (bbs);
> }
>
> --
> 2.25.1

RE: [EXTERNAL] Re: [PATCH] Guard against applying scale with 0 denominator

2022-05-20 Thread Eugene Rozenfeld via Gcc-patches

Thank you for the feedback Richard. I attached a patch that saves/restores 
counts if the epilog doesn't use a scalar loop.

Eugene

-Original Message-
From: Richard Biener  
Sent: Thursday, May 12, 2022 12:34 AM
To: Eugene Rozenfeld 
Cc: Jan Hubicka ; gcc-patches@gcc.gnu.org
Subject: Re: [EXTERNAL] Re: [PATCH] Guard against applying scale with 0 
denominator

On Thu, May 12, 2022 at 3:37 AM Eugene Rozenfeld 
 wrote:
>
> In my case this is not exactly what the FIXME in the comment above 
> says. The count is 0 even before the initial scaling happens. I hit this case 
> with some changes I'm working on to enable per-instruction discriminators for 
> AutoFDO.
>
> I looked into saving/restoring the old counts but it would be awkward to do. 
> The initial scaling happens here:
>
> if (skip_vector)
> {
>   split_edge (loop_preheader_edge (loop));
>
>   /* Due to the order in which we peel prolog and epilog, we first
>  propagate probability to the whole loop.  The purpose is to
>  avoid adjusting probabilities of both prolog and vector loops
>  separately.  Note in this case, the probability of epilog loop
>  needs to be scaled back later.  */
>   basic_block bb_before_loop = loop_preheader_edge (loop)->src;
>   if (prob_vector.initialized_p ())
> {
>   scale_bbs_frequencies (&bb_before_loop, 1, prob_vector);
>   scale_loop_profile (loop, prob_vector, 0);
> }
> }
>
> The scaling happens before we do the cloning for the epilog so to 
> save/restore the counts we would need to maintain a mapping between the 
> original basic blocks and the cloned basic blocks in the epilog.

There is one already - after the epilogue is copied you can use get_bb_original 
(epilouge_bb) to get at the block it was copied from.
It could also be that we can rely on the basic-block order to be preserved 
between the copies (I _think_ that would work ... but then assert () for this 
using get_bb_original ()).  That means the simplest fix would be to have an 
auto_vec and initialize that from the BB counts in loop body order (we 
also have exactly two BBs in all peeled loops ...

But note we only scaled the scalar if-converted loop but eventually used the 
not if-coverted copy for the epilogue (if not vectorizing the epilogue), so 
indiscriminate scaling back looks wrong in some cases (I'd have to double-check 
the details here).

> I'd like to get my simple fix in since it makes things better even if 
> it doesn't address the issue mentioned In the FIXME.

But don't you need to check that
bbs[i]->count.apply_probability (prob_vector) is not zero instead of checking 
that bbs[i].count is not zero?

Richard.

> -Original Message-
> From: Richard Biener 
> Sent: Monday, May 09, 2022 12:42 AM
> To: Eugene Rozenfeld ; Jan Hubicka 
> 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [EXTERNAL] Re: [PATCH] Guard against applying scale with 0 
> denominator
>
> On Fri, May 6, 2022 at 10:32 PM Eugene Rozenfeld via Gcc-patches 
>  wrote:
> >
> > Calling count.apply_scale with a 0 denominator causes an assert.
> > This change guards against that.
> >
> > Tested on x86_64-pc-linux-gnu.
> >
> > gcc/ChangeLog:
> > * tree-loop-vect-manip.cc (vect_do_peeling): Guard against applying 
> > scale with 0 denominator.
> > ---
> >  gcc/tree-vect-loop-manip.cc | 9 +
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop-manip.cc 
> > b/gcc/tree-vect-loop-manip.cc index 1d4337eb261..db54ae69e45 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -2989,10 +2989,11 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
> > niters, tree nitersm1,
> >  get lost if we scale down to 0.  */
> >   basic_block *bbs = get_loop_body (epilog);
> >   for (unsigned int i = 0; i < epilog->num_nodes; i++)
> > -   bbs[i]->count = bbs[i]->count.apply_scale
> > -(bbs[i]->count,
> > - bbs[i]->count.apply_probability
> > -   (prob_vector));
> > +   if (bbs[i]->count.nonzero_p ())
> > + bbs[i]->count = bbs[i]->count.apply_scale
> > +  (bbs[i]->count,
> > +   bbs[i]->count.apply_probability
> > + (prob_vector));
>
> So exactly what the FIXME in the comment above says happens.   It
> might be better
> to save/restore the old counts if the intent is to get them back.  I'm not 
> exactly sure where the other scaling happens though.
>
> Richard.
>
>
>
> >   free (bbs);
> > }
> >
> > --
> > 2.25.1


0001-Fix-profile-count-maintenance-in-vectorizer-peeling.patch
Description: 0001-Fix-profile-count-maintenance-in-vectorizer-peeling.patch

[PATCH] [tree-optimization] Optimize two patterns with three xors.

2020-11-11 Thread Eugene Rozenfeld via Gcc-patches

Simplify (a ^ b) & ((b ^ c) ^ a) --> (a ^ b) & ~c.

int f(int a, int b, int c)
{
return (a ^ b) & ((b ^ c) ^ a);
}

Code without the patch:
moveax,edx
xoreax,esi
xoreax,edi
xoredi,esi
andeax,edi
ret  

Code with the patch:
xoredi,esi
andn   eax,edx,edi
ret  

Simplify (a ^ b) | ((b ^ c) ^ a) --> (a ^ b) | c.
int g(int a, int b, int c)
{
return (a ^ b) | ((b ^ c) ^ a);
}

Code without the patch:
moveax,edx
xoreax,esi
xoreax,edi
xoredi,esi
or eax,edi
ret

Code with the patch:
xoredi,esi
moveax,edi
or eax,edx
ret

This fixes PR96671.

Tested on x86_64-pc-linux-gnu.



0001-Optimize-two-patterns-with-three-xors.patch
Description: 0001-Optimize-two-patterns-with-three-xors.patch

Re: [PATCH] [tree-optimization] Optimize two patterns with three xors.

2020-11-16 Thread Eugene Rozenfeld via Gcc-patches

Thank you for the review Richard!

I re-worked the patch based on your suggestions (attached).
I made the change to reuse the first bit_xor in both patterns and I added :s to 
the last xor in the first pattern.
For the second pattern I didn't add :s because I think the simplification is 
beneficial even if the second or third bit_xor has more than one use since we 
are simplifying them to just a single operand (@2). If that is incorrect, 
please explain why.

Eugene

-Original Message-
From: Richard Biener  
Sent: Monday, November 16, 2020 4:11 AM
To: Eugene Rozenfeld 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] [tree-optimization] Optimize two patterns with three xors.

On Thu, Nov 12, 2020 at 2:53 AM Eugene Rozenfeld via Gcc-patches 
 wrote:
>
> Simplify (a ^ b) & ((b ^ c) ^ a) --> (a ^ b) & ~c.
>
> int f(int a, int b, int c)
> {
> return (a ^ b) & ((b ^ c) ^ a);
> }
>
> Code without the patch:
> moveax,edx
> xoreax,esi
> xoreax,edi
> xoredi,esi
> andeax,edi
> ret
>
> Code with the patch:
> xoredi,esi
> andn   eax,edx,edi
> ret
>
> Simplify (a ^ b) | ((b ^ c) ^ a) --> (a ^ b) | c.
> int g(int a, int b, int c)
> {
> return (a ^ b) | ((b ^ c) ^ a);
> }
>
> Code without the patch:
> moveax,edx
> xoreax,esi
> xoreax,edi
> xoredi,esi
> or eax,edi
> ret
>
> Code with the patch:
> xoredi,esi
> moveax,edi
> or eax,edx
> ret
>
> This fixes PR96671.

+/* (a ^ b) & ((b ^ c) ^ a) --> (a ^ b) & ~c */ (simplify  (bit_and:c 
+(bit_xor:c @0 @1) (bit_xor:cs (bit_xor:c @1 @2) @0))  (bit_and (bit_xor 
+@0 @1) (bit_not @2)))

you should be able to re-use the first bit_xor by doing

+/* (a ^ b) & ((b ^ c) ^ a) --> (a ^ b) & ~c */ (simplify  (bit_and:c 
+(bit_xor:c@3 @0 @1) (bit_xor:cs (bit_xor:c @1 @2) @0))  (bit_and @3 
+(bit_not @2)))

For completeness the last (bit_xor:c @1 @2) should also have a :s

+/* (a ^ b) | ((b ^ c) ^ a) --> (a ^ b) | c */ (simplify  (bit_ior:c 
+(bit_xor:c @0 @1) (bit_xor:c (bit_xor:c @1 @2) @0))  (bit_ior (bit_xor 
+@0 @1) @2))

completely misses :s here and the same comment about the re-use of the existing 
xor applies.

Otherwise looks OK to me, sorry for the delay.

Thanks,
Richard.

> Tested on x86_64-pc-linux-gnu.
>


0001-Optimize-two-patterns-with-three-xors.patch
Description: 0001-Optimize-two-patterns-with-three-xors.patch

[PATCH 1/1] Fix a typo in an AutoFDO error string

2021-06-11 Thread Eugene Rozenfeld via Gcc-patches

Trivial change, committed to trunk.

[PATCH 1/1] Fix a typo in an AutoFDO error string

gcc/ChangeLog:

* auto-profile.c (read_profile): fix a typo in an error string
---
 gcc/auto-profile.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c
index 2a6d9a1fc24..a4601243dc9 100644
--- a/gcc/auto-profile.c
+++ b/gcc/auto-profile.c
@@ -939,7 +939,7 @@ read_profile (void)
   unsigned version = gcov_read_unsigned ();
   if (version != AUTO_PROFILE_VERSION)
 {
-  error ("AutoFDO profile version %u does match %u",
+  error ("AutoFDO profile version %u does not match %u",
 version, AUTO_PROFILE_VERSION);
   return;
 }
-- 
2.25.1

[PATCH] gcov-profile/71672 Fix indirect call inlining with AutoFDO

2021-07-30 Thread Eugene Rozenfeld via Gcc-patches

This patch has the following changes:

1. The main fix is in auto-profile.c: the histogram value for
   indirect calls was incorrectly set up. That is fixed now.

2. Several tests now have -fdump-ipa-afdo-optimized instead of -fdump-ipa-afdo
   in dg-options so that the expected output can be found.

3. I increased the number of iterations in several tests so that perf can have
   enough sampling events.

4. indir-call-prof-2.c has -fno-early-inlining but AutoFDO can't work without
   early inlining (it needs to match the inlining of the profiled binary).
   I changed profopt.exp to always pass -fearly-inlining for AutoFDO.
   With that the indirect call inlining in indir-call-prof-2.c happens in the 
early inliner
   so I changed the dg-final-use-autofdo.

5. create_gcov tool doesn't currently support dwarf 5 so I made a change in 
profopt.exp
   to pass -gdwarf-4 when compiling the binary to profile.

6. I updated the invocation of create_gcov in profopt.exp to pass 
-gcov_version=2.
   I recently made a change to create_gcov to support version 2:
   https://github.com/google/autofdo/pull/117

7. I removed useless -o perf.data from the invocation of gcc-auto-profile in
   target-supports.exp.

With these changes the tests checking indirect call inlining in gcc.dg and 
g++.dg
are passing.

gcc/ChangeLog:
PR gcov-profile/71672
* auto-profile.c (afdo_indirect_call): Fix the setup of the historgram 
value for indirect calls.

gcc/testsuite/ChangeLog:
PR gcov-profile/71672
* g++.dg/tree-prof/indir-call-prof.C: Fix options, increase the number 
of iterations.
* g++.dg/tree-prof/morefunc.C: Fix options, increase the number of 
iterations.
* g++.dg/tree-prof/reorder.C: Fix options, increase the number of 
iterations.
* gcc.dg/tree-prof/indir-call-prof-2.c: Fix options, fix 
dg-final-use-autofdo, increase the number of iterations.
* gcc.dg/tree-prof/indir-call-prof.c: Fix options.
* lib/profopt.exp: Pass gdwarf-4 when compiling binary to profile; pass 
-fearly-inlining when compiling with AutoFDO; pass -gcov_version=2 to 
create_gcov.
* lib/target-supports.exp: Remove unnecessary -o perf.data passed to 
gcc-auto-profile.
---
 gcc/auto-profile.c | 13 +
 gcc/testsuite/g++.dg/tree-prof/indir-call-prof.C   |  4 ++--
 gcc/testsuite/g++.dg/tree-prof/morefunc.C  |  7 ---
 gcc/testsuite/g++.dg/tree-prof/reorder.C   |  6 +++---
 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c |  8 
 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c   |  2 +-
 gcc/testsuite/lib/profopt.exp  |  6 +++---
 gcc/testsuite/lib/target-supports.exp  |  2 +-
 8 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c
index b23b82b2df4..4c1fc6b536b 100644
--- a/gcc/auto-profile.c
+++ b/gcc/auto-profile.c
@@ -1009,13 +1009,18 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const 
icall_target_map &map,

   histogram_value hist = gimple_alloc_histogram_value (
   cfun, HIST_TYPE_INDIR_CALL, stmt, callee);
-  hist->n_counters = 3;
+  hist->n_counters = 4;
   hist->hvalue.counters = XNEWVEC (gcov_type, hist->n_counters);
   gimple_add_histogram_value (cfun, stmt, hist);

-  hist->hvalue.counters[0] = direct_call->profile_id;
-  hist->hvalue.counters[1] = max_iter->second;
-  hist->hvalue.counters[2] = total;
+  // Total counter
+  hist->hvalue.counters[0] = total;
+  // Number of value/counter pairs
+  hist->hvalue.counters[1] = 1;
+  // Value
+  hist->hvalue.counters[2] = direct_call->profile_id;
+  // Counter
+  hist->hvalue.counters[3] = max_iter->second;

   if (!transform)
 return;
diff --git a/gcc/testsuite/g++.dg/tree-prof/indir-call-prof.C 
b/gcc/testsuite/g++.dg/tree-prof/indir-call-prof.C
index 3374744613e..b45417106d0 100644
--- a/gcc/testsuite/g++.dg/tree-prof/indir-call-prof.C
+++ b/gcc/testsuite/g++.dg/tree-prof/indir-call-prof.C
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -fdump-tree-optimized -fdump-ipa-profile-optimized 
-fdump-ipa-afdo" } */
+/* { dg-options "-O2 -fdump-tree-optimized -fdump-ipa-profile-optimized 
-fdump-ipa-afdo-optimized" } */

 struct A {
   A () {}
@@ -26,7 +26,7 @@ main (void)

   int i;

-  for (i = 0; i < 100; i++)
+  for (i = 0; i < 1000; i++)
 {
   p = (A *)wrap ((void *)&a);
   p->AA ();
diff --git a/gcc/testsuite/g++.dg/tree-prof/morefunc.C 
b/gcc/testsuite/g++.dg/tree-prof/morefunc.C
index 621d09aec5b..96e0073ca8f 100644
--- a/gcc/testsuite/g++.dg/tree-prof/morefunc.C
+++ b/gcc/testsuite/g++.dg/tree-prof/morefunc.C
@@ -1,4 +1,5 @@
-/* { dg-options "-O2 -fno-devirtualize --param=profile-func-internal-id=0 
-fdump-ipa-profile-optimized -fdump-ipa-afdo -Wno-attributes 
-Wno-coverage-mismatch -Wno-missing-profile" } */
+/* { dg-options "-O2 -fno-devirtualize --param=profile-func-internal-id=0 
-fdump-ipa-profile-optimized -fdump-ipa-afdo-optimized -Wno-attribu

RE: [EXTERNAL] Check that passes do not forget to define profile

2023-08-30 Thread Eugene Rozenfeld via Gcc-patches

Hi Jan,

These new checks are too strong for AutoFDO. For example, the edge 
probabilities are not guaranteed to be initialized (see 
afdo_calculate_branch_prob).
This currently breaks autoprofiledbootstrap build.

I suggest removing
cfun->cfg->full_profile = true;
from auto-profile.cc.

Eugene

-Original Message-
From: Gcc-patches  On 
Behalf Of Jan Hubicka via Gcc-patches
Sent: Thursday, August 24, 2023 6:15 AM
To: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Check that passes do not forget to define profile

Hi,
this patch extends verifier to check that all probabilities and counts are 
initialized if profile is supposed to be present.  This is a bit complicated by 
the posibility that we inline !flag_guess_branch_probability function into 
function with profile defined and in this case we need to stop verification.  
For this reason I added flag to cfg structure tracking this.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

* cfg.h (struct control_flow_graph): New field full_profile.
* auto-profile.cc (afdo_annotate_cfg): Set full_profile to true.
* cfg.cc (init_flow): Set full_profile to false.
* graphite.cc (graphite_transform_loops): Set full_profile to false.
* lto-streamer-in.cc (input_cfg): Initialize full_profile flag.
* predict.cc (pass_profile::execute): Set full_profile to true.
* symtab-thunks.cc (expand_thunk): Set full_profile to true.
* tree-cfg.cc (gimple_verify_flow_info): Verify that profile is full
if full_profile is set.
* tree-inline.cc (initialize_cfun): Initialize full_profile.
(expand_call_inline): Combine full_profile.


diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index 
e3af3555e75..ff3b763945c 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1578,6 +1578,7 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts)
 }
   update_max_bb_count ();
   profile_status_for_fn (cfun) = PROFILE_READ;
+  cfun->cfg->full_profile = true;
   if (flag_value_profile_transformations)
 {
   gimple_value_profile_transformations (); diff --git a/gcc/cfg.cc 
b/gcc/cfg.cc index 9eb9916f61a..b7865f14e7f 100644
--- a/gcc/cfg.cc
+++ b/gcc/cfg.cc
@@ -81,6 +81,7 @@ init_flow (struct function *the_fun)
 = ENTRY_BLOCK_PTR_FOR_FN (the_fun);
   the_fun->cfg->edge_flags_allocated = EDGE_ALL_FLAGS;
   the_fun->cfg->bb_flags_allocated = BB_ALL_FLAGS;
+  the_fun->cfg->full_profile = false;
 }
 

 /* Helper function for remove_edge and free_cffg.  Frees edge structure diff 
--git a/gcc/cfg.h b/gcc/cfg.h index a0e944979c8..53e2553012c 100644
--- a/gcc/cfg.h
+++ b/gcc/cfg.h
@@ -78,6 +78,9 @@ struct GTY(()) control_flow_graph {
   /* Dynamically allocated edge/bb flags.  */
   int edge_flags_allocated;
   int bb_flags_allocated;
+
+  /* Set if the profile is computed on every edge and basic block.  */  
+ bool full_profile;
 };
 
 
diff --git a/gcc/graphite.cc b/gcc/graphite.cc index 19f8975ffa2..2b387d5b016 
100644
--- a/gcc/graphite.cc
+++ b/gcc/graphite.cc
@@ -512,6 +512,8 @@ graphite_transform_loops (void)
 
   if (changed)
 {
+  /* FIXME: Graphite does not update profile meaningfully currently.  */
+  cfun->cfg->full_profile = false;
   cleanup_tree_cfg ();
   profile_status_for_fn (cfun) = PROFILE_ABSENT;
   release_recorded_exits (cfun);
diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc index 
0cce14414ca..d3128fcebe4 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1030,6 +1030,7 @@ input_cfg (class lto_input_block *ib, class data_in 
*data_in,
   basic_block p_bb;
   unsigned int i;
   int index;
+  bool full_profile = false;
 
   init_empty_tree_cfg_for_function (fn);
 
@@ -1071,6 +1072,8 @@ input_cfg (class lto_input_block *ib, class data_in 
*data_in,
  data_in->location_cache.input_location_and_block (&e->goto_locus,
&bp, ib, data_in);
  e->probability = profile_probability::stream_in (ib);
+ if (!e->probability.initialized_p ())
+   full_profile = false;
 
}
 
@@ -1145,6 +1148,7 @@ input_cfg (class lto_input_block *ib, class data_in 
*data_in,
 
   /* Rebuild the loop tree.  */
   flow_loops_find (loops);
+  cfun->cfg->full_profile = full_profile;
 }
 
 
diff --git a/gcc/predict.cc b/gcc/predict.cc index 5a1a561cc24..396746cbfd1 
100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -4131,6 +4131,7 @@ pass_profile::execute (function *fun)
 scev_initialize ();
 
   tree_estimate_probability (false);
+  cfun->cfg->full_profile = true;
 
   if (nb_loops > 1)
 scev_finalize ();
diff --git a/gcc/symtab-thunks.cc b/gcc/symtab-thunks.cc index 
4c04235c41b..23ead0d2138 100644
--- a/gcc/symtab-thunks.cc
+++ b/gcc/symtab-thunks.cc
@@ -648,6 +648,7 @@ expand_thunk (cgraph_node *node, bool output_asm_thunks,
  ? PROFILE_READ : PROFILE_GUESSED;
   /* FIXME: C++ FE should stop setting TREE_AS

Re: Optimize combination of comparisons to dec+compare

2020-12-15 Thread Eugene Rozenfeld via Gcc-patches

Richard,

> Do we already handle x < y || x <= CST to x <= y - CST?

That is an invalid transformation: e.g., consider x=3, y=4, CST=2.
Can you please clarify?

Thanks,

Eugene

-Original Message-
From: Richard Biener  
Sent: Thursday, December 10, 2020 12:21 AM
To: Eugene Rozenfeld 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: Optimize combination of comparisons to dec+compare

On Thu, Dec 10, 2020 at 1:52 AM Eugene Rozenfeld via Gcc-patches 
 wrote:
>
> This patch adds a pattern for optimizing x < y || x == XXX_MIN to x <= 
> y-1 if y is an integer with TYPE_OVERFLOW_WRAPS.

Do we already handle x < y || x <= CST to x <= y - CST?
That is, the XXX_MIN case is just a special-case of generic anti-range testing? 
 For anti-range testing with signed types we pun to unsigned when possible.

> This fixes pr96674.
>
> Tested on x86_64-pc-linux-gnu.
>
> For this function
>
> bool f(unsigned a, unsigned b)
> {
> return (b == 0) | (a < b);
> }
>
> the code without the patch is
>
> test   esi,esi
> sete   al
> cmpesi,edi
> seta   dl
> or eax,edx
> ret
>
> the code with the patch is
>
> subesi,0x1
> cmpesi,edi
> setae  al
> ret
>
> Eugene
>
> gcc/
> PR tree-optimization/96674
> * match.pd: New pattern x < y || x == XXX_MIN --> x <= y - 1
>
> gcc/testsuite
> * gcc.dg/pr96674.c: New test.
>

RE: Optimize combination of comparisons to dec+compare

2020-12-22 Thread Eugene Rozenfeld via Gcc-patches

Re-sending my question and re-attaching the patch.

Richard, can you please clarify your feedback?

Thanks,

Eugene

-Original Message-
From: Gcc-patches  On Behalf Of Eugene 
Rozenfeld via Gcc-patches
Sent: Tuesday, December 15, 2020 2:06 PM
To: Richard Biener 
Cc: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: Optimize combination of comparisons to dec+compare

Richard,

> Do we already handle x < y || x <= CST to x <= y - CST?

That is an invalid transformation: e.g., consider x=3, y=4, CST=2.
Can you please clarify?

Thanks,

Eugene

-Original Message-
From: Richard Biener  
Sent: Thursday, December 10, 2020 12:21 AM
To: Eugene Rozenfeld 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: Optimize combination of comparisons to dec+compare

On Thu, Dec 10, 2020 at 1:52 AM Eugene Rozenfeld via Gcc-patches 
 wrote:
>
> This patch adds a pattern for optimizing x < y || x == XXX_MIN to x <= 
> y-1 if y is an integer with TYPE_OVERFLOW_WRAPS.

Do we already handle x < y || x <= CST to x <= y - CST?
That is, the XXX_MIN case is just a special-case of generic anti-range testing? 
 For anti-range testing with signed types we pun to unsigned when possible.

> This fixes pr96674.
>
> Tested on x86_64-pc-linux-gnu.
>
> For this function
>
> bool f(unsigned a, unsigned b)
> {
> return (b == 0) | (a < b);
> }
>
> the code without the patch is
>
> test   esi,esi
> sete   al
> cmpesi,edi
> seta   dl
> or eax,edx
> ret
>
> the code with the patch is
>
> subesi,0x1
> cmpesi,edi
> setae  al
> ret
>
> Eugene
>
> gcc/
> PR tree-optimization/96674
> * match.pd: New pattern x < y || x == XXX_MIN --> x <= y - 1
>
> gcc/testsuite
> * gcc.dg/pr96674.c: New test.
>


0001-Optimize-combination-of-comparisons-to-dec-compare.patch
Description: 0001-Optimize-combination-of-comparisons-to-dec-compare.patch

[PATCH][tree-optimization]Optimize combination of comparisons to dec+compare

2021-01-04 Thread Eugene Rozenfeld via Gcc-patches

Ping.

-Original Message-
From: Eugene Rozenfeld 
Sent: Tuesday, December 22, 2020 3:01 PM
To: Richard Biener ; gcc-patches@gcc.gnu.org
Subject: RE: Optimize combination of comparisons to dec+compare

Re-sending my question and re-attaching the patch.

Richard, can you please clarify your feedback?

Thanks,

Eugene

-Original Message-
From: Gcc-patches  On Behalf Of Eugene 
Rozenfeld via Gcc-patches
Sent: Tuesday, December 15, 2020 2:06 PM
To: Richard Biener 
Cc: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: Optimize combination of comparisons to dec+compare

Richard,

> Do we already handle x < y || x <= CST to x <= y - CST?

That is an invalid transformation: e.g., consider x=3, y=4, CST=2.
Can you please clarify?

Thanks,

Eugene

-Original Message-
From: Richard Biener  
Sent: Thursday, December 10, 2020 12:21 AM
To: Eugene Rozenfeld 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: Optimize combination of comparisons to dec+compare

On Thu, Dec 10, 2020 at 1:52 AM Eugene Rozenfeld via Gcc-patches 
 wrote:
>
> This patch adds a pattern for optimizing x < y || x == XXX_MIN to x <= 
> y-1 if y is an integer with TYPE_OVERFLOW_WRAPS.

Do we already handle x < y || x <= CST to x <= y - CST?
That is, the XXX_MIN case is just a special-case of generic anti-range testing? 
 For anti-range testing with signed types we pun to unsigned when possible.

> This fixes pr96674.
>
> Tested on x86_64-pc-linux-gnu.
>
> For this function
>
> bool f(unsigned a, unsigned b)
> {
> return (b == 0) | (a < b);
> }
>
> the code without the patch is
>
> test   esi,esi
> sete   al
> cmpesi,edi
> seta   dl
> or eax,edx
> ret
>
> the code with the patch is
>
> subesi,0x1
> cmpesi,edi
> setae  al
> ret
>
> Eugene
>
> gcc/
> PR tree-optimization/96674
> * match.pd: New pattern x < y || x == XXX_MIN --> x <= y - 1
>
> gcc/testsuite
> * gcc.dg/pr96674.c: New test.
>


0001-Optimize-combination-of-comparisons-to-dec-compare.patch
Description: 0001-Optimize-combination-of-comparisons-to-dec-compare.patch

RE: [EXTERNAL] Re: [PATCH][tree-optimization]Optimize combination of comparisons to dec+compare

2021-01-14 Thread Eugene Rozenfeld via Gcc-patches

I got more feedback for the patch from Gabriel Ravier and Jakub Jelinek in 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96674 and re-worked it accordingly.

The changes from the previous patch are:
1. Switched the tests to use __attribute__((noipa)) instead of 
__attribute__((noinline)) .
2. Fixed a type in the pattern comment.
3. Added :c for top-level bit_ior expression.
4. Added :s for the subexpressions.
5. Added a pattern for the negated expression:
x >= y && y != XXX_MIN --> x > y - 1
and the corresponding tests.

The new patch is attached.

Eugene

-Original Message-
From: Richard Biener  
Sent: Tuesday, January 5, 2021 4:21 AM
To: Eugene Rozenfeld 
Cc: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: [PATCH][tree-optimization]Optimize combination of 
comparisons to dec+compare

On Mon, Jan 4, 2021 at 9:50 PM Eugene Rozenfeld 
 wrote:
>
> Ping.
>
> -Original Message-
> From: Eugene Rozenfeld
> Sent: Tuesday, December 22, 2020 3:01 PM
> To: Richard Biener ; 
> gcc-patches@gcc.gnu.org
> Subject: RE: Optimize combination of comparisons to dec+compare
>
> Re-sending my question and re-attaching the patch.
>
> Richard, can you please clarify your feedback?

Hmm, OK.

The patch is OK.

Thanks,
Richard.


> Thanks,
>
> Eugene
>
> -Original Message-
> From: Gcc-patches  On Behalf Of 
> Eugene Rozenfeld via Gcc-patches
> Sent: Tuesday, December 15, 2020 2:06 PM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [EXTERNAL] Re: Optimize combination of comparisons to 
> dec+compare
>
> Richard,
>
> > Do we already handle x < y || x <= CST to x <= y - CST?
>
> That is an invalid transformation: e.g., consider x=3, y=4, CST=2.
> Can you please clarify?
>
> Thanks,
>
> Eugene
>
> -Original Message-
> From: Richard Biener 
> Sent: Thursday, December 10, 2020 12:21 AM
> To: Eugene Rozenfeld 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: Optimize combination of comparisons to dec+compare
>
> On Thu, Dec 10, 2020 at 1:52 AM Eugene Rozenfeld via Gcc-patches 
>  wrote:
> >
> > This patch adds a pattern for optimizing x < y || x == XXX_MIN to x 
> > <=
> > y-1 if y is an integer with TYPE_OVERFLOW_WRAPS.
>
> Do we already handle x < y || x <= CST to x <= y - CST?
> That is, the XXX_MIN case is just a special-case of generic anti-range 
> testing?  For anti-range testing with signed types we pun to unsigned when 
> possible.
>
> > This fixes pr96674.
> >
> > Tested on x86_64-pc-linux-gnu.
> >
> > For this function
> >
> > bool f(unsigned a, unsigned b)
> > {
> > return (b == 0) | (a < b);
> > }
> >
> > the code without the patch is
> >
> > test   esi,esi
> > sete   al
> > cmpesi,edi
> > seta   dl
> > or eax,edx
> > ret
> >
> > the code with the patch is
> >
> > subesi,0x1
> > cmpesi,edi
> > setae  al
> > ret
> >
> > Eugene
> >
> > gcc/
> > PR tree-optimization/96674
> > * match.pd: New pattern x < y || x == XXX_MIN --> x <= y - 1
> >
> > gcc/testsuite
> > * gcc.dg/pr96674.c: New test.
> >


0002-Optimize-combination-of-comparisons-to-dec-compare.patch
Description: 0002-Optimize-combination-of-comparisons-to-dec-compare.patch

RE: [EXTERNAL] Re: [PATCH][tree-optimization]Optimize combination of comparisons to dec+compare

2021-01-19 Thread Eugene Rozenfeld via Gcc-patches

Richard,

Can you please commit this patch for me? I don't have write access yet, I'm 
still working on getting copyright assignment/disclaimer signed by my employer.

Thanks,

Eugene

-Original Message-
From: Richard Biener  
Sent: Friday, January 15, 2021 3:55 AM
To: Eugene Rozenfeld 
Cc: gabrav...@gmail.com; ja...@gcc.gnu.org; gcc-patches@gcc.gnu.org
Subject: Re: [EXTERNAL] Re: [PATCH][tree-optimization]Optimize combination of 
comparisons to dec+compare

On Thu, Jan 14, 2021 at 10:04 PM Eugene Rozenfeld 
 wrote:
>
> I got more feedback for the patch from Gabriel Ravier and Jakub Jelinek in 
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fbugzilla%2Fshow_bug.cgi%3Fid%3D96674&data=04%7C01%7CEugene.Rozenfeld%40microsoft.com%7Cf4b1e41de6b4469fd3bb08d8b94c5a20%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637463084866262315%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=%2Fc77uS%2BNlNGmXwOmK729BByEW0VDiq1HEe8BA7DpI30%3D&reserved=0
>  and re-worked it accordingly.
>
> The changes from the previous patch are:
> 1. Switched the tests to use __attribute__((noipa)) instead of 
> __attribute__((noinline)) .
> 2. Fixed a type in the pattern comment.
> 3. Added :c for top-level bit_ior expression.
> 4. Added :s for the subexpressions.
> 5. Added a pattern for the negated expression:
> x >= y && y != XXX_MIN --> x > y - 1
> and the corresponding tests.
>
> The new patch is attached.

OK.

Thanks,
Richard.

> Eugene
>
> -----Original Message-
> From: Richard Biener 
> Sent: Tuesday, January 5, 2021 4:21 AM
> To: Eugene Rozenfeld 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [EXTERNAL] Re: [PATCH][tree-optimization]Optimize combination 
> of comparisons to dec+compare
>
> On Mon, Jan 4, 2021 at 9:50 PM Eugene Rozenfeld 
>  wrote:
> >
> > Ping.
> >
> > -Original Message-
> > From: Eugene Rozenfeld
> > Sent: Tuesday, December 22, 2020 3:01 PM
> > To: Richard Biener ; 
> > gcc-patches@gcc.gnu.org
> > Subject: RE: Optimize combination of comparisons to dec+compare
> >
> > Re-sending my question and re-attaching the patch.
> >
> > Richard, can you please clarify your feedback?
>
> Hmm, OK.
>
> The patch is OK.
>
> Thanks,
> Richard.
>
>
> > Thanks,
> >
> > Eugene
> >
> > -Original Message-
> > From: Gcc-patches  On Behalf Of 
> > Eugene Rozenfeld via Gcc-patches
> > Sent: Tuesday, December 15, 2020 2:06 PM
> > To: Richard Biener 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: [EXTERNAL] Re: Optimize combination of comparisons to
> > dec+compare
> >
> > Richard,
> >
> > > Do we already handle x < y || x <= CST to x <= y - CST?
> >
> > That is an invalid transformation: e.g., consider x=3, y=4, CST=2.
> > Can you please clarify?
> >
> > Thanks,
> >
> > Eugene
> >
> > -Original Message-
> > From: Richard Biener 
> > Sent: Thursday, December 10, 2020 12:21 AM
> > To: Eugene Rozenfeld 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: Optimize combination of comparisons to dec+compare
> >
> > On Thu, Dec 10, 2020 at 1:52 AM Eugene Rozenfeld via Gcc-patches 
> >  wrote:
> > >
> > > This patch adds a pattern for optimizing x < y || x == XXX_MIN to 
> > > x <=
> > > y-1 if y is an integer with TYPE_OVERFLOW_WRAPS.
> >
> > Do we already handle x < y || x <= CST to x <= y - CST?
> > That is, the XXX_MIN case is just a special-case of generic anti-range 
> > testing?  For anti-range testing with signed types we pun to unsigned when 
> > possible.
> >
> > > This fixes pr96674.
> > >
> > > Tested on x86_64-pc-linux-gnu.
> > >
> > > For this function
> > >
> > > bool f(unsigned a, unsigned b)
> > > {
> > > return (b == 0) | (a < b);
> > > }
> > >
> > > the code without the patch is
> > >
> > > test   esi,esi
> > > sete   al
> > > cmpesi,edi
> > > seta   dl
> > > or eax,edx
> > > ret
> > >
> > > the code with the patch is
> > >
> > > subesi,0x1
> > > cmpesi,edi
> > > setae  al
> > > ret
> > >
> > > Eugene
> > >
> > > gcc/
> > > PR tree-optimization/96674
> > > * match.pd: New pattern x < y || x == XXX_MIN --> x <= y - 1
> > >
> > > gcc/testsuite
> > > * gcc.dg/pr96674.c: New test.
> > >

RE: [EXTERNAL] Re: [PATCH] [tree-optimization] Optimize two patterns with three xors.

2020-11-19 Thread Eugene Rozenfeld via Gcc-patches

Thank you for installing my patch Jeff!

Yes, I intend to contribute regularly. I'm working on getting copyright 
assignment/disclaimer paperwork approved by my employer. I'll apply for commit 
privs after that.

Eugene

-Original Message-
From: Jeff Law  
Sent: Wednesday, November 18, 2020 11:33 AM
To: Richard Biener ; Eugene Rozenfeld 

Cc: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: [PATCH] [tree-optimization] Optimize two patterns with 
three xors.



On 11/17/20 12:57 AM, Richard Biener via Gcc-patches wrote:
> On Tue, Nov 17, 2020 at 3:19 AM Eugene Rozenfeld 
>  wrote:
>> Thank you for the review Richard!
>>
>> I re-worked the patch based on your suggestions (attached).
>> I made the change to reuse the first bit_xor in both patterns and I added :s 
>> to the last xor in the first pattern.
>> For the second pattern I didn't add :s because I think the simplification is 
>> beneficial even if the second or third bit_xor has more than one use since 
>> we are simplifying them to just a single operand (@2). If that is incorrect, 
>> please explain why.
> Ah, true, that's correct.
>
> The patch is OK.
I've installed this on the trunk.

Eugene, if you're going to contribute regularly you should probably go ahead 
and get commit privs so that you can commit ACK's patches yourself.   There 
should be a link to a form from this page:

https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fgitwrite.html&data=04%7C01%7Ceugene.rozenfeld%40microsoft.com%7Ca31f5335968749cdfe1708d88bf8bfb2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637413247775369684%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=BfQGAtU%2F8IJ8%2BRcvuNI8qKWgnC9oPFtWQhXo1RzlBTU%3D&reserved=0


Jeff

[PATCH] [tree-optimization] Optimize or+and+or pattern to and+or

2020-11-22 Thread Eugene Rozenfeld via Gcc-patches

Simplify
((b | c) & a) | b
to 
(a & c) | b.

This fixes PR96679.

Tested on x86_64-pc-linux-gnu.

int f(int a, int b, int c)
{
return ((b | c) & a) | b;
}

Code without the patch:
or edx,esi
andedx,edi
moveax,edx
or eax,esi
ret

Code with the patch:
andedi,edx
moveax,edi
or eax,esi
ret

Eugene


0001-Optimize-or-and-or-pattern-to-and-or.patch
Description: 0001-Optimize-or-and-or-pattern-to-and-or.patch

[PATCH] [tree-optimization] Optimize max/min pattern with comparison

2020-11-25 Thread Eugene Rozenfeld via Gcc-patches

Make the following simplifications:
X <= MAX(X, Y) -> true
X > MAX(X, Y) -> false
X >= MIN(X, Y) -> true
X < MIN(X, Y) -> false

This fixes PR96708.

Tested on x86_64-pc-linux-gnu.

bool f(int a, int b)
{
int tmp = (a < b) ? b : a;
return tmp >= a;
}

Code without the patch:

vmovd  xmm0,edi
vmovd  xmm1,esi
vpmaxsd xmm0,xmm0,xmm1
vmovd  eax,xmm0
cmpeax,edi
setge  al
ret

Code with the patch:

moveax,0x1
ret

Eugene


0001-Optimize-max-pattern-with-comparison.patch
Description: 0001-Optimize-max-pattern-with-comparison.patch

Re: [PATCH] [tree-optimization] Optimize max/min pattern with comparison

2020-11-30 Thread Eugene Rozenfeld via Gcc-patches

Thank you for the review Jeff.

I don't need to look at the opcode to know the result. The pattern will be 
matched only in these 4 cases:

X <= MAX(X, Y) -> true
X > MAX(X, Y) -> false
X >= MIN(X, Y) -> true
X < MIN(X, Y) -> false

So, the result will be true for GE_EXPR and LE_EXPR and false otherwise.

I added two test files: one for positive cases and one for negative cases. The 
updated patch is attached.

Thanks,

Eugene

-Original Message-
From: Jeff Law  
Sent: Monday, November 30, 2020 9:51 AM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] [tree-optimization] Optimize max/min pattern with 
comparison



On 11/25/20 3:04 PM, Eugene Rozenfeld via Gcc-patches wrote:
> Make the following simplifications:
> X <= MAX(X, Y) -> true
> X > MAX(X, Y) -> false
> X >= MIN(X, Y) -> true
> X < MIN(X, Y) -> false
> 
> This fixes PR96708.
> 
> Tested on x86_64-pc-linux-gnu.
>
> bool f(int a, int b)
> {
> int tmp = (a < b) ? b : a;
> return tmp >= a;
> }
>
> Code without the patch:
>
> vmovd  xmm0,edi
> vmovd  xmm1,esi
> vpmaxsd xmm0,xmm0,xmm1
> vmovd  eax,xmm0
> cmpeax,edi
> setge  al
> ret
>
> Code with the patch:
>
> moveax,0x1
> ret
>
> Eugene
>
> 0001-Optimize-max-pattern-with-comparison.patch
>
> From f6391c197b670b516238ac7707512c1358336520 Mon Sep 17 00:00:00 2001
> From: Eugene Rozenfeld 
> Date: Sat, 21 Nov 2020 01:08:50 -0800
> Subject: [PATCH] Optimize max pattern with comparison
>
> Make the following simplifications:
> X <= MAX(X, Y) -> true
> X > MAX(X, Y) -> false
> X >= MIN(X, Y) -> true
> X < MIN(X, Y) -> false
>
> This fixes PR96708.
>
> gcc/
> * match.pd : New patterns.
> ---
>  gcc/match.pd | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd index 
> cbb4bf0b32d..75237741946 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2851,6 +2851,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(cmp (minmax @0 INTEGER_CST@1) INTEGER_CST@2)
>(comb (cmp @0 @2) (cmp @1 @2
>  
> +/* X <= MAX(X, Y) -> true
> +   X > MAX(X, Y) -> false 
> +   X >= MIN(X, Y) -> true
> +   X < MIN(X, Y) -> false */
> +(for minmax (min min max max )
> + cmp(ge  lt  le  gt  )
> + (simplify
> +  (cmp @0 (minmax:c @0 @1))
> +  { constant_boolean_node (cmp == GE_EXPR || cmp == LE_EXPR, type); } 
> +))
Don't you need to look at the opcode (MIN vs MAX) vs CMP to know the result?Â  
I'd really like to see some tests for the testsuite.Â Â Â  In particular I'd 
like to see positive tests where we should apply the optimization and negative 
tests when we should not apply the optimization.

I also wonder if there's value in handling this in Ranger and/or DOM.Â Though 
I'd probably wait to see if fixing in match.pd is sufficient to cover the cases 
I'm thinking of in Ranger & DOM.

Jeff




0001-Optimize-min-and-max-patterns-with-comparison.patch
Description: 0001-Optimize-min-and-max-patterns-with-comparison.patch

Optimize combination of comparisons to dec+compare

2020-12-09 Thread Eugene Rozenfeld via Gcc-patches

This patch adds a pattern for optimizing 
x < y || x == XXX_MIN to x <= y-1
if y is an integer with TYPE_OVERFLOW_WRAPS.

This fixes pr96674.

Tested on x86_64-pc-linux-gnu.

For this function

bool f(unsigned a, unsigned b)
{
return (b == 0) | (a < b);
}

the code without the patch is

test   esi,esi
sete   al
cmpesi,edi
seta   dl
or eax,edx
ret

the code with the patch is

subesi,0x1
cmpesi,edi
setae  al
ret

Eugene

gcc/
PR tree-optimization/96674
* match.pd: New pattern x < y || x == XXX_MIN --> x <= y - 1

gcc/testsuite
* gcc.dg/pr96674.c: New test.



0001-Optimize-combination-of-comparisons-to-dec-compare.patch
Description: 0001-Optimize-combination-of-comparisons-to-dec-compare.patch

[PATCH] [tree-optimization] Fix for PR97223

2020-10-23 Thread Eugene Rozenfeld via Gcc-patches

This patch adds a pattern for folding 
x < (short) ((unsigned short)x + const)
to
 x <= SHORT_MAX - const
(and similarly for other integral types) if const is not 0.
as described in PR97223.

For example, without this patch the x86_64-pc-linux code generated for this 
function

bool f(char x)
{
return x < (char)(x + 12);
}

is

leaeax,[rdi+0xc]
cmpal,dil
setg   al
ret  

With the patch the code is 

cmpdil,0x73
setle  al
ret

Tested on x86_64-pc-linux.

Eugene



0001-Add-a-tree-optimization-described-in-PR97223.patch
Description: 0001-Add-a-tree-optimization-described-in-PR97223.patch

RE: [EXTERNAL] Re: [PATCH] [tree-optimization] Fix for PR97223

2020-10-29 Thread Eugene Rozenfeld via Gcc-patches

Thank you for the review Richard!

I re-worked the patch based on your suggestions. I combined the two patterns. 
Neither one requires a signedness check as long as the type of the 'add' has 
overflow wrap semantics.

I had to modify the regular expression in no-strict-overflow-4.c test. In that 
test the following function is compiled with -fno-strict-overflow :

int
foo (int i)
{
  return i + 1 > i;
}

We now optimize this function so that the tree-optimized dump has

;; Function foo (foo, funcdef_no=0, decl_uid=1931, cgraph_uid=1, symbol_order=0)

foo (int i)
{
  _Bool _1;
  int _3;

   [local count: 1073741824]:
  _1 = i_2(D) != 2147483647;
  _3 = (int) _1;
  return _3;
}

This is a correct optimization since -fno-strict-overflow implies -fwrapv.

Eugene

-Original Message-
From: Richard Biener  
Sent: Tuesday, October 27, 2020 2:23 AM
To: Eugene Rozenfeld 
Cc: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: [PATCH] [tree-optimization] Fix for PR97223

On Sat, Oct 24, 2020 at 2:20 AM Eugene Rozenfeld via Gcc-patches 
 wrote:
>
> This patch adds a pattern for folding
> x < (short) ((unsigned short)x + const) to
>  x <= SHORT_MAX - const
> (and similarly for other integral types) if const is not 0.
> as described in PR97223.
>
> For example, without this patch the x86_64-pc-linux code generated for 
> this function
>
> bool f(char x)
> {
> return x < (char)(x + 12);
> }
>
> is
>
> leaeax,[rdi+0xc]
> cmpal,dil
> setg   al
> ret
>
> With the patch the code is
>
> cmpdil,0x73
> setle  al
> ret
>
> Tested on x86_64-pc-linux.

+/* Similar to the previous pattern but with additional casts. */ (for 
+cmp (lt le ge gt)
+ out (gt gt le le)
+ (simplify
+  (cmp:c (convert@3 (plus@2 (convert@4 @0) INTEGER_CST@1)) @0)
+  (if (!TYPE_UNSIGNED (TREE_TYPE (@0))
+   && types_match (TREE_TYPE (@0), TREE_TYPE (@3))
+   && types_match (TREE_TYPE (@4), unsigned_type_for (TREE_TYPE (@0)))
+   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@4))
+   && wi::to_wide (@1) != 0
+   && single_use (@2))
+   (with { unsigned int prec = TYPE_PRECISION (TREE_TYPE (@0)); }
+(out @0 { wide_int_to_tree (TREE_TYPE (@0),
+   wi::max_value (prec, SIGNED)
+   - wi::to_wide (@1)); })

I think it's reasonable but the comment can be made more precise.
In particular I wonder why we require a signed comparison here while the 
previous pattern requires an unsigned comparison.  It might be an artifact and 
the restriction instead only applies to the plus?

Note that

+   && types_match (TREE_TYPE (@4), unsigned_type_for (TREE_TYPE 
+ (@0)))

unsigned_type_for should be avoided since it's quite expensive.  May I suggest

  && TYPE_UNSIGNED (TREE_TYPE (@4))
  && tree_nop_conversion_p (TREE_TYPE (@4), TREE_TYPE (@0))

instead?

I originally wondered if "but with additional casts" could be done in a single 
pattern via (convert? ...) uses but then I noticed the strange difference in 
the comparison signedness requirement ...

Richard.

> Eugene
>

0001-Add-a-tree-optimization-described-in-PR97223.patch
Description: 0001-Add-a-tree-optimization-described-in-PR97223.patch

[PATCH] [tree-optimization] Fix for PR96701

2020-10-29 Thread Eugene Rozenfeld via Gcc-patches

This patch adds a pattern for folding

x >> x

to

  0

as described in PR96701.


Without this patch the x86_64-pc-linux-gnu code generated for this function



int

foo (int i)

{

  return i >> i;

}



is



movecx,edi

saredi,cl

test   edi,edi

setne  al

ret



With the patch the code is


xoreax,eax
ret


Tested on x86_64-pc-linux-gnu.

Eugene


0001-Optimize-self-right-shift-to-0.patch
Description: 0001-Optimize-self-right-shift-to-0.patch

[PATCH] [tree-optimization] Fix for PR96701

2020-10-29 Thread Eugene Rozenfeld via Gcc-patches

This patch adds a pattern for folding 
x >> x
to
  0
as described in PR96701.

Without this patch the x86_64-pc-linux-gnu code generated for this function

int
foo (int i)
{
  return i >> i;
}

is

mov    ecx,edi
sar    edi,cl
test   edi,edi
setne  al
ret

With the patch the code is 

xor    eax,eax
ret  

Tested on x86_64-pc-linux-gnu.

Eugene


0001-Optimize-self-right-shift-to-0.patch
Description: 0001-Optimize-self-right-shift-to-0.patch

[PATCH][PUSHED] Fix cfg maintenance after inlining in AutoFDO

2023-05-08 Thread Eugene Rozenfeld via Gcc-patches

Todo from early_inliner needs to be propagated so that
cleanup_tree_cfg () is called if necessary.

This bug was causing an assert in get_loop_body during
ipa-sra in autoprofiledbootstrap build since loops weren't
fixed up and one of the loops had num_nodes set to 0.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:

* auto-profile.cc (auto_profile): Check todo from early_inline
to see if cleanup_tree_vfg needs to be called.
(early_inline): Return todo from early_inliner.
---
 gcc/auto-profile.cc | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 360c42c4b89..e3af3555e75 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1589,13 +1589,14 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts)
 
 /* Wrapper function to invoke early inliner.  */
 
-static void
+static unsigned int
 early_inline ()
 {
   compute_fn_summary (cgraph_node::get (current_function_decl), true);
-  unsigned todo = early_inliner (cfun);
+  unsigned int todo = early_inliner (cfun);
   if (todo & TODO_update_ssa_any)
 update_ssa (TODO_update_ssa);
+  return todo;
 }
 
 /* Use AutoFDO profile to annoate the control flow graph.
@@ -1651,20 +1652,22 @@ auto_profile (void)
function before annotation, so the profile inside bar@loc_foo2
will be useful.  */
 autofdo::stmt_set promoted_stmts;
+unsigned int todo = 0;
 for (int i = 0; i < 10; i++)
   {
-if (!flag_value_profile_transformations
-|| !autofdo::afdo_vpt_for_early_inline (&promoted_stmts))
-  break;
-early_inline ();
+   if (!flag_value_profile_transformations
+   || !autofdo::afdo_vpt_for_early_inline (&promoted_stmts))
+ break;
+   todo |= early_inline ();
   }
 
-early_inline ();
+todo |= early_inline ();
 autofdo::afdo_annotate_cfg (promoted_stmts);
 compute_function_frequency ();
 
 /* Local pure-const may imply need to fixup the cfg.  */
-if (execute_fixup_cfg () & TODO_cleanup_cfg)
+todo |= execute_fixup_cfg ();
+if (todo & TODO_cleanup_cfg)
   cleanup_tree_cfg ();
 
 free_dominance_info (CDI_DOMINATORS);
@@ -1674,7 +1677,7 @@ auto_profile (void)
 pop_cfun ();
   }
 
-  return TODO_rebuild_cgraph_edges;
+  return 0;
 }
 } /* namespace autofdo.  */
 
-- 
2.25.1

[PATCH] Add instruction level discriminator support.

2022-06-02 Thread Eugene Rozenfeld via Gcc-patches

This is the first in a series of patches to enable discriminator support
in AutoFDO.

This patch switches to tracking discriminators per statement/instruction
instead of per basic block. Tracking per basic block was problematic since
not all statements in a basic block needed a discriminator and, also, later
optimizations could move statements between basic blocks making correlation
during AutoFDO compilation unreliable. Tracking per statement also allows
us to assign different discriminators to multiple function calls in the same
basic block. A subsequent patch will add that support.

The idea of this patch is based on commit 
4c311d95cf6d9519c3c20f641cc77af7df491fdf
by Dehao Chen in vendors/google/heads/gcc-4_8 but uses a slightly different
approach. In Dehao's work special (normally unused) location ids and side tables
were used to keep track of locations with discriminators. Things have changed
since then and I don't think we have unused location ids anymore. Instead,
I made discriminators a part of ad-hoc locations.

The difference from Dehao's work also includes support for discriminator
reading/writing in lto streaming and in modules.

Tested on x86_64-pc-linux-gnu.


0001-Add-instruction-level-discriminator-support.patch
Description: 0001-Add-instruction-level-discriminator-support.patch

[PING][PATCH] Add instruction level discriminator support.

2022-06-10 Thread Eugene Rozenfeld via Gcc-patches

Hello,

I'd like to ping this patch: 
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596065.html

Thanks,

Eugene

-Original Message-
From: Gcc-patches  On 
Behalf Of Eugene Rozenfeld via Gcc-patches
Sent: Thursday, June 02, 2022 12:22 AM
To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan Hubicka 

Subject: [EXTERNAL] [PATCH] Add instruction level discriminator support.

This is the first in a series of patches to enable discriminator support in 
AutoFDO.

This patch switches to tracking discriminators per statement/instruction 
instead of per basic block. Tracking per basic block was problematic since not 
all statements in a basic block needed a discriminator and, also, later 
optimizations could move statements between basic blocks making correlation 
during AutoFDO compilation unreliable. Tracking per statement also allows us to 
assign different discriminators to multiple function calls in the same basic 
block. A subsequent patch will add that support.

The idea of this patch is based on commit 
4c311d95cf6d9519c3c20f641cc77af7df491fdf
by Dehao Chen in vendors/google/heads/gcc-4_8 but uses a slightly different 
approach. In Dehao's work special (normally unused) location ids and side 
tables were used to keep track of locations with discriminators. Things have 
changed since then and I don't think we have unused location ids anymore. 
Instead, I made discriminators a part of ad-hoc locations.

The difference from Dehao's work also includes support for discriminator 
reading/writing in lto streaming and in modules.

Tested on x86_64-pc-linux-gnu.


0001-Add-instruction-level-discriminator-support.patch
Description: 0001-Add-instruction-level-discriminator-support.patch

RE: [EXTERNAL] Re: [PATCH] Add instruction level discriminator support.

2022-06-12 Thread Eugene Rozenfeld via Gcc-patches

Thank you for the review Andi.

Yes, the corresponding reader change is

@@ -15671,8 +15673,9 @@ module_state::read_location (bytes_in &sec) const
if (range.m_start == UNKNOWN_LOCATION)
  range.m_start = locus;
range.m_finish = read_location (sec);
+   unsigned discriminator = sec.u ();
if (locus != loc && range.m_start != loc && range.m_finish != loc)
- locus = get_combined_adhoc_loc (line_table, locus, range, NULL);
+ locus = get_combined_adhoc_loc (line_table, locus, range, NULL, 
discriminator);
   }
   break;

-Original Message-
From: Andi Kleen  
Sent: Sunday, June 12, 2022 4:53 PM
To: Eugene Rozenfeld via Gcc-patches 
Cc: Jan Hubicka ; Eugene Rozenfeld 

Subject: [EXTERNAL] Re: [PATCH] Add instruction level discriminator support.

Eugene Rozenfeld via Gcc-patches  writes:
>  {
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index 
> d1dc73724d1..5ed6b7b0f94 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -15587,6 +15587,8 @@ module_state::write_location (bytes_out &sec, 
> location_t loc)
>   range.m_start = UNKNOWN_LOCATION;
>write_location (sec, range.m_start);
>write_location (sec, range.m_finish);
> +  unsigned discriminator = get_discriminator_from_adhoc_loc (line_table, 
> loc);
> +  sec.u (discriminator);

I hope this has a corresponding reader change, wasn't fully clear. Should it 
use some common function?

The patch looks good to me, but I cannot approve.

-Andi

RE: [PING][PATCH] Add instruction level discriminator support.

2022-06-27 Thread Eugene Rozenfeld via Gcc-patches

Another ping for 
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596065.html .

I got a review from Andi 
(https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596549.html) but I also 
need a review from someone who can approve the changes.

Thanks,

Eugene

-Original Message-
From: Eugene Rozenfeld 
Sent: Friday, June 10, 2022 12:03 PM
To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan Hubicka 

Subject: [PING][PATCH] Add instruction level discriminator support.

Hello,

I'd like to ping this patch: 
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596065.html

Thanks,

Eugene

-Original Message-
From: Gcc-patches  On 
Behalf Of Eugene Rozenfeld via Gcc-patches
Sent: Thursday, June 02, 2022 12:22 AM
To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan Hubicka 

Subject: [EXTERNAL] [PATCH] Add instruction level discriminator support.

This is the first in a series of patches to enable discriminator support in 
AutoFDO.

This patch switches to tracking discriminators per statement/instruction 
instead of per basic block. Tracking per basic block was problematic since not 
all statements in a basic block needed a discriminator and, also, later 
optimizations could move statements between basic blocks making correlation 
during AutoFDO compilation unreliable. Tracking per statement also allows us to 
assign different discriminators to multiple function calls in the same basic 
block. A subsequent patch will add that support.

The idea of this patch is based on commit 
4c311d95cf6d9519c3c20f641cc77af7df491fdf
by Dehao Chen in vendors/google/heads/gcc-4_8 but uses a slightly different 
approach. In Dehao's work special (normally unused) location ids and side 
tables were used to keep track of locations with discriminators. Things have 
changed since then and I don't think we have unused location ids anymore. 
Instead, I made discriminators a part of ad-hoc locations.

The difference from Dehao's work also includes support for discriminator 
reading/writing in lto streaming and in modules.

Tested on x86_64-pc-linux-gnu.


0001-Add-instruction-level-discriminator-support.patch
Description: 0001-Add-instruction-level-discriminator-support.patch

[COMMITTED] MAINTAINERS: Add myself for write after approval

2021-03-12 Thread Eugene Rozenfeld via Gcc-patches

ChangeLog:

2021-03-12  Eugene Rozenfeld  

* MAINTAINERS (Write After Approval): Add myself.


0001-MAINTAINERS-Add-myself-for-write-after-approval.patch
Description: 0001-MAINTAINERS-Add-myself-for-write-after-approval.patch

RE: [EXTERNAL] [PATCH] gcov: Use system IO buffering

2021-06-16 Thread Eugene Rozenfeld via Gcc-patches

The commit from this patch 
(https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=23eb66d1d46a34cb28c4acbdf8a1deb80a7c5a05)
 changed the semantics of gcov_read_string and gcov_write_string. Before this 
change the string storage was as described in gcov-io.h:

"Strings are
   padded with 1 to 4 NUL bytes, to bring the length up to a multiple
   of 4. The number of 4 bytes is stored, followed by the padded
   string."

After this change the number before the string indicates the number of bytes 
(not words) and there is no padding.

Was this file format change intentional? It breaks AutoFDO because create_gcov 
produces strings in the format specified in gcov-io.h.

Thanks,

Eugene

-Original Message-
From: Gcc-patches  On Behalf Of Martin Liška
Sent: Wednesday, April 21, 2021 12:52 AM
To: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] [PATCH] gcov: Use system IO buffering

Hey.

I/O buffering in gcov seems duplicite to what modern C library can provide.
The patch is a simplification and can provide easier interface for system that 
don't have a filesystem and would like using GCOV.

I'm going to install the patch after 11.1 if there are no objections.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Thanks,
Martin

gcc/ChangeLog:

* gcov-io.c (gcov_write_block): Remove.
(gcov_write_words): Likewise.
(gcov_read_words): Re-implement using gcov_read_bytes.
(gcov_allocate): Remove.
(GCOV_BLOCK_SIZE): Likewise.
(struct gcov_var): Remove most of the fields.
(gcov_position): Implement with ftell.
(gcov_rewrite): Remove setting of start and offset fields.
(from_file): Re-format.
(gcov_open): Remove setbuf call. It should not be needed.
(gcov_close): Remove internal buffer handling.
(gcov_magic): Use __builtin_bswap32.
(gcov_write_counter): Use directly gcov_write_unsigned.
(gcov_write_string): Use direct fwrite and do not round
to 4 bytes.
(gcov_seek): Use directly fseek.
(gcov_write_tag): Use gcov_write_unsigned directly.
(gcov_write_length): Likewise.
(gcov_write_tag_length): Likewise.
(gcov_read_bytes): Use directly fread.
(gcov_read_unsigned): Use gcov_read_words.
(gcov_read_counter): Likewise.
(gcov_read_string): Use gcov_read_bytes.
* gcov-io.h (GCOV_WORD_SIZE): Adjust to reflect
that size is not in bytes, but words (4B).
(GCOV_TAG_FUNCTION_LENGTH): Likewise.
(GCOV_TAG_ARCS_LENGTH): Likewise.
(GCOV_TAG_ARCS_NUM): Likewise.
(GCOV_TAG_COUNTER_LENGTH): Likewise.
(GCOV_TAG_COUNTER_NUM): Likewise.
(GCOV_TAG_SUMMARY_LENGTH): Likewise.

libgcc/ChangeLog:

* libgcov-driver.c: Fix GNU coding style.
---
 gcc/gcov-io.c   | 282 +---
 gcc/gcov-io.h   |  17 ++-
 libgcc/libgcov-driver.c |   6 +-
 3 files changed, 76 insertions(+), 229 deletions(-)

diff --git a/gcc/gcov-io.c b/gcc/gcov-io.c index 80c9082a649..bd2316dedab 100644
--- a/gcc/gcov-io.c
+++ b/gcc/gcov-io.c
@@ -27,40 +27,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 /* Routines declared in gcov-io.h.  This file should be #included by
another source file, after having #included gcov-io.h.  */
 
-#if !IN_GCOV
-static void gcov_write_block (unsigned); -static gcov_unsigned_t 
*gcov_write_words (unsigned); -#endif -static const gcov_unsigned_t 
*gcov_read_words (unsigned); -#if !IN_LIBGCOV -static void gcov_allocate 
(unsigned); -#endif
-
-/* Optimum number of gcov_unsigned_t's read from or written to disk.  */ 
-#define GCOV_BLOCK_SIZE (1 << 10)
+static gcov_unsigned_t *gcov_read_words (void *buffer, unsigned);
 
 struct gcov_var
 {
   FILE *file;
-  gcov_position_t start;   /* Position of first byte of block */
-  unsigned offset; /* Read/write position within the block.  */
-  unsigned length; /* Read limit in the block.  */
-  unsigned overread;   /* Number of words overread.  */
   int error;   /* < 0 overflow, > 0 disk error.  */
-  int mode;/* < 0 writing, > 0 reading */
+  int mode;/* < 0 writing, > 0 reading.  */
   int endian;  /* Swap endianness.  */
-#if IN_LIBGCOV
-  /* Holds one block plus 4 bytes, thus all coverage reads & writes
- fit within this buffer and we always can transfer GCOV_BLOCK_SIZE
- to and from the disk. libgcov never backtracks and only writes 4
- or 8 byte objects.  */
-  gcov_unsigned_t buffer[GCOV_BLOCK_SIZE + 1]; -#else
-  /* Holds a variable length block, as the compiler can write
- strings and needs to backtrack.  */
-  size_t alloc;
-  gcov_unsigned_t *buffer;
-#endif
 } gcov_var;
 
 /* Save the current position in the gcov file.  */ @@ -71,8 +45,7 @@ static 
inline  gcov_position_t  gcov_position (void)  {
-  gcov_nonruntime_assert (gcov_var.mode

RE: [EXTERNAL] [PATCH] gcov: Use system IO buffering

2021-06-17 Thread Eugene Rozenfeld via Gcc-patches

Thank you for your reply Martin!

AUTO_PROFILE_VERSION should also be changed. Then create_gcov can be updated to 
support both the old format and the new format.

Eugene

-Original Message-
From: Martin Liška  
Sent: Thursday, June 17, 2021 2:38 AM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org
Subject: Re: [EXTERNAL] [PATCH] gcov: Use system IO buffering

On 6/17/21 3:59 AM, Eugene Rozenfeld wrote:
> |The commit from this patch 
> (https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fgit%2F%3Fp%3Dgcc.git%3Ba%3Dcommit%3Bh%3D23eb66d1d46a34cb28c4acbdf8a1deb80a7c5a05&data=04%7C01%7CEugene.Rozenfeld%40microsoft.com%7C508d63026ea84be211cc08d9317395bb%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637595194782996821%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=UG%2B41tXMZ94%2Ff80qCnmq%2BtZsFkLXc9NrdWF8KXwPjnk%3D&reserved=0)
>  changed the semantics of gcov_read_string and gcov_write_string. Before this 
> change the string storage was as described in gcov-io.h: "Strings are padded 
> with 1 to 4 NUL bytes, to bring the length up to a multiple of 4. The number 
> of 4 bytes is stored, followed by the padded string." After this change the 
> number before the string indicates the number of bytes (not words) and there 
> is no padding. Was this file format change intentional?

Hello.

Thanks for heads up!

Yes, the change was intentional and I'm going to update documentation entry in 
gcov-io.h.

> It breaks AutoFDO because create_gcov produces strings in the format 
> specified in gcov-io.h. Thanks, Eugene

Sure, that needs to be adjusted.

Martin

RE: [PATCH][pushed] gcov: update documentation entry about string format

2021-06-21 Thread Eugene Rozenfeld via Gcc-patches

Thank you for updating the documentation Martin.

The following line can now be removed:

padding: | char:0 | char:0 char:0 | char:0 char:0 char:0

Eugene

-Original Message-
From: Gcc-patches  On 
Behalf Of Martin Liška
Sent: Thursday, June 17, 2021 2:40 AM
To: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] [PATCH][pushed] gcov: update documentation entry about 
string format

gcc/ChangeLog:

* gcov-io.h: Update documentation entry about string format.
---
  gcc/gcov-io.h | 7 +++
  1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h index f7584eb9679..ff92afe63df 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -42,15 +42,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
  
 Numbers are recorded in the 32 bit unsigned binary form of the
 endianness of the machine generating the file. 64 bit numbers are
-   stored as two 32 bit numbers, the low part first.  Strings are
-   padded with 1 to 4 NUL bytes, to bring the length up to a multiple
-   of 4. The number of 4 bytes is stored, followed by the padded
+   stored as two 32 bit numbers, the low part first.
+   The number of bytes is stored, followed by the
 string. Zero length and NULL strings are simply stored as a length
 of zero (they have no trailing NUL or padding).
  
int32:  byte3 byte2 byte1 byte0 | byte0 byte1 byte2 byte3
int64:  int32:low int32:high
-   string: int32:0 | int32:length char* char:0 padding
+   string: int32:0 | int32:length char* char:0
padding: | char:0 | char:0 char:0 | char:0 char:0 char:0
item: int32 | int64 | string
  
--
2.32.0

[PATCH] Update gen_autofdo_event.py and gcc-auto-profile

2021-07-01 Thread Eugene Rozenfeld via Gcc-patches

gen_autofdo_event.py was stumbling on models with stepping so
I updated the script to handle this case similar to the code in
https://github.com/andikleen/pmu-tools/blob/c6a5f63aede19def8886d6a8b74d7a55c38ca947/event_download.py

The second change was to tolerate cases when the CPU supports PEBS but the
perf command with /p fails. This can happen in, e.g., a virtual machine.

I regenerated gcc-auto-profile using the updated script.

contrib/ChangeLog:

* gen_autofdo_event.py: handle stepping, non-working PEBS

gcc/ChangeLog:

* config/i386/gcc-auto-profile: regenerate
---
 contrib/gen_autofdo_event.py | 54 ++--
 gcc/config/i386/gcc-auto-profile | 41 +++-
 2 files changed, 71 insertions(+), 24 deletions(-)

diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
index c97460c61c6..1eb6f1d6d85 100755
--- a/contrib/gen_autofdo_event.py
+++ b/contrib/gen_autofdo_event.py
@@ -46,20 +46,29 @@ args = ap.parse_args()
 
 eventmap = collections.defaultdict(list)
 
-def get_cpu_str():
-with open('/proc/cpuinfo', 'r') as c:
-vendor, fam, model = None, None, None
-for j in c:
-n = j.split()
-if n[0] == 'vendor_id':
-vendor = n[2]
-elif n[0] == 'model' and n[1] == ':':
-model = int(n[2])
-elif n[0] == 'cpu' and n[1] == 'family':
-fam = int(n[3])
-if vendor and fam and model:
-return "%s-%d-%X" % (vendor, fam, model), model
-return None, None
+def get_cpustr():
+cpuinfo = os.getenv("CPUINFO")
+if cpuinfo is None:
+cpuinfo = '/proc/cpuinfo'
+f = open(cpuinfo, 'r')
+cpu = [None, None, None, None]
+for j in f:
+n = j.split()
+if n[0] == 'vendor_id':
+cpu[0] = n[2]
+elif n[0] == 'model' and n[1] == ':':
+cpu[2] = int(n[2])
+elif n[0] == 'cpu' and n[1] == 'family':
+cpu[1] = int(n[3])
+elif n[0] == 'stepping' and n[1] == ':':
+cpu[3] = int(n[2])
+if all(v is not None for v in cpu):
+break
+# stepping for SKX only
+stepping = cpu[0] == "GenuineIntel" and cpu[1] == 6 and cpu[2] == 0x55
+if stepping:
+return "%s-%d-%X-%X" % tuple(cpu)
+return "%s-%d-%X" % tuple(cpu)[:3]
 
 def find_event(eventurl, model):
 print >>sys.stderr, "Downloading", eventurl
@@ -81,7 +90,7 @@ def find_event(eventurl, model):
 return found
 
 if not args.all:
-cpu, model = get_cpu_str()
+cpu = get_cpu_str()
 if not cpu:
 sys.exit("Unknown CPU type")
 
@@ -94,7 +103,8 @@ for j in u:
 n = j.rstrip().split(',')
 if len(n) >= 4 and (args.all or n[0] == cpu) and n[3] == "core":
 if args.all:
-vendor, fam, model = n[0].split("-")
+components = n[0].split("-")
+model = components[2]
 model = int(model, 16)
 cpufound += 1
 found += find_event(baseurl + n[2], model)
@@ -146,7 +156,17 @@ case `egrep -q "^cpu family\s*: 6" /proc/cpuinfo &&
 echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to 
update script."
exit 1 ;;'''
 print "esac"
-print 'exec perf record -e $E -b "$@"'
+print "set -x"
+print 'if ! perf record -e $E -b "$@" ; then'
+print '  # PEBS may not actually be working even if the processor supports 
it'
+print '  # (e.g., in a virtual machine). Trying to run without /p.'
+print '  set +x'
+print '  echo >&2 "Retrying without /p."'
+print '  E="$(echo "${E}" | sed -e \'s/\/p/\//\')"'
+print '  set -x'
+print '  exec perf record -e $E -b "$@"'
+print ' set +x'
+print 'fi'
 
 if cpufound == 0 and not args.all:
 sys.exit('CPU %s not found' % cpu)
diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
index 5da5c63cd84..56f64cbff1f 100755
--- a/gcc/config/i386/gcc-auto-profile
+++ b/gcc/config/i386/gcc-auto-profile
@@ -1,7 +1,7 @@
 #!/bin/sh
-# profile workload for gcc profile feedback (autofdo) using Linux perf
-# auto generated. to regenerate for new CPUs run
-# contrib/gen_autofdo_event.py --shell --all in gcc source
+# Profile workload for gcc profile feedback (autofdo) using Linux perf.
+# Auto generated. To regenerate for new CPUs run
+# contrib/gen_autofdo_event.py --script --all in gcc source
 
 # usages:
 # gcc-auto-profile program (profile program and children)
@@ -10,7 +10,7 @@
 # gcc-auto-profile --kernel -a sleep X (profile kernel)
 # gcc-auto-profile --all -a sleep X(profile kernel and user space)
 
-# identify branches taken event for CPU
+# Identify branches taken event for CPU.
 #
 
 FLAGS=u
@@ -37,7 +37,12 @@ case `egrep -q "^cpu family\s*: 6" /proc/cpuinfo &&
   egrep "^model\s*:" /proc/cpuinfo | head -n1` in
 model*:\ 55|\
 model*:\ 77|\
-model*:\ 76) E="cpu/event=0xC4,umask=0xFE/p$FLAGS" ;;
+model*:\ 76|\
+model*:\ 92|\
+

[PATCH] Fix count comparison in ipa-cp

2022-11-21 Thread Eugene Rozenfeld via Gcc-patches

The existing comparison was incorrect for non-PRECISE counts
(e.g., AFDO): we could end up with a 0 base_count, which could
lead to asserts, e.g., in good_cloning_opportunity_p.

gcc/ChangeLog:

* ipa-cp.cc (ipcp_propagate_stage): Fix profile count comparison.
---
 gcc/ipa-cp.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index d2bcd5e5e69..9df8b456759 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -4225,7 +4225,7 @@ ipcp_propagate_stage (class ipa_topo_info *topo)
for (cgraph_edge *cs = node->callees; cs; cs = cs->next_callee)
  {
profile_count count = cs->count.ipa ();
-   if (!(count > profile_count::zero ()))
+   if (!count.nonzero_p ())
  continue;
 
enum availability avail;
-- 
2.25.1

[PATCH] Fix autoprofiledbootstrap build

2022-11-21 Thread Eugene Rozenfeld via Gcc-patches

1. Fix gcov version
2. Don't attempt to create an autoprofile file for cc1 since cc1plus
(not cc1) is not invoked when building cc1
3. Fix documentation typo

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:

* c/Make-lang.in: Don't attempt to create an autoprofile file for cc1
* cp/Make-lang.in: Fix gcov version
* lto/Make-lang.in: Fix gcov version
* doc/install.texi: Fix documentation typo
---
 gcc/c/Make-lang.in   | 15 +--
 gcc/cp/Make-lang.in  |  2 +-
 gcc/doc/install.texi |  2 +-
 gcc/lto/Make-lang.in |  2 +-
 4 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/gcc/c/Make-lang.in b/gcc/c/Make-lang.in
index 9bd9c0ea123..ba33ec03bf0 100644
--- a/gcc/c/Make-lang.in
+++ b/gcc/c/Make-lang.in
@@ -62,12 +62,6 @@ c_OBJS = $(C_OBJS) cc1-checksum.o c/gccspec.o
 # Use strict warnings for this front end.
 c-warn = $(STRICT_WARN)
 
-ifeq ($(if $(wildcard ../stage_current),$(shell cat \
-  ../stage_current)),stageautofeedback)
-$(C_OBJS): ALL_COMPILERFLAGS += -fauto-profile=cc1.fda
-$(C_OBJS): cc1.fda
-endif
-
 # compute checksum over all object files and the options
 # re-use the checksum from the prev-final stage so it passes
 # the bootstrap comparison and allows comparing of the cc1 binary
@@ -88,9 +82,6 @@ cc1$(exeext): $(C_OBJS) cc1-checksum.o $(BACKEND) $(LIBDEPS)
  cc1-checksum.o $(BACKEND) $(LIBS) $(BACKENDLIBS)
@$(call LINK_PROGRESS,$(INDEX.c),end)
 
-cc1.fda: ../stage1-gcc/cc1$(exeext) ../prev-gcc/$(PERF_DATA)
-   $(CREATE_GCOV) -binary ../stage1-gcc/cc1$(exeext) -gcov cc1.fda 
-profile ../prev-gcc/$(PERF_DATA) -gcov_version 1
-
 #

 # Build hooks:
 
@@ -180,7 +171,6 @@ c.mostlyclean:
-rm -f cc1$(exeext)
-rm -f c/*$(objext)
-rm -f c/*$(coverageexts)
-   -rm -f cc1.fda
 c.clean:
 c.distclean:
-rm -f c/config.status c/Makefile
@@ -201,7 +191,4 @@ c.stageprofile: stageprofile-start
-mv c/*$(objext) stageprofile/c
 c.stagefeedback: stagefeedback-start
-mv c/*$(objext) stagefeedback/c
-c.autostageprofile: autostageprofile-start
-   -mv c/*$(objext) autostageprofile/c
-c.autostagefeedback: autostagefeedback-start
-   -mv c/*$(objext) autostagefeedback/c
+
diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in
index 291835d326e..49e5cd66912 100644
--- a/gcc/cp/Make-lang.in
+++ b/gcc/cp/Make-lang.in
@@ -178,7 +178,7 @@ endif
 cp/name-lookup.o: $(srcdir)/cp/std-name-hint.h
 
 cc1plus.fda: ../stage1-gcc/cc1plus$(exeext) ../prev-gcc/$(PERF_DATA)
-   $(CREATE_GCOV) -binary ../stage1-gcc/cc1plus$(exeext) -gcov cc1plus.fda 
-profile ../prev-gcc/$(PERF_DATA) -gcov_version 1
+   $(CREATE_GCOV) -binary ../stage1-gcc/cc1plus$(exeext) -gcov cc1plus.fda 
-profile ../prev-gcc/$(PERF_DATA) -gcov_version 2
 
 #

 # Build hooks:
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index c1876f24a84..61a483bc410 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3059,7 +3059,7 @@ It is recommended to only use GCC for this.
 
 On Linux/x86_64 hosts with some restrictions (no virtualization) it is
 also possible to do autofdo build with @samp{make
-autoprofiledback}. This uses Linux perf to sample branches in the
+autoprofiledbootstrap}. This uses Linux perf to sample branches in the
 binary and then rebuild it with feedback derived from the profile.
 Linux perf and the @code{autofdo} toolkit needs to be installed for
 this.
diff --git a/gcc/lto/Make-lang.in b/gcc/lto/Make-lang.in
index a2dcf0dfc12..3ee748489ac 100644
--- a/gcc/lto/Make-lang.in
+++ b/gcc/lto/Make-lang.in
@@ -106,7 +106,7 @@ $(LTO_DUMP_EXE): $(LTO_DUMP_OBJS) $(BACKEND) $(LIBDEPS) 
$(lto2.prev)
 lto/lto-dump.o: $(LTO_OBJS)
 
 lto1.fda: ../prev-gcc/lto1$(exeext) ../prev-gcc/$(PERF_DATA)
-   $(CREATE_GCOV) -binary ../prev-gcc/lto1$(exeext) -gcov lto1.fda 
-profile ../prev-gcc/$(PERF_DATA) -gcov_version 1
+   $(CREATE_GCOV) -binary ../prev-gcc/lto1$(exeext) -gcov lto1.fda 
-profile ../prev-gcc/$(PERF_DATA) -gcov_version 2
 
 # LTO testing is done as part of C/C++/Fortran etc. testing.
 check-lto:
-- 
2.25.1

RE: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build

2022-11-22 Thread Eugene Rozenfeld via Gcc-patches

I took another look at this. We actually collect perf data when building the 
libraries. So, we have ./prev-gcc/perf.data, ./prev-libcpp/perf.data, 
./prev-libiberty/perf.data, etc. But when creating gcov data for  
-fauto-profile build of cc1plus or cc1 we only use ./prev-gcc/perf.data . So, a 
better solution would be either having a single perf.data for all builds (gcc 
and libraries) or merging perf.data files before attempting autostagefeedback. 
What would you recommend?

Thanks,

Eugene

-Original Message-
From: Jeff Law  
Sent: Tuesday, November 22, 2022 12:01 PM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org; 
Andi Kleen 
Subject: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build

[You don't often get email from jeffreya...@gmail.com. Learn why this is 
important at https://aka.ms/LearnAboutSenderIdentification ]

On 11/21/22 14:57, Eugene Rozenfeld via Gcc-patches wrote:
> 1. Fix gcov version
> 2. Don't attempt to create an autoprofile file for cc1 since cc1plus 
> (not cc1) is not invoked when building cc1 3. Fix documentation typo
>
> Tested on x86_64-pc-linux-gnu.
>
> gcc/ChangeLog:
>
>   * c/Make-lang.in: Don't attempt to create an autoprofile file for cc1
>   * cp/Make-lang.in: Fix gcov version
>   * lto/Make-lang.in: Fix gcov version
>   * doc/install.texi: Fix documentation typo

Just to be 100% sure.  While the compiler is built with cc1plus, various 
runtime libraries are still build with the C compiler and thus would use cc1.  
AFAICT it looks like we don't try to build the runtime libraries to get any 
data about the behavior of the C compiler.  Can you confirm?


Assuming that's correct, this is fine for the trunk.


Thanks,

Jeff

RE: [EXTERNAL] Re: [PATCH] Fix count comparison in ipa-cp

2022-12-06 Thread Eugene Rozenfeld via Gcc-patches

I initially ran into this while reviving autoprofiledbootstrap build.

I was able to create a simple reliable test for this bug and captured it in 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108000

I also included the test case in the updated patch below.

Eugene

=

The existing comparison was incorrect for non-PRECISE counts
(e.g., AFDO): we could end up with a 0 base_count, which could
lead to asserts, e.g., in good_cloning_opportunity_p.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
PR ipa/108000
* ipa-cp.cc (ipcp_propagate_stage): Fix profile count comparison

gcc/testsuite
* gcc.dg/tree-prof/pr108000.c: Regression test
---
 gcc/ipa-cp.cc |  2 +-
 gcc/testsuite/gcc.dg/tree-prof/pr108000.c | 93 +++
 2 files changed, 94 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/pr108000.c

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index d5230c7c5e6..cc031ebed0f 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -4225,7 +4225,7 @@ ipcp_propagate_stage (class ipa_topo_info *topo)
for (cgraph_edge *cs = node->callees; cs; cs = cs->next_callee)
  {
profile_count count = cs->count.ipa ();
-   if (!(count > profile_count::zero ()))
+   if (!count.nonzero_p ())
  continue;
 
enum availability avail;
diff --git a/gcc/testsuite/gcc.dg/tree-prof/pr108000.c 
b/gcc/testsuite/gcc.dg/tree-prof/pr108000.c
new file mode 100644
index 000..c59ea799748
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/pr108000.c
@@ -0,0 +1,93 @@
+/* { dg-options "-O2" } */
+
+#include 
+
+volatile int flag;
+const int array_size = 10;
+int* array;
+int iterations = 1000;
+
+#define BAR(num) \
+int __attribute__((noinline)) \
+bar##num (int i, int j) \
+{ \
+  if (i == 0) \
+return 2*num - 1; \
+  else \
+return 2*num; \
+}
+
+BAR(1)
+BAR(2)
+BAR(3)
+BAR(4)
+BAR(5)
+BAR(6)
+BAR(7)
+BAR(8)
+BAR(9)
+BAR(10)
+BAR(11)
+BAR(12)
+BAR(13)
+BAR(14)
+BAR(15)
+BAR(16)
+BAR(17)
+BAR(18)
+BAR(19)
+
+int __attribute__((noinline))
+foo ()
+{
+  switch (flag)
+  {
+   case 1:
+   return bar1 (0, 0);
+   case 2:
+   return bar2 (0, 0);
+   case 3:
+   return bar3 (0, 0);
+   case 4:
+   return bar4 (0, 0);
+   case 5:
+   return bar5 (0, 0);
+   case 6:
+   return bar6 (0, 0);
+   case 7:
+   return bar7 (0, 0);
+   case 8:
+   return bar8 (0, 0);
+   case 9:
+   return bar9 (0, 0);
+   case 10:
+   return bar10 (0, 0);
+   case 11:
+   return bar11 (0, 0);
+   case 12:
+   return bar12 (0, 0);
+   case 13:
+   return bar13 (0, 0);
+   case 14:
+   return bar14 (0, 0);
+   case 15:
+   return bar15 (0, 0);
+   case 16:
+   return bar16 (0, 0);
+   case 17:
+   return bar17 (0, 0);
+   case 18:
+   return bar18 (0, 0);
+   default:
+   return bar19(0, 0);
+  }
+}
+
+int
+main ()
+{
+  flag = 0;
+  array = calloc(array_size, sizeof(int));
+  for (int i = 0, j = 0; i < iterations; ++i, j = (j + 1) % 10)
+array[j] = foo ();
+}
-- 
2.25.1

-Original Message-
From: Jeff Law  
Sent: Tuesday, November 22, 2022 12:03 PM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: [PATCH] Fix count comparison in ipa-cp

[You don't often get email from jeffreya...@gmail.com. Learn why this is 
important at https://aka.ms/LearnAboutSenderIdentification ]

On 11/21/22 14:26, Eugene Rozenfeld via Gcc-patches wrote:
> The existing comparison was incorrect for non-PRECISE counts (e.g., 
> AFDO): we could end up with a 0 base_count, which could lead to 
> asserts, e.g., in good_cloning_opportunity_p.
>
> gcc/ChangeLog:
>
>  * ipa-cp.cc (ipcp_propagate_stage): Fix profile count comparison.

OK.  Probably somewhat painful to pull together a reliable test for this, right?


Jeff

RE: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build

2023-03-27 Thread Eugene Rozenfeld via Gcc-patches

Ping for https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613974.html

Thanks,

Eugene

-Original Message-
From: Eugene Rozenfeld 
Sent: Tuesday, March 14, 2023 2:21 PM
To: Jeff Law ; gcc-patches@gcc.gnu.org; Andi Kleen 

Subject: RE: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build

Hi Jeff,

I revived profile_merger tool in http://github.com/google/autofdo and re-worked 
the patch to merge profiles for compiling the libraries.

Please take a look at the attached patch.

Thanks,

Eugene

-Original Message-
From: Jeff Law  
Sent: Tuesday, November 22, 2022 10:16 PM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org; 
Andi Kleen 
Subject: Re: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build

[You don't often get email from jeffreya...@gmail.com. Learn why this is 
important at https://aka.ms/LearnAboutSenderIdentification ]

On 11/22/22 14:20, Eugene Rozenfeld wrote:
> I took another look at this. We actually collect perf data when building the 
> libraries. So, we have ./prev-gcc/perf.data, ./prev-libcpp/perf.data, 
> ./prev-libiberty/perf.data, etc. But when creating gcov data for  
> -fauto-profile build of cc1plus or cc1 we only use ./prev-gcc/perf.data . So, 
> a better solution would be either having a single perf.data for all builds 
> (gcc and libraries) or merging perf.data files before attempting 
> autostagefeedback. What would you recommend?

ISTM that if neither approach loses data, then they're functionally equivalent 
-- meaning that we can select whichever is easier to wire into our build system.

A single perf.data might serialize the build.  So perhaps separate, then merge 
right before autostagefeedback.


But I'm willing to go with whatever you think is best.

Jeff

86 matches

Mail list logo