>
>
> > On 26 May 2025, at 5:34 pm, Jan Hubicka <[email protected]> wrote:
> >
> > Hi,
> > also, please, can you add a testcase? We should have some coverage for
> > auto-fdo specific issues....
> I was looking for this too. AFAIK we don't do any testing currently.
> We could
>
> 1. Add gcov files as part of the test. However, this would make updating gcov
> versions difficult.
> 2. We could add an execution test that also uses autofdo tools to generate .gcov.
> This would make them slow.
> Also we may not be able to match exact profile values and only see if afdo
> annotations are there.
There is testsuite coverage, but it is currently enabled only for Intel
based x86_64 CPUs and I think no-one runs it regularly. To get AutoFDO
into good shape we definitely need to enable it on more setups and also
start testing/benchmarking regularly.
For a long time I had no easy access to a CPU with AutoFDO support, but
now I have a zen3 based desktop and also use a zen5 based box for testing.
I think the attached patch makes the testsuite do the right thing on AMD Zens 3, 4
and 5.
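As a rough sketch of the CPU check the patch adds, the same logic can be written as a standalone function. The name check_afdo_cpu and the optional cpuinfo path argument are hypothetical and not part of the patch; they only make it possible to try the check against a saved copy of /proc/cpuinfo from another machine.

```shell
# Sketch only: check_afdo_cpu is a hypothetical helper, not part of the patch.
# It reads the given cpuinfo file (default /proc/cpuinfo), prints the vendor
# on success and returns non-zero when branch profiling is not expected to work.
check_afdo_cpu() {
    cpuinfo=${1:-/proc/cpuinfo}
    if grep -q AuthenticAMD "$cpuinfo" ; then
        # Zen 3 advertises "brs", Zen 4 and later advertise "amd_lbr_v2".
        if ! grep -q " brs" "$cpuinfo" && ! grep -q amd_lbr_v2 "$cpuinfo" ; then
            echo >&2 "AMD CPU with brs (Zen 3) or amd_lbr_v2 (Zen 4+) feature is required"
            return 1
        fi
        echo AMD
    elif grep -q Intel "$cpuinfo" ; then
        echo Intel
    else
        echo >&2 "Only AMD and Intel CPUs supported"
        return 1
    fi
}
```

Running check_afdo_cpu with no argument inspects the local machine; passing a file name inspects that file instead.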
I get the following failures on Zen5:
FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining add1/1 into main/4."
FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining sub1/2 into main/4."
FAIL: gcc.dg/tree-prof/inliner-1.c scan-tree-dump optimized "cold_function ..;"
FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 times"
while on Intel CPU I get:
FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining add1/1 into main/4."
FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining sub1/2 into main/4."
FAIL: gcc.dg/tree-prof/indir-call-prof.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/tree-prof/inliner-1.c scan-tree-dump optimized "cold_function ..;"
FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 times"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled likely exits: likely decreased number of iterations of loop 1"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled all exits: decreased number of iterations of loop 2"
FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-tree-dump-not optimized "Invalid sum"
I have not yet dug into where the differences come from.
Andy, does the patch make sense to you? I simply followed the kernel's
auto-fdo instructions for clang and built the current git version of
create_gcov. In the past I always had trouble getting create_gcov
working with the version of perf distributed by openSUSE, but this time it
seems to work even though it complains:
[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1322] Skipping 4228 bytes of metadata: HEADER_CPU_TOPOLOGY
[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_ID_INDEX
[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_EVENT_UPDATE
[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_CPU_MAP
[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_82
[INFO:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1060] Number of events stored: 2178
[INFO:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_parser.cc:272] Parser processed: 5 MMAP/MMAP2 events, 2 COMM events, 0 FORK events, 1 EXIT events, 2108 SAMPLE events, 2099 of these were mapped, 0 SAMPLE events with a data address, 0 of these were mapped
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20250525 22:10:18.478610 1692721 sample_reader.cc:289] No buildid found in binary
W20250525 22:10:18.479000 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=4
W20250525 22:10:18.479007 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1057->0 index=2
W20250525 22:10:18.479010 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=6
W20250525 22:10:18.479013 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=6
W20250525 22:10:18.479017 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1057->0 index=8
W20250525 22:10:18.479019 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=c
I20250525 22:10:18.479228 1692721 symbol_map.cc:477] Adding loadable exec segment: offset=1000 vaddr=401000
Has anyone run SPEC recently? I made an auto-FDO SPEC config and tested
-Ofast with ipa-icf, ipa-cp-clone and ipa-sra disabled (to get rid of
the clone merging). I get results roughly comparable to those without a
profile at all. This is actually not _that_ bad a start - it means that the
data is probably not completely bogus, just not very useful :)
(Without disabling ipa-cp, for example exchange regresses a lot since
all profile info of the hot clone is lost).
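For anyone wanting to reproduce this, the setup described above roughly corresponds to the compiler flags below. This is only a sketch: the variable names are hypothetical, and the profile name fbdata.afdo is an assumption (it is the name GCC looks for when -fauto-profile is given without a path), not necessarily what the SPEC config used.

```shell
# Assumed profile file name; GCC defaults to fbdata.afdo when
# -fauto-profile is given without an argument.
AFDO_PROFILE=fbdata.afdo

# -Ofast with the IPA passes that merge or create clones disabled,
# plus the AutoFDO profile to read.
AFDO_FLAGS="-Ofast -fno-ipa-icf -fno-ipa-cp-clone -fno-ipa-sra -fauto-profile=$AFDO_PROFILE"

echo "$AFDO_FLAGS"
```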
About the pre-ipa and post-ipa clone issues I think we may need to list
the names of clones that are created late and that we want to drop, and keep
the list up to date, perhaps inventing a clones.def file...
Honza
contrib/ChangeLog:
* gen_autofdo_event.py: Add support for AMD Zen 3 and
later CPUs.
gcc/ChangeLog:
* config/i386/gcc-auto-profile: Regenerate.
diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
index 4364e5ce072..b1d373f82fe 100755
--- a/contrib/gen_autofdo_event.py
+++ b/contrib/gen_autofdo_event.py
@@ -138,8 +138,16 @@ if [ "$1" = "--all" ] ; then
shift
fi
-if ! grep -q Intel /proc/cpuinfo ; then
- echo >&2 "Only Intel CPUs supported"
+if grep -q AuthenticAMD /proc/cpuinfo ; then
+ vendor=AMD
+ if ! grep -q " brs" /proc/cpuinfo && ! grep -q amd_lbr_v2 /proc/cpuinfo ; then
+ echo >&2 "AMD CPU with brs (Zen 3) or amd_lbr_v2 (Zen 4+) feature is required"
+ exit 1
+ fi
+elif grep -q Intel /proc/cpuinfo ; then
+ vendor=Intel
+else
+ echo >&2 "Only AMD and Intel CPUs supported"
exit 1
fi
@@ -147,7 +155,7 @@ if grep -q hypervisor /proc/cpuinfo ; then
echo >&2 "Warning: branch profiling may not be functional in VMs"
fi
-case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
+case `test $vendor = Intel && grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
grep -E "^model\s*:" /proc/cpuinfo | head -n1` in''')
for event, mod in eventmap.items():
for m in mod[:-1]:
@@ -156,8 +164,13 @@ case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
print(r'''*)
if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; then
E=br_inst_retired.near_taken:p
+ elif perf list ex_ret_brn_tkn | grep -q ex_ret_brn_tkn ; then
+ E=ex_ret_brn_tkn:P$FLAGS
+ elif test $vendor = Intel ; then
+echo >&2 "Unknown Intel CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
+ exit 1
else
-echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
+echo >&2 "AMD CPU without support for ex_ret_brn_tkn event"
exit 1
fi ;;''')
print(r"esac")
diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
index 528b34e4240..0e9e5fec2fe 100755
--- a/gcc/config/i386/gcc-auto-profile
+++ b/gcc/config/i386/gcc-auto-profile
@@ -24,8 +24,16 @@ if [ "$1" = "--all" ] ; then
shift
fi
-if ! grep -q Intel /proc/cpuinfo ; then
- echo >&2 "Only Intel CPUs supported"
+if grep -q AuthenticAMD /proc/cpuinfo ; then
+ vendor=AMD
+ if ! grep -q " brs" /proc/cpuinfo && ! grep -q amd_lbr_v2 /proc/cpuinfo ; then
+ echo >&2 "AMD CPU with brs (Zen 3) or amd_lbr_v2 (Zen 4+) feature is required"
+ exit 1
+ fi
+elif grep -q Intel /proc/cpuinfo ; then
+ vendor=Intel
+else
+ echo >&2 "Only AMD and Intel CPUs supported"
exit 1
fi
@@ -33,7 +41,7 @@ if grep -q hypervisor /proc/cpuinfo ; then
echo >&2 "Warning: branch profiling may not be functional in VMs"
fi
-case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
+case `test $vendor = Intel && grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
grep -E "^model\s*:" /proc/cpuinfo | head -n1` in
model*:\ 46|\
model*:\ 30|\
@@ -82,6 +90,8 @@ model*:\ 126|\
model*:\ 167|\
model*:\ 140|\
model*:\ 141|\
+model*:\ 143|\
+model*:\ 207|\
model*:\ 106|\
model*:\ 108|\
model*:\ 173|\
@@ -89,15 +99,20 @@ model*:\ 174) E="cpu/event=0xc4,umask=0x20/$FLAGS" ;;
model*:\ 134|\
model*:\ 150|\
model*:\ 156) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;;
-model*:\ 143|\
-model*:\ 207) E="cpu/event=0xc4,umask=0x20/p$FLAGS" ;;
-model*:\ 190) E="cpu/event=0xc4,umask=0xc0/$FLAGS" ;;
+model*:\ 190|\
+model*:\ 175|\
+model*:\ 182) E="cpu/event=0xc4,umask=0xc0/$FLAGS" ;;
model*:\ 190) E="cpu/event=0xc4,umask=0xfe/$FLAGS" ;;
*)
if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; then
E=br_inst_retired.near_taken:p
+ elif perf list ex_ret_brn_tkn | grep -q ex_ret_brn_tkn ; then
+ E=ex_ret_brn_tkn:P$FLAGS
+ elif test $vendor = Intel ; then
+echo >&2 "Unknown Intel CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
+ exit 1
else
-echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
+echo >&2 "AMD CPU without support for ex_ret_brn_tkn event"
exit 1
fi ;;
esac
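
The fallback branch of the case statement can be tried without the matching hardware by wrapping it in a function and stubbing perf. The sketch below assumes a hypothetical function name pick_event that takes the vendor and the modifier flags as arguments; it compares the vendor with test, since a bare "$vendor = Intel" would try to run the vendor name as a command.

```shell
# Sketch only: pick_event is a hypothetical wrapper around the fallback logic.
# $1 is the vendor (AMD or Intel), $2 the optional event modifier flags.
pick_event() {
    vendor=$1
    flags=$2
    if perf list br_inst_retired 2>/dev/null | grep -q br_inst_retired.near_taken ; then
        # Intel: retired near taken branches.
        echo "br_inst_retired.near_taken:p"
    elif perf list ex_ret_brn_tkn 2>/dev/null | grep -q ex_ret_brn_tkn ; then
        # AMD: retired taken branch instructions.
        echo "ex_ret_brn_tkn:P$flags"
    elif test "$vendor" = Intel ; then
        echo >&2 "Unknown Intel CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
        return 1
    else
        echo >&2 "AMD CPU without support for ex_ret_brn_tkn event"
        return 1
    fi
}
```

Defining a shell function named perf that prints a chosen event name before calling pick_event is enough to exercise each branch, because shell functions take precedence over commands found in PATH.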