> Hi,
> autofdo tests currently run only on x86. This patch makes them
> run on aarch64 too. Verified that perf and create_gcov are running
> as expected.
> 
> gcc/ChangeLog:
> 
>         * config/aarch64/gcc-auto-profile: Make script executable.
> 
> gcc/testsuite/ChangeLog:
> 
>         * lib/target-supports.exp: Enable autofdo tests for aarch64.
> 
> Is this OK?
OK.
What is your set of failures?
On AMD I now get

FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining add1/1 into main/4."
FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining sub1/2 into main/4."
FAIL: gcc.dg/tree-prof/indir-call-prof.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/tree-prof/inliner-1.c scan-tree-dump optimized "cold_function ..;"
FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 times"

and on Intel

./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining add1/1 into main/4."
./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining sub1/2 into main/4."
./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-prof/inliner-1.c scan-tree-dump optimized "cold_function ..;"
./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled all exits: decreased number of iterations of loop 2"
./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled likely exits: likely decreased number of iterations of loop 1"
./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 times"

(i.e. some of the failures are gone after fixing the autofdo 0 issues)
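
To see which failures are specific to one machine, the two lists can be
compared mechanically.  A minimal Python sketch, assuming gcc.sum-style
input with each result on a single (unwrapped) line; the helper name
`failures` is made up for illustration and is not part of the testsuite:

```python
import re

# Match "FAIL: ..." at the start of a line or after a grep-style
# "file:" prefix.  "XFAIL: ..." is deliberately not matched.
FAIL_RE = re.compile(r"(?:^|:)FAIL: (.+)")


def failures(summary_text):
    """Return the set of failing test descriptions found in the text."""
    found = set()
    for line in summary_text.splitlines():
        match = FAIL_RE.search(line.strip())
        if match:
            found.add(match.group(1).strip())
    return found


# Two lines from each list above, joined onto single lines.
amd = """\
FAIL: gcc.dg/tree-prof/indir-call-prof.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 times"
"""
intel = """\
./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 times"
./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled all exits: decreased number of iterations of loop 2"
"""

print("only on AMD:  ", sorted(failures(amd) - failures(intel)))
print("only on Intel:", sorted(failures(intel) - failures(amd)))
print("on both:      ", sorted(failures(amd) & failures(intel)))
```

Feeding it the full contents of the two gcc.sum files, rather than the
short excerpts used here, gives the complete per-machine difference.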

The peeling tests have loops with a low iteration count that is not
visible to the inliner, and they test that profile feedback determines it.
I do not see how auto-FDO (at least in its current form) can do this
reliably.  Even if we measure taken branches, their counts will differ,
e.g. with unrolling or vectorization.  So I think we can just disable
those tests for AFDO.  Not sure what happens with indir call and inliner yet.
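
The point about taken branches can be illustrated with a toy model.  This
is only a sketch under simplifying assumptions, not how GCC or create_gcov
compute anything: the loop is bottom-tested, it is entered once with at
least one iteration, and leftover iterations after unrolling run in a
separate remainder outside the main loop.  The function name
`backedge_taken` is hypothetical.

```python
def backedge_taken(trip_count, unroll=1):
    """Times the main loop's back edge is taken for one loop entry.

    Toy model: each pass of the (possibly unrolled) body covers `unroll`
    source iterations; the back edge is taken after every pass but the last.
    """
    passes = trip_count // unroll
    return max(passes - 1, 0)


# Same source loop, same trip count, different code shapes.
for factor in (1, 2, 4):
    print("unroll", factor, "->", backedge_taken(8, factor), "taken back edges")
```

In this model the same 8-iteration source loop yields a different number
of taken back edges for each unroll factor, so a sampled branch count
alone does not identify the source-level iteration count.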

The difference there is that Intel produces more events than AMD (which
is probably due to a different default sampling count).

Honza
