On 2017.11.07 at 00:12 +0100, Jan Hubicka wrote:
> > On 2017.11.05 at 11:55 +0100, Jan Hubicka wrote:
> > > > On 2017.11.03 at 16:48 +0100, Jan Hubicka wrote:
> > > > > this is the updated patch, which I have committed after
> > > > > profiledbootstrapping x86-64
> > > >
> > > > Unfortunately, compiling tramp3d-v4.cpp is 6-7% slower after this patch.
> > > > This happens with an LTO/PGO bootstrapped gcc using
> > > > --enable-checking=release.
> > >
> > > Our periodic testers have also picked up the change, and they report no
> > > compile-time regression for tramp3d:
> > > https://gcc.opensuse.org/gcc-old/c++bench-czerny/tramp3d/
> > > so I would conclude that it is a regression in the LTO+PGO bootstrap. I am
> > > fixing one checking bug that may cause it (where we mix local and global
> > > profiles), so perhaps it will go away afterwards.
> >
> > Just to confirm: pure PGO bootstrap is fine, e.g. on Ryzen:
> > (LTO/PGO) 17.65 sec ( +- 0.68% )
> > (PGO) 15.74 sec ( +- 0.27% )
>
> Thanks. I have committed the patch for the inlining profile update bug, so
> with some luck LTO/PGO may be fine again.
It got worse, unfortunately:
Pure PGO:
 Performance counter stats for '/home/trippels/gcc_8/usr/local/bin/g++ -w -Ofast tramp3d-v4.cpp' (4 runs):

      16213.529306      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.25% )
             1,387      context-switches          #    0.086 K/sec                    ( +-  0.17% )
                 4      cpu-migrations            #    0.000 K/sec                    ( +- 14.80% )
           261,764      page-faults               #    0.016 M/sec                    ( +-  0.03% )
    62,633,457,222      cycles                    #    3.863 GHz                      ( +-  0.20% )  (83.32%)
    13,990,050,204      stalled-cycles-frontend   #   22.34% frontend cycles idle     ( +-  0.51% )  (83.33%)
    13,189,755,888      stalled-cycles-backend    #   21.06% backend cycles idle      ( +-  0.04% )  (83.31%)
    75,194,592,630      instructions              #    1.20  insn per cycle
                                                  #    0.19  stalled cycles per insn  ( +-  0.03% )  (83.35%)
    17,113,639,942      branches                  # 1055.516 M/sec                    ( +-  0.02% )  (83.38%)
       634,471,544      branch-misses             #    3.71% of all branches          ( +-  0.07% )  (83.34%)

      16.226375499 seconds time elapsed                                          ( +-  0.24% )
LTO/PGO:
 Performance counter stats for '/home/trippels/gcc_8/usr/local/bin/g++ -w -Ofast tramp3d-v4.cpp' (4 runs):

      18622.496264      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.35% )
             1,592      context-switches          #    0.086 K/sec                    ( +-  0.32% )
                 4      cpu-migrations            #    0.000 K/sec                    ( +- 14.43% )
           261,370      page-faults               #    0.014 M/sec                    ( +-  0.12% )
    71,849,030,564      cycles                    #    3.858 GHz                      ( +-  0.08% )  (83.34%)
    15,987,209,604      stalled-cycles-frontend   #   22.25% frontend cycles idle     ( +-  0.47% )  (83.32%)
    14,336,345,458      stalled-cycles-backend    #   19.95% backend cycles idle      ( +-  0.05% )  (83.33%)
    87,674,608,740      instructions              #    1.22  insn per cycle
                                                  #    0.18  stalled cycles per insn  ( +-  0.01% )  (83.36%)
    20,610,950,144      branches                  # 1106.777 M/sec                    ( +-  0.01% )  (83.35%)
       638,454,497      branch-misses             #    3.10% of all branches          ( +-  0.08% )  (83.35%)

      18.644370559 seconds time elapsed                                          ( +-  0.38% )
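For a rough side-by-side reading of the two runs, here is a small sketch that computes the relative increase of the LTO/PGO compiler over the pure-PGO one. The numbers are copied verbatim from the perf output above (means over 4 runs); the dictionary names and the helper `pct_increase` are hypothetical, introduced only for this illustration.

```python
# Means over 4 runs, copied from the two `perf stat` outputs above.
# The names `pgo`, `lto_pgo` and `pct_increase` are illustrative only.
pgo = {
    "seconds elapsed": 16.226375499,
    "cycles": 62_633_457_222,
    "instructions": 75_194_592_630,
    "branches": 17_113_639_942,
}
lto_pgo = {
    "seconds elapsed": 18.644370559,
    "cycles": 71_849_030_564,
    "instructions": 87_674_608_740,
    "branches": 20_610_950_144,
}


def pct_increase(base, new):
    """Relative increase of `new` over `base`, in percent."""
    return (new / base - 1.0) * 100.0


# Print one line per counter: how much more the LTO/PGO compiler needs.
for key in pgo:
    print(f"{key:16s} {pct_increase(pgo[key], lto_pgo[key]):+6.1f}%")
```

Read this way, the LTO/PGO compiler takes roughly 15% longer and executes about 17% more instructions and 20% more branches than the pure-PGO one, while IPC stays almost unchanged (1.20 vs 1.22). That suggests it simply does more work rather than running the same work less efficiently.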
--
Markus