https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #18 from Jan Hubicka <hubicka at ucw dot cz> ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
>
> --- Comment #16 from kugan at gcc dot gnu.org ---
> I ran spec2017 again with recent gcc and SPE based autofdo (with local
> patches to enable SPE based profiling support for autofdo tools). I am
> seeing the following compared to PGO:
>
> 621.wrf_s        -23%
> 549.fotonik3d_r  -21%
> 525.x264_r       -17%
> 644.nab_s        -14%
> 603.bwaves_s     -13%
> 625.x264_s       -12%
> 623.xalancbmk_s  -12%
> 600.perlbench_s  -11%
> 500.perlbench_r  -10%

The LNT tester reports the following regressions:

SPEC/SPEC2017/FP/521.wrf_r        110.97%
SPEC/SPEC2017/FP/538.imagick_r     67.70%
SPEC/SPEC2017/FP/554.roms_r        15.77%
SPEC/SPEC2017/FP/503.bwaves_r      12.67%
SPEC/SPEC2017/INT/523.xalancbmk_r  11.29%
SPEC/SPEC2017/INT/548.exchange2_r  10.72%
SPEC/SPEC2017/FP/508.namd_r         8.78%
SPEC/SPEC2017/INT/531.deepsjeng_r   7.26%
SPEC/SPEC2017/INT/541.leela_r       6.54%
SPEC/SPEC2017/FP/519.lbm_r          5.72%
SPEC/SPEC2017/FP/549.fotonik3d_r    3.37%
SPEC/SPEC2017/INT/525.x264_r        3.09%
SPEC/SPEC2017/FP/510.parest_r       2.97%
SPEC/SPEC2017/FP/527.cam4_r         2.23%
SPEC/SPEC2017/INT/505.mcf_r         2.22%

In our setup the wrf training run is broken: it does nothing, yet does not
fail verification (which is odd). As a result the profile is almost empty and
everything is optimized for size. I wonder if that is true for you as well?
You can run gcov-dump on the profile and check whether you get any reasonably
large counts (see the sketch at the end of this comment). I am quite puzzled
by this issue but did not have time to debug it yet.

imagemagick has a broken train dataset in SPEC (it does not train the hot
loop, which disables vectorization). I hacked runspec so I can use
-train_with=refrun, and then imagemagick actually runs faster with autofdo
than without, so I think it is a non-issue. (With this hack autofdo now seems
to be an overall win for SPECfp, modulo wrf.)

roms regresses because vectorization is disabled: the loop header BB gets a
very low count. This is caused by the vectorizer not doing a very good job of
updating debug statements, but it also triggers a problem in create_gcov not
consuming the DWARF correctly. LLVM has 3-layer discriminators that can
record multiplicity, so vectorization can keep the iteration counts accurate.
I think it is a useful feature we should implement as well:
https://lists.llvm.org/pipermail/llvm-dev/2020-November/146694.html

fotonik was quite random for us, so we hacked the config file to train every
binary 8 times, which reduced the noise in the obtained profile. Here is the
history of runs
https://lnt.opensuse.org/db_default/v4/SPEC/graph?highlight_run=69309&plot.0=1370.527.0&plot.1=1288.527.0
and one can see that the randomness went away.

Some of the regressions go away for me with -fprofile-partial-training, since
the IPA code still has issues with an AFDO count of 0 not actually being 0.
In particular, if a function has only 0 samples in it, it gets a 0 AFDO
profile with its local profile preserved. If a function with a non-zero AFDO
profile is later inlined into it, it will get a 0 AFDO profile and end up
optimized for size.

I did not look into the other regressions yet. I think it would be
interesting to understand leela, deepsjeng and xalancbmk, since they are
quite C++ heavy.

povray, omnetpp, perlbench and gcc see out-of-noise improvements in our
setup. It would be interesting to know why perlbench regresses for you.
https://lnt.opensuse.org/db_default/v4/SPEC/graph?highlight_run=69309&plot.0=1370.327.0&plot.1=1288.327.0
https://lnt.opensuse.org/db_default/v4/SPEC/69309?compare_to=69261
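
For checking whether the wrf profile is essentially empty, something along
these lines should work. This is only a rough sketch: the binary name,
perf.data, the profile file name and the -gcov_version value are placeholders
from my setup and may need adjusting to your GCC/autofdo combination; since
create_gcov emits GCC's gcov file format, the gcov-dump from the matching GCC
should be able to print the records.

  # convert the perf samples into an AutoFDO profile (names are placeholders)
  create_gcov --binary=./wrf_s --profile=perf.data --gcov=wrf.afdo -gcov_version=1
  # dump the records; look for functions with reasonably large counts
  gcov-dump -l wrf.afdo

If all the dumped counters for the hot wrf routines are zero or tiny, the
training run produced no useful samples, which would match what we see here.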
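
For reference, the partial-training workaround mentioned above is just an
extra flag on the feedback build; a sketch with placeholder file names:

  # build with the AutoFDO profile, but do not optimize unprofiled/0-sample
  # functions for size as if they were proven cold
  gcc -O3 -fauto-profile=profile.afdo -fprofile-partial-training -o bench bench.c

-fprofile-partial-training makes GCC treat functions without profile data as
if they were built without feedback instead of optimizing them for size,
which is presumably why it hides the AFDO-0-is-not-really-0 problem until the
IPA side is fixed.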