https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88440
--- Comment #7 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #6)
> Created attachment 45313 [details]
> patch
>
> This enables distribution of patterns at -O[2s]+ and optimizes the testcase
> at -Os by adjusting the guards in loop distribution.
>
> Note that the interesting bits are compile-time, binary-size and performance
> at mainly -O2, eventually size at -Os.
>
> I suspect that at -O2 w/o profiling most loops would be
> optimize_loop_for_speed
> anyways so changing the heuristics isn't so bad but of course enabling
> distribution at -O2 might encour a penalty.

So far I have build times, measured on a Zen machine with -j16:

SPEC2006:

  Elapsed compile for '400.perlbench': 00:00:05 (5)
  Elapsed compile for '401.bzip2': 00:00:02 (2)
  Elapsed compile for '403.gcc': 00:00:11 (11)
  Elapsed compile for '429.mcf': 00:00:01 (1)
  Elapsed compile for '445.gobmk': 00:00:04 (4)
  Elapsed compile for '456.hmmer': 00:00:01 (1)
  Elapsed compile for '458.sjeng': 00:00:01 (1)
  Elapsed compile for '462.libquantum': 00:00:01 (1)
  Elapsed compile for '464.h264ref': 00:00:04 (4)
  Elapsed compile for '471.omnetpp': 00:00:05 (5)
  Elapsed compile for '473.astar': 00:00:01 (1)
  Elapsed compile for '483.xalancbmk': 00:00:21 (21)
  Elapsed compile for '410.bwaves': 00:00:01 (1)
  Elapsed compile for '416.gamess': 00:00:20 (20)
  Elapsed compile for '433.milc': 00:00:02 (2)
  Elapsed compile for '434.zeusmp': 00:00:02 (2)
  Elapsed compile for '435.gromacs': 00:00:06 (6)
  Elapsed compile for '436.cactusADM': 00:00:04 (4)
  Elapsed compile for '437.leslie3d': 00:00:04 (4)
  Elapsed compile for '444.namd': 00:00:09 (9)
  Elapsed compile for '447.dealII': 00:00:15 (15)
  Elapsed compile for '450.soplex': 00:00:03 (3)
  Elapsed compile for '453.povray': 00:00:04 (4)
  Elapsed compile for '454.calculix': 00:00:06 (6)
  Elapsed compile for '459.GemsFDTD': 00:00:09 (9)
  Elapsed compile for '465.tonto': 00:00:53 (53)
  Elapsed compile for '470.lbm': 00:00:02 (2)
  Elapsed compile for '481.wrf': 00:00:38 (38)
  Elapsed compile for '482.sphinx3': 00:00:01 (1)

All differences between before and after are within 1s, which is the
measurement granularity.

SPEC 2017:

  Elapsed compile for '503.bwaves_r': 00:00:01 (1)
  Elapsed compile for '507.cactuBSSN_r': 00:00:25 (25)
  Elapsed compile for '508.namd_r': 00:00:09 (9)
  Elapsed compile for '510.parest_r': 00:00:46 (46)
  Elapsed compile for '511.povray_r': 00:00:04 (4)
  Elapsed compile for '519.lbm_r': 00:00:01 (1)
  Elapsed compile for '521.wrf_r': 00:05:46 (346)
  Elapsed compile for '526.blender_r': 00:00:25 (25)
  Elapsed compile for '527.cam4_r': 00:00:37 (37)
  Elapsed compile for '538.imagick_r': 00:00:11 (11)
  Elapsed compile for '544.nab_r': 00:00:01 (1)
  Elapsed compile for '549.fotonik3d_r': 00:00:07 (7)
  Elapsed compile for '554.roms_r': 00:00:06 (6)
  Elapsed compile for '500.perlbench_r': 00:00:09 (9)
  Elapsed compile for '502.gcc_r': 00:00:44 (44)
  Elapsed compile for '505.mcf_r': 00:00:01 (1)
  Elapsed compile for '520.omnetpp_r': 00:00:12 (12)
  Elapsed compile for '523.xalancbmk_r': 00:00:25 (25)
  Elapsed compile for '525.x264_r': 00:00:09 (9)
  Elapsed compile for '531.deepsjeng_r': 00:00:02 (2)
  Elapsed compile for '541.leela_r': 00:00:03 (3)
  Elapsed compile for '548.exchange2_r': 00:00:04 (4)
  Elapsed compile for '557.xz_r': 00:00:01 (1)

There's only one difference: 521.wrf_r: 310 -> 346s
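
For anyone who wants to repeat the before/after comparison, here is a minimal sketch of how such listings could be diffed. It is not the script used for the numbers above. The helper names (`parse_compile_times`, `changed`) are hypothetical, the 1s threshold mirrors the granularity mentioned above, and it assumes the listing shows the post-patch run. The only baseline value used is the 310s quoted for 521.wrf_r.

```python
import re

# Matches entries such as: Elapsed compile for '521.wrf_r': 00:05:46 (346)
LINE_RE = re.compile(
    r"Elapsed compile for '([^']+)': (\d+):(\d+):(\d+) \((\d+)\)"
)


def parse_compile_times(log_text):
    """Return {benchmark: seconds} for every 'Elapsed compile' entry."""
    times = {}
    for name, hh, mm, ss, total in LINE_RE.findall(log_text):
        seconds = int(hh) * 3600 + int(mm) * 60 + int(ss)
        # The parenthesised value repeats the elapsed time in seconds.
        assert seconds == int(total), name
        times[name] = seconds
    return times


def changed(before, after, granularity=1):
    """Return {benchmark: (before, after)} where the gap exceeds granularity."""
    return {
        name: (before[name], after[name])
        for name in before.keys() & after.keys()
        if abs(after[name] - before[name]) > granularity
    }


# Two entries copied from the SPEC 2017 listing (assumed to be post-patch).
after = parse_compile_times(
    "Elapsed compile for '519.lbm_r': 00:00:01 (1) "
    "Elapsed compile for '521.wrf_r': 00:05:46 (346)"
)
# Only the 521.wrf_r baseline (310s) is stated in the comment.
before = {"521.wrf_r": 310}

diff = changed(before, after)
old, new = diff["521.wrf_r"]
slowdown = (new - old) / old * 100
print(f"521.wrf_r: {old} -> {new}s (+{slowdown:.1f}%)")
```

With the quoted numbers, the 521.wrf_r difference is 36s, or about 11.6% more compile time.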