https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88440

--- Comment #7 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #6)
> Created attachment 45313 [details]
> patch
> 
> This enables distribution of patterns at -O[2s]+ and optimizes the testcase
> at -Os by adjusting the guards in loop distribution.
> 
> Note that the interesting bits are compile-time, binary-size and performance
> at mainly -O2, eventually size at -Os.
> 
> I suspect that at -O2 w/o profiling most loops would be
> optimize_loop_for_speed
> anyways so changing the heuristics isn't so bad but of course enabling
> distribution at -O2 might incur a penalty.

So far I have build-time numbers from a Zen machine with -j16:

SPEC2006:

  Elapsed compile for '400.perlbench': 00:00:05 (5)
  Elapsed compile for '401.bzip2': 00:00:02 (2)
  Elapsed compile for '403.gcc': 00:00:11 (11)
  Elapsed compile for '429.mcf': 00:00:01 (1)
  Elapsed compile for '445.gobmk': 00:00:04 (4)
  Elapsed compile for '456.hmmer': 00:00:01 (1)
  Elapsed compile for '458.sjeng': 00:00:01 (1)
  Elapsed compile for '462.libquantum': 00:00:01 (1)
  Elapsed compile for '464.h264ref': 00:00:04 (4)
  Elapsed compile for '471.omnetpp': 00:00:05 (5)
  Elapsed compile for '473.astar': 00:00:01 (1)
  Elapsed compile for '483.xalancbmk': 00:00:21 (21)
  Elapsed compile for '410.bwaves': 00:00:01 (1)
  Elapsed compile for '416.gamess': 00:00:20 (20)
  Elapsed compile for '433.milc': 00:00:02 (2)
  Elapsed compile for '434.zeusmp': 00:00:02 (2)
  Elapsed compile for '435.gromacs': 00:00:06 (6)
  Elapsed compile for '436.cactusADM': 00:00:04 (4)
  Elapsed compile for '437.leslie3d': 00:00:04 (4)
  Elapsed compile for '444.namd': 00:00:09 (9)
  Elapsed compile for '447.dealII': 00:00:15 (15)
  Elapsed compile for '450.soplex': 00:00:03 (3)
  Elapsed compile for '453.povray': 00:00:04 (4)
  Elapsed compile for '454.calculix': 00:00:06 (6)
  Elapsed compile for '459.GemsFDTD': 00:00:09 (9)
  Elapsed compile for '465.tonto': 00:00:53 (53)
  Elapsed compile for '470.lbm': 00:00:02 (2)
  Elapsed compile for '481.wrf': 00:00:38 (38)
  Elapsed compile for '482.sphinx3': 00:00:01 (1)

All differences before and after are within 1s, which is the measurement
granularity.

SPEC 2017:

  Elapsed compile for '503.bwaves_r': 00:00:01 (1)
  Elapsed compile for '507.cactuBSSN_r': 00:00:25 (25)
  Elapsed compile for '508.namd_r': 00:00:09 (9)
  Elapsed compile for '510.parest_r': 00:00:46 (46)
  Elapsed compile for '511.povray_r': 00:00:04 (4)
  Elapsed compile for '519.lbm_r': 00:00:01 (1)
  Elapsed compile for '521.wrf_r': 00:05:46 (346)
  Elapsed compile for '526.blender_r': 00:00:25 (25)
  Elapsed compile for '527.cam4_r': 00:00:37 (37)
  Elapsed compile for '538.imagick_r': 00:00:11 (11)
  Elapsed compile for '544.nab_r': 00:00:01 (1)
  Elapsed compile for '549.fotonik3d_r': 00:00:07 (7)
  Elapsed compile for '554.roms_r': 00:00:06 (6)
  Elapsed compile for '500.perlbench_r': 00:00:09 (9)
  Elapsed compile for '502.gcc_r': 00:00:44 (44)
  Elapsed compile for '505.mcf_r': 00:00:01 (1)
  Elapsed compile for '520.omnetpp_r': 00:00:12 (12)
  Elapsed compile for '523.xalancbmk_r': 00:00:25 (25)
  Elapsed compile for '525.x264_r': 00:00:09 (9)
  Elapsed compile for '531.deepsjeng_r': 00:00:02 (2)
  Elapsed compile for '541.leela_r': 00:00:03 (3)
  Elapsed compile for '548.exchange2_r': 00:00:04 (4)
  Elapsed compile for '557.xz_r': 00:00:01 (1)

There's only one difference:

521.wrf_r: 310 -> 346s
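
For anyone wanting to repeat the comparison, here is a minimal sketch of how
two such logs could be diffed while ignoring anything within the 1s
granularity. The helper names (parse_times, significant_changes) are made up
for illustration and are not part of any SPEC tooling. The "before" line for
521.wrf_r is reconstructed from the 310s figure above, and 'example_bench' is
a hypothetical entry added only to show the filtering.

```python
import re

# Matches lines such as:
#   Elapsed compile for '521.wrf_r': 00:05:46 (346)
LINE_RE = re.compile(r"Elapsed compile for '([^']+)': \d\d:\d\d:\d\d \((\d+)\)")


def parse_times(log):
    """Return {benchmark name: elapsed seconds} for every matching log line."""
    return {m.group(1): int(m.group(2)) for m in LINE_RE.finditer(log)}


def significant_changes(before, after, granularity=1):
    """Return benchmarks whose build time moved by more than `granularity` seconds.

    Each value is (before_s, after_s, percent_change); percent_change is None
    when the baseline is 0 seconds.
    """
    changes = {}
    for name in sorted(before.keys() & after.keys()):
        delta = after[name] - before[name]
        if abs(delta) > granularity:
            base = before[name]
            pct = 100.0 * delta / base if base else None
            changes[name] = (base, after[name], pct)
    return changes


# 'example_bench' is hypothetical; the wrf "before" line is reconstructed
# from the 310s value reported in this comment.
before_log = """
  Elapsed compile for 'example_bench': 00:00:04 (4)
  Elapsed compile for '521.wrf_r': 00:05:10 (310)
"""
after_log = """
  Elapsed compile for 'example_bench': 00:00:05 (5)
  Elapsed compile for '521.wrf_r': 00:05:46 (346)
"""

if __name__ == "__main__":
    result = significant_changes(parse_times(before_log), parse_times(after_log))
    for name, (old, new, pct) in result.items():
        print(f"{name}: {old} -> {new}s ({pct:+.1f}%)")
```

With the figures above, 310 -> 346s is an increase of 36s, i.e. roughly 12%
longer build time for 521.wrf_r.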
