On Mon, May 28, 2018 at 5:50 PM Allan Sandfeld Jensen <li...@carewolf.com> wrote:
> On Montag, 28. Mai 2018 12:58:20 CEST Richard Biener wrote:
> > compile-time effects of the patch on that. Embedded folks may want to run
> > their favorite benchmark and report results as well.
> >
> > So I did a -O2 -march=haswell [-ftree-slp-vectorize] SPEC CPU 2006 compile
> > and run, and the compile-time effect where measurable (SPEC records on a
> > second granularity) is within one second per benchmark apart from
> > 410.bwaves (from 3s to 5s) and 481.wrf (76s to 78s). Performance-wise I
> > notice significant slowdowns for SPEC FP and some for SPEC INT (I only did
> > a train run so far). I'll re-run with ref input now and will post those
> > numbers.
>
> If you continue to see slowdowns, could you check with either no AVX, or with
> -mprefer-avx128? The occasional AVX256 instructions might be downclocking the
> CPU. But yes, that would be a problem for this change on its own.

So here are the complete results of two runs with ref input. Peak is
-O2 -march=haswell -ftree-slp-vectorize; base is the same without
-ftree-slp-vectorize. They confirm the slowdowns in SPEC FP but not in
SPEC INT.

You are right that using AVX256 (or AVX512) might be problematic on its own,
but that is not restricted to -O2 -ftree-slp-vectorize; it applies to -O3 as
well. I will re-benchmark the SPEC FP part with -mprefer-avx128 to see if
that is the issue.

Note I did not use any -ffast-math flags in the experiment: those are as
"unlikely" as using -march=native together with -O2.

In theory another issue is the ability to debug code.

                 Base       Base       Base      Peak       Peak       Peak
Benchmarks       Ref.   Run Time      Ratio      Ref.   Run Time      Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
410.bwaves      13590        362       37.5 *   13590        370       36.7 *
410.bwaves      13590        365       37.2 S   13590        377       36.0 S
416.gamess      19580        558       35.1 *   19580        598       32.7 *
416.gamess      19580        560       35.0 S   19580        600       32.6 S
433.milc         9180        331       27.8 S    9180        374       24.6 *
433.milc         9180        331       27.8 *    9180        383       24.0 S
434.zeusmp       9100        301       30.2 S    9100        301       30.2 *
434.zeusmp       9100        301       30.2 *    9100        302       30.1 S
435.gromacs      7140        300       23.8 S    7140        303       23.6 S
435.gromacs      7140        298       23.9 *    7140        301       23.8 *
436.cactusADM   11950        495       24.1 S   11950        482       24.8 *
436.cactusADM   11950        486       24.6 *   11950        484       24.7 S
437.leslie3d     9400        289       32.5 *    9400        288       32.6 *
437.leslie3d     9400        301       31.3 S    9400        289       32.5 S
444.namd         8020        301       26.6 *    8020        301       26.6 *
444.namd         8020        301       26.6 S    8020        301       26.6 S
447.dealII      11440        255       44.9 *   11440        252       45.3 *
447.dealII      11440        255       44.9 S   11440        253       45.3 S
450.soplex       8340        212       39.4 S    8340        213       39.1 S
450.soplex       8340        211       39.5 *    8340        211       39.5 *
453.povray       5320        111       47.9 S    5320        113       47.0 S
453.povray       5320        111       48.0 *    5320        113       47.2 *
454.calculix     8250        748       11.0 *    8250        835       9.88 *
454.calculix     8250        748       11.0 S    8250        835       9.88 S
459.GemsFDTD    10610        324       32.8 S   10610        324       32.8 S
459.GemsFDTD    10610        323       32.9 *   10610        323       32.9 *
465.tonto        9840        449       21.9 S    9840        469       21.0 *
465.tonto        9840        446       22.0 *    9840        469       21.0 S
470.lbm         13740        253       54.3 *   13740        255       53.9 S
470.lbm         13740        253       54.2 S   13740        254       54.2 *
481.wrf         11170        415       26.9 *   11170        416       26.9 S
481.wrf         11170        417       26.8 S   11170        416       26.9 *
482.sphinx3     19490        456       42.7 *   19490        465       41.9 *
482.sphinx3     19490        464       42.0 S   19490        468       41.6 S

                 Base       Base       Base      Peak       Peak       Peak
Benchmarks       Ref.   Run Time      Ratio      Ref.   Run Time      Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
400.perlbench    9770        251       38.9 S    9770        252       38.8 S
400.perlbench    9770        250       39.1 *    9770        251       39.0 *
401.bzip2        9650        399       24.2 S    9650        397       24.3 S
401.bzip2        9650        395       24.4 *    9650        395       24.4 *
403.gcc          8050        246       32.8 S    8050        245       32.9 S
403.gcc          8050        244       33.0 *    8050        243       33.1 *
429.mcf          9120        251       36.3 S    9120        248       36.8 *
429.mcf          9120        250       36.5 *    9120        248       36.8 S
445.gobmk       10490        394       26.6 S   10490        392       26.8 *
445.gobmk       10490        393       26.7 *   10490        392       26.8 S
456.hmmer        9330        389       24.0 S    9330        388       24.0 *
456.hmmer        9330        389       24.0 *    9330        389       24.0 S
458.sjeng       12100        447       27.1 *   12100        439       27.5 *
458.sjeng       12100        449       27.0 S   12100        449       26.9 S
462.libquantum  20720        309       67.0 S   20720        307       67.5 S
462.libquantum  20720        302       68.7 *   20720        300       69.1 *
464.h264ref     22130        457       48.5 S   22130        459       48.2 S
464.h264ref     22130        456       48.6 *   22130        459       48.2 *
471.omnetpp      6250        307       20.4 *    6250        308       20.3 *
471.omnetpp      6250        317       19.7 S    6250        310       20.2 S
473.astar        7020        346       20.3 *    7020        347       20.2 *
473.astar        7020        346       20.3 S    7020        347       20.2 S
483.xalancbmk    6900        198       34.8 *    6900        199       34.7 *
483.xalancbmk    6900        202       34.2 S    6900        203       34.1 S

> 'Allan
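The FP-versus-INT conclusion can be cross-checked from the tables with a little arithmetic. The Python sketch below is only an illustration and not part of the runs reported here. It copies, for each benchmark, the base and peak ratio of the run marked `*` (the result SPEC reports), and compares suites by the geometric mean of those ratios, which is how SPEC CPU2006 aggregates a suite score. The dictionary names and helper functions are hypothetical.

```python
import math

# Base/peak ratios of the "*" run per benchmark, copied from the tables.
# Base: -O2 -march=haswell; peak: base flags plus -ftree-slp-vectorize.
CFP = {
    "410.bwaves": (37.5, 36.7), "416.gamess": (35.1, 32.7),
    "433.milc": (27.8, 24.6), "434.zeusmp": (30.2, 30.2),
    "435.gromacs": (23.9, 23.8), "436.cactusADM": (24.6, 24.8),
    "437.leslie3d": (32.5, 32.6), "444.namd": (26.6, 26.6),
    "447.dealII": (44.9, 45.3), "450.soplex": (39.5, 39.5),
    "453.povray": (48.0, 47.2), "454.calculix": (11.0, 9.88),
    "459.GemsFDTD": (32.9, 32.9), "465.tonto": (22.0, 21.0),
    "470.lbm": (54.3, 54.2), "481.wrf": (26.9, 26.9),
    "482.sphinx3": (42.7, 41.9),
}
CINT = {
    "400.perlbench": (39.1, 39.0), "401.bzip2": (24.4, 24.4),
    "403.gcc": (33.0, 33.1), "429.mcf": (36.5, 36.8),
    "445.gobmk": (26.7, 26.8), "456.hmmer": (24.0, 24.0),
    "458.sjeng": (27.1, 27.5), "462.libquantum": (68.7, 69.1),
    "464.h264ref": (48.6, 48.2), "471.omnetpp": (20.4, 20.3),
    "473.astar": (20.3, 20.2), "483.xalancbmk": (34.8, 34.7),
}


def geomean(values):
    """Geometric mean, computed as exp of the mean of the logs."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def relative_changes(suite):
    """Per-benchmark peak/base - 1; a negative value means peak is slower."""
    return {name: peak / base - 1.0 for name, (base, peak) in suite.items()}


def suite_change(suite):
    """Relative change of the suite's geometric-mean ratio, base to peak."""
    base = geomean(b for b, _ in suite.values())
    peak = geomean(p for _, p in suite.values())
    return peak / base - 1.0


if __name__ == "__main__":
    for label, suite in (("CFP", CFP), ("CINT", CINT)):
        changes = relative_changes(suite)
        worst = min(changes, key=changes.get)
        print(f"{label}: geomean {suite_change(suite):+.1%}, "
              f"worst {worst} {changes[worst]:+.1%}")
```

Run as a script, it should report a geometric-mean change of about -2.3% for CFP (worst: 433.milc at -11.5%) and about +0.1% for CINT (worst: 464.h264ref at -0.8%). Per-benchmark differences of a few tenths of a percent are comparable to the rounding of the printed ratios, so only the larger CFP drops (433.milc, 454.calculix, 416.gamess, 465.tonto) clearly stand out.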