On Mon, May 28, 2018 at 5:50 PM Allan Sandfeld Jensen <li...@carewolf.com>
wrote:

> On Montag, 28. Mai 2018 12:58:20 CEST Richard Biener wrote:
> > compile-time effects of the patch on that. Embedded folks may want to run
> > their favorite benchmark and report results as well.
> >
> > So I did a -O2 -march=haswell [-ftree-slp-vectorize] SPEC CPU 2006 compile
> > and run, and the compile-time effect, where measurable (SPEC records at
> > one-second granularity), is within one second per benchmark, apart from
> > 410.bwaves (from 3s to 5s) and 481.wrf (76s to 78s). Performance-wise I
> > notice significant slowdowns for SPEC FP and some for SPEC INT (I only did
> > a train run so far). I'll re-run with ref input now and will post those
> > numbers.
> >
> If you continue to see slowdowns, could you check with either no AVX, or with
> -mprefer-avx128? The occasional AVX256 instructions might be downclocking the
> CPU. But yes, that would be a problem for this change on its own.

So here's a complete two-run with ref input; peak is -O2 -march=haswell
-ftree-slp-vectorize. It confirms the slowdowns in SPEC FP but not in SPEC INT.
You are right that using AVX256 (or AVX512) might be problematic on its own, but
that is not restricted to -O2 -ftree-slp-vectorize; it applies to -O3 as well.
I will re-benchmark the SPEC FP part with -mprefer-avx128 to see if that is the
issue. Note I did not use any -ffast-math flags in the experiment - those are as
"unlikely" as using -march=native together with -O2. In theory another issue is
the ability to debug code.

                 Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
410.bwaves      13590        362       37.5 *   13590        370      36.7  *
410.bwaves      13590        365       37.2 S   13590        377      36.0  S
416.gamess      19580        558       35.1 *   19580        598      32.7  *
416.gamess      19580        560       35.0 S   19580        600      32.6  S
433.milc         9180        331       27.8 S    9180        374      24.6  *
433.milc         9180        331       27.8 *    9180        383      24.0  S
434.zeusmp       9100        301       30.2 S    9100        301      30.2  *
434.zeusmp       9100        301       30.2 *    9100        302      30.1  S
435.gromacs      7140        300       23.8 S    7140        303      23.6  S
435.gromacs      7140        298       23.9 *    7140        301      23.8  *
436.cactusADM   11950        495       24.1 S   11950        482      24.8  *
436.cactusADM   11950        486       24.6 *   11950        484      24.7  S
437.leslie3d     9400        289       32.5 *    9400        288      32.6  *
437.leslie3d     9400        301       31.3 S    9400        289      32.5  S
444.namd         8020        301       26.6 *    8020        301      26.6  *
444.namd         8020        301       26.6 S    8020        301      26.6  S
447.dealII      11440        255       44.9 *   11440        252      45.3  *
447.dealII      11440        255       44.9 S   11440        253      45.3  S
450.soplex       8340        212       39.4 S    8340        213      39.1  S
450.soplex       8340        211       39.5 *    8340        211      39.5  *
453.povray       5320        111       47.9 S    5320        113      47.0  S
453.povray       5320        111       48.0 *    5320        113      47.2  *
454.calculix     8250        748       11.0 *    8250        835       9.88 *
454.calculix     8250        748       11.0 S    8250        835       9.88 S
459.GemsFDTD    10610        324       32.8 S   10610        324      32.8  S
459.GemsFDTD    10610        323       32.9 *   10610        323      32.9  *
465.tonto        9840        449       21.9 S    9840        469      21.0  *
465.tonto        9840        446       22.0 *    9840        469      21.0  S
470.lbm         13740        253       54.3 *   13740        255      53.9  S
470.lbm         13740        253       54.2 S   13740        254      54.2  *
481.wrf         11170        415       26.9 *   11170        416      26.9  S
481.wrf         11170        417       26.8 S   11170        416      26.9  *
482.sphinx3     19490        456       42.7 *   19490        465      41.9  *
482.sphinx3     19490        464       42.0 S   19490        468      41.6  S

                 Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
400.perlbench    9770        251       38.9 S    9770        252       38.8 S
400.perlbench    9770        250       39.1 *    9770        251       39.0 *
401.bzip2        9650        399       24.2 S    9650        397       24.3 S
401.bzip2        9650        395       24.4 *    9650        395       24.4 *
403.gcc          8050        246       32.8 S    8050        245       32.9 S
403.gcc          8050        244       33.0 *    8050        243       33.1 *
429.mcf          9120        251       36.3 S    9120        248       36.8 *
429.mcf          9120        250       36.5 *    9120        248       36.8 S
445.gobmk       10490        394       26.6 S   10490        392       26.8 *
445.gobmk       10490        393       26.7 *   10490        392       26.8 S
456.hmmer        9330        389       24.0 S    9330        388       24.0 *
456.hmmer        9330        389       24.0 *    9330        389       24.0 S
458.sjeng       12100        447       27.1 *   12100        439       27.5 *
458.sjeng       12100        449       27.0 S   12100        449       26.9 S
462.libquantum  20720        309       67.0 S   20720        307       67.5 S
462.libquantum  20720        302       68.7 *   20720        300       69.1 *
464.h264ref     22130        457       48.5 S   22130        459       48.2 S
464.h264ref     22130        456       48.6 *   22130        459       48.2 *
471.omnetpp      6250        307       20.4 *    6250        308       20.3 *
471.omnetpp      6250        317       19.7 S    6250        310       20.2 S
473.astar        7020        346       20.3 *    7020        347       20.2 *
473.astar        7020        346       20.3 S    7020        347       20.2 S
483.xalancbmk    6900        198       34.8 *    6900        199       34.7 *
483.xalancbmk    6900        202       34.2 S    6900        203       34.1 S
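
As a rough aid for reading the first table (the SPEC FP benchmarks), the
peak-versus-base slowdown can be worked out from the run times. The following
is only a minimal Python sketch: it copies the run times of the rows flagged
'*' for a handful of benchmarks, and the names SELECTED_RUN_TIMES and
slowdown_pct are illustrative, not part of any SPEC tooling.

```python
# Run times in seconds, copied from the '*' rows of the SPEC FP table above.
# Each value is a (base, peak) pair; peak is the run with
# -O2 -march=haswell -ftree-slp-vectorize.
SELECTED_RUN_TIMES = {
    "410.bwaves": (362, 370),
    "416.gamess": (558, 598),
    "433.milc": (331, 374),
    "454.calculix": (748, 835),
    "465.tonto": (446, 469),
}


def slowdown_pct(base_seconds, peak_seconds):
    """Return how much longer the peak run took than the base run, in percent."""
    return (peak_seconds / base_seconds - 1.0) * 100.0


if __name__ == "__main__":
    for name, (base, peak) in SELECTED_RUN_TIMES.items():
        print(f"{name:14s} {slowdown_pct(base, peak):+5.1f}%")
```

On these numbers 433.milc and 454.calculix come out roughly 13% and 12% slower
at peak, which is well outside the run-to-run variation visible between the two
runs of each benchmark.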


> 'Allan
