On Tue, Sep 18, 2018 at 9:44 AM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Tue, Sep 11, 2018 at 9:01 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
> > On Tue, Sep 4, 2018 at 9:01 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
> >> On Tue, Aug 28, 2018 at 11:04 AM, H.J. Lu <hongjiu...@intel.com> wrote:
> >>> With -mavx, for
> >>>
> >>> [hjl@gnu-cfl-1 skx-2]$ cat foo.i
> >>> extern float f;
> >>> extern double d;
> >>> extern int i;
> >>>
> >>> void
> >>> foo (void)
> >>> {
> >>>   d = f;
> >>>   f = i;
> >>> }
> >>>
> >>> we need to generate
> >>>
> >>>   vxorp[ds] %xmmN, %xmmN, %xmmN
> >>>   ...
> >>>   vcvtss2sd f(%rip), %xmmN, %xmmX
> >>>   ...
> >>>   vcvtsi2ss i(%rip), %xmmN, %xmmY
> >>>
> >>> to avoid a partial XMM register stall. This patch adds a pass to
> >>> generate a single
> >>>
> >>>   vxorps %xmmN, %xmmN, %xmmN
> >>>
> >>> at function entry, which is shared by all SF and DF conversions,
> >>> instead of generating one
> >>>
> >>>   vxorp[ds] %xmmN, %xmmN, %xmmN
> >>>
> >>> for each SF/DF conversion.
> >>>
> >>> Performance impacts on SPEC CPU 2017 rate with 1 copy using
> >>>
> >>>   -Ofast -march=native -mfpmath=sse -fno-associative-math -funroll-loops
> >>>
> >>> are:
> >>>
> >>> 1. On Broadwell server:
> >>>
> >>> 500.perlbench_r  (-0.82%)
> >>> 502.gcc_r        (0.73%)
> >>> 505.mcf_r        (-0.24%)
> >>> 520.omnetpp_r    (-2.22%)
> >>> 523.xalancbmk_r  (-1.47%)
> >>> 525.x264_r       (0.31%)
> >>> 531.deepsjeng_r  (0.27%)
> >>> 541.leela_r      (0.85%)
> >>> 548.exchange2_r  (-0.11%)
> >>> 557.xz_r         (-0.34%)
> >>> Geomean:         (-0.23%)
> >>>
> >>> 503.bwaves_r     (0.00%)
> >>> 507.cactuBSSN_r  (-1.88%)
> >>> 508.namd_r       (0.00%)
> >>> 510.parest_r     (-0.56%)
> >>> 511.povray_r     (0.49%)
> >>> 519.lbm_r        (-1.28%)
> >>> 521.wrf_r        (-0.28%)
> >>> 526.blender_r    (0.55%)
> >>> 527.cam4_r       (-0.20%)
> >>> 538.imagick_r    (2.52%)
> >>> 544.nab_r        (-0.18%)
> >>> 549.fotonik3d_r  (-0.51%)
> >>> 554.roms_r       (-0.22%)
> >>> Geomean:         (0.00%)
> >>>
> >>> 2. On Skylake client:
> >>>
> >>> 500.perlbench_r  (-0.29%)
> >>> 502.gcc_r        (-0.36%)
> >>> 505.mcf_r        (1.77%)
> >>> 520.omnetpp_r    (-0.26%)
> >>> 523.xalancbmk_r  (-3.69%)
> >>> 525.x264_r       (-0.32%)
> >>> 531.deepsjeng_r  (0.00%)
> >>> 541.leela_r      (-0.46%)
> >>> 548.exchange2_r  (0.00%)
> >>> 557.xz_r         (0.00%)
> >>> Geomean:         (-0.34%)
> >>>
> >>> 503.bwaves_r     (0.00%)
> >>> 507.cactuBSSN_r  (-0.56%)
> >>> 508.namd_r       (0.87%)
> >>> 510.parest_r     (0.00%)
> >>> 511.povray_r     (-0.73%)
> >>> 519.lbm_r        (0.84%)
> >>> 521.wrf_r        (0.00%)
> >>> 526.blender_r    (-0.81%)
> >>> 527.cam4_r       (-0.43%)
> >>> 538.imagick_r    (2.55%)
> >>> 544.nab_r        (0.28%)
> >>> 549.fotonik3d_r  (0.00%)
> >>> 554.roms_r       (0.32%)
> >>> Geomean:         (0.12%)
> >>>
> >>> 3. On Skylake server:
> >>>
> >>> 500.perlbench_r  (-0.55%)
> >>> 502.gcc_r        (0.69%)
> >>> 505.mcf_r        (0.00%)
> >>> 520.omnetpp_r    (-0.33%)
> >>> 523.xalancbmk_r  (-0.21%)
> >>> 525.x264_r       (-0.27%)
> >>> 531.deepsjeng_r  (0.00%)
> >>> 541.leela_r      (0.00%)
> >>> 548.exchange2_r  (-0.11%)
> >>> 557.xz_r         (0.00%)
> >>> Geomean:         (0.00%)
> >>>
> >>> 503.bwaves_r     (0.58%)
> >>> 507.cactuBSSN_r  (0.00%)
> >>> 508.namd_r       (0.00%)
> >>> 510.parest_r     (0.18%)
> >>> 511.povray_r     (-0.58%)
> >>> 519.lbm_r        (0.25%)
> >>> 521.wrf_r        (0.40%)
> >>> 526.blender_r    (0.34%)
> >>> 527.cam4_r       (0.19%)
> >>> 538.imagick_r    (5.87%)
> >>> 544.nab_r        (0.17%)
> >>> 549.fotonik3d_r  (0.00%)
> >>> 554.roms_r       (0.00%)
> >>> Geomean:         (0.62%)
> >>>
> >>> On Skylake client, the impacts on 538.imagick_r are:
> >>>
> >>> size before:
> >>>
> >>>    text    data     bss     dec     hex filename
> >>> 2555577   10876    5576 2572029  273efd imagick_r.exe
> >>>
> >>> size after:
> >>>
> >>>    text    data     bss     dec     hex filename
> >>> 2511825   10876    5576 2528277  269415 imagick_r.exe
> >>>
> >>> number of vxorp[ds]:
> >>>
> >>> before    after    difference
> >>> 14570     4515     -69%
> >>>
> >>> OK for trunk?
> >>>
> >>> Thanks.
> >>>
> >>>
> >>> H.J.
> >>> ---
> >>> gcc/
> >>>
> >>> 2018-08-28  H.J. Lu  <hongjiu...@intel.com>
> >>>             Sunil K Pandey  <sunil.k.pan...@intel.com>
> >>>
> >>>         PR target/87007
> >>>         * config/i386/i386-passes.def: Add
> >>>         pass_remove_partial_avx_dependency.
> >>>         * config/i386/i386-protos.h
> >>>         (make_pass_remove_partial_avx_dependency): New.
> >>>         * config/i386/i386.c (make_pass_remove_partial_avx_dependency):
> >>>         New function.
> >>>         (pass_data_remove_partial_avx_dependency): New.
> >>>         (pass_remove_partial_avx_dependency): Likewise.
> >>>         (make_pass_remove_partial_avx_dependency): Likewise.
> >>>         * config/i386/i386.md (SF/DF conversion splitters): Disabled
> >>>         for TARGET_AVX.
> >>>
> >>> gcc/testsuite/
> >>>
> >>> 2018-08-28  H.J. Lu  <hongjiu...@intel.com>
> >>>             Sunil K Pandey  <sunil.k.pan...@intel.com>
> >>>
> >>>         PR target/87007
> >>>         * gcc.target/i386/pr87007.c: New file.
> >>
> >>
> >> PING:
> >>
> >> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01781.html
> >>
> >
> > PING.
> >
>
> Hi Kirill, Jakub, Jan,
>
> Can you take a look?
>
PING.

--
H.J.
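The data flow the patch aims for (one zeroed register shared as the merge operand of every scalar conversion in `foo`) can be illustrated at the C level with SSE/SSE2 intrinsics. This is a minimal sketch under stated assumptions, not the GCC pass itself: it assumes an x86-64 target with `<immintrin.h>`, `foo_shared_zero` is a hypothetical name, and the globals are defined here only as stand-ins for the `extern` declarations in foo.i. `_mm_cvtss_sd` and `_mm_cvtsi32_ss` copy their upper lanes from the first operand, so passing one all-zero vector to both mirrors the role of the single `vxorps %xmmN, %xmmN, %xmmN` feeding `vcvtss2sd` and `vcvtsi2ss`. Register allocation is still the compiler's choice, so this shows the intended dependency structure rather than guaranteeing which instructions are emitted.

```c
/* Minimal sketch, assuming an x86-64 target (SSE2 is baseline there).
   Hypothetical illustration only; this is not the GCC pass from the patch. */
#include <immintrin.h>

/* Stand-ins for the extern globals in foo.i, defined here so the
   example links on its own. */
float f = 1.5f;
double d;
int i = 42;

/* Hypothetical counterpart of foo () in foo.i: both scalar conversions
   take their upper lanes from one shared all-zero vector, so neither
   result depends on a stale value left in some XMM register. */
void foo_shared_zero (void)
{
  /* Zeroed once per function, like the single vxorps at entry. */
  const __m128 zero = _mm_setzero_ps ();

  /* d = f;  cvtss2sd: low lane = (double) f, upper lane copied from zero. */
  __m128d dv = _mm_cvtss_sd (_mm_castps_pd (zero), _mm_set_ss (f));
  d = _mm_cvtsd_f64 (dv);

  /* f = i;  cvtsi2ss: low lane = (float) i, upper lanes copied from zero. */
  __m128 fv = _mm_cvtsi32_ss (zero, i);
  f = _mm_cvtss_f32 (fv);
}
```

After one call to `foo_shared_zero ()`, `d` holds 1.5 and `f` holds 42.0f. Compiling this with `-O2 -mavx -S` and reading the generated assembly is one way to check which zeroing instructions a given compiler actually emits for it.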