On Sun, May 20, 2018 at 11:51 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> r254152 disabled partial_reg_dependency and movx for Haswell and newer
>> Intel processors. r258972 restored them for skylake-avx512. For Haswell,
>> movx improves performance. But partial_reg_stall may be better than
>> partial_reg_dependency in theory. We will investigate the performance impact
>> of partial_reg_stall vs partial_reg_dependency on Haswell for GCC 9. In
>> the meantime, this patch restores both partial_reg_dependency and movx for
>> Haswell in GCC 8.
>>
>> OK for GCC 8?
>
> I would still like to know in what situations/benchmarks it improves the
> performance.
> The change was benchmarked on spec2000/2006 plus some additional benchmarks,
> so it would be nice to know where it hurts.
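The two tunings concern how GCC loads narrow (8- or 16-bit) integer values into registers. According to the x86-tune.def comments in the patch, they make GCC prefer instructions that write the whole register, such as MOVZBL, over partial writes such as movb, which leave a dependency on the register's previous contents. The C function below is a hypothetical test case, not taken from the patch or from EEMBC: every element is loaded and stored as a single byte, so the compiler can pick either form of load. One way to see what the tunings change is to compile it to assembly with -S, with and without -mtune-ctrl=movx,partial_reg_dependency. A plain -O2 build is assumed here so the loop is less likely to be vectorized; the exact instructions emitted depend on the GCC version and the other options.

```c
#include <stddef.h>

/* Hypothetical test case (not from the patch or EEMBC): copy n bytes from
   src to dst in reverse order.  Each element is an 8-bit load followed by
   an 8-bit store, so the load may be emitted either as a partial-register
   write (movb) or as a zero-extending full-register write (movzbl).
   Possible comparison:
     gcc -O2 -march=haswell -S reverse.c
     gcc -O2 -march=haswell -mtune-ctrl=movx,partial_reg_dependency -S reverse.c  */
void reverse_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[n - 1 - i];
}
```

Calling reverse_bytes(out, in, 6) with in holding the bytes "abcdef" leaves "fedcba" in out under either flag set; only the instruction selection differs.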
From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829#c5:

I have made measurements on HSW comparing
-mtune-ctrl=movx,partial_reg_dependency -Ofast -march=haswell to
-Ofast -mtune=haswell and I see improvements on EEMBC benchmarks.

automotive
=========
aifftr01 (default) - goodperf: Runtime improvement of 2.6% (time).
aiifft01 (default) - goodperf: Runtime improvement of 2.2% (time).

networking
=========
ip_pktcheckb1m (default) - goodperf: Runtime improvement of 3.8% (time).
ip_pktcheckb2m (default) - goodperf: Runtime improvement of 5.2% (time).
ip_pktcheckb4m (default) - goodperf: Runtime improvement of 4.4% (time).
ip_pktcheckb512k (default) - goodperf: Runtime improvement of 4.2% (time).

telecom
=========
fft00data_1 (default) - goodperf: Runtime improvement of 8.4% (time).
fft00data_2 (default) - goodperf: Runtime improvement of 8.6% (time).
fft00data_3 (default) - goodperf: Runtime improvement of 9.0% (time).

OK for GCC 8?

H.J.

> Honza
>>
>> H.J.
>> ---
>>         PR target/85829
>>         * config/i386/x86-tune.def: Re-enable partial_reg_dependency
>>         and movx for Haswell.
>> ---
>>  gcc/config/i386/x86-tune.def | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
>> index 5649fdcf416..60625668236 100644
>> --- a/gcc/config/i386/x86-tune.def
>> +++ b/gcc/config/i386/x86-tune.def
>> @@ -48,7 +48,7 @@ DEF_TUNE (X86_TUNE_SCHEDULE, "schedule",
>>     over partial stores. For example preffer MOVZBL or MOVQ to load 8bit
>>     value over movb. */
>>  DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, "partial_reg_dependency",
>> -          m_P4_NOCONA | m_CORE2 | m_NEHALEM | m_SANDYBRIDGE
>> +          m_P4_NOCONA | m_CORE2 | m_NEHALEM | m_SANDYBRIDGE | m_HASWELL
>>            | m_BONNELL | m_SILVERMONT | m_INTEL
>>            | m_KNL | m_KNM | m_AMD_MULTIPLE | m_SKYLAKE_AVX512 | m_GENERIC)
>>
>> @@ -84,7 +84,7 @@ DEF_TUNE (X86_TUNE_PARTIAL_FLAG_REG_STALL, "partial_flag_reg_stall",
>>     partial dependencies. */
>>  DEF_TUNE (X86_TUNE_MOVX, "movx",
>>            m_PPRO | m_P4_NOCONA | m_CORE2 | m_NEHALEM | m_SANDYBRIDGE
>> -          | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL
>> +          | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL | m_HASWELL
>>            | m_GEODE | m_AMD_MULTIPLE | m_SKYLAKE_AVX512 | m_GENERIC)
>>
>>  /* X86_TUNE_MEMORY_MISMATCH_STALL: Avoid partial stores that are followed by
>> --
>> 2.17.0
>>

--
H.J.
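In x86-tune.def, each DEF_TUNE entry pairs a tuning with a bitmask of the processors (m_SANDYBRIDGE, m_HASWELL, m_SKYLAKE_AVX512 and so on) for which it is enabled by default, so the patch turns both tunings on for Haswell just by OR-ing m_HASWELL into their masks; -mtune-ctrl, as used in the EEMBC measurements, sets the same features explicitly from the command line instead. The C sketch below models that selection for the movx entry. The enum, the bit positions, the shortened masks and tune_enabled() are hypothetical simplifications for illustration, not GCC's actual definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of DEF_TUNE processor masks.  The names
   mirror x86-tune.def, but the enum values and masks are illustrative
   only; most processors are omitted. */
enum processor {
    PROC_SANDYBRIDGE,
    PROC_HASWELL,
    PROC_SKYLAKE_AVX512,
    PROC_LAST
};

#define m_SANDYBRIDGE    (UINT64_C(1) << PROC_SANDYBRIDGE)
#define m_HASWELL        (UINT64_C(1) << PROC_HASWELL)
#define m_SKYLAKE_AVX512 (UINT64_C(1) << PROC_SKYLAKE_AVX512)

/* Shortened X86_TUNE_MOVX mask before and after the patch. */
const uint64_t movx_mask_before = m_SANDYBRIDGE | m_SKYLAKE_AVX512;
const uint64_t movx_mask_after = m_SANDYBRIDGE | m_SKYLAKE_AVX512 | m_HASWELL;

/* A tuning is on by default when the bit of the processor selected by
   -mtune is set in its mask. */
bool tune_enabled(uint64_t mask, enum processor tune)
{
    return (mask & (UINT64_C(1) << tune)) != 0;
}
```

In this model, tune_enabled(mask, PROC_HASWELL) goes from false to true while the result for every other processor stays the same, which is why the change touches only one line per tuning.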