> r254152 disabled partial_reg_dependency and movx for Haswell and newer
> Intel processors.  r258972 restored them for skylake-avx512.  For Haswell,
> movx improves performance.  But partial_reg_stall may be better than
> partial_reg_dependency in theory.  We will investigate performance impact
> of partial_reg_stall vs partial_reg_dependency on Haswell for GCC 9.  In
> the meantime, this patch restores both partial_reg_dependency and movx for
> Haswell in GCC 8.
>
> OK for GCC 8?

I would still like to know in what situations/benchmarks it improves
performance.  The change was benchmarked on spec2000/2006 plus some
additional benchmarks, so it would be nice to know where it hurts.

Honza
>
> H.J.
> ---
>     PR target/85829
>     * config/i386/x86-tune.def: Re-enable partial_reg_dependency
>     and movx for Haswell.
> ---
>  gcc/config/i386/x86-tune.def | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
> index 5649fdcf416..60625668236 100644
> --- a/gcc/config/i386/x86-tune.def
> +++ b/gcc/config/i386/x86-tune.def
> @@ -48,7 +48,7 @@ DEF_TUNE (X86_TUNE_SCHEDULE, "schedule",
>     over partial stores.  For example preffer MOVZBL or MOVQ to load 8bit
>     value over movb.  */
>  DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, "partial_reg_dependency",
> -          m_P4_NOCONA | m_CORE2 | m_NEHALEM | m_SANDYBRIDGE
> +          m_P4_NOCONA | m_CORE2 | m_NEHALEM | m_SANDYBRIDGE | m_HASWELL
>            | m_BONNELL | m_SILVERMONT | m_INTEL
>            | m_KNL | m_KNM | m_AMD_MULTIPLE | m_SKYLAKE_AVX512 | m_GENERIC)
>
> @@ -84,7 +84,7 @@ DEF_TUNE (X86_TUNE_PARTIAL_FLAG_REG_STALL, "partial_flag_reg_stall",
>     partial dependencies.  */
>  DEF_TUNE (X86_TUNE_MOVX, "movx",
>            m_PPRO | m_P4_NOCONA | m_CORE2 | m_NEHALEM | m_SANDYBRIDGE
> -          | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL
> +          | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL | m_HASWELL
>            | m_GEODE | m_AMD_MULTIPLE | m_SKYLAKE_AVX512 | m_GENERIC)
>
>  /* X86_TUNE_MEMORY_MISMATCH_STALL: Avoid partial stores that are followed by
> --
> 2.17.0
>