On Thu, Oct 10, 2013 at 08:40:05PM +0200, Jan Hubicka wrote:
> --- config/i386/x86-tune.def (revision 203387)
> +++ config/i386/x86-tune.def (working copy)
> +/* X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL: if true, unaligned loads are
> + split. */
> +DEF_TUNE (X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL,
> "256_unaligned_load_optimal",
> + ~(m_COREI7 | m_GENERIC))
> +
> +/* X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL: if true, unaligned loads are
s/loads/stores/
> + split. */
> +DEF_TUNE (X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL,
> "256_unaligned_load_optimal",
> + ~(m_COREI7 | m_BDVER | m_GENERIC))
s/load/store/ in the name string.
Also, I wonder whether we could improve the generated code for
-mavx2 -mtune=generic, -march=core-avx2 -mtune=generic, etc.
m_GENERIC is clearly included because vmovup{s,d} was really bad
on SandyBridge (am I right here?), but if the ISA includes AVX2, the
code will not run on that chip at all, so can't we override it?
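The override suggested above could look roughly like the following
standalone C sketch. It is not GCC code: struct tune_state, its fields,
should_split_unaligned_load and check_sketch are hypothetical names made
up for illustration, and the real logic would have to live in
ix86_option_override_internal and use target_flags and
ix86_tune_features instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the compiler state consulted by the option
   override code.  None of these names exist in GCC.  */
struct tune_state
{
  bool isa_has_avx2;           /* -mavx2, or an -march= that implies it.  */
  bool unaligned_load_optimal; /* Tuning says 256-bit unaligned loads are OK.  */
  bool split_load_explicit;    /* User passed -m[no-]avx256-split-unaligned-load.  */
  bool split_load;             /* Value the user asked for, if explicit.  */
};

/* Decide whether 256-bit unaligned loads should be split.  */
static bool
should_split_unaligned_load (const struct tune_state *s)
{
  /* An explicit command-line choice always wins.  */
  if (s->split_load_explicit)
    return s->split_load;

  /* Assumption from the discussion above: code that requires AVX2 can
     never run on SandyBridge, so the tuning penalty does not apply.  */
  if (s->isa_has_avx2)
    return false;

  /* Otherwise follow the tuning flag.  */
  return !s->unaligned_load_optimal;
}

/* Small self-check of the three cases.  */
static void
check_sketch (void)
{
  struct tune_state avx_generic = { .unaligned_load_optimal = false };
  struct tune_state avx2_generic = { .isa_has_avx2 = true };
  struct tune_state forced = { .isa_has_avx2 = true,
                               .split_load_explicit = true,
                               .split_load = true };

  assert (should_split_unaligned_load (&avx_generic));
  assert (!should_split_unaligned_load (&avx2_generic));
  assert (should_split_unaligned_load (&forced));
}
```

The store case would be the same apart from the flag names. Whether
AVX2 is the right condition, or whether it should be some explicit
"cannot be SandyBridge" property, is left open here.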
> @@ -3946,10 +3933,10 @@ ix86_option_override_internal (bool main
> if (flag_expensive_optimizations
> && !(target_flags_explicit & MASK_VZEROUPPER))
> target_flags |= MASK_VZEROUPPER;
> - if ((x86_avx256_split_unaligned_load & ix86_tune_mask)
> + if (!ix86_tune_features[X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL]
Didn't you mean to use X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL here?
> && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
> target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;
> - if ((x86_avx256_split_unaligned_store & ix86_tune_mask)
> + if (!ix86_tune_features[X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL]
And similarly X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL here?
> && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
> target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE;
> /* Enable 128-bit AVX instruction generation
Jakub