On Wed, Mar 01, 2017 at 10:00:08PM +0100, Thomas Koenig wrote: > @@ -101,7 +93,7 @@ > `static void > 'matmul_name` ('rtype` * const restrict retarray, > 'rtype` * const restrict a, 'rtype` * const restrict b, int try_blas, > - int blas_limit, blas_call gemm) __attribute__((__target__("avx2"))); > + int blas_limit, blas_call gemm) __attribute__((__target__("avx2,fma"))); > static' include(matmul_internal.m4)dnl > `#endif /* HAVE_AVX2 */ >
I guess the question here is if there are any CPUs that have AVX2 but don't have FMA3. If there are none, then this is not controversial, if there are some, it depends on how widely they are used compared to ones that have both AVX2 and FMA3. Going just from our -march= bitsets, it seems if there is PTA_AVX2, then there is also PTA_FMA: haswell, broadwell, skylake, skylake-avx512, knl, bdver4, znver1, there are CPUs that have just PTA_AVX and not PTA_AVX2 and still have PTA_FMA: bdver2, bdver3 (but that is not relevant to this patch). > @@ -110,7 +102,7 @@ > `static void > 'matmul_name` ('rtype` * const restrict retarray, > 'rtype` * const restrict a, 'rtype` * const restrict b, int try_blas, > - int blas_limit, blas_call gemm) __attribute__((__target__("avx512f"))); > + int blas_limit, blas_call gemm) > __attribute__((__target__("avx512f,fma"))); > static' include(matmul_internal.m4)dnl > `#endif /* HAVE_AVX512F */ > I think this change is not needed, because the EVEX encoded VFMADD???[SP][DS] instructions etc. are in AVX512F ISA, not in FMA3 ISA (which has just the VEX encoded ones). Which is why I'm seeing the fmas in my libgfortran even without your patch. Thus I think you should remove this from your patch. > @@ -147,7 +141,8 @@ > #endif /* HAVE_AVX512F */ > > #ifdef HAVE_AVX2 > - if (__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX2)) > + if ((__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX2)) > + && (__cpu_model.__cpu_features[0] & (1 << FEATURE_FMA))) > { > matmul_p = matmul_'rtype_code`_avx2; > goto tailcall; and this too. Jakub