Hello world, here is another, much revised, update of the AVX-specific matmul patch.

The processor-specific switching is now done directly, using the
machinery from libgcc.  For this, I have moved information from the
i386-specific cpuinfo.c file to a new header file cpuinfo.h, which is
then accessed from the matmul function to select the correct version
for the detected CPU.

For matmul itself, the workhorse function was put into its own file,
which is then included multiple times with name and target attributes
set correctly.

So far, this patch is Intel only.  Jerry's benchmarks indicated that
AVX is actually slower on AMD chips.  Some googling reveals that other
people have had similar experiences.  Using AVX128 for AMD processors
would be somewhat beneficial, but that currently cannot be specified
as a target attribute.  I'll leave that for later.

As an added bonus, I added some m4 hacks to disable both AVX and AVX2
code generation for REAL.

So, what do you think?  Is this the right way forward, especially
regarding the CPU detection part?

Regards

	Thomas

2016-11-27  Thomas Koenig  <tkoe...@gcc.gnu.org>

	PR fortran/78379
	* config/i386/cpuinfo.c: Move enums for processor vendors,
	processor type, processor subtypes and declaration of
	struct __processor_model into
	* config/i386/cpuinfo.h: New header file.
	* Makefile.am: Add dependence of m4/matmul_internal.m4 to
	matmul files.
	* Makefile.in: Regenerated.
	* acinclude.m4: Check for AVX, AVX2 and AVX512F.
	* config.h.in: Add HAVE_AVX, HAVE_AVX2 and HAVE_AVX512F.
	* configure: Regenerated.
	* configure.ac: Use checks for AVX, AVX2 and AVX512F.
	* m4/matmul_internal.m4: New file, working part of matmul.m4.
	* m4/matmul.m4: Implement architecture-specific switching
	for AVX, AVX2 and AVX512F by including matmul_internal.m4
	multiple times.
	* generated/matmul_c10.c: Regenerated.
	* generated/matmul_c16.c: Regenerated.
	* generated/matmul_c4.c: Regenerated.
	* generated/matmul_c8.c: Regenerated.
	* generated/matmul_i1.c: Regenerated.
	* generated/matmul_i16.c: Regenerated.
	* generated/matmul_i2.c: Regenerated.
	* generated/matmul_i4.c: Regenerated.
	* generated/matmul_i8.c: Regenerated.
	* generated/matmul_r10.c: Regenerated.
	* generated/matmul_r16.c: Regenerated.
	* generated/matmul_r4.c: Regenerated.
	* generated/matmul_r8.c: Regenerated.

[Full patch at https://gcc.gnu.org/ml/fortran/2016-11/msg00246.html ;
it was rejected for reasons of size]
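
To give a feel for the switching scheme described in this mail, here is
a minimal sketch in C.  It is not the code from the patch.  The public
builtins __builtin_cpu_init, __builtin_cpu_is and __builtin_cpu_supports
stand in for reading the libgcc data through cpuinfo.h, the macro
DEFINE_MATMUL (hypothetical) stands in for including the workhorse file
several times, and all function names are made up for illustration.  The
naive triple loop is only a placeholder for the real kernel.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*matmul_fn) (double *, const double *, const double *, size_t);

/* Stamp out one copy of the same kernel per target.  NAME is the function
   name, ATTR the (possibly empty) target attribute.  This plays the role
   of including one workhorse file several times; it is an assumption made
   for the sketch, not the mechanism used in the patch.  */
#define DEFINE_MATMUL(NAME, ATTR)                                         \
  ATTR static void                                                        \
  NAME (double *c, const double *a, const double *b, size_t n)            \
  {                                                                       \
    for (size_t i = 0; i < n; i++)                                        \
      for (size_t j = 0; j < n; j++)                                      \
        {                                                                 \
          double s = 0.0;                                                 \
          for (size_t k = 0; k < n; k++)                                  \
            s += a[i * n + k] * b[k * n + j];                             \
          c[i * n + j] = s;                                               \
        }                                                                 \
  }

/* Only build the AVX variants where the target attribute is meaningful.  */
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
#define SKETCH_X86_DISPATCH 1
DEFINE_MATMUL (matmul_sketch_avx, __attribute__ ((target ("avx"))))
DEFINE_MATMUL (matmul_sketch_avx2, __attribute__ ((target ("avx2"))))
#endif

/* Fallback compiled with the default target flags.  */
DEFINE_MATMUL (matmul_sketch_vanilla, )

/* Pick a variant once, based on the CPU we are running on.  Intel only,
   mirroring the restriction mentioned in the mail.  */
static matmul_fn
select_matmul (void)
{
#ifdef SKETCH_X86_DISPATCH
  __builtin_cpu_init ();
  if (__builtin_cpu_is ("intel"))
    {
      if (__builtin_cpu_supports ("avx2"))
        return matmul_sketch_avx2;
      if (__builtin_cpu_supports ("avx"))
        return matmul_sketch_avx;
    }
#endif
  return matmul_sketch_vanilla;
}

/* Entry point: C = A * B for square n-by-n row-major matrices.  */
void
matmul_dispatch (double *c, const double *a, const double *b, size_t n)
{
  static matmul_fn fn = NULL;   /* cached after the first call */
  if (fn == NULL)
    fn = select_matmul ();
  assert (fn != NULL);
  fn (c, a, b, n);
}
```

Callers only ever see matmul_dispatch; which variant runs is decided on
the first call and cached in a function pointer afterwards.  The cache
is not thread-safe as written, which a real implementation would have
to address.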