Hello world,

here is another, much revised, update of the AVX-specific matmul patch.

The processor-specific switching is now done directly, using the
machinery from libgcc. For this, I have moved information from
the i386-specific cpuinfo.c file to a new header file cpuinfo.h,
which is then accessed from the matmul function to select the
correct version for the detected CPU.
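
To illustrate the idea of the switching, here is a minimal sketch.
It is not the code from the patch: it uses GCC's public
__builtin_cpu_is / __builtin_cpu_supports builtins instead of
reading the model data through cpuinfo.h, and the names
matmul_vanilla, matmul_avx, select_matmul and matmul_fn are made
up for the example.

```c
#include <stddef.h>

/* Hypothetical signature: C = A * B for n x n row-major matrices.  */
typedef void (*matmul_fn) (const double *a, const double *b,
                           double *c, size_t n);

/* Fallback version, compiled with the default target flags.  */
static void
matmul_vanilla (const double *a, const double *b, double *c, size_t n)
{
  for (size_t i = 0; i < n * n; i++)
    c[i] = 0.0;
  for (size_t i = 0; i < n; i++)
    for (size_t k = 0; k < n; k++)
      for (size_t j = 0; j < n; j++)
        c[i * n + j] += a[i * n + k] * b[k * n + j];
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same loop nest, but the compiler may use AVX instructions here.  */
__attribute__ ((target ("avx")))
static void
matmul_avx (const double *a, const double *b, double *c, size_t n)
{
  for (size_t i = 0; i < n * n; i++)
    c[i] = 0.0;
  for (size_t i = 0; i < n; i++)
    for (size_t k = 0; k < n; k++)
      for (size_t j = 0; j < n; j++)
        c[i * n + j] += a[i * n + k] * b[k * n + j];
}
#endif

/* Pick a version for the CPU we are running on.  The AVX version
   is only chosen on Intel processors that report AVX support.  */
static matmul_fn
select_matmul (void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init ();
  if (__builtin_cpu_is ("intel") && __builtin_cpu_supports ("avx"))
    return matmul_avx;
#endif
  return matmul_vanilla;
}

/* Entry point: select once, then call through the pointer.
   The caching here is not synchronized; a real library would
   have to think about threads.  */
void
matmul (const double *a, const double *b, double *c, size_t n)
{
  static matmul_fn impl = NULL;
  if (impl == NULL)
    impl = select_matmul ();
  impl (a, b, c, n);
}
```

A caller just uses matmul (a, b, c, n); the selection happens on
the first call.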

For matmul itself, the workhorse function was put into its
own file, which is then included multiple times with
name and target attributes set correctly.
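
As a rough single-file analogue of this scheme (the patch does it
with m4 includes, which I cannot show in one piece here), the
following sketch stamps out the same function body several times
with a macro, each time with a different name and target
attribute.  DEFINE_MATMUL, TARGET, NO_TARGET and the function
names are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Only attach target attributes where they are understood.  */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
# define TARGET(isa) __attribute__ ((target (isa)))
#else
# define TARGET(isa)
#endif
#define NO_TARGET

/* The "workhorse": one body, instantiated under several names.
   Computes C = A * B for n x n row-major matrices.  */
#define DEFINE_MATMUL(NAME, ATTR)                                   \
  ATTR void                                                         \
  NAME (const double *a, const double *b, double *c, size_t n)      \
  {                                                                 \
    for (size_t i = 0; i < n * n; i++)                              \
      c[i] = 0.0;                                                   \
    for (size_t i = 0; i < n; i++)                                  \
      for (size_t k = 0; k < n; k++)                                \
        for (size_t j = 0; j < n; j++)                              \
          c[i * n + j] += a[i * n + k] * b[k * n + j];              \
  }

/* Three copies of the same source, compiled for different ISAs.  */
DEFINE_MATMUL (matmul_vanilla, NO_TARGET)
DEFINE_MATMUL (matmul_avx, TARGET ("avx"))
DEFINE_MATMUL (matmul_avx2, TARGET ("avx2,fma"))
```

Only matmul_vanilla is safe to call unconditionally; the other
two must be reached through a CPU check, otherwise they may
execute instructions the processor does not have.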

So far, this patch is Intel only.  Jerry's benchmarks indicated
that AVX is actually slower on AMD chips.  Some googling reveals
that other people have had similar experiences.

Using AVX128 for AMD processors would be somewhat beneficial,
but that currently cannot be specified as a target attribute.
I'll leave that for later.

As an added bonus, I added some m4 hacks to disable both
AVX and AVX2 code generation for REAL.

So, what do you think?  Is this the right way forward, especially
regarding the CPU detection part?

Regards

    Thomas

2016-11-27  Thomas Koenig  <tkoe...@gcc.gnu.org>

        PR fortran/78379
        * config/i386/cpuinfo.c:  Move enums for processor vendors,
        processor type, processor subtypes and declaration of
        struct __processor_model into
        * config/i386/cpuinfo.h:  New header file.
        * Makefile.am:  Add dependence of m4/matmul_internal.m4 to
        matmul files.
        * Makefile.in:  Regenerated.
        * acinclude.m4:  Check for AVX, AVX2 and AVX512F.
        * config.h.in:  Add HAVE_AVX, HAVE_AVX2 and HAVE_AVX512F.
        * configure:  Regenerated.
        * configure.ac:  Use checks for AVX, AVX2 and AVX512F.
        * m4/matmul_internal.m4:  New file, working part of matmul.m4.
        * m4/matmul.m4:  Implement architecture-specific switching
        for AVX, AVX2 and AVX512F by including matmul_internal.m4
        multiple times.
        * generated/matmul_c10.c: Regenerated.
        * generated/matmul_c16.c: Regenerated.
        * generated/matmul_c4.c: Regenerated.
        * generated/matmul_c8.c: Regenerated.
        * generated/matmul_i1.c: Regenerated.
        * generated/matmul_i16.c: Regenerated.
        * generated/matmul_i2.c: Regenerated.
        * generated/matmul_i4.c: Regenerated.
        * generated/matmul_i8.c: Regenerated.
        * generated/matmul_r10.c: Regenerated.
        * generated/matmul_r16.c: Regenerated.
        * generated/matmul_r4.c: Regenerated.
        * generated/matmul_r8.c: Regenerated.


[Full patch at https://gcc.gnu.org/ml/fortran/2016-11/msg00246.html ,
this was rejected for reasons of size]
