[Bug libfortran/78379] Processor-specific versions for matmul

2017-05-25 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379
Thomas Koenig changed:
What      |Removed |Added
Status    |ASSIGNED|RESOLVED
Resolution|---

[Bug libfortran/78379] Processor-specific versions for matmul

2017-05-25 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #37 from Thomas Koenig --- Author: tkoenig Date: Thu May 25 21:51:27 2017 New Revision: 248472 URL: https://gcc.gnu.org/viewcvs?rev=248472&root=gcc&view=rev Log: 2017-05-25 Thomas Koenig PR libfortran/78379 * Make

[Bug libfortran/78379] Processor-specific versions for matmul

2017-05-24 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #36 from Jerry DeLisle --- Results look very good. Gfortran 7, no patch gives: $ gfc7 -static -Ofast -ftree-vectorize compare.f90 $ ./a.out = ME

[Bug libfortran/78379] Processor-specific versions for matmul

2017-05-24 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #35 from Jerry DeLisle --- (In reply to Thomas Koenig from comment #34) > Created attachment 41410 [details] > Patch which has all the files > > Well, I suspect my way of splitting the previous patch into > one real patch and one *.t

[Bug libfortran/78379] Processor-specific versions for matmul

2017-05-23 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379
Thomas Koenig changed:
What                         |Removed |Added
Attachment #41405 is obsolete|0       |1

[Bug libfortran/78379] Processor-specific versions for matmul

2017-05-23 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #33 from Jerry DeLisle --- (In reply to Thomas Koenig from comment #32) > Created attachment 41406 [details] > Additional files for the previous patch > > Here are the new files for the patch. Well I tried to apply the patch and tes

[Bug libfortran/78379] Processor-specific versions for matmul

2017-05-22 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #32 from Thomas Koenig --- Created attachment 41406 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41406&action=edit Additional files for the previous patch Here are the new files for the patch.

[Bug libfortran/78379] Processor-specific versions for matmul

2017-05-22 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379
Thomas Koenig changed:
What                         |Removed |Added
Attachment #40120 is obsolete|0       |1

[Bug libfortran/78379] Processor-specific versions for matmul

2017-05-07 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #30 from Thomas Koenig --- I think there still is one thing to do. Apparently, AMD CPUs (which use only vanilla at the moment) are slightly faster with -mprefer-avx128, and they should be much faster if they have FMA3. Unless I miss

[Bug libfortran/78379] Processor-specific versions for matmul

2017-03-03 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #29 from Thomas Koenig --- (In reply to David Edelsohn from comment #28) > Because PPC64LE Linux reset the base ISA level, VSX now is enabled by > default, so a function clone for VSX probably isn't necessary. While > special version

[Bug libfortran/78379] Processor-specific versions for matmul

2017-03-03 Thread dje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #28 from David Edelsohn --- Because PPC64LE Linux reset the base ISA level, VSX now is enabled by default, so a function clone for VSX probably isn't necessary. While special versions might help AIX and PPC64BE, with lower ISA defaul

[Bug libfortran/78379] Processor-specific versions for matmul

2017-03-03 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #27 from Thomas Koenig --- (In reply to David Edelsohn from comment #26) > What is AVX-specific, as opposed to SIMD vector size-specific, about this > feature? It seems that this should be enabled for all SIMD architectures of > the

[Bug libfortran/78379] Processor-specific versions for matmul

2017-03-02 Thread dje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379
David Edelsohn changed:
What|Removed |Added
CC  |        |dje at gcc dot gnu.org
--- Comment #26

[Bug libfortran/78379] Processor-specific versions for matmul

2017-03-02 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #25 from Thomas Koenig --- Author: tkoenig Date: Thu Mar 2 11:04:01 2017 New Revision: 245836 URL: https://gcc.gnu.org/viewcvs?rev=245836&root=gcc&view=rev Log: 2017-03-02 Thomas Koenig PR fortran/78379 * m4/matm

[Bug libfortran/78379] Processor-specific versions for matmul

2017-02-27 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #24 from Thomas Koenig --- Could be a good idea to add a version with -mfma to the flags for AVX2. I'll see what I can do. It might be too late for gcc 7, and I also don't have an AVX2 machine to test on. Might also be a good idea t

[Bug libfortran/78379] Processor-specific versions for matmul

2016-12-03 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379
Dominique d'Humieres changed:
What            |Removed     |Added
Status          |UNCONFIRMED |NEW
Last reconfirmed|

[Bug libfortran/78379] Processor-specific versions for matmul

2016-12-03 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #22 from Thomas Koenig --- Author: tkoenig Date: Sat Dec 3 09:44:35 2016 New Revision: 243219 URL: https://gcc.gnu.org/viewcvs?rev=243219&root=gcc&view=rev Log: 2016-12-03 Thomas Koenig PR fortran/78379 * config/

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-22 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #21 from Jerry DeLisle --- (In reply to Thomas Koenig from comment #19) > Created attachment 40120 [details] > Updated patch > > Well, here's an update also for AVX512F. > > I can confirm the patch gives the same performance as the

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-22 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #20 from Jerry DeLisle --- (In reply to Thomas Koenig from comment #18) > Created attachment 40119 [details] > Version that works (AVX only) > > Here is a version that should only do AVX stuff on Intel processors. > Optimization for

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-22 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379
Thomas Koenig changed:
What                         |Removed |Added
Attachment #40119 is obsolete|0       |1

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-22 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #18 from Thomas Koenig --- Created attachment 40119 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40119&action=edit Version that works (AVX only) Here is a version that should only do AVX stuff on Intel processors. Optimizatio

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-20 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #17 from Jerry DeLisle --- On a hunch, this brings it back.
    $(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ffast-math -ftree-vectorize -funroll-loops --param max-unroll-times=4 -march=native
So -march=native fixes it. n

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-20 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #16 from Jerry DeLisle --- (In reply to Thomas Koenig from comment #15) > OMG, the world of processors is more complicated than I thought. > So, these rather modern AMD chips support AVX, but suck at it. > > Two questions: > > - Can

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-20 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #15 from Thomas Koenig --- OMG, the world of processors is more complicated than I thought. So, these rather modern AMD chips support AVX, but suck at it. Two questions: - Can you check if -mfma3 and/or -mfma4 make any difference?

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-19 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #14 from Jerry DeLisle --- (In reply to Thomas Koenig from comment #12) > I read some advice on the net that certain types of AMD processors > have AVX, but AVX128 is better for them. > > What exactly is your CPU model? What does /p

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-19 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #13 from Thomas Koenig --- OK, I think I have a rough idea how to do this. For querying the CPU model, we need to put the interface in libgcc/config/i386/cpuinfo.c into a separate header. Then we generate a list of matmul functions
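The idea in comment #13 (query the CPU, then pick from a list of matmul functions) can be illustrated with a minimal C sketch. This is not the libgfortran patch: it uses the GCC builtins `__builtin_cpu_init` and `__builtin_cpu_supports` rather than a separate header for libgcc/config/i386/cpuinfo.c, and the names `matmul_fn`, `matmul_kernel`, `matmul_vanilla`, `matmul_avx`, `select_matmul` and `matmul_dispatch` are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical signature: square n x n matrices, column-major as in Fortran. */
typedef void (*matmul_fn)(size_t n, const double *a, const double *b, double *c);

/* Shared triple loop; each wrapper below compiles it for a different ISA. */
static inline void matmul_kernel(size_t n, const double *a, const double *b,
                                 double *c)
{
  for (size_t j = 0; j < n; j++)
    for (size_t i = 0; i < n; i++) {
      double s = 0.0;
      for (size_t k = 0; k < n; k++)
        s += a[i + k * n] * b[k + j * n];
      c[i + j * n] = s;
    }
}

/* Baseline version, built with the default ISA flags. */
static void matmul_vanilla(size_t n, const double *a, const double *b, double *c)
{
  matmul_kernel(n, a, b, c);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same source, but the compiler may use AVX instructions here. */
static void __attribute__((target("avx")))
matmul_avx(size_t n, const double *a, const double *b, double *c)
{
  matmul_kernel(n, a, b, c);
}
#endif

/* Pick an implementation from what the running CPU reports. */
static matmul_fn select_matmul(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx"))
    return matmul_avx;
#endif
  return matmul_vanilla;
}

/* Entry point: resolve once on first call, then reuse the choice. */
void matmul_dispatch(size_t n, const double *a, const double *b, double *c)
{
  static matmul_fn chosen = NULL;
  if (chosen == NULL)
    chosen = select_matmul();
  chosen(n, a, b, c);
}
```

Callers would use only `matmul_dispatch`; on non-x86 targets, or on CPUs without AVX, the sketch falls back to the baseline version. The lazy initialisation shown here is not thread-safe and is kept simple on purpose.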

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-19 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #12 from Thomas Koenig --- (In reply to Jerry DeLisle from comment #11) > One could consider running a reference matrix multiply of size 32 in a loop > and do timing tests to determine whether to use -mprefer-avx128. On this > machine

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-18 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #11 from Jerry DeLisle --- One could consider running a reference matrix multiply of size 32 in a loop and do timing tests to determine whether to use -mprefer-avx128. On this machine from comment 8 mavx = 1.276 mavx mprefer-avx1
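The timing test suggested in comment #11 could look roughly like the following C sketch. It times a 32 x 32 reference multiply in a loop with `clock()` and returns whichever of two candidate functions ran faster. All names (`MATMUL_N`, `matmul32_fn`, `matmul32_ref`, `time_matmul32`, `pick_faster`) are hypothetical, and in a real test the two candidates would be the same kernel compiled with different flags (for example with and without -mprefer-avx128), which this sketch does not show.

```c
#include <stddef.h>
#include <time.h>

enum { MATMUL_N = 32 };  /* matrix size from the comment */

typedef void (*matmul32_fn)(const double *a, const double *b, double *c);

/* Reference multiply, column-major: x(i,j) is x[i + j * MATMUL_N]. */
void matmul32_ref(const double *a, const double *b, double *c)
{
  for (size_t j = 0; j < MATMUL_N; j++)
    for (size_t i = 0; i < MATMUL_N; i++) {
      double s = 0.0;
      for (size_t k = 0; k < MATMUL_N; k++)
        s += a[i + k * MATMUL_N] * b[k + j * MATMUL_N];
      c[i + j * MATMUL_N] = s;
    }
}

/* CPU time in seconds for `reps` calls of fn on fixed test data. */
double time_matmul32(matmul32_fn fn, int reps)
{
  static double a[MATMUL_N * MATMUL_N];
  static double b[MATMUL_N * MATMUL_N];
  static double c[MATMUL_N * MATMUL_N];
  volatile double sink = 0.0;  /* keeps the calls from being optimised away */

  for (size_t i = 0; i < MATMUL_N * MATMUL_N; i++) {
    a[i] = (double)(i % 7);
    b[i] = (double)(i % 5);
  }

  clock_t start = clock();
  for (int r = 0; r < reps; r++) {
    fn(a, b, c);
    sink += c[(size_t)r % (MATMUL_N * MATMUL_N)];
  }
  clock_t stop = clock();

  (void)sink;
  return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Return y only if it measured strictly faster than x. */
matmul32_fn pick_faster(matmul32_fn x, matmul32_fn y, int reps)
{
  double tx = time_matmul32(x, reps);
  double ty = time_matmul32(y, reps);
  return ty < tx ? y : x;
}
```

A single measurement like this is noisy, so any real use would need many repetitions and probably several runs; the sketch only shows the shape of the decision.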

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-18 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #10 from Jerry DeLisle --- (In reply to Thomas Koenig from comment #9) > Next question - what happens if you add > > -mvzeroupper -mavx > > to the line in the Makefile? Does that make a difference in speed? -mvzeroupper slows all

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-18 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #9 from Thomas Koenig --- Next question - what happens if you add -mvzeroupper -mavx to the line in the Makefile? Does that make a difference in speed?

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-17 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #8 from Jerry DeLisle --- (In reply to Thomas Koenig from comment #6) > > You may notice I was invoking the wrong executable in what I posted in > > comment #3. I did rerun the correct one several times and tried it with > > -mavx -mp

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-17 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #7 from Thomas Koenig --- And one more thing. Comparing the timing you get for the version with the target_clone and a version using just -mavx added to the relevant line in the Makefile, do you see a difference?

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-17 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #6 from Thomas Koenig --- > You may notice I was invoking the wrong executable in what I posted in > comment #3. I did rerun the correct one several times and tried it with > -mavx -mprefer-avx128. I get the same poor results regardl

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-17 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #5 from Jerry DeLisle --- (In reply to Uroš Bizjak from comment #4) > (In reply to Jerry DeLisle from comment #3) > > I did apply your second patch: > > > > I do not get any improvement and results are diminished from current trunk,

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-17 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #4 from Uroš Bizjak --- (In reply to Jerry DeLisle from comment #3) > I did apply your second patch: > > I do not get any improvement and results are diminished from current trunk, > so I am missing something. This is same machine I

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-17 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379
Jerry DeLisle changed:
What|Removed |Added
CC  |        |jvdelisle at gcc dot gnu.org
--- Comment

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-17 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #2 from Thomas Koenig --- Here are some measurements with the AVX-enabling patch. They were done on an AVX machine, namely gcc75 from the compile farm. This was done with the command line gfortran -static-libgfortran -finline-matmul

[Bug libfortran/78379] Processor-specific versions for matmul

2016-11-17 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379 --- Comment #1 from Thomas Koenig --- Created attachment 40074 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40074&action=edit Test program for benchmarks