https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379
Thomas Koenig changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
--- Comment #37 from Thomas Koenig ---
Author: tkoenig
Date: Thu May 25 21:51:27 2017
New Revision: 248472
URL: https://gcc.gnu.org/viewcvs?rev=248472&root=gcc&view=rev
Log:
2017-05-25 Thomas Koenig
PR libfortran/78379
* Make
--- Comment #36 from Jerry DeLisle ---
Results look very good.
Gfortran 7, no patch gives:
$ gfc7 -static -Ofast -ftree-vectorize compare.f90
$ ./a.out
=
ME
--- Comment #35 from Jerry DeLisle ---
(In reply to Thomas Koenig from comment #34)
> Created attachment 41410 [details]
> Patch which has all the files
>
> Well, I suspect my way of splitting the previous patch into
> one real patch and one *.t
Thomas Koenig changed:
What|Removed |Added
Attachment #41405|0 |1
is obsolete|
--- Comment #33 from Jerry DeLisle ---
(In reply to Thomas Koenig from comment #32)
> Created attachment 41406 [details]
> Additional files for the previous patch
>
> Here are the new files for the patch.
Well I tried to apply the patch and tes
--- Comment #32 from Thomas Koenig ---
Created attachment 41406
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41406&action=edit
Additional files for the previous patch
Here are the new files for the patch.
Thomas Koenig changed:
What|Removed |Added
Attachment #40120|0 |1
is obsolete|
--- Comment #30 from Thomas Koenig ---
I think there still is one thing to do.
Apparently, AMD CPUs (which use only vanilla at
the moment) are slightly faster with -mprefer-avx128,
and they should be much faster if they have FMA3.
Unless I miss
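The selection rule suggested in comment #30 can be written as a small pure function, which makes it testable without access to each CPU. This is only a sketch: the enum values, the `cpu_facts` struct, and the separate 128-bit AVX builds for AMD are illustrative assumptions, not the code that was committed.

```c
#include <stdbool.h>

/* Hypothetical names for the matmul builds discussed in this PR. */
enum matmul_variant {
  MATMUL_VANILLA,    /* generic build, no ISA extensions assumed       */
  MATMUL_AVX,        /* built with -mavx                               */
  MATMUL_AVX2,       /* built with -mavx2 -mfma                        */
  MATMUL_AVX512F,    /* built with -mavx512f                           */
  MATMUL_AVX128,     /* hypothetical: -mavx -mprefer-avx128            */
  MATMUL_AVX128_FMA  /* hypothetical: -mavx -mfma -mprefer-avx128      */
};

/* CPU facts as a runtime check would report them (hypothetical struct). */
struct cpu_facts {
  bool intel, amd;
  bool avx, avx2, fma, avx512f;
};

/* Intel: the widest supported vector ISA wins.
   AMD: prefer 128-bit AVX, adding FMA3 when available (comment #30).
   Anything else stays on the vanilla build. */
enum matmul_variant pick_matmul_variant(struct cpu_facts c)
{
  if (c.intel) {
    if (c.avx512f)
      return MATMUL_AVX512F;
    if (c.avx2 && c.fma)
      return MATMUL_AVX2;
    if (c.avx)
      return MATMUL_AVX;
  } else if (c.amd && c.avx) {
    return c.fma ? MATMUL_AVX128_FMA : MATMUL_AVX128;
  }
  return MATMUL_VANILLA;
}
```

Keeping the policy separate from the CPU query means the table of "which vendor gets which build" can be changed and unit-tested without touching the detection code.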
--- Comment #29 from Thomas Koenig ---
(In reply to David Edelsohn from comment #28)
> Because PPC64LE Linux reset the base ISA level, VSX now is enabled by
> default, so a function clone for VSX probably isn't necessary. While
> special version
--- Comment #28 from David Edelsohn ---
Because PPC64LE Linux reset the base ISA level, VSX now is enabled by default,
so a function clone for VSX probably isn't necessary. While special versions
might help AIX and PPC64BE, with lower ISA defaul
--- Comment #27 from Thomas Koenig ---
(In reply to David Edelsohn from comment #26)
> What is AVX-specific, as opposed to SIMD vector size-specific, about this
> feature? It seems that this should be enabled for all SIMD architectures of
> the
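One way to read the question in comment #26 is that the kernel could be parameterized by vector width rather than by ISA name. Below is a minimal sketch using GCC's generic vector extensions (`vector_size`), assuming a hypothetical `VEC_BYTES` build knob. The compiler lowers the same source to AVX, SSE, VSX, or scalar code depending on the target, synthesizing wider vectors from narrower operations when the hardware lacks them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical knob: 16 for SSE/VSX/NEON-class targets, 32 for AVX,
   64 for AVX-512. */
#ifndef VEC_BYTES
#define VEC_BYTES 32
#endif

typedef double vdouble __attribute__((vector_size(VEC_BYTES)));
#define VLEN (VEC_BYTES / sizeof(double))

/* y[0..n) += alpha * x[0..n) using generic vectors, scalar tail. */
static void axpy_vec(size_t n, double alpha, const double *x, double *y)
{
  vdouble va = {0};
  for (size_t l = 0; l < VLEN; l++)
    va[l] = alpha;                   /* broadcast alpha to all lanes */
  size_t j = 0;
  for (; j + VLEN <= n; j += VLEN) {
    vdouble vx, vy;
    memcpy(&vx, x + j, sizeof vx);   /* memcpy avoids alignment assumptions */
    memcpy(&vy, y + j, sizeof vy);
    vy += va * vx;
    memcpy(y + j, &vy, sizeof vy);
  }
  for (; j < n; j++)
    y[j] += alpha * x[j];
}

/* c = a * b for row-major n x n matrices (i-k-j order). */
void matmul_generic_vec(const double *a, const double *b, double *c, size_t n)
{
  for (size_t i = 0; i < n * n; i++)
    c[i] = 0.0;
  for (size_t i = 0; i < n; i++)
    for (size_t k = 0; k < n; k++)
      axpy_vec(n, a[i * n + k], b + k * n, c + i * n);
}
```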
David Edelsohn changed:
What|Removed |Added
CC||dje at gcc dot gnu.org
--- Comment #26
--- Comment #25 from Thomas Koenig ---
Author: tkoenig
Date: Thu Mar 2 11:04:01 2017
New Revision: 245836
URL: https://gcc.gnu.org/viewcvs?rev=245836&root=gcc&view=rev
Log:
2017-03-02 Thomas Koenig
PR fortran/78379
* m4/matm
--- Comment #24 from Thomas Koenig ---
Could be a good idea to add a version with -mfma to the flags for AVX2.
I'll see what I can do. It might be too late for gcc 7, and I also
don't have an AVX2 machine to test on.
Might also be a good idea t
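As a sketch for comment #24, the following shows what the AVX2 path gains from `-mfma`. It writes the inner update of an i-k-j matmul with AVX2/FMA intrinsics inside a function compiled via GCC's `target("avx2,fma")` attribute, and checks the CPU at run time before calling it. The function names and the row-major layout are assumptions for illustration. AVX2 and FMA3 are separate CPUID feature bits, so both are checked.

```c
#include <stddef.h>
#include <immintrin.h>

/* y[0..n) += alpha * x[0..n): the inner update of an i-k-j matmul. */
static void axpy_scalar(size_t n, double alpha, const double *x, double *y)
{
  for (size_t j = 0; j < n; j++)
    y[j] += alpha * x[j];
}

/* Same update using 256-bit registers and fused multiply-add; the
   target attribute enables these instructions for this function only,
   much like compiling it with -mavx2 -mfma. */
__attribute__((target("avx2,fma")))
static void axpy_avx2_fma(size_t n, double alpha, const double *x, double *y)
{
  __m256d va = _mm256_set1_pd(alpha);
  size_t j = 0;
  for (; j + 4 <= n; j += 4) {
    __m256d vy = _mm256_loadu_pd(y + j);
    vy = _mm256_fmadd_pd(va, _mm256_loadu_pd(x + j), vy); /* va*x + vy */
    _mm256_storeu_pd(y + j, vy);
  }
  for (; j < n; j++)  /* scalar tail when n is not a multiple of 4 */
    y[j] += alpha * x[j];
}

/* c = a * b for row-major n x n matrices, kernel chosen at run time. */
void matmul_fma_dispatch(const double *a, const double *b, double *c, size_t n)
{
  __builtin_cpu_init();
  /* AVX2 does not imply FMA3: they are reported by separate CPUID bits. */
  int use_fma = __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
  for (size_t i = 0; i < n * n; i++)
    c[i] = 0.0;
  for (size_t i = 0; i < n; i++)
    for (size_t k = 0; k < n; k++) {
      if (use_fma)
        axpy_avx2_fma(n, a[i * n + k], b + k * n, c + i * n);
      else
        axpy_scalar(n, a[i * n + k], b + k * n, c + i * n);
    }
}
```

Results from the FMA path can differ from the scalar path in the last bits, because a fused multiply-add rounds once instead of twice.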
Dominique d'Humieres changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Last reconfirmed|
--- Comment #22 from Thomas Koenig ---
Author: tkoenig
Date: Sat Dec 3 09:44:35 2016
New Revision: 243219
URL: https://gcc.gnu.org/viewcvs?rev=243219&root=gcc&view=rev
Log:
2016-12-03 Thomas Koenig
PR fortran/78379
* config/
--- Comment #21 from Jerry DeLisle ---
(In reply to Thomas Koenig from comment #19)
> Created attachment 40120 [details]
> Updated patch
>
> Well, here's an update also for AVX512F.
>
> I can confirm the patch gives the same performance as the
--- Comment #20 from Jerry DeLisle ---
(In reply to Thomas Koenig from comment #18)
> Created attachment 40119 [details]
> Version that works (AVX only)
>
> Here is a version that should only do AVX stuff on Intel processors.
> Optimization for
Thomas Koenig changed:
What|Removed |Added
Attachment #40119|0 |1
is obsolete|
--- Comment #18 from Thomas Koenig ---
Created attachment 40119
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40119&action=edit
Version that works (AVX only)
Here is a version that should only do AVX stuff on Intel processors.
Optimizatio
--- Comment #17 from Jerry DeLisle ---
On a hunch, this brings it back.
$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ffast-math -ftree-vectorize -funroll-loops --param max-unroll-times=4 -march=native
So -march=native fixes it. n
--- Comment #16 from Jerry DeLisle ---
(In reply to Thomas Koenig from comment #15)
> OMG, the world of processors is more complicated than I thought.
> So, these rather modern AMD chips support AVX, but suck at it.
>
> Two questions:
>
> - Can
--- Comment #15 from Thomas Koenig ---
OMG, the world of processors is more complicated than I thought.
So, these rather modern AMD chips support AVX, but suck at it.
Two questions:
- Can you check if -mfma3 and/or -mfma4 make any difference?
--- Comment #14 from Jerry DeLisle ---
(In reply to Thomas Koenig from comment #12)
> I read some advice on the net that certain types of AMD processors
> have AVX, but AVX128 is better for them.
>
> What exactly is your CPU model? What does /p
--- Comment #13 from Thomas Koenig ---
OK, I think I have a rough idea how to do this.
For querying the CPU model, we need to put the interface in
libgcc/config/i386/cpuinfo.c into a separate header.
Then we generate a list of matmul functions
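The following is a rough sketch of the "list of matmul functions" idea from comment #13. Rather than exporting the interface of libgcc/config/i386/cpuinfo.c, it uses GCC's `__builtin_cpu_supports`, which reads the data that libgcc's CPU detection fills in. The table layout and all function names are hypothetical. `__builtin_cpu_supports` accepts only string literals, so each entry carries a small predicate function rather than a feature string.

```c
#include <stddef.h>

typedef void (*matmul_fn)(const double *, const double *, double *, size_t);

/* One portable source, instantiated once per target (hypothetical names). */
#define DEFINE_MATMUL(name, attr)                                      \
  attr static void name(const double *a, const double *b, double *c,   \
                        size_t n)                                      \
  {                                                                    \
    for (size_t i = 0; i < n; i++)                                     \
      for (size_t j = 0; j < n; j++) {                                 \
        double s = 0.0;                                                \
        for (size_t k = 0; k < n; k++)                                 \
          s += a[i * n + k] * b[k * n + j];                            \
        c[i * n + j] = s;                                              \
      }                                                                \
  }

DEFINE_MATMUL(matmul_avx512f, __attribute__((target("avx512f"))))
DEFINE_MATMUL(matmul_avx2, __attribute__((target("avx2,fma"))))
DEFINE_MATMUL(matmul_avx, __attribute__((target("avx"))))
DEFINE_MATMUL(matmul_vanilla, )

/* Predicates: the builtin needs a literal feature name in each call. */
static int has_avx512f(void) { return __builtin_cpu_supports("avx512f"); }
static int has_avx2_fma(void)
{
  return __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
}
static int has_avx(void) { return __builtin_cpu_supports("avx"); }
static int always_usable(void) { return 1; }

struct matmul_entry {
  const char *name;
  int (*usable)(void);
  matmul_fn fn;
};

/* Ordered from most to least preferred; the last entry always matches. */
static const struct matmul_entry matmul_table[] = {
  { "avx512f", has_avx512f, matmul_avx512f },
  { "avx2+fma", has_avx2_fma, matmul_avx2 },
  { "avx", has_avx, matmul_avx },
  { "vanilla", always_usable, matmul_vanilla },
};

const struct matmul_entry *select_matmul(void)
{
  __builtin_cpu_init();
  for (size_t i = 0; i < sizeof matmul_table / sizeof matmul_table[0]; i++)
    if (matmul_table[i].usable())
      return &matmul_table[i];
  return NULL; /* not reached: the vanilla entry always matches */
}
```

The order of the table encodes the preference. Vendor checks such as `__builtin_cpu_is("intel")` could be added to the predicates to express the Intel-only AVX rule discussed in comments #15 to #18.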
--- Comment #12 from Thomas Koenig ---
(In reply to Jerry DeLisle from comment #11)
> One could consider running a reference matrix multiply of size 32 in a loop
> and do timing tests to determine whether to use -mprefer-avx128. On this
> machine
--- Comment #11 from Jerry DeLisle ---
One could consider running a reference matrix multiply of size 32 in a loop and
do timing tests to determine whether to use -mprefer-avx128. On this machine
from comment 8
mavx = 1.276 mavx mprefer-avx1
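Comment #11's idea of choosing a variant by timing a size-32 reference multiply at startup could be sketched as follows. The two candidates here are stand-ins (two loop orders of the same portable code) for builds compiled with and without `-mprefer-avx128`. The repetition count and the use of `clock()` (processor time) are assumptions. A real implementation would also have to weigh the startup cost and timing noise against the gain.

```c
#include <stddef.h>
#include <time.h>

#define TUNE_N 32     /* matrix size suggested in comment #11 */
#define TUNE_REPS 50  /* hypothetical repetition count */

typedef void (*matmul_fn)(const double *, const double *, double *, size_t);

/* Stand-in candidate 1: i-j-k loop order. */
static void matmul_ijk(const double *a, const double *b, double *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++) {
      double s = 0.0;
      for (size_t k = 0; k < n; k++)
        s += a[i * n + k] * b[k * n + j];
      c[i * n + j] = s;
    }
}

/* Stand-in candidate 2: i-k-j loop order (unit stride over b and c). */
static void matmul_ikj(const double *a, const double *b, double *c, size_t n)
{
  for (size_t i = 0; i < n * n; i++)
    c[i] = 0.0;
  for (size_t i = 0; i < n; i++)
    for (size_t k = 0; k < n; k++)
      for (size_t j = 0; j < n; j++)
        c[i * n + j] += a[i * n + k] * b[k * n + j];
}

static matmul_fn candidates[] = { matmul_ijk, matmul_ikj };
#define N_CANDIDATES (sizeof candidates / sizeof candidates[0])

/* Time each candidate on a TUNE_N x TUNE_N multiply and return the
   index of the fastest one (ties go to the earlier entry). */
size_t tune_matmul(void)
{
  static double a[TUNE_N * TUNE_N], b[TUNE_N * TUNE_N], c[TUNE_N * TUNE_N];
  for (size_t i = 0; i < TUNE_N * TUNE_N; i++) {
    a[i] = (double)(i % 7);
    b[i] = (double)(i % 5);
  }
  size_t best = 0;
  clock_t best_t = 0;
  for (size_t v = 0; v < N_CANDIDATES; v++) {
    candidates[v](a, b, c, TUNE_N);  /* warm-up call */
    clock_t t0 = clock();
    for (int r = 0; r < TUNE_REPS; r++)
      candidates[v](a, b, c, TUNE_N);
    clock_t t = clock() - t0;
    if (v == 0 || t < best_t) {
      best = v;
      best_t = t;
    }
  }
  return best;
}
```

The returned index would then be stored once, for example in a static function pointer, so the timing runs only on the first call.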
--- Comment #10 from Jerry DeLisle ---
(In reply to Thomas Koenig from comment #9)
> Next question - what happens if you add
>
> -mvzeroupper -mavx
>
> to the line in the Makefile? Does that make a difference in speed?
-mvzeroupper slows all
--- Comment #9 from Thomas Koenig ---
Next question - what happens if you add
-mvzeroupper -mavx
to the line in the Makefile? Does that make a difference in speed?
--- Comment #8 from Jerry DeLisle ---
(In reply to Thomas Koenig from comment #6)
> > You may notice I was invoking the wrong executable in what I posted in
> > comment #3. I did rerun the correct one several times and tried it with
> > -mavx -mp
--- Comment #7 from Thomas Koenig ---
And one more thing.
Comparing the timing you get for the version with the target_clone
and a version using just -mavx added to the relevant line in the
Makefile, do you see a difference?
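For comparison with a per-file `-mavx` build, this is roughly what a `target_clones` version looks like. GCC compiles the function once per listed target and selects a clone at run time through an ifunc resolver. The function name is hypothetical, and the mechanism needs ifunc support from the toolchain and C library (for example glibc).

```c
#include <stddef.h>

/* GCC emits an AVX clone and a default clone of this function and
   dispatches between them based on the CPU the program runs on. */
__attribute__((target_clones("avx", "default")))
void matmul_clones(const double *a, const double *b, double *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++) {
      double s = 0.0;
      for (size_t k = 0; k < n; k++)
        s += a[i * n + k] * b[k * n + j];
      c[i * n + j] = s;
    }
}
```

A plain `-mavx` build produces a single AVX-only function that raises an illegal-instruction fault on CPUs without AVX. That is why a library shipped to all x86 machines needs the clone or an explicit runtime check. Timing both versions, as comment #7 asks, separates the dispatch overhead from the effect of AVX code generation itself.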
--- Comment #6 from Thomas Koenig ---
> You may notice I was invoking the wrong executable in what I posted in
> comment #3. I did rerun the correct one several times and tried it with
> -mavx -mprefer-avx128. I get the same poor results regardl
--- Comment #5 from Jerry DeLisle ---
(In reply to Uroš Bizjak from comment #4)
> (In reply to Jerry DeLisle from comment #3)
> > I did apply your second patch:
> >
> > I do not get any improvement and results are diminished from current trunk,
--- Comment #4 from Uroš Bizjak ---
(In reply to Jerry DeLisle from comment #3)
> I did apply your second patch:
>
> I do not get any improvement and results are diminished from current trunk,
> so I am missing something. This is same machine I
Jerry DeLisle changed:
What|Removed |Added
CC||jvdelisle at gcc dot gnu.org
--- Comment
--- Comment #2 from Thomas Koenig ---
Here are some measurements with the AVX-enabling patch.
They were done on an AVX machine, namely gcc75 from the compile farm.
This was done with the command line
gfortran -static-libgfortran -finline-matmul
--- Comment #1 from Thomas Koenig ---
Created attachment 40074
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40074&action=edit
Test program for benchmarks