http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51179
Richard Guenther changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|
--- Comment #11 from Joost VandeVondele 2012-06-30 11:26:59 UTC ---
It looks like this problem is solved in the current 4.7 and 4.8 branches. At
least on an AVX machine, the best performance found by the code in comment #4
jumps from 5.3Gflops in
--- Comment #10 from Joost VandeVondele 2011-11-23 20:11:17 UTC ---
(In reply to comment #1)
> What about current 4.7 SVN?
The fastest 4x10 . 10x10 multiply as found with tiny_find.f90 yields somewhat
better results with 4.7, but not quite as ef
--- Comment #9 from Joost VandeVondele 2011-11-23 17:19:28 UTC ---
(In reply to comment #8)
> (In reply to comment #6)
> > (if nobody beats me, I'll try to reduce the code and open a new pr).
> I reproduced the ICE with 4.7 and started a delta
--- Comment #8 from Joost VandeVondele 2011-11-23 08:34:59 UTC ---
(In reply to comment #6)
> (if nobody beats me, I'll try to reduce the code and open a new pr).
I reproduced the ICE with 4.7 and started a delta reduction. It goes very
slowly, s
--- Comment #7 from Uros Bizjak 2011-11-22 22:00:36 UTC ---
(In reply to comment #3)
> Your testcase doesn't resemble the original; the inner for loops need
> their iteration variable cleared.
Ah, indeed... fingers were too fast.
One additi
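The point raised in comment #7 is that a C-style inner loop written as `for (; j < n; j++)` with `j` initialized only once never restarts, so its body runs only during the first outer iteration. The reduced testcase itself is not reproduced here; the following is a hypothetical sketch (the function name `count_iterations` and its parameters are illustrative assumptions) that mimics both C loop forms with `while` loops and counts how often the inner body executes.

```python
def count_iterations(n, reset_inner):
    """Count inner-body executions of a C-style doubly nested loop.

    Hypothetical illustration, not the PR's testcase. With reset_inner=False it
    mimics  for (i = 0; i < n; i++) for (; j < n; j++)  where j starts at 0 once;
    with reset_inner=True it mimics  for (j = 0; j < n; j++)  as the inner loop.
    """
    count = 0
    i = 0
    j = 0  # initialized once, outside both loops
    while i < n:
        if reset_inner:
            j = 0  # clearing the inner iteration variable on every outer pass
        while j < n:
            count += 1
            j += 1
        i += 1
    return count
```

For `n = 4`, the version without the reset executes the inner body 4 times, while the version with the reset executes it 16 times, so a testcase missing the reset does far less work than the original nested loops.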
--- Comment #6 from Dominique d'Humieres 2011-11-22 21:08:47 UTC ---
> ... I attach the test code, which I use for testing.
Compiling the code with -O3 gives the following ICE
pr51179_1.f90: In function 'tiny_find':
pr51179_1.f90:3594:0: intern
--- Comment #5 from Joost VandeVondele 2011-11-22 18:34:48 UTC ---
(In reply to comment #3)
> is IMHO just a matter whether graphite can -floop-interchange this or not.
> If you swap manually the l and j for lines, the generated code looks better
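Swapping two loops by hand, as suggested in comment #3, is the transformation that Graphite's `-floop-interchange` is meant to perform automatically. The original loops from tiny_find.f90 are not shown in this thread, so the following is only a hypothetical sketch: it assumes the multiply is written as `C(i,j) += A(i,l) * B(l,j)` for the 4x10 . 10x10 case mentioned in comment #10, and that the two candidate orders are `j, l, i` and `l, j, i`. It shows that interchanging the `l` and `j` loops leaves the result unchanged; it says nothing about which order GCC compiles into faster code.

```python
# Hypothetical sketch: loop names and orders are assumptions, not the code
# from tiny_find.f90. Computes C(i,j) += A(i,l) * B(l,j) for a 4x10 . 10x10 multiply.
M, K, N = 4, 10, 10


def matmul_jli(a, b):
    """Loop order j, l, i (j outermost)."""
    c = [[0.0] * N for _ in range(M)]
    for j in range(N):
        for l in range(K):
            for i in range(M):
                c[i][j] += a[i][l] * b[l][j]
    return c


def matmul_lji(a, b):
    """Same computation with the l and j loops interchanged (l outermost)."""
    c = [[0.0] * N for _ in range(M)]
    for l in range(K):
        for j in range(N):
            for i in range(M):
                c[i][j] += a[i][l] * b[l][j]
    return c
```

The interchange is legal here because each element `c[i][j]` is still accumulated over `l` in increasing order in both versions, so even the floating-point summation order is identical. Only the memory access pattern changes. In Fortran's column-major layout, keeping `i` innermost gives unit-stride access to `C(i,j)` and `A(i,l)` in both orders.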
--- Comment #4 from Joost VandeVondele 2011-11-22 18:34:03 UTC ---
Created attachment 25887
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25887
general code
the more general code used to find the most efficient matrix multiply for sizes
4,
Jakub Jelinek changed:
What|Removed |Added
CC||jakub at gcc dot gnu.org
--- Comment #3 f
Uros Bizjak changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Last reconfirmed|