
[Bug tree-optimization/51179] poor vectorization on interlagos.

2012-07-19 Thread rguenth at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51179 Richard Guenther changed:
What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |

[Bug tree-optimization/51179] poor vectorization on interlagos.

2012-06-30 Thread Joost.VandeVondele at mat dot ethz.ch

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51179 --- Comment #11 from Joost VandeVondele 2012-06-30 11:26:59 UTC --- It looks like this problem is solved in the current 4.7 and 4.8 branches. At least on an AVX machine, the best performance found by the code in comment #4 jumps from 5.3 Gflops in

[Bug tree-optimization/51179] poor vectorization on interlagos.

2011-11-23 Thread Joost.VandeVondele at mat dot ethz.ch

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51179 --- Comment #10 from Joost VandeVondele 2011-11-23 20:11:17 UTC --- (In reply to comment #1) > What about current 4.7 SVN? The fastest 4x10 . 10x10 multiply as found with tiny_find.f90 yields somewhat better results with 4.7, but not quite as ef

[Bug tree-optimization/51179] poor vectorization on interlagos.

2011-11-23 Thread Joost.VandeVondele at mat dot ethz.ch

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51179 --- Comment #9 from Joost VandeVondele 2011-11-23 17:19:28 UTC --- (In reply to comment #8) > (In reply to comment #6) > > (if nobody beats me, I'll try to reduce the code and open a new pr). > I reproduced the ICE with 4.7, and started a delta

[Bug tree-optimization/51179] poor vectorization on interlagos.

2011-11-23 Thread Joost.VandeVondele at mat dot ethz.ch

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51179 --- Comment #8 from Joost VandeVondele 2011-11-23 08:34:59 UTC --- (In reply to comment #6) > (if nobody beats me, I'll try to reduce the code and open a new pr). I reproduced the ICE with 4.7, and started a delta reduce. It goes very slowly, s

[Bug tree-optimization/51179] poor vectorization on interlagos.

2011-11-22 Thread ubizjak at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51179 --- Comment #7 from Uros Bizjak 2011-11-22 22:00:36 UTC --- (In reply to comment #3) > Your testcase doesn't resemble the original, the inner for cycles need > clearing of the iteration variable. Ah, indeed... fingers were too fast. One additi

[Bug tree-optimization/51179] poor vectorization on interlagos.

2011-11-22 Thread dominiq at lps dot ens.fr

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51179 --- Comment #6 from Dominique d'Humieres 2011-11-22 21:08:47 UTC --- > ... I attach the test code, which I use for testing. Compiling the code with -O3 gives the following ICE pr51179_1.f90: In function 'tiny_find': pr51179_1.f90:3594:0: intern

[Bug tree-optimization/51179] poor vectorization on interlagos.

2011-11-22 Thread Joost.VandeVondele at mat dot ethz.ch

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51179 --- Comment #5 from Joost VandeVondele 2011-11-22 18:34:48 UTC --- (In reply to comment #3) > is IMHO just a matter whether graphite can -floop-interchange this or not. > If you swap manually the l and j for lines, the generated code looks better
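
For readers unfamiliar with the loop interchange discussed in comment #5, here is a minimal C sketch of the idea for a 4x10 . 10x10 multiply. The actual loop nest from the bug report is not shown in this excerpt (the thread's code is Fortran, tiny_find.f90), so the index names i, j, l, the function names, and the row-major C layout are all assumptions made for illustration only. Whether GCC vectorizes either variant well depends on the version and flags.

```c
/* Hypothetical sizes taken from the thread: a 4x10 matrix times a 10x10 matrix. */
enum { M = 4, K = 10, N = 10 };

/* Variant 1: reduction index l innermost.
   b[l][j] is read with a stride of N elements, which is awkward to vectorize. */
void matmul_l_inner(double a[M][K], double b[K][N], double c[M][N])
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            for (int l = 0; l < K; l++)
                c[i][j] += a[i][l] * b[l][j];
}

/* Variant 2: l and j loops swapped by hand (what -floop-interchange would aim for).
   The innermost loop now walks b[l][j] and c[i][j] contiguously. */
void matmul_j_inner(double a[M][K], double b[K][N], double c[M][N])
{
    for (int i = 0; i < M; i++)
        for (int l = 0; l < K; l++)
            for (int j = 0; j < N; j++)
                c[i][j] += a[i][l] * b[l][j];
}

/* Illustrative check: both orders accumulate each c[i][j] over l in the same
   sequence, so the results should be identical. Returns 1 on a match. */
int interchange_matches(void)
{
    double a[M][K], b[K][N];
    double c1[M][N] = {{0}}, c2[M][N] = {{0}};

    for (int i = 0; i < M; i++)
        for (int l = 0; l < K; l++)
            a[i][l] = (double)(i + l);
    for (int l = 0; l < K; l++)
        for (int j = 0; j < N; j++)
            b[l][j] = (double)(l - j);

    matmul_l_inner(a, b, c1);
    matmul_j_inner(a, b, c2);

    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            if (c1[i][j] != c2[i][j])
                return 0;
    return 1;
}
```

The interchange changes only the memory access pattern, not the arithmetic, which is why it is a candidate for an automatic transformation.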

[Bug tree-optimization/51179] poor vectorization on interlagos.

2011-11-22 Thread Joost.VandeVondele at mat dot ethz.ch

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51179 --- Comment #4 from Joost VandeVondele 2011-11-22 18:34:03 UTC --- Created attachment 25887 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25887 general code the more general code used to find the most efficient matrix multiply for sizes 4,