[Bug target/57796] AVX2 gather vectorization: code bloat and reduction of performance

2017-04-21 rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796
Richard Biener changed:
What       |Removed     |Added
Status     |UNCONFIRMED |RESOLVED
Resolution |---

2017-04-20 rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796
--- Comment #11 from Richard Biener ---
Author: rguenth
Date: Thu Apr 20 14:26:26 2017
New Revision: 247026
URL: https://gcc.gnu.org/viewcvs?rev=247026&root=gcc&view=rev
Log:
2017-04-20 Richard Biener
PR tree-optimization/57796

2017-03-28 vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796
--- Comment #10 from vincenzo Innocente ---
Added a self-contained "benchmark"; on my machine:

    [innocent@vinavx3 ctest]$ c++ -Ofast -Wall SparseOnly.c -march=native ; time ./a.out
    0.496u 0.000s 0:00.49 100.0% 0+0k 0+0io 0pf+0w
    [innocent@vinavx3

2017-03-28 vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796
--- Comment #9 from vincenzo Innocente ---
Created attachment 41070
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41070&action=edit
self-contained benchmark of scimark2 SparseMat must content is not randomized param must be modified by ha
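The attachment itself is not reproduced here. As a hedged illustration of the kind of loop at stake, the sketch below is modeled on the compressed-row (CSR) sparse matrix-vector kernel used in scimark2; the function name `csr_matvec` is our own. The contiguous load `val[i]` vectorizes trivially, while the indexed load `x[col[i]]` is the access that an AVX2 target may turn into a gather (with `-Ofast` also permitting the reassociated sum). Whether GCC actually emits a gather depends on the compiler version, the flags and the cost model.

```c
#include <assert.h>

/* y = A*x for an M-row sparse matrix A in compressed-row (CSR) storage.
 * row has M+1 entries; the nonzeros of row r are val[row[r] .. row[r+1]-1],
 * and col[] holds the column index of each nonzero. */
void csr_matvec(int M, double *y, const double *val, const int *row,
                const int *col, const double *x)
{
    for (int r = 0; r < M; r++) {
        double sum = 0.0;
        assert(row[r] <= row[r + 1]); /* sanity check on the CSR structure */
        /* val[i] is a unit-stride load; x[col[i]] is an indexed load,
         * the candidate for gather vectorization. */
        for (int i = row[r]; i < row[r + 1]; i++)
            sum += x[col[i]] * val[i];
        y[r] = sum;
    }
}
```

For example, the 2x3 matrix [[1, 0, 2], [0, 3, 0]] is stored as `val = {1, 2, 3}`, `col = {0, 2, 1}`, `row = {0, 2, 3}`.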

2017-03-28 vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796
--- Comment #8 from vincenzo Innocente ---
My understanding of the gather latency is that it essentially corresponds to one load per cache line: fast if all items are close by, slower than scalar loads if the items are all in different cache lines. Not su
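One way to reason about the rough model in comment #8 is to count the distinct cache lines a single gather touches. The sketch below assumes 64-byte cache lines, at most 8 lanes (for example 8 x 32-bit elements in a 256-bit AVX2 gather) and naturally aligned elements that never straddle a line. `gather_cache_lines` is a hypothetical helper, not a GCC or hardware API, and actual gather timing varies by microarchitecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64 /* assumption: 64-byte lines, typical on x86 */
#define MAX_LANES 8         /* assumption: e.g. 8 x 32-bit lanes (AVX2)   */

/* Hypothetical helper: number of distinct cache lines touched by loading
 * base[idx[k]] for k = 0..lanes-1, each element elem_size bytes. Under the
 * "one load per cache line" model, this approximates the gather's cost. */
int gather_cache_lines(const void *base, const int *idx, int lanes,
                       size_t elem_size)
{
    uintptr_t lines[MAX_LANES];
    int n = 0;
    assert(lanes > 0 && lanes <= MAX_LANES);
    for (int k = 0; k < lanes; k++) {
        const char *p = (const char *)base
                        + (ptrdiff_t)idx[k] * (ptrdiff_t)elem_size;
        uintptr_t line = (uintptr_t)p / CACHE_LINE_BYTES;
        int seen = 0;
        for (int j = 0; j < n; j++)
            if (lines[j] == line) { seen = 1; break; }
        if (!seen)
            lines[n++] = line; /* first lane to hit this line */
    }
    return n;
}
```

With a 64-byte-aligned `float` array, indices 0..7 fall in one line (the cheap case), while indices spaced 16 floats apart hit 8 different lines, the case where the comment expects a gather to lose to scalar loads.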

2014-06-19 vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796
--- Comment #5 from vincenzo Innocente ---
So with the latest 4.9, gcc version 4.10.0 20140611 (experimental) [trunk revision 211467] (GCC), the situation has not changed much (the scalar version is now faster!): I think that the cost of gather instructi

2013-07-05 ysrumyan at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796
--- Comment #4 from Yuri Rumyantsev ---
(In reply to Jakub Jelinek from comment #3)
> By tuning I've meant the vectorizer cost model. If the desirability of
> gathers vs. no vectorization at all doesn't depend only on the insns in the
> loop, but

2013-07-05 jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796
--- Comment #3 from Jakub Jelinek ---
By tuning I've meant the vectorizer cost model. If the desirability of gathers vs. no vectorization at all doesn't depend only on the insns in the loop, but also on how many iterations the loop has, then perh
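To illustrate why the iteration count can matter, here is a toy cost model: the vector loop pays a one-off overhead plus a cost per vector iteration, the remainder runs as scalar code, and vectorizing wins only above some minimum trip count. The `loop_costs` struct, the cost numbers and `min_profitable_iters` are all hypothetical; this is loosely in the spirit of a vectorizer cost model, not GCC's actual formula.

```c
#include <assert.h>

/* Hypothetical per-loop cost figures, in arbitrary cost units. */
struct loop_costs {
    int scalar_iter;     /* cost of one scalar iteration                   */
    int vector_iter;     /* cost of one vector iteration (e.g. a gather)   */
    int vector_overhead; /* one-off setup, reduction and epilogue overhead */
    int vf;              /* vectorization factor: lanes per vector iter    */
};

static long scalar_cost(const struct loop_costs *c, long n)
{
    return n * c->scalar_iter;
}

/* n / vf full vector iterations; the n % vf leftover iterations run scalar. */
static long vector_cost(const struct loop_costs *c, long n)
{
    return c->vector_overhead + (n / c->vf) * c->vector_iter
           + (n % c->vf) * c->scalar_iter;
}

/* Smallest trip count n <= limit for which the vector loop is strictly
 * cheaper than the scalar loop, or -1 if it never pays off up to limit. */
long min_profitable_iters(const struct loop_costs *c, long limit)
{
    assert(c->vf > 0);
    for (long n = 0; n <= limit; n++)
        if (vector_cost(c, n) < scalar_cost(c, n))
            return n;
    return -1;
}
```

With `scalar_iter = 4`, `vector_iter = 12`, `vector_overhead = 20` and `vf = 4`, each full vector saves 4 units, so the overhead is recovered only from 24 iterations on. If the vector iteration costs more than 4 scalar iterations (say 20, as with an expensive gather), vectorizing never pays off, whatever the trip count.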

2013-07-05 ysrumyan at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796
Yuri Rumyantsev changed:
What |Removed |Added
CC   |        |ysrumyan at gmail dot com
--- Comment #