[Bug tree-optimization/54978] New: Add ability to provide vectorized functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54978

           Bug #: 54978
         Summary: Add ability to provide vectorized functions
  Classification: Unclassified
         Product: gcc
         Version: unknown
          Status: UNCONFIRMED
        Severity: enhancement
        Priority: P3
       Component: tree-optimization
      AssignedTo: unassig...@gcc.gnu.org
      ReportedBy: ddes...@gmail.com

Presently, the auto-vectorizer chokes on any function call present in a loop. It would be nice to be able to provide vectorized versions of functions alongside the non-vectorized versions, and have the auto-vectorizer notice and use them. This is already done for transcendental functions when -mveclibabi is used.

However, to do this, some standardized form for the vectorized function would have to be defined. For instance, assuming SSE-type functions with double precision, this could be done as follows.

C++: this is the simple case, because overloading is available:

    double fx(double x);
    v2df fx(v2df x);

C: this is more difficult, but something should be doable. To prevent accidental name overlap, a new function attribute could be used to define a vectorized version, i.e.:

    double fx(double x) __attribute__((vectorized_alias(fxv2df)));
    v2df fxv2df(v2df x);
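To make the C++ overloading idea concrete, here is a minimal hand-written sketch of what a vectorizer that knew about the v2df overload could produce for a loop calling fx. It assumes v2df is defined with GCC's vector_size extension (the report uses v2df without defining it), and fx (here: squaring) and apply_fx are hypothetical names for illustration only. GCC does not perform this substitution automatically; the pairing of overloads is exactly what the report asks for.

```cpp
#include <cstddef>
#include <cstring>

// Assumption: v2df as a GCC vector-extension type, two doubles in 16 bytes.
typedef double v2df __attribute__((vector_size(16)));

// Hypothetical scalar function.
double fx(double x) { return x * x; }

// Hypothetical vector overload: the same operation applied to both lanes.
v2df fx(v2df x) { return x * x; }

// Hand-written equivalent of what a vectorizer could emit for
//   for (i = 0; i < n; ++i) out[i] = fx(in[i]);
// if it knew fx(v2df) computes fx on each lane: process pairs with the
// vector overload, then finish an odd leftover element with the scalar one.
void apply_fx(const double* in, double* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        v2df v;
        std::memcpy(&v, in + i, sizeof v);   // load two doubles (no alignment assumed)
        v = fx(v);                           // overload resolution picks fx(v2df)
        std::memcpy(out + i, &v, sizeof v);  // store two doubles
    }
    for (; i < n; ++i) {
        out[i] = fx(in[i]);                  // scalar remainder picks fx(double)
    }
}
```

With n = 5, the first loop handles elements 0 to 3 in two vector calls and the second loop handles element 4.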
[Bug tree-optimization/54978] Add ability to provide vectorized functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54978

--- Comment #3 from Daniel Davis 2012-10-19 16:14:36 UTC ---
Obviously, it would be nice if gcc could build the vectorized versions itself when the functions are pure. But that would require somehow knowing that those functions should be built, and having access to their code, which may not be the case for something like a library.

On second thought, the attribute would be better placed on the declaration of the vectorized function. So, to rewrite my example:

    double fx(double x);
    v2df fx_v2df(v2df x) __attribute__((vectorized_alias(fx,double,16)));
    v4df fx_v4df(v4df x) __attribute__((vectorized_alias(fx,double,32)));

That would let the compiler know exactly what could be replaced, although it should be able to figure out the argument types from the declarations.

So I see two potential optimizations here. The first is a way to let the compiler know that vectorized functions are available. The second would be to let the auto-vectorizer create vectorized forms of functions.
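For comparison, the OpenMP 4.0 `declare simd` directive, which GCC accepts under -fopenmp or -fopenmp-simd, addresses both optimizations listed above: when the compiler sees the function body it can generate SIMD variants of it, and a vectorized loop can then call those variants. This is not the proposed `vectorized_alias` attribute (which is hypothetical and does not exist in GCC); the sketch below assumes a trivial squaring fx and a hypothetical apply_fx loop.

```cpp
#include <cstddef>

// Ask the compiler to also emit SIMD variants of fx.
// Takes effect only with -fopenmp or -fopenmp-simd; without those flags the
// pragma is ignored and fx stays an ordinary scalar function.
#pragma omp declare simd
double fx(double x) { return x * x; }

// With OpenMP SIMD enabled, the compiler may vectorize this loop and call a
// SIMD variant of fx; otherwise it is a plain scalar loop with the same result.
void apply_fx(const double* in, double* out, std::size_t n) {
#pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = fx(in[i]);
    }
}
```

Unlike the attribute proposed above, this route needs the function body to be visible when the variants are generated, which is the library limitation the comment points out.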
[Bug tree-optimization/52112] New: Vectorizer fails when using CRTP
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52112

           Bug #: 52112
         Summary: Vectorizer fails when using CRTP
  Classification: Unclassified
         Product: gcc
         Version: 4.6.1
          Status: UNCONFIRMED
        Severity: normal
        Priority: P3
       Component: tree-optimization
      AssignedTo: unassig...@gcc.gnu.org
      ReportedBy: ddes...@gmail.com

Created attachment 26565
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26565
The test case code.

Using the CRTP together with static_cast on the this pointer prevents auto-vectorization. If the code below is compiled with -ftree-vectorizer-verbose=7, the CRTP method fails to vectorize with "not vectorized: control flow in loop." However, if compiled with -O3 -S -fno-tree-vectorize, both methods produce identical assembly.

I don't know how difficult this would be to change, but it could certainly speed up a lot of C++ code. For instance, this currently prevents Boost uBLAS from vectorizing.

    #include <iostream>

    template <class E, class Tp>
    class CRTP_base {
    public:
        typedef E& reference;
        typedef Tp value_type;
        // Downcast to the derived class (the CRTP pattern).
        reference operator()() { return *static_cast<E*>(this); }
        value_type square() { return (*this)().x() * (*this)().x(); }
    protected:
        CRTP_base() {}
        ~CRTP_base() {}
    };

    template <class Tp>
    class CRTP_child : public CRTP_base<CRTP_child<Tp>, Tp> {
        Tp xval;
        typedef CRTP_base<CRTP_child<Tp>, Tp> parent;
    public:
        CRTP_child(Tp xv = Tp()) : xval(xv) {}
        Tp x() { return xval; }
        using parent::square;
    };

    int main() {
        const int N = 100;
        double A[N] __attribute__((aligned(16)));
        double B[N] __attribute__((aligned(16)));
        double sum1 = 0.0, sum2 = 0.0;
        for (int i = 0; i < N; ++i) { A[i] = i; }
        // Method 1: plain arithmetic.
        for (int i = 0; i < N; ++i) { B[i] = A[i] * A[i]; }
        for (int i = 0; i < N; ++i) { sum1 += B[i]; }
        std::cout << "Sum of method 1: " << sum1;
        // Method 2: the same arithmetic through the CRTP classes.
        for (int i = 0; i < N; ++i) { B[i] = CRTP_child<double>(A[i]).square(); }
        for (int i = 0; i < N; ++i) { sum2 += B[i]; }
        std::cout << "\nSum of method 2: " << sum2 << std::endl;
        return 0;
    }
[Bug tree-optimization/52112] Vectorizer fails when using CRTP
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52112

--- Comment #4 from Daniel Davis 2012-02-03 18:24:49 UTC ---
Any thoughts on why it won't vectorize for me with GCC 4.6.1 on x86_64?
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
--- Comment #6 from ddesics at gmail dot com 2010-04-17 00:28 ---
Has any work been done on this enhancement? I'm using gcc 4.3.2, and I noticed that SSE instructions are still used only sparingly for complex arithmetic. Unless I'm missing something, wouldn't the ideal code for every _Complex double addition with SSE2 be a single addpd, with movapd or movupd for the memory operations?

--
ddesics at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ddesics at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485
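The reasoning in the comment is that a complex double is just a {real, imaginary} pair of doubles, which fits exactly in one 128-bit SSE2 register, so the two scalar additions collapse into one packed addition. A minimal hand-written sketch of that instruction sequence, assuming an x86/x86-64 target with SSE2 and using C++'s std::complex<double> (which, like C's _Complex double, is laid out as two consecutive doubles); add_scalar and add_sse2 are hypothetical names, and this shows the code the comment asks for, not what GCC actually emits.

```cpp
#include <complex>
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics (x86/x86-64 only)

// Reference version: ordinary element-wise complex addition.
void add_scalar(const std::complex<double>* a, const std::complex<double>* b,
                std::complex<double>* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Each complex<double> occupies one 128-bit register as {re, im}, so one
// packed add computes both the real and the imaginary sum.
void add_sse2(const std::complex<double>* a, const std::complex<double>* b,
              std::complex<double>* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // Unaligned loads (movupd); aligned data could use _mm_load_pd (movapd).
        __m128d va = _mm_loadu_pd(reinterpret_cast<const double*>(a + i));
        __m128d vb = _mm_loadu_pd(reinterpret_cast<const double*>(b + i));
        // One addpd, then an unaligned store (movupd).
        _mm_storeu_pd(reinterpret_cast<double*>(out + i), _mm_add_pd(va, vb));
    }
}
```

Both functions produce bit-identical results, since addpd performs the same IEEE double additions as the scalar code, just two at a time.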