With 4.1.1, 4.2.0 rev. 114610 and the autovect branch, the following code only gets vectorized if the "size_" member is explicitly copied into the local variable "sz".
=========================================
template <class T>
class vec
{
public:
    vec& multiply(const vec& other)
    {
        // do something to make sure restrict is valid...
        const T* __restrict__ op = other.data_;
        T* __restrict__ tp = data_;
        unsigned int sz = size_; // NEEDED!
        for (unsigned int i=0; i<sz; ++i) {
            tp[i] *= op[i];
        }
        return *this;
    }
private:
    unsigned int size_;
    T* data_;
};

template class vec<int>;
=========================================

Without the local variable I get the following output:

g++ -O3 -ftree-vectorize -ftree-vectorizer-verbose=7 -march=pentium-m -c vectorizer.cpp
vectorizer.cpp:16: note: ===== analyze_loop_nest =====
vectorizer.cpp:16: note: === vect_analyze_loop_form ===
vectorizer.cpp:16: note: split exit edge.
vectorizer.cpp:16: note: === get_loop_niters ===
vectorizer.cpp:16: note: not vectorized: number of iterations cannot be computed.
vectorizer.cpp:16: note: bad loop form.
vectorizer.cpp:16: note: vectorized 0 loops in function.
--------------------------------------------------------------------

It works fine for T in [float, double, long int].

Using
  typedef int aint __attribute__ ((__aligned__(16)));
as in the examples doesn't help.

Replacing the template declaration with int and using the class in main() leads to vectorization of the loop without needing the local variable.

-- 
Summary: missed optimization with -ftree-vectorize and templates
Product: gcc
Version: tree-ssa
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: gcc at pdoerfler dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28030