With GCC 4.1.1, 4.2.0 rev. 114610 and the autovect branch, the following code
is only vectorized if the "size_" member is explicitly copied to the local
variable "sz".
=========================================
template <class T>
class vec
{
public:
    vec& multiply(const vec& other)
    {
        // do something to make sure restrict is valid...
        const T* __restrict__ op = other.data_;
        T* __restrict__ tp = data_;
        unsigned int sz = size_; // NEEDED! Without this copy the loop is not vectorized.
        for (unsigned int i=0; i<sz; ++i) {
            tp[i] *= op[i];
        }
        return *this;
    }
private:
    unsigned int size_;
    T* data_;
};
template class vec<int>; // explicit instantiation for T = int
=========================================
Without the local variable, I get the following output:
g++ -O3 -ftree-vectorize -ftree-vectorizer-verbose=7 -march=pentium-m -c vectorizer.cpp
vectorizer.cpp:16: note: ===== analyze_loop_nest =====
vectorizer.cpp:16: note: === vect_analyze_loop_form ===
vectorizer.cpp:16: note: split exit edge.
vectorizer.cpp:16: note: === get_loop_niters ===
vectorizer.cpp:16: note: not vectorized: number of iterations cannot be computed.
vectorizer.cpp:16: note: bad loop form.
vectorizer.cpp:16: note: vectorized 0 loops in function.
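To compare the two loop forms directly, both can be placed in one self-contained file. This is a sketch, not the reporter's code: the constructor, destructor, `at()` accessor and the name `multiply_no_copy` are hypothetical additions so the class can be exercised; only the loop bodies mirror the report. Like the original, it can be compiled with `-c`, and the vectorizer notes for the two loops can then be compared.

```cpp
#include <cassert>

// Hypothetical self-contained variant of the reporter's class. The
// constructor, destructor and at() exist only so the loops can be run.
template <class T>
class vec
{
public:
    explicit vec(unsigned int n) : size_(n), data_(new T[n]) {
        for (unsigned int i = 0; i < n; ++i)
            data_[i] = static_cast<T>(i + 1);   // fill with 1, 2, 3, ...
    }
    ~vec() { delete[] data_; }
    vec(const vec&) = delete;                   // keep ownership simple
    vec& operator=(const vec&) = delete;

    // Workaround from the report: the loop bound is copied into a local.
    vec& multiply(const vec& other) {
        const T* __restrict__ op = other.data_;
        T* __restrict__ tp = data_;
        unsigned int sz = size_;                // bound fixed before the loop
        for (unsigned int i = 0; i < sz; ++i)
            tp[i] *= op[i];
        return *this;
    }

    // Form reported as not vectorized for T = int: the bound is the
    // member itself and is read again on every iteration.
    vec& multiply_no_copy(const vec& other) {
        const T* __restrict__ op = other.data_;
        T* __restrict__ tp = data_;
        for (unsigned int i = 0; i < size_; ++i)
            tp[i] *= op[i];
        return *this;
    }

    T at(unsigned int i) const { assert(i < size_); return data_[i]; }

private:
    unsigned int size_;
    T* data_;
};
```

For example, after `vec<int> a(8), b(8); a.multiply(b);` each `a.at(i)` equals `(i+1)*(i+1)`, and `multiply_no_copy` produces the same values. The two member functions differ only in how the loop bound is obtained, which is the difference the report is about.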
--------------------------------------------------------------------
Without the local variable, vectorization works fine for T = float, double and long int.
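The report does not say why only int is affected. One plausible explanation, offered as an assumption and not as a confirmed diagnosis, is type-based alias analysis: C++ allows an object of type unsigned int to be accessed through a glvalue of type int (its corresponding signed type), but not through float, double or long int. With T = int, the compiler may therefore have to assume that a store to tp[i] can change size_ (apparently despite the __restrict__ qualifiers), so the bound is reloaded on each iteration and the iteration count cannot be computed. Copying size_ into sz removes that dependency. The sketch below only shows that such a store is well-defined; the function name `store_through_int` is hypothetical.

```cpp
#include <cassert>

// Hypothetical illustration: writing through an int* into an unsigned int
// object is permitted, because int is the signed type corresponding to
// unsigned int. A compiler must therefore assume that stores through an
// int* may modify an unsigned int member such as vec<int>::size_.
inline unsigned int store_through_int(unsigned int size, int value) {
    assert(value >= 0);                    // stay in the range both types share
    unsigned int u = size;
    int* p = reinterpret_cast<int*>(&u);   // alias u via its signed counterpart
    *p = value;                            // this store changes u
    return u;
}

// For T = float, double or long int, the analogous store may not legally
// modify an unsigned int, so size_ can be treated as loop-invariant.
```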
Using an aligned typedef, as in the examples, doesn't help:
    typedef int aint __attribute__ ((__aligned__(16)));
Replacing the template with a plain class that uses int directly, and using that
class in main(), results in the loop being vectorized without the local variable.
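A minimal sketch of the non-template variant described above, assuming it keeps the same members and loop but hard-codes int. The class name `ivec`, the constructor and the `at()` accessor are hypothetical. The report states that this form was vectorized without the local variable when used from main(); the sketch only reproduces the shape of that code and does not itself demonstrate the vectorizer's behavior.

```cpp
#include <cassert>

// Hypothetical non-template version of the reporter's class: T is fixed
// to int and the loop reads the bound directly from size_.
class ivec
{
public:
    explicit ivec(unsigned int n) : size_(n), data_(new int[n]) {
        for (unsigned int i = 0; i < n; ++i)
            data_[i] = static_cast<int>(i + 1);   // fill with 1, 2, 3, ...
    }
    ~ivec() { delete[] data_; }
    ivec(const ivec&) = delete;
    ivec& operator=(const ivec&) = delete;

    ivec& multiply(const ivec& other) {
        const int* __restrict__ op = other.data_;
        int* __restrict__ tp = data_;
        for (unsigned int i = 0; i < size_; ++i)  // no local copy of size_
            tp[i] *= op[i];
        return *this;
    }

    int at(unsigned int i) const { assert(i < size_); return data_[i]; }

private:
    unsigned int size_;
    int* data_;
};
```

Used from main() as, for example, `ivec a(1024), b(1024); a.multiply(b);`.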
--
Summary: missed optimization with -ftree-vectorize and templates
Product: gcc
Version: tree-ssa
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: gcc at pdoerfler dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28030