The following code is only vectorized when the "size_" member is explicitly
copied to the local variable "sz". This affects 4.1.1, 4.2.0 rev. 114610 and
the autovect branch.

=========================================
template <class T>
class vec 
{
public:
  vec& multiply(const vec& other)
    {
      // do something to make sure restrict is valid...

      const T* __restrict__ op = other.data_;
      T* __restrict__ tp = data_;

      unsigned int sz = size_; // NEEDED!

      for (unsigned int i=0; i<sz; ++i) {
        tp[i] *= op[i];
      }
      return *this;
    }

private:
  unsigned int size_;
  T* data_;
};

template class vec<int>;
=========================================

Without the local variable I get the following output:

g++ -O3 -ftree-vectorize -ftree-vectorizer-verbose=7 -march=pentium-m -c vectorizer.cpp

vectorizer.cpp:16: note: ===== analyze_loop_nest =====
vectorizer.cpp:16: note: === vect_analyze_loop_form ===
vectorizer.cpp:16: note: split exit edge.
vectorizer.cpp:16: note: === get_loop_niters ===
vectorizer.cpp:16: note: not vectorized: number of iterations cannot be computed.
vectorizer.cpp:16: note: bad loop form.
vectorizer.cpp:16: note: vectorized 0 loops in function.
--------------------------------------------------------------------
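
For reference, here is a minimal sketch of the variant that produces the output above, with the loop bound read directly from "size_". The class name "vec_nolocal", the constructor, and the explicit instantiation are assumptions added so that the snippet compiles on its own; they are not part of the original test case. One possible explanation, which this report does not confirm, is that a store through an int pointer may alias the unsigned int member, so the bound must be reloaded on every iteration.

```cpp
// Sketch of the failing variant: the loop bound is the member size_ itself.
// vec_nolocal and its constructor are hypothetical names for illustration.
template <class T>
class vec_nolocal
{
public:
  // Hypothetical constructor: wraps caller-owned storage.
  vec_nolocal(T* data, unsigned int size) : size_(size), data_(data) {}

  vec_nolocal& multiply(const vec_nolocal& other)
    {
      // The caller must guarantee that the two buffers do not overlap.
      const T* __restrict__ op = other.data_;
      T* __restrict__ tp = data_;

      // No local copy of size_: this is the form reported as not vectorized.
      for (unsigned int i = 0; i < size_; ++i) {
        tp[i] *= op[i];
      }
      return *this;
    }

private:
  unsigned int size_;
  T* data_;
};

template class vec_nolocal<int>;
```

The result of the multiplication is the same either way; only the vectorizer's ability to compute the iteration count differs.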

The loop is vectorized without the local variable for T in [float, double, long int].

Using
typedef int aint __attribute__ ((__aligned__(16)));
as in the examples doesn't help.

Replacing the template declaration with int and using the class in main() leads
to vectorization of the loop without needing the local variable.
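
A rough sketch of that non-template variant follows. The names "int_vec" and "multiply_demo", the constructor, and the sample data are assumptions made for illustration; the original file used for this experiment is not shown here. The body of "multiply_demo" stands in for what would be placed in main().

```cpp
// Sketch of the non-template variant: T replaced by int throughout.
// int_vec, its constructor and multiply_demo are hypothetical names.
class int_vec
{
public:
  // Hypothetical constructor: wraps caller-owned storage.
  int_vec(int* data, unsigned int size) : size_(size), data_(data) {}

  int_vec& multiply(const int_vec& other)
    {
      // The caller must guarantee that the two buffers do not overlap.
      const int* __restrict__ op = other.data_;
      int* __restrict__ tp = data_;

      // Loop bound read directly from the member, no local copy.
      for (unsigned int i = 0; i < size_; ++i) {
        tp[i] *= op[i];
      }
      return *this;
    }

private:
  unsigned int size_;
  int* data_;
};

// Stand-in for the body of main(): multiplies two small arrays element-wise
// and returns the sum of the results.
int multiply_demo()
{
  int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  int b[8] = {2, 2, 2, 2, 2, 2, 2, 2};

  int_vec va(a, 8), vb(b, 8);
  va.multiply(vb);

  int sum = 0;
  for (unsigned int i = 0; i < 8; ++i) {
    sum += a[i];
  }
  return sum;
}
```

Whether the loop in this sketch is vectorized can be checked with the same command line as above.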


-- 
           Summary: missed optimization with -ftree-vectorize and templates
           Product: gcc
           Version: tree-ssa
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: gcc at pdoerfler dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28030
