------- Comment #5 from dorit at il dot ibm dot com  2007-06-27 11:57 -------
(In reply to comment #4)
> (In reply to comment #3)
> > The problem is in -ftree-vectorize
> The difference is, that without -ftree-vectorize the inner loop (do k = 1, 9)
> is completely unrolled, but with vectorization, the loop is vectorized, but
> _not_ unrolled. Since the vectorization factor is only 2 for V2DF mode vectors,
> we lose big time at this point.
> My best guess for unroller problems would be rtl-optimization.

Could it be the tree-level complete unroller? Does the vectorizer, by any
chance, peel the loop to handle a misaligned store? If so, and if the
misalignment amount is unknown at compile time, then the number of iterations
of the vectorized loop is also unknown, in which case the complete unroller
wouldn't work. In autovect-branch the tree-level complete unroller runs before
the vectorizer; I wonder what happens there.
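
To make the peeling concern concrete, here is a rough hand-written C sketch of
the loop shape I have in mind. It is only an illustration, not what GCC
actually emits: the names peel_count and scale_peeled, the 16-byte alignment
target, and the scalar stand-in for the V2DF operation are all assumptions of
mine, and it assumes the array is at least 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N  9   /* trip count of the inner loop, as in "do k = 1, 9" */
#define VF 2   /* vectorization factor assumed for V2DF (two doubles) */

/* Hypothetical helper: scalar iterations to peel so that the store to `a`
   starts on a 16-byte boundary. The result depends on a run-time address. */
static size_t peel_count(const double *a)
{
    size_t off = ((uintptr_t)a % (VF * sizeof(double))) / sizeof(double);
    size_t peel = (VF - off) % VF;
    return peel < N ? peel : N;
}

/* Hypothetical hand-peeled loop: prologue, "vector" body, epilogue.
   The number of vector-body iterations is reported through vec_iters. */
static void scale_peeled(double *a, const double *b, double s,
                         size_t *vec_iters)
{
    size_t peel = peel_count(a);
    size_t k = 0;

    /* Prologue: scalar iterations until the store is aligned. */
    for (; k < peel; k++)
        a[k] = s * b[k];

    /* Vector body: the bound is an expression in `peel`, so it is not a
       compile-time constant even though N is. */
    size_t iters = (N - peel) / VF;
    *vec_iters = iters;
    for (size_t i = 0; i < iters; i++, k += VF) {
        a[k]     = s * b[k];      /* these two statements stand in */
        a[k + 1] = s * b[k + 1];  /* for a single V2DF operation   */
    }

    /* Epilogue: leftover scalar iterations. */
    for (; k < N; k++)
        a[k] = s * b[k];

    assert(k == N);
}
```

For N = 9 and VF = 2 the bound (N - peel) / VF happens to evaluate to 4 for
either peel value, but it is still written in terms of a run-time quantity,
which is the situation I suspect keeps the complete unroller from firing.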

Another thing to consider is using -fvect-cost-model (it's very preliminary and
hasn't been tuned much, but this could be a good data point for whoever wants
to tune the vectorizer cost-model for x86_64).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32084
