------- Comment #5 from dorit at il dot ibm dot com  2007-06-27 11:57 -------
(In reply to comment #4)
> (In reply to comment #3)
> > The problem is in -ftree-vectorize
> The difference is that without -ftree-vectorize the inner loop (do k = 1, 9)
> is completely unrolled, but with vectorization, the loop is vectorized, but
> _not_ unrolled. Since the vectorization factor is only 2 for V2DF mode vectors,
> we lose big time at this point.
> My best guess for unroller problems would be rtl-optimization.

Could it be the tree-level complete unroller? (Does the vectorizer peel the loop
to handle a misaligned store, by any chance? If so, and if the misalignment
amount is unknown, then the number of iterations of the vectorized loop is
unknown, in which case the complete unroller wouldn't work.) In autovect-branch
the tree-level complete unroller runs before the vectorizer; I wonder what
happens there.

Another thing to consider is using -fvect-cost-model (it's very preliminary and
hasn't been tuned much, but this could be a good data point for whoever wants
to tune the vectorizer cost model for x86_64).

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32084
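
To make the peeling argument above concrete, here is a rough C sketch of the arithmetic involved. It is not GCC's actual code: peel_count and vector_iters are made-up names, and it assumes 16-byte vector alignment, two doubles per V2DF vector, and a store address that is at least 8-byte aligned. The point is only that the vector-loop trip count is computed from a runtime address, so it is not a compile-time constant.

```c
#include <stddef.h>
#include <stdint.h>

#define VF 2 /* assumed vectorization factor: doubles per V2DF vector */

/* Hypothetical helper: scalar iterations peeled so that the store
   address p becomes aligned to a full vector (VF * sizeof(double)). */
size_t peel_count(const double *p, size_t n)
{
    size_t misalign = ((uintptr_t)p % (VF * sizeof(double))) / sizeof(double);
    size_t peel = (VF - misalign) % VF;
    return peel < n ? peel : n;
}

/* Hypothetical helper: iterations of the vectorized loop body that
   remain after peeling; leftover iterations go to a scalar epilogue. */
size_t vector_iters(const double *p, size_t n)
{
    return (n - peel_count(p, n)) / VF;
}
```

With an 8-iteration loop, an aligned store gives 4 vector iterations and a store misaligned by one double gives 3. For a 9-iteration loop like the one in this report, both cases happen to come out at 4, but the count is still expressed in terms of a value known only at run time.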