Since we also vectorize only innermost loops, I believe running a complete unrolling pass early will help in general (I pushed for this some time ago). Thoughts?
It might also hurt, though, since we don't have a basic block vectorizer. IIUC the vectorizer is able to turn

    for (i = 0; i < 4; i++) v[i] = 0.0;

into

    *(vector double *)v = (vector double){0.0, 0.0, 0.0, 0.0};

Paolo
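To make the concern concrete, here is a minimal sketch of the three forms involved. It assumes GCC's generic vector extension (the `vector_size` attribute) rather than the `vector double` spelling above, and the names `v4df`, `zero_loop`, `zero_unrolled` and `zero_vector` are hypothetical, chosen only for illustration. If complete unrolling runs first, the loop vectorizer would see form (b) rather than form (a), and without a basic block vectorizer nothing would be left to turn (b) into (c).

```c
#include <string.h>

/* Hypothetical name: four doubles packed into one 32-byte vector,
   declared with GCC's vector_size extension. */
typedef double v4df __attribute__((vector_size(32)));

/* (a) Loop form: the shape a loop vectorizer recognizes. */
void zero_loop(double *v)
{
    for (int i = 0; i < 4; i++)
        v[i] = 0.0;
}

/* (b) The same stores after complete unrolling: straight-line code
   with no loop left for a loop vectorizer to work on. */
void zero_unrolled(double *v)
{
    v[0] = 0.0;
    v[1] = 0.0;
    v[2] = 0.0;
    v[3] = 0.0;
}

/* (c) The vectorized result written by hand: one vector-wide store.
   memcpy is used so the sketch does not assume v is 32-byte aligned. */
void zero_vector(double *v)
{
    const v4df zeros = {0.0, 0.0, 0.0, 0.0};
    memcpy(v, &zeros, sizeof zeros);
}
```

All three functions leave the same contents in `v`; they differ only in what the optimizer is given to work with.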