https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92464
--- Comment #3 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #2)
> What is the testcase testing? Whether we can properly vectorize this
> code, right? And for p7 we now do it correctly, but thought it was
> too expensive before?

On Power7, the test verifies that the cost model treats the loop as not profitable, because of the high overhead of peeling to reach a vector-aligned address, and therefore does not vectorize it.

The related patch changes the cost of load insns on Power7, which lowers the minimum profitable iteration count from 19 to 12. Unluckily, the test case happens to use 12 as its iteration count (N-OFF), so it hits the threshold exactly.

In the actual runtime performance evaluation of this case (result mentioned above), the vectorized version performs on par with the previous non-vectorized version, so I believe the cost change is innocent for this case.

One simple fix would be to lower the loop bound N from 16 to 15.
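
To make the threshold arithmetic concrete, here is a minimal sketch in C. It is not the vectorizer's actual code: `would_vectorize` and `iteration_count` are hypothetical helpers, the comparison is a simplified stand-in for the cost model's decision, and OFF = 4 is an assumption inferred from N = 16 and N-OFF = 12.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimum profitable iteration counts on Power7 as quoted in the comment:
   19 before the load-cost change, 12 after it. */
enum { MIN_PROFITABLE_ITERS_OLD = 19, MIN_PROFITABLE_ITERS_NEW = 12 };

/* Assumption: OFF is 4, inferred from N = 16 and N - OFF = 12. */
enum { OFF = 4 };

/* Hypothetical helper: iteration count of a loop running from OFF to n. */
int iteration_count(int n)
{
    assert(n >= OFF);
    return n - OFF;
}

/* Hypothetical, simplified model of the decision: the loop is vectorized
   only when its known iteration count reaches the minimum profitable
   iteration count. */
bool would_vectorize(int niters, int min_profitable_iters)
{
    assert(niters >= 0 && min_profitable_iters >= 0);
    return niters >= min_profitable_iters;
}
```

Under this model, N = 16 gives 12 iterations, which is below the old threshold of 19 but equal to the new threshold of 12, so the loop becomes a vectorization candidate. With N = 15 the count drops to 11, below the new threshold, and the loop again goes unvectorized as the test expects.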