https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92464

--- Comment #3 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #2)
> What is the testcase testing?  Whether we can properly vectorize this
> code, right?  And for p7 we now do it correctly, but thought it was
> too expensive before?

On Power7, it verifies whether the cost model treats the loop as unprofitable,
because of the high overhead of peeling to reach a vector-aligned address, and
therefore does not vectorize it. The related patch changes the cost of load
insns on Power7, which lowers the minimum profitable iteration count from 19
to 12. Unluckily, the case uses 12 (N-OFF) as its iteration count, so it now
hits the threshold exactly. According to the actual runtime performance
evaluation of this case (results mentioned above), the vectorized version
performs on par with the non-vectorized version (before the patch), so I
believe the cost change is innocent for this case. One simple fix would be to
lower the loop bound N from 16 to 15.
