https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625
--- Comment #3 from Hao Liu <hliu at amperecomputing dot com> ---
Sorry, it seems this case cannot be fixed only by adjusting the calculation of
"reduction latency". Even if it becomes smaller, the case still cannot be
vectorized, because the "general operations" count is still too large:

  Original vector body cost = 51
  Scalar issue estimate:
    ...
    general operations = 8
    reduction latency = 2
    estimated min cycles per iteration = 2.000000
    estimated cycles per vector iteration (for VF 2) = 4.000000
  Vector issue estimate:
    ...
    general operations = 15 <-- Too large
    reduction latency = 2 <-- from 8 to 2
    estimated min cycles per iteration = 7.500000
  Increasing body cost to 96 because scalar code would issue more quickly
  ...
  missed: cost model: the vector iteration cost = 96 divided by the scalar iteration cost = 44 is greater or equal to the vectorization factor = 2.
  missed: not vectorized: vectorization not profitable.
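
The arithmetic behind the dump can be sketched as follows. This is only an illustration: the function names are hypothetical, and the scaling rule (body cost multiplied by the ratio of vector cycles to scalar cycles per vector iteration, rounded up) is an assumption inferred from the numbers above, not quoted from the GCC sources.

```python
import math


def scaled_body_cost(body_cost, vector_cycles_per_iter, scalar_cycles_per_vector_iter):
    """Hypothetical helper: raise the vector body cost when the scalar loop
    is estimated to issue more quickly than the vector loop.

    Assumed rule (inferred from the dump): scale by the cycle ratio, round up.
    """
    if scalar_cycles_per_vector_iter < vector_cycles_per_iter:
        return math.ceil(
            body_cost * vector_cycles_per_iter / scalar_cycles_per_vector_iter
        )
    return body_cost


def is_profitable(vector_iter_cost, scalar_iter_cost, vf):
    """Hypothetical helper mirroring the 'missed: cost model' message:
    the loop is rejected when vector cost / scalar cost >= VF."""
    return vector_iter_cost / scalar_iter_cost < vf


# Numbers taken from the dump: body cost 51, 7.5 vector cycles per iteration,
# 2.0 scalar cycles per iteration at VF 2 (4.0 cycles per vector iteration).
vf = 2
scalar_cycles_per_vector_iter = 2.0 * vf
new_cost = scaled_body_cost(51, 7.5, scalar_cycles_per_vector_iter)
print(new_cost)                         # 51 * 7.5 / 4 = 95.625, rounded up
print(is_profitable(new_cost, 44, vf))  # 96 / 44 is about 2.18, not below 2
```

Under these assumptions the scaled cost comes out at 96, matching the "Increasing body cost to 96" line, and the ratio 96 / 44 is above the vectorization factor of 2. That is why lowering "reduction latency" alone does not help: the 7.5-cycle estimate comes from the "general operations" count.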