https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625

--- Comment #3 from Hao Liu <hliu at amperecomputing dot com> ---
Sorry, it seems this case cannot be fixed just by adjusting the calculation of
"reduction latency".  Even if that value becomes smaller, the loop still cannot
be vectorized, because the vector "general operations" count is still too large:

    Original vector body cost = 51
    Scalar issue estimate:
      ...
      general operations = 8
      reduction latency = 2
      estimated min cycles per iteration = 2.000000
      estimated cycles per vector iteration (for VF 2) = 4.000000
    Vector issue estimate:
      ...
      general operations = 15   <-- Too large
      reduction latency = 2     <-- from 8 to 2
      estimated min cycles per iteration = 7.500000
    Increasing body cost to 96 because scalar code would issue more quickly
    ...
    missed:  cost model: the vector iteration cost = 96 divided by the scalar
iteration cost = 44 is greater or equal to the vectorization factor = 2.
    missed:  not vectorized: vectorization not profitable.
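
To make the arithmetic in the dump explicit, here is a minimal sketch that
reproduces the two numbers that matter (the body cost of 96 and the final
rejection).  All inputs are copied from the dump above.  The helper names are
hypothetical, and rounding the scaled cost up is an assumption chosen because
it matches the printed value; this is not the actual GCC implementation.

```python
import math

# Values copied from the dump above.
original_vector_body_cost = 51
scalar_cycles_per_vector_iter = 4.0  # "estimated cycles per vector iteration (for VF 2)"
vector_cycles_per_iter = 7.5         # vector "estimated min cycles per iteration"
scalar_iteration_cost = 44
vf = 2


def scaled_body_cost(body_cost, vector_cycles, scalar_cycles):
    """Hypothetical helper: scale the body cost by the cycle ratio when the
    scalar code is estimated to issue more quickly.  Rounding up is an
    assumption that matches the dump."""
    if vector_cycles <= scalar_cycles:
        return body_cost
    return math.ceil(body_cost * vector_cycles / scalar_cycles)


def is_rejected(vector_cost, scalar_cost, vectorization_factor):
    """Hypothetical helper mirroring the 'missed: cost model' message:
    reject when vector cost / scalar cost >= VF."""
    return vector_cost / scalar_cost >= vectorization_factor


adjusted_cost = scaled_body_cost(
    original_vector_body_cost,
    vector_cycles_per_iter,
    scalar_cycles_per_vector_iter,
)
rejected_after_scaling = is_rejected(adjusted_cost, scalar_iteration_cost, vf)
rejected_without_scaling = is_rejected(
    original_vector_body_cost, scalar_iteration_cost, vf
)

print(adjusted_cost, rejected_after_scaling, rejected_without_scaling)
```

Under these assumptions the unscaled cost of 51 gives a ratio of about 1.16,
which is below the VF of 2, while the scaled cost of 96 gives about 2.18.  So
it is the issue-rate scaling, driven by the 7.5 cycles estimated for the vector
body, that makes the loop unprofitable here.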
