https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114814

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fxue at os dot amperecomputing.com

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is the high VF (vectorization factor) imposed on us and the
required bool -> size_t conversion.  What you get is of course massive
parallelism.  What also hurts is the "linear" accumulation of the vector
IVs, instead of having multiple accumulators or accumulating them in a tree.
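
To illustrate the difference between "linear" accumulation and multiple
accumulators, here is a minimal scalar sketch in C.  It is only an analogy:
the loop, the function names (count_linear, count_multi) and the choice of
four accumulators are assumptions made for illustration, not the testcase
from this bug and not the code the vectorizer emits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counting loop: each comparison yields a bool-like 0/1
   value that is widened to size_t and added to a single accumulator,
   so every addition depends on the previous one.  */
size_t count_linear(const uint8_t *a, size_t n, uint8_t key)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(a[i] == key);
    return count;
}

/* Same result with four independent accumulators (an arbitrary number
   chosen for illustration).  The partial sums are combined pairwise,
   i.e. as a small tree, after the main loop.  */
size_t count_multi(const uint8_t *a, size_t n, uint8_t key)
{
    size_t c0 = 0, c1 = 0, c2 = 0, c3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        c0 += (size_t)(a[i]     == key);
        c1 += (size_t)(a[i + 1] == key);
        c2 += (size_t)(a[i + 2] == key);
        c3 += (size_t)(a[i + 3] == key);
    }

    size_t count = (c0 + c1) + (c2 + c3);

    /* Scalar tail for the remaining 0..3 elements.  */
    for (; i < n; i++)
        count += (size_t)(a[i] == key);
    return count;
}
```

Both functions return the same count; the second one merely shortens the
dependency chain between additions, which is the property the comment asks
for at the vector level.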

I think there's work to improve that part in progress.

Using a widen-sum for part of the accumulation might be another improvement.
Currently we fail here because a QI -> DI widen sum isn't available, but
either an SI -> DI widen sum with earlier QI -> SI widening or a QI -> HI
widen sum with later HI -> DI widening would be possible.
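
As a rough scalar model of the second variant (QI -> HI widen sum followed
by HI -> DI widening), consider the C sketch below.  The function name
staged_widen_sum, the block size LANES and the pairwise summation shape are
assumptions for illustration only; they do not describe what GCC generates
or how a target implements its widen-sum pattern.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16  /* hypothetical number of 8-bit (QI) lanes per block */

/* Sum 8-bit values into a 64-bit total in two stages.  */
uint64_t staged_widen_sum(const uint8_t *v, size_t n)
{
    uint64_t total = 0;
    size_t i = 0;

    for (; i + LANES <= n; i += LANES) {
        uint16_t hi[LANES / 2];

        /* Stage 1: QI -> HI widen sum.  Adjacent byte pairs are added
           into 16-bit lanes; 255 + 255 fits easily in 16 bits.  */
        for (size_t j = 0; j < LANES / 2; j++)
            hi[j] = (uint16_t)(v[i + 2 * j] + v[i + 2 * j + 1]);

        /* Stage 2: HI -> DI widening, then accumulation in 64 bits.  */
        for (size_t j = 0; j < LANES / 2; j++)
            total += (uint64_t)hi[j];
    }

    /* Scalar tail for elements that do not fill a whole block.  */
    for (; i < n; i++)
        total += v[i];
    return total;
}
```

The point of the staging is that the first step already halves the number
of lanes while the data is still narrow, so fewer wide operations are
needed than with a direct QI -> DI conversion of every element.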
