[Bug tree-optimization/116684] [vectorization][x86-64] dot_16x1x16_uint8_int8_int32 could be better optimized

rguenth at gcc dot gnu.org via Gcc-bugs Thu, 12 Sep 2024 00:04:57 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116684


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fxue at os dot amperecomputing.com

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Since the reduction opportunity is in the unrolled scalar inner loop, we'd have
to know how DOT_PROD combines lanes. We do not specify that; instead we only
expect the whole vector to be reduced to a single lane.

I think Feng is working on related areas, though I'm not sure whether that work
covers exactly this case.

Implementation-wise, this is an (SLP) pattern to recognize.
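To make the lane-combining point concrete, here is a small C model. It assumes the kernel named in the bug title computes 16 int32 outputs, each a 4-term uint8 x int8 dot product (data[4] against kernel[16][4]); that shape is inferred from the name, not taken from the report. The two 64-to-16-lane combining rules and all helper names are hypothetical: rule A mirrors how x86 VPDPBUSD groups adjacent bytes, rule B is an arbitrary alternative. Both give the same total once the 16 lanes are summed, which is all the vectorizer currently relies on for DOT_PROD, but only rule A keeps each row's result in its own lane. That per-lane guarantee is what a pattern over the unrolled inner loop would need.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar kernel, assumed from the name in the bug title:
   each of the 16 int32 outputs accumulates a 4-term uint8 x int8 dot
   product. The k-loop is the small reduction that gets fully unrolled. */
static void dot_scalar(const uint8_t data[4], int8_t kernel[16][4],
                       int32_t out[16])
{
    for (int i = 0; i < 16; i++)
        for (int k = 0; k < 4; k++)
            out[i] += (int32_t)data[k] * (int32_t)kernel[i][k];
}

/* Lay the operands out as 64 byte lanes: data repeated for every output,
   kernel rows stored contiguously, so lane 4*i+k holds term (i, k). */
static void flatten(const uint8_t data[4], int8_t kernel[16][4],
                    uint8_t a[64], int8_t b[64])
{
    for (int i = 0; i < 16; i++)
        for (int k = 0; k < 4; k++) {
            a[4 * i + k] = data[k];
            b[4 * i + k] = kernel[i][k];
        }
}

/* Rule A (the grouping x86 VPDPBUSD uses): output lane j sums the
   products in input lanes 4j..4j+3. */
static void dot_prod_grouped(const uint8_t a[64], const int8_t b[64],
                             int32_t acc[16])
{
    for (int j = 0; j < 16; j++)
        for (int t = 0; t < 4; t++)
            acc[j] += (int32_t)a[4 * j + t] * (int32_t)b[4 * j + t];
}

/* Rule B (made up for contrast): output lane j sums the products in
   input lanes j, j+16, j+32, j+48. */
static void dot_prod_strided(const uint8_t a[64], const int8_t b[64],
                             int32_t acc[16])
{
    for (int j = 0; j < 16; j++)
        for (int t = 0; t < 4; t++)
            acc[j] += (int32_t)a[j + 16 * t] * (int32_t)b[j + 16 * t];
}

/* Horizontal sum: the only result a reduce-to-one-lane contract pins down. */
static int64_t lane_sum(const int32_t v[16])
{
    int64_t s = 0;
    for (int j = 0; j < 16; j++)
        s += v[j];
    return s;
}

/* Nonzero when all 16 lanes are identical. */
static int same_lanes(const int32_t x[16], const int32_t y[16])
{
    return memcmp(x, y, 16 * sizeof(int32_t)) == 0;
}

/* Both rules agree on the total, but only rule A reproduces the
   per-row outputs of the scalar kernel. */
static void self_check(void)
{
    const uint8_t data[4] = {1, 2, 3, 255};
    int8_t kernel[16][4];
    for (int i = 0; i < 16; i++)
        for (int k = 0; k < 4; k++)
            kernel[i][k] = (int8_t)(4 * i + k - 32);

    uint8_t a[64];
    int8_t b[64];
    int32_t ref[16] = {0}, grp[16] = {0}, str[16] = {0};
    flatten(data, kernel, a, b);
    dot_scalar(data, kernel, ref);
    dot_prod_grouped(a, b, grp);
    dot_prod_strided(a, b, str);

    assert(same_lanes(ref, grp));
    assert(lane_sum(ref) == lane_sum(str));
    assert(!same_lanes(ref, str));
}
```

Calling `self_check()` runs the comparison; its asserts encode the expected outcome. Any combining rule that sums all 64 products passes the `lane_sum` check, so the reduce-to-one-lane contract cannot tell rule A from rule B. Only a known grouping lets the 16 per-row results be read straight out of the accumulator.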
