https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116684
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fxue at os dot amperecomputing.com

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Since the reduction opportunity is in the unrolled scalar inner loop, we'd have to know how DOT_PROD combines lanes. We do not specify that; instead, we expect the whole vector to be reduced to a single lane.

I think Feng works in related areas, but I'm not sure whether that work covers exactly this case.

Implementation-wise, this is an (SLP) pattern to recognize.