https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
The vectorizer looks for a way to "shift" the whole vector, using either vec_shr or an equivalent vec_perm with constant shuffle operands. When the target provides neither of those, you get element extracts and scalar adds. So yes, the vectorizer does the work for you, but only if you hand it the pieces. It could possibly use a larger vector and do only the "tail" of its final reduction (i.e. try v8hi instead of v4hi), but it is not clear whether such a strategy would be good in general.
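
To illustrate the two reduction shapes described above, here is a minimal sketch using GCC's generic vector extensions. It is not the vectorizer's actual output or the test case from this report. The type name v4hi mirrors the mode name in the comment, while SHIFT_DOWN, reduce_add_shift and reduce_add_extract are hypothetical names chosen for this example. The first function is the shift-and-add form that is possible when a whole-vector shift or constant permute is available, and the second is the extract-and-scalar-add form.

```c
#include <stdint.h>

/* Four 16-bit lanes, named after the v4hi mode mentioned in the comment. */
typedef int16_t v4hi __attribute__((vector_size(8)));

/* Hypothetical helper: a two-input constant shuffle. Indices 0-3 pick
   lanes from v, indices 4-7 pick lanes from z. */
#if defined(__clang__)
#define SHIFT_DOWN(v, z, a, b, c, d) __builtin_shufflevector(v, z, a, b, c, d)
#else
#define SHIFT_DOWN(v, z, a, b, c, d) __builtin_shuffle(v, z, (v4hi){a, b, c, d})
#endif

/* Shift-and-add reduction: log2(4) = 2 steps, each a whole-vector
   "shift" (zeros shifted in at the top) followed by a vector add. */
static int16_t reduce_add_shift(v4hi v)
{
    const v4hi zero = {0, 0, 0, 0};
    v = v + SHIFT_DOWN(v, zero, 2, 3, 4, 5);  /* lanes: a+c, b+d, c, d */
    v = v + SHIFT_DOWN(v, zero, 1, 2, 3, 4);  /* lane 0: a+b+c+d */
    return v[0];
}

/* Fallback shape: extract every element and add in scalar code. */
static int16_t reduce_add_extract(v4hi v)
{
    return (int16_t)(v[0] + v[1] + v[2] + v[3]);
}
```

Both functions return the same sum. Whether the first form actually compiles to a vector shift or permute instruction depends on the patterns the target provides, which is the point the comment makes.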