https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119114
--- Comment #4 from Robin Dapp <rdapp at gcc dot gnu.org> ---

Very weird indeed. It looks like we're not even vectorizing? I mean, sure, we use vector instructions but they are all broadcast from scalars? (VMAT_INVARIANT) And in the end we extract the first element without a reduction.

Can't reproduce it on aarch64.