https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118057
--- Comment #1 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
https://godbolt.org/z/q1E6dn6T9

Try -fno-vect-cost-model; with it, the code can be vectorized.

I don't think the vectorized code from either Clang or GCC (with the vect cost model disabled) can give better performance on a wide-issue OoO superscalar machine.