https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554
--- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
Got it, thanks for the detailed explanation. So there are two issues in this case: first, the x86 target didn't choose the vector size with the smallest cost; second, BB vectorization with gaps at the end of a load is not supported.

On the other hand, if BB vectorization with gaps at the end of a load is not supported, the cost of the scalar version should be lower than that of both the 128-bit and 256-bit vectorized versions.

I once tried increasing the cost of vec_construct to make it more realistic, but the patch regressed PR101929. The current cost model tends to generate more vectorized code.
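
As a rough illustration of the first issue (choosing the candidate with the smallest cost among scalar, 128-bit and 256-bit), here is a minimal sketch in C. It is not GCC code: `struct candidate`, `cheapest`, and the cost units are hypothetical names made up for this example, and the real vectorizer compares costs through its own cost-model hooks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical candidate: a code-generation choice and its estimated cost.
   These names are illustrative only and are not GCC internals. */
struct candidate {
    const char *name;   /* e.g. "scalar", "V128", "V256" */
    unsigned cost;      /* estimated cost in arbitrary units */
};

/* Return the index of the cheapest candidate. Ties go to the earliest
   entry, so listing scalar first prefers scalar on equal cost. */
static size_t cheapest(const struct candidate *c, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (c[i].cost < c[best].cost)
            best = i;
    return best;
}
```

With the scalar version listed first, a caller would keep the scalar code unless some vector size is strictly cheaper, which is the behavior argued for above when neither 128-bit nor 256-bit vectorization wins on cost.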