https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846
--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Andrew Pinski from comment #7)
> With just -mavx512f we produce a bunch of instructions (looking like we went
> to scalar mode) while LLVM is able to produce:
> foo(short __vector(16)):                # @foo(short __vector(16))
>         .cfi_startproc
> # %bb.0:
>         vpmovzxwd       ymm1, xmm0      # ymm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
>         vextracti128    xmm0, ymm0, 1
>         vpmovzxwd       ymm0, xmm0      # ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
>         vinserti64x4    zmm0, zmm1, ymm0, 1
>         ret

zero_extend from ymm to zmm is supported under avx512bw; LLVM breaks it into two zero-extends from xmm to ymm and then packs the two halves back into zmm.
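To make the split concrete, here is a portable scalar C model of the quoted LLVM sequence. It is only a sketch for illustration, not GCC or LLVM code: the helper names (`vpmovzxwd_ymm_xmm`, `vextracti128_hi`, `vinserti64x4_hi`, `zext_v16hi_split`, `zext_v16hi_direct`, `assert_split_matches_direct`) are hypothetical, each mimicking the lane movement of the instruction it is named after, with `uint16_t`/`uint32_t` arrays standing in for the registers. The 16 input shorts are treated as unsigned because the listing zero-extends them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of vpmovzxwd ymm, xmm:
   zero-extend 8 x 16-bit lanes to 8 x 32-bit lanes. */
static void vpmovzxwd_ymm_xmm(const uint16_t src[8], uint32_t dst[8])
{
    for (size_t i = 0; i < 8; i++)
        dst[i] = (uint32_t)src[i];
}

/* Hypothetical model of vextracti128 xmm, ymm, 1:
   copy the upper 128 bits of a 256-bit register (words 8..15). */
static void vextracti128_hi(const uint16_t src[16], uint16_t dst[8])
{
    for (size_t i = 0; i < 8; i++)
        dst[i] = src[8 + i];
}

/* Hypothetical model of vinserti64x4 zmm, zmm_lo, ymm_hi, 1:
   keep the low 256 bits of the first source, put ymm_hi in the upper 256. */
static void vinserti64x4_hi(const uint32_t lo[8], const uint32_t hi[8],
                            uint32_t dst[16])
{
    for (size_t i = 0; i < 8; i++) {
        dst[i] = lo[i];
        dst[8 + i] = hi[i];
    }
}

/* The four-instruction LLVM sequence, one step per instruction. */
void zext_v16hi_split(const uint16_t in[16], uint32_t out[16])
{
    uint32_t ymm1[8], ymm0_ext[8];
    uint16_t xmm0_hi[8];

    vpmovzxwd_ymm_xmm(in, ymm1);          /* low 8 words  -> ymm1 */
    vextracti128_hi(in, xmm0_hi);         /* high 8 words -> xmm0 */
    vpmovzxwd_ymm_xmm(xmm0_hi, ymm0_ext); /* high 8 words -> ymm0 */
    vinserti64x4_hi(ymm1, ymm0_ext, out); /* ymm1 | ymm0  -> zmm0 */
}

/* Reference: a single 16-lane zero extension (ymm -> zmm). */
void zext_v16hi_direct(const uint16_t in[16], uint32_t out[16])
{
    for (size_t i = 0; i < 16; i++)
        out[i] = (uint32_t)in[i];
}

/* Aborts via assert if the split sequence and the direct
   extension disagree on any lane for this input. */
void assert_split_matches_direct(const uint16_t in[16])
{
    uint32_t split[16], direct[16];
    zext_v16hi_split(in, split);
    zext_v16hi_direct(in, direct);
    for (size_t i = 0; i < 16; i++)
        assert(split[i] == direct[i]);
}
```

For example, an input lane of 0xFFFF comes out as 65535 in both routes, not as 0xFFFFFFFF, which is what distinguishes zero extension from sign extension. On real hardware the same split maps onto the intrinsics `_mm256_cvtepu16_epi32` (AVX2, vpmovzxwd ymm, xmm), `_mm256_extracti128_si256` (AVX2) and `_mm512_inserti64x4` (AVX-512F); the model above avoids them so it runs on any target.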