https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
With just -mavx512f we produce a bunch of instructions (it looks like we fall
back to scalar mode), while LLVM is able to produce:

foo(short __vector(16)):                # @foo(short __vector(16))
        .cfi_startproc
# %bb.0:
        vpmovzxwd       ymm1, xmm0      # ymm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
        vextracti128    xmm0, ymm0, 1
        vpmovzxwd       ymm0, xmm0      # ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
        vinserti64x4    zmm0, zmm1, ymm0, 1
        ret
bar(short __vector(32)):                # @bar(short __vector(32))
        .cfi_startproc
# %bb.0:
        vpmovdw ymm0, zmm0
        ret

For -march=skylake-avx512 we now produce:

foo(short __vector(16)):
        vpmovzxwd       zmm0, ymm0
        ret
bar(short __vector(32)):
        vpmovdw ymm0, zmm0
        ret

So this is still confirmed for the -mavx512f case.