https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104441
            Bug ID: 104441
           Summary: [12 Regression] vzeroupper is placed at the wrong place
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hjl.tools at gmail dot com
                CC: crazylht at gmail dot com, lili.cui at intel dot com
  Target Milestone: ---
            Target: x86-64

Created attachment 52375
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52375&action=edit
A testcase

When compiled with -march=skylake -Wno-attributes, GCC 12 generated:

.L3:
	vmovd	(%rdi), %xmm0
	vmovd	(%rdi,%r13), %xmm1
	vpinsrd	$1, (%rdi,%r12), %xmm1, %xmm1
	vpinsrd	$1, (%rdi,%rsi), %xmm0, %xmm0
	vmovd	(%rax,%rbx), %xmm2
	vinserti128	$0x1, %xmm1, %ymm0, %ymm0
	vmovd	(%rax), %xmm1
	vpinsrd	$1, (%rax,%rcx), %xmm1, %xmm1
	vpinsrd	$1, (%rax,%r11), %xmm2, %xmm2
	addl	$4, %edx
	vinserti128	$0x1, %xmm2, %ymm1, %ymm1
	vpsadbw	%ymm1, %ymm0, %ymm0
	vpaddd	%ymm0, %ymm3, %ymm0
	vmovdqa	%ymm0, %ymm3
	addq	%r10, %rdi
	addq	%r9, %rax
	cmpl	%r8d, %edx
	jb	.L3
	vzeroupper   <<<<<<<<<<< Clear upper 128bits.
	popq	%rbx
	popq	%r12
	vextracti128	$0x1, %ymm3, %xmm3   << The upper 128bits of YMM3 are used.
	vpaddd	%xmm3, %xmm0, %xmm0
	popq	%r13
	vmovd	%xmm0, %eax
	popq	%rbp

This is triggered by

commit 9775e465c1fbfc32656de77c618c61acf5bd905d
Author: H.J. Lu <hjl.to...@gmail.com>
Date:   Tue Jul 27 07:46:04 2021 -0700

    x86: Don't set AVX_U128_DIRTY when zeroing YMM/ZMM register
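
The reduction after the loop reads the upper 128 bits of %ymm3, so a vzeroupper placed ahead of the vextracti128 drops half of the accumulated sum. The following is a minimal C sketch of that effect. It is a portable model of the register contents, not the attached testcase, and all names in it (ymm_model, model_vzeroupper, model_reduce, reduce_in_order, reduce_after_vzeroupper) are hypothetical. It assumes, as in the listing, that the returned value is dword 0 of the low half plus dword 0 of the upper half of the accumulator.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 256-bit YMM register as eight 32-bit lanes.
   Lanes 0-3 are the low 128 bits (the XMM view), lanes 4-7 the upper 128. */
typedef struct { uint32_t lane[8]; } ymm_model;

/* Model of vzeroupper applied to a single register: clear bits 255:128. */
static inline void model_vzeroupper(ymm_model *r)
{
    assert(r != NULL);
    memset(&r->lane[4], 0, 4 * sizeof(uint32_t));
}

/* Model of the epilogue: vextracti128 $1 takes the upper half, vpaddd adds
   it to the low half, and vmovd returns dword 0 of the result. */
static inline uint32_t model_reduce(const ymm_model *acc)
{
    return acc->lane[0] + acc->lane[4];
}

/* Expected order: reduce while the upper half is still live. */
static inline uint32_t reduce_in_order(ymm_model acc)
{
    return model_reduce(&acc);
}

/* Order seen in the listing: the upper half is cleared before it is read. */
static inline uint32_t reduce_after_vzeroupper(ymm_model acc)
{
    model_vzeroupper(&acc);
    return model_reduce(&acc);
}
```

With lane[0] = 10 and lane[4] = 32 in the accumulator, reduce_in_order returns 42, while reduce_after_vzeroupper returns 10 because the upper-half contribution has already been zeroed.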