https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91994
Uroš Bizjak <ubizjak at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---
--- Comment #9 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to [email protected] from comment #8)
> Fixed for the reduced testcase. Please reopen if there's still a problem
> with the SPEC test itself.
Please note that when the testcase from comment #5 is compiled with
"-march=skylake -O2 -mavx512f", the vzeroupper before the call to "foo" is
now missing:
bar:
        pushq   %rbp
        movq    %rsp, %rbp
        andq    $-32, %rsp
        subq    $32, %rsp
        vmovdqa x1(%rip), %ymm0
        vmovdqa %ymm0, (%rsp)
        call    foo
        vmovdqa (%rsp), %ymm0
        vmovdqa %ymm0, x3(%rip)
        vzeroupper
        leave
        ret
gcc-9.2.1 compiles the function to:
bar:
        pushq   %rbp
        movq    %rsp, %rbp
        andq    $-32, %rsp
        subq    $32, %rsp
        vmovdqa x1(%rip), %ymm1
        vmovdqa %ymm1, (%rsp)
        vzeroupper                      <---- here
        call    foo
        vmovdqa (%rsp), %ymm1
        vmovdqa %ymm1, x3(%rip)
        vzeroupper
        leave
        ret
(I would also expect %ymm16+ to be used as a temporary, as it is not
clobbered by a vzeroupper in "foo").
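
For readers without comment #5 at hand, the following is a rough sketch of
the kind of source that could produce the "bar" listings above. It is a
hypothetical reconstruction, not the actual testcase: only the names "x1",
"x3", "foo" and "bar" come from the assembly; the vector typedef, the body
of "foo" and the "foo_calls" counter are assumptions added so the example is
self-contained.

```c
/* Hypothetical reconstruction; NOT the testcase from comment #5. */

/* 256-bit vector type via the GCC vector extension (assumed type). */
typedef long long v4di __attribute__ ((vector_size (32)));

v4di x1, x3;        /* names taken from the assembly listings */
int foo_calls;      /* assumed counter, used only for self-checking */

/* In the listings "foo" is an opaque external call; it is defined here,
   marked noinline, only so that the sketch links on its own.  */
__attribute__ ((noinline)) void
foo (void)
{
  foo_calls++;
}

void
bar (void)
{
  v4di tmp = x1;    /* 256-bit value that must stay live across the call */
  foo ();           /* all vector registers are call-clobbered, so tmp is
                       spilled to the stack around this call */
  x3 = tmp;
}
```

Compiling this with the options quoted above (for example
"gcc -O2 -march=skylake -mavx512f -S") should allow checking whether a
vzeroupper is emitted before the call; the exact output may differ from the
listings, since "foo" is visible to the compiler here.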