https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978
Bug ID: 113978
Summary: Missed optimization: unnecessary stack adjustment for long vector load operation
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: xjkp2283572185 at gmail dot com
Target Milestone: ---
===
Compiler
===
Using built-in specs.
COLLECT_GCC=D:\Tools\gcc\bin\g++.exe
COLLECT_LTO_WRAPPER=D:/Tools/gcc/bin/../libexec/gcc/x86_64-w64-mingw32/14.0.1/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../configure --disable-werror
--prefix=/home/luo/x86_64-w64-mingw32-native-gcc14 --host=x86_64-w64-mingw32
--target=x86_64-w64-mingw32 --enable-multilib --enable-languages=c,c++
--disable-sjlj-exceptions --enable-threads=win32
Thread model: win32
Supported LTO compression algorithms: zlib
gcc version 14.0.1 20240130 (experimental) (GCC)
===
Source Code
===
using v [[using gnu: vector_size(128)]] = char;
auto f(v* p) noexcept
{
    return *p;
}
===
Command
===
g++ test.cpp -Ofast -march=znver4
===
Result
===
_Z1fPDv128_c:
.LFB0:
	subq $248, %rsp
	.seh_stackalloc 248
	.seh_endprologue
	vmovdqa64 (%rdx), %zmm0
	movq %rcx, %rax
	vmovdqa64 %zmm0, (%rcx)
	vmovdqa64 64(%rdx), %zmm0
	vmovdqa64 %zmm0, 64(%rcx)
	vzeroupper
	addq $248, %rsp
	ret
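GCC emits an unnecessary stack adjustment (subq $248, %rsp / addq $248, %rsp) even though nothing is ever stored to or loaded from the stack. Clang emits just the two loads: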
_Z1fPDv128_c: # @_Z1fPDv128_c
# %bb.0:
	vmovaps (%rcx), %zmm0
	vmovaps 64(%rcx), %zmm1
	retq