https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116145
Bug ID: 116145
Summary: Suboptimal SVE immediate synthesis
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: aarch64-sve, missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
Target Milestone: ---
Target: aarch64
While optimising some string-matching code I wanted to create a vector of the
characters to match, using an svdup and an svreinterpret, but GCC produces
suboptimal codegen that loads the constant from the constant pool.
A minimised testcase:
#include <arm_sve.h>
svuint8_t
foo (void)
{
  return svreinterpret_u8(svdup_u32(0x0a0d5c3f));
}
With -O2 -march=armv9-a GCC generates:
foo:
        ptrue   p3.b, all
        adrp    x0, .LC0
        add     x0, x0, :lo12:.LC0
        ld1rw   z0.s, p3/z, [x0]
        ret
.LC0:
        .word   168647743
but LLVM can do it with:
foo:
        mov     w8, #23615
        movk    w8, #2573, lsl #16
        mov     z0.s, w8
        ret
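
For reference, a small host-side sketch (plain C, no SVE needed) that
cross-checks the constants in the two sequences: the .word value in the
constant pool is the same 32-bit immediate, and the mov/movk operands are its
low and high 16-bit halves. The helper names are hypothetical and not taken
from GCC or LLVM, and the byte-order check assumes a little-endian target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: the 16-bit halves that a MOV + MOVK pair carries.  */
static uint16_t
imm_lo16 (uint32_t imm)
{
  return (uint16_t) (imm & 0xffffu);
}

static uint16_t
imm_hi16 (uint32_t imm)
{
  return (uint16_t) (imm >> 16);
}

/* Rebuild the value the way "mov w, #lo; movk w, #hi, lsl #16" does.  */
static uint32_t
mov_movk (uint16_t lo, uint16_t hi)
{
  return (uint32_t) lo | ((uint32_t) hi << 16);
}

/* Byte N of each 32-bit element; N == 0 is the lowest address on a
   little-endian target (assumed here).  */
static unsigned char
elt_byte (uint32_t imm, unsigned int n)
{
  return (unsigned char) (imm >> (8 * n));
}

/* Cross-check the constants that appear in the report.  */
static void
check_report_constants (void)
{
  const uint32_t imm = 0x0a0d5c3fu;

  assert (imm == 168647743u);            /* .word in .LC0 */
  assert (imm_lo16 (imm) == 23615);      /* mov  w8, #23615 */
  assert (imm_hi16 (imm) == 2573);       /* movk w8, #2573, lsl #16 */
  assert (mov_movk (23615, 2573) == imm);

  /* The characters being matched, repeated across the vector.  */
  assert (elt_byte (imm, 0) == '?');
  assert (elt_byte (imm, 1) == '\\');
  assert (elt_byte (imm, 2) == '\r');
  assert (elt_byte (imm, 3) == '\n');
}
```

Calling check_report_constants () from a main function should pass silently.
So the two-instruction integer synthesis followed by a scalar-to-vector move
covers this immediate without a load.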