https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102812
Bug ID: 102812
Summary: Unoptimal (and wrong) code for _Float16 insert
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: ubizjak at gmail dot com
Target Milestone: ---
The following code:
--cut here--
typedef _Float16 v8hf __attribute__((__vector_size__ (16)));
v8hf t (_Float16 a)
{
  return (v8hf){a, 0, 0, 0, 0, 0, 0, 0};
}
--cut here--
compiles with -msse4 to:
pxor %xmm15, %xmm15
movaps %xmm15, -56(%rsp)
pextrw $0, %xmm0, -56(%rsp)
vmovdqa64 -56(%rsp), %xmm0
PBLENDW with a cleared %xmm15 would be much better, and would avoid the
round trip through memory.
Also, VMOVDQA64 is an AVX512F/AVX512VL instruction, not an SSE4 (or even AVX) instruction.
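For illustration, here is a minimal sketch of the PBLENDW sequence suggested above, written with SSE4.1 intrinsics. It works on raw 16-bit lanes instead of _Float16 so that it also builds with compilers lacking _Float16 support. The helper names (insert_low_word_zero_rest, blend_demo) are hypothetical and not part of GCC; this shows the intended instruction selection, not what the compiler currently emits.

```c
#include <stdint.h>
#include <smmintrin.h>  /* SSE4.1 intrinsics: _mm_blend_epi16 maps to PBLENDW */

/* Hypothetical helper (not from the report): keep 16-bit lane 0 of `src`
   and zero lanes 1..7 by blending against a cleared register. */
__attribute__((target("sse4.1")))
static __m128i insert_low_word_zero_rest(__m128i src)
{
    __m128i zero = _mm_setzero_si128();       /* pxor reg, reg */
    /* Immediate bit j set => lane j comes from the second operand (src);
       bit j clear => lane j comes from the first operand (zero). */
    return _mm_blend_epi16(zero, src, 0x01);  /* pblendw $1 */
}

/* Hypothetical wrapper that makes the result observable through memory;
   the loads and stores exist only for the demonstration. */
__attribute__((target("sse4.1")))
static void blend_demo(const uint16_t in[8], uint16_t out[8])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, insert_low_word_zero_rest(v));
}
```

The helper itself touches only registers, so an equivalent expansion for the v8hf constructor would need no stack slot.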