https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109079
Bug ID: 109079
Summary: Missing optimization for x86 avx intrinsic _mm256_zeroall().
Product: gcc
Version: 12.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: dorazzsoft at gmail dot com
Target Milestone: ---
Here is a minimal example: https://godbolt.org/z/q9o5rf4dM
#include <immintrin.h>

void fn(float *out) {
    _mm256_zeroall();
    register __m256 r0;
    r0 = _mm256_setzero_ps();
    _mm256_storeu_ps(out, r0);
}
which compiles to:
fn:
    vzeroall
    vxorps  xmm0, xmm0, xmm0
    vmovups YMMWORD PTR [rdi], ymm0
    vzeroupper
    ret
The code contains both vzeroall and vxorps, but only one of them is needed:
vzeroall already zeroes every ymm register, including ymm0, so the vxorps that
follows it is redundant.
In my actual use case (a matrix product), I want to zero many accumulator
registers with a single vzeroall via _mm256_zeroall() to reduce code size,
while still initializing every register variable with _mm256_setzero_ps() to
avoid uninitialized-variable warnings.
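A minimal sketch of that pattern, assuming a hypothetical single-precision micro-kernel: the name kernel_1x32, its signature, and the row-major k x 32 panel layout of b are illustrative and not taken from the original code. It uses plain AVX multiply/add rather than FMA, and the target("avx") attribute so it builds without -mavx. With optimization enabled, one would inspect the generated assembly for vxorps instructions after the vzeroall.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical micro-kernel: out[0..31] = sum over i < k of a[i] * b[i*32 .. i*32+31],
 * where b is a row-major k x 32 panel. target("avx") allows building without -mavx. */
__attribute__((target("avx")))
void kernel_1x32(const float *a, const float *b, size_t k, float *out) {
    /* A single vzeroall clears all ymm registers at once... */
    _mm256_zeroall();

    /* ...yet each accumulator is still initialized explicitly so no
       uninitialized-variable warning is issued. Ideally these would cost
       no extra vxorps instructions after the vzeroall above. */
    __m256 c0 = _mm256_setzero_ps();
    __m256 c1 = _mm256_setzero_ps();
    __m256 c2 = _mm256_setzero_ps();
    __m256 c3 = _mm256_setzero_ps();

    for (size_t i = 0; i < k; ++i) {
        __m256 ai = _mm256_broadcast_ss(a + i);   /* splat a[i] into all 8 lanes */
        const float *bi = b + i * 32;             /* row i of the panel */
        c0 = _mm256_add_ps(c0, _mm256_mul_ps(ai, _mm256_loadu_ps(bi + 0)));
        c1 = _mm256_add_ps(c1, _mm256_mul_ps(ai, _mm256_loadu_ps(bi + 8)));
        c2 = _mm256_add_ps(c2, _mm256_mul_ps(ai, _mm256_loadu_ps(bi + 16)));
        c3 = _mm256_add_ps(c3, _mm256_mul_ps(ai, _mm256_loadu_ps(bi + 24)));
    }

    /* Write the four 8-float accumulators back to memory. */
    _mm256_storeu_ps(out + 0, c0);
    _mm256_storeu_ps(out + 8, c1);
    _mm256_storeu_ps(out + 16, c2);
    _mm256_storeu_ps(out + 24, c3);
}
```

With four accumulators the intent is that the one vzeroall replaces four separate zeroing instructions; a real kernel would typically use more accumulators and FMA.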
Because GCC does not remove the redundant zeroing, the leading
_mm256_zeroall() is useless: it adds a vzeroall without saving any of the
individual vxorps instructions.