https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109079
Bug ID: 109079
Summary: Missing optimization for x86 AVX intrinsic _mm256_zeroall()
Product: gcc
Version: 12.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: dorazzsoft at gmail dot com
Target Milestone: ---

Here is a minimal example (https://godbolt.org/z/q9o5rf4dM):

    #include <immintrin.h>

    void fn(float *out)
    {
        _mm256_zeroall();
        register __m256 r0;
        r0 = _mm256_setzero_ps();
        _mm256_storeu_ps(out, r0);
    }

which is compiled into:

    fn:
        vzeroall
        vxorps  xmm0, xmm0, xmm0
        vmovups YMMWORD PTR [rdi], ymm0
        vzeroupper
        ret

The code contains both vzeroall and vxorps, but only one of them is needed: vzeroall already zeroes ymm0-ymm15 (in 64-bit mode), including ymm0, so the following vxorps xmm0, xmm0, xmm0 is redundant.

In my actual use case (a matrix product kernel), I want to zero many accumulator registers with a single vzeroall, issued via _mm256_zeroall(), to reduce code size. At the same time, I initialize every register variable with _mm256_setzero_ps() so that the compiler does not warn about uninitialized variables. Because GCC does not take the zeroing already done by vzeroall into account, it still emits vxorps for the _mm256_setzero_ps() initializations, which makes the leading _mm256_zeroall() useless.
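To illustrate the matrix-product use case, here is a minimal sketch of a hypothetical 4x8 single-precision micro-kernel written in the pattern described above: one _mm256_zeroall() followed by explicit _mm256_setzero_ps() initialization of each accumulator. The function names (matmul_4x8_avx, matmul_4x8_ref), the row-major layout, and the use of __attribute__((target("avx"))) so that it builds without -mavx are assumptions for illustration, not the reporter's actual code. Callers should check for AVX at run time, for example with __builtin_cpu_supports("avx").

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical micro-kernel (not the reporter's code):
   C[4][8] = A[4][k] * B[k][8], all matrices row-major.
   target("avx") lets this compile without -mavx; callers must
   make sure the CPU actually supports AVX before calling it. */
__attribute__((target("avx")))
void matmul_4x8_avx(const float *a, const float *b, float *c, size_t k)
{
    /* A single vzeroall clears ymm0-ymm15 ... */
    _mm256_zeroall();

    /* ... but each accumulator is still initialized explicitly to
       avoid uninitialized-variable warnings. Per this report, GCC
       does not fold this zeroing into the preceding vzeroall. */
    __m256 c0 = _mm256_setzero_ps();
    __m256 c1 = _mm256_setzero_ps();
    __m256 c2 = _mm256_setzero_ps();
    __m256 c3 = _mm256_setzero_ps();

    for (size_t p = 0; p < k; ++p) {
        __m256 bp = _mm256_loadu_ps(b + p * 8); /* row p of B */
        /* Broadcast A[i][p] and accumulate A[i][p] * B[p][0..7]. */
        c0 = _mm256_add_ps(c0, _mm256_mul_ps(_mm256_set1_ps(a[0 * k + p]), bp));
        c1 = _mm256_add_ps(c1, _mm256_mul_ps(_mm256_set1_ps(a[1 * k + p]), bp));
        c2 = _mm256_add_ps(c2, _mm256_mul_ps(_mm256_set1_ps(a[2 * k + p]), bp));
        c3 = _mm256_add_ps(c3, _mm256_mul_ps(_mm256_set1_ps(a[3 * k + p]), bp));
    }

    /* Store each accumulator as one row of C. */
    _mm256_storeu_ps(c + 0 * 8, c0);
    _mm256_storeu_ps(c + 1 * 8, c1);
    _mm256_storeu_ps(c + 2 * 8, c2);
    _mm256_storeu_ps(c + 3 * 8, c3);
}

/* Plain scalar reference with the same layout, for checking results. */
void matmul_4x8_ref(const float *a, const float *b, float *c, size_t k)
{
    for (size_t i = 0; i < 4; ++i) {
        for (size_t j = 0; j < 8; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * 8 + j];
            c[i * 8 + j] = acc;
        }
    }
}
```

Compiling this kernel with optimization and inspecting the assembly is a quick way to see whether the accumulator zeroing survives after the vzeroall.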