https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109079

            Bug ID: 109079
           Summary: Missing optimization for x86 avx intrinsic
                    _mm256_zeroall().
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dorazzsoft at gmail dot com
  Target Milestone: ---

Here is a simple example: https://godbolt.org/z/q9o5rf4dM

#include <immintrin.h>

void fn(float *out) {
    _mm256_zeroall();
    register __m256 r0;
    r0 = _mm256_setzero_ps();
    _mm256_storeu_ps(out, r0);
}

which is compiled into

fn:
        vzeroall
        vxorps  xmm0, xmm0, xmm0
        vmovups YMMWORD PTR [rdi], ymm0
        vzeroupper
        ret

The output contains both vzeroall and vxorps, but only one is needed, since
vzeroall has already zeroed ymm0.
In my specific use case (a matrix product), I want to zero multiple registers
with a single vzeroall via _mm256_zeroall() to reduce code size. To prevent
uninitialized-variable warnings, I also initialize every register variable
with _mm256_setzero_ps().
Because this optimization is missing, the leading _mm256_zeroall() is
useless.
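
A minimal sketch of that use case, assuming a hypothetical two-accumulator
kernel. The name kernel_2x8, its parameters, and the data layout are
illustrative only and not taken from the real code; the target attribute is
there just so the snippet builds without -mavx.

```c
#include <immintrin.h>

/* Hypothetical micro-kernel (illustrative names and layout):
 * a holds k pairs of scalars, b holds k rows of 8 floats,
 * out receives two 8-float accumulators. */
__attribute__((target("avx")))
void kernel_2x8(const float *a, const float *b, float *out, int k)
{
    /* One vzeroall is meant to clear every accumulator register at once. */
    _mm256_zeroall();

    /* Explicit initialization silences uninitialized-variable warnings;
     * ideally the compiler would fold these into the vzeroall above. */
    __m256 acc0 = _mm256_setzero_ps();
    __m256 acc1 = _mm256_setzero_ps();

    for (int i = 0; i < k; ++i) {
        __m256 bv = _mm256_loadu_ps(b + 8 * i);
        __m256 a0 = _mm256_broadcast_ss(a + 2 * i);      /* splat a[2i]   */
        __m256 a1 = _mm256_broadcast_ss(a + 2 * i + 1);  /* splat a[2i+1] */
        acc0 = _mm256_add_ps(acc0, _mm256_mul_ps(a0, bv));
        acc1 = _mm256_add_ps(acc1, _mm256_mul_ps(a1, bv));
    }

    _mm256_storeu_ps(out, acc0);
    _mm256_storeu_ps(out + 8, acc1);
}
```

With the optimization in place, I would expect the two _mm256_setzero_ps()
calls here to add no instructions after the vzeroall.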
