[Bug target/52572] New: suboptimal assignment to avx element

marc.glisse at normalesup dot org Mon, 12 Mar 2012 15:50:22 -0700

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52572


             Bug #: 52572
           Summary: suboptimal assignment to avx element
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: [email protected]
        ReportedBy: marc.glisse at normalesup dot org


For the following program:
#include <x86intrin.h>
__m256d f(__m256d x){
  x[0]=0;
  return x;
}

gcc -O3 generates:
    vmovlpd    .LC0(%rip), %xmm0, %xmm1
    vinsertf128    $0x0, %xmm1, %ymm0, %ymm0
or with -Os:
    vxorps    %xmm2, %xmm2, %xmm2
    vmovsd    %xmm2, %xmm0, %xmm1
    vinsertf128    $0x0, %xmm1, %ymm0, %ymm0

If I understand correctly, it first constructs {0,x[1],0,0} and then merges it
with the upper part of x. However, using the legacy movlpd instruction would
avoid zeroing the upper 128 bits and thus the vinsertf128 wouldn't be needed.

Is there a policy not to generate the non-VEX instructions anymore, or is this
a missed optimization?

Setting x[1] is similar. For x[2] or x[3], we get extract+mov+insert, but it
might be better to do something with vblendpd.
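
To make the vblendpd idea concrete, here is a minimal sketch written with
intrinsics. The helper names zero_elem0 and zero_elem2 are made up for
illustration and do not come from gcc or from this report. They take a pointer
and use the target attribute only so the file builds without -mavx; running it
still needs an AVX-capable CPU. Whether a blend is actually cheaper than
extract+mov+insert on a given microarchitecture has not been measured here.

```c
#include <immintrin.h>

/* Hypothetical helpers (not from the report): overwrite one element of a
   4 x double vector by blending with an all-zero vector.
   In _mm256_blend_pd(a, b, mask), a set bit i in mask takes element i
   from b, a clear bit keeps element i from a. */

/* Same effect as x[0] = 0 in the report's test case. */
__attribute__((target("avx")))
void zero_elem0(double *p)
{
    __m256d x = _mm256_loadu_pd(p);                    /* unaligned load  */
    x = _mm256_blend_pd(x, _mm256_setzero_pd(), 0x1);  /* replace elem 0  */
    _mm256_storeu_pd(p, x);                            /* unaligned store */
}

/* Same effect as x[2] = 0, i.e. an element in the upper 128 bits. */
__attribute__((target("avx")))
void zero_elem2(double *p)
{
    __m256d x = _mm256_loadu_pd(p);
    x = _mm256_blend_pd(x, _mm256_setzero_pd(), 0x4);  /* replace elem 2  */
    _mm256_storeu_pd(p, x);
}
```

The blend mask has to be a compile-time constant, so a sequence like this
would only apply when the element index is known at compile time, as it is in
the test case above.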
