http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54422
Bug #: 54422
Summary: Merge adjacent stores of elements of a vector (or loads)
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: tree-optimization
AssignedTo: [email protected]
ReportedBy: [email protected]
Target: x86_64-linux-gnu
Hello,
#include <x86intrin.h>

/* Element-wise stores: this is the case I would like to see merged.  */
void f1(__m128d *dd, __m128d e){
  double *d = (double*)dd;
  d[0] = e[0];
  d[1] = e[1];
}
/* Explicit unaligned vector store.  */
void f2(__m128d *dd, __m128d e){
  _mm_storeu_pd((double*)dd, e);
}
/* memcpy of the whole vector.  */
void f3(__m128d *dd, __m128d e){
  __builtin_memcpy(dd, &e, 16);
}
For this code, gcc -O3 -mavx2 generates:
for f2:
vmovupd %xmm0, (%rdi)
(it could arguably have deduced that the destination is 16-byte aligned, since
dd is a __m128d*, and used an aligned store, but that is not my concern today)
for f1:
vmovlpd %xmm0, (%rdi)
vmovhpd %xmm0, 8(%rdi)
(this is my main issue: could it merge those two stores into a single vmovupd?)
for f3:
vmovdqa %xmm0, -40(%rsp)
movq -40(%rsp), %rax
vmovapd %xmm0, -24(%rsp)
movq %rax, (%rdi)
movq -16(%rsp), %rax
movq %rax, 8(%rdi)
(I hope the SSE memcpy patch at
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00336.html will eventually help
with that)
At tree level, for f1, we have:
_3 = BIT_FIELD_REF <e_5(D), 64, 0>;
MEM[(double *)dd_1(D)] = _3;
_6 = BIT_FIELD_REF <e_5(D), 64, 64>;
MEM[(double *)dd_1(D) + 8B] = _6;
Merging those two stores looks like it might be possible at this level (though
I am not familiar with that part of the compiler; maybe only the backend can
handle it). Note that I am interested in both the aligned and the unaligned
case (i.e. when f1 takes a double* argument instead of a __m128d*), and in
loads as well as stores; a quick sketch of both variants follows below.
The most relevant other bugs I found are PR 41464, PR 23684, and PR 47059.