http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55590

             Bug #: 55590
           Summary: SRA still produces unnecessarily unaligned memory
                    accesses
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: jamb...@gcc.gnu.org
        ReportedBy: jamb...@gcc.gnu.org


SRA can still produce unaligned memory accesses which should be
aligned when it's basing its new scalar access on a MEM_REF buried
below COMPONENT_REFs or ARRAY_REFs.
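
As a side note on the alignment reasoning (a minimal sketch, assuming
the usual x86_64 ABI layout): the __m128 members give struct S and
struct T below 16-byte alignment, so in the first testcase p->s.b sits
at offset 32 and is 16-byte aligned whenever p itself is, which is what
makes the aligned vmovaps form legal.  The compile-time checks below
spell that assumption out:

#include <stddef.h>
#include <immintrin.h>

struct S { __m128 a, b; };
struct T { int a; struct S s; };

/* Assumed layout under the usual x86_64 ABI: a at offset 0, s padded up
   to offset 16 because __m128 requires 16-byte alignment, s.b at offset
   32, so an aligned access to p->s.b is valid for a properly aligned p.  */
_Static_assert (_Alignof (struct T) == 16, "struct T is 16-byte aligned");
_Static_assert (offsetof (struct T, s) == 16, "s starts at offset 16");
_Static_assert (offsetof (struct T, s.b) == 32, "s.b sits at offset 32");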



Testcase 1:

/* { dg-do compile } */
/* { dg-options "-O2 -mavx" } */

#include <immintrin.h>

struct S
{
  __m128 a, b;
};

struct T
{
  int a;
  struct S s;
};

void foo (struct T *p, __m128 v)
{
  struct S s;

  s = p->s;
  s.b = _mm_add_ps(s.b, v);
  p->s = s;
}

/* { dg-final { scan-assembler-not "vmovups" } } */

on x86_64 compiles to

        vmovups 32(%rdi), %xmm1
        vaddps  %xmm0, %xmm1, %xmm0
        vmovups %xmm0, 32(%rdi)

even though it should really be

        vaddps  32(%rdi), %xmm0, %xmm0
        vmovaps %xmm0, 32(%rdi)
        ret


Testcase 2 (which shows why this should be fixed differently from the
recent IPA-SRA patch, because of the variable array index; see also the
note after the expected output below):

/* { dg-do compile } */
/* { dg-options "-O2 -mavx" } */

#include <immintrin.h>

struct S
{
  __m128 a, b;
};

struct T
{
  int a;
  struct S s[8];
};

void foo (struct T *p, int i, __m128 v)
{
  struct S s;

  s = p->s[i];
  s.b = _mm_add_ps(s.b, v);
  p->s[i] = s;
}

/* { dg-final { scan-assembler-not "vmovups" } } */

Compiles to

        movslq  %esi, %rsi
        salq    $5, %rsi
        leaq    16(%rdi,%rsi), %rax
        vmovups 16(%rax), %xmm1
        vaddps  %xmm0, %xmm1, %xmm0
        vmovups %xmm0, 16(%rax)
        ret

when it should produce

        movslq  %esi, %rsi
        salq    $5, %rsi
        leaq    16(%rdi,%rsi), %rax
        vaddps  16(%rax), %xmm0, %xmm0
        vmovaps %xmm0, 16(%rax)
        ret
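
The note promised above on the variable array index: even though i is
not a compile-time constant, the aligned form is still valid because the
element stride, sizeof (struct S), is a multiple of 16 and the array
itself starts at an aligned offset, so p->s[i].b keeps 16-byte alignment
for every i.  A minimal compile-time sketch of that reasoning, again
assuming the usual x86_64 ABI layout:

#include <stddef.h>
#include <immintrin.h>

struct S { __m128 a, b; };
struct T { int a; struct S s[8]; };

/* The element stride is 32 bytes, a multiple of the 16-byte alignment of
   __m128, and the array starts at an aligned offset, so p->s[i].b is
   16-byte aligned for every i.  */
_Static_assert (sizeof (struct S) % 16 == 0, "element stride keeps alignment");
_Static_assert (offsetof (struct T, s) % 16 == 0, "array base is aligned");
_Static_assert (offsetof (struct S, b) == 16, "b is 16 bytes into S");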



I'm testing a patch.
