https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80647

            Bug ID: 80647
           Summary: vectorized loop crashes from wrongly assuming 16 byte
                    alignment
           Product: gcc
           Version: 6.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yzhang1985 at gmail dot com
  Target Milestone: ---

Created attachment 41328
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41328&action=edit
compiling with -O3 will reproduce the crash

I'm getting a crash in a function that extracts a subregion of an image
in place. I compile with gcc -O3, which vectorizes the innermost loop:

while (twd--)
{
  *pintdest++ = *pintsrc++;
}


---------------assembly-------------------------
movdqa (%r10,%rax,1),%xmm0
add    $0x1,%ecx
movups %xmm0,(%rdx,%rax,1)
------------------------------------------------

It crashes on movdqa, which requires a 16-byte-aligned memory operand, because
the source address isn't aligned. It should be using an unaligned vector load
like movdqu or lddqu instead.
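
The attachment is not reproduced here, so the following is only a sketch under
an assumption: that pintsrc and pintdest are int pointers obtained by casting
byte pointers into the image buffer, so they may not even be 4-byte aligned.
If so, the compiler is entitled to assume int alignment for every dereference,
and a vectorizer that peels iterations to reach a 16-byte boundary never gets
there when the starting address is not a multiple of 4. One possible way to
copy words without making any alignment promise is to go through memcpy. The
function name and parameters below are hypothetical, not taken from the
attachment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the attachment): copy `count` 32-bit
   words between byte buffers of arbitrary alignment.  memcpy makes no
   alignment assumption, so if the compiler vectorizes this loop it has
   to use unaligned vector loads/stores (e.g. movdqu). */
static void copy_words_unaligned(unsigned char *dest,
                                 const unsigned char *src,
                                 size_t count)
{
    while (count--) {
        uint32_t word;
        memcpy(&word, src, sizeof word);   /* alignment-safe load  */
        memcpy(dest, &word, sizeof word);  /* alignment-safe store */
        src += sizeof word;
        dest += sizeof word;
    }
}
```

Each word is fully loaded into a temporary before it is stored, so a forward
copy with dest below src, as in an in-place extraction, still behaves as
expected in this sketch.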

I tested it with GCC 4.8, which did vectorize the loop correctly.


Starting with Nehalem, there is no penalty for using unaligned loads/stores if
the vector doesn't span two cache lines, so why not always generate unaligned
loads/stores?

The other advantage of aligned data used to be that the vector load/store could
be folded into another instruction as a memory operand, reducing machine code
size. But even that alignment restriction on memory operands was relaxed
starting with Sandy Bridge's VEX-encoded instructions.
