https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86530

            Bug ID: 86530
           Summary: Vectorization failure for a simple loop
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jiangning.liu at amperecomputing dot com
  Target Milestone: ---

GCC at -O3 fails to vectorize the following simple loop.

$ cat test_loop_2.c
int test_loop_2(char *p1, char *p2)
{
    int s = 0;
    for(int i=0; i<4; i++, p1+=4, p2+=4)
    {
        s += (p1[0]-p2[0]) + (p1[1]-p2[1]) + (p1[2]-p2[2]) + (p1[3]-p2[3]);
    }

    return s;
}

Each iteration works on a group of 4*1=4 bytes, which does not directly fill an
8-byte or 16-byte vector. However, the elements can still be extended to 32 bits,
so that the vector operations are done on a 4*4=16-byte vector.
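
To make the widening described above concrete, here is a hand-written sketch of
the shape such vectorized code could take. It is only an illustration, not what
GCC emits or is expected to emit. It uses the GCC/Clang vector_size extension,
and the names v4si and test_loop_2_widened are hypothetical, introduced here for
the example. The scalar function from the report is repeated for comparison.

```c
/* GCC/Clang vector extension: four 32-bit ints in one 16-byte vector.
   The typedef name v4si is an assumption made for this sketch. */
typedef int v4si __attribute__((vector_size(16)));

/* Scalar reference: the loop from the report, unchanged. */
int test_loop_2(char *p1, char *p2)
{
    int s = 0;
    for(int i=0; i<4; i++, p1+=4, p2+=4)
    {
        s += (p1[0]-p2[0]) + (p1[1]-p2[1]) + (p1[2]-p2[2]) + (p1[3]-p2[3]);
    }

    return s;
}

/* Hand-widened sketch (hypothetical name): each group of four chars is
   extended to four 32-bit lanes, subtracted lane-wise, and accumulated
   in a 16-byte vector. The lanes are summed once after the loop. */
int test_loop_2_widened(const char *p1, const char *p2)
{
    v4si acc = {0, 0, 0, 0};
    for (int i = 0; i < 4; i++, p1 += 4, p2 += 4)
    {
        /* char -> int conversion happens per lane in the initializers. */
        v4si a = {p1[0], p1[1], p1[2], p1[3]};
        v4si b = {p2[0], p2[1], p2[2], p2[3]};
        acc += a - b;
    }

    /* Horizontal reduction of the four partial sums. */
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Both functions should return the same value for any two 16-byte inputs. The
partial sums are small enough that reassociating them cannot overflow an int.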
