https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86530
Bug ID: 86530
Summary: Vectorization failure for a simple loop
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---

GCC -O3 can't vectorize the following simple case.

$ cat test_loop_2.c
int test_loop_2(char *p1, char *p2)
{
    int s = 0;
    for (int i = 0; i < 4; i++, p1 += 4, p2 += 4)
    {
        s += (p1[0] - p2[0]) + (p1[1] - p2[1]) + (p1[2] - p2[2]) + (p1[3] - p2[3]);
    }
    return s;
}

Each iteration covers 4 one-byte elements, i.e. 4*1=4 bytes, which doesn't
directly fill an 8-byte or 16-byte vector. We can still extend each element
to 32 bits and use vector operations on a 4*4=16-byte vector.
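
To make the suggested transformation concrete, here is a rough hand-written sketch of the widened form. It is not what GCC emits and not a proposed patch; the function name test_loop_2_widened and the acc array are made up for illustration. The idea is that each of the 4 char lanes is extended to 32 bits and accumulated separately, so the inner loop corresponds to one 16-byte vector subtract-and-add, with a single horizontal reduction at the end.

```c
#include <stdint.h>

/* Scalar reference: the loop from the report. */
int test_loop_2(char *p1, char *p2)
{
    int s = 0;
    for (int i = 0; i < 4; i++, p1 += 4, p2 += 4)
    {
        s += (p1[0] - p2[0]) + (p1[1] - p2[1]) + (p1[2] - p2[2]) + (p1[3] - p2[3]);
    }
    return s;
}

/* Hypothetical hand-widened variant (illustration only).
   acc[] stands in for one 4 x 32-bit vector register. */
int test_loop_2_widened(const char *p1, const char *p2)
{
    int32_t acc[4] = { 0, 0, 0, 0 };

    for (int i = 0; i < 4; i++, p1 += 4, p2 += 4)
    {
        /* One "vector" step: widen 4 chars to 32 bits, subtract, accumulate. */
        for (int lane = 0; lane < 4; lane++)
            acc[lane] += (int32_t)p1[lane] - (int32_t)p2[lane];
    }

    /* Horizontal reduction of the 4 lanes, done once after the loop. */
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Both functions should return the same value for any pair of 16-byte inputs, since the widened form only reorders the additions and the partial sums are far too small to overflow.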