https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398

            Bug ID: 88398
           Summary: vectorization failure for a small loop to do byte comparison
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jiangning.liu at amperecomputing dot com
  Target Milestone: ---

For the small test case below, GCC at -O3 fails to vectorize the
byte-comparison loop in func2.

void *malloc(long unsigned int);
typedef struct {
        unsigned char *buffer;
} data;

static unsigned char *func1(data *d)
{
        return d->buffer;
}

static int func2(int max, int pos, unsigned char *cur)
{
        unsigned char *p = cur + pos;
        int len = 0;
        while (++len != max)
                if (p[len] != cur[len])
                        break;
        return cur[len];
}

int main (int argc)
{
        data d;
        d.buffer = malloc(2*argc);
        return func2(argc, argc, func1(&d));
}

At the moment, the following code is generated for this loop:

  4004d4:       38616862        ldrb    w2, [x3,x1]
  4004d8:       6b00005f        cmp     w2, w0
  4004dc:       540000a1        b.ne    4004f0 <main+0x50>
  4004e0:       38616880        ldrb    w0, [x4,x1]
  4004e4:       6b01027f        cmp     w19, w1
  4004e8:       91000421        add     x1, x1, #0x1
  4004ec:       54ffff41        b.ne    4004d4 <main+0x34>

In fact, this loop can be vectorized by checking at run time whether the
comparison size is aligned to the SIMD register length. The check may
introduce run-time overhead, but the cost model could decide whether it is
worth doing.
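
To make the suggestion concrete, here is a hand-written C sketch of the transformation. It is only an illustration, not what GCC emits or would necessarily emit. The names match_len_scalar, match_len_blocked and BLOCK are hypothetical, the 16-byte block size is an assumed SIMD register width, and memcmp stands in for a single vector compare. The sketch also assumes max >= 1 and that all bytes at indices 1 to max-1 of both buffers are readable, which holds for the test case here but is something a compiler would have to prove or guard in general, because the blocked version may read bytes past the first mismatch. Both functions return the final index len rather than cur[len], so the results are easy to compare.

```c
#include <assert.h>
#include <string.h>

enum { BLOCK = 16 };  /* assumed SIMD register width in bytes */

/* Scalar loop from the report, returning len instead of cur[len]. */
static int match_len_scalar(int max, const unsigned char *p,
                            const unsigned char *cur)
{
    int len = 0;
    while (++len != max)
        if (p[len] != cur[len])
            break;
    return len;
}

/* Blocked variant: compare BLOCK bytes at a time while a whole block
   still fits below max, then finish byte by byte. */
static int match_len_blocked(int max, const unsigned char *p,
                             const unsigned char *cur)
{
    int len = 1;

    assert(max >= 1);  /* assumption: same precondition as the scalar loop */

    /* Run-time check: only take the block path when BLOCK bytes remain.
       memcmp here stands in for one vector load-and-compare. */
    while (max - len >= BLOCK && memcmp(p + len, cur + len, BLOCK) == 0)
        len += BLOCK;

    /* Scalar epilogue: locates the mismatch inside the failing block,
       or handles the remaining tail shorter than BLOCK. */
    while (len < max && p[len] == cur[len])
        ++len;

    return len;
}
```

With this shape, the cost of the extra bound check is paid once per 16 bytes, and short comparisons fall straight through to the scalar epilogue, which is the trade-off a cost model would have to weigh.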