https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
Bug ID: 88398
Summary: vectorization failure for a small loop to do byte comparison
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
For the small test case below, GCC at -O3 fails to vectorize the
byte-comparison loop in func2.
void *malloc(long unsigned int);

typedef struct {
  unsigned char *buffer;
} data;

static unsigned char *func1(data *d)
{
  return d->buffer;
}

static int func2(int max, int pos, unsigned char *cur)
{
  unsigned char *p = cur + pos;
  int len = 0;
  while (++len != max)
    if (p[len] != cur[len])
      break;
  return cur[len];
}

int main (int argc) {
  data d;
  d.buffer = malloc(2*argc);
  return func2(argc, argc, func1(&d));
}
At the moment, the following code is generated for this loop:
4004d4: 38616862 ldrb w2, [x3,x1]
4004d8: 6b00005f cmp w2, w0
4004dc: 540000a1 b.ne 4004f0 <main+0x50>
4004e0: 38616880 ldrb w0, [x4,x1]
4004e4: 6b01027f cmp w19, w1
4004e8: 91000421 add x1, x1, #0x1
4004ec: 54ffff41 b.ne 4004d4 <main+0x34>
In fact, this loop can be vectorized by checking at run time whether the
comparison size is a multiple of the SIMD register length. The check may
introduce run-time overhead, but the cost model could decide whether the
transformation is worthwhile.
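
To illustrate the idea, here is a rough hand-written sketch of what a blocked version of the func2 loop might look like at the source level. It is not what GCC generates and not a proposed patch. The name func2_blocked and the 16-byte BLOCK width are assumptions made for illustration, and memcmp on a fixed-size block stands in for a single SIMD compare. Instead of requiring the size to be a multiple of the block width, the sketch falls back to a scalar tail. It also assumes max >= 1 and that every byte up to index max of both pointers is readable, because the block compare may read past the first mismatch, which the original loop never does.

```c
#include <assert.h>
#include <string.h>

enum { BLOCK = 16 };  /* assumed SIMD register width in bytes */

/* Hypothetical blocked equivalent of func2; assumes max >= 1 and that
   cur[1..max] and (cur + pos)[1..max-1] are all readable. */
static int func2_blocked(int max, int pos, const unsigned char *cur)
{
  const unsigned char *p = cur + pos;
  int len = 1;  /* the original loop tests index 1 first */

  assert(max >= 1);

  /* Skip whole blocks while all BLOCK bytes compare equal. */
  while (max - len >= BLOCK && memcmp(p + len, cur + len, BLOCK) == 0)
    len += BLOCK;

  /* Scalar tail: locate the mismatch inside the failing block, or
     finish the remaining bytes that do not fill a whole block. */
  while (len != max && p[len] == cur[len])
    len++;

  return cur[len];
}
```

For max >= 1 this returns the same value as func2: len ends at the first mismatching index, or at max if there is none. Whether the speculative reads are safe is exactly what the compiler would have to prove or guard against at run time.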