https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398

            Bug ID: 88398
           Summary: vectorization failure for a small loop to do byte
                    comparison
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jiangning.liu at amperecomputing dot com
  Target Milestone: ---

For the small test case below, GCC at -O3 fails to vectorize the
byte-comparison loop in func2.

void *malloc(long unsigned int);
typedef struct {
        unsigned char *buffer;
} data;

static unsigned char *func1(data *d)
{
        return d->buffer;
}

static int func2(int max, int pos, unsigned char *cur)
{
        unsigned char *p = cur + pos;
        int len = 0;
        while (++len != max)
                if (p[len] != cur[len])
                        break;
        return cur[len];
}

int main (int argc) {
        data d;
        d.buffer = malloc(2*argc);
        return func2(argc, argc, func1(&d));
}

At the moment, the following AArch64 code is generated for this loop:

  4004d4:       38616862        ldrb    w2, [x3,x1]
  4004d8:       6b00005f        cmp     w2, w0
  4004dc:       540000a1        b.ne    4004f0 <main+0x50>
  4004e0:       38616880        ldrb    w0, [x4,x1]
  4004e4:       6b01027f        cmp     w19, w1
  4004e8:       91000421        add     x1, x1, #0x1
  4004ec:       54ffff41        b.ne    4004d4 <main+0x34>

In fact, this loop can be vectorized by checking at run time whether the
comparison size is aligned to (a multiple of) the SIMD register length. The
check may introduce run-time overhead, but the cost model could decide whether
doing it is worthwhile.
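
To make the idea concrete, here is a hand-written sketch of the transformation,
not what GCC emits or is known to plan. It assumes a 16-byte SIMD register
(the BLOCK constant) and max >= 1; func2_scalar, func2_blocked and BLOCK are
names introduced only for this illustration. Whole blocks are compared without
an early exit inside the block, and a scalar loop handles the tail and locates
the first mismatch. No byte beyond those the original loop may read is touched.

```c
#include <assert.h>

enum { BLOCK = 16 }; /* assumed SIMD register length in bytes */

/* The loop from the report, unchanged apart from const qualifiers. */
int func2_scalar(int max, int pos, const unsigned char *cur)
{
        const unsigned char *p = cur + pos;
        int len = 0;
        while (++len != max)
                if (p[len] != cur[len])
                        break;
        return cur[len];
}

/* Blocked variant: same result as func2_scalar for max >= 1. */
int func2_blocked(int max, int pos, const unsigned char *cur)
{
        const unsigned char *p = cur + pos;
        int len = 1;

        assert(max >= 1); /* assumption of this sketch */

        /* Run-time check: enter only while a full block lies below max. */
        while (max - len >= BLOCK) {
                unsigned char diff = 0;
                /* Fixed trip count, no break: accumulate any difference. */
                for (int i = 0; i < BLOCK; i++)
                        diff |= (unsigned char)(p[len + i] ^ cur[len + i]);
                if (diff)
                        break; /* mismatch in this block, located below */
                len += BLOCK;
        }

        /* Scalar tail: finish the remainder or find the first mismatch. */
        while (len != max && p[len] == cur[len])
                len++;
        return cur[len];
}
```

The inner loop has a constant trip count and no data-dependent exit, which is
the shape loop vectorizers generally handle; whether a given GCC version and
target actually vectorize it would still need to be checked, for example with
-fopt-info-vec.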
