https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88259

            Bug ID: 88259
           Summary: vectorization failure for a typical loop for getting
                    max value and index
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jiangning.liu at amperecomputing dot com
  Target Milestone: ---

GCC at -O3 fails to vectorize the following typical loop, which finds the max
value in an array together with its index.

void test_vec(int *data, int n) {
        int best_i, best = 0;

        for (int i = 0; i < n; i++) {
                if (data[i] > best) {
                        best = data[i];
                        best_i = i;
                }
        }

        data[best_i] = data[0];
        data[0] = best;
}

The code generated for the kernel loop (AArch64) is scalar:

.L4:
        ldr     w4, [x0, x2, lsl 2]
        cmp     w3, w4
        csel    w6, w4, w3, lt
        csel    w5, w2, w5, lt
        add     x2, x2, 1
        mov     w3, w6
        cmp     w1, w2
        bgt     .L4

If n is a compile-time constant such as 1024, gcc -O3 still fails to vectorize
the loop.
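
For reference, here is a rough sketch of the per-lane bookkeeping that a
vectorized form of this loop would need. It is written in plain C, not taken
from GCC, and makes no claim about how the vectorizer would implement it. The
name test_vec_lanes, the LANES constant, the n <= 0 guard, and the choice of
index 0 when no element is greater than 0 are all assumptions for illustration
(the original leaves best_i uninitialized in that case).

```c
#define LANES 4 /* assumption: four 32-bit lanes, as in a 128-bit vector */

/* Hypothetical lane-wise variant of test_vec: each lane tracks its own
   running max and the index where that max was first seen. */
void test_vec_lanes(int *data, int n) {
        int best[LANES], best_i[LANES];
        int b = 0, bi = 0; /* assumption: index 0 if nothing exceeds 0 */
        int i = 0;

        if (n <= 0)
                return; /* assumption: empty input is a no-op */

        for (int l = 0; l < LANES; l++) {
                best[l] = 0;
                best_i[l] = 0;
        }

        /* Main loop: lane l sees elements l, l + LANES, l + 2 * LANES, ... */
        for (; i + LANES <= n; i += LANES) {
                for (int l = 0; l < LANES; l++) {
                        if (data[i + l] > best[l]) {
                                best[l] = data[i + l];
                                best_i[l] = i + l;
                        }
                }
        }

        /* Cross-lane reduction: the larger value wins; on equal values the
           smaller index wins, matching the first-occurrence behaviour of
           the scalar loop. */
        for (int l = 0; l < LANES; l++) {
                if (best[l] > b || (best[l] == b && best_i[l] < bi)) {
                        b = best[l];
                        bi = best_i[l];
                }
        }

        /* Scalar tail for the remaining n % LANES elements. */
        for (; i < n; i++) {
                if (data[i] > b) {
                        b = data[i];
                        bi = i;
                }
        }

        data[bi] = data[0];
        data[0] = b;
}
```

The extra work compared with the max-only case is the second per-lane array
and the tie-break on the index during the final reduction.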

If the loop only tracks the max value, leaving a single statement inside the
if statement in the loop body,

void test_vec(int *data, int n) {
        int best = 0;
        for (int i = 0; i < n; i++) {
                if (data[i] > best) {
                        best = data[i];
                }
        }

        data[0] = best;
}

"gcc -O3" does vectorize the loop, and the kernel loop becomes:

.L4:
        ldr     q1, [x2], 16
        smax    v0.4s, v0.4s, v1.4s
        cmp     x2, x3
        bne     .L4
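
Until the combined loop is handled, one possible source-level workaround is to
split it into two passes: a max-only reduction of the same shape as the loop
above, followed by a scalar search for the first index holding that value. This
is only a sketch; the names max_value, first_index_of and test_vec_two_pass are
hypothetical, and the n <= 0 guard and the use of index 0 when no element is
greater than 0 are assumptions (the original leaves best_i uninitialized
there). Whether the second pass is vectorized is not claimed here.

```c
/* Pass 1 (hypothetical helper): max-only reduction, same shape as the
   loop that the report shows being vectorized with smax. */
static int max_value(const int *data, int n) {
        int best = 0;
        for (int i = 0; i < n; i++) {
                if (data[i] > best) {
                        best = data[i];
                }
        }
        return best;
}

/* Pass 2 (hypothetical helper): first index whose element equals value.
   Returns 0 if no element matches (assumption). */
static int first_index_of(const int *data, int n, int value) {
        for (int i = 0; i < n; i++) {
                if (data[i] == value) {
                        return i;
                }
        }
        return 0;
}

void test_vec_two_pass(int *data, int n) {
        int best, best_i = 0;

        if (n <= 0)
                return; /* assumption: empty input is a no-op */

        best = max_value(data, n);
        if (best > 0) {
                /* Only search when some element actually exceeded 0. */
                best_i = first_index_of(data, n, best);
        }

        data[best_i] = data[0];
        data[0] = best;
}
```

The trade-off is a second traversal of the array in exchange for a first pass
that contains only the max reduction.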
