https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88259
Bug ID: 88259
Summary: vectorization failure for a typical loop for getting max value and index
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---

GCC -O3 can't vectorize the following typical loop for getting max value and
index from an array.

    void test_vec(int *data, int n) {
            int best_i, best = 0;

            for (int i = 0; i < n; i++) {
                    if (data[i] > best) {
                            best = data[i];
                            best_i = i;
                    }
            }

            data[best_i] = data[0];
            data[0] = best;
    }

The code generated in the kernel loop is as below,

    .L4:
            ldr     w4, [x0, x2, lsl 2]
            cmp     w3, w4
            csel    w6, w4, w3, lt
            csel    w5, w2, w5, lt
            add     x2, x2, 1
            mov     w3, w6
            cmp     w1, w2
            bgt     .L4

If n is a constant like 1024, gcc -O3 still fails to vectorize it.

If we only get the max value and keep only one statement in the if statement
inside the loop,

    void test_vec(int *data, int n) {
            int best = 0;

            for (int i = 0; i < n; i++) {
                    if (data[i] > best) {
                            best = data[i];
                    }
            }

            data[0] = best;
    }

"gcc -O3" can do vectorization and the kernel loop is like below,

    .L4:
            ldr     q1, [x2], 16
            smax    v0.4s, v0.4s, v1.4s
            cmp     x2, x3
            bne     .L4
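The report does not say how a vectorizer should handle the combined max-and-index reduction. As a rough illustration only, the sketch below shows one possible strategy in plain C: keep a running maximum and index per lane, then reduce across lanes after the loop. The names `max_index_scalar`, `max_index_lanes` and `LANES` are hypothetical, the lane count of 4 is an assumption matching the `v0.4s` registers in the max-only loop, and `best_i` is initialised to 0 here (unlike the test case above) so that the result is defined when no element exceeds 0.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed vector width: four 32-bit lanes, as in a 128-bit NEON register. */
#define LANES 4

/* Scalar reference: the loop from the report, with best_i initialised to 0
 * (an assumption; the original test case leaves it uninitialised). */
void max_index_scalar(const int *data, int n, int *best_out, int *best_i_out)
{
    int best = 0, best_i = 0;

    for (int i = 0; i < n; i++) {
        if (data[i] > best) {
            best = data[i];
            best_i = i;
        }
    }
    *best_out = best;
    *best_i_out = best_i;
}

/* Lane-wise sketch: each lane tracks its own maximum and the index where it
 * first occurred. The inner loop over l stands in for one vector
 * compare-and-select step. */
void max_index_lanes(const int *data, int n, int *best_out, int *best_i_out)
{
    int best[LANES] = {0}, best_i[LANES] = {0};
    int i = 0;

    assert(data != NULL || n <= 0);

    for (; i + LANES <= n; i += LANES) {
        for (int l = 0; l < LANES; l++) {
            if (data[i + l] > best[l]) {
                best[l] = data[i + l];
                best_i[l] = i + l;
            }
        }
    }

    /* Cross-lane reduction: take the largest value; on ties take the
     * smallest index, which preserves the scalar loop's first-occurrence
     * behaviour. */
    int b = best[0], bi = best_i[0];
    for (int l = 1; l < LANES; l++) {
        if (best[l] > b || (best[l] == b && best_i[l] < bi)) {
            b = best[l];
            bi = best_i[l];
        }
    }

    /* Scalar tail for the remaining n % LANES elements. */
    for (; i < n; i++) {
        if (data[i] > b) {
            b = data[i];
            bi = i;
        }
    }

    *best_out = b;
    *best_i_out = bi;
}
```

The tie-break in the cross-lane reduction is the part that has no counterpart in the max-only loop: a plain `smax` reduction is order-independent, while the index must still match the first occurrence that the scalar loop would report.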