https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105181
            Bug ID: 105181
           Summary: [optimization] gcc generate worse code than clang base on neon
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zhongyunde at huawei dot com
  Target Milestone: ---

Test case:

    void loop(int N, double *a, double *b) {
      // #pragma clang loop vectorize_width(4, scalable)
      for (int i = 0; i < N; i++) {
        a[i] = b[i] + 1.0;
      }
    }

GCC's kernel loop body (6 instructions per iteration):

    .L4:
            ldr     q0, [x2, x3]
            fadd    v0.2d, v0.2d, v1.2d
            str     q0, [x1, x3]
            add     x3, x3, 16
            cmp     x3, x0
            bne     .L4

LLVM's kernel loop body (5 instructions per iteration):

    .LBB0_9:                                // =>This Inner Loop Header: Depth=1
            ldr     q1, [x12], #16
            subs    x10, x10, #2
            fadd    v1.2d, v1.2d, v0.2d
            str     q1, [x11], #16
            b.ne    .LBB0_9

See details at https://godbolt.org/z/54nssME4f
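For illustration only (not part of the original report): a minimal C sketch of the two loop shapes seen in the assembly. The function names `loop_indexed`, `loop_postinc`, and `self_check` are hypothetical. The sketch is scalar, whereas both compilers process two doubles per iteration, and it only mirrors the addressing and exit-test structure. Whether either compiler keeps these shapes when compiling this source is not verified here.

```c
#include <assert.h>
#include <stddef.h>

/* Shape of GCC's loop: one shared byte offset indexes both arrays
   (register-offset ldr/str), followed by a separate add and cmp
   against a precomputed end offset. */
void loop_indexed(int n, double *a, const double *b) {
    size_t end = (n > 0 ? (size_t)n : 0) * sizeof(double);
    for (size_t off = 0; off != end; off += sizeof(double)) {
        const double *src = (const double *)((const char *)b + off);
        double *dst = (double *)((char *)a + off);
        *dst = *src + 1.0;
    }
}

/* Shape of LLVM's loop: each pointer is advanced as part of its own
   access (post-indexed ldr/str), and a down-counter doubles as the
   exit test (subs sets the flags, so no separate cmp is needed). */
void loop_postinc(int n, double *a, const double *b) {
    for (long remaining = n; remaining > 0; remaining--) {
        *a++ = *b++ + 1.0;
    }
}

/* Checks that both shapes compute a[i] = b[i] + 1.0 on a small input. */
void self_check(void) {
    double src[5] = {0.0, 1.5, -2.0, 3.25, 10.0};
    double out1[5] = {0};
    double out2[5] = {0};
    loop_indexed(5, out1, src);
    loop_postinc(5, out2, src);
    for (int i = 0; i < 5; i++) {
        assert(out1[i] == src[i] + 1.0);
        assert(out1[i] == out2[i]);
    }
}
```

The one-instruction difference in the report comes from the exit test: the indexed form needs both an add and a cmp, while the counted form folds the decrement and the comparison into a single subs.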