https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105181

            Bug ID: 105181
           Summary: [optimization] gcc generates worse code than clang
                    for NEON
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zhongyunde at huawei dot com
  Target Milestone: ---

test case:
void loop(int N, double *a, double *b) {
  // #pragma clang loop vectorize_width(4, scalable)
  for (int i = 0; i < N; i++) {
    a[i] = b[i] + 1.0;
  }
}

gcc's kernel loop body (6 instructions, with a separate add and cmp):
.L4:
        ldr     q0, [x2, x3]
        fadd    v0.2d, v0.2d, v1.2d
        str     q0, [x1, x3]
        add     x3, x3, 16
        cmp     x3, x0
        bne     .L4

llvm's kernel loop body (5 instructions, using post-indexed ldr/str and subs):
.LBB0_9:                                // =>This Inner Loop Header: Depth=1
        ldr     q1, [x12], #16
        subs    x10, x10, #2
        fadd    v1.2d, v1.2d, v0.2d
        str     q1, [x11], #16
        b.ne    .LBB0_9
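
For illustration only, the shape of llvm's loop can be written by hand in C: both pointers are advanced as they are used and a single counter runs down to zero, so one instruction can serve as both the induction update and the loop test. The name loop_postinc is hypothetical and not part of the test case, and whether a given compiler actually emits post-indexed ldr/str for this source is an assumption, not something shown by the listings above.

```c
/* Hypothetical hand-written variant of loop() from the test case.
   It computes the same result, but expresses the iteration as
   pointer post-increments plus a down-counter, mirroring the
   "ldr ..., #16 / subs / str ..., #16 / b.ne" shape of llvm's kernel. */
void loop_postinc(int N, double *a, double *b) {
  for (int n = N; n > 0; n--) {
    *a++ = *b++ + 1.0;  /* load, add 1.0, store, then advance both pointers */
  }
}
```

Comparing the code generated for loop and loop_postinc at the same optimization level would show whether the difference comes from the addressing-mode choice or from the way the induction variable is expressed in the source.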

See details at https://godbolt.org/z/54nssME4f
