https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110026

            Bug ID: 110026
           Summary: [Bug] 5% performance drop on important benchmark after
                    r260951.
           Product: gcc
           Version: 10.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: d_vampile at 163 dot com
  Target Milestone: ---

Created attachment 55184
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55184&action=edit
Open-source stream benchmark

After r260951 is applied, the performance of the Copy kernel in the stream
benchmark decreases by 3% on AArch64.

Alternatively, you can obtain it from
https://github.com/jeffhammond/stream/archive/master.zip.

Compiling & Running:
gcc -fopenmp -O -DSTREAM_ARRAY_SIZE=100000000 stream.c  -o stream
./stream

Before the modification (Copy kernel loop):
ldr x2, [x3, x0, lsl #3]
str x2, [x4, x0, lsl #3]
add x0, x0, #0x1
cmp x1, x0
b.ne 400a00 <main._omp_fn.4+0x54>
ldr x19, [sp, #16]
ldp x29, x30, [sp], #32
ret

After the modification:
ldr d0, [x2, x0, lsl #3]
str d0, [x3, x0, lsl #3]
add x0, x0, #0x1
cmp x1, x0
b.ne 400a00 <main._omp_fn.4+0x54>
ldr x19, [sp, #16]
ldp x29, x30, [sp], #32
ret

It can be seen that a general-purpose register (x2) holds the copied value
before the modification, and a floating-point/SIMD register (d0) holds it
after the modification.
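
For reference, the loop in question can be reduced to roughly the following.
This is only a sketch: the function name stream_copy and the typedef stream_t
are made up for illustration, the element type is assumed to be double
(STREAM's default STREAM_TYPE), and the real benchmark copies between static
global arrays rather than pointer arguments. It has not been verified that
this reduction produces the same ldr/str pair as the full benchmark.

```c
#include <stddef.h>

/* Assumption: STREAM's default element type (STREAM_TYPE) is double. */
typedef double stream_t;

/*
 * Hypothetical reduction of the STREAM Copy kernel: c[j] = a[j].
 * The value is only moved, never used in arithmetic, so the compiler is
 * free to carry it in either an integer (xN) or an FP/SIMD (dN) register.
 */
void stream_copy(stream_t *c, const stream_t *a, long n)
{
    long j;
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (j = 0; j < n; j++)
        c[j] = a[j];
}
```

Compiling this with the same options as above plus -S (gcc -fopenmp -O -S)
and comparing the loop body before and after the revision should show which
register class is chosen for the load/store pair.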
