https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110026
Bug ID: 110026
Summary: [Bug] 5% performance drop on important benchmark after r260951.
Product: gcc
Version: 10.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: d_vampile at 163 dot com
Target Milestone: ---

Created attachment 55184
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55184&action=edit
Open-source stream benchmark

After r260951 was applied, the Copy kernel of the stream benchmark runs
about 3% slower on AArch64.

The benchmark is attached. Alternatively, you can obtain it from
https://github.com/jeffhammond/stream/archive/master.zip.

Compiling & Running:
  gcc -fopenmp -O -DSTREAM_ARRAY_SIZE=100000000 stream.c -o stream
  ./stream

Before the modification (Copy kernel):
  ldr x2, [x3, x0, lsl #3]
  str x2, [x4, x0, lsl #3]
  add x0, x0, #0x1
  cmp x1, x0
  b.ne 400a00 <main._omp_fn.4+0x54>
  ldr x19, [sp, #16]
  ldp x29, x30, [sp], #32
  ret

After the modification:
  ldr d0, [x2, x0, lsl #3]
  str d0, [x3, x0, lsl #3]
  add x0, x0, #0x1
  cmp x1, x0
  b.ne 400a00 <main._omp_fn.4+0x54>
  ldr x19, [sp, #16]
  ldp x29, x30, [sp], #32
  ret

It can be seen that the copied element goes through a general-purpose
register (x2) before the modification, and through a floating-point/SIMD
register (d0) after the modification.
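
For anyone who wants to look at the codegen without building the whole benchmark, the Copy kernel reduces to an element-wise copy of double arrays. The following is only a minimal sketch: the function name `copy_kernel` and its signature are hypothetical, and stream.c itself uses static global arrays rather than pointer parameters, so the generated code may differ in detail.

```c
#include <stddef.h>

/* Hypothetical stand-in for the STREAM Copy kernel (c[j] = a[j]).
   The name and signature are illustrative, not taken from stream.c. */
void copy_kernel(double *restrict c, const double *restrict a, size_t n)
{
    /* With -fopenmp, GCC outlines the parallel region into a separate
       function named like <fn>._omp_fn.N, which is where the loop in
       the disassembly lives. Without -fopenmp the pragma is ignored. */
    #pragma omp parallel for
    for (size_t j = 0; j < n; j++)
        c[j] = a[j];   /* one 64-bit load and one 64-bit store per element */
}
```

Compiling this with `gcc -fopenmp -O -S` on AArch64 and checking whether the loop body uses `ldr x…`/`str x…` or `ldr d…`/`str d…` should show which register class a given compiler revision picks for the copied value.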