https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123481
Bug ID: 123481
Summary: newlib memcpy
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: xuli1 at eswincomputing dot com
Target Milestone: ---
Created attachment 63280
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=63280&action=edit
A modified memcpy function from newlib can be used to verify the above issue.
Starting with GCC 14.0, the code generated for the `memcpy` function in
`newlib` does not access addresses sequentially when loading and storing data.
In each loop iteration, the word at the highest offset is loaded into a
register first, and it is also stored first. This may be the result of an
optimization, but it has a significant negative impact on our CPU's memory
microarchitecture: the first word accessed and the words that follow it can
reside in different cache lines, which results in write stream invalidation.
How can I modify GCC to avoid non-sequential access to contiguous addresses?
The test files are in the attachment below.
Compilation command:
riscv32-unknown-elf-gcc -march=rv32gc -mabi=ilp32d -O2 memcpy.c -S -o memcpy.s
A portion of the assembly code for `memcpy` is as follows (note the first
`lw`/`sw` instructions):
.L7:
lw a2,32(s1)
lw t5,0(s1)
lw t4,4(s1)
lw t3,8(s1)
lw t1,12(s1)
lw a7,16(s1)
lw a6,20(s1)
lw a0,24(s1)
lw a1,28(s1)
addi a5,a5,36
sw a2,-4(a5)
sw t5,-36(a5)
sub a2,a4,a5
sw t4,-32(a5)
sw t3,-28(a5)
sw t1,-24(a5)
sw a7,-20(a5)
sw a6,-16(a5)
sw a0,-12(a5)
sw a1,-8(a5)
addi s1,s1,36
bgt a2,a3,.L7
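
For readers without the attachment, here is a minimal sketch of the kind of
source loop that the assembly above appears to correspond to. It is not the
attached memcpy.c: the function name `copy_words9` is hypothetical, and the
nine-word loop body is only inferred from the 36-byte stride and the nine
`lw`/`sw` pairs in the listing. It assumes word-aligned, non-overlapping
buffers, and it may or may not reproduce the same instruction ordering.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the attached memcpy.c (an assumption, not the
   real file). Copies nine 32-bit words (36 bytes) per iteration, written
   in ascending address order in the source. Buffers are assumed to be
   word-aligned and non-overlapping. */
void *copy_words9(void *restrict dst0, const void *restrict src0, size_t len)
{
    uint32_t *dst = dst0;
    const uint32_t *src = src0;

    /* Main loop: one 36-byte block per iteration. */
    while (len >= 9 * sizeof(uint32_t)) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst[4] = src[4];
        dst[5] = src[5];
        dst[6] = src[6];
        dst[7] = src[7];
        dst[8] = src[8];
        dst += 9;
        src += 9;
        len -= 9 * sizeof(uint32_t);
    }

    /* Tail: copy the remaining bytes one at a time. */
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    while (len > 0) {
        *d++ = *s++;
        len--;
    }
    return dst0;
}
```

Because both pointers are `restrict`-qualified, the compiler is free to
schedule the nine loads and nine stores in any order, which is consistent
with all loads preceding all stores in the listing.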