https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115690
Bug ID: 115690
Summary: Strange codegen for small fixed-size `memcpy` when targeting `-march=i486`
Product: gcc
Version: 14.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: arcata at gmail dot com
Target Milestone: ---
Given the following C code:
```
void *memcpy(void *a, const void *b, unsigned long c);
void foo(unsigned *x, unsigned *y) {
    memcpy(x, y, 16);
}
```
With GCC 14.1, `gcc -m32 -march=i486 -O2` produces the following assembly (shown in Intel syntax):
```
foo:
        push edi
        push esi
        mov ecx, DWORD PTR [esp+12]
        mov esi, DWORD PTR [esp+16]
        mov eax, DWORD PTR [esi]
        mov DWORD PTR [ecx], eax
        mov eax, DWORD PTR [esi+12]
        mov DWORD PTR [ecx+12], eax
        lea edi, [ecx+4]
        and edi, -4
        sub ecx, edi
        sub esi, ecx
        add ecx, 16
        shr ecx, 2
        rep movsd
        pop esi
        pop edi
        ret
```
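While not wrong, this seems suboptimal: it copies the first and last dwords
with plain `mov`s, rounds the destination up to the next 4-byte boundary, and
then uses `rep movsd` for the remainder. Either using `rep movsd` for the
entire `memcpy` or breaking it down into four 32-bit loads and stores would be
simpler. `-march=i386` does the former: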
```
foo:
        push edi
        push esi
        mov esi, DWORD PTR [esp+16]
        mov ecx, 4
        mov edi, DWORD PTR [esp+12]
        rep movsd
        pop esi
        pop edi
        ret
```
and `-march=i586` does the latter:
```
foo:
        mov edx, DWORD PTR [esp+8]
        mov eax, DWORD PTR [esp+4]
        mov ecx, DWORD PTR [edx]
        mov DWORD PTR [eax], ecx
        mov ecx, DWORD PTR [edx+4]
        mov DWORD PTR [eax+4], ecx
        mov ecx, DWORD PTR [edx+8]
        mov DWORD PTR [eax+8], ecx
        mov edx, DWORD PTR [edx+12]
        mov DWORD PTR [eax+12], edx
        ret
```
either of which seems like it would suit the i486 microarchitecture better than
the hybrid approach GCC takes for `-march=i486`.
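
For reference, here is my reading of the i486 sequence written out as C. This is only a sketch to make the instruction-by-instruction behaviour easier to follow; `copy16_hybrid_model` is a name I made up for illustration, and none of this is taken from GCC's sources. The 4-byte `memcpy` calls stand in for single `mov` / `movsd` dword moves.
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the -march=i486 output for a 16-byte copy.
   Illustration only; not GCC's implementation. */
static void copy16_hybrid_model(unsigned char *dst, const unsigned char *src)
{
    /* mov eax, [esi] / mov [ecx], eax: first dword, alignment unknown. */
    memcpy(dst, src, 4);

    /* mov eax, [esi+12] / mov [ecx+12], eax: last dword. */
    memcpy(dst + 12, src + 12, 4);

    /* lea edi, [ecx+4] / and edi, -4: next 4-byte boundary above dst. */
    uintptr_t aligned = ((uintptr_t)dst + 4) & ~(uintptr_t)3;

    /* sub ecx, edi / sub esi, ecx: advance src by the same number of
       bytes (1 to 4) that the destination was advanced. */
    size_t skip = (size_t)(aligned - (uintptr_t)dst);

    /* add ecx, 16 / shr ecx, 2: dword count for rep movsd. */
    size_t dwords = (16 - skip) >> 2;

    /* skip is in 1..4, so (16 - skip) is in 12..15 and the count is
       always 3 for this fixed size. */
    assert(dwords == 3);

    /* rep movsd: copy the remaining dwords to the aligned destination. */
    for (size_t i = 0; i < dwords; i++)
        memcpy(dst + skip + 4 * i, src + skip + 4 * i, 4);
}
```
If this reading is right, the `rep movsd` count is always 3 for a 16-byte copy, so the sequence always moves five dwords (two with `mov`, three with `rep movsd`) where four would do, in addition to the alignment arithmetic.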