https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122272
Bug ID: 122272
Summary: Missed optimizations in classic "gcd" implementation
Product: gcc
Version: 15.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: matt at godbolt dot org
Target Milestone: ---
In the straightforward code:
```
unsigned gcd(unsigned a, unsigned b) {
if (b == 0) return a;
return gcd(b, a % b);
}
```
GCC for x86 at -O2 (or -O3) generates the following assembly, which has a
number of redundant register moves (marked with ?):
```asm
gcd(unsigned int, unsigned int):
mov eax, edi ; eax (return value) = a
test esi, esi ; is b == 0?
jne .L2 ; if b != 0; goto starting point L2
jmp .L10 ; if b = 0; goto special case
.L11: ; loop top
mov esi, edx ; set b to be a % b from the previous iteration
.L2:
xor edx, edx ; set up for divide
div esi ; eax = edx:eax (0:a) / esi (b)
; edx = edx:eax (0:a) % esi (b)
mov eax, esi ; eax = b
test edx, edx ; was a % b zero?
jne .L11 ; if not, loop around
mov eax, esi ; return esi (? eax is already esi)
ret ; returns a
.L10:
; This seems daft; when we get here, eax already has
; the right value, so the compiler has overcomplicated things.
mov esi, edi ; (?) esi = a
mov eax, esi ; (?) eax (return value) = esi
ret ; returns a
```
Nothing obvious (to me, anyway) in the optimisation report.
Clang 21 produces much tighter code (unannotated here):
```
gcd(unsigned int, unsigned int):
mov eax, edi
test esi, esi
je .LBB0_4
mov edx, esi
.LBB0_2:
mov ecx, edx
xor edx, edx
div ecx
mov eax, ecx
test edx, edx
jne .LBB0_2
mov eax, ecx
.LBB0_4:
ret
```
CE link: https://godbolt.org/z/34jfKrW57