https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114448
Bug ID: 114448
Summary: Roundup not optimized
Product: gcc
Version: 13.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: pali at kernel dot org
Target Milestone: ---
https://godbolt.org/z/4fPKGzs1M
Straightforward code which rounds an unsigned number up to the next multiple
of 4 is:
(num % 4 == 0) ? num : num + (4 - num % 4);
gcc -O2 generates:
mov edx, edi
mov eax, edi
and edx, -4
add edx, 4
test dil, 3
cmovne eax, edx
ret
This is not optimal: the test and conditional move can be avoided by using a
double modulo:
num + (4 - num % 4) % 4;
for which gcc -O2 generates:
mov eax, edi
neg eax
and eax, 3
add eax, edi
ret
The optimal implementation for rounding up to a multiple of 4 uses a bit hack:
(num + 3) & ~3;
for which gcc -O2 generates:
lea eax, [rdi+3]
and eax, -4
ret
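For reference, a minimal sketch of the three variants as standalone C
functions, assuming num is an unsigned int. The function names and the
self-check are illustrative only and are not taken from the Godbolt link.
Because unsigned arithmetic wraps, all three forms should return the same
value for every input, including values near UINT_MAX, so the first two could
in principle be compiled to the same code as the third.

```c
#include <assert.h>
#include <limits.h>

/* Form 1: conditional expression (compiles to test + cmovne above). */
static unsigned roundup4_ternary(unsigned num)
{
    return (num % 4 == 0) ? num : num + (4 - num % 4);
}

/* Form 2: double modulo (compiles to neg/and/add above). */
static unsigned roundup4_double_mod(unsigned num)
{
    return num + (4 - num % 4) % 4;
}

/* Form 3: bit hack (compiles to lea/and above).
 * ~3 is the int -4, which converts to the unsigned mask 0x...FC. */
static unsigned roundup4_bithack(unsigned num)
{
    return (num + 3) & ~3;
}

/* Hypothetical self-check: the three forms agree on small inputs
 * and on inputs where num + 3 wraps around. */
static void check_roundup4_forms(void)
{
    for (unsigned num = 0; num < 1024; num++) {
        assert(roundup4_ternary(num) == roundup4_bithack(num));
        assert(roundup4_double_mod(num) == roundup4_bithack(num));
    }
    for (unsigned i = 0; i < 8; i++) {
        unsigned num = UINT_MAX - i;
        assert(roundup4_ternary(num) == roundup4_bithack(num));
        assert(roundup4_double_mod(num) == roundup4_bithack(num));
    }
}
```

Calling check_roundup4_forms() from a test program only samples the input
range; it is a sanity check, not a proof of equivalence.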