http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
Summary: Horrible code generation for trivial decrement with test
Product: gcc
Version: 4.5.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
AssignedTo: [email protected]
ReportedBy: [email protected]
This trivial code:
extern void fncall(void *);
int main(int argc, char **argv)
{
    if (!--*argv)
        fncall(argv);
    return 0;
}
compiles into this ridiculous x86-64 assembly language:
movq (%rsi), %rax
subq $1, %rax
testq %rax, %rax
movq %rax, (%rsi)
je .L4
for the "decrement and test result" at -O2.
I'd have expected that any reasonable compiler would generate something like
decq (%rsi)
je .L4
instead, which would be smaller and faster (even a "subq $1" would be fine, but
the decq is one byte shorter).
The problem is more noticeable when the memory location is a structure offset:
the "load+decrement+store" model then produces noticeably bigger code, because
the memory address is needlessly encoded twice, for absolutely no advantage.
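A minimal sketch of the structure-offset case, for illustration only: the
names "struct object" and "drop_ref" are made up here and are not taken from
any real code. Assuming the usual x86-64 layout with 8-byte pointers, the
counter lives at offset 8, so one would hope for a single "decq 8(%rdi)"
followed by a test of the flags, rather than a load and a store that each
spell out the 8(%rdi) address.

```c
#include <stddef.h>

/* Hypothetical structure: the counter sits at a nonzero offset
   (offset 8 if pointers are 8 bytes wide). */
struct object {
    void *payload;
    long refcount;
};

/* Hypothetical helper: decrement the counter in memory and report
   whether it reached zero. This is the same "decrement and test
   result" pattern as in main() above, applied to a struct member. */
int drop_ref(struct object *obj)
{
    return --obj->refcount == 0;
}
```

Compiling this with "gcc -O2 -S" and reading the output for drop_ref should show whether the memory operand appears once or twice.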
Is there some way that I haven't found to make gcc use the read-modify-write
(rmw) instructions?