http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48877
Summary: Inline asm for rdtsc generates silly code
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: [email protected]
ReportedBy: [email protected]
gcc -O2 -S on this input:
typedef unsigned long long u64;
u64 test()
{
    u64 low, high;
    asm volatile ("rdtsc" : "=a" (low), "=d" (high));
    return low | (high << 32);
}
generates this:
test:
.LFB0:
        .cfi_startproc
#APP
# 6 "rax_rdx.c" 1
        rdtsc
# 0 "" 2
#NO_APP
        movq    %rax, %rcx
        movq    %rdx, %rax
        salq    $32, %rax
        orq     %rcx, %rax
        ret
        .cfi_endproc
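which is silly: both movq instructions are unnecessary. The asm already
leaves low in %rax and high in %rdx, so the shift can be applied to %rdx
and the result ORed into %rax in place.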
clang -O3 -fomit-frame-pointer does much better:
test:
.Leh_func_begin0:
#APP
        rdtsc
#NO_APP
        shlq    $32, %rdx
        orq     %rdx, %rax
        ret
Getting rid of the << 32 makes gcc generate the obvious code.
FWIW, this code:
unsigned long long rdtsc (void)
{
    unsigned int tickl, tickh;
    __asm__ __volatile__("rdtsc":"=a"(tickl),"=d"(tickh));
    return ((unsigned long long)tickh << 32)|tickl;
}
is copied verbatim from the "Machine Constraints" section of the manual
(http://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints)
and generates the same silly code.
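For reference, both test cases compute the same value: the 64-bit concatenation of the two halves that rdtsc leaves in EDX:EAX. Below is a minimal sketch of just that arithmetic, with the asm replaced by plain parameters so it can be checked on any target. The name combine_halves is hypothetical and does not appear in the report or the manual.

```c
#include <stdint.h>

/* Hypothetical helper (not from the report): joins two 32-bit halves
   into one 64-bit value, as both test cases do with EDX:EAX. */
static inline uint64_t combine_halves(uint32_t low, uint32_t high)
{
    /* Widen before shifting: shifting a 32-bit operand by 32 bits
       is undefined behavior in C. */
    return ((uint64_t)high << 32) | low;
}
```

The manual's variant needs the cast because tickh is a 32-bit unsigned int; the first test case declares both outputs as u64, so it can shift directly.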