http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48877
Summary: Inline asm for rdtsc generates silly code
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: l...@mit.edu

gcc -O2 -S on this input:

typedef unsigned long long u64;

u64 test()
{
    u64 low, high;
    asm volatile ("rdtsc" : "=a" (low), "=d" (high));
    return low | (high << 32);
}

generates this:

test:
.LFB0:
    .cfi_startproc
#APP
# 6 "rax_rdx.c" 1
    rdtsc
# 0 "" 2
#NO_APP
    movq    %rax, %rcx
    movq    %rdx, %rax
    salq    $32, %rax
    orq     %rcx, %rax
    ret
    .cfi_endproc

which is silly -- both movq instructions are unnecessary.

clang -O3 -fomit-frame-pointer does much better:

test:
.Leh_func_begin0:
    #APP
    rdtsc
    #NO_APP
    shlq    $32, %rdx
    orq     %rdx, %rax
    ret

Getting rid of the << 32 makes gcc generate the obvious code.

FWIW, this code:

unsigned long long rdtsc (void)
{
    unsigned int tickl, tickh;
    __asm__ __volatile__("rdtsc":"=a"(tickl),"=d"(tickh));
    return ((unsigned long long)tickh << 32)|tickl;
}

is copied verbatim from the manual in the "Machine Constraints" section
(http://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints)
and generates the same silly code.
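For reference, both test cases compute the same value: the low 32 bits of the counter (left in EAX by rdtsc) OR'd with the high 32 bits (left in EDX) shifted up by 32. A minimal sketch of just that arithmetic follows. The names combine_tsc and check_combine_tsc are made up for illustration and are not part of the report. The code does not execute rdtsc, so it builds on any target, and it says nothing about the register allocation problem itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the report): combine the two 32-bit
   halves the way both test cases do, high half in bits 63..32 and
   low half in bits 31..0. */
static uint64_t combine_tsc(uint32_t low, uint32_t high)
{
    /* Widen before shifting; shifting a 32-bit value by 32 is undefined. */
    return ((uint64_t)high << 32) | low;
}

/* Hypothetical self-check of the arithmetic only. */
static void check_combine_tsc(void)
{
    assert(combine_tsc(0x89abcdefu, 0x01234567u) == 0x0123456789abcdefULL);
    assert(combine_tsc(0xffffffffu, 0u) == 0x00000000ffffffffULL);
    assert(combine_tsc(0u, 1u) == 0x0000000100000000ULL);
}
```

Since the result needs only one shift and one OR on the two output registers, the clang output above (shlq, orq, ret) is probably about the minimum one could expect here.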