http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48877

           Summary: Inline asm for rdtsc generates silly code
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: l...@mit.edu


gcc -O2 -S on this input:

typedef unsigned long long u64;

u64 test()
{
  u64 low, high;
  asm volatile ("rdtsc" : "=a" (low), "=d" (high));
  return low | (high << 32);
}

generates this:

test:
.LFB0:
        .cfi_startproc
#APP
# 6 "rax_rdx.c" 1
        rdtsc
# 0 "" 2
#NO_APP
        movq    %rax, %rcx
        movq    %rdx, %rax
        salq    $32, %rax
        orq     %rcx, %rax
        ret
        .cfi_endproc

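which is silly -- both movq instructions are unnecessary: the low half is
already in %rax, where the return value goes, so %rdx can be shifted in
place and ORed into %rax.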

clang -O3 -fomit-frame-pointer does much better:

test:
.Leh_func_begin0:
        #APP
        rdtsc
        #NO_APP
        shlq    $32, %rdx
        orq     %rdx, %rax
        ret

Getting rid of the << 32 makes gcc generate the obvious code.

FWIW, this code:

unsigned long long rdtsc (void)
{
  unsigned int tickl, tickh;
  __asm__ __volatile__("rdtsc":"=a"(tickl),"=d"(tickh));
  return ((unsigned long long)tickh << 32)|tickl;
}

is copied verbatim from the "Machine Constraints" section of the manual
(http://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints)
and generates the same silly code.
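
For reference, the combining step that both examples perform can be sketched
in portable C without rdtsc, which makes it easy to check on fixed inputs.
The names combine_halves and check_combine_halves are hypothetical and not
part of the report or the manual, and the sketch assumes a 32-bit unsigned
int, as on x86-64.

```c
#include <assert.h>

typedef unsigned long long u64;

/* Hypothetical helper, not from the report: joins two 32-bit halves the
   way the manual's example does. The cast widens `high` to 64 bits
   before the shift; shifting a 32-bit unsigned int by 32 would be
   undefined behavior in C. */
static u64 combine_halves(unsigned int low, unsigned int high)
{
  return ((u64)high << 32) | low;
}

/* Sanity checks on fixed inputs, so no rdtsc is needed. */
static void check_combine_halves(void)
{
  assert(combine_halves(0x89abcdefu, 0x01234567u) == 0x0123456789abcdefULL);
  assert(combine_halves(0xffffffffu, 0u) == 0xffffffffULL);
  assert(combine_halves(0u, 1u) == 0x100000000ULL);
}
```

In the first test case above, low and high are already 64-bit, so no cast is
needed there; the cast only matters when the halves are declared as unsigned
int, as in the manual's version.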
