------- Comment #5 from adam at consulting dot net dot nz  2010-09-13 00:24 -------
Andrew Pinski wrote:

   >This is caused by revision 160124:

   Not really, it is a noreturn function so the behavior is correct for our
   policy of allowing a more correct backtrace for noreturn functions.

I'm not sure what you're trying to say here, Andrew. Are you trying to justify
-O3 generating slower code to simplify debugging?

   The testcase is a incorrect one based on size

If you mean zero-extension of 32-bit function pointers, this is the x86-64
small code model.

If you mean that you don't care that the testcase increased in size without
further benchmarking, then empirical analysis is actually unnecessary. The
generated assembly is clearly worse.

  and not really that interesting anymore with respect of global register 
  variables.

It's another example of global register variables being copied for no good
reason whatsoever. RAX is free and the obvious translation of uint32_t next =
Iptr[1]; to x86-64 assembly is mov eax,DWORD PTR [rbp+0x4]; (Intel syntax,
where RBP is the global register variable). Generating mov rax,rbp; mov
eax,DWORD PTR [rax+0x4]; is just dumb.

I've been experimenting with optimal forms of virtual machine dispatch for a
long time and what you have is a fragment of a very fast direct threaded
interpreter. So fast, in fact, that a type-safe countdown will execute at 5
cycles per iteration on Intel Core 2:

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

register uint32_t *Iptr __asm__("rbp");

typedef void (*inst_t)(uint64_t types, uint64_t a, uint64_t b);

#define FUNC(x) ((inst_t) (uint64_t) x)
#define INST(x) ((uint32_t) (uint64_t) x)

__attribute__ ((noinline)) void dec_helper(uint64_t types, uint64_t a, uint64_t b) {
  assert("FIXME"=="");
}

void dec(uint64_t types, uint64_t a, uint64_t b) {
  if (LIKELY((types & 0xFF) == 1)) {
    uint32_t next = Iptr[1];
    --a;
    ++Iptr;
    FUNC(next)(types, a, b);
  } else dec_helper(types, a, b);
}


__attribute__ ((noinline)) void if_not_equal_jump_back_1_helper(uint64_t types, uint64_t a, uint64_t b) {
  assert("FIXME"=="");
}

void if_not_equal_jump_back_1(uint64_t types, uint64_t a, uint64_t b) {
  if (LIKELY((types & 0xFFFF) == 0x0101)) {
    if (LIKELY(a != b)) {
      uint32_t next = Iptr[-1];
      --Iptr;
      FUNC(next)(types, a, b);
    } else {
      uint32_t next = Iptr[1];
      ++Iptr;
      FUNC(next)(types, a, b);
    }
  } else if_not_equal_jump_back_1_helper(types, a, b);
}

void unconditional_exit(uint64_t types, uint64_t a, uint64_t b) {
  exit(0);
}

__attribute__ ((noinline, noclone)) void execute(uint32_t *code, uint64_t types, uint64_t a, uint64_t b) {
  Iptr = code;
  FUNC(code[0])(types, a, b);
}

int main() {
  uint32_t code[]={INST(dec),
                   INST(if_not_equal_jump_back_1),
                   INST(unconditional_exit)};
  execute(code + 1, 0x0101, 3000000000, 0);
  return 0;
}

$ gcc-4.5 -O3 -std=gnu99 plain-32bit-direct-dispatch-countdown.c && time ./a.out

real    0m5.007s
user    0m4.996s
sys     0m0.004s

CPU is 3GHz. Code execution starts at the second instruction
(if_not_equal_jump_back_1). a==3000000000 of type==1 is not equal to b==0 of
type==1. The two type checks are performed in parallel in one cycle: the two
types are packed into the low 16 bits of the types register, and one can
compare the low 8, 16 or 32 bits of a 64-bit register without masking.
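
The packed type check can be sketched in isolation. This is a minimal
illustration, not part of the test case: the helper names are hypothetical,
and placing the type of b in bits 8-15 is an assumption inferred from the
0x0101 constant (the test case only shows that the type of a lives in the low
byte).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: type of a in bits 0-7, type of b in bits 8-15
   (the position of b's type is an assumption, see above). */
static uint64_t pack_types(uint8_t type_a, uint8_t type_b) {
  return (uint64_t)type_a | ((uint64_t)type_b << 8);
}

/* A single 16-bit comparison checks both types at once. */
static int both_are_type_1(uint64_t types) {
  return (uint16_t)types == 0x0101;
}

/* An 8-bit comparison checks only the type of a, as dec does. */
static int a_is_type_1(uint64_t types) {
  return (uint8_t)types == 1;
}

/* Usage example. */
static void packed_types_example(void) {
  uint64_t types = pack_types(1, 1);
  assert(types == 0x0101);
  assert(both_are_type_1(types));
  assert(a_is_type_1(pack_types(1, 2)));      /* a matches, b is ignored */
  assert(!both_are_type_1(pack_types(1, 2))); /* b has the wrong type */
}
```

The casts to uint8_t and uint16_t keep only the low bits, which is the C-level
equivalent of comparing the 8-bit or 16-bit subregister on x86-64.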

As a!=b the code jumps back to the dec instruction, which performs another type
check that a is of type==1 before decrementing a and jumping to
if_not_equal_jump_back_1. This continues until a==0 and program exit occurs.
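
For readers who want to trace this control flow with a small count, here is a
portable sketch. It is deliberately not the technique of the test case: it
dispatches from a loop instead of tail calls, uses a struct field instead of a
global register variable, and stores full-width function pointers. All names
(vm_t, op_dec, run_countdown, ...) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct vm vm_t;
typedef void (*handler_t)(vm_t *vm);

struct vm {
  const handler_t *ip;  /* plays the role of Iptr */
  uint64_t types, a, b;
  uint64_t dispatches;  /* number of handlers executed so far */
  int halted;
};

static void op_dec(vm_t *vm) {
  assert((vm->types & 0xFF) == 1);  /* the test case calls a helper instead */
  --vm->a;
  ++vm->ip;
}

static void op_if_not_equal_jump_back_1(vm_t *vm) {
  assert((vm->types & 0xFFFF) == 0x0101);
  if (vm->a != vm->b) --vm->ip;  /* back to dec */
  else ++vm->ip;                 /* fall through to exit */
}

static void op_exit(vm_t *vm) { vm->halted = 1; }

/* Runs the three-instruction program, starting at the second instruction,
   and returns how many handlers were dispatched. */
static uint64_t run_countdown(uint64_t start) {
  static const handler_t code[] = {
    op_dec, op_if_not_equal_jump_back_1, op_exit
  };
  vm_t vm = { code + 1, 0x0101, start, 0, 0, 0 };
  while (!vm.halted) {
    ++vm.dispatches;
    (*vm.ip)(&vm);
  }
  return vm.dispatches;
}
```

Counting down from n dispatches n+1 comparisons, n decrements and one exit, so
run_countdown(3) returns 8.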

While the generated assembly of GCC snapshot speaks for itself, here's some
empirical evidence of its inferiority:

$ gcc-snapshot.sh -O3 -std=gnu99 plain-32bit-direct-dispatch-countdown.c && time ./a.out

real    0m10.014s
user    0m10.009s
sys     0m0.000s

GCC snapshot has doubled the execution time of this virtual machine example
(compared to gcc-4.3, gcc-4.4 and gcc-4.5).
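
The two timings convert to cycles per iteration as sketched below, where one
iteration is one dec plus one if_not_equal_jump_back_1. The helper name is
hypothetical, and the calculation assumes the clock runs at the nominal 3GHz
and that process startup time is negligible.

```c
#include <assert.h>

/* Hypothetical helper: average clock cycles spent per loop iteration. */
static double cycles_per_iteration(double seconds, double clock_hz,
                                   double iterations) {
  assert(iterations > 0.0);
  return seconds * clock_hz / iterations;
}

/* Usage with the measured "real" times and 3000000000 iterations:
     cycles_per_iteration(5.007,  3e9, 3e9)   gcc-4.5, about 5 cycles
     cycles_per_iteration(10.014, 3e9, 3e9)   GCC snapshot, about 10 cycles */
```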


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281