gcc 4.8.2 fails to optimize away redundant moves on global register variables when compiling on x86_64 Linux.
Consider the following code:

---------------------------------------------------------------------
#include <stdint.h>

register uint64_t i0_BP __asm__ ("r14");
register uint64_t i0_SP __asm__ ("r15");

void test(void)
{
    *((uint64_t*) (i0_SP - 8)) = i0_BP;
    i0_BP = i0_SP - 0x8;
    i0_SP -= 0x100;
    i0_SP = i0_BP;
    i0_BP = *((uint64_t*) i0_SP);
    i0_SP += 0x8;
    return;
}
---------------------------------------------------------------------

Compiling with either -O3 or -Os produces the same object code:

---------------------------------------------------------------------
<test>:
   0:   lea    0xfffffffffffffff8(%r15),%rcx
   4:   mov    %r14,%rdx
   7:   mov    %r15,%rax
   a:   mov    %r14,0xfffffffffffffff8(%r15)
   e:   mov    %rcx,%r14
  11:   mov    %rcx,%r15
  14:   mov    %rdx,%r14
  17:   mov    %rax,%r15
  1a:   retq
---------------------------------------------------------------------

Here we are just emulating a function call: save the base pointer, set up a frame, tear it down, and restore the base pointer. The object code clearly contains many redundant register-to-register moves. This looks like a bug in gcc, since we have already applied the highest optimization level available.

Environment:

On CentOS 5.10 (Linux 2.6.18 x86_64) using GCC 4.8.2

Using built-in specs.
COLLECT_GCC=gcc4
COLLECT_LTO_WRAPPER=/usr/local/GNU/gcc-4.8.2/libexec/gcc/x86_64-unknown-linux-gnu/4.8.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.8.2/configure --prefix=/usr/local/GNU/gcc-4.8.2 --enable-clocale=generic
Thread model: posix
gcc version 4.8.2 (GCC)
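
As a cross-check on the claim that the moves are redundant, the same statement sequence can be run on ordinary variables instead of global register variables. This is only a sketch: the struct frame_regs, emulate_call and check_roundtrip names are hypothetical and not part of the original test case, and a local array stands in for the emulated stack. It suggests that the only lasting effect of test() is the store of i0_BP to i0_SP - 8, with both registers ending at their starting values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for r14 (i0_BP) and r15 (i0_SP). */
struct frame_regs {
    uint64_t bp;
    uint64_t sp;
};

/* Same statement sequence as test(), applied to ordinary variables. */
void emulate_call(struct frame_regs *r)
{
    *((uint64_t *)(uintptr_t)(r->sp - 8)) = r->bp;  /* save BP below SP */
    r->bp = r->sp - 0x8;                            /* BP points at the saved slot */
    r->sp -= 0x100;                                 /* reserve a frame */
    r->sp = r->bp;                                  /* release the frame */
    r->bp = *((uint64_t *)(uintptr_t)r->sp);        /* reload the saved BP */
    r->sp += 0x8;                                   /* drop the saved slot */
}

/* Returns 1 if BP and SP are unchanged afterwards and the saved slot
 * holds the original BP, i.e. the net effect is a single store. */
int check_roundtrip(void)
{
    uint64_t stack[64] = {0};   /* assumed emulated stack, grows downward */
    struct frame_regs r;
    uint64_t old_bp, old_sp;

    r.bp = 0x1234;
    r.sp = (uint64_t)(uintptr_t)&stack[64];  /* one past the end of the array */
    old_bp = r.bp;
    old_sp = r.sp;

    emulate_call(&r);

    return r.bp == old_bp && r.sp == old_sp && stack[63] == old_bp;
}
```

Calling check_roundtrip() from a small main() is enough to exercise it. If the check holds, an optimal translation of test() would need only the store at offset a followed by retq, which is what makes the remaining lea and mov instructions look unnecessary.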