Hi, I've found a case that looks like it should be possible to optimise, but gcc (very recent trunk) isn't doing so. Handling it could give improvements in many cases, certainly in one I've come across:

#ifdef NEW
/* -DNEW: the first iteration is special-cased with ?: inside the loop. */
unsigned int fn(unsigned int n, unsigned int dmax) throw()
{
  for (unsigned int d = 0; d < dmax; ++d) {
    n += d?d:1;
  }
  return n;
}
#else
/* -DOLD: the value to add is carried in a separate variable instead. */
unsigned int fn(unsigned int n, unsigned int dmax) throw()
{
  unsigned int add = 1;
  for (unsigned int d = 0; d < dmax; add = ++d) {
    n += add;
  }
  return n;
}
#endif

When compiled with -O3 -DOLD I get:

        .p2align 4,,15
.globl _Z2fnjj
        .type   _Z2fnjj, @function
_Z2fnjj:
.LFB2:
        testl   %esi, %esi
        je      .L2
        movl    $1, %edx
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L3:
        addl    $1, %eax
        addl    %edx, %edi
        cmpl    %esi, %eax
        movl    %eax, %edx
        jne     .L3
.L2:
        movl    %edi, %eax
        ret
.LFE2:
        .size   _Z2fnjj, .-_Z2fnjj

but with -DNEW I get:

        .p2align 4,,15
.globl _Z2fnjj
        .type   _Z2fnjj, @function
_Z2fnjj:
.LFB2:
        testl   %esi, %esi
        je      .L2
        movl    $1, %edx
        xorl    %eax, %eax
        movl    $1, %ecx
        jmp     .L7
        .p2align 4,,10
        .p2align 3
.L5:
        testl   %eax, %eax
        movl    %ecx, %edx
        cmovne  %eax, %edx
.L7:
        addl    $1, %eax
        addl    %edx, %edi
        cmpl    %esi, %eax
        jne     .L5
.L2:
        movl    %edi, %eax
        ret
.LFE2:
        .size   _Z2fnjj, .-_Z2fnjj

The performance difference is about 50%, with -DNEW taking 1.5 times as long as -DOLD (that was with dmax == 1000000000).

Unfortunately the loop can't always be written as in -DOLD. The implementation of an iterator adapter might use ?: to special-case the first element of a sequence, and when that adapter is used in a generic algorithm which just has the simple loop of -DNEW, it ought to be optimised like -DOLD if inlining occurs.

-- 
Tristan Wibberley

Any opinion expressed is mine (or else I'm playing devil's advocate for the sake of a good argument). My employer had nothing to do with this communication.
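
P.S. To make the request concrete, here is a rough sketch of the kind of result I'd hope for. It assumes the optimiser could peel the d == 0 iteration out of the loop so the ?: disappears from the loop body. The names fn_cond and fn_peeled are only illustrative, the peeled version is written by hand, and I've used noexcept in place of throw() so it builds with current compilers. This is not a claim about what gcc does or how it would have to do it.

```cpp
// The -DNEW form: the first iteration is special-cased inside the loop.
unsigned int fn_cond(unsigned int n, unsigned int dmax) noexcept
{
  for (unsigned int d = 0; d < dmax; ++d) {
    n += d ? d : 1;
  }
  return n;
}

// Hand-peeled equivalent (hypothetical target of the optimisation):
// the d == 0 iteration runs once before the loop, so the loop body
// is a plain add with no test on d.
unsigned int fn_peeled(unsigned int n, unsigned int dmax) noexcept
{
  if (dmax == 0) {
    return n;            // loop would not have run at all
  }
  n += 1;                // the d == 0 iteration, where ?: yields 1
  for (unsigned int d = 1; d < dmax; ++d) {
    n += d;              // d != 0 here, so ?: would always yield d
  }
  return n;
}
```

Both functions should return the same value for any n and dmax, since unsigned arithmetic wraps identically in each, so comparing them over a range of inputs is a quick sanity check.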