Hi

I've found a case that looks like it should be possible to optimise, but
which gcc (very recent trunk) doesn't optimise. Handling it could give
improvements in many cases, and certainly would in one I've come across:

        #ifdef NEW
        unsigned int fn(unsigned int n, unsigned int dmax) throw()
        {
          for (unsigned int d = 0; d < dmax; ++d) {
            n += d?d:1;
          }
          return n;
        }
        #else
        unsigned int fn(unsigned int n, unsigned int dmax) throw()
        {
          unsigned int add = 1;
          for (unsigned int d = 0; d < dmax; add = ++d) {
            n += add;
          }
          return n;
        }
        #endif
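
As a sanity check that the transformation is legal, the two loops can be
compared directly. This is only a minimal sketch: the names fn_ternary,
fn_carried and check_small_inputs are hypothetical (the original uses one
name, fn, selected by the macro), and throw() is dropped so that the
snippet also builds under newer language standards.

```cpp
#include <cassert>

// Hypothetical name for the -DNEW loop: ?: special-cases the first iteration.
unsigned int fn_ternary(unsigned int n, unsigned int dmax)
{
  for (unsigned int d = 0; d < dmax; ++d) {
    n += d ? d : 1;
  }
  return n;
}

// Hypothetical name for the -DOLD loop: the addend is carried in "add".
unsigned int fn_carried(unsigned int n, unsigned int dmax)
{
  unsigned int add = 1;
  for (unsigned int d = 0; d < dmax; add = ++d) {
    n += add;
  }
  return n;
}

// Hypothetical helper: compares both loops over a range of trip counts.
void check_small_inputs()
{
  for (unsigned int dmax = 0; dmax < 1000; ++dmax) {
    assert(fn_ternary(7, dmax) == fn_carried(7, dmax));
  }
}
```

Calling check_small_inputs() from main should pass silently. Unsigned
arithmetic wraps, so the two also agree when the sum overflows.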

When compiled with -O3 -DOLD (NEW is then undefined, so the #else branch
is used) I get:

                .p2align 4,,15
        .globl _Z2fnjj
                .type   _Z2fnjj, @function
        _Z2fnjj:
        .LFB2:
                testl   %esi, %esi
                je      .L2
                movl    $1, %edx
                xorl    %eax, %eax
                .p2align 4,,10
                .p2align 3
        .L3:
                addl    $1, %eax
                addl    %edx, %edi
                cmpl    %esi, %eax
                movl    %eax, %edx
                jne     .L3
        .L2:
                movl    %edi, %eax
                ret
        .LFE2:
                .size   _Z2fnjj, .-_Z2fnjj
        
but with -DNEW I get:

                .p2align 4,,15
        .globl _Z2fnjj
                .type   _Z2fnjj, @function
        _Z2fnjj:
        .LFB2:
                testl   %esi, %esi
                je      .L2
                movl    $1, %edx
                xorl    %eax, %eax
                movl    $1, %ecx
                jmp     .L7
                .p2align 4,,10
                .p2align 3
        .L5:
                testl   %eax, %eax
                movl    %ecx, %edx
                cmovne  %eax, %edx
        .L7:
                addl    $1, %eax
                addl    %edx, %edi
                cmpl    %esi, %eax
                jne     .L5
        .L2:
                movl    %edi, %eax
                ret
        .LFE2:
                .size   _Z2fnjj, .-_Z2fnjj

The performance difference is about 50%: -DNEW takes 1.5 times as long as
-DOLD (measured with dmax == 1000000000).

Unfortunately the loop can't always be written as in -DOLD. The
implementation of an iterator adapter might use ?: to special-case the
first element of a sequence. When that adapter is used in a generic
algorithm which just has the simple loop, inlining produces the -DNEW
form, and it ought to be optimised like -DOLD.
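
To illustrate the kind of code I mean, here is a minimal sketch. All the
names (first_special_iterator, sum_from, fn_generic) are hypothetical and
made up for this example; the real adapter and algorithm would be more
general.

```cpp
// Hypothetical adapter: iterates over 0, 1, 2, ... but yields 1 in place
// of the first element, using ?: in the dereference.
class first_special_iterator {
  unsigned int d_;
public:
  explicit first_special_iterator(unsigned int d) : d_(d) {}
  unsigned int operator*() const { return d_ ? d_ : 1; }
  first_special_iterator& operator++() { ++d_; return *this; }
  bool operator!=(const first_special_iterator& o) const
  {
    return d_ != o.d_;
  }
};

// Hypothetical generic algorithm: just the simple loop, with no knowledge
// of the special case hidden inside the iterator.
template <typename It>
unsigned int sum_from(unsigned int n, It first, It last)
{
  for (; first != last; ++first) {
    n += *first;
  }
  return n;
}

// Computes the same value as fn above.
unsigned int fn_generic(unsigned int n, unsigned int dmax)
{
  return sum_from(n, first_special_iterator(0),
                  first_special_iterator(dmax));
}
```

Once operator* and operator++ are inlined into sum_from, I would expect
the loop body to be essentially the -DNEW one (with != rather than < as
the exit test), and the author of sum_from has no opportunity to rewrite
it by hand in the -DOLD style.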

-- 
Tristan Wibberley

Any opinion expressed is mine (or else I'm playing devil's advocate for
the sake of a good argument). My employer had nothing to do with this
communication.

