------- Comment #3 from zackw at panix dot com  2006-07-12 22:42 -------
I should mention that the exact command line flags were -O2
-fomit-frame-pointer -march=pentium4, and that I hand-tweaked the label numbers
for ease of reading.

Also, -fno-tree-ch does suppress this bad optimization, but in exchange we get
mildly worse code from the loop optimizer proper: it uses [reg+reg] indexing
with a 0..n counter instead of [reg] indexing with a base..limit pointer range.
The code is pretty short so I'll just paste it here (meaningless labels removed):

_Z17has_bad_chars_newRKSs:
        pushl   %ebx
        movl    8(%esp), %eax
        movl    (%eax), %eax
        xorl    %ecx, %ecx
        movl    -12(%eax), %ebx
.L2:
        cmpl    %ecx, %ebx
        je      .L10
        movzbl  (%ecx,%eax), %edx
        cmpb    $31, %dl
        jbe     .L4
        cmpb    $92, %dl
        je      .L4
        addl    $1, %ecx
        cmpb    $127, %dl
        jne     .L2
.L4:
        movl    $1, %eax
        popl    %ebx
        .p2align 4,,2
        ret
.L10:
        xorl    %eax, %eax
        popl    %ebx
        .p2align 4,,2
        ret
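
For readers who prefer source to assembly, here is a rough C++ reading of the
listing above.  It is a reconstruction from the assembly, not the original
source attached to the bug; the names is_bad_char, has_bad_chars_indexed and
has_bad_chars_pointer are made up for illustration.  The two loops show the
two shapes being compared: a 0..n counter, which tends to become [reg+reg]
addressing, and a base..limit pointer walk, which tends to become [reg]
addressing.  Which form the compiler actually emits for either one depends on
the optimizer, so treat the comments as a sketch only.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: the three comparisons visible in the listing
// (cmpb $31 / jbe, cmpb $92 / je, cmpb $127).
static bool is_bad_char(unsigned char c) {
    return c <= 31 || c == 92 || c == 127;
}

// Indexed shape: a counter running 0..n, with s[i] as base + index.
bool has_bad_chars_indexed(const std::string& s) {
    for (std::size_t i = 0; i != s.size(); ++i)
        if (is_bad_char(static_cast<unsigned char>(s[i])))
            return true;
    return false;
}

// Pointer shape: a pointer running base..limit, dereferenced directly.
bool has_bad_chars_pointer(const std::string& s) {
    const char* p = s.data();
    const char* end = p + s.size();
    for (; p != end; ++p)
        if (is_bad_char(static_cast<unsigned char>(*p)))
            return true;
    return false;
}
```

Both functions return the same answer for every input; only the loop shape
differs.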

Looking at the code, I see that the entire purpose of tree-ch is to duplicate
loop bodies in this fashion, and the justification given is that it "increases
effectiveness of code motion and reduces the need for loop preconditioning",
which I take to cover avoiding the addressing-mode degradation shown above.  I'm
not an optimizer expert, but surely there is a way to get the best of both
worlds here...?
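
As a source-level picture of the kind of transformation in question, the GCC
manual describes -ftree-ch as loop header copying: the exit test at the top of
a loop is duplicated in front of it, so the loop proper becomes a
bottom-tested loop.  The sketch below does that by hand on a trivial loop.  It
is an illustration under that assumption, not GCC's actual output, and the
function names count_top_tested and count_header_copied are hypothetical.

```cpp
#include <cstddef>

// Original shape: the exit test sits at the top of the loop.
std::size_t count_top_tested(const char* p, const char* end) {
    std::size_t n = 0;
    while (p != end) {
        ++n;
        ++p;
    }
    return n;
}

// Hand-applied header copying: the test is duplicated once before the
// loop, and the loop itself tests at the bottom.
std::size_t count_header_copied(const char* p, const char* end) {
    std::size_t n = 0;
    if (p != end) {
        do {
            ++n;
            ++p;
        } while (p != end);
    }
    return n;
}
```

With a one-condition header the duplication is cheap.  When the header
contains several tests, as in the function from this report, each copied test
adds code, which is presumably where the extra bulk comes from.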


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28364
