------- Comment #3 from zackw at panix dot com 2006-07-12 22:42 -------

I should mention that the exact command-line flags were -O2 -fomit-frame-pointer -march=pentium4, and that I hand-tweaked the label numbers for ease of reading.
Also, -fno-tree-ch does suppress this bad optimization, but in exchange we get mildly worse code from the loop optimizer proper: it uses [reg+reg] indexing and a 0..n count instead of [reg] indexing and a base..limit count. The code is pretty short, so I'll just paste it here (meaningless labels removed):

_Z17has_bad_chars_newRKSs:
        pushl   %ebx
        movl    8(%esp), %eax
        movl    (%eax), %eax
        xorl    %ecx, %ecx
        movl    -12(%eax), %ebx
.L2:
        cmpl    %ecx, %ebx
        je      .L10
        movzbl  (%ecx,%eax), %edx
        cmpb    $31, %dl
        jbe     .L4
        cmpb    $92, %dl
        je      .L4
        addl    $1, %ecx
        cmpb    $127, %dl
        jne     .L2
.L4:
        movl    $1, %eax
        popl    %ebx
        .p2align 4,,2
        ret
.L10:
        xorl    %eax, %eax
        popl    %ebx
        .p2align 4,,2
        ret

Looking at the code, I see that the entire purpose of tree-ch is to duplicate loop bodies in this fashion, and the justification given is that it "increases effectiveness of code motion and reduces the need for loop preconditioning", which I take to cover the above degradation in addressing-mode choice. I'm not an optimizer expert, but surely there is a way to get the best of both worlds here...?

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28364
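The symbol _Z17has_bad_chars_newRKSs demangles to has_bad_chars_new(std::string const&), but the C++ source is not included in this comment. What follows is a hypothetical reconstruction inferred only from the assembly listing: it rejects any byte that is <= 31 (unsigned compare), 92 (backslash), or 127 (DEL). It is written in two shapes that loosely mirror the two loops discussed: an index counting 0..n, which maps naturally to [reg+reg] addressing, and a pointer walking from base to limit, which maps naturally to [reg] addressing. The second function name, has_bad_chars_ptr, is made up for illustration. Source shape does not dictate what GCC emits, since the loop optimizer may rewrite either form, so treat this as a sketch of the two loop shapes rather than a recipe for controlling codegen.

```cpp
#include <string>

// Hypothetical reconstruction, inferred from the assembly in this report.
// Index form: a 0..n counter added to the base pointer on every access,
// the shape that corresponds to [reg+reg] addressing such as (%ecx,%eax).
bool has_bad_chars_new(const std::string& s) {
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        // Compare as unsigned, matching the jbe (unsigned) test in the asm.
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (c <= 31 || c == '\\' || c == 127)
            return true;
    }
    return false;
}

// Hypothetical pointer form: walk p from base to limit, so each access
// is a plain dereference, the shape that corresponds to [reg] addressing.
bool has_bad_chars_ptr(const std::string& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const unsigned char* const limit = p + s.size();
    for (; p != limit; ++p) {
        unsigned char c = *p;
        if (c <= 31 || c == '\\' || c == 127)
            return true;
    }
    return false;
}
```

Both functions return the same result for every input; the only difference is the loop shape each one presents to the optimizer.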