------- Comment #3 from ubizjak at gmail dot com 2007-08-28 12:02 -------

Current mainline [GCC: (GNU) 4.3.0 20070828] generates:

test:
.LFB2:
        xorl    %eax, %eax
        xorl    %edx, %edx
        .align 16
.L2:
        addb    table(%rdx), %al
        addq    $1, %rdx
        cmpq    $10, %rdx
        jne     .L2
        movsbl  %al,%eax
        ret

Now, there is an optimization problem in the initialization code in the
loop header. To omit one iteration, it should start with:

.LFB2:
        movzbl  table(%rip), %ecx
        movl    $1, %edx
.L2:
        ...

BTW: By reversing the loop:

  for (i = 9; i; i--)
    val += table[i];

we could remove the comparison from the loop, but instead we produce
(x86_64):

test:
.LFB2:
        xorl    %eax, %eax
        xorl    %edx, %edx
        .align 16
.L2:
        addb    table+9(%rdx), %al
        subq    $1, %rdx
        cmpq    $-9, %rdx
        jne     .L2
        movsbl  %al,%eax
        ret

However, using -m32 we get:

test:
        xorl    %eax, %eax
        movl    $9, %edx
        .align 16
.L2:
        addb    table(%edx), %al
        subl    $1, %edx
        jne     .L2
        movsbl  %al,%eax
        ret

In the reversed-loop case, we could generate:

test:
        movl    $9, %edx
        movzbl  table(%edx), %eax
        .align 16
.L2:
        addb    table(%edx), %al
        subl    $1, %edx
        jne     .L2
        movsbl  %al,%eax
        ret


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|char adding gives an extra  |char adding (in loops) gives
                   |move or two                 |an extra move or two


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31248
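
For reference, here is a rough C sketch of the loop shapes discussed in the comment above. The original test case is not quoted in this comment, so the function names (test_forward, test_peeled, test_reversed), the table type and the table contents are assumptions made for illustration only. Only the reversed loop body is taken from the comment. Note that the reversed loop as written stops before i == 0, so it never adds table[0].

```c
/* Hypothetical table; the bug report comment does not show its contents. */
static char table[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

/* Presumed shape of the forward loop behind the first listing (assumption). */
int test_forward(void)
{
    char val = 0;
    int i;

    for (i = 0; i < 10; i++)
        val += table[i];
    return val;                 /* char -> int, i.e. the movsbl at the end */
}

/* Same sum with the first iteration peeled into the initialization,
   which is the idea of starting the loop with a load and i = 1. */
int test_peeled(void)
{
    char val = table[0];
    int i;

    for (i = 1; i < 10; i++)
        val += table[i];
    return val;
}

/* Reversed loop exactly as written in the comment: it visits
   table[9] down to table[1] and exits when i reaches 0. */
int test_reversed(void)
{
    char val = 0;
    int i;

    for (i = 9; i; i--)
        val += table[i];
    return val;
}
```

Compiling something like this with -O2 -S, with and without -m32, should allow the listings above to be compared, although the exact output depends on the compiler version and on the real test case.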