http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829
Bug #: 54829
Summary: bad optimization: sub followed by cmp w/ zero (x86 & ARM)
Classification: Unclassified
Product: gcc
Version: 4.7.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: daniel.san...@pobox.com

I originally posted this under bug #3507. I have since discovered that it is target-specific and is a separate issue from bug #3507.

    extern void print_gt(void);
    extern void print_lt(void);
    extern void print_eq(void);

    void cmp_and_branch(long a, long b)
    {
        /* diff is used only by the comparisons below */
        long diff = a - b;

        if (diff > 0) {
            print_gt();
        } else if (diff < 0) {
            print_lt();
        } else {
            print_eq();
        }
    }

Here, the result of the subtraction is used directly in the branch code and nowhere else. However, gcc -O2 -S still generates this output:

    cmp_and_branch:
    .LFB0:
            .cfi_startproc
            subq    %rsi, %rdi
            cmpq    $0, %rdi
            jg      .L5
            jne     .L6
            jmp     print_eq
            .p2align 4,,10
            .p2align 3
    .L5:
            jmp     print_gt
            .p2align 4,,10
            .p2align 3
    .L6:
            jmp     print_lt
            .cfi_endproc

Notice that we're using subq followed by cmpq instead of just cmpq %rsi, %rdi. Consider another case with a loop, where one of the compared values stays the same. There, an additional mov instruction is needed to keep the unchanging value's register from being clobbered by the subtraction, so that situation actually costs two extra instructions.

When built on ARM, we get something similar:

    cmp_and_branch:
            @ args = 0, pretend = 0, frame = 0
            @ frame_needed = 0, uses_anonymous_args = 0
            @ link register save eliminated.
            rsb     r1, r1, r0
            cmp     r1, #0
            bgt     .L5
            bne     .L6
            b       print_eq
    .L5:
            b       print_gt
    .L6:
            b       print_lt

Note that here we again do rsb followed by a cmp with zero.

However, on PPC (apinski from freenode compiled this for me), the output is what one would expect. The record form subf. sets condition register field 0 as a side effect of the subtraction, so no separate compare is needed:

            subf. 9,4,3
            bgt 0,.L5
            bne 0,.L6
            print_eq

Finally, on MIPS (also from apinski):

            dsubu $4,$4,$5
            bgtz $4,$L5
            nop
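The report treats the subtraction and the test of diff against zero as foldable into one compare of a with b. The sketch below shows why C allows this. It assumes a two's-complement `long` as on GCC's targets, and it does not reproduce any GCC output.

The names `classify_by_diff`, `classify_direct` and `wrapped_diff` are hypothetical and do not come from the report. The two classifiers give the same answer whenever `a - b` does not overflow. Signed overflow is undefined behavior in C, so the compiler may assume it never happens and answer `diff > 0` with the flags of a single `cmpq %rsi, %rdi`. `wrapped_diff` shows the case that rule excludes.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper (not from the report): classifies a - b the same
   way cmp_and_branch() does. Returns 1 for "gt", -1 for "lt", 0 for "eq".
   Precondition: a - b must not overflow, because signed overflow is
   undefined behavior in C. */
int classify_by_diff(long a, long b)
{
    assert(!(b < 0 && a > LONG_MAX + b));  /* a - b would exceed LONG_MAX */
    assert(!(b > 0 && a < LONG_MIN + b));  /* a - b would fall below LONG_MIN */

    long diff = a - b;
    if (diff > 0)
        return 1;
    if (diff < 0)
        return -1;
    return 0;
}

/* Hypothetical helper: the same decision made by comparing a and b
   directly, which is what a single "cmpq %rsi, %rdi" followed by jg/jne
   tests on x86 (signed a > b, then a != b). */
int classify_direct(long a, long b)
{
    if (a > b)
        return 1;
    if (a < b)
        return -1;
    return 0;
}

/* Hypothetical helper: the two's-complement (wrapping) difference,
   computed in unsigned arithmetic so the subtraction itself is well
   defined. Converting the result back to long is implementation-defined
   in C; GCC documents it as reduction modulo 2^N. */
long wrapped_diff(long a, long b)
{
    return (long)((unsigned long)a - (unsigned long)b);
}
```

With these definitions, `classify_by_diff` and `classify_direct` agree on inputs such as (5, 3), (-7, 2) and (4, 4). The pair (`LONG_MAX`, -1) is the case the overflow rule excludes. Here `classify_direct` returns 1, because `LONG_MAX > -1`. However, `wrapped_diff` returns `LONG_MIN`, which is negative.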