[Bug target/54829] New: bad optimization: sub followed by cmp w/ zero (x86 & ARM)

daniel.santos at pobox dot com Fri, 05 Oct 2012 16:16:33 -0700


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829




             Bug #: 54829
           Summary: bad optimization: sub followed by cmp w/ zero (x86 &
                    ARM)
    Classification: Unclassified
           Product: gcc
           Version: 4.7.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: daniel.san...@pobox.com





I originally posted this under bug #3507, but I have since discovered that it is
target-specific and a separate issue from bug #3507.



extern void print_gt(void);
extern void print_lt(void);
extern void print_eq(void);

void cmp_and_branch(long a, long b)
{
    long diff = a - b;

    if (diff > 0) {
        print_gt();
    } else if (diff < 0) {
        print_lt();
    } else {
        print_eq();
    }
}



Here, the result of the subtraction is used only in the branch conditions and
nowhere else.  However, gcc -O2 -S still generates this output:



cmp_and_branch:
.LFB0:
    .cfi_startproc
    subq    %rsi, %rdi
    cmpq    $0, %rdi
    jg    .L5
    jne    .L6
    jmp    print_eq
    .p2align 4,,10
    .p2align 3
.L5:
    jmp    print_gt
    .p2align 4,,10
    .p2align 3
.L6:
    jmp    print_lt
    .cfi_endproc



Notice that we're using subq followed by cmpq instead of just cmpq %rsi, %rdi.
The subq alone already sets the flags that jg and jne test, so the cmpq $0 is
redundant.  In another case, where there is a loop and one of the values being
compared stays the same, an additional mov instruction is needed to keep the
unchanging value's register from being clobbered, so that case actually costs
two extra instructions.
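To make the redundancy concrete, here is a small C model (a sketch written for
illustration, not something from the report) of the three x86 flags that jg and
jne read.  subq %rsi, %rdi and cmpq %rsi, %rdi set ZF, SF and OF identically, so
the branch sequence could consume those flags directly.  The extra cmpq $0 only
changes the outcome when a - b overflows, which is undefined behaviour for long
in the C source anyway.  All function names are hypothetical, and only ZF, SF
and OF are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the x86 flags read by jg/jne.  "subq %rsi, %rdi"
 * computes rdi - rsi; "cmpq" sets the same flags but discards the result. */
struct x86_flags {
    bool zf; /* result was zero */
    bool sf; /* sign bit of the result */
    bool of; /* signed overflow */
};

/* Flags produced by "sub src, dst" (AT&T operand order). */
static struct x86_flags sub_flags(int64_t dst, int64_t src)
{
    uint64_t d = (uint64_t)dst, s = (uint64_t)src;
    uint64_t r = d - s;                  /* wrapping subtraction */
    struct x86_flags f;
    f.zf = (r == 0);
    f.sf = (r >> 63) != 0;
    /* Overflow: operand signs differ and the result's sign differs from dst. */
    f.of = (((d ^ s) & (d ^ r)) >> 63) != 0;
    return f;
}

/* The listing's branch sequence: jg .L5; jne .L6; jmp print_eq.
 * Returns 1 for print_gt, -1 for print_lt, 0 for print_eq. */
static int take_branches(struct x86_flags f)
{
    if (!f.zf && f.sf == f.of)   /* jg: ZF == 0 and SF == OF */
        return 1;
    if (!f.zf)                   /* jne: ZF == 0 */
        return -1;
    return 0;
}

/* What gcc emits: subq, then cmpq $0 on the difference.  The conversion
 * back to int64_t is implementation-defined; GCC defines it as modulo 2^64. */
static int decide_sub_then_cmp0(int64_t a, int64_t b)
{
    int64_t diff = (int64_t)((uint64_t)a - (uint64_t)b);
    return take_branches(sub_flags(diff, 0));
}

/* The suggested form: flags from a single cmpq %rsi, %rdi (or the subq alone). */
static int decide_single_cmp(int64_t a, int64_t b)
{
    return take_branches(sub_flags(a, b));
}

/* When a - b does not overflow, both sequences take the same branch. */
static void check_agree(int64_t a, int64_t b)
{
    assert(decide_sub_then_cmp0(a, b) == decide_single_cmp(a, b));
}
```

For example, check_agree(5, 3) and check_agree(INT64_MIN, -1) both hold.  Only
an overflowing pair such as (INT64_MAX, -1) separates the two: the single
compare still reports a > b, while the compare against the wrapped difference
does not.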



When built on ARM, we get something similar:

cmp_and_branch:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    rsb    r1, r1, r0
    cmp    r1, #0
    bgt    .L5
    bne    .L6
    b    print_eq
.L5:
    b    print_gt
.L6:
    b    print_lt



Note that here we again get rsb followed by a cmp with zero.  However, on PPC
(apinski from freenode compiled this for me), the result is actually correct:



subf. 9,4,3
bgt 0,.L5
bne 0,.L6
print_eq
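The PPC output needs no separate compare because subf. is a "record form"
instruction: besides writing r9 = r3 - r4 (that is, a - b), it sets the LT, GT
and EQ bits of CR0 by comparing the result with zero.  The C sketch below models
that behaviour under stated assumptions (64-bit mode, SO bit ignored); the
function names are hypothetical and only illustrate why one instruction covers
both the subtraction and the test.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR0 bits written by a PowerPC record-form ("dot") instruction: the
 * result is compared, as a signed value, with zero.  The SO bit is not
 * modelled.  Assumes 64-bit mode. */
struct cr0 {
    bool lt, gt, eq;
};

/* Model of "subf. rt, ra, rb": rt = rb - ra, and CR0 is set from rt. */
static int64_t subf_dot(int64_t ra, int64_t rb, struct cr0 *cr)
{
    /* Wrapping subtraction; the conversion back to int64_t is
     * implementation-defined in C, and GCC defines it as modulo 2^64. */
    int64_t rt = (int64_t)((uint64_t)rb - (uint64_t)ra);
    cr->lt = rt < 0;
    cr->gt = rt > 0;
    cr->eq = rt == 0;
    return rt;
}

/* "subf. 9,4,3; bgt 0,.L5; bne 0,.L6" with a in r3 and b in r4.
 * Returns 1 for print_gt, -1 for print_lt, 0 for print_eq. */
static int cmp_and_branch_ppc(int64_t a, int64_t b)
{
    struct cr0 cr;
    (void)subf_dot(b, a, &cr);   /* r9 = r3 - r4 = a - b */
    if (cr.gt)                   /* bgt 0,.L5 */
        return 1;
    if (!cr.eq)                  /* bne 0,.L6 */
        return -1;
    return 0;
}
```

On x86 and ARM, the analogous single-instruction forms would be a subq whose
flags are used directly and a flag-setting subs, respectively.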



Finally, on MIPS (also from apinski):

dsubu $4,$4,$5
bgtz $4,$L5
nop
