https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91569

            Bug ID: 91569
           Summary: Optimisation test case and unnecessary XOR-OR pair
                    instead of MOV.
           Product: gcc
           Version: 9.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: cubitect at gmail dot com
  Target Milestone: ---

I wasn't entirely sure where to post this, but I have a very simple test 
problem that shows some missed optimisation potential. The task is to cast 
an integer to a long and replace the second lowest byte of the result with 
a constant (4). Below are three ways to achieve this:


long opt_test1(int num)             //  opt_test1:
{                                   //      movslq  %edi, %rax
    union {                         //      movb    $4, %ah
        long q;                     //      ret
        struct { char l,h; };
    } a;
    a.q = num;
    a.h = 4;
    return a.q;
}

The union here is modelled after the structure of a r?x register which 
contains the low and high byte registers: ?l and ?h. The cast and second 
byte assignment can be done in one instruction each. The optimiser manages 
to understand this and gives the optimal instructions.


long opt_test2(int num)             //  opt_test2:
{                                   //      movl    %edi, %eax
    long a = num;                   //      xor     %ah, %ah
    a &= (-1UL ^ 0xff00);           //      orb     $4, %ah
    a |= (4 << 8);                  //      cltq
    return a;                       //      ret
}

This solution, based on a bitwise AND and OR, is interesting. The optimiser 
recognises that I am interested in the second byte and makes use of the 'ah' 
register, but why is there a XOR and an OR rather than a single, 
equivalent MOV? Similarly, the (MOV + CLTQ) can be replaced outright with 
MOVSLQ. Notably, some older versions (such as "gcc-4.8.5 -O3") produce 
output that corresponds more closely to the C code:
    andl    $-65281, %edi
    orl     $1024, %edi
    movslq  %edi, %rax
    ret
which is actually better than the output for gcc-9.2.


long opt_test3(int num)             //  opt_test3:
{                                   //      movslq  %edi, %rdi
    long a = num;                   //      movq    %rdi, -8(%rsp)
    ((char*)&a)[1] = 4;             //      movb    $4, -7(%rsp)
    return a;                       //      movq    -8(%rsp), %rax
}                                   //      ret

This is the straightforward approach, addressing the second byte in memory.
I am including this because LLVM manages to recognise that the stack is not 
actually necessary and goes for a register based solution.
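For completeness, the same byte store can also be expressed with memcpy instead of the char-pointer cast; this is a sketch of an equally well-defined formulation (the function name is mine, and I have not checked whether it changes the generated code):

```c
#include <string.h>

long opt_test3_memcpy(int num)
{
    long a = num;
    char byte = 4;
    /* Overwrite the second-lowest byte of a, as in opt_test3. */
    memcpy((char *)&a + 1, &byte, 1);
    return a;
}
```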

As far as I could tell, these results seem quite consistent across most GCC 
versions and across all optimisation levels above -O0. However, I obtained 
the assembly code above using:

$ gcc-9.2 opt_tests.c -S -O3 -Wall -Wextra -pedantic
