[Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA

ysrumyan at gmail dot com Thu, 15 Nov 2012 07:25:41 -0800


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342




             Bug #: 55342

           Summary: [LRA,x86] Non-optimal code for simple loop with LRA

    Classification: Unclassified

           Product: gcc

           Version: 4.8.0

            Status: UNCONFIRMED

          Severity: normal

          Priority: P3

         Component: rtl-optimization

        AssignedTo: unassig...@gcc.gnu.org

        ReportedBy: ysrum...@gmail.com

            Target: x86





For a simple test case, we see a 15% performance regression with LRA on x86 in 32-bit mode.

The test case is:



#define byte unsigned char

#define MIN(a, b) ((a) > (b)?(b):(a))



void convert_image(byte *in, byte *out, int size) {

    int i;

    byte * read = in,

     * write = out;

    for(i = 0; i < size; i++) {

        byte r = *read++;

        byte g = *read++;

        byte b = *read++;

        byte c, m, y, k, tmp;

        c = 255 - r;

        m = 255 - g;

        y = 255 - b;

        if (c < m)

            k = MIN (c, y);

        else

            k = MIN (m, y);

        *write++ = c - k;

        *write++ = m - k;

        *write++ = y - k;

        *write++ = k;

    }

}
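
To confirm that a given build still computes the right values before comparing timings, the loop can be exercised on a couple of hand-computed pixels. This is only a minimal sketch: the helper name check_convert_image and the sample pixel values are assumptions made for illustration and are not part of the original test case.

```c
#include <string.h>

#define byte unsigned char
#define MIN(a, b) ((a) > (b) ? (b) : (a))

/* Same loop as the test case in this report. */
void convert_image(byte *in, byte *out, int size) {
    int i;
    byte *read = in, *write = out;
    for (i = 0; i < size; i++) {
        byte r = *read++;
        byte g = *read++;
        byte b = *read++;
        byte c, m, y, k;
        c = 255 - r;
        m = 255 - g;
        y = 255 - b;
        if (c < m)
            k = MIN(c, y);
        else
            k = MIN(m, y);
        *write++ = c - k;
        *write++ = m - k;
        *write++ = y - k;
        *write++ = k;
    }
}

/* Hypothetical helper (not in the report): converts two pixels and
   returns 1 if the output matches the values computed by hand. */
int check_convert_image(void) {
    /* Pixel 1: r=255 g=0  b=0   -> c=0   m=255 y=255, k=0
       Pixel 2: r=10  g=20 b=30  -> c=245 m=235 y=225, k=225 */
    byte in[6] = { 255, 0, 0, 10, 20, 30 };
    byte expected[8] = { 0, 255, 255, 0, 20, 10, 0, 225 };
    byte out[8];

    convert_image(in, out, 2);
    return memcmp(out, expected, sizeof expected) == 0;
}
```

Calling check_convert_image() from a small main and building it both ways gives a quick sanity check that the two code sequences differ only in speed, not in results.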



The essential part of the assembly, which corresponds to the write part of the loop, is:



Without LRA:

.L4:

    movl    %esi, %ecx

    addl    $4, %eax

    subl    %ecx, %ebx

    movzbl    3(%esp), %ecx

    movb    %bl, -4(%eax)

    movl    %esi, %ebx

    subl    %ebx, %edx

    movb    %dl, -2(%eax)

    subl    %ebx, %ecx

    movb    %cl, -3(%eax)

    cmpl    %ebp, 4(%esp)

    movb    %bl, -1(%eax)

    je    .L1



With LRA:



.L4:

    movl    %esi, %eax

    subl    %eax, %ebx

    movl    28(%esp), %eax

    movb    %bl, (%eax)

    movl    %esi, %eax

    subl    %eax, %ecx

    movl    28(%esp), %eax

    movb    %cl, 1(%eax)

    movl    %esi, %eax

    subl    %eax, %edx

    movl    28(%esp), %eax

    movb    %dl, 2(%eax)

    addl    $4, %eax

    movl    %eax, 28(%esp)

    movl    28(%esp), %ecx

    movl    %esi, %eax

    cmpl    %ebp, (%esp)

    movb    %al, -1(%ecx)

    je    .L1



I also wonder why an additional move is needed to perform each subtraction:



    movl  %esi, %eax

    subl  %eax, %ebx



whereas a single instruction would suffice:

    subl  %esi, %ebx



I assume that this part is not related to LRA, since the same extra move also appears in the code generated without LRA.
