http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
Bug #: 55342
Summary: [LRA,x86] Non-optimal code for simple loop with LRA
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ysrum...@gmail.com
Target: x86

For a simple test case we see a 15% regression with LRA on x86 in 32-bit mode.

The test case is:

#define byte unsigned char
#define MIN(a, b) ((a) > (b)?(b):(a))

void convert_image(byte *in, byte *out, int size) {
    int i;
    byte * read = in,
         * write = out;
    for(i = 0; i < size; i++) {
        byte r = *read++;
        byte g = *read++;
        byte b = *read++;
        byte c, m, y, k, tmp;
        c = 255 - r;
        m = 255 - g;
        y = 255 - b;
        if (c < m)
            k = MIN (c, y);
        else
            k = MIN (m, y);
        *write++ = c - k;
        *write++ = m - k;
        *write++ = y - k;
        *write++ = k;
    }
}

The essential part of the assembly, which corresponds to the write part of the loop, is:

without LRA

.L4:
    movl    %esi, %ecx
    addl    $4, %eax
    subl    %ecx, %ebx
    movzbl  3(%esp), %ecx
    movb    %bl, -4(%eax)
    movl    %esi, %ebx
    subl    %ebx, %edx
    movb    %dl, -2(%eax)
    subl    %ebx, %ecx
    movb    %cl, -3(%eax)
    cmpl    %ebp, 4(%esp)
    movb    %bl, -1(%eax)
    je      .L1

with LRA

.L4:
    movl    %esi, %eax
    subl    %eax, %ebx
    movl    28(%esp), %eax
    movb    %bl, (%eax)
    movl    %esi, %eax
    subl    %eax, %ecx
    movl    28(%esp), %eax
    movb    %cl, 1(%eax)
    movl    %esi, %eax
    subl    %eax, %edx
    movl    28(%esp), %eax
    movb    %dl, 2(%eax)
    addl    $4, %eax
    movl    %eax, 28(%esp)
    movl    28(%esp), %ecx
    movl    %esi, %eax
    cmpl    %ebp, (%esp)
    movb    %al, -1(%ecx)
    je      .L1

With LRA, the write pointer is kept in a stack slot, 28(%esp), and is reloaded before each store.

I also wonder why an additional move is required to perform each subtraction:

    movl    %esi, %eax
    subl    %eax, %ebx

whereas a single instruction would suffice:

    subl    %esi, %ebx

I assume that this part is not related to LRA.
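
When comparing the two code-generation variants, it may help to confirm that both builds still compute the same result. The following is a minimal sketch of such a check, not part of the original report: the helper names pixel_matches and self_check and the sample pixel values are hypothetical, chosen only for illustration. The convert_image body is the test case above, minus the unused tmp variable.

```c
#include <assert.h>
#include <string.h>

#define byte unsigned char
#define MIN(a, b) ((a) > (b) ? (b) : (a))

/* Test case from the report (unused variable 'tmp' omitted). */
void convert_image(byte *in, byte *out, int size) {
    int i;
    byte *read = in, *write = out;
    for (i = 0; i < size; i++) {
        byte r = *read++;
        byte g = *read++;
        byte b = *read++;
        byte c, m, y, k;
        c = 255 - r;
        m = 255 - g;
        y = 255 - b;
        if (c < m)
            k = MIN(c, y);
        else
            k = MIN(m, y);
        *write++ = c - k;
        *write++ = m - k;
        *write++ = y - k;
        *write++ = k;
    }
}

/* Hypothetical helper: convert one RGB pixel and compare the four
   output bytes with the expected values. Returns 1 on a match. */
int pixel_matches(byte r, byte g, byte b, const byte expected[4]) {
    byte in[3];
    byte out[4];
    in[0] = r; in[1] = g; in[2] = b;
    convert_image(in, out, 1);
    return memcmp(out, expected, 4) == 0;
}

/* Hypothetical self-check over a few hand-computed pixels.
   Returns the number of pixels checked; aborts on a mismatch. */
int self_check(void) {
    static const byte red[4]   = {0, 255, 255, 0};
    static const byte black[4] = {0, 0, 0, 255};
    static const byte white[4] = {0, 0, 0, 0};
    static const byte mixed[4] = {20, 10, 0, 225};
    assert(pixel_matches(255, 0, 0, red));
    assert(pixel_matches(0, 0, 0, black));
    assert(pixel_matches(255, 255, 255, white));
    assert(pixel_matches(10, 20, 30, mixed));
    return 4;
}
```

Calling self_check() from a small main in each build (with and without LRA, using whatever options produced the listings above, which the report does not state) would show whether the two variants differ only in speed and not in output.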