In the following C source file test.c,

  int compare(char const * p1, int n1, char const * p2, int n2) {
    char const * q1 = p1 + n1;
    char const * q2 = p2 + n2;
    while (p1 < q1 && p2 < q2) {
      int n = *--q1 - *--q2;
      if (n) {
        return n;
      }
    }
    return n1 - n2;
  }

  int main(void) {
    char str[1000];
    int i;
    for (i = 0; i < 1000000; ++i) {
      compare(str, 1000, str, 1000);
    }
  }

compiled with gcc -O2 test.c, the loop body within compare takes 14
instructions on GCC 3.4.3 Linux x86, compared to only 11 instructions on
GCC 3.2.2 Linux x86 (see disassemblies below), and the GCC 3.4.3 output
takes substantially longer to run than the GCC 3.2.2 output:

  3.4.3> time ./a.out
  4.690u 0.001s 0:04.71 99.5%

  3.2.2> time ./a.out
  3.533u 0.002s 0:03.55 99.4%

This seems to bite us on OpenOffice.org, which contains an oft-called
function similar to compare above, and new versions of OOo (built with
GCC 3.4) run slower than old versions (built with GCC 3.2).

The question that remains for us is whether this performance loss is
specific to the given function, or could be a general problem.

The disassemblies:

08048360 <compare>:                                         ! gcc 3.4.3
 8048360: 55                    push   %ebp
 8048361: 89 e5                 mov    %esp,%ebp
 8048363: 57                    push   %edi
 8048364: 56                    push   %esi
 8048365: 53                    push   %ebx
 8048366: 8b 45 0c              mov    0xc(%ebp),%eax
 8048369: 8b 7d 08              mov    0x8(%ebp),%edi
 804836c: 8b 75 10              mov    0x10(%ebp),%esi
 804836f: 8d 1c 07              lea    (%edi,%eax,1),%ebx
 8048372: 8b 45 14              mov    0x14(%ebp),%eax
 8048375: 8d 0c 06              lea    (%esi,%eax,1),%ecx
 8048378: 90                    nop
 8048379: 8d b4 26 00 00 00 00  lea    0x0(%esi),%esi
 8048380: 39 df                 cmp    %ebx,%edi            ! <-+
 8048382: 0f 92 c0              setb   %al                  !   |
 8048385: 31 d2                 xor    %edx,%edx            !   |
 8048387: 39 ce                 cmp    %ecx,%esi            !   |
 8048389: 0f 92 c2              setb   %dl                  !   |
 804838c: 85 d0                 test   %edx,%eax            !   |
 804838e: 74 13                 je     80483a3              !   |
 8048390: 4b                    dec    %ebx                 !   |
 8048391: 49                    dec    %ecx                 !   |
 8048392: 0f be 01              movsbl (%ecx),%eax          !   |
 8048395: 0f be 13              movsbl (%ebx),%edx          !   |
 8048398: 29 c2                 sub    %eax,%edx            !   |
 804839a: 89 d0                 mov    %edx,%eax            !   |
 804839c: 74 e2                 je     8048380              ! --+
 804839e: 5b                    pop    %ebx
 804839f: 5e                    pop    %esi
 80483a0: 5f                    pop    %edi
 80483a1: 5d                    pop    %ebp
 80483a2: c3                    ret
 80483a3: 5b                    pop    %ebx
 80483a4: 8b 45 0c              mov    0xc(%ebp),%eax
 80483a7: 8b 55 14              mov    0x14(%ebp),%edx
 80483aa: 5e                    pop    %esi
 80483ab: 29 d0                 sub    %edx,%eax
 80483ad: 5f                    pop    %edi
 80483ae: 5d                    pop    %ebp
 80483af: c3                    ret

08048370 <compare>:                                         ! gcc 3.2.2
 8048370: 55                    push   %ebp
 8048371: 89 e5                 mov    %esp,%ebp
 8048373: 57                    push   %edi
 8048374: 8b 45 0c              mov    0xc(%ebp),%eax
 8048377: 56                    push   %esi
 8048378: 8b 7d 08              mov    0x8(%ebp),%edi
 804837b: 53                    push   %ebx
 804837c: 8b 75 10              mov    0x10(%ebp),%esi
 804837f: 8d 1c 38              lea    (%eax,%edi,1),%ebx
 8048382: 8b 45 14              mov    0x14(%ebp),%eax
 8048385: 39 df                 cmp    %ebx,%edi
 8048387: 8d 0c 30              lea    (%eax,%esi,1),%ecx
 804838a: 73 1a                 jae    80483a6
 804838c: 39 ce                 cmp    %ecx,%esi
 804838e: 73 16                 jae    80483a6
 8048390: 4b                    dec    %ebx                 ! <-+
 8048391: 49                    dec    %ecx                 !   |
 8048392: 0f be 01              movsbl (%ecx),%eax          !   |
 8048395: 0f be 13              movsbl (%ebx),%edx          !   |
 8048398: 29 c2                 sub    %eax,%edx            !   |
 804839a: 89 d0                 mov    %edx,%eax            !   |
 804839c: 75 10                 jne    80483ae              !   |
 804839e: 39 df                 cmp    %ebx,%edi            !   |
 80483a0: 73 04                 jae    80483a6              !   |
 80483a2: 39 ce                 cmp    %ecx,%esi            !   |
 80483a4: 72 ea                 jb     8048390              ! --+
 80483a6: 8b 45 0c              mov    0xc(%ebp),%eax
 80483a9: 8b 55 14              mov    0x14(%ebp),%edx
 80483ac: 29 d0                 sub    %edx,%eax
 80483ae: 5b                    pop    %ebx
 80483af: 5e                    pop    %esi
 80483b0: 5f                    pop    %edi
 80483b1: 5d                    pop    %ebp
 80483b2: c3                    ret

-- 
           Summary: Performance regression in simple loop code
           Product: gcc
           Version: 3.4.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: regression
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: stephan dot bergmann at sun dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19672