http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57534
    Bug ID: 57534
   Summary: Performance regression versus 4.7.3, 4.8.1 is ~15% slower
   Product: gcc
   Version: 4.8.1
    Status: UNCONFIRMED
  Severity: major
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ncahill_alt at yahoo dot com

Created attachment 30261
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30261&action=edit
Reduced source code - timing functions

With x86 GCC 4.8.1, the code produced for a particular benchmark I ran is about 15% slower than with 4.7.3. Whatever is wrong must be happening here:

 80486e5:  d9 ee                  fldz
 80486e7:  d9 c0                  fld    %st(0)
 80486e9:  8d b4 26 00 00 00 00   lea    0x0(%esi,%eiz,1),%esi
 80486f0:  8d 04 f5 00 00 00 00   lea    0x0(,%esi,8),%eax
 80486f7:  dd 04 f3               fldl   (%ebx,%esi,8)
 80486fa:  dc 44 03 08            faddl  0x8(%ebx,%eax,1)
 80486fe:  dc 44 03 10            faddl  0x10(%ebx,%eax,1)
 8048702:  dc 44 03 18            faddl  0x18(%ebx,%eax,1)
 8048706:  de c2                  faddp  %st,%st(2)
 8048708:  dd 44 03 20            fldl   0x20(%ebx,%eax,1)
 804870c:  dc 44 03 28            faddl  0x28(%ebx,%eax,1)
 8048710:  dc 44 03 30            faddl  0x30(%ebx,%eax,1)
 8048714:  dc 44 03 38            faddl  0x38(%ebx,%eax,1)
 8048718:  8d 46 08               lea    0x8(%esi),%eax
 804871b:  39 c7                  cmp    %eax,%edi
 804871d:  de c1                  faddp  %st,%st(1)
 804871f:  7f 0e                  jg     804872f
 8048721:  a1 34 91 04 08         mov    0x8049134,%eax
 8048726:  85 c0                  test   %eax,%eax
 8048728:  74 0e                  je     8048738
 804872a:  83 c5 01               add    $0x1,%ebp
 804872d:  31 c0                  xor    %eax,%eax
 804872f:  89 c6                  mov    %eax,%esi
 8048731:  eb bd                  jmp    80486f0
 8048733:  90                     nop
 8048734:  8d 74 26 00            lea    0x0(%esi,%eiz,1),%esi
 8048738:  dd 5c 24 10            fstpl  0x10(%esp)
 804873c:  83 c6 10               add    $0x10,%esi
 804873f:  dd 5c 24 08            fstpl  0x8(%esp)

This is the command line:

  gcc -O2 reduceme.c timer.o -o cachebench

This is from a benchmark (llcbench, GPL software). It uses timers, which may be a problem: if I preprocess them, they may not work. I'll attach the main code (reduced) for now, and I'll work on getting the timing code included very soon. I'll also test with 4.8.0 to see whether that version is also affected.
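For anyone reading the disassembly without the attachment at hand, the hot loop appears to behave roughly like the C below. This is only a reading of the listing above, not the attached source: the names sum_pass, keep_going, x, limit and passes are made up for illustration, keep_going stands in for the global loaded from 0x8049134 (presumably cleared by the timer code), and the exact point at which the real code bumps its pass counter may differ.

```c
/* Hypothetical stand-in for the global read at 0x8049134. */
static volatile int keep_going;

/* Sums x[0..limit) eight doubles per step into two accumulators,
 * repeating whole passes for as long as keep_going is nonzero.
 * Assumes limit is a positive multiple of 8. */
static double sum_pass(const double *x, int limit, long *passes)
{
    double acc0 = 0.0, acc1 = 0.0;   /* the fldz / fld %st(0) pair */

    do {
        for (int i = 0; i < limit; i += 8) {
            /* first fldl + three faddl, then faddp into one accumulator */
            acc0 += x[i]     + x[i + 1] + x[i + 2] + x[i + 3];
            /* second fldl + three faddl, then faddp into the other */
            acc1 += x[i + 4] + x[i + 5] + x[i + 6] + x[i + 7];
        }
        ++*passes;                   /* roughly the add $0x1,%ebp */
    } while (keep_going);            /* the mov/test/je on the global */

    return acc0 + acc1;
}
```

With keep_going left at zero the function makes a single pass and returns, which is convenient for checking the arithmetic separately from the timing code.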
Attached is the reduced code minus the timing functions. Uncommenting the commented line in the source code removes the bug. Thanks. Neil.