http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49865

           Summary: Unnecessary reload causes small size regression from 4.6.1
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: sgunder...@bigfoot.com
            Target: i?86-*-*

Comparing 4.6.1 with gcc-snapshot from Debian:

  gcc version 4.7.0 20110709 (experimental) [trunk revision 176106] (Debian 20110709-1)

Given this code:

  fugl:~> cat test.cpp
  #include <string.h>

  class MyClass {
          void func();
          float f[1024];
          int i;
  };

  void MyClass::func()
  {
          memset(f, 0, sizeof(f));
          i = 0;
  }

and compiling with

  fugl:~> /usr/lib/gcc-snapshot/bin/g++ -Os -c test.cpp

g++ produces, according to objdump:

  00000000 <_ZN7MyClass4funcEv>:
     0:   55                      push   %ebp
     1:   31 c0                   xor    %eax,%eax
     3:   89 e5                   mov    %esp,%ebp
     5:   b9 00 04 00 00          mov    $0x400,%ecx
     a:   57                      push   %edi
     b:   8b 7d 08                mov    0x8(%ebp),%edi
     e:   f3 ab                   rep stos %eax,%es:(%edi)
    10:   8b 45 08                mov    0x8(%ebp),%eax
    13:   c7 80 00 10 00 00 00    movl   $0x0,0x1000(%eax)
    1a:   00 00 00
    1d:   5f                      pop    %edi
    1e:   5d                      pop    %ebp
    1f:   c3                      ret

while 4.6.1 has a more efficient sequence:

  00000000 <_ZN7MyClass4funcEv>:
     0:   55                      push   %ebp
     1:   b9 00 04 00 00          mov    $0x400,%ecx
     6:   89 e5                   mov    %esp,%ebp
     8:   31 c0                   xor    %eax,%eax
     a:   8b 55 08                mov    0x8(%ebp),%edx
     d:   57                      push   %edi
     e:   89 d7                   mov    %edx,%edi
    10:   f3 ab                   rep stos %eax,%es:(%edi)
    12:   c7 82 00 10 00 00 00    movl   $0x0,0x1000(%edx)
    19:   00 00 00
    1c:   5f                      pop    %edi
    1d:   5d                      pop    %ebp
    1e:   c3                      ret

It seems 4.6 is able to take a copy of the "this" pointer in a register before
the "rep stos" operation, which is one byte smaller than reloading it from the
stack when it needs to clear "i".

Of course, the _most_ efficient code sequence here would be to do the i = 0
before the memset, but I'm not sure if this is legal. However, eax should still
contain zero after the "rep stos", so the store to "i" could be done from eax
instead of from a constant.
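
On the question of whether moving the i = 0 ahead of the memset would be legal: as far as I can tell the two stores touch disjoint bytes, so either order leaves the object in the same state. The following is only a minimal sketch to check that at the source level; it says nothing about what the GCC passes actually do. Obj, clear_then_zero, zero_then_clear and same_final_state are hypothetical names, and Obj is a public stand-in for MyClass from the report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical public stand-in for MyClass from the report.
struct Obj {
    float f[1024];
    int i;
};

// i starts at or after the end of f, so a memset over f cannot touch it.
static_assert(offsetof(Obj, i) >= sizeof(Obj::f), "i must not overlap f");

// Order used in the report: clear the array, then zero i.
void clear_then_zero(Obj &o) {
    std::memset(o.f, 0, sizeof(o.f));
    o.i = 0;
}

// Reordered variant: zero i first, then clear the array.
void zero_then_clear(Obj &o) {
    o.i = 0;
    std::memset(o.f, 0, sizeof(o.f));
}

// Fill two objects with the same non-zero bytes, apply one order to each,
// and compare the resulting object representations byte for byte.
bool same_final_state() {
    static Obj a, b;
    std::memset(&a, 0xAB, sizeof(a));
    std::memset(&b, 0xAB, sizeof(b));
    clear_then_zero(a);
    zero_then_clear(b);
    return std::memcmp(&a, &b, sizeof(Obj)) == 0;
}
```

Calling same_final_state() from a small main() and checking that it returns true is enough to exercise both orders. The static_assert is the part that matters for the argument: it states the non-overlap assumption that a reordering would rely on.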