http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50088
Bug #: 50088
Summary: movzbl is generated instead of movl
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: enkovich....@gmail.com

Created attachment 25016
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25016
Reproducer

When a spilled register is used in a subreg expression, the register is filled with a short (byte-sized) load. Example:

        movl    %edx, 0x34(%esp)
        jz      0x1498 <Block 54>
    Block 34:
        movzxb  0x34(%esp), %ecx
        shl     %cl, %eax

This is correct, but it may cause performance problems. I doubt there are situations where a zero-extended load is better than the natural one. On Atom processors (and probably some others), such code causes stalls because store forwarding does not work for a store/load pair with different access sizes. For example, EEMBC 2.0/huffde gains about 6% performance on Atom if we replace such movzbl instructions with movl.

The attached reproducer demonstrates fills performed via movzbl.

Compiler and options used:

    Target: x86_64-unknown-linux-gnu
    Configured with: ../gcc1/configure --prefix=/export/users/gcc-perf/install --enable-languages=c,c++,fortran
    Thread model: posix
    gcc version 4.7.0 20110615 (experimental) (GCC)
    COLLECT_GCC_OPTIONS='-O2' '-m32' '-S' '-v' '-mtune=generic' '-march=x86-64'
     /export/users/gcc-perf/install/libexec/gcc/x86_64-unknown-linux-gnu/4.7.0/cc1 -quiet -v -imultilib 32 test_movzbl.c -quiet -dumpbase test_movzbl.c -m32 -mtune=generic -march=x86-64 -auxbase test_movzbl -O2 -version -o test_movzbl.s
    GNU C (GCC) version 4.7.0 20110615 (experimental) (x86_64-unknown-linux-gnu)
            compiled by GNU C version 4.4.3, GMP version 4.3.1, MPFR version 2.4.2, MPC version 0.8.1
    GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096