[Bug rtl-optimization/50088] New: movzbl is generated instead of movl

enkovich.gnu at gmail dot com Mon, 15 Aug 2011 06:02:58 -0700

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50088


             Bug #: 50088
           Summary: movzbl is generated instead of movl
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: enkovich....@gmail.com


Created attachment 25016
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25016
Reproducer

When spilled register is going to be used in subreg expression then short load
is generated to fill register.

Example:
    movl  %edx, 0x34(%esp)
    jz 0x1498 <Block 54>
  Block 34:
    movzxb  0x34(%esp), %ecx
    shl %cl, %eax

It is correct but may cause performance problems. I doubt there are situations
when zero extended load is better than natural one.

On Atom processors (and probably some others) such situations cause stalls
because store forwarding does not work for store/load pair using different
access sizes.

For example EEMBC 2.0/huffde has ~6% performance improvement on Atom if we
replace such movzbl with movl.

Attached reproducer demonstrates fills performed via movzbl.
Used compiler and options:

Target: x86_64-unknown-linux-gnu
Configured with: ../gcc1/configure --prefix=/export/users/gcc-perf/install
--enable-languages=c,c++,fortran
Thread model: posix
gcc version 4.7.0 20110615 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-O2' '-m32' '-S' '-v' '-mtune=generic' '-march=x86-64'
 /export/users/gcc-perf/install/libexec/gcc/x86_64-unknown-linux-gnu/4.7.0/cc1
-quiet -v -imultilib 32 test_movzbl.c -quiet -dumpbase test_movzbl.c -m32
-mtune=generic -march=x86-64 -auxbase test_movzbl -O2 -version -o test_movzbl.s
GNU C (GCC) version 4.7.0 20110615 (experimental) (x86_64-unknown-linux-gnu)
        compiled by GNU C version 4.4.3, GMP version 4.3.1, MPFR version 2.4.2,
MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096

[Bug rtl-optimization/50088] New: movzbl is generated instead of movl

Reply via email to