(Sources are from CVS as of about 6AM US/Eastern time today.)

I'm testing out how well gcc optimizes some code for reversing bit
strings.  It appears that on x86 at least, double-word shifts followed
by masks that zero out all the bits that crossed the word boundary are
not optimized as well as they could be.

In the included file, compiled with "-O9 -fomit-frame-pointer",
functions rt and rt2 both compile to assembly code that uses a
double-word shift to bring two bits from the upper half of the
argument into the top of the lower half of the double-word value, and
then masks that word with 0x33333333 (858993459), which zeros out
those bits:

    rt:
            movl        8(%esp), %edx
            movl        4(%esp), %eax
            shrdl       $2, %edx, %eax
            shrl        $2, %edx
            andl        $858993459, %eax
            andl        $858993459, %edx
            ret
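
To make the redundancy concrete, here is a rough C sketch of rt with
the two halves handled independently.  The names rt_ref and rt_split
are made up for illustration and are not in the test file, and I use
the <stdint.h> types here rather than the typedefs from the test case.
Since 0x33333333 has bits 30 and 31 clear, the two bits that shrdl
carries over from the upper half never survive the mask:

```c
#include <stdint.h>

/* Reference form, same expression as rt() in the test case. */
uint64_t rt_ref(uint64_t n)
{
    return (n >> 2) & 0x3333333333333333ULL;
}

/* Hypothetical hand-split form: each 32-bit half is shifted and
   masked on its own, so no double-word shift is needed. */
uint64_t rt_split(uint64_t n)
{
    uint32_t lo = (uint32_t)n;
    uint32_t hi = (uint32_t)(n >> 32);

    lo = (lo >> 2) & 0x33333333u;   /* bits 30-31 would be masked off anyway */
    hi = (hi >> 2) & 0x33333333u;

    return ((uint64_t)hi << 32) | lo;
}
```

This is only meant to show that the two forms compute the same value,
not to suggest how the transformation should be done in the compiler.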

Okay, in this case, the only optimization would be to make the shift
not reference both %edx and %eax, and drop the reference to the upper
half from the RTL during optimization.  To highlight the issue a
little more, rt4 is like rt but only returns the lower half.  Still,
the upper half is read in from memory (and shifted!) needlessly:

    rt4:
            movl        8(%esp), %edx
            movl        4(%esp), %eax
            shrdl       $2, %edx, %eax
            andl        $858993459, %eax
            shrl        $2, %edx
            ret
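
For comparison, a sketch of what rt4 reduces to if only the lower word
is touched.  Again, rt4_ref and rt4_low_only are names invented for
this illustration, and the <stdint.h> types stand in for the typedefs
in the test case:

```c
#include <stdint.h>

/* Reference form, same expression as rt4() in the test case. */
uint32_t rt4_ref(uint64_t n)
{
    return (uint32_t)((n >> 2) & 0x33333333);
}

/* Hypothetical form that never looks at the upper 32 bits of n. */
uint32_t rt4_low_only(uint64_t n)
{
    uint32_t lo = (uint32_t)n;      /* only the low word is loaded */
    return (lo >> 2) & 0x33333333u;
}
```

With the second form, I would expect one load, one shrl and one andl
to be enough, though I have not checked what gcc actually emits for it.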

Function left shows the same problem, shifting in the opposite
direction:

    left:
            movl        4(%esp), %eax
            movl        8(%esp), %edx
            shldl       $2, %eax, %edx
            sall        $2, %eax
            andl        $-858993460, %edx
            andl        $-858993460, %eax
            ret

The "andl" of %edx with 0xcccccccc (-858993460 as a signed 32-bit
value) zeros out the two bits brought in from %eax.
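
The same split can be sketched in C for the left shift.  The names
left_ref and left_split are hypothetical, and the <stdint.h> types are
used instead of the typedefs from the test case.  Because 0xcccccccc
has bits 0 and 1 clear, the two bits that shldl moves from the lower
half into the bottom of the upper half are always discarded:

```c
#include <stdint.h>

/* Reference form, equivalent to left() in the test case. */
uint64_t left_ref(uint64_t n)
{
    return (n << 2) & 0xCCCCCCCCCCCCCCCCULL;
}

/* Hypothetical hand-split form: two independent 32-bit shifts. */
uint64_t left_split(uint64_t n)
{
    uint32_t lo = (uint32_t)n;
    uint32_t hi = (uint32_t)(n >> 32);

    lo = (lo << 2) & 0xCCCCCCCCu;
    hi = (hi << 2) & 0xCCCCCCCCu;   /* bits 0-1 would be masked off anyway */

    return ((uint64_t)hi << 32) | lo;
}
```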

I haven't got the hang of reading ppc assembly yet, but I think the
Mac OS X compiler (10.4.2 = "gcc version 4.0.0 (Apple Computer,
Inc. build 5026)") is missing similar optimizations.  I haven't tried
the cvs code on ppc.

Environment:
System: Linux kal-el 2.4.17 #4 SMP Sun Apr 6 16:25:37 EDT 2003 i686 GNU/Linux
Architecture: i686

        
host: i686-pc-linux-gnu
build: i686-pc-linux-gnu
target: i686-pc-linux-gnu
configured with: ../src/configure --enable-maintainer-mode 
--prefix=/u3/raeburn/gcc/linux/Install --enable-languages=c,c++,java,objc 
--no-create --no-recursion : (reconfigured) ../src/configure 
--prefix=/u3/raeburn/gcc/linux/Install

How-To-Repeat:

typedef unsigned long long uint64_t;
typedef unsigned long uint32_t;

uint64_t rt (uint64_t n) { return (n >> 2) & 0x3333333333333333ULL; }
uint64_t rt2 (uint64_t n) { return (n & (0x3333333333333333ULL << 2)) >> 2; }
uint32_t rt4 (uint64_t n) { return (n >> 2) & 0x33333333; }
uint64_t left(uint64_t n) {
  return (n << 2) & (0xFFFFFFFFFFFFFFFFULL & ~0x3333333333333333ULL);
}

-- 
           Summary: missed 64-bit shift+mask optimizations on 32-bit arch
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: raeburn at raeburn dot org
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23810
