(Sources are from CVS as of about 6AM US/Eastern time today.)
I'm testing out how well gcc optimizes some code for reversing bit strings. It appears that on x86 at least, double-word shifts followed by masks that zero out all the bits that crossed the word boundary are not optimized as well as they could be.

In the included file, compiled with "-O9 -fomit-frame-pointer", functions rt and rt2 both result in assembly code including a double-word shift, bringing two bits from the upper half of the argument into the top of the lower half of the double-word value, then a mask of that word with 0x33333333, which zeros out those bits:

rt:
	movl	8(%esp), %edx
	movl	4(%esp), %eax
	shrdl	$2, %edx, %eax
	shrl	$2, %edx
	andl	$858993459, %eax
	andl	$858993459, %edx
	ret

Okay, in this case, the only optimization would be to make the shift not reference both %edx and %eax, and drop the reference to the upper half from the RTL during optimization. To highlight the issue a little more, rt4 is like rt but only returns the lower half. Still, the upper half is read in from memory (and shifted!) needlessly:

rt4:
	movl	8(%esp), %edx
	movl	4(%esp), %eax
	shrdl	$2, %edx, %eax
	andl	$858993459, %eax
	shrl	$2, %edx
	ret

Function left shows the same problem, shifting in the opposite direction:

left:
	movl	4(%esp), %eax
	movl	8(%esp), %edx
	shldl	$2, %eax, %edx
	sall	$2, %eax
	andl	$-858993460, %edx
	andl	$-858993460, %eax
	ret

The "andl" of %edx with 0xcccccccc will clobber the bits brought in from %eax. I haven't got the hang of reading ppc assembly yet, but I think the Mac OS X compiler (10.4.2 = "gcc version 4.0.0 (Apple Computer, Inc. build 5026)") is missing similar optimizations. I haven't tried the cvs code on ppc.
Environment:
System: Linux kal-el 2.4.17 #4 SMP Sun Apr 6 16:25:37 EDT 2003 i686 GNU/Linux
Architecture: i686

host: i686-pc-linux-gnu
build: i686-pc-linux-gnu
target: i686-pc-linux-gnu
configured with: ../src/configure --enable-maintainer-mode --prefix=/u3/raeburn/gcc/linux/Install --enable-languages=c,c++,java,objc --no-create --no-recursion : (reconfigured) ../src/configure --prefix=/u3/raeburn/gcc/linux/Install

How-To-Repeat:

typedef unsigned long long uint64_t;
typedef unsigned long uint32_t;

uint64_t rt (uint64_t n)
{
  return (n >> 2) & 0x3333333333333333ULL;
}

uint64_t rt2 (uint64_t n)
{
  return (n & (0x3333333333333333ULL << 2)) >> 2;
}

uint32_t rt4 (uint64_t n)
{
  return (n >> 2) & 0x33333333;
}

uint64_t left (uint64_t n)
{
  return (n << 2) & (0xFFFFFFFFFFFFFFFFULL & ~0x3333333333333333ULL);
}

-- 
Summary: missed 64-bit shift+mask optimizations on 32-bit arch
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: raeburn at raeburn dot org
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23810