------- Comment #7 from eric-bugs at omnifarious dot org  2009-05-20 20:22 -------
I've been playing around a bit more, and I've noticed that gcc in general does
not do a spectacular job of optimizing bitwise operations of any kind.

Some kind of general framework for tracking the movement of individual bits,
together with facts like "a 16-bit value only has 16 bits, so using & to
enforce that is a null operation", might actually speed up a lot of code.

I distinctly remember a time, long past, when a co-worker and I fiddled some
complex bit operations this way and that until we got assembly we knew was
close to optimal for a tight inner loop.  The resulting expression was
significantly less clear than the most obvious way of stating the same thing,
and I knew that if DEC changed their compiler in certain ways we'd have to do
it all over again.

As an example, when x is of type uint16_t there is no reason that

  (x << 8) | (x >> 8)

should result in better code than

  ((x & 0xffu) << 8) | ((x & 0xff00u) >> 8)

but it does.  And recognizing that either one can be done in a single
instruction on x86 would be even better.

So, while I think you are likely correct that the byteswap builtins do not
need extensive optimization on their own, I do think that bit operations in
general could be handled a lot better, and that would help a whole lot of
code.  Once such a framework was in place, optimizing the byteswap builtins
would be trivial.
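
For completeness, the builtin itself already maps to a single instruction;
this uses __builtin_bswap32 (the wrapper name is mine):

  #include <stdint.h>

  uint32_t swap32(uint32_t x)
  {
      return __builtin_bswap32(x);   /* a single bswap on x86 */
  }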


-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40210
