I have encountered a strange difference in behavior with gcc optimizations. I am testing this with gcc-4.5.0 on an x86_64 target. The following code defines two template functions, a "find_first" and a "find_last" for a bitset. Each returns the size of the bitset if no bits are set; otherwise it returns the 0-based index of the first/last set bit. There is a generic version that loops, and a specialized version which uses builtins to avoid loops:
#include <bitset>
#include <iostream>
#include <string>

namespace util {
    template <std::size_t N>
    int find_last(const std::bitset<N> &bs) {
        if(bs.none()) {
            return N;
        }

        int i = bs.size();
        while(i >= 0) {
            if(bs[i]) {
                break;
            }
            --i;
        }
        return i;
    }

    template <std::size_t N>
    int find_first(const std::bitset<N> &bs) {
        if(bs.none()) {
            return N;
        }

        int i;
        for(i = 0; i < bs.size(); ++i) {
            if(bs[i]) {
                break;
            }
        }
        return i;
    }

    // optimized versions for 32-bit bitsets
    template <>
    int find_first(const std::bitset<32> &bs) {
        const int x = __builtin_ffsl(bs.to_ulong());
        return x ? x - 1 : 32;
    }

    template <>
    int find_last(const std::bitset<32> &bs) {
        const int x = __builtin_clz(bs.to_ulong());
        return (x != 31) ? 32 - x - 1 : 32;
    }
}

int main() {
    std::bitset<32> bs(0);
    std::cout << util::find_last(bs) << std::endl;
    std::cout << util::find_first(bs) << std::endl;
    std::cout << bs.to_string<char, std::char_traits<char>, std::allocator<char> >() << std::endl;
}

I receive the following output from this command:

$ g++ -W -Wall -ansi -pedantic -O0 test.cpp -o test && ./test
32
32
00000000000000000000000000000000

but if I enable optimizations, I get this instead:

$ g++ -W -Wall -ansi -pedantic -O1 test.cpp -o test && ./test
-1217882104
32
00000000000000000000000000000000

Here's where it gets interesting: if I put some code in the specialized version of find_last, it goes back to behaving as expected:

template <>
int find_last(const std::bitset<32> &bs) {
    const int x = __builtin_clz(bs.to_ulong());
    std::cout << "\n";
    return (x != 31) ? 32 - x - 1 : 32;
}

$ g++ -W -Wall -ansi -pedantic -O1 test.cpp -o test && ./test
32
32
00000000000000000000000000000000

I am having a hard time figuring out which optimization flag is causing this.
Inspecting with -Q shows me that the following flags are enabled by -O1 (-fcprop-registers -fdefer-pop -fforward-propagate -fguess-branch-probability -fif-conversion -fif-conversion2 -fipa-pure-const -fipa-reference -fmerge-constants -fomit-frame-pointer -fsplit-wide-types -ftree-ccp -ftree-ch -ftree-copy-prop -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-fre -ftree-sink -ftree-sra -ftree-ter), yet manually enabling all of these does not cause the behavior.

-- Summary: __builtin_clz is behaving differently with different optimization levels Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: eteran at alum dot rit dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44330