I have enountered a strange difference in behavior with gcc optimizations. I am
testing this with gcc-4.5.0 on a x86_64 target. Given the following code which
defines to template functions. The goal is to have a "find_first" and
"find_last" for a bitset. It will return the size of the bitset if no bits are
set, otherwise it should return the 0 based index of the first/last set bit.
There is a generic version that loops, and a specialized version which uses
builtins to avoid loops:

#include <bitset>
#include <iostream>
#include <string>

namespace util {
        template <std::size_t N>
        int find_last(const std::bitset<N> &bs) {
                if(bs.none()) {
                        return N;
                }

                int i = bs.size();
                while(i >= 0) {
                        if(bs[i]) {
                                break;
                        }
                        --i;
                }
                return i;
        }       

        template <std::size_t N>
        int find_first(const std::bitset<N> &bs) {
                if(bs.none()) {
                        return N;
                }

                int i;
                for(i = 0; i < bs.size(); ++i) {
                        if(bs[i]) {
                                break;
                        }       
                }

                return i;
        }

        // optimized versions for 32-bit bitsets
        template <>
        int find_first(const std::bitset<32> &bs) {
                const int x = __builtin_ffsl(bs.to_ulong());
                return x ? x - 1 : 32;
        }

        template <>
        int find_last(const std::bitset<32> &bs) {
                const int x = __builtin_clz(bs.to_ulong());
                return (x != 31) ? 32 - x - 1 : 32;
        }
}

int main() {
        std::bitset<32> bs(0);
        std::cout << util::find_last(bs) << std::endl;
        std::cout << util::find_first(bs) << std::endl;
        std::cout << bs.to_string<char, std::char_traits<char>,
std::allocator<char> >() << std::endl;
}

I receive the following output from this command:

$ g++ -W -Wall -ansi -pedantic -O0 test.cpp -o test && ./test
32
32
00000000000000000000000000000000

but if i enable optimizations, i get this instead:

$ g++ -W -Wall -ansi -pedantic -O1 test.cpp -o test && ./test
-1217882104
32
00000000000000000000000000000000

here's where it get's interesting... If i put some code in the specialized
version of find_last, it goes back to behaving as expected:

        template <>
        int find_last(const std::bitset<32> &bs) {
                const int x = __builtin_clz(bs.to_ulong());
                std::cout << "\n";
                return (x != 31) ? 32 - x - 1 : 32;
        }

$ g++ -W -Wall -ansi -pedantic -O1 test.cpp -o test && ./test

32
32
00000000000000000000000000000000

I am having a hard time figuring out which optimization flag is causing this.
Inspecing with -Q shows me that the following flags are enabled by -O1
(-fcprop-registers -fdefer-pop -fforward-propagate -fguess-branch-probability
-fif-conversion -fif-conversion2 -fipa-pure-const -fipa-reference
-fmerge-constants -fomit-frame-pointer -fsplit-wide-types -ftree-ccp -ftree-ch
-ftree-copy-prop -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse
-ftree-fre -ftree-sink -ftree-sra -ftree-ter) yet manually enabling all of
these, doesn't cause the behavior.


-- 
           Summary: __builtin_clz is behaving differently with different
                    optimization levels
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: eteran at alum dot rit dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44330

Reply via email to