Hi Uros and Richard,

I was rewriting the Alpha sched_find_first_bit implementation for the Linux kernel, and in the process I think I've come across a gcc bug.
I rewrote the function using cmov instructions, and wrote a small program to test its correctness and performance. I wrote the function initially as an external .S file, and once I was reasonably sure it was correct, converted it to a C function with inline assembly. Compiling either version produces exactly the same output, as shown.

<rewritten>:
	ldq	t0,0(a0)
	clr	t2
	ldq	t1,8(a0)
	cmoveq	t0,0x40,t2
	cmoveq	t0,t1,t0
	cttz	t0,t3
	addq	t3,t2,v0
	ret

In my test program, I found that when the rewritten implementation is executed _before_ the reference implementation, it produces bogus results. This only happens when using the C/inline asm function. When compiled with the external .S file, the results are correct.

Attached is a tar.gz with my test code. Compile the test program with

	gcc -O -mcpu=... find.c rewritten.S test.c -o test

with optional -D__REWRITTEN_INLINE and -D__REWRITTEN_FIRST.

At -Os, -O2, or -O3 with both -D__REWRITTEN_INLINE and -D__REWRITTEN_FIRST, the program will produce incorrect results and trip the assert(). At -O0 or -O1, or without one or both of the -D flags, it will produce correct results.

I've tested with gcc-4.3.4 and gcc-4.4.2.

Thanks. Let me know what I can do to help further.

Matt Turner
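
For anyone reading without the tarball, here is a rough plain-C sketch of what the listing above computes. It is not the code from the attachment: the names find_first_bit_sketch and count_trailing_zeros64 are made up for illustration, and the sketch assumes that cttz yields 64 for a zero input.

```c
#include <stdint.h>

/* Portable stand-in for the cttz instruction.
 * Assumption of this sketch: a zero input yields 64. */
static unsigned int count_trailing_zeros64(uint64_t x)
{
    unsigned int n = 0;

    if (x == 0)
        return 64;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical C equivalent of the eight-instruction listing:
 * index of the first set bit in a 128-bit bitmap stored as two
 * 64-bit words, least significant word first. */
static unsigned long find_first_bit_sketch(const uint64_t b[2])
{
    uint64_t word = b[0];        /* ldq    t0,0(a0)   */
    unsigned long offset = 0;    /* clr    t2         */

    if (word == 0) {
        offset = 64;             /* cmoveq t0,0x40,t2 */
        word = b[1];             /* cmoveq t0,t1,t0   */
    }
    /* cttz t0,t3 ; addq t3,t2,v0 */
    return count_trailing_zeros64(word) + offset;
}
```

Note that the order of the two conditional moves matters in the assembly: the offset has to be selected while t0 still holds the first word, because the second cmoveq overwrites t0.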
sched_find_first_bit.tar.gz
Description: GNU Zip compressed data