Re: [Mesa3d-dev] minor u_math.h speedup fun

Matt Turner Sat, 28 Nov 2009 11:25:50 -0800

On Sat, Nov 28, 2009 at 2:13 PM, Yang Zhao <[email protected]> wrote:
> The speed-up is definitely there, but __builtin_popcount() will still
> be drastically faster when architecture-specific optimizations are
> enabled:


I don't think this is the case (except with SSE4's popcnt
instruction, which your CFLAGS seem to be enabling).
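
To make the CFLAGS point concrete: GCC and Clang define the __POPCNT__ macro when the popcnt instruction is enabled for the target (for example with -mpopcnt), so a build can report whether __builtin_popcount() is likely to compile to a single instruction. This is only a sketch; the helper name popcnt_enabled_at_compile_time is made up for illustration and does not come from u_math.h or bc.c.

```c
/* Hypothetical helper, not part of Mesa: reports whether this translation
 * unit was compiled with the popcnt instruction enabled. */
static int popcnt_enabled_at_compile_time(void)
{
#ifdef __POPCNT__
    return 1;   /* __builtin_popcount() can be emitted as one popcnt instruction */
#else
    return 0;   /* the compiler falls back to a generic software sequence */
#endif
}

/* Thin wrapper so the builtin can be called like an ordinary function. */
static unsigned builtin_bitcount(unsigned v)
{
    return (unsigned)__builtin_popcount(v);
}
```

Either way the result of builtin_bitcount() is the same; only the generated code, and therefore the timing, differs.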

Even when compiling with the Intel CC, which can undoubtedly
optimize code for Core 2 better than gcc, fast_bitcount is
significantly faster than __builtin_popcount().

$ icc -O3 -ipo -march=core2 bc.c -o bc
ipo: remark #11001: performing single-file optimizations
ipo: remark #11005: generating object file /tmp/ipo_icce1aegt.o
bc.c(61): (col. 5) remark: LOOP WAS VECTORIZED.
$ ./bc
1 billion of __builtin_popcount(), fast_bitcount(), and naive() (in that order)
__builtin_popcount(): 5.361 seconds
fast_bitcount(): 1.274 seconds
kr_bitcount(): 20.302 seconds
naive(): 34.547 seconds
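
For readers without bc.c at hand (it is not included in this message), here is a minimal sketch of the three kinds of software bit counters the timings above appear to compare. The function names and bodies are assumptions: swar_bitcount is the well-known parallel (SWAR) population count, which may or may not match fast_bitcount() exactly; kr_bitcount_sketch is the Kernighan-style loop that the name kr_bitcount() suggests; naive_bitcount_sketch tests one bit per iteration.

```c
#include <stdint.h>

/* Assumed stand-in for fast_bitcount(): parallel (SWAR) population count.
 * Sums bits in 2-bit, 4-bit, then 8-bit fields, then adds the four bytes. */
static unsigned swar_bitcount(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                  /* 2-bit field sums */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);  /* 4-bit field sums */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                  /* 8-bit field sums */
    return (unsigned)((v * 0x01010101u) >> 24);        /* add the four bytes */
}

/* Assumed stand-in for kr_bitcount(): clears the lowest set bit each
 * iteration, so the loop runs once per set bit. */
static unsigned kr_bitcount_sketch(uint32_t v)
{
    unsigned count = 0;
    while (v) {
        v &= v - 1;
        count++;
    }
    return count;
}

/* Assumed stand-in for naive(): examines one bit per iteration. */
static unsigned naive_bitcount_sketch(uint32_t v)
{
    unsigned count = 0;
    for (; v; v >>= 1)
        count += v & 1u;
    return count;
}
```

All three return the same value for any 32-bit input; the SWAR version uses a fixed, branch-free sequence of operations, which is one plausible reason it benchmarks well without a hardware popcnt.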

Matt
