> On Feb 19, 2005, at 8:21 AM, Prakash Punnoor wrote:
> > Is this a known issue with gcc-3.4.3? I compiled the code using -O2 -march=athlon-xp -g3. If you want a smaller test case, I could try to produce one. Right now I just didn't want to waste my time in case this is a known issue or I did something stupid...
> Yes, the builtins are known to be a little stupid in 3.4.x. Could you try a snapshot of 4.0.0?
So I tried with today's gcc 4.0 snapshot:
- union version is about as fast as with gcc 3.4.3 (or a little bit faster)
- vector version is about 3% faster than the union version instead of 10% slower - wow!
But:
- vector version using mmintrin.h (Intel-style intrinsics) is ~3% slower than the union version above - still better than what gcc 3.4.3 produces, but worse than expected
So why is gcc 4.0 producing worse code when using Intel-style intrinsics, and why isn't the union version using builtins as fast as the vector version?
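For reference, the three styles being compared might look roughly like the sketch below. The actual test code is not shown in this thread, so everything here is an assumption made for illustration: the operation (a 4 x 16-bit add), the names add4_vector, add4_union, add4_intrin, v4hi and v4hi_u, and the way lanes are loaded and stored. It only shows how the same operation can be written against GCC's vector extension, against a target builtin through a union, and against mmintrin.h.

```c
#include <stdint.h>
#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>            /* Intel-style MMX intrinsics */
#endif

/* GCC vector extension: four 16-bit lanes, 8 bytes (MMX register size). */
typedef int16_t v4hi __attribute__((vector_size(8)));

/* Style 1 ("vector"): plain C operators on the vector type;
 * the compiler chooses the instructions. */
static inline void add4_vector(const int16_t *a, const int16_t *b, int16_t *out)
{
    v4hi va, vb, vr;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = va + vb;
    memcpy(out, &vr, sizeof vr);
}

/* Style 2 ("union"): a union gives array access to the lanes and the
 * add goes through a GCC x86 builtin where one is available. */
typedef union {
    v4hi    v;
    int16_t s[4];
} v4hi_u;

static inline void add4_union(const int16_t *a, const int16_t *b, int16_t *out)
{
    v4hi_u ua, ub, ur;
    int i;

    for (i = 0; i < 4; i++) {
        ua.s[i] = a[i];
        ub.s[i] = b[i];
    }
#if defined(__MMX__) && defined(__GNUC__) && !defined(__clang__)
    ur.v = __builtin_ia32_paddw(ua.v, ub.v);   /* MMX paddw */
    __builtin_ia32_emms();                     /* leave MMX state */
#else
    ur.v = ua.v + ub.v;                        /* fallback without the builtin */
#endif
    for (i = 0; i < 4; i++)
        out[i] = ur.s[i];
}

/* Style 3 ("intrinsics"): the same add through mmintrin.h. */
#if defined(__MMX__)
static inline void add4_intrin(const int16_t *a, const int16_t *b, int16_t *out)
{
    __m64 va, vb, vr;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = _mm_add_pi16(va, vb);
    memcpy(out, &vr, sizeof vr);
    _mm_empty();                               /* leave MMX state */
}
#endif
```

All three variants should compute the same result, so any speed difference comes from the code the compiler generates for each style. Comparing the output of -S for the three functions would be one way to see where they diverge.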
Cheers
-- 
Prakash Punnoor
formerly known as Prakash K. Cheemplavam