------- Additional Comments From tbptbp at gmail dot com 2005-01-30 13:37 -------
But I had to rewrite the hit_t structure in a way closer to what's found in the original source to avoid the same useless cloning I noted earlier with gcc. Something like:

union float4_t {
	float f[4];
	__m128 v;
	operator const __m128() const { return v; }
	operator __m128() { return v; }
};

union int4_t {
	int i[4];
	__m128i v;
	operator const __m128i() const { return v; }
	operator __m128i() { return v; }
};

struct hit_t {
	float4_t t,u,v;
	int4_t id;
};

to avoid this:

  40125f:	andnps %xmm4,%xmm0
  401262:	orps   0x40(%esp),%xmm0
  401267:	movaps %xmm0,0x40(%esp)
  40126c:	movaps %xmm0,0x10(%eax)

I really don't get why it's happening, and I'm pretty sure it wasn't there with some previous version of gcc. I'd seriously need a clue.

That aside, ICC's output is much cleaner and, despite the P4 way to compute addresses, faster on my K8.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19680
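
For anyone trying to picture the pattern, here is a minimal sketch of the kind of code that could produce an andnps/orps pair followed by a store. It assumes the update is a masked per-lane select written back into hit.t; that is a guess from the disassembly, not the actual testcase. The helpers blend and update_t are hypothetical names, and the unions are trimmed to a single const conversion operator for brevity.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_and_ps, _mm_andnot_ps, _mm_or_ps
#include <emmintrin.h>  // SSE2: __m128i

// Same layout as the unions quoted above, with one conversion operator each.
// Reading f[] after writing v relies on union type punning, which GCC
// documents as supported; it is not guaranteed by the C++ standard.
union float4_t {
    float f[4];
    __m128 v;
    operator __m128() const { return v; }
};

union int4_t {
    int i[4];
    __m128i v;
    operator __m128i() const { return v; }
};

struct hit_t {
    float4_t t, u, v;
    int4_t id;
};

// Hypothetical helper: per lane, take `fresh` where the mask bits are set,
// otherwise keep `old`. Compiles to andps / andnps / orps.
static inline __m128 blend(__m128 mask, __m128 fresh, __m128 old) {
    return _mm_or_ps(_mm_and_ps(mask, fresh), _mm_andnot_ps(mask, old));
}

// Hypothetical update: keep the smaller distance in each lane of hit.t.
static inline void update_t(hit_t &hit, __m128 dist) {
    __m128 nearer = _mm_cmplt_ps(dist, hit.t);  // float4_t -> __m128 via the operator
    hit.t.v = blend(nearer, dist, hit.t);       // ideally one movaps to memory
}
```

With hit.t set to 10 in every lane, calling update_t(hit, _mm_setr_ps(1.0f, 20.0f, 5.0f, 30.0f)) leaves hit.t.f holding 1, 10, 5, 10. Ideally the blended value is written to hit.t with a single movaps; the listing above shows it also being written to a stack slot first.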