------- Additional Comments From tbptbp at gmail dot com  2005-01-30 13:37 -------
But I had to rewrite the hit_t structure in a way closer to what's found in
the original source to avoid the same useless cloning I noted earlier with gcc.
Something like:
union float4_t {
        float   f[4];
        __m128  v;

        operator const __m128() const   { return v; }
        operator __m128()               { return v; }
};

union int4_t {
        int     i[4];
        __m128i v;
        operator const __m128i()        const   { return v; }
        operator __m128i()                      { return v; }
};

struct hit_t {
        float4_t t,u,v;
        int4_t id;
};
to avoid this:
  40125f:       andnps %xmm4,%xmm0
  401262:       orps   0x40(%esp),%xmm0
  401267:       movaps %xmm0,0x40(%esp)
  40126c:       movaps %xmm0,0x10(%eax)

I really don't get why it's happening, and I'm pretty sure it wasn't there with
some previous version of gcc. I could seriously use a clue.
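
For context, an andnps/orps pair like the one above is what a per-lane masked
select usually compiles to. The following is only a minimal sketch of that
pattern built on the unions shown earlier, not the actual code from the
testcase: select_ps and update_hit are hypothetical names, the conversion
operators are left out for brevity, and "keep the nearer hit" is an assumed
update rule. In the listing above, the extra movaps to 0x40(%esp) looks like
the stack copy one would hope not to see for code of this shape.

```cpp
#include <xmmintrin.h>  // SSE:  __m128,  _mm_*_ps
#include <emmintrin.h>  // SSE2: __m128i, _mm_*_si128, _mm_castps_si128

union float4_t {
        float   f[4];
        __m128  v;
};

union int4_t {
        int     i[4];
        __m128i v;
};

struct hit_t {
        float4_t t, u, v;
        int4_t id;
};

// Hypothetical helper: per lane, pick b where mask is all ones, else a.
// This is the classic and/andnot/or blend.
static inline __m128 select_ps(__m128 mask, __m128 a, __m128 b)
{
        return _mm_or_ps(_mm_and_ps(mask, b), _mm_andnot_ps(mask, a));
}

// Hypothetical update (assumed rule): for each lane, overwrite the stored
// hit only when the candidate distance t is smaller than the stored one.
static inline void update_hit(hit_t &h, __m128 t, __m128 u, __m128 v, __m128i id)
{
        const __m128  closer = _mm_cmplt_ps(t, h.t.v);      // all ones where t < h.t
        const __m128i mask   = _mm_castps_si128(closer);    // same bits, integer view

        h.t.v  = select_ps(closer, h.t.v, t);
        h.u.v  = select_ps(closer, h.u.v, u);
        h.v.v  = select_ps(closer, h.v.v, v);
        h.id.v = _mm_or_si128(_mm_and_si128(mask, id),
                              _mm_andnot_si128(mask, h.id.v));
}
```

Ideally each member gets loaded, blended and stored back in place once, with
no intermediate copy on the stack.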

That aside, ICC's output is much cleaner and, despite the P4-style address
computation, faster on my K8.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19680
