------- Comment #6 from kretz at kde dot org  2010-06-16 21:21 -------
(In reply to comment #4)
> You can also cast 128bit to 256bit with upper 128bit undefined.
If you cast from xmm to ymm after a 128-bit instruction encoded with the VEX
prefix, then the upper 128 bits are actually guaranteed to be zero. If the SSE
instruction does not use the VEX prefix, then the upper 128 bits are not
modified. Thus the hardware never really leaves them in an undefined state.
That might be useful information for other optimizations?
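
To make the distinction concrete, here is a minimal sketch of widening an xmm
value to a ymm value with the upper half zeroed explicitly, so the source does
not depend on what the cast leaves there. The helper name widen_zero_upper is
hypothetical, and the target attribute assumes GCC or Clang on x86-64. Whether
the compiler folds the explicit zeroing into a single VEX-encoded move is
exactly the kind of optimization in question, so I make no claim about the
generated code.

```c
#include <immintrin.h>

/* Hypothetical helper: widen 4 floats to 8, upper half explicitly zero.
 * The target attribute lets this compile without -mavx on the command line
 * (assumes GCC or Clang on x86-64). */
__attribute__((target("avx")))
void widen_zero_upper(const float in[4], float out[8])
{
    __m128 lo = _mm_loadu_ps(in);

    /* _mm256_castps128_ps256(lo) would leave the upper 128 bits unspecified
     * at the intrinsics level. Inserting into a zeroed register states the
     * intent in the source instead. */
    __m256 wide = _mm256_insertf128_ps(_mm256_setzero_ps(), lo, 0);

    _mm256_storeu_ps(out, wide);
}
```

Callers should only run this on a CPU with AVX, for example after checking
__builtin_cpu_supports("avx").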

> If I use union, it will always generate 2 moves via memory.
Yes, I noticed that unions are not a good choice for performance-critical code.
They result in far more memory moves than necessary. BTW, ICC also generates
memory moves when the testcase is implemented with unions.

PS: Thanks a lot for looking into this!


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44551
