------- Comment #6 from kretz at kde dot org 2010-06-16 21:21 -------
(In reply to comment #4)
> You can also cast 128bit to 256bit with upper 128bit undefined.

If you cast from xmm to ymm after a 128-bit instruction encoded with the VEX prefix, then the upper 128 bits are actually guaranteed to be zero. If the SSE instruction does not use the VEX prefix, then the upper 128 bits are not modified. Thus there is never really an undefined state. Might that be useful information for other optimizations?
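To make the distinction concrete, here is a minimal sketch of the two ways to widen an xmm value at the intrinsics level: the cast (upper half formally undefined) and an explicit zero-extension. The helper names are hypothetical and this is not the testcase from this bug. It assumes GCC or Clang on x86, and uses the target attribute so that -mavx is not required.

```c
#include <immintrin.h>

/* Hypothetical helpers for illustration; not the testcase from this bug.
   Assumes GCC or Clang targeting x86. The target attribute enables AVX
   for these two functions only. */

/* Widen with the cast intrinsic: only lanes 0..3 of the result are defined. */
__attribute__((target("avx")))
void widen_cast(const float in[4], float out[8])
{
    __m128 lo = _mm_loadu_ps(in);
    _mm256_storeu_ps(out, _mm256_castps128_ps256(lo));
}

/* Widen with the upper half explicitly zeroed: all eight lanes are defined. */
__attribute__((target("avx")))
void widen_zero(const float in[4], float out[8])
{
    __m128 lo = _mm_loadu_ps(in);
    _mm256_storeu_ps(out, _mm256_insertf128_ps(_mm256_setzero_ps(), lo, 0));
}

/* Returns 1 if both variants preserve the low lanes and widen_zero clears
   the high lanes, 0 on a mismatch, and -1 if the CPU lacks AVX. */
int check_widen(void)
{
    if (!__builtin_cpu_supports("avx"))
        return -1;

    const float in[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float a[8], b[8];
    widen_cast(in, a);
    widen_zero(in, b);

    for (int i = 0; i < 4; i++)
        if (a[i] != in[i] || b[i] != in[i])
            return 0;
    for (int i = 4; i < 8; i++)   /* a[4..7] is deliberately not inspected */
        if (b[i] != 0.0f)
            return 0;
    return 1;
}
```

If the compiler knew that the preceding 128-bit instruction was VEX-encoded, it could in principle treat widen_zero like widen_cast and drop the explicit zeroing; whether GCC does so is exactly the kind of optimization asked about above.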
> If I use union, it will always generate 2 moves via memory.

Yes, I noticed that unions are not a good choice for performance-critical code. They result in far more memory moves than necessary. BTW, ICC also generates memory moves when the testcase is implemented with unions.

PS: Thanks a lot for looking into this!

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44551