------- Comment #10 from ubizjak at gmail dot com 2008-09-12 18:03 -------
This code in fact has undefined behavior. When Transform4x4() gets inlined into fun(), pAR[0] (aliased to *pMatrix) is accessed both as "short" and as __m128i. Since -fstrict-aliasing (the default) assumes that "short" can't alias __m128i, gcc is free to reorder stores and loads to the same address.
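
The access pattern described above, and one way to make it well defined, can be sketched roughly as follows. This is only an illustration, not the reporter's code: the names row_u, clear_first_lane() and load_lanes() are hypothetical, and the sketch assumes an x86 target with SSE2 and GCC's documented behavior that type punning through a union member is allowed even with -fstrict-aliasing.

```c
#include <emmintrin.h>  /* SSE2: __m128i */
#include <string.h>     /* memcpy */

/* Hypothetical stand-in for one matrix row; not the reporter's declaration.
 * Both members name the same 16 bytes, so the compiler knows they overlap. */
typedef union {
    __m128i v;     /* the row as one SSE register */
    short   s[8];  /* the same bytes as eight 16-bit lanes */
} row_u;

/* Store a whole vector, then overwrite lane 0 with a 16-bit store.
 * Because both accesses go through the union, the two stores may not
 * be reordered relative to each other. Returns lane 1 for checking. */
static short clear_first_lane(row_u *row, __m128i value)
{
    row->v = value;   /* 128-bit store */
    row->s[0] = 0;    /* 16-bit store to the same address */
    return row->s[1];
}

/* Alternative without a union: copy the bytes out with memcpy, which is
 * always a valid way to reinterpret an object representation. */
static void load_lanes(short out[8], const __m128i *src)
{
    memcpy(out, src, sizeof(*src));
}
```

With plain pointer casts instead (writing through a short * and a __m128i * to the same object), the compiler is entitled to assume the two stores are independent.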

This is the diff between -fstrict-aliasing (t_.s) and -fno-strict-aliasing (t.s):

--- t.s 2008-09-12 19:27:23.000000000 +0200
+++ t_.s        2008-09-12 19:27:04.000000000 +0200
@@ -68,6 +68,7 @@
        movq    8(%rsp), %rax
        movq    %xmm2, 32(%rdi)
        movq    %xmm5, 64(%rdi)
+       movw    $0, (%rdi)
        movq    %xmm0, 96(%rdi)
        movl    %eax, %esi
        movq    %rax, %rcx
@@ -77,10 +78,9 @@
        shrq    $48, %rdx
        testw   %si, %si
        movq    %rax, (%rdi)
-       movw    $0, (%rdi)
+       movl    $.LC0, %edi
        setne   %sil
        cmpw    $1, %cx
-       movl    $.LC0, %edi
        movzbl  %sil, %esi
        sbbl    $-1, %esi
        cmpw    $1, %dx

You can see that the store of 0 to (%rdi) has been moved above the store of %rax to the same address.

You should use unions to fix your code.

-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |INVALID

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37096