------- Comment #10 from ubizjak at gmail dot com  2008-09-12 18:03 -------
This code in fact has undefined behavior. When Transform4x4() gets inlined into
fun(), pAR[0] (aliased to *pMatrix) is accessed both as "short" and as __m128i.
Since -fstrict-aliasing (enabled by default at -O2 and above) assumes that
"short" can't alias __m128i, gcc is free to reorder stores and loads to the
same address.

This is the diff between -fstrict-aliasing (t_.s) and -fno-strict-aliasing
(t.s):

--- t.s 2008-09-12 19:27:23.000000000 +0200
+++ t_.s        2008-09-12 19:27:04.000000000 +0200
@@ -68,6 +68,7 @@
        movq    8(%rsp), %rax
        movq    %xmm2, 32(%rdi)
        movq    %xmm5, 64(%rdi)
+       movw    $0, (%rdi)
        movq    %xmm0, 96(%rdi)
        movl    %eax, %esi
        movq    %rax, %rcx
@@ -77,10 +78,9 @@
        shrq    $48, %rdx
        testw   %si, %si
        movq    %rax, (%rdi)
-       movw    $0, (%rdi)
+       movl    $.LC0, %edi
        setne   %sil
        cmpw    $1, %cx
-       movl    $.LC0, %edi
        movzbl  %sil, %esi
        sbbl    $-1, %esi
        cmpw    $1, %dx

You can see that the store of 0 to (%rdi) has been moved above the store of
%rax to the same address, so the zero is overwritten. You should use unions to
fix your code.
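
A minimal sketch of the union approach, not the reporter's actual code: the
names row_t and store_then_clear are hypothetical, and two uint64_t lanes stand
in for __m128i so the example builds without SSE headers. It also assumes a
16-bit "short". Because both views are members of one union and every access
goes through that union, gcc has to keep the two stores in program order.

```c
#include <stdint.h>

/* Hypothetical stand-in for one matrix row. The original code views the
   same 16 bytes as "short" and as __m128i; here uint64_t q[2] plays the
   role of the wide vector view. */
typedef union {
    short    s[8];   /* element-wise view (assumes 16-bit short) */
    uint64_t q[2];   /* wide view, stands in for __m128i */
} row_t;

/* Store the whole row through the wide view, then clear element 0 through
   the short view. Both stores go through the union type, so the compiler
   may not assume they touch unrelated objects. */
static void store_then_clear(row_t *row, uint64_t lo, uint64_t hi)
{
    row->q[0] = lo;
    row->q[1] = hi;
    row->s[0] = 0;   /* must remain after the wide store */
}
```

With the real types, the second member would be an __m128i and the wide store
would be the vector store; the point is only that neither view is reached
through a cast pointer of the other type.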


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |INVALID


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37096
