--- Comment #10 from ubizjak at gmail dot com 2008-09-12 18:03 ---
This is in fact undefined code. When Transform4x4() gets inlined in fun(), you
are accessing pAR[0] (aliased to *pMatrix) as "short" and as __m128i. Since
-fstrict-aliasing (the default) assumes that "short" can't alias _
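
For illustration only, here is a minimal sketch of a way to reinterpret "short" data that stays defined under -fstrict-aliasing: copy the bytes with memcpy instead of reading them through a pointer of another type. It is a scalar stand-in for the SSE2 load, not the code from this PR. The helper names load_row64 and store_row64 are hypothetical, and the sketch assumes a 16-bit "short".

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the PR): read the first four shorts of a
// matrix as one 64-bit value. The bytes are copied, so the "short" objects
// are never accessed through an lvalue of a different type.
std::uint64_t load_row64(const short *pMatrix)
{
    static_assert(sizeof(short) * 4 == sizeof(std::uint64_t),
                  "sketch assumes 16-bit short");
    assert(pMatrix != nullptr);
    std::uint64_t bits;
    std::memcpy(&bits, pMatrix, sizeof bits);
    return bits;
}

// Hypothetical counterpart: write the 64 bits back into four shorts,
// again by copying bytes rather than casting the pointer.
void store_row64(short *pMatrix, std::uint64_t bits)
{
    assert(pMatrix != nullptr);
    std::memcpy(pMatrix, &bits, sizeof bits);
}
```

Compilers typically turn a fixed-size memcpy like this into a single load or store, so the byte copy should not cost anything at -O2; that is an expectation, not something measured for this test case.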
--- Comment #9 from erik dot moller at cycos dot com 2008-09-12 11:33 ---
True, -fno-strict-aliasing makes even -O3 work. I don't know about the
aliasing; the example is very simple. Can that happen when the SSE2 intrinsics
are involved?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi
--- Comment #8 from ubizjak at gmail dot com 2008-09-11 18:16 ---
Hm, with -O2 -fno-strict-aliasing, it works fine.
Is there an aliasing issue involved?
short Transform4x4(short *pMatrix)
{
  __m128i r4, r5;
  __m128i r0 = _mm_loadl_epi64((__m128i *)(pMatrix + 0 * 16));
--- Comment #7 from ubizjak at gmail dot com 2008-09-11 17:57 ---
There is a runtime difference between -O1 and -O2:
g++ -O1 pr37096.cpp main.o
./a.out
nz: 3
g++ -O2 pr37096.cpp main.o
./a.out
nz: 3
98
Target: x86_64-unknown-linux-gnu
gcc version 4.4.0 20080911 (experimental) [trunk r